Train-test split & Cross-Validation
Train-test split: Say we have a dataset with 100 samples. If we train the model on all 100 samples and then evaluate it on those same 100 samples, we cannot tell whether the model has overfitted, because it has already seen every sample it is tested on. To avoid this, we use a train-test split, which divides the data into 2 parts:

1) Training set: Typically 70-80% of the data. In the example above, we would train the model on 80 samples instead of all 100.
2) Test set: Usually the remaining 20-30% of the data. In the example above, we would test the model on the remaining 20 samples.

The test set is used to evaluate how well the model generalizes to new, unseen data. The performance metrics (such as accuracy, precision, recall, etc.) are calculated from the model's predictions on the test set.

Cross-validation: Divides the dataset into multiple folds. Some of the folds are used as the training set and some as the testing set. This process is repeate...
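
To make the two ideas concrete, here is a minimal sketch in plain Python that works only with sample indices, using the 100-sample example from above. The function names (train_test_split_indices, k_fold_indices), the fixed seed, and the choice of 5 folds are assumptions made for illustration. The cross-validation part assumes the common k-fold variant, in which each fold is held out as the test set exactly once while the remaining folds form the training set.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle the sample indices and split them into (train, test) lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle so the split is not ordered
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists, one pair per fold.

    Assumes the standard k-fold scheme: every fold serves as the
    test set once, and the other k - 1 folds are used for training.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Train-test split: 80 samples for training, 20 held out for testing.
train_idx, test_idx = train_test_split_indices(100, test_fraction=0.2)
print(len(train_idx), len(test_idx))  # 80 20

# Cross-validation: 5 rounds, each with 80 training and 20 test samples.
for fold_number, (train, test) in enumerate(k_fold_indices(100, k=5), start=1):
    print(fold_number, len(train), len(test))
```

In practice you would fit the model on the training indices and compute the metrics on the test indices in each round, then average the per-fold scores. Libraries such as scikit-learn provide ready-made equivalents (train_test_split and KFold), which are usually preferable to hand-rolled code like this.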