Overfitting refers to a model that models the training data too well. Overfitting happens when a model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new data.
An overfitted model uses more of the noise, which increases its performance in the case of known noise (training data) and decreases its performance in the case of novel noise (test data).
Cross-validation is a powerful preventative measure against overfitting.
The idea is clever .i.e. use your initial training data to generate multiple mini train-test splits. Use these splits to tune your model.
Train with more data
It won’t work every time, but training with more data can help algorithms detect the signal better. In the earlier example of modeling height vs. age in children, it’s clear how sampling more schools will help your model.
Regularization refers to a broad range of techniques for artificially forcing your model to be simpler. The method will depend on the type of learner you’re using. For example, you could prune a decision tree, use dropout on a neural network, or add a penalty parameter to the cost function in regression.
When you’re training an algorithm iteratively you can measure how well each iteration of the model performs.
Up until a certain number of iterations, new iterations improve the model. After that point, however, the model’s ability to generalize can weaken as it begins to overfit the training data.
Contact Us at:
firstname.lastname@example.org to get any other help in machine learning.