What is regularization?
Regularization is the set of techniques that keep a model from memorizing its training data so that it generalizes to new data. Dropout randomly disables units during training, weight decay penalizes large weights, and early stopping halts training once validation loss stops improving, before overfitting sets in.
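To make "penalizes large weights" concrete, here is a minimal sketch of one plain SGD step with an L2 penalty folded into the gradient. The function name `sgd_step` and the parameters `lr` and `weight_decay` are illustrative assumptions, not taken from any particular library.

```python
def sgd_step(weights, grads, lr=0.1, weight_decay=0.0):
    """One plain SGD update; the L2 penalty adds weight_decay * w to each gradient.

    Hypothetical helper for illustration only.
    """
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]


weights = [2.0, -3.0]
zero_grads = [0.0, 0.0]  # pretend the data loss contributes no gradient

# With no data gradient, the penalty alone pulls every weight toward zero.
decayed = sgd_step(weights, zero_grads, lr=0.1, weight_decay=0.5)
unchanged = sgd_step(weights, zero_grads, lr=0.1, weight_decay=0.0)
print(decayed, unchanged)
```

The shrinkage is proportional to each weight, so large weights are penalized more than small ones.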
Why it matters
A deep network often has enough capacity to memorize its training set perfectly and still fail on everything else. Regularization is what separates a model that works on real inputs from one that only works on what it has seen, which makes it essential to training useful deep models.
What to learn
- Overfitting in deep networks and why capacity invites it
- Dropout and how it forces robustness
- Weight decay (equivalent to L2 regularization under plain SGD)
- Early stopping on validation loss
- Data augmentation as regularization
- Batch normalization's regularizing effect
- Reading the train-validation gap
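Reading the train-validation gap can be sketched as a simple rule of thumb. The function `diagnose` and the thresholds `gap_tol` and `high_loss` are hypothetical; sensible values depend on the loss function and the task, so treat this as an illustration rather than a recipe.

```python
def diagnose(train_loss, val_loss, gap_tol=0.1, high_loss=1.0):
    """Rough label for a pair of losses.

    gap_tol and high_loss are assumed, problem-dependent thresholds.
    """
    gap = val_loss - train_loss
    if gap > gap_tol:
        return "overfitting"   # fits the training data, fails to generalize
    if train_loss > high_loss:
        return "underfitting"  # both losses high: model cannot fit the data
    return "ok"


print(diagnose(0.05, 0.90))  # large gap
print(diagnose(1.50, 1.55))  # small gap, both losses high
print(diagnose(0.30, 0.35))  # small gap, low loss
```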
Common pitfall
Two mistakes are common: leaving dropout active during evaluation, and adding so much regularization that the model can no longer fit the data (underfitting). Dropout must be turned off at inference (frameworks do this when you switch to eval mode), and regularization is a dial: turn it up enough to close the train-validation gap, but not so far that both losses stay high.
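The train/eval distinction can be seen in a small pure-Python sketch of inverted dropout. The `dropout` function and its `training` flag are illustrative stand-ins for what a framework's eval mode toggles (in PyTorch, calling `model.eval()`); this is not any library's actual implementation.

```python
import random


def dropout(values, p=0.5, training=True, rng=random):
    """Inverted dropout over a list of activations (illustrative sketch).

    In training, each value is zeroed with probability p and survivors are
    scaled by 1 / (1 - p). At inference the input passes through unchanged.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(values)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


activations = [1.0, 2.0, 3.0, 4.0]
train_out = dropout(activations, p=0.5, training=True, rng=random.Random(0))
eval_out = dropout(activations, p=0.5, training=False)
print(train_out)  # some entries zeroed, the rest doubled
print(eval_out)   # identical to the input
```

Leaving `training=True` at evaluation time would randomly zero activations on every prediction, which is exactly the pitfall described above.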
Resources
Primary (free):
- Google — Regularization · docs
- PyTorch — Dropout · docs
- Deep Learning Book — Regularization · docs
Practice
Take an overfitting model (training loss far below validation loss) and add regularization: dropout, weight decay, and early stopping. Watch the train-validation gap shrink. Then over-regularize on purpose and watch the model underfit. You are done when you can tune regularization to close the gap without crushing performance.
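For the early stopping part of this exercise, a patience-based rule is one common approach. The sketch below works on a made-up list of per-epoch validation losses; `early_stopping_epoch`, `patience`, and `min_delta` are assumed names, and the loss values are invented for illustration.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops after `patience` consecutive epochs without an improvement
    of more than `min_delta` over the best loss seen so far.
    """
    best = float("inf")
    best_epoch = -1
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Invented losses: improvement until epoch 2, then validation loss rises.
val_losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.70]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=2)
print(stop_epoch, best_epoch)
```

In a real training loop you would also save the model weights at `best_epoch` and restore them after stopping.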
Outcomes
- Recognize overfitting from the train-validation gap.
- Apply dropout, weight decay, and early stopping.
- Switch dropout off correctly at inference.
- Tune regularization between over- and underfitting.