Regularization

What is regularization?

Regularization is the set of techniques that keep a model from memorizing its training data so that it generalizes to new data. Dropout randomly zeroes units during training, weight decay penalizes large weights, and early stopping halts training once validation loss stops improving.
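To make "penalizes large weights" concrete, here is a minimal sketch of one gradient step with an L2 penalty. The function name sgd_step and the numbers are made up for illustration; the only assumption is plain SGD on a loss with an added (weight_decay / 2) * sum(w ** 2) term.

```python
def sgd_step(weights, grads, lr=0.1, weight_decay=0.0):
    """One SGD step on: data_loss + (weight_decay / 2) * sum(w ** 2)."""
    # The L2 penalty contributes weight_decay * w to each gradient,
    # so every step also pulls each weight toward zero.
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]


weights = [2.0, -3.0]
zero_grads = [0.0, 0.0]  # pretend the data loss is flat at this point

print(sgd_step(weights, zero_grads, weight_decay=0.0))  # weights unchanged
print(sgd_step(weights, zero_grads, weight_decay=0.5))  # weights shrink toward zero
```

Even with a zero data gradient, the decayed step moves both weights closer to zero. That shrinking pressure is what discourages the large weights a memorizing model tends to rely on.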

Why it matters

A deep network usually has enough capacity to memorize its training set perfectly and still fail on everything else. Regularization is the difference between a model that works on real inputs and one that only works on what it has seen, which makes it essential to training useful deep models.

What to learn

Overfitting in deep networks and why capacity invites it
Dropout and how it forces robustness
Weight decay (L2 regularization)
Early stopping on validation loss
Data augmentation as regularization
Batch normalization's regularizing effect
Reading the train-validation gap
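As a rough illustration of reading the train-validation gap, the sketch below labels a run from its final losses. The function diagnose and both thresholds (gap_tol, high_loss) are hypothetical; sensible values depend entirely on the task and the scale of the loss.

```python
def diagnose(train_loss, val_loss, gap_tol=0.1, high_loss=1.0):
    """Label a run from its final losses (thresholds are illustrative only)."""
    if train_loss > high_loss and val_loss > high_loss:
        return "underfitting"  # neither loss got low
    if val_loss - train_loss > gap_tol:
        return "overfitting"   # fits the training data, not the validation data
    return "ok"


print(diagnose(0.05, 0.90))  # overfitting
print(diagnose(1.40, 1.45))  # underfitting
print(diagnose(0.30, 0.35))  # ok
```

In practice the full loss curves say more than the final numbers, but the same two questions apply: is the gap large, and are both losses still high?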

Common pitfall

Leaving dropout active during evaluation, or adding so much regularization that the model can no longer fit the data (underfitting). Dropout must be turned off at inference; frameworks do this when you switch to eval mode, for example model.eval() in PyTorch. Regularization is a dial: turn it up enough to close the train-validation gap, but not so far that both training and validation loss stay high.
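The train/eval distinction can be sketched without any framework. The dropout function below is an illustrative stand-in, not a library API: it implements inverted dropout, the scheme PyTorch's nn.Dropout also uses, where surviving units are scaled by 1 / (1 - p) during training so that inference needs no correction. The training flag plays the role of eval mode.

```python
import random


def dropout(values, p=0.5, training=True, rng=random):
    """Inverted dropout over a list of activations (illustrative sketch)."""
    if not training or p == 0.0:
        return list(values)  # inference: identity, nothing is dropped
    keep = 1.0 - p
    # Zero each unit with probability p and scale the survivors by 1 / keep,
    # so the expected activation matches what the network sees at inference.
    return [v / keep if rng.random() < keep else 0.0 for v in values]


acts = [1.0, 2.0, 3.0, 4.0]
print(dropout(acts, p=0.5, training=True, rng=random.Random(0)))  # random zeros, survivors doubled
print(dropout(acts, p=0.5, training=False))                       # [1.0, 2.0, 3.0, 4.0]
```

If the training flag were left on during evaluation, predictions would be noisy and metrics would understate what the model can actually do, which is exactly the pitfall described above.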

Resources

Primary (free):

Google — Regularization · docs
PyTorch — Dropout · docs
Deep Learning Book — Regularization · docs

Practice

Take an overfitting model — training loss far below validation loss — and add regularization: dropout, weight decay, and early stopping. Watch the train-validation gap shrink. Then over-regularize on purpose and watch it underfit. Done when you can tune regularization to close the gap without crushing performance.
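For the early-stopping part of the exercise, one common rule is patience: stop once validation loss has failed to improve for a set number of epochs. The sketch below applies that rule to a made-up list of validation losses; the function name, patience, and min_delta are assumptions for illustration, and a real training loop would also save a checkpoint at each new best epoch.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once `patience` consecutive epochs fail to beat the best
    validation loss seen so far by more than `min_delta`.
    """
    best_loss = float("inf")
    best_epoch = -1
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best: a real loop would save model weights here.
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


val_losses = [0.90, 0.70, 0.60, 0.62, 0.65, 0.71, 0.80]  # made-up curve
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
print(stop_epoch, best_epoch)  # 5 2
```

Here validation loss bottoms out at epoch 2 and training stops at epoch 5, so the weights to keep are the ones from epoch 2, not the last ones trained.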

Outcomes

Recognize overfitting from the train-validation gap.
Apply dropout, weight decay, and early stopping.
Switch dropout off correctly at inference.
Tune regularization between over- and underfitting.