Training loops · AI / ML · Code with Animation

What is a training loop?

The training loop is the cycle that teaches a network: feed a batch forward, compute the loss, backpropagate the gradients, and update the weights, repeated over the dataset for many epochs. Around it sit validation, checkpointing, and the monitoring that tells you whether training is working.
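
The cycle described above can be sketched without any framework. The example below is a minimal illustration, not a reference implementation: it fits a line to synthetic data, and the gradients are written out by hand where a framework would compute them by backpropagation. The task (y = 3x + 1), the function names, and all hyperparameters are assumptions chosen for the example.

```python
import random

def make_data(n, seed):
    """Synthetic points from y = 3x + 1 plus small noise (a made-up toy task)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.05)) for x in xs]

def mse(w, b, data):
    """Mean squared error of the line w*x + b over a dataset."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def train(train_data, val_data, epochs=50, batch_size=16, lr=0.1, seed=0):
    rng = random.Random(seed)
    w, b = 0.0, 0.0                      # model parameters
    history = []                         # (train_loss, val_loss) per epoch
    data = list(train_data)
    for epoch in range(epochs):
        rng.shuffle(data)                # new batch order every epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Forward pass: prediction error for each example in the batch
            errors = [(w * x + b - y, x) for x, y in batch]
            # Backward pass: gradient of the batch MSE with respect to w and b
            grad_w = sum(2 * e * x for e, x in errors) / len(batch)
            grad_b = sum(2 * e for e, _ in errors) / len(batch)
            # Optimizer step: plain gradient descent
            w -= lr * grad_w
            b -= lr * grad_b
        # Validation happens once per epoch, outside the batch loop
        history.append((mse(w, b, train_data), mse(w, b, val_data)))
    return w, b, history

train_data = make_data(200, seed=1)
val_data = make_data(50, seed=2)
w, b, history = train(train_data, val_data)
print(f"w={w:.2f} b={b:.2f} final train/val loss: {history[-1]}")
```

In a framework such as PyTorch the inner three steps usually become a loss computation, a backward call, and an optimizer step, but the epoch-and-batch structure stays the same.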

Why it matters

Nearly all of the work of training a deep learning model happens inside this loop, and most training problems show up here: a loss that will not drop, a loss that explodes, or a model that overfits. Understanding each part, and watching the right signals, is what lets you train models that actually converge instead of staring at a flat loss curve.

What to learn

The epoch and batch structure
Forward pass, loss, backward, optimizer step
Learning rate and its outsized effect
Train versus validation loss curves
Checkpointing the best model
Learning rate schedules
Reading the loss curve to diagnose problems
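
To make the learning rate items concrete, here is a small sketch on the simplest possible loss, loss(w) = w**2. It shows the three regimes (too small, reasonable, too large) and one common schedule shape, cosine decay. The function names and the specific rates are assumptions picked for illustration; real models need their own values.

```python
import math

def descend(lr, steps=20, w0=1.0):
    """Run gradient descent on loss(w) = w**2 (gradient 2*w); return the final loss."""
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w
    return w * w

def cosine_lr(step, total_steps, base_lr, min_lr=0.0):
    """Cosine decay from base_lr at step 0 down to min_lr at total_steps."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Too small: loss barely moves. Reasonable: loss shrinks fast. Too large: loss grows.
for lr in (0.001, 0.1, 1.1):
    print(f"lr={lr}: final loss {descend(lr):.3g}")

# The schedule starts at the base rate and decays smoothly toward the minimum.
for step in (0, 25, 50, 75, 100):
    print(f"step {step}: lr {cosine_lr(step, 100, base_lr=0.1):.4f}")
```

On this toy loss each step multiplies w by (1 - 2*lr), so any rate above 1.0 makes the loss grow instead of shrink. That is the same mechanism behind an exploding loss in a real network, although the threshold there is not known in advance.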

Common pitfall

Not watching validation loss while training. Training loss can keep dropping while the model overfits and validation loss climbs, but you only see that if you measure it. Track both curves every epoch, checkpoint whenever validation loss improves, and stop when validation loss stops improving, not when training loss does.
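
One way to act on this is a small tracker that keeps the best checkpoint and signals when to stop. The sketch below is an illustration under assumptions: the class name `BestCheckpoint`, the `patience` value, and the loss numbers are all made up, and the `state` dictionary stands in for real model weights.

```python
import copy

class BestCheckpoint:
    """Keeps a copy of the state with the lowest validation loss seen so far."""

    def __init__(self, patience=3):
        self.patience = patience          # epochs to wait without improvement
        self.best_loss = float("inf")
        self.best_state = None
        self.best_epoch = None
        self.bad_epochs = 0

    def update(self, epoch, val_loss, state):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_state = copy.deepcopy(state)   # snapshot, not a reference
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Made-up curves: training loss keeps falling, validation loss turns upward.
train_losses = [1.0, 0.7, 0.5, 0.4, 0.3, 0.25, 0.2, 0.15]
val_losses = [1.1, 0.8, 0.6, 0.55, 0.58, 0.62, 0.70, 0.80]

tracker = BestCheckpoint(patience=3)
stopped_at = None
for epoch, (tr, va) in enumerate(zip(train_losses, val_losses)):
    state = {"epoch": epoch, "weights": [tr]}   # stand-in for real model weights
    if tracker.update(epoch, va, state):
        stopped_at = epoch
        break

print(f"best epoch {tracker.best_epoch}, stopped at epoch {stopped_at}")
```

Judged by training loss alone, the last epoch looks best. The tracker instead keeps the epoch where validation loss was lowest and stops once three epochs pass with no improvement.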

Resources

Primary (free):

PyTorch — Optimization loop · docs
Andrej Karpathy — Recipe for training neural nets · article
Google — Reducing loss · docs

Practice

Write a full training loop for a small model: iterate epochs and batches, compute and log both training and validation loss each epoch, and checkpoint the model with the best validation loss. Plot the two loss curves. Done when you can read the curves and say whether the model is underfitting, overfitting, or training well.
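
As a rough aid for the final step, the reading of the two curves can be written down as a heuristic. This is only a sketch: the function name, the `target_loss` you pass in, and the `tolerance` default are assumptions that depend entirely on the task and loss scale, and no such rule replaces looking at the plot.

```python
def diagnose(train_losses, val_losses, target_loss, tolerance=0.05):
    """Rough label for a pair of per-epoch loss curves (heuristic, task-dependent)."""
    final_train = train_losses[-1]
    final_val = val_losses[-1]
    # Training loss never reached the level you consider acceptable.
    if final_train > target_loss:
        return "underfitting"
    # Validation loss has risen above its best value while training loss
    # is at its lowest point: the model is fitting the training set only.
    if final_val > min(val_losses) + tolerance and final_train <= min(train_losses):
        return "overfitting"
    return "training well"

# Made-up curves for each case.
print(diagnose([1.0, 0.9, 0.85], [1.0, 0.95, 0.9], target_loss=0.3))
print(diagnose([1.0, 0.5, 0.2, 0.1], [1.0, 0.6, 0.5, 0.7], target_loss=0.3))
print(diagnose([1.0, 0.5, 0.25, 0.2], [1.0, 0.55, 0.3, 0.27], target_loss=0.3))
```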

Outcomes

Write a complete epoch-and-batch training loop.
Tune the learning rate and use a schedule.
Track training and validation loss together.
Diagnose training problems from the loss curves.