Deep learning · Advanced · 8h

Training loops.

Loss, optimizers, batches, and the training cycle.

What is a training loop?

The training loop is the cycle that teaches a network: feed a batch forward, compute the loss, backpropagate the gradients, and update the weights. One pass over the whole dataset is an epoch, and the cycle repeats for many epochs. Around it sit validation, checkpointing, and the monitoring that tells you whether training is working.
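
The four steps above can be sketched without any framework. The example below is a minimal illustration, not a reference implementation: it assumes a one-parameter linear model, mean squared error, hand-derived gradients, and plain SGD, and the function name `train` and the toy data are made up for this sketch. A real network would get its gradients from a framework's automatic differentiation instead.

```python
import random

def train(data, epochs=50, batch_size=4, lr=0.1, seed=0):
    """Fit y = w*x + b with mini-batch SGD on mean squared error."""
    rng = random.Random(seed)
    data = list(data)  # copy so shuffling does not mutate the caller's list
    w, b = 0.0, 0.0
    history = []  # mean training loss per epoch
    for epoch in range(epochs):
        rng.shuffle(data)  # new batch composition every epoch
        epoch_loss = 0.0
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Forward pass: predictions for the batch.
            preds = [w * x + b for x, _ in batch]
            # Loss: mean squared error over the batch.
            errors = [p - y for p, (_, y) in zip(preds, batch)]
            loss = sum(e * e for e in errors) / len(batch)
            # Backward pass: gradients of the loss with respect to w and b.
            grad_w = sum(2 * e * x for e, (x, _) in zip(errors, batch)) / len(batch)
            grad_b = sum(2 * e for e in errors) / len(batch)
            # Optimizer step: plain SGD.
            w -= lr * grad_w
            b -= lr * grad_b
            epoch_loss += loss * len(batch)
        history.append(epoch_loss / len(data))
    return w, b, history

# Toy data drawn from y = 2x + 1, noise-free so the result is easy to check.
points = [(x / 10, 2 * (x / 10) + 1) for x in range(-10, 11)]
w, b, history = train(points)
print(f"w={w:.3f} b={b:.3f} first={history[0]:.4f} last={history[-1]:.6f}")
```

The learned `w` and `b` should approach 2 and 1, and the per-epoch loss in `history` should fall over time.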

Why it matters

Everything in deep learning happens inside this loop, and most training problems show up here: a loss that will not drop, a loss that explodes, or a model that overfits. Understanding each part, and watching the right signals, is what lets you train models that actually converge instead of staring at a flat loss curve.

What to learn

  • The epoch and batch structure
  • Forward pass, loss, backward, optimizer step
  • Learning rate and its outsized effect
  • Train versus validation loss curves
  • Checkpointing the best model
  • Learning rate schedules
  • Reading the loss curve to diagnose problems
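
Two of the topics above, the learning rate's effect and schedules, are easy to see in a few lines. The sketch below is illustrative only: `descend` runs gradient descent on the toy function f(w) = w^2 to show a step size that converges and one that diverges, and `cosine_lr` is one common schedule shape (linear warmup followed by cosine decay). Both function names and all default values are assumptions chosen for this example.

```python
import math

def descend(lr, steps=20, w=1.0):
    """Gradient descent on f(w) = w**2, whose gradient is 2*w."""
    for _ in range(steps):
        w -= lr * 2 * w
    return w

def cosine_lr(step, total_steps, base_lr=0.1, min_lr=0.001, warmup_steps=0):
    """Learning rate at a 0-indexed step: linear warmup, then cosine decay."""
    if warmup_steps > 0 and step < warmup_steps:
        # Ramp linearly from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

small = descend(lr=0.1)  # each step multiplies w by 0.8, so w shrinks toward 0
large = descend(lr=1.1)  # each step multiplies w by -1.2, so |w| grows
schedule = [cosine_lr(s, total_steps=100, warmup_steps=10) for s in range(101)]
print(f"lr=0.1 -> w={small:.4f}, lr=1.1 -> w={large:.2f}")
print(f"schedule start={schedule[0]:.3f} peak={max(schedule):.3f} end={schedule[-1]:.3f}")
```

The same loss and the same starting point give opposite outcomes depending only on the step size, which is why the learning rate is usually the first hyperparameter to tune.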

Common pitfall

Not watching validation loss while training. Training loss can keep dropping while the model overfits and validation loss climbs — but you only see that if you measure it. Track both curves every epoch, and stop or checkpoint when validation loss stops improving, not when training loss does.
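
One way to act on this is to keep the best-so-far parameters and stop after validation loss has failed to improve for a few epochs. The sketch below assumes a hypothetical helper class `BestCheckpoint` with a `patience` setting. The loss values are invented for illustration, and the `params` dictionary stands in for real model weights.

```python
import copy

class BestCheckpoint:
    """Keep a copy of the parameters with the lowest validation loss seen so far."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience    # epochs without improvement before stopping
        self.min_delta = min_delta  # smallest decrease that counts as improvement
        self.best_loss = float("inf")
        self.best_epoch = None
        self.best_params = None
        self.bad_epochs = 0

    def update(self, epoch, val_loss, params):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.best_params = copy.deepcopy(params)  # snapshot, not a reference
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Invented curves: training loss keeps falling while validation loss turns upward.
train_losses = [1.00, 0.70, 0.50, 0.38, 0.30, 0.24, 0.19, 0.15]
val_losses = [1.05, 0.80, 0.66, 0.61, 0.63, 0.67, 0.72, 0.78]

tracker = BestCheckpoint(patience=3)
stopped_at = None
for epoch, (tr, va) in enumerate(zip(train_losses, val_losses)):
    params = {"epoch": epoch}  # stand-in for real model weights
    print(f"epoch {epoch}: train={tr:.2f} val={va:.2f}")
    if tracker.update(epoch, va, params):
        stopped_at = epoch
        break

print(f"stopped at epoch {stopped_at}, best epoch {tracker.best_epoch}")
```

Training loss alone would never have triggered the stop here, since it is still falling at the last epoch.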

Practice

Write a full training loop for a small model: iterate over epochs and batches, compute and log both training and validation loss each epoch, and checkpoint the model with the lowest validation loss. Plot the two loss curves. Done when you can read the curves and say whether the model is underfitting, overfitting, or training well.
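
As a way to check your own reading of the curves, the rough heuristic below labels a pair of per-epoch loss lists. It is a sketch under loud assumptions: the function `diagnose` and its thresholds `flat_ratio` and `rise_ratio` are made up for this example, losses are assumed positive, and a flat training curve is treated as a sign of underfitting even though it can also point to a learning rate problem. It is not a substitute for looking at the plot.

```python
def diagnose(train_losses, val_losses, flat_ratio=0.05, rise_ratio=0.25):
    """Rough label for two per-epoch loss curves; thresholds are illustrative guesses."""
    if len(train_losses) != len(val_losses) or len(train_losses) < 2:
        raise ValueError("need two curves of equal length with at least 2 epochs")
    # Epoch where validation loss was lowest.
    best = min(range(len(val_losses)), key=val_losses.__getitem__)
    # Relative drop in training loss from the first to the last epoch.
    train_drop = (train_losses[0] - train_losses[-1]) / train_losses[0]
    # Relative rise in validation loss since its best epoch.
    val_rise = (val_losses[-1] - val_losses[best]) / val_losses[best]
    if train_drop < flat_ratio:
        return "underfitting"  # training loss barely moved
    if val_rise > rise_ratio and train_losses[-1] < train_losses[best]:
        return "overfitting"  # validation climbed while training kept falling
    return "training well"

flat = diagnose([1.00, 0.99, 0.98], [1.00, 1.00, 0.99])
diverging = diagnose(
    [1.00, 0.70, 0.50, 0.38, 0.30, 0.24, 0.19, 0.15],
    [1.05, 0.80, 0.66, 0.61, 0.63, 0.67, 0.72, 0.78],
)
healthy = diagnose([1.0, 0.6, 0.4, 0.3], [1.1, 0.7, 0.5, 0.4])
print(flat, diverging, healthy, sep=" | ")
```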

Outcomes

  • Write a complete epoch-and-batch training loop.
  • Tune the learning rate and use a schedule.
  • Track training and validation loss together.
  • Diagnose training problems from the loss curves.