What is model evaluation?
Model evaluation is how you measure whether a model performs well on data it has not seen, using a metric that fits the task and an honest test setup. It is the discipline of not fooling yourself: making sure the number you report reflects real-world performance, not a leak or a lucky split.
Why it matters
A model is only as trustworthy as its evaluation. The wrong metric or a subtle data leak can produce a great score for a model that then fails in production. Rigorous evaluation is what separates real ML work from demos, and it is heavily probed in interviews.
What to learn
- Train, validation, and test splits
- Cross-validation
- Classification metrics: precision, recall, F1, ROC-AUC
- Regression metrics: MAE, RMSE, R-squared
- The accuracy trap on imbalanced data
- Data leakage and how it inflates scores
- The confusion matrix
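The splitting, cross-validation, and leakage topics in this list can be sketched together. What follows is a minimal sketch, assuming scikit-learn is installed; the synthetic dataset, the 80/20 split, the logistic regression model, and all variable names are illustrative assumptions rather than a prescribed setup. Keeping the scaler inside a pipeline means it is fit only on the training rows of each fold, which is one common way to avoid preprocessing leakage.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, used only for illustration.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Hold out a test set first; it is not touched until the very end.
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# The scaler lives inside the pipeline, so each cross-validation fold
# fits it on that fold's training rows only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# 5-fold cross-validation on the development data plays the role of
# a validation split.
cv_scores = cross_val_score(model, X_dev, y_dev, cv=5, scoring="f1")
print("CV F1 per fold:", cv_scores.round(3))

# Final, one-time check on the held-out test set, using the same metric.
model.fit(X_dev, y_dev)
test_f1 = f1_score(y_test, model.predict(X_test))
print("Test F1:", round(test_f1, 3))
```

The design choice worth noticing is the order: the test set is split off before any fitting or tuning, and it is scored exactly once.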
Common pitfall
Reporting accuracy on imbalanced data. If 99% of cases are negative, a model that always predicts "negative" is 99% accurate and completely worthless. On imbalanced problems use precision, recall, and F1, and look at the confusion matrix, because a single accuracy number hides exactly the failures that matter.
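The arithmetic behind this pitfall can be checked by hand. The sketch below uses a hypothetical test set of 990 negatives and 10 positives (numbers chosen only to match the 99% example) and a "model" that always predicts negative. Precision is technically undefined when there are no positive predictions; reporting it as 0 here is a convention assumed for this sketch.

```python
# Hypothetical test set: 990 negatives (0) and 10 positives (1).
y_true = [0] * 990 + [1] * 10
# A "model" that always predicts the negative class.
y_pred = [0] * len(y_true)

# Confusion-matrix counts, treating class 1 as the positive class.
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

accuracy = (tp + tn) / len(y_true)
# Guard against division by zero; 0.0 is the fallback assumed here.
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
f1 = (
    2 * precision * recall / (precision + recall)
    if (precision + recall)
    else 0.0
)

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
# prints: TN=990 FP=0 FN=10 TP=0
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
# prints: accuracy=0.99 precision=0.00 recall=0.00 f1=0.00
```

Accuracy counts the 990 easy negatives, while recall shows that all 10 positives were missed.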
Resources
Primary (free):
- scikit-learn — Model evaluation · docs
- Google — Classification metrics · docs
- StatQuest — ROC and AUC · video
Practice
Train a classifier on an imbalanced dataset. Report its accuracy, then its precision, recall, F1, and confusion matrix. Notice how accuracy looks good while recall reveals the model misses the rare class. Done when you can explain why accuracy was misleading here.
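One possible starting point for this exercise is sketched below, assuming scikit-learn is installed. The synthetic dataset with roughly 5% positives, the logistic regression model, and the variable names are illustrative assumptions; any imbalanced dataset and classifier would serve.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)
from sklearn.model_selection import train_test_split

# Synthetic data with roughly 5% positives (an illustrative assumption).
X, y = make_classification(
    n_samples=5000, n_features=20, weights=[0.95, 0.05], random_state=0
)
# Stratify so both splits keep the same class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, zero_division=0)
recall = recall_score(y_test, y_pred, zero_division=0)
f1 = f1_score(y_test, y_pred, zero_division=0)
# Rows are true classes, columns are predicted: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_test, y_pred)

print(f"accuracy={accuracy:.3f}")
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
print(cm)
```

Compare the accuracy line with the recall line and the bottom row of the confusion matrix. If recall comes out well below accuracy, that gap is what the exercise asks you to explain.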
Outcomes
- Split data into train, validation, and test correctly.
- Choose metrics that fit the task and class balance.
- Read a confusion matrix and an ROC curve.
- Spot and prevent data leakage.