Classical ML · Intermediate · 8h

Feature engineering.

Turning raw data into signal a model can use.

What is feature engineering?

Feature engineering is transforming raw data into inputs that expose the signal to a model: scaling numbers, encoding categories, creating ratios, extracting parts of a date. For classical ML, it is often where most of the performance gains actually come from.
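As a minimal sketch of the transformations named above (a ratio, a log transform, and date parts), the snippet below derives features from one raw record using only the standard library. The record schema (`signup_date`, `total_spend`, `orders`) and the function name `engineer_features` are hypothetical and chosen only for illustration.

```python
import math
from datetime import date


def engineer_features(row):
    """Derive model-ready features from one raw record (hypothetical schema)."""
    signup = date.fromisoformat(row["signup_date"])
    orders = row["orders"]
    return {
        # Ratio feature: average spend per order, guarding against zero orders.
        "spend_per_order": row["total_spend"] / orders if orders else 0.0,
        # Log transform compresses a long-tailed amount; log1p is defined at 0.
        "log_total_spend": math.log1p(row["total_spend"]),
        # Date parts: a raw date string carries no usable signal by itself.
        "signup_month": signup.month,
        "signup_weekday": signup.weekday(),  # Monday is 0, Sunday is 6
        "signup_is_weekend": int(signup.weekday() >= 5),
    }


raw = {"signup_date": "2024-03-09", "total_spend": 120.0, "orders": 4}
features = engineer_features(raw)
print(features)
```

None of these features depend on statistics of the whole dataset, so they can be computed row by row before or after splitting.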

Why it matters

A simple model with great features usually beats a complex model with raw data. Feature engineering is where domain knowledge meets ML, and it is a large part of the practical work. Even in the deep learning era, classical ML problems live or die on their features.

What to learn

  • Scaling and normalization, and why models need them
  • Encoding categorical variables
  • Handling dates, text, and missing values as features
  • Creating interaction and ratio features
  • Binning and transformations
  • Feature selection and removing noise
  • Fitting transforms on train only, then applying to test
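Two items from this list, categorical encoding and binning, can be sketched in plain Python. This is an illustrative sketch, not a library API: the helper names (`fit_one_hot`, `one_hot`, `bin_index`), the example values, and the age cut points are all assumptions made for the example.

```python
from bisect import bisect_right


def fit_one_hot(train_values):
    """Learn the category vocabulary from training values only."""
    return sorted(set(train_values))


def one_hot(value, categories):
    """Encode one value; a category unseen during fitting becomes all zeros."""
    return [int(value == c) for c in categories]


def bin_index(value, edges):
    """Return the index of the bin that `value` falls into, given sorted edges."""
    return bisect_right(edges, value)


train_colors = ["red", "green", "red", "blue"]
categories = fit_one_hot(train_colors)  # vocabulary comes from train only

green_vec = one_hot("green", categories)
unseen_vec = one_hot("purple", categories)  # not in train: all-zero vector

age_edges = [18, 35, 65]  # hypothetical cut points, giving 4 bins
age_bins = [bin_index(age, age_edges) for age in (10, 18, 40, 70)]

print(categories, green_vec, unseen_vec, age_bins)
```

Mapping unseen categories to an all-zero vector is one possible policy; raising an error or using a dedicated "other" column are alternatives, and the right choice depends on the model and the data.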

Common pitfall

A common mistake is fitting a scaler or encoder on the full dataset before splitting, which leaks information from the test set into training: the scaler "sees" the test data's statistics, so evaluation scores can look better than they will on genuinely new data. Fit every transformation on the training set only, then apply it to validation and test, usually with a pipeline that enforces this automatically.
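The effect is easy to see with a hand-rolled standardizer. The sketch below uses only the standard library; `fit_scaler` and `transform` are hypothetical helpers and the numbers are made up to exaggerate the difference.

```python
from statistics import mean, pstdev


def fit_scaler(values):
    """Learn standardization parameters (mean, std) from the given values."""
    return mean(values), pstdev(values)


def transform(values, params):
    """Apply previously learned parameters; never re-fit here."""
    mu, sigma = params
    return [(v - mu) / sigma for v in values]


train = [1.0, 2.0, 3.0, 4.0]
test = [100.0]  # an extreme value the model should not know about in advance

# Leaky: parameters are computed from train and test together.
leaky_params = fit_scaler(train + test)

# Clean: parameters come from the training split only.
clean_params = fit_scaler(train)

train_scaled = transform(train, clean_params)
test_scaled_clean = transform(test, clean_params)
test_scaled_leaky = transform(test, leaky_params)

print("leaky mean:", leaky_params[0], "clean mean:", clean_params[0])
print("test point, clean vs leaky:", test_scaled_clean[0], test_scaled_leaky[0])
```

With the leaky parameters the test point is pulled toward the center of the distribution (about 2 standard deviations out), whereas the train-only parameters show it as the extreme outlier it is relative to the training data. A real implementation would also need to handle a zero standard deviation, which this sketch does not.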

Resources

Primary (free):

Practice

Take a dataset with numeric and categorical columns. Build a pipeline that scales the numbers and encodes the categories, fit it on the training split only, and apply it to the test split. Add one engineered feature and check whether it helps. Done when no transform is fit on test data.
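One possible shape for this exercise, assuming pandas and scikit-learn are installed, is sketched below. The toy dataset, its column names (`income`, `debt`, `city`, `defaulted`), the engineered `debt_to_income` ratio, and the helper `make_model` are all invented for illustration; with only twelve rows the scores mean nothing, so treat this as a template for the workflow and not as evidence that the feature helps.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: two numeric columns, one categorical, one target.
df = pd.DataFrame({
    "income": [30.0, 45.0, 60.0, 80.0, 25.0, 52.0, 75.0, 40.0, 95.0, 33.0, 68.0, 58.0],
    "debt": [20.0, 10.0, 15.0, 8.0, 22.0, 30.0, 12.0, 28.0, 9.0, 25.0, 40.0, 6.0],
    "city": ["A", "B", "A", "C", "B", "A", "C", "B", "A", "C", "B", "A"],
    "defaulted": [1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0],
})

# Engineered feature: a row-wise ratio uses no dataset statistics,
# so computing it before the split does not leak anything.
df["debt_to_income"] = df["debt"] / df["income"]

X = df.drop(columns="defaulted")
y = df["defaulted"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)


def make_model(numeric_cols, categorical_cols):
    """Build a pipeline that scales numbers, encodes categories, then classifies."""
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])
    return Pipeline([("preprocess", preprocess), ("clf", LogisticRegression())])


baseline = make_model(["income", "debt"], ["city"])
with_ratio = make_model(["income", "debt", "debt_to_income"], ["city"])

# fit() sees the training split only; score() merely applies the fitted transforms.
baseline.fit(X_train, y_train)
with_ratio.fit(X_train, y_train)

baseline_acc = baseline.score(X_test, y_test)
ratio_acc = with_ratio.score(X_test, y_test)

# The scaler's sample count should equal the size of the training split.
scaler = with_ratio.named_steps["preprocess"].named_transformers_["num"]
print("rows seen by scaler:", scaler.n_samples_seen_, "train rows:", len(X_train))
print("baseline accuracy:", baseline_acc, "with ratio:", ratio_acc)
```

Because the scaler and encoder live inside the pipeline, calling `fit` on the training split is the only place their parameters are learned, which is what the "done when" criterion asks for. On a real dataset, cross-validation on the training split would be a more reliable way to judge whether the engineered feature helps than a single small test set.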

Outcomes

  • Scale numbers and encode categories appropriately.
  • Engineer new features from raw columns.
  • Select features and drop noise.
  • Fit transforms on train only to prevent leakage.