Classical ML · Intermediate · 8h

Unsupervised learning.

Clustering and dimensionality reduction without labels.

What is unsupervised learning?

Unsupervised learning finds structure in data that has no labels. Clustering groups similar items; dimensionality reduction compresses many features into a few while keeping the important variation. There is no "right answer" to train against — the model discovers patterns on its own.
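
To make "compressing many features into a few" concrete, here is a minimal sketch of PCA written with NumPy's SVD. The helper name `pca_project` and the synthetic data (five observed features driven by two hidden factors) are assumptions made for illustration, not part of this track's materials.

```python
import numpy as np

def pca_project(X, n_components):
    """Project X onto its top principal components via SVD.

    Returns the projected data and the share of total variance
    captured by each kept component.
    """
    X_centered = X - X.mean(axis=0)      # PCA works on mean-centered data
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained = S**2 / np.sum(S**2)      # variance share per component
    return X_centered @ Vt[:n_components].T, explained[:n_components]

# Synthetic data (illustrative assumption): 5 observed features that are
# noisy linear mixes of only 2 hidden factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 5))

Z, explained = pca_project(X, n_components=2)
print(Z.shape)            # (200, 2)
print(explained.sum())    # share of variance kept by the 2 components
```

Because the data was built from two factors, two components should retain nearly all of the variance. On real data the retained share is usually lower and worth checking before trusting a 2-D plot.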

Why it matters

Most real-world data is unlabeled, and labeling is expensive. Unsupervised methods let you segment customers, detect anomalies, and explore data before you have labels. Dimensionality reduction is also key to visualizing and understanding high-dimensional data, including embeddings later in the track.

What to learn

  • Clustering: k-means and its assumptions
  • Choosing the number of clusters
  • Hierarchical clustering
  • Dimensionality reduction: PCA
  • Visualization with t-SNE or UMAP
  • Anomaly detection
  • Evaluating results without ground-truth labels
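
Of the topics above, anomaly detection is perhaps the quickest to try. The sketch below uses scikit-learn's `IsolationForest`, which is one possible method among several; the planted outliers and the `contamination=0.02` setting are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic data (illustrative assumption): 300 ordinary points around the
# origin plus 3 planted outliers far away from them.
rng = np.random.default_rng(0)
normal_points = rng.normal(loc=0.0, scale=1.0, size=(300, 2))
planted = np.array([[8.0, 8.0], [-9.0, 7.0], [10.0, -8.0]])
X = np.vstack([normal_points, planted])   # planted rows are 300, 301, 302

# contamination is the expected fraction of anomalies; it is a guess you
# supply, not something the model learns.
model = IsolationForest(contamination=0.02, random_state=0)
labels = model.fit_predict(X)             # +1 = inlier, -1 = flagged

flagged = np.flatnonzero(labels == -1)
print(flagged)
```

The flagged set also contains a few ordinary points, because `contamination` forces roughly that fraction of the data to be labeled anomalous whether or not it really is.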

Common pitfall

Trusting clusters as if they were objective truth. Algorithms such as k-means always return the number of groups you ask for, even on random data, and the result depends heavily on your choices: the number of clusters, the distance metric, the feature scaling. Validate that clusters are meaningful and stable, for example by checking that they persist across random seeds and subsamples, rather than assuming the algorithm found real structure.
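
The pitfall is easy to reproduce. This sketch runs k-means with scikit-learn on uniform noise and on data with three planted groups, then compares silhouette scores. Both datasets, the blob centers, and the choice of silhouette as the check are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
noise = rng.uniform(size=(300, 2))            # no structure at all
blobs, _ = make_blobs(                        # three planted groups
    n_samples=300,
    centers=[[0, 0], [5, 5], [0, 5]],
    cluster_std=0.5,
    random_state=0,
)

results = {}
for name, X in [("noise", noise), ("blobs", blobs)]:
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
    # Silhouette ranges from -1 to 1; higher means tighter, better
    # separated clusters.
    results[name] = (len(set(labels)), silhouette_score(X, labels))

for name, (n_groups, score) in results.items():
    print(f"{name}: {n_groups} clusters, silhouette = {score:.2f}")
```

k-means reports three clusters in both cases. The noise score is lower than the blobs score but not zero, which is why a single metric read in isolation is weak evidence; comparing against a structureless baseline like this gives it context.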

Resources

Primary (free):

Practice

Take an unlabeled dataset, scale the features, and run k-means with a few different cluster counts. Use PCA to project the scaled data onto two dimensions and plot the points colored by cluster. Judge whether the groups look meaningful. Done when you can argue whether the clustering found real structure or just split noise.
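
One way to start the exercise is sketched below with scikit-learn. The wine dataset (used here with its labels ignored), the range of cluster counts, and the use of inertia and silhouette as comparison metrics are assumptions for illustration; any unlabeled dataset would do.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

X = load_wine().data                          # features only, labels ignored
X_scaled = StandardScaler().fit_transform(X)  # zero mean, unit variance

# Try several cluster counts and record two label-free quality measures.
scores = {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_scaled)
    scores[k] = (km.inertia_, silhouette_score(X_scaled, km.labels_))
    print(f"k={k}: inertia={scores[k][0]:.1f}, silhouette={scores[k][1]:.3f}")

# Pick the count with the best silhouette as a starting point, not a verdict.
best_k = max(scores, key=lambda k: scores[k][1])
labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X_scaled)

# 2-D coordinates for plotting, e.g. with matplotlib:
# plt.scatter(coords[:, 0], coords[:, 1], c=labels)
coords = PCA(n_components=2).fit_transform(X_scaled)
print(best_k, coords.shape)
```

The numbers only narrow the options. The argument the exercise asks for still comes from looking at the plot, rerunning with different seeds, and checking whether the groups correspond to anything you can describe.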

Outcomes

  • Cluster data and choose a sensible number of groups.
  • Reduce dimensionality with PCA and visualize it.
  • Detect anomalies in unlabeled data.
  • Question whether discovered clusters are actually meaningful.