Unsupervised learning · AI / ML · Code with Animation

What is unsupervised learning?

Unsupervised learning finds structure in data that has no labels. Clustering groups similar items; dimensionality reduction compresses many features into a few while keeping the important variation. There is no "right answer" to train against — the model discovers patterns on its own.
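Here is a minimal sketch of both ideas, assuming scikit-learn and NumPy are installed. `make_blobs` generates synthetic data, and its built-in labels are thrown away so the model sees only the features, as it would with real unlabeled data. The sample count, the 5 features, and the choice of 3 clusters are arbitrary illustration values, not recommendations.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Synthetic stand-in data: 300 rows, 5 features. The generator's labels are
# discarded (assigned to _), so nothing below ever sees a "right answer".
X, _ = make_blobs(n_samples=300, n_features=5, centers=3, random_state=42)

# Clustering: assign each row to one of k groups of similar rows.
# k=3 is an assumption made for this demo.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
cluster_ids = kmeans.fit_predict(X)

# Dimensionality reduction: compress 5 features into 2 components that keep
# as much of the variance as possible.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print("cluster sizes:", np.bincount(cluster_ids))
print("share of variance kept by 2 components:", round(float(pca.explained_variance_ratio_.sum()), 3))
```

`explained_variance_ratio_` tells you how much of the original spread survives the compression, which is the practical meaning of "keeping the important variation".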

Why it matters

Most real-world data is unlabeled, and labeling is expensive. Unsupervised methods let you segment customers, detect anomalies, and explore data before you have labels. Dimensionality reduction is also key to visualizing and understanding high-dimensional data, including embeddings later in the track.

What to learn

Clustering: k-means and its assumptions
Choosing the number of clusters
Hierarchical clustering
Dimensionality reduction: PCA
Visualization with t-SNE or UMAP
Anomaly detection
Evaluating results without ground-truth labels
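Choosing the number of clusters and evaluating them without labels often go together. The sketch below assumes k-means and scores each candidate k two common ways: inertia (for the "elbow" heuristic) and the mean silhouette score. Both are heuristics, not proofs. The helper name `scores_by_k` and the synthetic 4-center dataset are illustrative assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def scores_by_k(X, k_values, seed=0):
    """Hypothetical helper: fit k-means for each k, return {k: (inertia, silhouette)}."""
    results = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        # Inertia: within-cluster sum of squared distances. It tends to fall
        # as k grows, so look for an "elbow" rather than the minimum.
        # Silhouette: ranges from -1 to 1; higher means tighter, better-separated
        # clusters. It is only defined for k >= 2.
        results[k] = (model.inertia_, silhouette_score(X, model.labels_))
    return results


# Synthetic stand-in data generated around 4 centers; labels are discarded.
X, _ = make_blobs(n_samples=500, centers=4, cluster_std=0.8, random_state=7)
results = scores_by_k(X, range(2, 8))
for k, (inertia, sil) in results.items():
    print(f"k={k}: inertia={inertia:.1f}  silhouette={sil:.3f}")

best_k = max(results, key=lambda k: results[k][1])
print("highest silhouette at k =", best_k)
```

Treat the highest-silhouette k as a candidate to inspect, not a final answer. If several k values score about the same, the data may not have one clear grouping.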

Common pitfall

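Trusting clusters as if they were objective truth. Clustering will always return groups, even in random data, and the result depends heavily on your choices: the number of clusters, the distance metric, the scaling. Validate that clusters are meaningful and stable rather than assuming the algorithm found real structure. For example, check that they reappear across random seeds and data subsamples, and that they are clearly better separated than what the same method produces on random data.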
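To see the pitfall directly, the sketch below runs k-means with k=4 on uniform noise and on data with four real, well-separated groups. Both runs return four clusters. The silhouette score is one way to tell them apart. The helper `seed_stability` is a hypothetical name for a simple check: the mean adjusted Rand index between runs that differ only in their random seed. Stability is not proof on its own either, because some splits of pure noise are reproducible. Use it alongside other checks. The sample sizes and blob centers are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, silhouette_score

rng = np.random.default_rng(0)

# Pure noise: 400 points drawn uniformly from the unit square (no real groups).
noise = rng.uniform(size=(400, 2))

# Real structure: 400 points around four well-separated centers.
blobs, _ = make_blobs(
    n_samples=400,
    centers=[[0, 0], [10, 0], [0, 10], [10, 10]],
    cluster_std=1.0,
    random_state=0,
)


def seed_stability(X, k, seeds=(0, 1, 2, 3, 4)):
    """Hypothetical helper: mean adjusted Rand index between k-means runs
    with different seeds (one initialization each, so the seed matters)."""
    runs = [KMeans(n_clusters=k, n_init=1, random_state=s).fit_predict(X) for s in seeds]
    pairs = [
        adjusted_rand_score(runs[i], runs[j])
        for i in range(len(runs))
        for j in range(i + 1, len(runs))
    ]
    return float(np.mean(pairs))


results = {}
for name, X in [("noise", noise), ("blobs", blobs)]:
    labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
    results[name] = {
        "n_clusters": len(set(labels.tolist())),  # k-means returns k groups either way
        "silhouette": float(silhouette_score(X, labels)),
        "stability": seed_stability(X, 4),
    }
    print(name, results[name])
```

Both datasets come back with four clusters. Only the separation and stability numbers hint at which grouping reflects real structure.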

Resources

Primary (free):

scikit-learn — Clustering · docs
scikit-learn — Decomposition (PCA) · docs
StatQuest — PCA · video

Practice

Take an unlabeled dataset, scale the features, and run k-means with a few different cluster counts. Use PCA to reduce it to two dimensions and plot the clusters. Judge whether the groups look meaningful. Done when you can argue whether the clustering found real structure or just split noise.
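One possible starting point for the exercise is sketched below. It assumes scikit-learn's bundled wine dataset as a stand-in, with its labels discarded. The cluster counts tried and the plotting helper `plot_clusters` (a hypothetical name) are illustrative choices. Matplotlib is only imported inside the helper, so the rest runs without it.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Stand-in dataset: the wine data ships with scikit-learn. Its labels are
# discarded so the exercise stays unsupervised.
X_raw, _ = load_wine(return_X_y=True)

# Features sit on very different scales (proline is in the hundreds or
# thousands, hue is around 1), so standardize each to mean 0, std 1.
X = StandardScaler().fit_transform(X_raw)

# Try a few cluster counts and record a label-free quality score for each.
labelings = {}
for k in (2, 3, 4, 5):
    labelings[k] = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(f"k={k}: silhouette={silhouette_score(X, labelings[k]):.3f}")

# Project to 2D for plotting only; the clustering above used all 13 features.
X_2d = PCA(n_components=2).fit_transform(X)


def plot_clusters(X_2d, labels, title):
    """Hypothetical helper: scatter the 2D projection colored by cluster."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(X_2d[:, 0], X_2d[:, 1], c=labels, s=15, cmap="tab10")
    ax.set_xlabel("PC 1")
    ax.set_ylabel("PC 2")
    ax.set_title(title)
    return fig
```

For example, `plot_clusters(X_2d, labelings[3], "k-means, k=3").savefig("clusters.png")` writes one view to disk. Rerun the loop on `X_raw` instead of `X` to see how much the missing scaling step changes the groups. Keep in mind that a 2D PCA view can hide separation that exists in the full feature space.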

Outcomes

Cluster data and choose a sensible number of groups.
Reduce dimensionality with PCA and visualize it.
Detect anomalies in unlabeled data.
Question whether discovered clusters are actually meaningful.
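For the anomaly-detection outcome, here is a minimal sketch using scikit-learn's `IsolationForest`, one of several possible detectors. The data is synthetic, with 10 outliers planted on purpose so the result can be checked. The `contamination` value is an assumption that a truly unlabeled dataset would not hand you.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# 500 ordinary points around the origin, plus 10 planted outliers far away.
inliers = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
outliers = rng.uniform(low=6.0, high=9.0, size=(10, 2))
X = np.vstack([inliers, outliers])

# contamination is the assumed share of anomalies (about 2% here); it sets
# the score threshold used to label points as anomalies.
detector = IsolationForest(contamination=0.02, random_state=0)
flags = detector.fit_predict(X)     # +1 = inlier, -1 = anomaly
scores = detector.score_samples(X)  # lower = more anomalous

flagged = np.flatnonzero(flags == -1)
print("flagged indices:", flagged)
```

Isolation forests flag points that random splits separate from the rest quickly. Inspect flagged rows by hand before treating them as real anomalies.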