Unsupervised Machine Learning

26/05/2025

🧠 What Is Unsupervised Machine Learning?

Unsupervised learning is a type of machine learning in which a computer learns from data without any labels or correct answers.
It tries to find patterns or groupings on its own.

🎓 Think of it like this:

Imagine giving a bunch of toys to a child without telling them the names or types of toys.

The child starts grouping similar toys together—like cars in one pile, dolls in another—just based on how they look or feel.
That's what unsupervised learning does—it finds structure in the data without being told what the right answers are.

🧪 In machine learning terms:

  • You give the algorithm just the input data—no labels.

  • The algorithm looks for:

    • Groups of similar things (called clustering).

    • Unusual data points (called anomaly detection).

    • Hidden patterns or features (called dimensionality reduction).

✅ Simple Real-Life Examples:

  1. Customer segmentation

    • An e-commerce site groups customers based on their behavior (what they browse or buy), without knowing their names or preferences.

    • This helps in personalized marketing.

  2. Music or movie recommendations

    • Platforms like Spotify or Netflix group users with similar tastes, even if they don't know who you are or what you like exactly.

  3. Anomaly detection in credit card transactions

    • The system learns what a "normal" transaction looks like.

    • If it sees something very different (e.g., a large overseas purchase), it flags it as suspicious.

  4. Grouping similar news articles

    • News platforms cluster similar news stories, even without knowing their exact topics.

📌 In short:

Unsupervised learning means the machine tries to understand and organize data on its own, without being told what the data means.

Unsupervised Learning Techniques

1. Clustering

Clustering is an unsupervised machine learning technique that groups similar data points together based on patterns in the data. In marketing, businesses use clustering to segment customers into groups such as frequent buyers, occasional shoppers, or discount seekers—based on their shopping behavior, spending, and preferences. This helps them tailor marketing campaigns and improve customer engagement. Similarly, Netflix uses clustering to group users with similar viewing habits. For example, users who watch a lot of action movies may be clustered together, while others who prefer romantic comedies or documentaries form different groups. When a new user behaves like a certain cluster, Netflix recommends shows that are popular within that group. In both cases, clustering helps uncover hidden patterns and enables personalized experiences without needing labeled data. 
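The customer-segmentation idea above can be sketched with K-Means from scikit-learn. The data here is invented (two made-up features per customer: average order value and visits per month), and the three behavioural groups are assumptions for illustration, not a real dataset:

```python
# Minimal customer-segmentation sketch with K-Means (scikit-learn).
# Features per customer: [average order value in $, visits per month].
import numpy as np
from sklearn.cluster import KMeans

customers = np.array([
    [120.0, 8], [110.0, 9],   # frequent big spenders (hypothetical)
    [40.0, 2],  [35.0, 3],    # occasional shoppers (hypothetical)
    [10.0, 6],  [12.0, 7],    # discount seekers: cheap items, many visits
])

# Ask for 3 clusters; K-Means assigns each customer an arbitrary
# integer label, grouping similar rows together without any labels.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(customers)
print(labels)
```

Note that the cluster labels themselves (0, 1, 2) carry no meaning; only the grouping matters. A marketer would inspect each cluster afterwards to decide what kind of customers it contains.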

2. Dimensionality Reduction

Sometimes data has too many features (dimensions), and that can make it hard to understand or visualize. Dimensionality reduction simplifies this data by keeping only the most important information and discarding the rest.

One widely used technique is Principal Component Analysis (PCA). It takes complex data and transforms it into a smaller set of variables (called components) that still capture the essence of the original data. Another technique, t-SNE, is used for visualizing complex, high-dimensional data in 2D or 3D space—it's popular in machine learning research and data visualization. Deep learning also uses autoencoders, which are special neural networks that learn how to compress data and then reconstruct it.

This technique is useful when dealing with datasets with lots of variables—like reducing the number of features in a customer database while preserving the key trends.

Imagine you have a dataset of fruits, and for each fruit, you've measured five things: weight, size, redness, yellowness, and sweetness. Each fruit is like a point in a five-dimensional space—hard to imagine and even harder to visualize or analyze directly. Some of these features may overlap in the information they give. For example, fruits that are heavier are usually also larger, so "weight" and "size" are telling you almost the same thing. Similarly, "redness" and "yellowness" are both types of color, and only one is dominant in a given fruit.

PCA looks at all five features across the whole dataset and figures out the directions (called principal components) where the data varies the most. It then combines the original features to create two new ones that capture as much of this variation as possible. These two new features aren't simply picked—they're mathematical combinations of the original five. The first new feature might represent a blend of weight and size (maybe call it "mass"), and the second could be a mix of color and sweetness (maybe call it "flavor").

By reducing the data to these two new features, PCA makes it easier to analyze and visualize the fruits in just two dimensions—while still keeping most of the differences that separate apples, oranges, and bananas.
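The fruit example can be sketched in code with scikit-learn's PCA. The data below is synthetic and deliberately built so that weight tracks size and redness trades off against yellowness, mimicking the correlations described above; the numbers themselves are assumptions:

```python
# Sketch of PCA on synthetic "fruit" data: five correlated features
# reduced to two principal components.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 30
size = rng.normal(10, 2, n)
weight = size * 15 + rng.normal(0, 5, n)            # weight tracks size
redness = rng.uniform(0, 1, n)
yellowness = 1 - redness + rng.normal(0, 0.05, n)   # colors trade off
sweetness = rng.uniform(5, 15, n)                   # independent feature

fruits = np.column_stack([weight, size, redness, yellowness, sweetness])

# Standardize first so no single feature dominates just because of
# its units, then project onto the top two principal components.
scaled = StandardScaler().fit_transform(fruits)
pca = PCA(n_components=2)
reduced = pca.fit_transform(scaled)

print(reduced.shape)
print(pca.explained_variance_ratio_.sum())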

3. Anomaly Detection

Anomaly detection (also called outlier detection) is about finding data points that are unusual or don't fit with the rest.

For example, if someone makes ten small credit card purchases every month and suddenly makes a massive purchase in another country, that would stand out as an anomaly. Algorithms like Isolation Forest or One-Class SVM are designed to detect these kinds of outliers. Autoencoders can also be used to detect anomalies by learning how to reconstruct "normal" data and flagging anything they cannot reconstruct well.

Anomaly detection is especially important in fraud detection, cybersecurity, and quality control.

Anomaly detection is used to identify unusual or unexpected patterns in data that don't fit with what is considered normal. For example, in credit card fraud detection, if a person usually makes small purchases near their home during the day, and suddenly a large transaction occurs in a foreign country at 2 AM, it stands out as suspicious. Even though no one labeled it as fraud, an anomaly detection system can flag it because it's very different from the person's normal spending behavior. This technique is widely used in areas like banking, cybersecurity, healthcare, and manufacturing to detect fraud, intrusions, health risks, or equipment failures—helping prevent problems before they escalate. 
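The fraud scenario above can be sketched with Isolation Forest from scikit-learn. The "transactions" are invented (two features: amount and hour of day), and the contamination rate is an assumption chosen for the example:

```python
# Sketch of anomaly detection with Isolation Forest (scikit-learn).
# Each transaction is [amount in $, hour of day].
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Normal behaviour: small daytime purchases.
normal = np.column_stack([
    rng.uniform(5, 50, 100),    # amounts between $5 and $50
    rng.uniform(9, 18, 100),    # between 9 AM and 6 PM
])
# One suspicious transaction: a large amount at 2 AM.
suspicious = np.array([[2500.0, 2.0]])
transactions = np.vstack([normal, suspicious])

# contamination tells the model roughly what fraction of points
# to treat as anomalies; no labels are provided.
model = IsolationForest(contamination=0.01, random_state=0)
labels = model.fit_predict(transactions)  # 1 = normal, -1 = anomaly

print(labels[-1])
```

The model never sees a "fraud" label; it flags the last transaction simply because it sits far from the dense region of normal behaviour.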

4. Association Rule Learning

Association rule learning is a machine learning technique used to discover relationships between items in large datasets. It helps answer questions like: "If someone buys item A, how likely are they to buy item B as well?" A common example is market basket analysis in supermarkets or online stores. Suppose data shows that many people who buy bread also buy butter—the system learns a rule like "bread ⇒ butter." This doesn't mean bread causes butter purchases, but it shows a strong pattern or correlation between the two items.

Retailers use this technique to place related products together, recommend items, or design combo offers. Association rules are defined using three main concepts: support (how often items appear together), confidence (how often the rule is true), and lift (how much more likely two items are bought together compared to randomly). Beyond retail, this technique is also used in healthcare (e.g., finding common symptoms that occur together) or web analytics (e.g., pages users visit in sequence).

In short, association rule learning finds "if-then" patterns in data that help understand and predict behavior without requiring labeled data.
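The three concepts above (support, confidence, lift) can be computed by hand on a handful of invented shopping baskets, without any library. The baskets and the "bread ⇒ butter" rule are purely illustrative:

```python
# Support, confidence, and lift for the rule "bread => butter",
# computed directly on a small set of made-up baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "butter", "jam"},
    {"milk", "eggs"},
    {"bread", "jam"},
    {"milk"},
]
n = len(baskets)

def support(items):
    """Fraction of baskets that contain all the given items."""
    return sum(items <= basket for basket in baskets) / n

supp_both = support({"bread", "butter"})   # how often they co-occur
conf = supp_both / support({"bread"})      # P(butter | bread)
lift = conf / support({"butter"})          # vs. buying butter at random

print(supp_both, conf, lift)
```

Here the rule holds in half of all baskets (support 0.5), butter appears in 75% of the baskets that contain bread (confidence 0.75), and that is 1.5 times more often than butter appears overall (lift 1.5), so the rule reflects a genuine pattern rather than chance.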

MCQs on Unsupervised Machine Learning

1. What is the main goal of unsupervised learning?

A) To predict future values
B) To learn from labeled data
C) To find hidden patterns or structure in data
D) To train on data with answers

Answer: C
Explanation: Unsupervised learning tries to discover patterns, groupings, or structure in data without using labeled outputs.

2. Which of the following is a common task in unsupervised learning?

A) Classification
B) Regression
C) Clustering
D) Time Series Forecasting

Answer: C
Explanation: Clustering is a key unsupervised learning task where similar data points are grouped together.

3. In unsupervised learning, the data provided to the algorithm is:

A) Labeled
B) Unlabeled
C) Sorted
D) Pre-classified

Answer: B
Explanation: Unsupervised learning uses unlabeled data—it doesn't know the correct output in advance.

4. Which of the following algorithms is used for clustering?

A) Linear Regression
B) K-Means
C) Naive Bayes
D) Logistic Regression

Answer: B
Explanation: K-Means is a popular clustering algorithm used in unsupervised learning.

5. Principal Component Analysis (PCA) is mainly used for:

A) Clustering
B) Classification
C) Dimensionality reduction
D) Anomaly detection

Answer: C
Explanation: PCA reduces the number of features while retaining important information.

6. Which of the following is not an unsupervised learning algorithm?

A) DBSCAN
B) Decision Tree
C) Hierarchical Clustering
D) Autoencoder

Answer: B
Explanation: Decision Trees are used in supervised learning, where labeled data is required.

7. Which technique can be used to detect outliers in data?

A) K-Means
B) Isolation Forest
C) PCA
D) Naive Bayes

Answer: B
Explanation: Isolation Forest is commonly used in unsupervised anomaly (outlier) detection.

8. Association rule learning is commonly used in:

A) Fraud detection
B) Speech recognition
C) Market basket analysis
D) Image classification

Answer: C
Explanation: Association rules (like "People who buy bread often buy butter") are used in market basket analysis.

9. t-SNE is mainly used for:

A) Prediction
B) Image classification
C) Data visualization
D) Regression analysis

Answer: C
Explanation: t-SNE is a technique for visualizing high-dimensional data in 2D or 3D.

10. Which of the following best describes clustering?

A) Splitting data based on output values
B) Grouping similar items without predefined labels
C) Predicting numerical values
D) Sorting data in ascending order

Answer: B
Explanation: Clustering is about automatically grouping similar data points without using labels.