Confusion Matrix

12/06/2025

A confusion matrix is a table used to evaluate the performance of a classification model. It summarizes the predictive results and shows the number of correct and incorrect predictions made by the model. The matrix itself displays the true positives, false positives, true negatives, and false negatives, providing insight into how well the model is performing across different classes.

For a binary classifier, the confusion matrix reports four counts:

True Positives: Correctly predicted positive observations.
False Positives: Incorrectly predicted positive observations.
False Negatives: Incorrectly predicted negative observations.
True Negatives: Correctly predicted negative observations.

This matrix is a vital tool in understanding how well our model is performing.
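As a minimal sketch of how the four counts come about, the snippet below tallies them from a list of actual labels and a list of predicted labels. The function name `confusion_counts` and the sample labels are illustrative assumptions, not part of any library.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for binary labels (hypothetical helper)."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        else:
            if actual == positive:
                fn += 1  # predicted negative, actually positive
            else:
                tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


# Made-up labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
print(confusion_counts(y_true, y_pred))  # (3, 1, 1, 3)
```

Every prediction lands in exactly one of the four cells, so the counts always add up to the number of samples.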

Understanding Accuracy, Recall, and Precision

Accuracy, recall, and precision are the core metrics for gauging the performance of a classification model. Accuracy is the ratio of correctly predicted instances to the total number of instances, so it gives a single overall measure of how often the model is right. The formula for accuracy is: Accuracy = (True Positives + True Negatives) / (Total Instances).
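The accuracy formula can be sketched in a few lines. The function name `accuracy` and the sample counts are assumptions chosen for illustration; returning 0.0 for an empty matrix is a convention picked here, not a standard.

```python
def accuracy(tp, fp, fn, tn):
    """Accuracy = (TP + TN) / total; returns 0.0 if there are no samples."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


# Illustrative counts: 70 correct predictions out of 100.
print(accuracy(tp=40, fp=20, fn=10, tn=30))  # 0.7
```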

Recall and Precision

Recall, also known as sensitivity, measures the ability of a model to find all the relevant cases (True Positives) in the dataset. The formula is: Recall = True Positives / (True Positives + False Negatives). Precision, on the other hand, quantifies the accuracy of the positive predictions. The formula is: Precision = True Positives / (True Positives + False Positives). Understanding these metrics is essential for evaluating your models comprehensively.
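A small sketch of the two formulas follows. The function names and the sample counts are hypothetical. Both ratios are undefined when the denominator is zero; returning 0.0 in that case is an assumption made here for convenience.

```python
def precision(tp, fp):
    """Share of predicted positives that are actually positive."""
    predicted_positive = tp + fp
    return tp / predicted_positive if predicted_positive else 0.0


def recall(tp, fn):
    """Share of actual positives that the model found."""
    actual_positive = tp + fn
    return tp / actual_positive if actual_positive else 0.0


# Illustrative counts: 10 positive predictions, 16 actual positives.
print(precision(tp=8, fp=2))  # 0.8
print(recall(tp=8, fn=8))     # 0.5
```

The same model can score very differently on the two metrics: here most positive predictions are right, yet half of the actual positives are missed.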

The F1 Score combines precision and recall into a single metric. It is calculated as the harmonic mean of the two, so it is high only when both are high. An F1 Score closer to 1 indicates strong model performance, while a score closer to 0 suggests poor performance. This metric is particularly useful when the class distribution is uneven, that is, when one class is much rarer than the other.

Understanding F1 Score

The F1 Score is a measure of a test's accuracy that considers both the precision and the recall of the test to compute the score. It is defined as the harmonic mean of precision and recall, providing a balanced measure that is particularly useful when you have an uneven class distribution. The formula for the F1 Score is given by:

F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
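To see why the harmonic mean is used, the sketch below compares F1 with a plain arithmetic average for a few made-up precision/recall pairs. The function name `f1_score` here is a local helper, not a library call, and the zero-denominator convention is an assumption.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Illustrative pairs: balanced, lopsided, and one metric at zero.
for p, r in [(0.9, 0.9), (0.9, 0.1), (1.0, 0.0)]:
    arithmetic = (p + r) / 2
    print(f"P={p:.1f} R={r:.1f} mean={arithmetic:.2f} F1={f1_score(p, r):.2f}")
```

For the lopsided pair the arithmetic mean is 0.50 while F1 is only 0.18: the harmonic mean is pulled toward the weaker of the two metrics.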

Understanding Optimum F1 Score in a Confusion Matrix

In classification problems, especially with imbalanced datasets, accuracy alone can be misleading. This is where the F1 Score becomes a powerful evaluation metric.
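A quick sketch makes the point concrete. The dataset below is invented for illustration: 10 positives among 1,000 samples, and a "model" that always predicts the negative class.

```python
# Hypothetical imbalanced dataset: 1% positives, 99% negatives.
labels = [1] * 10 + [0] * 990
predictions = [0] * 1000  # always predict the majority class

tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)

accuracy = (tp + tn) / len(labels)
# F1 written in terms of counts; treated as 0.0 when nothing is positive.
denominator = 2 * tp + fp + fn
f1 = 2 * tp / denominator if denominator else 0.0

print(f"accuracy={accuracy:.2f} f1={f1:.2f}")  # accuracy=0.99 f1=0.00
```

Accuracy looks excellent, yet the model never finds a single positive case, which F1 exposes immediately.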

What is F1 Score?

The F1 Score is the harmonic mean of Precision and Recall, providing a balance between the two:

F1 = 2 × (Precision × Recall) / (Precision + Recall)

Precision: How many predicted positives are actually correct
Recall: How many actual positives are correctly identified

What is the Optimum F1 Score?

The best possible F1 Score is 1.0, which indicates perfect precision and recall (no false positives and no false negatives).

However, in real-world scenarios, achieving an F1 score of 1 is rare. The "optimum" value depends on the problem context.

F1 Score Interpretation

F1 Score Range	Interpretation
0.90 – 1.00	Excellent (near perfect model)
0.80 – 0.89	Very Good
0.70 – 0.79	Good / Acceptable
0.60 – 0.69	Needs Improvement
Below 0.60	Weak Model Performance

Why "Optimum" Depends on Use Case

Medical Diagnosis: Higher recall is critical → moderate F1 acceptable
Spam Detection: Higher precision needed → higher F1 expected
Fraud Detection: Imbalanced data → F1 between 0.6–0.75 can be strong

Key Insight

There is no fixed threshold for an "optimum" F1 score. Instead, the goal is to:

Adjust the classification threshold
Maximize F1 score based on business requirements
Balance Precision and Recall effectively
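The threshold adjustment described above can be sketched as a simple sweep: compute F1 at several candidate thresholds and keep the best one. The scores, labels, candidate thresholds, and function names below are all assumptions for illustration; in practice the scores would come from a model evaluated on a validation set.

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 when every score >= threshold is predicted positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    # 2TP / (2TP + FP + FN) equals the harmonic mean of precision and recall.
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def best_threshold(scores, labels, candidates):
    """Return the candidate threshold with the highest F1."""
    return max(candidates, key=lambda t: f1_at_threshold(scores, labels, t))


# Made-up model scores and true labels.
scores = [0.95, 0.85, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
candidates = [0.1, 0.3, 0.5, 0.7, 0.9]

for t in candidates:
    print(f"threshold={t:.1f} F1={f1_at_threshold(scores, labels, t):.3f}")
print("best:", best_threshold(scores, labels, candidates))  # best: 0.5
```

A very low threshold catches every positive but admits many false positives, and a very high one does the opposite; F1 peaks somewhere in between. Which trade-off is acceptable still depends on the business requirement, so the F1-maximizing threshold is a starting point rather than a rule.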

Final Takeaway:
The optimum F1 Score is 1.0, but in practice, a score above 0.8 is considered strong. Always optimize F1 based on your specific use case rather than aiming for a fixed number.

F1 Score Explained with Confusion Matrix, Interactive Demo & Example

Confusion Matrix Diagram

F1 Score depends on TP, FP, FN:

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2 × (Precision × Recall) / (Precision + Recall)

Real Dataset Example (Step-by-Step)

Consider the following confusion matrix:

	Predicted Positive	Predicted Negative
Actual Positive	TP = 40	FN = 10
Actual Negative	FP = 20	TN = 30

Step 1: Calculate Precision

Precision = 40 / (40 + 20) = 40 / 60 = 0.67

Step 2: Calculate Recall

Recall = 40 / (40 + 10) = 40 / 50 = 0.80

Step 3: Calculate F1 Score

F1 = 2 × (0.67 × 0.80) / (0.67 + 0.80) = 0.73

👉 Final F1 Score = 0.73 (Good Model Performance)
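The three steps above can be checked in code. This sketch uses the counts from the example matrix and keeps full precision until the final formatting step; the variable names are illustrative.

```python
# Counts from the example confusion matrix above.
tp, fn, fp, tn = 40, 10, 20, 30

precision = tp / (tp + fp)  # 40 / 60
recall = tp / (tp + fn)     # 40 / 50
f1 = 2 * precision * recall / (precision + recall)

# Equivalent form written directly in terms of counts.
f1_from_counts = 2 * tp / (2 * tp + fp + fn)  # 80 / 110

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# precision=0.67 recall=0.80 f1=0.73
```

Note that TN does not appear anywhere in the F1 calculation, which is why F1 is unaffected by a large number of easy negatives.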

Real-World Applications of Confusion Matrix

A confusion matrix is more than just a table—it is a powerful tool to evaluate how a machine learning model performs in real-world scenarios. Below are practical applications across industries.

🏥 Medical Diagnosis

Use Case: Disease detection (Cancer, COVID-19)

TP: Disease correctly detected
FN: Disease missed ⚠️
FP: Healthy marked as sick

Focus: High Recall (avoid FN)

📧 Spam Detection

Use Case: Email filtering

TP: Spam correctly filtered
FP: Important email marked spam ❌
FN: Spam in inbox

Focus: High Precision

💳 Fraud Detection

Use Case: Banking transactions

TP: Fraud detected
FN: Fraud missed ⚠️
FP: Legit transaction blocked 😡

Focus: Balance Precision & Recall

🔐 Cybersecurity

Use Case: Intrusion detection systems

TP: Attack detected
FN: Attack missed ⚠️
FP: False alert (alert fatigue)

Focus: Minimize FN + Control FP

🎥 Face Recognition

Use Case: Biometrics, phone unlock

TP: Correct person identified
FP: Wrong person accepted 🚨
FN: Correct person rejected

Focus: High Precision (security)

🛒 Recommendation Systems

Use Case: Netflix, Amazon

TP: Recommended & liked
FP: Recommended but not liked
FN: Missed good recommendation

Focus: User satisfaction balance

🚗 Self-Driving Cars

Use Case: Pedestrian & obstacle detection

TP: Obstacle correctly detected
FN: Missed obstacle ⚠️ (dangerous)
FP: False obstacle (unnecessary braking)

Focus: High Recall (safety-critical)

🎯 Final Insight

A confusion matrix is not just about numbers—it helps you understand what kind of mistakes your model makes and how those mistakes impact real-world decisions.

ROC Curve and AUC Explained (With Intuition & Example)

In machine learning classification problems, evaluating model performance goes beyond accuracy. The ROC Curve (Receiver Operating Characteristic) and AUC (Area Under the Curve) are powerful tools to measure how well a model distinguishes between classes.

📈 What is ROC Curve?

The ROC curve is a graphical representation of a model’s performance across different classification thresholds.

X-axis: False Positive Rate (FPR)
Y-axis: True Positive Rate (TPR / Recall)

Key Formulas

TPR (Recall): TP / (TP + FN)
FPR: FP / (FP + TN)

Each point on the ROC curve represents a different threshold. A good model pushes the curve toward the top-left corner.
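As a rough sketch of how the curve is traced, the function below lowers the threshold one distinct score at a time and records an (FPR, TPR) point at each step. The name `roc_points` and the sample data are assumptions, and the code assumes both classes are present in `labels`.

```python
def roc_points(scores, labels):
    """Return (fpr, tpr) points, one per distinct threshold, from strict to lenient."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Made-up scores and labels (1 = positive, 0 = negative).
scores = [0.9, 0.8, 0.7, 0.4]
labels = [1, 0, 1, 0]
print(roc_points(scores, labels))
# [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```

The curve always starts at (0, 0) and ends at (1, 1); what distinguishes models is how quickly TPR rises before FPR does.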

📊 What is AUC?

AUC stands for Area Under the ROC Curve. It measures how well the model separates positive and negative classes.

Interpretation:
AUC represents the probability that the model ranks a random positive instance higher than a random negative one.

AUC Value	Performance
1.0	Perfect Model
0.9 – 0.99	Excellent
0.8 – 0.89	Very Good
0.7 – 0.79	Good
0.5	Random Model
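The ranking interpretation of AUC can be computed directly by comparing every positive score with every negative score. This is a brute-force sketch for small datasets; the function name and the data are illustrative assumptions, and counting ties as half a win is a common convention assumed here.

```python
def auc_by_ranking(scores, labels):
    """Fraction of (positive, negative) pairs where the positive scores higher."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Made-up scores and labels: one positive (0.7) is outranked by a negative (0.8).
scores = [0.9, 0.8, 0.7, 0.4]
labels = [1, 0, 1, 0]
print(auc_by_ranking(scores, labels))  # 0.75
```

Three of the four positive/negative pairs are ordered correctly, so AUC is 0.75. A model that ranks every positive above every negative reaches 1.0, and random scoring averages about 0.5.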

🔄 ROC-AUC vs F1 Score

Metric	Focus	Best Use Case
ROC-AUC	Ranking ability	Balanced datasets
F1 Score	Precision + Recall	Imbalanced datasets

🎯 Final Insight

ROC-AUC helps evaluate how well your model separates classes across all thresholds, while F1 Score helps you choose the best threshold based on precision-recall balance.

40 Multiple Choice Questions (MCQs) based on Confusion Matrix

🔷 Confusion Matrix Basics (Q1–Q10)

Q1. In a binary classification, which component of the confusion matrix represents the correctly predicted positive cases?
A) True Negative
B) False Positive
C) True Positive
D) False Negative
Explanation: C. True Positive (TP) = predicted positive and actually positive.

Q2. What does a False Negative mean?
A) Model predicted negative, and it's actually negative
B) Model predicted positive, and it's actually negative
C) Model predicted negative, but it's actually positive
D) Model predicted correctly
Explanation: C. FN = missed positive case.

Q3. Which of the following represents the total number of correct predictions?
A) TP + FP
B) TP + TN
C) TN + FN
D) FP + FN
Explanation: B. Correct = TP + TN.

Q4. A high number of False Positives most directly lowers:
A) Recall
B) Accuracy
C) Precision
D) F1 Score
Explanation: C. Precision = TP / (TP + FP)

Q5. Which of the following is NOT part of the confusion matrix?
A) True Negative
B) True Positive
C) False Negative
D) True Unknown
Explanation: D. "True Unknown" does not exist in the standard confusion matrix.

Q6. A confusion matrix is mostly used to evaluate:
A) Regression models
B) Clustering
C) Classification models
D) Outlier detection
Explanation: C. It applies to classification performance.

Q7. Which of the following increases Recall?
A) Reducing False Positives
B) Increasing True Positives
C) Reducing True Negatives
D) Increasing False Negatives
Explanation: B. Recall = TP / (TP + FN)

Q8. If a model predicts all samples as positive, which metric will be highest?
A) Accuracy
B) Precision
C) Recall
D) F1 Score
Explanation: C. Recall = TP / (TP + FN) → FN = 0 → Recall = 1.

Q9. A confusion matrix of 2x2 is used for:
A) Multi-label classification
B) Binary classification
C) Regression
D) NLP only
Explanation: B. 2x2 matrix suits binary classification.

Q10. Which value in a confusion matrix indicates Type I error?
A) FN
B) TP
C) FP
D) TN
Explanation: C. FP = False Positive = Type I error.

🔶 Accuracy (Q11–Q20)

Q11. Accuracy formula is:
A) TP / (TP + FP)
B) TP / (TP + FN)
C) (TP + TN) / Total
D) FP / (FP + TN)
Explanation: C. Accuracy = correct predictions / total.

Q12. If a model has TP = 80, TN = 10, FP = 5, FN = 5, what is Accuracy?
A) 90%
B) 80%
C) 85%
D) 75%
Explanation: A. (80 + 10) / (80 + 10 + 5 + 5) = 90 / 100 = 90%

Q13. Accuracy can be misleading when:
A) Classes are balanced
B) Model is perfect
C) Dataset is small
D) Dataset is imbalanced
Explanation: D. Accuracy can hide poor minority-class performance when the classes are imbalanced.

Q14. If all predictions are wrong, Accuracy is:
A) 1
B) 0
C) 0.5
D) Cannot say
Explanation: B. No correct predictions = 0 accuracy.

Q15. What happens to Accuracy if FP and FN both increase?
A) Increases
B) Decreases
C) No change
D) Becomes 100%
Explanation: B. More errors → lower accuracy.

Q16. What is the total sample size in confusion matrix: TP=50, FP=10, FN=5, TN=35?
A) 95
B) 100
C) 90
D) 85
Explanation: B. Sum = 50 + 10 + 5 + 35 = 100.

Q17. What is Accuracy if TP=0, TN=100, FP=0, FN=0?
A) 0%
B) 50%
C) 100%
D) Undefined
Explanation: C. All true negatives, so perfect accuracy.

Q18. High accuracy in fraud detection can still be misleading due to:
A) Data type
B) Imbalanced classes
C) Too many features
D) High recall
Explanation: B. Fraud cases are rare → imbalance.

Q19. What is the major drawback of Accuracy?
A) Not interpretable
B) Ignores correct predictions
C) Fails in imbalanced data
D) Hard to compute
Explanation: C. Accuracy doesn't reflect minority class performance.

Q20. A dataset is 99% class-0 and 1% class-1. A model always predicts class-0 and reaches 99% accuracy. Is this good?
A) Yes
B) No
C) Maybe
D) Depends on algorithm
Explanation: B. Model ignores class-1; bad despite 99% accuracy.

🔶 Precision (Q21–Q30)

Q21. Precision formula is:
A) TP / (TP + FP)
B) TP / (TP + FN)
C) TP / Total
D) TN / Total
Explanation: A. Measures correctness of positive predictions.

Q22. Precision focuses on:
A) Actual positives
B) Predicted positives
C) Actual negatives
D) False negatives
Explanation: B. Precision = correct positive predictions among all predicted positive.

Q23. TP = 30, FP = 10. Precision = ?
A) 0.75
B) 0.66
C) 0.60
D) 0.50
Explanation: A. 30 / (30+10) = 0.75

Q24. Precision is low when:
A) FP is high
B) TP is high
C) TN is high
D) FN is high
Explanation: A. More FP lowers precision.

Q25. If precision is 1, what does it imply?
A) No false negatives
B) All predicted positives are true
C) No true positives
D) All actual positives are detected
Explanation: B. Precision 1 → FP = 0

Q26. Which domain requires high precision most?
A) Disease detection
B) Spam detection
C) Face recognition
D) Weather forecasting
Explanation: B. In spam detection it is better to avoid false alarms (important email marked as spam).

Q27. Precision penalizes:
A) False Negatives
B) False Positives
C) True Positives
D) True Negatives
Explanation: B. FP lowers precision.

Q28. If FP = 0 and TP > 0, Precision is:
A) 0
B) 0.5
C) 1
D) Depends
Explanation: C. TP / (TP + 0) = 1

Q29. High precision implies:
A) Few FP
B) Many FN
C) Many TN
D) Many FP
Explanation: A. Precision rises as FP falls.

Q30. Which metric is best when cost of false positive is high?
A) Recall
B) Precision
C) Accuracy
D) Specificity
Explanation: B. Avoiding FP is key when it's costly.

🔶 Recall (Q31–Q40)

Q31. Recall formula is:
A) TP / (TP + FN)
B) TP / (TP + FP)
C) TP / Total
D) TP / (TP + TN)
Explanation: A. Measures coverage of actual positives.

Q32. If TP = 80, FN = 20, recall = ?
A) 0.20
B) 0.90
C) 0.85
D) 0.80
Explanation: D. 80 / (80 + 20) = 0.80

Q33. Recall is also called:
A) Sensitivity
B) Specificity
C) Precision
D) F1 Score
Explanation: A. Recall = Sensitivity = True Positive Rate.

Q34. A high FN affects:
A) Recall
B) Precision
C) Accuracy
D) Specificity
Explanation: A. More FN → lower recall.

Q35. In medical diagnosis, which is more important?
A) Precision
B) Accuracy
C) Recall
D) Specificity
Explanation: C. Better to catch all cases, even with some FP.

Q36. Recall is 1 if:
A) TP = 0
B) FN = 0
C) FP = 0
D) TN = 0
Explanation: B. FN = 0 → Recall = 1

Q37. If model predicts only positive, recall becomes:
A) High
B) Low
C) Undefined
D) Zero
Explanation: A. FN = 0 → Recall is high.

Q38. Recall penalizes:
A) False Positives
B) False Negatives
C) True Negatives
D) Precision
Explanation: B. Recall drops with FN.

Q39. F1 Score is:
A) Average of Precision and Recall
B) Max(Precision, Recall)
C) Harmonic mean of Precision and Recall
D) (TP + TN) / Total
Explanation: C. F1 = 2*(P*R)/(P+R)

Q40. When precision = 1 and recall = 0, F1 Score is:
A) 1
B) 0
C) 0.5
D) Undefined
Explanation: B. Harmonic mean of 1 and 0 is 0.

© 2013 -2026- PM Expert. All Rights Reserved. The certification names are the trademarks of their respective owners