Confusion Matrix: A Clear Guide to Understanding It

yaswanthteja · Jun 11 · Dev Community

Are you struggling to understand the Confusion Matrix? You're not alone. Despite its name, the Confusion Matrix can be quite perplexing for many. However, we're here to simplify it for you.

What is a Confusion Matrix?

A Confusion Matrix is a crucial tool in machine learning and statistics. It helps you evaluate the performance of a classification algorithm. By breaking down the true positives, true negatives, false positives, and false negatives, you can gain a clear picture of how well your model is performing.

Why is the Confusion Matrix Important?

  • Performance Measurement: It provides detailed insight into the performance of your classification model.
  • Error Analysis: Helps identify where your model is making mistakes, allowing for targeted improvements.
  • Model Comparison: Essential for comparing different models to select the best one.

Why is it called a "Confusion" Matrix?

Because it shows where the model gets "confused" in its predictions.

How to Interpret a Confusion Matrix?

  • True Positive(TP): Positive events are correctly classified as Positive.
  • True Negative(TN): Negative events are correctly classified as Negative.
  • False Positive(FP): (Type 1 Error) Negative values are incorrectly classified as Positive.
  • False Negative(FN): (Type 2 Error) Positive values are incorrectly classified as Negative. We discuss each of these in more detail below.

The performance of a classification model is evaluated using a confusion matrix. Most of us have gone through this concept several times and still find it confusing, which is perhaps why it is named the confusion matrix.

In supervised learning, if the target variable is categorical, the classification model has to assign the given test data to the respective categories. How well the data is assigned to those categories is checked using a confusion matrix. For simplicity, let's consider binary classification.

The results of the classification model can be categorized into four types:

  • True Positive(TP): Correctly predicted positive observations.

    • Cases where the model correctly predicts the class or events as positive.

Eg1: You think you’ll love the new pizza place, and you do!

Eg2: A patient has got heart disease and the model predicts the same.

  • True Negative(TN): Correctly predicted negative observations.

    • Cases where the model correctly predicts the class or events as negative.

Eg: You think you won't enjoy that new movie, and you don’t.

Eg: A patient does not have heart disease and the model predicts that the patient is all right.

  • False Positive(FP): Incorrectly predicted positive observations (Type I error).

    • Cases where the model incorrectly predicts the positive class.

Eg: You think that weirdly flavored ice cream will be great, but it’s a disaster.

Eg: A patient does not have heart disease but the model predicts the patient has heart disease.

  • False Negative(FN): Incorrectly predicted negative observations (Type II error).

    • Cases where the model incorrectly predicts the negative class.

Eg: You skip that boring-looking book, only to find out later it’s amazing.

Eg: A patient has heart disease but the model predicts that the patient does not have heart disease.


Construction of Confusion Matrix

Here are a few steps to construct the confusion matrix. Consider a two-character label XY, where X can be True or False and Y can be Positive or Negative.

Step 1: Fill in the confusion matrix for character Y. It is based solely on the predicted value: if the prediction is positive, Y is Positive; otherwise Y is Negative.


Step 2: Fill in the first character X. Compare the actual value with the predicted value: if both belong to the same category (for example, actual is Positive and predicted is Positive), fill in T (i.e. True); otherwise fill in F (i.e. False).


Now it’s time to check the performance of the classification using the counts of TP, FN, FP, and TN.
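
Before moving on to evaluation, it can help to count the four cells by hand. Below is a minimal Python sketch, using made-up label lists, that tallies TP, FN, FP, and TN by comparing each actual value with its prediction.

# Minimal sketch: counting the four confusion-matrix cells by hand.
# The label lists are illustrative; 1 = positive, 0 = negative.
actual    = [1, 1, 1, 0, 0, 1, 0, 0]
predicted = [1, 0, 1, 0, 1, 1, 0, 0]

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)

print("TP:", tp, "FN:", fn, "FP:", fp, "TN:", tn)   # TP: 3 FN: 1 FP: 1 TN: 3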

Evaluation metrics

Consider a case with nine Positive and nine Negative actual values, a few of which are incorrectly classified by the model.

Key Metrics Derived from the Confusion Matrix:

  • Accuracy: (TP + TN) / (TP + TN + FP + FN)
  • Precision: TP / (TP + FP)
  • Recall: TP / (TP + FN)
  • F1 Score: 2 * (Precision * Recall) / (Precision + Recall)
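
As a quick worked example with hypothetical counts (TP = 6, FN = 3, FP = 2, TN = 7), the formulas above can be checked directly in Python:

tp, fn, fp, tn = 6, 3, 2, 7   # hypothetical counts for illustration

accuracy  = (tp + tn) / (tp + tn + fp + fn)                 # 13/18 ≈ 0.72
precision = tp / (tp + fp)                                  # 6/8  = 0.75
recall    = tp / (tp + fn)                                  # 6/9  ≈ 0.67
f1        = 2 * precision * recall / (precision + recall)   # ≈ 0.71

print(accuracy, precision, recall, f1)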


Recall / True Positive Rate (TPR) / Sensitivity: Out of all actual positive events, how many are correctly predicted as positive?

Recall = TP / (TP + FN)


Precision: Out of all events that are predicted as positive, how many are actually positive?

Precision = TP / (TP + FP)


Accuracy: Out of all events, how many are predicted correctly?

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Accuracy gives the same importance to positive and negative events, so use it when you have a balanced data set. Otherwise, the majority class can dominate the score and hide poor performance on the minority class.
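
As a quick illustration of this point, here is a small sketch with invented, heavily imbalanced labels: a model that always predicts negative still scores 95% accuracy while its recall is zero.

from sklearn.metrics import accuracy_score, recall_score

# Invented imbalanced data: 95 negatives, 5 positives.
actual    = [0] * 95 + [1] * 5
predicted = [0] * 100            # a useless model that always predicts negative

print("Accuracy:", accuracy_score(actual, predicted))   # 0.95
print("Recall  :", recall_score(actual, predicted))     # 0.0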

The higher the values of Recall, Precision, and Accuracy, the better the model.

Misclassification Rate (Error Rate): How many events are wrongly classified with respect to the total number of values?

Error Rate = (FP + FN) / (TP + TN + FP + FN)

False Positive Rate (FPR): The number of values that are wrongly classified as positive with respect to the total number of actual negative values. (Specificity is the complement of this: TN / (TN + FP) = 1 − FPR.)

FPR = FP / (FP + TN)

F Score: Precision may be low while recall is high, or vice versa. In such a scenario we need to consider both recall and precision to evaluate the model, which is where the F Score comes in.

F score = (1 + β^2) * (Precision * Recall) / ((β^2 * Precision) + Recall)

The β factor controls the relative weight given to Recall and Precision.

  • β = 1: Recall and Precision are balanced
  • β < 1: Precision-oriented evaluation
  • β > 1: Recall-oriented evaluation

When β = 1 we call it F1 score:

F1 score = 2 * Recall * Precision / (Recall + Precision)

The higher the F1 score, the better the model.
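
If you want to experiment with the β weighting yourself, scikit-learn exposes it as fbeta_score. A minimal sketch, with illustrative labels, might look like this:

from sklearn.metrics import fbeta_score

actual    = [1, 1, 1, 1, 0, 0, 0, 0]   # illustrative labels
predicted = [1, 1, 0, 0, 1, 0, 0, 0]

print(fbeta_score(actual, predicted, beta=0.5))   # weights precision more
print(fbeta_score(actual, predicted, beta=1.0))   # the F1 score
print(fbeta_score(actual, predicted, beta=2.0))   # weights recall more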

Implementation of the confusion matrix

from sklearn import metrics
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Actual labels and the model's predictions (1 = positive, 0 = negative)
actual = [1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0]
predicted = [1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,1,1,0]

# Build the confusion matrix from the two label lists
confusion_matrix = metrics.confusion_matrix(actual, predicted)
print(confusion_matrix)

# Compute accuracy, precision, recall, and the F1 score
accuracy = accuracy_score(actual, predicted)
print("Accuracy :", accuracy)

precision = precision_score(actual, predicted)
print("Precision :", precision)

recall = recall_score(actual, predicted)
print("Recall :", recall)

F1 = f1_score(actual, predicted)
print("F1 Score:", F1)

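As an optional addition, scikit-learn can also print per-class precision, recall, and F1 in one go via classification_report, reusing the actual and predicted lists defined above:

from sklearn.metrics import classification_report

# Per-class precision, recall, F1, and support in a single report
print(classification_report(actual, predicted))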

import seaborn as sns
import matplotlib.pyplot as plt

# Plot the confusion matrix as an annotated heatmap
sns.heatmap(confusion_matrix,
            annot=True,
            fmt='g',
            xticklabels=['0', '1'],
            yticklabels=['0', '1'])
plt.ylabel('Actual', fontsize=13)
plt.xlabel('Predicted', fontsize=13)
plt.title('Confusion Matrix', fontsize=17)
plt.show()

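If you prefer to stay within scikit-learn for plotting, ConfusionMatrixDisplay (available in scikit-learn 1.0+) offers an alternative to the seaborn heatmap. A minimal sketch, again reusing actual and predicted from above:

from sklearn.metrics import ConfusionMatrixDisplay
import matplotlib.pyplot as plt

# Build and draw the confusion matrix directly from the label lists
ConfusionMatrixDisplay.from_predictions(actual, predicted)
plt.title('Confusion Matrix')
plt.show()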

The trade-off between Recall and Precision:

In real-world scenarios where a false positive is more acceptable than a false negative, we optimize for Recall. Eg: In cancer diagnosis, if the patient has cancer but is diagnosed as negative, the patient’s life is at stake.

Where a false negative is more acceptable than a false positive, we optimize for Precision. Eg: Failing to punish a criminal is better than punishing an innocent person.
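
In practice this trade-off is tuned through the decision threshold applied to the model’s predicted probabilities. The sketch below uses made-up probability scores with precision_recall_curve to show how precision and recall move in opposite directions as the threshold changes.

from sklearn.metrics import precision_recall_curve

# Made-up true labels and predicted probabilities of the positive class
actual = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

precision, recall, thresholds = precision_recall_curve(actual, scores)
for p, r, t in zip(precision, recall, thresholds):
    print(f"threshold={t:.1f}  precision={p:.2f}  recall={r:.2f}")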

Conclusion

Understanding the Confusion Matrix is essential for anyone working with classification models. By demystifying its components and applications, we hope to make it less confusing and more useful in your data analysis toolkit.

EndNote:

I hope you enjoyed the article and gained insight into how to construct the confusion matrix and evaluation metrics like Recall, Precision, F Score, and Accuracy. Please drop your suggestions or queries in the comment section.
