Understanding Multi-Class Classification with Python's Confusion Matrix
Binary classification predicts one of two classes, such as ‘Yes’ or ‘No’. Multi-class classification instead predicts one of three or more categories. A standard example is the Iris dataset, in which each flower belongs to one of three species.
Understanding Multi-Class Classification through the Iris Dataset
In the Iris dataset, the task is to classify each flower as one of three species (setosa, versicolor, or virginica) using measurements of its sepals and petals. Because there are more than two possible outcomes, this is a multi-class problem.
We will demonstrate how to construct a confusion matrix for a multi-class classification task. The approach involves splitting the dataset into training and testing subsets and utilizing the Decision Tree algorithm to predict the species of Iris plants. Following the predictions, we will generate a confusion matrix and examine additional metrics such as macro and micro precision. We will also utilize sklearn’s classification_report function to analyze precision, recall, and F1-score metrics.
Importing Required Libraries
From sklearn we import the tools to load the Iris dataset, divide the data into training and testing sets, train a Decision Tree classifier, and assess the model with a confusion matrix. We also import seaborn and matplotlib, two visualization libraries, to draw the confusion matrix as a heatmap.
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt
```
Loading the Iris Dataset
The load_iris() function allows us to load the Iris dataset, which is assigned to the variable iris. The X variable contains the features (measurements of sepals and petals), while the y variable holds the corresponding labels (species of Iris).
```python
iris = load_iris()
X = iris.data
y = iris.target
```
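Before going further, it can help to look at what was loaded. The following is a small sketch, not part of the original walkthrough, that prints the shape of the feature matrix, the feature and species names, and the number of samples per species; `class_counts` is a name introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data      # feature matrix: one row per flower, one column per measurement
y = iris.target    # integer labels 0, 1, 2, indexing into iris.target_names

print(X.shape)                   # (150, 4)
print(iris.feature_names)        # names of the four sepal/petal measurements
print(list(iris.target_names))   # the three species names

# Count how many samples belong to each species (illustrative helper variable)
class_counts = np.bincount(y)
print(class_counts)              # [50 50 50]
```

The dataset is perfectly balanced, with 50 samples per species, which is worth keeping in mind when reading the averaged metrics later on.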
Splitting the Data
We use the train_test_split() function to separate the dataset into training and testing sets. The test_size parameter is set to 0.32, indicating that 32% of the data will be allocated for testing, with the remaining portion reserved for training. The random_state parameter ensures that the splits are reproducible by producing the same sequence of random numbers each time the code is run.
```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.32, random_state=15)
```
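To see what the split produced, the sketch below prints the size of each subset and the number of test samples per class. It also shows an optional variation that is not used in this article: passing `stratify=y` asks `train_test_split` to keep the class proportions the same in both subsets. The names ending in `_s` are introduced here only for that variation.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target

# Same split as in the article
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.32, random_state=15
)
print(len(X_train), len(X_test))   # 102 48
print(np.bincount(y_test))         # test samples per class

# Optional variation (an assumption, not the article's setup):
# a stratified split keeps class proportions equal across subsets
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.32, random_state=15, stratify=y
)
print(np.bincount(y_test_s))       # near-equal counts per class
```

Without stratification the per-class counts in the test set can differ slightly, which is why the class supports reported later are not all identical.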
Initializing the Classifier
A Decision Tree classifier is initialized with a random_state of 24 for consistency in results.
```python
tree = DecisionTreeClassifier(random_state=24)
```
Training the Classifier
We employ the fit() method to train the Decision Tree classifier using the training data.
```python
tree.fit(X_train, y_train)
```
Making Predictions
The predict() method generates predictions based on the test data, with results stored in y_pred.
```python
y_pred = tree.predict(X_test)
```
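The predictions in `y_pred` are integer class indices rather than species names. As a small sketch, the indices can be mapped back to names by indexing `iris.target_names`; the variable `species_pred` is introduced here for illustration and is not used elsewhere in the article.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.32, random_state=15
)

tree = DecisionTreeClassifier(random_state=24)
tree.fit(X_train, y_train)
y_pred = tree.predict(X_test)

# target_names is a NumPy array, so an integer array can index it directly
species_pred = iris.target_names[y_pred]

print(y_pred[:5])         # first five predictions as class indices
print(species_pred[:5])   # the same predictions as species names
```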
Creating a Confusion Matrix
The confusion_matrix() function generates a confusion matrix that compares the actual labels (y_test) with the predicted labels (y_pred). In sklearn's layout, each row corresponds to an actual class and each column to a predicted class.
```python
cm = confusion_matrix(y_test, y_pred)
```
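Raw counts can be hard to compare when classes have different numbers of samples. As a sketch on toy labels (invented here, not the Iris results), the `normalize='true'` option of `confusion_matrix()` divides each row by its total, so every row sums to 1 and the diagonal shows the fraction of each actual class that was predicted correctly.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels invented for illustration; these are NOT the Iris predictions
y_true = [0, 0, 1, 1, 1, 2]
y_hat = [0, 0, 1, 1, 2, 2]

# Raw counts: rows are actual classes, columns are predicted classes
cm_counts = confusion_matrix(y_true, y_hat)
print(cm_counts)

# Row-normalized: each row is divided by the number of actual samples in it
cm_rows = confusion_matrix(y_true, y_hat, normalize='true')
print(np.round(cm_rows, 2))
```

Here one of the three samples of class 1 was predicted as class 2, so the second row of the normalized matrix holds 2/3 on the diagonal and 1/3 in the last column.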
Plotting the Confusion Matrix
Finally, we use seaborn’s heatmap() function to visualize the confusion matrix. The xticklabels and yticklabels are set to the names of the Iris species, and the plot is displayed with plt.show().
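```python
sns.heatmap(cm, annot=True, fmt='d', cmap='YlGnBu',
            xticklabels=iris.target_names, yticklabels=iris.target_names)
# confusion_matrix() puts actual labels on the rows (y-axis)
# and predicted labels on the columns (x-axis)
plt.ylabel('Actual', fontsize=12)
plt.xlabel('Prediction', fontsize=12)
plt.title('Confusion Matrix', fontsize=16)
plt.show()
```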
Full Code:
```python
# Import libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.32, random_state=15)

# Initialize the Decision Tree classifier
tree = DecisionTreeClassifier(random_state=24)

# Train the classifier
tree.fit(X_train, y_train)

# Make predictions on the test set
y_pred = tree.predict(X_test)

# Create a confusion matrix (rows: actual labels, columns: predicted labels)
cm = confusion_matrix(y_test, y_pred)

# Plot the confusion matrix
sns.heatmap(cm, annot=True, fmt='d', cmap='YlGnBu',
            xticklabels=iris.target_names, yticklabels=iris.target_names)
plt.ylabel('Actual', fontsize=12)
plt.xlabel('Prediction', fontsize=12)
plt.title('Confusion Matrix', fontsize=16)
plt.show()
```
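This code produces a 3x3 confusion matrix, displayed as a heatmap, in which each row corresponds to an actual species and each column to a predicted species.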
The matrix illustrates the performance of the multi-class classification model by comparing the predicted species with the actual species for the Iris dataset.
The Decision Tree classifier successfully identified all 16 instances of setosa. However, it misclassified 2 versicolor instances as virginica while accurately classifying 15 versicolor instances. Additionally, 3 virginica instances were misclassified as versicolor, but the classifier correctly identified 12 virginica instances. Overall, the Decision Tree classifier exhibited strong performance on the Iris dataset, with only a few misclassifications between the versicolor and virginica classes.
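The counts in the paragraph above are enough to rebuild the matrix by hand and to see where per-class metrics come from. The sketch below transcribes those counts into an array (an assumption based on the prose, with rows as actual classes and columns as predicted classes) and derives precision, recall, and accuracy directly from it; the variable names are introduced here for illustration.

```python
import numpy as np

# Counts transcribed from the text: rows = actual, columns = predicted,
# in the order setosa, versicolor, virginica
cm_text = np.array([
    [16, 0, 0],    # all 16 setosa predicted as setosa
    [0, 15, 2],    # 15 versicolor correct, 2 predicted as virginica
    [0, 3, 12],    # 3 virginica predicted as versicolor, 12 correct
])

correct = np.diag(cm_text)                    # correct predictions per class

# Recall: correct predictions divided by the number of actual samples (row sums)
recall_per_class = correct / cm_text.sum(axis=1)

# Precision: correct predictions divided by the number of predictions (column sums)
precision_per_class = correct / cm_text.sum(axis=0)

# Accuracy: all correct predictions divided by all samples
accuracy_from_cm = correct.sum() / cm_text.sum()

print(np.round(recall_per_class, 4))
print(np.round(precision_per_class, 4))
print(round(accuracy_from_cm, 4))
```

Reading along a row answers "what happened to the samples of this species?", while reading down a column answers "how trustworthy is a prediction of this species?".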
Performance Metrics
Now that we have our confusion matrix, we can derive various performance metrics to assess the model's effectiveness.
```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, fbeta_score

# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print(accuracy)  # 0.895833

# Precision
precision = precision_score(y_test, y_pred, average=None)
print(precision)  # [1 0.83333333 0.85714286]

# Recall
recall = recall_score(y_test, y_pred, average=None)
print(recall)  # [1 0.88235294 0.8]

# F1 score
f1 = f1_score(y_test, y_pred, average=None)
print(f1)  # [1 0.85714286 0.82758621]

# F0.5 scores
f0_5 = fbeta_score(y_test, y_pred, beta=0.5, average=None)
print(f0_5)  # [1 0.84269663 0.84507042]

# F2 scores
f2 = fbeta_score(y_test, y_pred, beta=2, average=None)
print(f2)  # [1 0.87209302 0.81081081]
```
Setting `average=None` calculates the metric for each class individually, providing detailed performance for all classes.
The metrics show that the Decision Tree classifier performed well on the Iris dataset. The accuracy of 0.8958 means that about 89.58% of the test predictions were correct. Looking at the precision scores, the classifier was perfect on the setosa class (1.0) and slightly less precise on versicolor and virginica (0.83 and 0.86, respectively). In other words, when the classifier predicted that a sample belonged to a given class, it was usually right, and always right for setosa.
For recall, the classifier was flawless in identifying setosa instances (1.0) but slightly less effective for the versicolor (0.88) and virginica (0.80) classes. It found every setosa instance but missed 2 of the 17 versicolor instances and 3 of the 15 virginica instances.
The F1-score, which is the harmonic mean of precision and recall, also reflects the classifier’s strong performance, with values ranging from 0.83 to 1.0 across the classes. The F0.5 and F2 scores apply different weights to the two components: F0.5 gives more weight to precision, while F2 gives more weight to recall.
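To make the weighting concrete, here is a minimal sketch of the general F-beta formula, (1 + beta^2) * P * R / (beta^2 * P + R), written as a plain function. The helper `f_beta` is introduced here for illustration and is not sklearn's implementation; the inputs are the versicolor precision (15 correct out of 18 predicted) and recall (15 found out of 17 actual) from the results above.

```python
def f_beta(precision, recall, beta):
    """F-beta score: beta > 1 favors recall, beta < 1 favors precision."""
    b2 = beta ** 2
    denominator = b2 * precision + recall
    if denominator == 0:
        return 0.0  # convention when both precision and recall are zero
    return (1 + b2) * precision * recall / denominator


# Versicolor values taken from the confusion matrix described in the text
p_versicolor = 15 / 18   # precision
r_versicolor = 15 / 17   # recall

f1_versicolor = f_beta(p_versicolor, r_versicolor, beta=1)
f05_versicolor = f_beta(p_versicolor, r_versicolor, beta=0.5)
f2_versicolor = f_beta(p_versicolor, r_versicolor, beta=2)

print(round(f1_versicolor, 4))    # 0.8571
print(round(f05_versicolor, 4))   # 0.8427
print(round(f2_versicolor, 4))    # 0.8721
```

Because versicolor recall is higher than its precision, the recall-weighted F2 comes out above F1, and the precision-weighted F0.5 comes out below it.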
Micro Precision
Micro precision pools the predictions for all classes: it sums the true positives and false positives across classes before dividing, so it counts how many predictions were correct irrespective of class. Every individual prediction carries equal weight, which means larger classes influence the score more than smaller ones. In a single-label multi-class problem such as this one, micro precision is equal to accuracy.
```python
micro_precision = precision_score(y_test, y_pred, average='micro')
print(micro_precision)  # 0.89583
```
The micro precision of approximately 0.896 matches the accuracy reported earlier: about 89.6% of all test predictions, pooled across the three classes, were correct.
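The relationship between micro precision and accuracy can be checked on a small example. This sketch uses toy labels invented for illustration (not the Iris predictions) and compares `precision_score(..., average='micro')` with `accuracy_score()`.

```python
from sklearn.metrics import accuracy_score, precision_score

# Toy single-label, three-class data; invented for illustration only
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_hat = [0, 0, 1, 1, 1, 2, 2, 0]

# Micro precision pools true and false positives over all classes
micro_p = precision_score(y_true, y_hat, average='micro')

# Accuracy is the fraction of samples predicted correctly
acc = accuracy_score(y_true, y_hat)

print(micro_p, acc)   # both are 0.75 (6 of 8 predictions correct)
```

Each sample receives exactly one prediction, so every wrong prediction is a false positive for exactly one class, and the pooled ratio reduces to correct predictions over all predictions.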
Macro Precision
Macro precision calculates the precision for each class individually and then takes the unweighted average of these values. This treats all classes equally, regardless of how many samples each one has, so a poorly handled small class lowers the score as much as a poorly handled large one.
```python
macro_precision = precision_score(y_test, y_pred, average='macro')
print(macro_precision)  # 0.89682
```
The macro precision of about 0.897 is the mean of the three per-class precisions (1.0, 0.83, and 0.86). It is almost identical to the micro precision because the classes are of similar size in the test set and no class has a markedly lower precision than the others.
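The difference between the two averages becomes visible when classes are imbalanced. The following sketch uses toy labels invented for illustration (eight samples of class 0 and two of class 1) and shows that macro precision is the plain mean of the per-class values, while micro precision is dominated by the larger class.

```python
import numpy as np
from sklearn.metrics import precision_score

# Toy imbalanced data, invented for illustration: 8 samples of class 0, 2 of class 1
y_true = [0] * 8 + [1] * 2
y_hat = [0] * 7 + [1] + [0, 1]

per_class = precision_score(y_true, y_hat, average=None)
macro_p = precision_score(y_true, y_hat, average='macro')
micro_p = precision_score(y_true, y_hat, average='micro')

print(per_class)   # precision for class 0 and class 1: 0.875 and 0.5
print(macro_p)     # 0.6875, the unweighted mean of the per-class values
print(micro_p)     # 0.8, i.e. 8 of 10 predictions correct
```

The weak precision on the small class pulls the macro average well below the micro average, which is the kind of gap that does not appear in the Iris results.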
Classification Report
A classification report provides a compact summary of the performance of a classification model, such as a decision tree. It lists precision, recall, F1-score, and support (the number of actual instances of each class), along with overall figures such as accuracy and the macro and weighted averages.
```python
from sklearn.metrics import classification_report

cr = classification_report(y_test, y_pred, target_names=iris.target_names)
print(cr)
```
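When the numbers are needed in code rather than as printed text, `classification_report()` accepts `output_dict=True` and returns a nested dictionary instead of a string. The sketch below repeats the pipeline so that it runs on its own; `report` is a name introduced here for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.32, random_state=15
)
tree = DecisionTreeClassifier(random_state=24).fit(X_train, y_train)
y_pred = tree.predict(X_test)

# Same report as before, but as a dictionary keyed by class name
report = classification_report(
    y_test, y_pred, target_names=iris.target_names, output_dict=True
)

# Per-class entries hold 'precision', 'recall', 'f1-score', and 'support'
print(report['versicolor']['precision'])
print(report['virginica']['recall'])

# Overall entries
print(report['accuracy'])
print(report['macro avg']['precision'])
```

The dictionary form is convenient for logging results or comparing several models in a table.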
Conclusion
In this article, we explored how to use a confusion matrix to assess a multi-class classification model's performance. We constructed a Decision Tree classifier based on the Iris dataset and calculated several performance metrics. We also introduced metrics like micro and macro precision, enhancing our understanding of the model's capabilities.