Confusion matrix & MCC statistic




In the field of predictive analytics, a confusion matrix is a table that visualizes the performance of an algorithm whose objective is to predict the class of a variable.

The name “confusion” comes from the fact that the table makes it easy to see whether the system is mislabelling one class as another.

For a binary classifier, the confusion matrix has two rows and two columns that report the number of false positives, false negatives, true positives, and true negatives.


  • True Positive, TP: Number of samples correctly identified as members of the class
  • True Negative, TN: Number of samples correctly rejected as members of the class
  • False Positive, FP: Number of samples incorrectly identified as members of the class
  • False Negative, FN: Number of samples incorrectly rejected as members of the class
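As a minimal sketch, the four counts defined above can be tallied from paired lists of actual and predicted labels. The function name confusion_counts and the sample labels are illustrative assumptions, not part of the analysis in this post.

```python
def confusion_counts(actual, predicted, positive):
    """Return (TP, TN, FP, FN), treating `positive` as the class of interest."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # correctly identified as a member of the class
        elif p != positive and a != positive:
            tn += 1  # correctly rejected
        elif p == positive and a != positive:
            fp += 1  # incorrectly identified as a member
        else:
            fn += 1  # incorrectly rejected

    return tp, tn, fp, fn


# Hypothetical labels, for illustration only.
actual = ["up", "up", "down", "up", "down", "down"]
predicted = ["up", "down", "down", "up", "up", "down"]

counts = confusion_counts(actual, predicted, positive="up")
print(counts)  # (2, 2, 1, 1)
```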

This allows for a more detailed analysis than, for example, the mere proportion of correct guesses (accuracy). The confusion matrix gives a more reliable picture of the real performance of a classifier because it does not hide errors when the data set is unbalanced (that is, when the number of samples in different classes varies greatly). This is the main advantage of representing the results with this structure: we avoid statistics that return a misleadingly high success rate.
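To illustrate the point about unbalanced data, here is a small sketch with a made-up data set of 5 positives and 95 negatives, and a trivial rule that always predicts the majority class. The data and the rule are assumptions chosen only to show the effect.

```python
# Hypothetical unbalanced data set: 5 positives (1) and 95 negatives (0).
actual = [1] * 5 + [0] * 95
# A trivial "classifier" that always predicts the majority class.
predicted = [0] * len(actual)

pairs = list(zip(actual, predicted))
tp = sum(1 for a, p in pairs if a == 1 and p == 1)
tn = sum(1 for a, p in pairs if a == 0 and p == 0)
fp = sum(1 for a, p in pairs if a == 0 and p == 1)
fn = sum(1 for a, p in pairs if a == 1 and p == 0)

accuracy = (tp + tn) / len(actual)

print("accuracy:", accuracy)
print("TP, TN, FP, FN:", (tp, tn, fp, fn))
```

The accuracy is 0.95, yet the counts show that every one of the 5 positives was missed (TP = 0, FN = 5), which the single accuracy figure hides.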

We could argue that it’s easy to compare two confusion matrices. For example, we could say the confusion matrix M2 is better than confusion matrix M1, below:


Obviously, two matrices can only be compared if they are based on the same data, so their entries must sum to the same total.

Unfortunately, two confusion matrices are not always easily comparable. In fact, two confusion matrices M1 and M2 are comparable if and only if:




When two matrices are acceptably comparable, the matrix with fewer false positives and false negatives can then safely be called a better prediction.

So what can we do if two confusion matrices are not directly comparable? A good option is the Matthews correlation coefficient (MCC), which summarizes the confusion matrix in a single value. While no single number can fully describe the true and false positives and negatives, the MCC is generally regarded as one of the best such summaries. It is a measure of binary classification quality, and since it takes all four counts into account, it can be used even when the classes are of very different sizes.

The MCC is essentially a correlation coefficient between the actual and predicted series. It returns values between −1 and +1. A coefficient of +1 represents a perfect prediction, 0 a prediction no better than random, and −1 total disagreement between prediction and observation.

The MCC can be calculated directly from the confusion matrix by the formula:

MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))
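A minimal sketch of this calculation in Python follows, using the standard definition of the MCC. The function name mcc and the sample counts are assumptions for illustration. Returning 0 when the denominator is zero is a common convention, adopted here as an assumption.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from the four confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP) * (TP+FN) * (TN+FP) * (TN+FN))
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: an empty row or column gives MCC = 0.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts, for illustration only.
perfect = mcc(tp=50, tn=50, fp=0, fn=0)         # every sample classified correctly
inverted = mcc(tp=0, tn=0, fp=50, fn=50)        # every sample classified wrongly
always_negative = mcc(tp=0, tn=95, fp=0, fn=5)  # no positive predictions at all

print(perfect, inverted, always_negative)
```

The three calls cover the extremes described above: +1 for a perfect prediction, −1 for total disagreement, and 0 for a rule that never predicts the positive class.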


For example, we have developed a methodology to predict trends of an index by using its corresponding Implied Volatility Index. We suppose that the implied volatility index can exhibit 3 possible scenarios: A, B and C:


Our selected index in this test is the S&P 500 (so the Implied Volatility Index considered is the VIX). We have identified its trends in an a posteriori analysis, which gives 3 real classes: upward trends, downward trends and ranged market. Remember that our method only predicts upward and downward trends.

The results of our predictions show that, in essence, the expected correspondence between scenarios A and C and the actual market behaviour of the index holds:



But as the actual upward trends dominate the whole period, when we look directly at the frequencies, the results do not look as good:


Now, let’s compare the MCC against the accuracy statistic (defined as the number of correctly classified samples divided by the total number of samples).

The MCC values of the two classes we wanted to predict are higher for the upward class, but neither value is very encouraging. The mean MCC of this test indicates poor predictive performance:


However, if we had only worked out the accuracy statistic, the value would have shown an acceptable result:


In fact, the accuracy of the upward class would have been extremely good. This value is high due to the predominance of actual upward trends of the index.
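The effect can be reproduced with a short sketch. The counts below are hypothetical and are not the results of our test; they only mimic a period dominated by one class, so that accuracy and MCC can be compared side by side.

```python
import math

# Hypothetical one-vs-rest counts for a dominant "upward" class
# (90 actual upward samples out of 100). Not the figures from this test.
tp, fn = 80, 10   # actual upward: predicted upward / predicted otherwise
fp, tn = 8, 2     # actual non-upward: predicted upward / predicted otherwise

total = tp + tn + fp + fn
accuracy = (tp + tn) / total

denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
mcc_value = (tp * tn - fp * fn) / denominator if denominator else 0.0

print(f"accuracy = {accuracy:.2f}")
print(f"MCC      = {mcc_value:.3f}")
```

With these counts the accuracy is 0.82 while the MCC stays below 0.1, because most of the correct answers come simply from predicting the dominant class.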

In Conclusion

We have to take into account any existing bias in the actual data and choose the best indicators to analyse our results, in order to avoid drawing the wrong conclusions.
