Importance of Matthews Correlation Coefficient & Cohen’s Kappa for Imbalanced Classes

Which accuracy metrics to use for imbalanced class use case?

Sarit Maitra
11 min read · Jun 27, 2021
Image by author


Accuracy and the F1 score computed from confusion matrices are among the most widely adopted metrics in binary classification tasks, and many businesses still rely on them when dealing with imbalanced datasets. An imbalanced class distribution is a serious challenge for standard learning algorithms, and these statistical measures can dangerously show over-optimistic, inflated results. Class imbalance is persistent in many real-world problems, especially those connected with anomaly detection, such as financial fraud, email fraud detection, medical diagnosis, or computer intrusion detection.
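To see how accuracy misleads on an imbalanced dataset, consider a minimal pure-Python sketch (the 95:5 fraud split and the "always predict the majority class" model are illustrative assumptions, not from the article):

```python
# Hypothetical 95:5 imbalanced dataset: 5 fraud cases among 100 transactions.
y_true = [1] * 5 + [0] * 95
# A useless model that always predicts "not fraud" (the majority class).
y_pred = [0] * 100

# Plain accuracy: fraction of predictions that match the truth.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)  # 0.95 — looks excellent, yet not a single fraud case is caught
```

Despite detecting zero positives, the model scores 95% accuracy, which is exactly the kind of inflated result the article warns about.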

Here, we will talk about the Matthews Correlation Coefficient (MCC) and Cohen’s Kappa; these two are more reliable statistical measures when the class distribution is imbalanced.

  • Though MCC may sound like a magic bullet for measuring accuracy, put simply, MCC is the Pearson correlation coefficient applied to a confusion matrix.
  • Cohen’s Kappa is also calculated from the confusion matrix. The value of kappa can be less than 0 (negative). However, instead of overall accuracy, Kappa takes into account the agreement expected by chance alone.
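Both definitions above can be sketched directly from the four confusion-matrix cells. Below is a minimal pure-Python implementation (the function names and the zero-denominator convention for MCC are my assumptions, not from the article), applied to the degenerate "always predict the majority class" model on a hypothetical 95:5 split:

```python
import math

def mcc(tp, tn, fp, fn):
    # Matthews correlation coefficient; by convention returned as 0
    # when any marginal sum is zero (the denominator is undefined).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def cohen_kappa(tp, tn, fp, fn):
    n = tp + tn + fp + fn
    po = (tp + tn) / n  # observed agreement (this is just accuracy)
    # Chance agreement: product of marginal probabilities for each class.
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return (po - pe) / (1 - pe) if pe != 1 else 0.0

# Confusion matrix of the "always predict not-fraud" model on 95:5 data:
tp, tn, fp, fn = 0, 95, 0, 5
print(mcc(tp, tn, fp, fn))          # 0.0 — no correlation with the truth
print(cohen_kappa(tp, tn, fp, fn))  # 0.0 — no agreement beyond chance
```

Where accuracy reported 0.95 for this worthless classifier, both MCC and Kappa correctly report 0, which is why they are the safer choice for imbalanced classes. (scikit-learn provides the same metrics as `matthews_corrcoef` and `cohen_kappa_score`.)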



Written by Sarit Maitra

Analytics & Data Science Practice Lead
