Member-only story

CLASSIFICATION OF IMBALANCE FINANCIAL DATA

ML and accuracy metrics when dealing with imbalance data-set

Predicting the success of market -campaign

Sarit Maitra

--

Image by author

Classification is one of the cornerstones of Supervised Machine Learning and is being used in knowledge discovery in databases and data mining. In a classification model, learning algorithm reveals the underlying relationship between the features and target variables, and identifies a model that best fits the training data.

“The class imbalance distribution, by itself, does not seem to be a problem, but when allied to highly overlapped classes, it can significantly decrease the number of minority (small) class examples correctly classified.”

Since most of the standard learning algorithms consider a balanced training set, this may generate sub-optimal classification models, i.e. a good coverage of the majority examples, whereas the minority ones are misclassified frequently. Therefore, those algorithms, which obtain a good behavior in the framework of standard classification, do not necessarily achieve the best performance for imbalanced datasets. There are several reasons behind…

--

--

No responses yet