Member-only story
CLASSIFICATION OF IMBALANCE FINANCIAL DATA
ML and accuracy metrics when dealing with imbalance data-set
Predicting the success of market -campaign
Classification is one of the cornerstones of Supervised Machine Learning and is being used in knowledge discovery in databases and data mining. In a classification model, learning algorithm reveals the underlying relationship between the features and target variables, and identifies a model that best fits the training data.
“The class imbalance distribution, by itself, does not seem to be a problem, but when allied to highly overlapped classes, it can significantly decrease the number of minority (small) class examples correctly classified.”
Since most of the standard learning algorithms consider a balanced training set, this may generate sub-optimal classification models, i.e. a good coverage of the majority examples, whereas the minority ones are misclassified frequently. Therefore, those algorithms, which obtain a good behavior in the framework of standard classification, do not necessarily achieve the best performance for imbalanced datasets. There are several reasons behind…