Online ISSN: 2515-8260

Improved Sampling Data Workflow Using SMTmk To Increase The Classification Accuracy Of Imbalanced Dataset


Muhammad Syafiq Alza bin Alias¹, Norazlin Binti Ibrahim², Zalhan Bin Mohd Zin³

Abstract

One of the main challenges in machine learning classification is handling imbalanced data, because imbalanced data can bias results towards the majority class and lead to poor classification performance. This paper therefore introduces an improved workflow to address this issue. After the combination of the Synthetic Minority Oversampling Technique (SMOTE) and Tomek Links, known as the SMTmk method, has been applied, an additional step is performed to further increase classification performance, especially in terms of Specificity. This step reduces the number of majority-class samples based on a ratio relative to the minority class. Three machine learning algorithms, Extreme Gradient Boosting, Random Forest and Logistic Regression, are used to evaluate the classification results. The results recorded in this research show that a ratio of 7 to 1 performs better than the established methods, namely SMOTE and the hybrid method of SMOTE and Tomek Links.
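The abstract describes the workflow only at a high level, so the following is a minimal, dependency-free sketch under stated assumptions, not the authors' implementation. It assumes a binary problem with numeric features and Euclidean distance; applies classic SMOTE interpolation until the classes are balanced; removes Tomek links (this sketch drops both members of each link, which is one common variant); and then randomly reduces the majority class. The function names `smote`, `tomek_links`, `smtmk` and `reduce_majority` are hypothetical. The abstract does not say how the 7:1 setting maps onto sample counts, so the `target_ratio` parameter (majority samples kept per minority sample) is an assumed interpretation rather than the paper's definition.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """Classic SMOTE: each synthetic point lies on the segment between a random
    minority sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Other minority samples, nearest first (needs at least 2 minority samples)
        order = sorted((j for j in range(len(minority)) if j != i),
                       key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(order[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def tomek_links(X, y):
    """Indices of samples in a Tomek link: two samples of different classes
    that are each other's nearest neighbour."""
    nn = [min((j for j in range(len(X)) if j != i),
              key=lambda j: math.dist(X[i], X[j]))
          for i in range(len(X))]
    linked = set()
    for i, j in enumerate(nn):
        if y[i] != y[j] and nn[j] == i:
            linked.update((i, j))
    return linked


def smtmk(X, y, minority_label, k=5, seed=0):
    """SMOTE up to class balance, then drop both members of every Tomek link
    (one common variant; binary labels assumed)."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    # Number of synthetic samples = #majority - #minority
    new = smote(minority, len(y) - 2 * len(minority), k, rng)
    X2 = list(X) + new
    y2 = list(y) + [minority_label] * len(new)
    drop = tomek_links(X2, y2)
    keep = [i for i in range(len(X2)) if i not in drop]
    return [X2[i] for i in keep], [y2[i] for i in keep]


def reduce_majority(X, y, minority_label, target_ratio, seed=0):
    """HYPOTHETICAL final step: randomly keep at most target_ratio majority
    samples per minority sample. The paper's exact rule is not given."""
    rng = random.Random(seed)
    minor = [i for i, label in enumerate(y) if label == minority_label]
    major = [i for i, label in enumerate(y) if label != minority_label]
    n_keep = min(len(major), int(target_ratio * len(minor)))
    kept = sorted(minor + rng.sample(major, n_keep))
    return [X[i] for i in kept], [y[i] for i in kept]


if __name__ == "__main__":
    # Toy data: 90 majority (label 0) vs 10 minority (label 1) points
    gen = random.Random(42)
    X = [[gen.gauss(0, 1), gen.gauss(0, 1)] for _ in range(90)]
    X += [[gen.gauss(2, 1), gen.gauss(2, 1)] for _ in range(10)]
    y = [0] * 90 + [1] * 10
    Xb, yb = smtmk(X, y, minority_label=1)
    Xr, yr = reduce_majority(Xb, yb, minority_label=1, target_ratio=0.8)
    print("after SMTmk:", yb.count(0), "majority /", yb.count(1), "minority")
    print("after reduction:", yr.count(0), "majority /", yr.count(1), "minority")
```

For real experiments, imbalanced-learn provides tested implementations such as `imblearn.combine.SMOTETomek` and `imblearn.under_sampling.RandomUnderSampler`; the pure-Python version here only makes each step explicit, and its nearest-neighbour searches scale quadratically with the number of samples.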
