
A Smooth Oversampling Technique Using SLP

DOI: https://doi.org/10.15866/irea.v11i4.23102

Abstract


The class imbalance problem occurs when the samples in a dataset are unevenly distributed across class labels. In binary classification, one class (the minority) contains far fewer samples than the other (the majority). The widely used Synthetic Minority Oversampling Technique (SMOTE) oversamples the minority class by generating synthetic samples at random points on the line segment between pairs of neighboring minority samples in the feature space. Because the synthetic samples are confined to these line segments and placed randomly, SMOTE and its extensions capture only the local characteristics of the minority class. Moreover, these techniques do not significantly improve classification performance. Accordingly, this paper proposes a new oversampling technique based on a single-layer perceptron (SLP) that captures the general characteristics of the data with low resource consumption. The proposed technique uses the trained network weights as input to other networks, which generate new samples within the min-max range. Experiments were conducted on the highly imbalanced credit card transaction dataset, which contains 492 fraudulent transactions out of 284,807 in total (about 0.17%). The results show that the proposed technique outperformed the SMOTE-based techniques across several machine learning classifiers, including decision tree, random forest, support vector machine, and K-nearest neighbors. Random forest gave the best results: the proposed technique improved recall and F-measure while maintaining full precision, as SMOTE did.
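The SMOTE interpolation step that the abstract contrasts against can be sketched as follows. This is a minimal pure-Python sketch of standard SMOTE-style interpolation, not the proposed SLP technique, whose details the abstract does not give. The function names `k_nearest`, `smote_like`, and `samples_needed_for_balance` are hypothetical, the default `k = 5` is a common choice rather than a value from this paper, and the 1:1 balancing target is an assumption made only for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def k_nearest(points: Sequence[Point], i: int, k: int) -> List[int]:
    """Indices of the k nearest neighbours of points[i] by Euclidean distance, excluding i."""
    dists = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return [j for _, j in dists[:k]]


def smote_like(minority: Sequence[Point], n_new: int, k: int = 5, seed: int = 0) -> List[Point]:
    """Hypothetical helper: n_new synthetic points via SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic: List[Point] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))           # pick a random minority sample
        j = rng.choice(k_nearest(minority, i, k))  # pick one of its k nearest minority neighbours
        gap = rng.random()                         # random position in [0, 1) along the segment
        x, nb = minority[i], minority[j]
        # Convex combination: the new point lies on the segment from x towards nb.
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


def samples_needed_for_balance(n_majority: int, n_minority: int) -> int:
    """Synthetic minority samples needed for a 1:1 class ratio (assumed target)."""
    return max(0, n_majority - n_minority)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    for p in smote_like(pts, n_new=3, k=2, seed=42):
        print(p)
    print(samples_needed_for_balance(284_807 - 492, 492))  # 283823
```

For the credit card dataset (284,807 - 492 = 284,315 legitimate transactions), full balancing under this assumption would require 283,823 synthetic fraud samples. Because every synthetic point is a convex combination of two real minority points, SMOTE output never leaves the segments between existing samples, which is the locality limitation the abstract describes.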
Copyright © 2023 Praise Worthy Prize - All rights reserved.

Keywords


SMOTE; Oversampling; Binary Classification; Performance Evaluation

