A Smooth Oversampling Technique Using SLP
DOI: https://doi.org/10.15866/irea.v11i4.23102
Abstract
The class imbalance problem occurs when the samples in a dataset are unevenly distributed across the class labels. In a binary classification problem, one class holds a minority of the samples while the other holds the majority. The commonly used SMOTE technique oversamples the minority class by generating synthetic samples between pairs of original samples in the feature space. Because of this midline interpolation and its randomness, SMOTE and its extensions capture only the local characteristics of the minority class, and they do not significantly improve classification performance. Accordingly, this paper proposes a new oversampling technique based on a single-layer perceptron (SLP) that captures the general characteristics of the data with low resource consumption. The proposed technique uses the weights trained for the network as input to other networks to generate new samples within the min-max range. The experiments are conducted on the highly imbalanced credit card transaction dataset, which contains 492 fraudulent transactions out of 284,807 total transactions. The results show that the proposed technique outperforms the SMOTE techniques across several machine learning classifiers, including decision tree, random forest, support vector machine, and K-nearest neighbors. Random forest reported the best results: the proposed technique improves the recall and the F-measure while maintaining full precision, similar to SMOTE.
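For readers unfamiliar with the baseline, the following is a minimal sketch of SMOTE-style interpolation, the technique the abstract compares against. It is not the proposed SLP method, whose details the abstract does not give. The function name `smote_like_samples`, the parameter `k`, and the toy data are assumptions made for illustration. Each synthetic point is computed as x_new = x_i + λ (x_nn − x_i), where x_nn is one of the k nearest minority neighbors of x_i and λ is drawn uniformly from [0, 1).

```python
import numpy as np

def smote_like_samples(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points by SMOTE-style interpolation (illustrative)."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    k = min(k, n - 1)  # cannot have more neighbors than other points
    # Pairwise Euclidean distances within the minority class.
    diffs = minority[:, None, :] - minority[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]
    # Pick a random base point and one of its k nearest neighbors.
    base = rng.integers(0, n, size=n_new)
    picked = neighbors[base, rng.integers(0, k, size=n_new)]
    # Interpolate at a random position on the segment between the pair.
    lam = rng.random((n_new, 1))
    return minority[base] + lam * (minority[picked] - minority[base])

# Class ratio of the dataset cited in the abstract (roughly 0.17% fraud).
fraud_ratio = 492 / 284_807

# Toy minority class, made up for this example.
toy_minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                         [1.0, 1.0], [0.5, 0.5], [0.2, 0.8]])
synthetic = smote_like_samples(toy_minority, n_new=20, k=3)
print(f"fraud ratio: {fraud_ratio:.4%}, synthetic shape: {synthetic.shape}")
```

Because every synthetic point is a convex combination of two nearby minority samples, the output reflects local neighborhood structure only, which is the limitation the abstract attributes to SMOTE and its extensions.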
Copyright © 2023 Praise Worthy Prize - All rights reserved.