CNB-MRF: Adapting Correlative Naive Bayes Classifier and MapReduce Framework for Big Data Classification

Chitrakant Banchhor; N. Srinivasu

doi:10.15866/irecos.v11i11.10116

Chitrakant Banchhor^(1*), N. Srinivasu^(2)

^(*) Corresponding author

Abstract

With raw data arriving continuously and in large volumes, big data classification is attracting attention in fields such as industry, medicine, and finance. To meet this need, several standard classifiers have been adapted to big data classification, but they cope poorly with data at this scale. The Naïve Bayes classifier is often preferred for big data classification because its computation is simple. Its main drawback, however, is the independence hypothesis, which assumes that the input attributes are conditionally independent of one another. To overcome this drawback, we propose a new classification algorithm that combines a Correlative Naïve Bayes classifier with the MapReduce framework (CNB-MRF). The newly proposed correlation function is used within the MapReduce framework to classify under a dependence hypothesis and thereby improve classification performance. Each new sample is classified using the probability index table built from the training data samples and the posterior probability computed for the testing data samples within the MapReduce framework. The proposed CNB-MRF classifier is then evaluated on the localization and skin datasets. It achieves classification accuracies of 74.77% and 61.35% on the localization and skin datasets, respectively, higher than those of the Naïve Bayes classifier with the MapReduce framework (NB-MRF).
Copyright © 2016 Praise Worthy Prize - All rights reserved.
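
As a rough illustration of the probability index table and posterior computation described in the abstract, the following minimal Python sketch trains a plain Naïve Bayes classifier in a map/reduce style. It corresponds to the independence-based baseline (NB-MRF), not to CNB-MRF: the abstract does not define the proposed correlation function, so it is not reproduced here. The function names (`map_partition`, `reduce_counts`, `train`, `predict`), the count-table layout, the use of categorical features, and the Laplace smoothing parameter `alpha` are all assumptions made for illustration.

```python
import math
from collections import Counter
from functools import reduce


def map_partition(rows):
    """Mapper (hypothetical): count labels and (feature, value, label) triples in one split."""
    counts = Counter()
    for features, label in rows:
        counts[("class", label)] += 1
        for j, value in enumerate(features):
            counts[("feat", j, value, label)] += 1
    return counts


def reduce_counts(left, right):
    """Reducer (hypothetical): merge two partial count tables by summing counts."""
    return left + right


def train(partitions):
    """Build the merged count table, standing in for the probability index table."""
    return reduce(reduce_counts, map(map_partition, partitions), Counter())


def predict(table, features, alpha=1.0):
    """Return the label with the highest log-posterior under the independence assumption."""
    labels = sorted(key[1] for key in table if key[0] == "class")
    total = sum(table[("class", label)] for label in labels)
    # Distinct values seen per feature, needed for Laplace smoothing (an assumption here).
    seen = {}
    for key in table:
        if key[0] == "feat":
            seen.setdefault(key[1], set()).add(key[2])
    best_label, best_score = None, -math.inf
    for label in labels:
        n_label = table[("class", label)]
        score = math.log(n_label / total)  # log prior
        for j, value in enumerate(features):
            n_values = len(seen.get(j, ())) or 1
            likelihood = (table[("feat", j, value, label)] + alpha) / (n_label + alpha * n_values)
            score += math.log(likelihood)  # log conditional likelihood
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

In this sketch each partition plays the role of one input split: the mappers count independently, and the reducer only sums counts, so the merged table does not depend on how the data are split. A dependence-aware variant would have to change how the per-attribute likelihoods are combined in `predict`.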

Keywords

Naïve Bayes Classifier; MapReduce Framework; Correlation Function; Data Classification
