MKS-MRF: a Multiple Kernel and a Swarm-Based Map Reduce Framework for Big Data Clustering

Omkaresh Kulkarni; Sudarson Jena

doi:10.15866/irecos.v11i11.10117

Corresponding author: Omkaresh Kulkarni

Abstract

Due to the ever-increasing size of datasets with extremely high numbers of attributes, mining such datasets has become a common scenario across industrial, social, and scientific fields. Clustering is an important task in many applications that must search very large databases, so clustering algorithms that scale well to very big data are significant and useful. When the data become too big, they cannot be stored in the memory of a single computer, and computing the entire kernel matrix consumes considerable time. To alleviate this problem, we propose a multiple kernel and swarm-based MapReduce framework (MKS-MRF) for big data clustering that improves clustering accuracy. In the map phase, a new clustering algorithm that uses a tangential kernel and a spherical kernel calculates the relevant cluster centroids. In the reduce phase, particle swarm optimization (PSO) is applied, with the fitness value calculated using the Davies-Bouldin (DB) index. The proposed framework is evaluated on the skin and localization datasets. MKS-MRF achieves clustering accuracies of 81% on the localization dataset and 85% on the skin dataset, higher than those of the other clustering algorithms compared.
Copyright © 2016 Praise Worthy Prize - All rights reserved.
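The reduce-phase idea described above, PSO whose fitness is the Davies-Bouldin index, can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each particle encodes k candidate centroids, points are assigned to their nearest centroid by Euclidean distance (the tangential/spherical multiple-kernel map phase is not reproduced), and the inertia and acceleration constants `w`, `c1`, `c2` are common textbook defaults rather than values from the paper. The function names `davies_bouldin`, `assign`, `fitness`, and `pso_cluster` are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Plain Euclidean distance (an assumption standing in for a kernel distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: euclidean(p, centroids[j]))
            for p in points]


def davies_bouldin(points, centroids, labels):
    """Davies-Bouldin index: average over clusters of the worst (S_i + S_j) / M_ij.

    S_i is the mean distance of cluster i's points to its centroid and M_ij is the
    distance between centroids i and j. Lower values mean compact,
    well-separated clusters.
    """
    k = len(centroids)
    scatter = []
    for i in range(k):
        members = [p for p, lab in zip(points, labels) if lab == i]
        if not members:
            return math.inf  # an empty cluster makes the solution invalid
        scatter.append(sum(euclidean(p, centroids[i]) for p in members) / len(members))
    total = 0.0
    for i in range(k):
        worst = 0.0
        for j in range(k):
            if j == i:
                continue
            sep = euclidean(centroids[i], centroids[j])
            if sep == 0.0:
                return math.inf  # coincident centroids cannot be separated
            worst = max(worst, (scatter[i] + scatter[j]) / sep)
        total += worst
    return total / k


def fitness(points, flat, k, dim):
    """Decode a particle (k centroids flattened into one vector) and score it."""
    centroids = [flat[i * dim:(i + 1) * dim] for i in range(k)]
    return davies_bouldin(points, centroids, assign(points, centroids))


def pso_cluster(points, k, n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO that searches for k centroids minimising the DB index."""
    rng = random.Random(seed)
    dim = len(points[0])
    size = k * dim
    # Start each particle from k distinct data points used as centroids.
    pos = [[v for p in rng.sample(points, k) for v in p] for _ in range(n_particles)]
    vel = [[0.0] * size for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [fitness(points, p, k, dim) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(size):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            f = fitness(points, pos[i], k, dim)
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[i][:], f

    centroids = [gbest[j * dim:(j + 1) * dim] for j in range(k)]
    return centroids, gbest_f


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.9, 5.3)]
    cents, score = pso_cluster(pts, k=2)
    print(assign(pts, cents), round(score, 3))
```

Because the Davies-Bouldin index is lower for compact, well-separated clusters, the swarm minimises it; empty clusters and coincident centroids are scored as infinitely bad so the search steers away from them. On the two well-separated groups in the usage example, the best particle places one centroid in each group.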

Keywords

Particle Swarm Based Optimization; Multiple Kernel; MapReduce Framework; Clustering Algorithm; Fuzzy C-Means
