
New Algorithms for Data Mining on Grid Computing




DOI: https://doi.org/10.15866/irecos.v11i12.10414

Abstract


Data mining on the grid is a new challenge: it consists of processing large quantities of data with knowledge discovery methods. Making better use of the resources available in grid computing requires new methods that take into account the specificities of both data mining problems and grid computing (data grids). In this paper, we propose a method for extracting association rules that is adapted to parallel and distributed environments such as grids. The goal is to minimize execution time by reducing the cost of parallelization and communication. We propose a parallel version of the Partition algorithm for finding association rules. We also introduce an intelligent distribution of the database across the grid, using clustering to minimize the search space on each machine and consequently reduce processing time. To this end, we propose a second parallel algorithm that clusters numerical and categorical data in a distributed environment.
Copyright © 2016 Praise Worthy Prize - All rights reserved.
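
As a rough illustration of the two-phase idea behind the classical Partition algorithm named in the abstract, the following Python sketch mines each partition independently and then verifies the merged candidates against the whole database. It is not the parallel version proposed in the paper, whose details are not given on this page; the function names, the brute-force local miner, and the `max_size` cap are assumptions made for illustration only.

```python
import math
from collections import Counter
from itertools import combinations


def local_frequent_itemsets(partition, min_support, max_size=3):
    """Itemsets (as sorted tuples) that are frequent within one partition.

    Brute-force enumeration, used here only to keep the sketch short.
    min_support is a fraction of the partition's transactions.
    """
    threshold = math.ceil(min_support * len(partition))
    counts = Counter()
    for transaction in partition:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    return {itemset for itemset, c in counts.items() if c >= threshold}


def partition_mine(transactions, n_partitions, min_support, max_size=3):
    """Two-phase mining: local candidates first, then one global count."""
    # Phase 1: split the database and mine each partition on its own.
    # Each call is independent, so it could run on a separate machine.
    size = math.ceil(len(transactions) / n_partitions)
    partitions = [transactions[i:i + size]
                  for i in range(0, len(transactions), size)]
    candidates = set()
    for part in partitions:
        candidates |= local_frequent_itemsets(part, min_support, max_size)

    # Phase 2: count every candidate once over the full database.
    # A globally frequent itemset is frequent in at least one partition,
    # so no frequent itemset is missed by checking candidates only.
    global_threshold = math.ceil(min_support * len(transactions))
    frequent = {}
    for cand in candidates:
        count = sum(1 for t in transactions if set(cand) <= set(t))
        if count >= global_threshold:
            frequent[cand] = count
    return frequent
```

Calling `partition_mine(transactions, n_partitions=2, min_support=0.5)` on a list of item lists returns a dictionary mapping each frequent itemset to its global support count. In a grid setting, only the candidate sets from phase 1 and the counts from phase 2 would need to travel between nodes.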

Keywords


Association Rules; Clustering; Distributed Data Mining; Partition; K-Prototypes; Parallel Algorithm
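
The K-Prototypes keyword refers to clustering records that mix numerical and categorical attributes. A minimal sketch of the standard K-Prototypes dissimilarity, assuming squared Euclidean distance on the numerical part and a weighted mismatch count on the categorical part, might look like the following. The function names and the default weight `gamma=1.0` are hypothetical, and this is not the parallel algorithm proposed in the paper.

```python
def mixed_dissimilarity(x_num, x_cat, proto_num, proto_cat, gamma=1.0):
    """Dissimilarity between a record and a prototype.

    Numerical part: squared Euclidean distance.
    Categorical part: number of mismatching attributes, weighted by gamma.
    """
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, proto_num))
    categorical = sum(1 for a, b in zip(x_cat, proto_cat) if a != b)
    return numeric + gamma * categorical


def nearest_prototype(x_num, x_cat, prototypes, gamma=1.0):
    """Index of the closest prototype; prototypes is a list of (num, cat) pairs."""
    distances = [
        mixed_dissimilarity(x_num, x_cat, p_num, p_cat, gamma)
        for p_num, p_cat in prototypes
    ]
    return distances.index(min(distances))
```

The weight `gamma` balances the two attribute types: a larger value makes categorical mismatches count more heavily against the numerical distance.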



