An Efficient Approach of Extracting Frequent Itemsets from Large Data Using HDFS Framework

Prajakta G. Kulkarni; S. R. Khonde

doi:10.15866/irecap.v7i6.13354

Corresponding author: Prajakta G. Kulkarni

Abstract

Frequent itemset extraction is important in many data mining applications. It is used to discover interesting patterns in databases, such as association rules, correlations, and clusters. Computing frequent itemsets quickly from a large database is difficult. Several algorithms exist for finding frequent itemsets, such as Apriori and FP-growth. Unfortunately, these algorithms fail to extract interesting items when the volume of data becomes very large. In a distributed environment, an algorithm must not only parallelize automatically but also balance the workload well, which these algorithms cannot do. Overcoming these limitations requires an algorithm that provides the missing features, namely automatic parallelization and workload balancing. This paper proposes a new algorithm for extracting frequent itemsets using Hadoop and the MapReduce paradigm. The proposed algorithm, named Frequent Itemset Mining using Modified Apriori (FIMMA), is based on a modified Apriori algorithm. In this method, a large database is distributed across a number of mappers, which work independently and concurrently using a hashing technique; their results are passed to the reducers. The reducers produce the final result, listing the most frequent itemsets.
Copyright © 2017 Praise Worthy Prize - All rights reserved.
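The abstract does not spell out how FIMMA modifies Apriori, so the following is only a minimal single-process sketch of the general pattern it describes: standard level-wise Apriori in which the database is split across mappers, each mapper counts candidate itemsets in a hash table, and a reducer sums the partial counts and applies a minimum-support threshold. The function names (`split_into_partitions`, `map_count`, `reduce_counts`, `generate_candidates`, `mine_frequent_itemsets`), the round-robin split, and the use of an absolute `min_support` count are illustrative assumptions, not the paper's code or a Hadoop API.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set

# Hypothetical sketch: standard level-wise Apriori with MapReduce-style
# counting, simulated in one process. It is NOT the paper's FIMMA code.
Itemset = FrozenSet[str]


def split_into_partitions(transactions: List[Set[str]], num_mappers: int) -> List[List[Set[str]]]:
    # Assumed round-robin split, standing in for HDFS blocks handed to mappers.
    return [transactions[i::num_mappers] for i in range(num_mappers)]


def map_count(partition: List[Set[str]], candidates: Set[Itemset]) -> Counter:
    # "Mapper": counts candidates in its own partition using a hash table
    # (Counter is a dict subclass), independently of the other mappers.
    local: Counter = Counter()
    for transaction in partition:
        for candidate in candidates:
            if candidate <= transaction:  # candidate is a subset of the transaction
                local[candidate] += 1
    return local


def reduce_counts(partials: Iterable[Counter], min_support: int) -> Dict[Itemset, int]:
    # "Reducer": sums per-mapper counts and keeps itemsets meeting min_support.
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return {itemset: n for itemset, n in total.items() if n >= min_support}


def generate_candidates(frequent: Set[Itemset], k: int) -> Set[Itemset]:
    # Apriori join and prune: keep size-k unions whose (k-1)-subsets are all frequent.
    candidates: Set[Itemset] = set()
    for a in frequent:
        for b in frequent:
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates


def mine_frequent_itemsets(
    transactions: List[Set[str]], min_support: int, num_mappers: int = 2
) -> Dict[Itemset, int]:
    partitions = split_into_partitions(transactions, num_mappers)
    candidates: Set[Itemset] = {frozenset([item]) for t in transactions for item in t}
    result: Dict[Itemset, int] = {}
    k = 1
    while candidates:
        # One simulated MapReduce round per itemset size k.
        partials = [map_count(p, candidates) for p in partitions]
        frequent = reduce_counts(partials, min_support)
        result.update(frequent)
        k += 1
        candidates = generate_candidates(set(frequent), k)
    return result


if __name__ == "__main__":
    db = [
        {"bread", "milk"},
        {"bread", "diaper", "beer", "eggs"},
        {"milk", "diaper", "beer", "cola"},
        {"bread", "milk", "diaper", "beer"},
        {"bread", "milk", "diaper", "cola"},
    ]
    freq = mine_frequent_itemsets(db, min_support=3)
    for itemset, count in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
```

Because each mapper needs only its own partition and the reducer needs only the summed counts, the work at each level parallelizes naturally; in an actual Hadoop job the candidate set would be shipped to every map task and the itemset would serve as the shuffle key. How FIMMA changes Apriori and balances load across mappers is not described in the abstract, so neither is modeled here.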

Keywords

Association Rules; Frequent Itemsets; Load Balancing; MapReduce; Modified Apriori; FIMMA
