Efficient Processing of Job by Enhancing Hadoop Map Reduce Framework Using Containers

Shweta V. H. Chaudhari; R. M. Wahul

doi:10.15866/irecap.v7i6.13604

Shweta V. H. Chaudhari^(1*), R. M. Wahul^(2)

^(*) Corresponding author

Abstract

Energy efficiency is widely regarded as a primary reason for migrating to a Cloud environment. Hadoop is not only a Big Data processing system but also a storage system. Considerable effort has been invested in enhancing Hadoop performance by addressing factors such as job scheduling, the HDFS mechanism, data locality, job execution time, system memory, and data caching. However, these approaches have proven less effective at enhancing Hadoop performance. The objective of this work is to focus on the network latency of Hadoop and to enhance its performance by using Docker. Docker is an open platform for developing, shipping, and running applications. It can package and run an application in a loosely isolated environment called a container. In our study, containers run the Hadoop framework in a private Cloud. The workload is distributed among the containers on a single system, and it depends on how the HDFS data is distributed. The goal is to enhance the performance of the Hadoop MapReduce framework by considering factors such as the number of read operations, the number of write operations, and CPU processing time, using different jobs with different data set sizes. In the proposed experiment, the same job was run with datasets of different sizes.
Copyright © 2017 Praise Worthy Prize - All rights reserved.
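
As an illustration of how the three factors named in the abstract could be collected and compared, the following is a minimal sketch and not the authors' tooling. It assumes the metrics are read from the counter summary that a MapReduce job prints on completion, using the counter labels "HDFS: Number of read operations", "HDFS: Number of write operations", and "CPU time spent (ms)". The function names `parse_counters` and `relative_change` are hypothetical helpers introduced here.

```python
import re

# Counter labels as they appear in a MapReduce job's counter summary.
# Assumption: the job output was captured to text by the experimenter.
COUNTERS = {
    "read_ops": "HDFS: Number of read operations",
    "write_ops": "HDFS: Number of write operations",
    "cpu_ms": "CPU time spent (ms)",
}


def parse_counters(log_text):
    """Extract the three metrics from the counter summary of one job run."""
    metrics = {}
    for key, label in COUNTERS.items():
        match = re.search(re.escape(label) + r"\s*=\s*(\d+)", log_text)
        if match:
            metrics[key] = int(match.group(1))
    return metrics


def relative_change(baseline, candidate):
    """Percentage change per metric; negative means the candidate used less."""
    return {
        key: 100.0 * (candidate[key] - baseline[key]) / baseline[key]
        for key in baseline
        if key in candidate and baseline[key] != 0
    }
```

Calling `parse_counters` on the captured output of a baseline run and of a container-based run, then passing both results to `relative_change`, would give a per-metric comparison for one dataset size. Repeating this for each dataset size follows the experiment design described above.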

Keywords

Big Data; Cloud Computing; Hadoop; HDFS; Containers; Distributed Computing
