Efficient Processing of Job by Enhancing Hadoop Map Reduce Framework Using Containers

Energy efficiency is widely considered a primary reason for migrating to a Cloud environment. Hadoop is not only a Big Data processing system but also a storage system. Considerable effort has been invested in enhancing Hadoop performance by addressing factors such as job scheduling, HDFS mechanisms, data locality, job execution time, system memory, and data caching. However, these approaches have been found less effective in enhancing Hadoop performance. This work instead targets the network latency of Hadoop, using Docker to enhance its performance. Docker is an open platform for developing, shipping, and running applications. It can package and run an application in a loosely isolated environment called a container. In our study, containers run the Hadoop framework in a private Cloud. The workload is distributed among the containers on a single system, and it depends on how the HDFS data is distributed. The goal is to enhance the performance of the MapReduce framework of Hadoop by considering factors such as the number of read operations, the number of write operations, and CPU processing time. In the proposed experiment, the same job was run on datasets of different sizes.
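The three factors named in the abstract correspond to counters that Hadoop prints in the summary of a finished MapReduce job. As a minimal sketch, assuming the console output of a job has been captured as text, the counters could be extracted as follows. The function name `parse_counters`, the dictionary keys, and the sample values are illustrative assumptions and are not taken from the paper.

```python
import re

# Counter labels as they appear in the summary printed when a MapReduce job completes.
COUNTER_LABELS = {
    "read_ops": "HDFS: Number of read operations",
    "write_ops": "HDFS: Number of write operations",
    "cpu_ms": "CPU time spent (ms)",
}


def parse_counters(job_output: str) -> dict:
    """Extract the three counters of interest from a job's console output."""
    counters = {}
    for key, label in COUNTER_LABELS.items():
        # Match the literal label followed by "=<integer>".
        match = re.search(re.escape(label) + r"\s*=\s*(\d+)", job_output)
        if match:
            counters[key] = int(match.group(1))
    return counters


# Hypothetical excerpt of job output, for illustration only (not measured data).
sample_output = """
        HDFS: Number of read operations=6
        HDFS: Number of write operations=2
        CPU time spent (ms)=1230
"""

print(parse_counters(sample_output))
```

Running the same job on each dataset size and collecting these dictionaries would give one row per run for comparing a containerized setup against a baseline.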
Copyright © 2017 Praise Worthy Prize - All rights reserved.


Big Data; Cloud Computing; Hadoop; HDFS; Containers; Distributed Computing
