MMSD: a Metadata-Aware Multi-Tiered Source Deduplication Cloud Backup System in the Personal Computing Environment

Haiyan Meng; Jing Li; Weiqing Liu; Changchun Zhang

doi:10.15866/irecos.v8i2.3123

Corresponding author: Haiyan Meng

Abstract

Existing source deduplication solutions for cloud backup systems achieve a less than ideal ratio of deduplication efficiency to overhead, despite their complex deduplication workflows, because they make insufficient use of file semantics. In this paper, we present MMSD, a metadata-aware multi-tiered source deduplication cloud backup system for the personal computing environment. MMSD aims for an optimal tradeoff between deduplication efficiency and deduplication metadata storage overhead, and thereby achieves a shorter backup window than existing approaches. MMSD makes full use of file metadata, including file size, type, timestamp, path information, and modification frequency, exploits the advantages of whole-file-level deduplication over chunk-level deduplication, and efficiently combines the two deduplication levels. Our experimental results with real-world datasets show that, compared with state-of-the-art source deduplication methods, MMSD can improve the overall deduplication efficiency from 75% to 91% with only 33.8% of the deduplication metadata storage overhead, and shorten the backup window by at least 54.2%.
Copyright © 2013 Praise Worthy Prize - All rights reserved.
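
The general idea of combining a metadata check with whole-file and chunk-level deduplication can be illustrated with a minimal sketch. This is not the MMSD algorithm from the paper: the tier order, the fixed chunk size, the small-file threshold, and the names `DedupIndex` and `backup_file` are all assumptions made for illustration, and only file size, timestamp, and path are used out of the metadata fields the abstract lists.

```python
import hashlib
from dataclasses import dataclass, field

CHUNK_SIZE = 4096        # assumed fixed chunk size, not taken from the paper
SMALL_FILE_LIMIT = 8192  # assumed threshold below which chunking is skipped


@dataclass
class DedupIndex:
    """Hypothetical client-side index used by the three tiers."""
    seen_meta: dict = field(default_factory=dict)    # path -> (size, mtime)
    file_hashes: set = field(default_factory=set)    # whole-file fingerprints
    chunk_hashes: set = field(default_factory=set)   # chunk fingerprints


def backup_file(index, path, data, mtime):
    """Return the number of bytes that would have to be uploaded."""
    size = len(data)

    # Tier 1 (metadata): same path, size, and timestamp as the last backup
    # is treated as unchanged, so no hashing is needed.
    if index.seen_meta.get(path) == (size, mtime):
        return 0
    index.seen_meta[path] = (size, mtime)

    # Tier 2 (whole file): one fingerprint per file catches exact copies.
    file_fp = hashlib.sha256(data).hexdigest()
    if file_fp in index.file_hashes:
        return 0
    index.file_hashes.add(file_fp)

    # Small files are uploaded whole to keep the chunk index small.
    if size < SMALL_FILE_LIMIT:
        return size

    # Tier 3 (chunks): upload only chunks whose fingerprint is new.
    uploaded = 0
    for offset in range(0, size, CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        chunk_fp = hashlib.sha256(chunk).hexdigest()
        if chunk_fp not in index.chunk_hashes:
            index.chunk_hashes.add(chunk_fp)
            uploaded += len(chunk)
    return uploaded
```

In this sketch, the cheaper tiers run first so that unchanged or fully duplicated files never reach chunking, which is one plausible way to trade deduplication efficiency against metadata overhead.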

Keywords

Source Deduplication; File Metadata; Cloud Backup
