A Technique to Mine Clusters Using Privacy Preserving Data Mining

J. Anitha; R. Rangarajan

doi:10.15866/irecos.v8i10.3502

Abstract

In recent years, the privacy-preserving data mining problem has gained considerable importance because vast amounts of personal data about individuals are stored by commercial vendors and organizations. Privacy-preserving clustering has not been studied as intensively as other data mining techniques, such as rule mining and sequence mining. In this paper, we obtain privacy by anonymization: the anonymized data are produced by encrypting the original data with a secret key, and the secret key is established using the Diffie-Hellman key exchange algorithm. To cluster the anonymized data, the Fuzzy C-Means clustering algorithm is used, because it is suited to data whose cluster boundaries are ambiguous. In the proposed variant, a distance matrix is first calculated, and from it a similarity matrix and a dissimilarity matrix are formed. The similarity matrix measures the similarity between each data point and the cluster centroids, and the dissimilarity matrix measures the dissimilarity between each data point and the cluster centroids. A membership matrix constructed from these two matrices is then used to cluster the anonymized data. The proposed algorithm is compared experimentally with the K-Means algorithm in terms of running time, memory usage, and accuracy, and the results show that the proposed algorithm outperforms K-Means in accuracy and execution time.
Copyright © 2013 Praise Worthy Prize - All rights reserved.
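
The key-agreement step named in the abstract can be illustrated with a minimal sketch of textbook Diffie-Hellman, assuming two hypothetical parties A and B that share a public prime P and base G. The prime (2**31 - 1), the base 7, and the SHA-256 key derivation are illustrative assumptions chosen for readability, not parameters taken from the paper, and a prime this small offers no real security. The abstract does not say how the derived key is used to encrypt the data, so that step is not shown.

```python
import hashlib
import secrets

# Toy public parameters (illustrative assumption, not from the paper):
# P = 2**31 - 1 is prime but far too small for real security; practical systems
# use standardized groups of 2048 bits or more, such as those in RFC 3526.
P = 2**31 - 1
G = 7  # public base shared by both parties


def make_keypair(p: int = P, g: int = G) -> tuple[int, int]:
    """Return (private, public), where public = g**private mod p."""
    private = secrets.randbelow(p - 3) + 2  # private exponent in [2, p - 2]
    return private, pow(g, private, p)


def shared_secret(own_private: int, other_public: int, p: int = P) -> int:
    """(g**b)**a mod p equals (g**a)**b mod p, so both sides get the same value."""
    return pow(other_public, own_private, p)


def derive_key(secret: int) -> bytes:
    """Hash the shared integer into a 256-bit key (illustrative, not a vetted KDF)."""
    length = max(1, (secret.bit_length() + 7) // 8)
    return hashlib.sha256(secret.to_bytes(length, "big")).digest()


# Party A and party B (hypothetical names) each generate a key pair
# and exchange only their public values.
a_priv, a_pub = make_keypair()
b_priv, b_pub = make_keypair()

key_a = derive_key(shared_secret(a_priv, b_pub))
key_b = derive_key(shared_secret(b_priv, a_pub))
assert key_a == key_b  # both parties now hold the same secret key
```

Only the public values a_pub and b_pub travel between the parties; the private exponents never leave their owners, which is why an eavesdropper who sees P, G, a_pub, and b_pub cannot feasibly compute the shared key when P is large.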
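
The abstract does not give formulas for the distance, similarity, and dissimilarity matrices, so the following NumPy sketch uses hypothetical definitions: Euclidean distance between each data point and each centroid, dissimilarity as that distance divided by the largest distance, and similarity as one minus the dissimilarity. The function names are illustrative, and the way the paper combines these matrices into its membership matrix is not reproduced here.

```python
import numpy as np


def distance_matrix(X: np.ndarray, C: np.ndarray) -> np.ndarray:
    """D[i, j] = Euclidean distance between data point X[i] and centroid C[j]."""
    return np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)


def dissimilarity_matrix(D: np.ndarray) -> np.ndarray:
    """Hypothetical definition: distances scaled into [0, 1] by the largest one."""
    d_max = D.max()
    return D / d_max if d_max > 0 else np.zeros_like(D)


def similarity_matrix(D: np.ndarray) -> np.ndarray:
    """Hypothetical definition: 1 - dissimilarity, so closer points score higher."""
    return 1.0 - dissimilarity_matrix(D)


# Three points on a line and two centroids at its ends.
X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
C = np.array([[0.0, 0.0], [6.0, 8.0]])

D = distance_matrix(X, C)      # [[0, 10], [5, 5], [10, 0]]
Dis = dissimilarity_matrix(D)  # [[0, 1], [0.5, 0.5], [1, 0]]
S = similarity_matrix(D)       # [[1, 0], [0.5, 0.5], [0, 1]]
```

The middle point is equidistant from both centroids, so under these definitions its similarity and dissimilarity rows are split evenly, which is the kind of ambiguous boundary case that motivates a fuzzy rather than a hard assignment.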
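
For reference, the standard Fuzzy C-Means iteration that the paper builds on can be sketched as follows. This is the textbook algorithm, not the paper's similarity/dissimilarity-based variant; the function name fuzzy_c_means, the fuzzifier m = 2, the tolerance, and the random initialization are assumptions. Each point receives a membership in every cluster, u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)), and centroids are recomputed as membership-weighted means.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Standard fuzzy c-means. Returns (centroids of shape c x d, memberships n x c)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)  # each point's memberships sum to 1
    for _ in range(max_iter):
        W = U ** m                                 # fuzzified weights
        C = (W.T @ X) / W.sum(axis=0)[:, None]     # membership-weighted centroids
        D = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
        D = np.fmax(D, 1e-12)                      # guard: point sitting on a centroid
        # u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1))
        ratio = (D[:, :, None] / D[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return C, U


# Two well-separated groups of three points each.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
C, U = fuzzy_c_means(X, c=2)
labels = U.argmax(axis=1)  # hard labels taken from the largest membership
```

Hard labels, for example for an accuracy comparison against K-Means, can be read off with U.argmax(axis=1), while the fractional memberships themselves show how strongly each point belongs to each cluster.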

Keywords

Privacy Preserving; Similarity Matrix; Anonymization; Dissimilarity Matrix; Ciphertext; Data Containers; Third Party
