Extracting Data Quality Rules Using Information Theoretic Measures



Abstract


Data quality is a major concern in today’s information-driven world, and poor data quality harms businesses that depend on information. Conditional functional dependencies (CFDs) extend functional dependencies (FDs) with conditions on data values, so that a dependency need only hold on the tuples matching a given pattern. CFDs serve as quality assessment rules: they improve data quality by detecting inconsistencies. CFDs extracted from data can be used as data cleaning rules, since any violation of a CFD marks an inconsistency that can be repaired by applying the rule. Mining traditional FDs is already computationally intensive, and mining CFDs is costlier still. In this paper, we discuss an information-theoretic approach (ITCFD) for discovering CFDs that scales to large datasets.
Copyright © 2013 Praise Worthy Prize - All rights reserved.
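
The abstract does not spell out how ITCFD works, so the following is only an illustrative sketch of the general idea linking entropy to dependencies, not the authors' algorithm. It relies on the standard fact that an FD X -> Y holds exactly on a relation when the conditional entropy H(Y | X) is zero; a CFD is checked the same way after restricting the relation to tuples that match a constant pattern. The function names, the `pattern` argument, and the toy address data are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def _matches(row, pattern):
    # A row matches when it agrees with every constant in the pattern.
    return all(row[attr] == value for attr, value in pattern.items())


def conditional_entropy(rows, lhs, rhs, pattern=None):
    """H(rhs | lhs) in bits, computed over the rows that match `pattern`.

    `pattern` is a hypothetical way of writing the condition of a CFD:
    a dict mapping attribute names to required constants.
    """
    pattern = pattern or {}
    selected = [r for r in rows if _matches(r, pattern)]
    if not selected:
        return 0.0
    # Count rhs values inside each group of equal lhs values.
    groups = defaultdict(Counter)
    for r in selected:
        groups[tuple(r[a] for a in lhs)][r[rhs]] += 1
    n = len(selected)
    h = 0.0
    for counts in groups.values():
        size = sum(counts.values())
        for c in counts.values():
            # p(x, y) = c / n and p(y | x) = c / size
            h -= (c / n) * math.log2(c / size)
    return h


def violating_groups(rows, lhs, rhs, pattern=None):
    """Return the lhs groups that map to more than one rhs value."""
    pattern = pattern or {}
    groups = defaultdict(set)
    for r in rows:
        if _matches(r, pattern):
            groups[tuple(r[a] for a in lhs)].add(r[rhs])
    return {key: values for key, values in groups.items() if len(values) > 1}


# Toy data (invented for illustration only).
rows = [
    {"country": "UK", "zip": "EH4", "city": "Edinburgh"},
    {"country": "UK", "zip": "EH4", "city": "Edinburgh"},
    {"country": "UK", "zip": "W1", "city": "London"},
    {"country": "US", "zip": "100", "city": "New York"},
    {"country": "US", "zip": "100", "city": "Newark"},
]

# The plain FD zip -> city fails on the whole table ...
h_fd = conditional_entropy(rows, ["zip"], "city")
# ... but the CFD (country = "UK", zip -> city) holds on the UK tuples.
h_cfd = conditional_entropy(rows, ["zip"], "city", {"country": "UK"})
bad = violating_groups(rows, ["zip"], "city")
```

In this sketch a zero entropy signals that the (conditional) dependency holds on the selected tuples, while a small positive value suggests an approximate dependency whose violating groups are candidates for cleaning. A practical miner would also need a search strategy over attribute sets and patterns, which is outside the scope of this example.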

Keywords


Functional Dependency; Conditional Functional Dependency; Information Theory; Entropy


