A Fast K-Modes Clustering Algorithm to Warehouse Very Large Heterogeneous Medical Databases



Medical informatics databases have been built by various organizations to support their medical work. Because each of these databases is usually designed for one or a few specific uses, the data it stores is limited and incomplete. In this paper, we use a data mining technique, the K-Modes clustering algorithm, to provide well-organized data from the data warehouse for reports, queries, analysis, and similar tasks. In K-Modes, the modes are the attribute values that occur most frequently. Using a dissimilarity measure, we compare each object with the modes and allocate it to the nearest cluster. After objects are allocated to clusters, the cluster modes are updated. In this way, all similar objects are placed in the same cluster. Classification is then performed using fuzzy logic. The required medical data can then be retrieved in a direct, fast, and meaningful way. The proposed approach is implemented in MATLAB and evaluated on several medical databases. The experimental results show that the proposed methodology is more efficient for warehousing very large heterogeneous medical databases. The proposed algorithm also gives better accuracy on the test dataset, speeds up query processing, and reduces cost.
Copyright © 2013 Praise Worthy Prize - All rights reserved.
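The clustering loop summarized in the abstract can be illustrated with a minimal Python sketch. This is not the authors' MATLAB implementation: the simple-matching dissimilarity (a count of mismatched attributes), the batch update of modes after all objects are allocated, the function names, and the toy patient records are all assumptions made for illustration. The fuzzy-logic classification step is not shown, because the abstract gives no details about it.

```python
from collections import Counter


def mismatch(a, b):
    """Simple-matching dissimilarity: count of attributes whose values differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent value of each attribute across the given rows."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(records, initial_modes, max_iter=20):
    """Cluster categorical records around modes (illustrative sketch only)."""
    modes = [tuple(m) for m in initial_modes]
    labels = None
    for _ in range(max_iter):
        # Allocation step: assign each record to the nearest mode.
        new_labels = [
            min(range(len(modes)), key=lambda j: mismatch(r, modes[j]))
            for r in records
        ]
        if new_labels == labels:
            break  # assignments are stable, so the modes will not change
        labels = new_labels
        # Update step: recompute each cluster's mode from its members.
        for j in range(len(modes)):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:  # keep the old mode if a cluster is empty
                modes[j] = column_modes(members)
    return labels, modes


# Hypothetical categorical records: (sex, smoking status, blood pressure).
records = [
    ("male", "smoker", "high"),
    ("male", "smoker", "normal"),
    ("male", "smoker", "high"),
    ("female", "non-smoker", "normal"),
    ("female", "non-smoker", "low"),
    ("female", "non-smoker", "normal"),
]
labels, modes = k_modes(records, initial_modes=[records[0], records[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
print(modes)
```

In this sketch the initial modes are simply two of the records; how the paper chooses initial modes is not stated in the abstract, so that choice is also an assumption.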


Medical Informatics Databases; Data Warehouse; K-Modes Clustering Algorithm; Dissimilarity Measure; Fuzzy Logic





