A Fast K-Modes Clustering Algorithm to Warehouse Very Large Heterogeneous Medical Databases

Medical informatics databases have been built by various organizations to support their medical work. Because each database is typically designed for one or a few specific uses, the data it stores are often limited and incomplete. In this paper, we use a data mining technique, the K-Modes clustering algorithm, to deliver properly organized data from the data warehouse for reporting, querying, and analysis. In K-Modes, the mode of a cluster is formed from the most frequently occurring value of each attribute. Using a dissimilarity measure, each object is compared with the modes and allocated to the nearest cluster; after allocation, the mode of the cluster is updated. In this way, similar objects are placed in the same cluster. Classification is then performed using fuzzy logic. The required medical data can then be retrieved in a direct, fast, and meaningful way. The proposed approach is implemented in MATLAB and evaluated on several medical databases. The experimental results indicate that the proposed methodology warehouses very large heterogeneous medical databases more efficiently, gives better accuracy on the tested datasets, speeds up query processing, and reduces cost.
Medical Informatics Databases; Data Warehouse; K-Modes Clustering Algorithm; Dissimilarity Measure; Fuzzy Logic
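
The clustering loop summarized in the abstract (compare each object with the modes, allocate it to the nearest cluster, update the modes) can be illustrated with a minimal sketch. This is not the paper's MATLAB implementation. It is written in Python for brevity and makes several assumptions that the abstract does not state: the dissimilarity measure is taken to be simple matching (the number of attributes on which two records differ), the initial modes are the first k distinct records, and modes are recomputed once per pass rather than after every single allocation. The function names `mismatch_count`, `column_modes`, and `k_modes` are hypothetical, and the fuzzy classification step is not covered.

```python
from collections import Counter


def mismatch_count(record, mode):
    """Simple matching dissimilarity (assumed): count of differing attributes."""
    return sum(1 for a, b in zip(record, mode) if a != b)


def column_modes(records):
    """Most frequent value of each attribute across the given records."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def k_modes(records, k, max_iter=20):
    """Cluster categorical records (tuples of equal length) into k groups."""
    # Assumption: seed the modes with the first k distinct records.
    modes = list(dict.fromkeys(records))[:k]
    if len(modes) < k:
        raise ValueError("fewer than k distinct records")

    labels = None
    for _ in range(max_iter):
        # Allocate each record to the cluster whose mode is nearest.
        new_labels = [
            min(range(k), key=lambda j: mismatch_count(r, modes[j]))
            for r in records
        ]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels

        # Update each cluster's mode from its current members.
        for j in range(k):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:  # keep the previous mode if a cluster is empty
                modes[j] = column_modes(members)
    return labels, modes
```

Calling `k_modes(records, 2)` on a list of tuples such as `("fever", "cough", "adult")` returns one cluster label per record together with the final modes. Records must be hashable tuples so that the seeding step can remove duplicates. Ties between equally distant modes go to the lower cluster index.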

