GraphConnect: Framework of Discovering Closed Highly Connected Pattern from Semistructured Dataset

F. L. Gaol; B. Widjaja

doi:10.15866/irea.v5i5.14713

GraphConnect: Framework of Discovering Closed Highly Connected Pattern from Semistructured Dataset

^(*) Corresponding author

DOI's assignment:
the author of the article can submit here a request for assignment of a DOI number to this resource!
Cost of the service: euros 10,00 (for a DOI)

Abstract

Semistructured data appears when the source does not impose a rigid structure on the data, such as the web, or when data is combined from several heterogeneous sources. In mathematical terms, we called semi structured data set as graph data set One particular interesting in mining semi structured pattern is finding frequent highly connected subgraph in large relational graphs. The common problem is to find not only frequent graphs, but also graphs that satisfy the connectivity constraint. We identify three major characteristics different from the previous frequent graph mining problem. First, in relational graphs each node represents a distinct object. No two nodes share the same label. In biological networks, nodes often represent unique objects like genes and enzymes. Secondly, relational graphs may be very large. Thirdly, the interesting patterns should not only be frequent but also satisfy the connectivity constraint. . In order to handle these new challenges, we identify two issues have to be solved: (1) how to mine frequent graphs efficiently in large relational graphs, and (2) how to handle the connectivity constraint. Since frequent graph mining usually generates too many patterns, it is more appealing to mine closed frequent graphs only. Our major contribution is to tackle the connectivity constraint. We use the minimum cut criterion to measure the connectivity of a pattern and examine the issues of integrating the connectivity constraint with the closed graph mining process.
Copyright © 2017 Praise Worthy Prize - All rights reserved.

Keywords

Semistructured Data; Closed Pattern; Closed Frequent Graphs; Connectivity

Full Text:

PDF

References

T. Washio and H. Motoda. State of the art of graph-based data mining. SIGKDD Explorations, 5:59–68, 2003.

A Butte, P. Tamayo, D. Slonim, T. Golub, and I. Kohane Discovering functional relationships between rna expression and chemotherapeutic susceptibility. In Proc. of the National Academy of Science, volume 97, pages 12182–12186, 2000.

V. Spirin and L. Mirny. Protein complexes and functional modules in molecular networks. In Proc. of the National Academy of Science, volume 100, pages 12123–12128, 2003.

A. Inokuchi, T. Washio, and H. Motoda. An apriori-based algorithm for mining frequent substructures from graph data. In Proc. 2000 European Symp. Principle of Data Mining and Knowledge Discovery (PKDD’00), pages 13–23, 1998.

M. Kuramochi and G. Karypis. Frequent subgraph discovery. In Proc. Int. Conf. Data Mining (ICDM’01), pages 313–320, 2001.

C. Borgelt and M. Berthold. Mining molecular fragments: Finding relevant substructures of molecules. In Proc. 2002 Int. Conf. on Data Mining (ICDM’02), pages 211–218, 2002.

Ford Lumban Gaol, Belawati Widjaja. Formal Necessary and Sufficient Conditions for Support Measure Admissibility. In Proceedings International Communication & Technology Symposium 2006, ITS, Surabaya, Indonesia, Agustus 2006.

Ford Lumban Gaol. Finding Structured Pattern in Graph using. Permutation Matrix. In Proceedings International Conference 9th QIR, UI, Depok, Indonesia, August 2006.

C. Chekuri, A. Goldberg, D. Karger, M. Levine, and C. Stein. Experimental study of minimum cut algorithms. In Proc. of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’97), pages 324–333, 1997.

D. Burdick, M. Calimlim, and J. Gehrke. MAFIA: A maximal frequent itemset algorithm for transactional databases. In Proc. 2001 Int. Conf. Data Engineering (ICDE’01), pages 443–452, 2001.

M. Zaki and K. Gouda. Fast vertical mining using diffsets. In Proc. 2003 ACM Int. Conf. Knowledge Discovery and Data Mining (KDD’03), pages 326–335, 2003.

J. Wang, J. Han, and J. Pei. Closet+: Searching for the best strategies for mining frequent closed itemsets. In Proc. 2003 ACM Int. Conf. Knowledge Discovery and Data Mining (KDD’03), pages 236–245, 2003.

Ford Lumban Gaol, Belawati Widjaja Mining. Frequent Semistructured Pattern Using Path Covers. In Proceedings IJJS Joint Conference Chiba Univ – UI, UI, Depok, August 2006.

J. Huan, W. Wang, D. Bandyopadhyay, J. Snoeyink, J. Prins, and. A. Tropsha. Mining spatial motifs from protein structure graphs. In Proc. of the 8th Annual Int. Conf. on Research in Computational Molecular Biology (RECOMB’04), pages 308–315.

D. West. Introduction to Graph Theory. Prentice Hall, Cambridge, MA, 2000.

M. Eisen, P. Spellman, P. Brown, and D. Botstein. Cluster analysis and display of genome-wide expression patterns. In Proc. of the National Academy of Science, volume 95, pages 14863–14868, 1998.

G. Flake, S. Lawrence, and C. Giles. Efficient identification of web communities. In Proc. 2000 ACM Int. Conf. Knowledge Discovery and Data Mining (KDD’00), pages 150–160, 2000.

L. Holder, D. Cook, and S. Djoko. Substructure discovery in the subdue system. In Proc. AAAI’94 Workshop on Knowledge Discovery in Databases (KDD’94), pages 169 – 180, 1994.

T. Mielikainen. Intersecting data to closed sets with constraints. In Proc. of the First ICDM Workshop on Frequent Itemset Mining Implementation (FIMI’03), 2003.

F. Pan, G. Cong, A. Tung, J. Yang, and M. Zaki. Carpenter: Finding closed patterns in long biological datasets. In Proc. 2003 ACM Int. Conf. Knowledge Discovery and Data Mining (KDD’03), 2003.

J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Trans. on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.

M. Stoer and F. Wagner. A simple min-cut algorithm. Journal of the ACM, 44:585–591, 1997.

P. Tamayo, D. Slonim, J. Mesirov, Q. Zhu, S. Kitareewan, E. Dmitrovsky, E. Lander, and T. Golub. Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. In Proc. of the National Academy of Science, volume 96, pages 2907–2912, 1999.

Z. Wu and R. Leahy. An optimal graph theoretic approach to data clustering: Theory and its application to image segmentation. IEEE Trans. Oo Pattern Analysis and Machine Intelligence, 15:1101–1113, 1993.

Ford Lumban Gaol, Belawati Widjaja. Finding Pattern in Molecular Dataset using Maximal Pattern Framework. In Proceedings Industrial Engineering Symposium’06, ITS, Surabaya, Indonesia, November 2006.

Refbacks

There are currently no refbacks.

Username
Password
Remember me