An Improved Semi Supervised Nonnegative Matrix Factorization Based Tumor Clustering with Efficient Infomax ICA Based Gene Selection

S. Praba; A. K. Santra

doi:10.15866/irecos.v9i4.812

Abstract

Tumor clustering has become an active area of research in cancer class discovery. Accurate identification of the tumor type is essential for effective treatment of cancer at an early stage. Numerous techniques have been proposed for analyzing gene expression data, but many clustering algorithms are ill-suited to real-world problems in which only a small amount of domain information is available. Incorporating that domain knowledge can help a clustering algorithm improve the quality of its result. In this paper, a semi-supervised non-negative matrix factorization (SS-NMF) framework is proposed for clustering on selected genes, with the genes selected by the proposed Infomax ICA approach. The domain knowledge is represented as pairwise constraints on a small number of data objects, each specifying whether two objects “must” or “cannot” be clustered together. An iterative procedure then performs a symmetric tri-factorization of the data similarity matrix to obtain the clustering. Experiments on a gene expression dataset verified that the proposed scheme attains improved clustering results.
Copyright © 2014 Praise Worthy Prize - All rights reserved.
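
The abstract names two ingredients, pairwise "must"/"cannot" constraints and an iterative symmetric tri-factorization of a similarity matrix, without giving the update rules. The following is a minimal NumPy sketch of how such a scheme is commonly set up, not the authors' implementation. The function names, the constraint weights `alpha` and `beta`, the clipping of the adjusted similarities at zero, and the square-root multiplicative updates are all assumptions made for illustration; the Infomax ICA gene selection step is not covered here.

```python
import numpy as np


def apply_pairwise_constraints(A, must_link=(), cannot_link=(), alpha=1.0, beta=1.0):
    """Return a copy of similarity matrix A adjusted by pairwise constraints.

    Assumption: must-link pairs are rewarded by +alpha and cannot-link pairs
    penalized by -beta; negative results are clipped to zero so the matrix
    stays nonnegative.
    """
    W = np.array(A, dtype=float)
    for i, j in must_link:
        W[i, j] += alpha
        W[j, i] += alpha
    for i, j in cannot_link:
        W[i, j] -= beta
        W[j, i] -= beta
    return np.clip(W, 0.0, None)


def ss_nmf_sketch(A, k, must_link=(), cannot_link=(), alpha=1.0, beta=1.0,
                  n_iter=500, seed=0):
    """Approximate W ~ G S G^T with nonnegative G (n x k) and S (k x k).

    W is the constraint-adjusted similarity matrix. Each sample is assigned
    to the cluster with the largest entry in its row of G.
    """
    W = apply_pairwise_constraints(A, must_link, cannot_link, alpha, beta)
    n = W.shape[0]
    rng = np.random.default_rng(seed)
    G = rng.random((n, k)) + 0.1      # strictly positive random start
    S = np.eye(k) + 0.1               # symmetric, strictly positive start
    eps = 1e-12                       # guards against division by zero
    for _ in range(n_iter):
        # G <- G * sqrt( (W G S) / (G G^T W G S) )
        WGS = W @ G @ S
        G *= np.sqrt(WGS / (G @ (G.T @ WGS) + eps))
        # S <- S * sqrt( (G^T W G) / (G^T G S G^T G) )
        GtG = G.T @ G
        S *= np.sqrt((G.T @ W @ G) / (GtG @ S @ GtG + eps))
    labels = G.argmax(axis=1)
    return labels, G, S


if __name__ == "__main__":
    # Toy similarity matrix: two groups of three samples each.
    A = np.kron(np.eye(2), np.ones((3, 3))) + 0.05
    labels, G, S = ss_nmf_sketch(A, k=2, must_link=[(0, 1)], cannot_link=[(0, 5)])
    print(labels)
```

In this sketch the constraints act only by reshaping the similarity matrix before factorization, so the same update loop serves both the unsupervised case (no constraints) and the semi-supervised one. In practice the similarity matrix would be computed from the expression profiles of the selected genes.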

Keywords

Clustering; Gene Expression Data; ICA; Semi-Supervised Non-Negative Matrix Factorization; Tumor
