Hybrid Model Based Feature Selection Approach Using Kernel PCA for Large Datasets



Abstract


Knowledge discovery from large data sets using traditional data mining approaches is tedious because such data sets are large in both dimensionality and number of samples. In real applications, data sets frequently contain many noisy, redundant, and irrelevant features, which degrade classification accuracy and increase the difficulty of the task exponentially. Because of this intrinsic nature, analyzing the quality of such data sets is difficult, and only partial approaches to the problem can be found in the literature. Data quality nevertheless determines the quality of the final result. Feature selection is a commonly used data preprocessing technique in data mining: it selects a subset of informative attributes or variables from which models of the data are built before classification is performed. By eliminating redundant, irrelevant, or noisy features, feature selection reduces the dimensionality of the data and can increase both the predictive accuracy and the interpretability of the predictors or classifiers. Researchers have presented various feature selection algorithms with different selection criteria, as well as various classification algorithms. In this paper, an Enhanced KNN method is used in a preprocessing step to estimate the missing values across the whole dataset. Feature selection is then performed using an Enhanced Genetic Algorithm combined with a Kernel PCA_SVM algorithm. Experimental results show that the proposed approaches perform better than other existing techniques.
Copyright © 2013 Praise Worthy Prize - All rights reserved.

Keywords


Datasets; Enhanced KNN; Dimensionality Reduction; Feature Selection; Preprocessing
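
The abstract names a KNN-based preprocessing step for missing values but does not describe the "Enhanced" variant. As a rough illustration of the general idea only, the sketch below shows plain k-nearest-neighbour imputation in standard-library Python. The function name `knn_impute`, the per-feature averaged Euclidean distance, and the toy data are all assumptions made for this example and are not taken from the paper.

```python
import math


def knn_impute(rows, k=3):
    """Fill None entries with the mean of that feature over the k nearest rows.

    Hypothetical helper: plain KNN imputation, not the paper's Enhanced KNN.
    """

    def distance(a, b):
        # Compare only the features that both rows have observed.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        # Average per shared feature so rows with few shared features
        # are not treated as artificially close.
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    completed = [list(row) for row in rows]  # copy; the input is left unchanged
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows where feature j is observed.
            donors = [
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            if nearest:
                completed[i][j] = sum(nearest) / len(nearest)
    return completed


# Toy data (assumed for illustration): the second row is missing its middle feature.
data = [
    [1.0, 2.0, 3.0],
    [1.1, None, 3.1],
    [0.9, 2.2, 2.9],
    [8.0, 9.0, 10.0],
]
filled = knn_impute(data, k=2)
```

With `k=2`, the missing entry is filled from the two rows closest to it on the observed features, so the distant fourth row does not contribute. An entry stays `None` if no other row has that feature observed.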






