
Learning to Generate Optimized Term Weighting for Web Documents Classification - A Parallel Mimetic Approach Based on Support Vector Machines



DOI: https://doi.org/10.15866/irecos.v11i12.10964

Abstract


In this paper, we propose a mimetic approach for simultaneous term weighting and feature selection that combines an evolutionary algorithm with Support Vector Machines to improve web document categorization. However, textual representation produces a high-dimensional matrix, and processing it with a mimetic approach may drastically increase the processing time of the training step. We therefore present a parallel implementation of the proposed algorithm based on an island model topology. Experiments on three well-known datasets (Reuters-21578, 7Sectors and WebKB) show that the approach significantly reduces both the feature set and the computing time while improving categorization performance.
Copyright © 2016 Praise Worthy Prize - All rights reserved.
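
The island model summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the fitness function is a toy stand-in for SVM validation performance, the ring migration topology, the operators, and all names and parameter values (`run`, `migrate`, `interval`, and so on) are assumptions chosen for illustration, and the islands run sequentially here although each could be assigned to its own process.

```python
import random

def fitness(weights, target):
    # Toy stand-in (assumption) for SVM validation performance: higher is better.
    return -sum((w - t) ** 2 for w, t in zip(weights, target))

def local_search(ind, target, rng, step=0.05, tries=5):
    # Local refinement step: keep small single-weight perturbations that improve fitness.
    best, best_fit = list(ind), fitness(ind, target)
    for _ in range(tries):
        cand = list(best)
        i = rng.randrange(len(cand))
        cand[i] = min(1.0, max(0.0, cand[i] + rng.uniform(-step, step)))
        cand_fit = fitness(cand, target)
        if cand_fit > best_fit:
            best, best_fit = cand, cand_fit
    return best

def evolve_island(pop, target, rng):
    # One generation: elitism, binary tournament, uniform crossover,
    # single-gene mutation, then local search on each child.
    def score(ind):
        return fitness(ind, target)
    new_pop = [max(pop, key=score)]
    while len(new_pop) < len(pop):
        p1 = max(rng.sample(pop, 2), key=score)
        p2 = max(rng.sample(pop, 2), key=score)
        child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
        child[rng.randrange(len(child))] = rng.random()
        new_pop.append(local_search(child, target, rng))
    return new_pop

def migrate(islands, target):
    # Ring topology (assumption): each island's best replaces the next island's worst.
    bests = [max(pop, key=lambda ind: fitness(ind, target)) for pop in islands]
    for k, pop in enumerate(islands):
        worst = min(range(len(pop)), key=lambda j: fitness(pop[j], target))
        pop[worst] = list(bests[(k - 1) % len(islands)])

def run(n_islands=4, pop_size=10, n_features=8, generations=30, interval=5, seed=0):
    rng = random.Random(seed)
    # Hypothetical "ideal" term weights in [0, 1]; a real fitness would train an SVM instead.
    target = [rng.random() for _ in range(n_features)]
    islands = [[[rng.random() for _ in range(n_features)] for _ in range(pop_size)]
               for _ in range(n_islands)]
    initial_best = max(fitness(ind, target) for pop in islands for ind in pop)
    for g in range(1, generations + 1):
        islands = [evolve_island(pop, target, rng) for pop in islands]
        if g % interval == 0:
            migrate(islands, target)
    final_best = max(fitness(ind, target) for pop in islands for ind in pop)
    return initial_best, final_best
```

Calling `run()` returns the best fitness before and after evolution. Because each island keeps its best individual and migration only overwrites an island's worst member, the best fitness cannot decrease. In a real setting the fitness evaluation (training and validating an SVM on the weighted features) would dominate the cost, which is why distributing islands across processors is attractive.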

Keywords


Parallel Mimetic Algorithm; Text Categorization; Feature Selection; Term Weighting; Support Vector Machines







