Learning to Generate Optimized Term Weighting for Web Documents Classification - A Parallel Mimetic Approach Based on Support Vector Machines

In this paper, we propose a mimetic approach for simultaneous term weighting and feature selection that combines an evolutionary algorithm with Support Vector Machines to improve web document categorization. However, the textual representation produces a high-dimensional matrix which, when processed by a mimetic approach, can drastically increase the processing time of the training step. We therefore present a parallel implementation of the proposed algorithm based on an island model topology. Experiments on three well-known datasets, Reuters-21578, 7Sectors and WebKB, show that the approach significantly reduces both the number of features and the computing time while improving categorization performance.
Keywords: Parallel Mimetic Algorithm; Text Categorization; Feature Selection; Term Weighting; Support Vector Machines
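The following Python sketch roughly illustrates the island-model idea summarized above. It evolves term-weight vectors on several islands and periodically migrates individuals between them in a ring. It is not the authors' implementation. The toy corpus, the 0.1 deselection threshold, the genetic operators, the parameter values, and the fitness function are all assumptions. In particular, the fitness uses a leave-one-out nearest-centroid classifier in place of the SVM so that the example needs only the standard library. Each chromosome holds one weight per term, and weights below the threshold count as removed, so a single vector encodes both weighting and feature selection.

```python
import random

# Hypothetical toy corpus: term-frequency vectors over 6 terms with a class label.
# These values are made up for illustration; they are not from the paper's datasets.
DOCS = [
    ([3, 0, 1, 0, 2, 0], "sport"),
    ([4, 1, 0, 0, 1, 0], "sport"),
    ([2, 0, 2, 1, 3, 0], "sport"),
    ([0, 3, 0, 4, 0, 1], "finance"),
    ([1, 4, 0, 3, 0, 2], "finance"),
    ([0, 2, 1, 5, 0, 1], "finance"),
]
N_TERMS = 6
THRESHOLD = 0.1  # assumed: weights below this are treated as 0 (term deselected)


def effective(weights):
    """Zero out weak weights so one chromosome encodes weighting and selection."""
    return [w if w >= THRESHOLD else 0.0 for w in weights]


def fitness(weights):
    """Stand-in for an SVM-based fitness: leave-one-out accuracy of a
    nearest-centroid classifier on weighted vectors, minus 0.01 per kept term."""
    w = effective(weights)
    correct = 0
    for i, (x, y) in enumerate(DOCS):
        train = [d for j, d in enumerate(DOCS) if j != i]
        best_label, best_dist = None, float("inf")
        for label in sorted({lab for _, lab in train}):  # sorted for determinism
            members = [v for v, lab in train if lab == label]
            centroid = [sum(col) / len(members) for col in zip(*members)]
            dist = sum((wk * (a - c)) ** 2 for wk, a, c in zip(w, x, centroid))
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += best_label == y
    kept = sum(1 for v in w if v > 0)
    return correct / len(DOCS) - 0.01 * kept


def tournament(pop, scores, rng, k=2):
    """Binary tournament selection: return the fitter of k random individuals."""
    picks = rng.sample(range(len(pop)), k)
    return pop[max(picks, key=lambda i: scores[i])]


def next_generation(pop, rng, mut_rate=0.2, sigma=0.2):
    """One generation on a single island, with elitism, crossover and mutation."""
    scores = [fitness(ind) for ind in pop]
    elite = pop[max(range(len(pop)), key=lambda i: scores[i])]
    children = [elite[:]]  # elitism: the best individual survives unchanged
    while len(children) < len(pop):
        a, b = tournament(pop, scores, rng), tournament(pop, scores, rng)
        child = [ga if rng.random() < 0.5 else gb for ga, gb in zip(a, b)]  # uniform crossover
        child = [min(1.0, max(0.0, g + rng.gauss(0, sigma))) if rng.random() < mut_rate else g
                 for g in child]  # Gaussian mutation, clipped to [0, 1]
        children.append(child)
    return children


def island_model(n_islands=3, pop_size=8, generations=30, migrate_every=5, seed=0):
    """Evolve several islands; every `migrate_every` generations, the best
    individual of island i-1 replaces the worst individual of island i (ring)."""
    rng = random.Random(seed)
    islands = [[[rng.random() for _ in range(N_TERMS)] for _ in range(pop_size)]
               for _ in range(n_islands)]
    for gen in range(1, generations + 1):
        # Islands evolve independently between migrations; here they run
        # sequentially, whereas a parallel version would give each its own process.
        islands = [next_generation(pop, rng) for pop in islands]
        if gen % migrate_every == 0:
            bests = [max(pop, key=fitness) for pop in islands]
            for i, pop in enumerate(islands):
                incoming = bests[(i - 1) % n_islands]
                worst = min(range(len(pop)), key=lambda j: fitness(pop[j]))
                pop[worst] = incoming[:]
    return max((ind for pop in islands for ind in pop), key=fitness)


if __name__ == "__main__":
    best = island_model()
    print("kept terms:", [i for i, v in enumerate(effective(best)) if v > 0])
    print("fitness:", round(fitness(best), 3))
```

Islands exchange individuals only every few generations, which is what makes the model easy to distribute. Because of elitism and worst-only replacement during migration, the best fitness on each island never decreases from one generation to the next.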

