Blending Firefly and Bayes Classifier for Email Spam Classification




Abstract


Email spam is a serious worldwide problem that affects almost all computer users. It burdens not only ordinary internet users but also companies and organizations, costing them large sums in lost productivity, wasted user time, and wasted network bandwidth. Researchers have recently presented several email spam classification techniques. In this paper, we develop an efficient technique for classifying email spam using the firefly algorithm and a naïve Bayes classifier. First, the input email data is passed to a feature selection stage that selects the features suitable for spam classification. The traditional firefly algorithm is used to search the feature space, and the feature subset with the best fitness is chosen. Once the best feature subset has been identified by the firefly algorithm, spam classification is performed with the naïve Bayes classifier. The spam detection results are validated using three evaluation metrics: sensitivity, specificity, and accuracy. For comparative analysis, the proposed spam classification is compared with existing approaches based on particle swarm optimization (PSO) and a neural network. The results show that the proposed algorithm outperforms the PSO algorithm and the neural network in terms of accuracy and specificity.
Copyright © 2013 Praise Worthy Prize - All rights reserved.
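
The three evaluation metrics named in the abstract can be illustrated with a minimal sketch. It uses the standard confusion-matrix definitions and assumes that spam is the positive class (label 1) and legitimate mail the negative class (label 0). The function names and the toy labels are hypothetical and are not taken from the paper.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary labelling.

    Assumption: `positive` marks spam; every other label is legitimate mail.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1  # spam correctly flagged
        elif truth != positive and pred == positive:
            fp += 1  # legitimate mail wrongly flagged as spam
        elif truth != positive and pred != positive:
            tn += 1  # legitimate mail correctly passed
        else:
            fn += 1  # spam that slipped through
    return tp, fp, tn, fn


def spam_metrics(y_true, y_pred, positive=1):
    """Return sensitivity, specificity and accuracy as a dict of floats."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    return {
        # Fraction of actual spam that was detected: TP / (TP + FN)
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Fraction of legitimate mail that was passed: TN / (TN + FP)
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        # Fraction of all messages labelled correctly
        "accuracy": (tp + tn) / total if total else 0.0,
    }


# Toy labels for illustration only (not data from the paper).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(spam_metrics(y_true, y_pred))
# {'sensitivity': 0.75, 'specificity': 0.75, 'accuracy': 0.75}
```

In a spam filter, specificity is often watched most closely, because a false positive means a legitimate message is discarded.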

Keywords


Spam; Feature Selection; Firefly; Classification; Naïve-Bayes


