
A New Efficient Text Detection Method for Image Spam Filtering


DOI: https://doi.org/10.15866/irecos.v10i1.5111

Abstract


Detection of text in images plays an important role in many applications, such as video retrieval, annotation, indexing, and content analysis. In information security, the text embedded in an image is one of the main features that can be used to filter image spam. Extracting text features from image spam requires efficient text detection. Obfuscation techniques used by spammers, such as noisy backgrounds, wavy text, and text in different colors, pose challenges to the text detection process. In this paper, we present a text detection method that addresses these challenges. The contribution of this research consists of two parts: a) a new edge operator that can be used specifically to detect text edges, and b) a text detection method for image spam filtering that can detect obfuscated text. The proposed method, Accumulated Text Extraction (ATE), detects horizontal and vertical lines and intersects them; rules are then used to determine the text area and reduce the non-text area. The approach relies on non-machine-learning methods with simple calculations. ATE shows encouraging results and can be used efficiently in image spam filtering. Besides its robustness against obfuscation methods in image spam, ATE also performs efficiently when used for scene text detection.
Copyright © 2015 Praise Worthy Prize - All rights reserved.
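
The pipeline outlined in the abstract (find horizontal and vertical lines, intersect them, then apply rules to keep the text area) can be illustrated with a minimal sketch. This is not the ATE algorithm itself: the abstract does not specify the edge operator or the rules, so the neighbour-difference operator, the names detect_text_box, thresh and min_count, and the single-region rule below are all assumptions made purely for illustration.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # grayscale rows, values 0..255


def detect_text_box(
    img: Image, thresh: int = 128, min_count: int = 3
) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) of one candidate text area, or None.

    Hypothetical stand-in for the paper's method: the real edge operator
    and rules are not given in the abstract.
    """
    if not img or not img[0]:
        return None
    h, w = len(img), len(img[0])

    # Accumulate edge evidence per row and per column. The "operator" here
    # is just an absolute difference between neighbouring pixels (assumed).
    row_hits = [0] * h  # left-right intensity jumps counted in each row
    col_hits = [0] * w  # top-bottom intensity jumps counted in each column
    for y in range(h):
        for x in range(w):
            if x + 1 < w and abs(img[y][x + 1] - img[y][x]) >= thresh:
                row_hits[y] += 1
            if y + 1 < h and abs(img[y + 1][x] - img[y][x]) >= thresh:
                col_hits[x] += 1

    # Assumed rule: keep only rows and columns with enough edge evidence,
    # which discards isolated noise pixels.
    rows = [y for y, n in enumerate(row_hits) if n >= min_count]
    cols = [x for x, n in enumerate(col_hits) if n >= min_count]
    if not rows or not cols:
        return None

    # Intersect the selected horizontal band with the selected vertical band.
    return rows[0], cols[0], rows[-1], cols[-1]


if __name__ == "__main__":
    # Synthetic 10x12 image: a checkerboard "text" patch plus one noise pixel.
    demo = [[0] * 12 for _ in range(10)]
    for y in range(3, 7):
        for x in range(2, 10):
            demo[y][x] = 255 if (x + y) % 2 == 0 else 0
    demo[0][11] = 255  # isolated noise, rejected by the min_count rule
    print(detect_text_box(demo))
```

The sketch returns a single bounding box, so it assumes one text region per image; a fuller implementation would split the selected rows and columns into separate runs before intersecting them.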

Keywords


Text Detection; Image Spam; Obfuscation Text; Security; Edge Operator; Intersection; Rule-Based; Horizontal and Vertical Lines


