Arabic Named Entity Recognition for Crime Documents Using Classifiers Combination
DOI: https://doi.org/10.15866/irecos.v10i6.6767
Abstract
Named Entity Recognition (NER) is the process of identifying proper names and related expressions, including person names, organization names, location names, dates and currencies. It plays an essential role in multiple domains such as Information Extraction (IE), Sentiment Analysis (SA) and Question Answering (QA). Arabic NER has been the subject of various research efforts. However, identifying Arabic named entities is a challenging task owing to the complexity of the Arabic language. One source of this complexity is the absence of capitalization, a feature that otherwise facilitates NER. Furthermore, there is a lack of lexical corpora that cover all Arabic named entities. In addition, most of the approaches proposed for Arabic NER are based on handcrafted rules, which are laborious and time-consuming to build. Therefore, this paper proposes an Arabic NER approach for a crime dataset, based on a combination of classifiers and feature extraction. The dataset was collected from online resources and underwent multiple pre-processing tasks, including normalization, tokenization and stemming. A feature extraction task covering POS tagging, keyword triggers, definite articles and affixes was then performed in order to improve recognition. Three classifiers, namely Naïve Bayes (NB), Decision Tree (DT) and a sequential combination of the two, use these features as a training set in order to classify the named entities. The experimental results show that the combination of DT and NB with feature extraction outperforms the individual classifiers, achieving an F-measure of 94.19%. This shows that combining multiple classifiers has a significant impact on the effectiveness of Arabic NER.
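The pre-processing and feature extraction steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the normalization rules, the trigger-word list and the function names (`normalize`, `tokenize`, `token_features`) are assumptions chosen for illustration, and stemming and POS tagging are left out because they would require external tools.

```python
import re

# Assumed normalization rules, common in Arabic NLP but not taken from the paper.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")    # short vowels, shadda, sukun, tatweel
ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")  # alef with madda / hamza above / hamza below
BARE_ALEF = "\u0627"
DEFINITE_ARTICLE = "\u0627\u0644"                    # the prefix "al-" (alef + lam)

# Hypothetical trigger words: a word that often precedes an entity of a given type.
TRIGGERS = {
    "السيد": "PERSON",        # "Mr."
    "مدينة": "LOCATION",      # "city of"
    "شركة": "ORGANIZATION",   # "company"
}


def normalize(text):
    """Strip diacritics and tatweel, then map alef variants to bare alef."""
    text = DIACRITICS.sub("", text)
    return ALEF_VARIANTS.sub(BARE_ALEF, text)


def tokenize(text):
    """Split into runs of Arabic letters or runs of ASCII digits."""
    return re.findall(r"[\u0621-\u064A]+|\d+", text)


def token_features(tokens, i):
    """Build a feature dictionary for the token at position i."""
    token = tokens[i]
    previous = tokens[i - 1] if i > 0 else ""
    return {
        # Definite article feature: token starts with "al-" and has more letters.
        "has_definite_article": token.startswith(DEFINITE_ARTICLE) and len(token) > 2,
        # Affix features: the first and last two characters of the token.
        "prefix2": token[:2],
        "suffix2": token[-2:],
        # Keyword trigger feature: entity type suggested by the preceding word.
        "prev_trigger": TRIGGERS.get(previous, "NONE"),
    }


sentence = "وصل السيد أحمد إلى مدينة بغداد"  # "Mr. Ahmed arrived in the city of Baghdad"
tokens = tokenize(normalize(sentence))
features = [token_features(tokens, i) for i in range(len(tokens))]
```

Feature dictionaries of this kind could then serve as training instances for classifiers such as NB and DT. Note that a purely prefix-based definite article test also fires on words that merely begin with the same two letters, which is one reason the abstract's POS tagging step is useful alongside it.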
Copyright © 2015 Praise Worthy Prize - All rights reserved.