Arabic Named Entity Recognition for Crime Documents Using Classifiers Combination

Named Entity Recognition (NER) is the process of identifying named entities such as the names of persons, organizations and locations, as well as dates and currencies. It plays an essential role in many domains, including Information Extraction (IE), Sentiment Analysis (SA) and Question Answering (QA). Various research efforts have addressed Arabic NER. However, identifying Arabic named entities is challenging because of the complexity of the Arabic language. Part of this complexity stems from the absence of capitalization, a feature that facilitates NER in languages such as English. Furthermore, there is a lack of lexical corpora comprehensive enough to cover all Arabic NEs. In addition, most of the approaches proposed for Arabic NER have relied on handcrafted rules, which are laborious and time-consuming to build. Therefore, this paper proposes an Arabic NER approach for a crime dataset based on a combination of classifiers and feature extraction. The dataset was collected from online resources and underwent several pre-processing tasks, including normalization, tokenization and stemming. Feature extraction covering POS tags, trigger keywords, definite articles and affixes was then performed to improve recognition. Three classifiers, Naïve Bayes (NB), Decision Tree (DT) and a sequential combination of the two, were trained on these features to classify the named entities. The experimental results show that the combination of DT and NB with feature extraction outperforms each individual classifier, achieving an F-measure of 94.19%. This shows that combining multiple classifiers has a significant impact on the effectiveness of Arabic NER.
Keywords: Named Entity Recognition; Arabic; Naïve Bayes; Decision Tree
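The abstract describes token-level features (POS tags, trigger keywords, definite articles, affixes) feeding NB, DT and a sequential NB-then-DT combination, scored by F-measure. The Python sketch below illustrates that kind of pipeline under stated assumptions; it is not the authors' implementation. The trigger list, the two-letter affix features, the ID3-style tree, and the reading of "sequential combination" as passing NB's predicted label to the DT as an extra feature are all assumptions, and every name in the code (`TRIGGERS`, `token_features`, `NBThenDT`, and so on) is hypothetical. POS tags are taken as given rather than produced by a tagger, and normalization and stemming are omitted.

```python
import math
from collections import Counter, defaultdict

# Hypothetical trigger words mapped to the entity type they tend to precede.
TRIGGERS = {"العقيد": "PER", "مدينة": "LOC", "شركة": "ORG"}  # colonel, city, company

def token_features(tokens, pos_tags, i):
    """Illustrative categorical features for token i (not the paper's exact set)."""
    tok, prev = tokens[i], (tokens[i - 1] if i > 0 else "<S>")
    return {
        "pos": pos_tags[i],                     # assumed to come from an external POS tagger
        "def_article": tok.startswith("ال"),    # Arabic definite article "al-"
        "prefix": tok[:2],                      # crude affix features: first/last two letters
        "suffix": tok[-2:],
        "trigger": TRIGGERS.get(prev, "NONE"),  # trigger keyword right before the token
    }

class NaiveBayes:
    """Categorical Naive Bayes with add-one (Laplace) smoothing."""
    def fit(self, X, y):
        self.n, self.class_counts = len(y), Counter(y)
        self.feat_counts = defaultdict(Counter)  # (class, feature) -> value counts
        self.values = defaultdict(set)           # feature -> values seen in training
        for x, c in zip(X, y):
            for f, v in x.items():
                self.feat_counts[(c, f)][v] += 1
                self.values[f].add(v)
        return self

    def predict_one(self, x):
        def log_posterior(c):
            lp = math.log(self.class_counts[c] / self.n)
            for f, v in x.items():
                denom = self.class_counts[c] + len(self.values[f]) + 1  # +1 slot for unseen values
                lp += math.log((self.feat_counts[(c, f)][v] + 1) / denom)
            return lp
        return max(self.class_counts, key=log_posterior)

def entropy(labels):
    n = len(labels)
    return -sum(k / n * math.log2(k / n) for k in Counter(labels).values())

def build_tree(X, y, features):
    """ID3-style decision tree over categorical features; leaves are labels."""
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1 or not features:
        return majority
    def gain(f):  # information gain of splitting on feature f
        groups = defaultdict(list)
        for x, c in zip(X, y):
            groups[x[f]].append(c)
        return entropy(y) - sum(len(g) / len(y) * entropy(g) for g in groups.values())
    best = max(features, key=gain)
    if gain(best) <= 0:
        return majority
    rest = [f for f in features if f != best]
    branches = {}
    for v in {x[best] for x in X}:
        idx = [i for i, x in enumerate(X) if x[best] == v]
        branches[v] = build_tree([X[i] for i in idx], [y[i] for i in idx], rest)
    return (best, branches, majority)

def tree_predict(tree, x):
    while isinstance(tree, tuple):
        feature, branches, majority = tree
        if x.get(feature) not in branches:
            return majority  # unseen value: fall back to the node's majority label
        tree = branches[x[feature]]
    return tree

class NBThenDT:
    """Assumed sequential combination: NB's predicted label is fed to the DT as a feature."""
    def fit(self, X, y):
        self.nb = NaiveBayes().fit(X, y)
        X2 = [dict(x, nb=self.nb.predict_one(x)) for x in X]
        self.tree = build_tree(X2, y, list(X2[0]))
        return self

    def predict(self, X):
        return [tree_predict(self.tree, dict(x, nb=self.nb.predict_one(x))) for x in X]

def f_measure(gold, pred, outside="O"):
    """Token-level micro F1 over entity labels (everything except `outside`)."""
    tp = sum(g == p != outside for g, p in zip(gold, pred))
    fp = sum(p != outside and p != g for g, p in zip(gold, pred))
    fn = sum(g != outside and g != p for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

if __name__ == "__main__":
    # Toy sentence: "Colonel Ahmed said that the incident occurred in the city of Riyadh".
    toks = ["قال", "العقيد", "أحمد", "إن", "الحادث", "وقع", "في", "مدينة", "الرياض"]
    pos = ["VERB", "NOUN", "PROPN", "PART", "NOUN", "VERB", "ADP", "NOUN", "PROPN"]
    gold = ["O", "O", "PER", "O", "O", "O", "O", "O", "LOC"]
    X = [token_features(toks, pos, i) for i in range(len(toks))]
    model = NBThenDT().fit(X, gold)
    print(f_measure(gold, model.predict(X)))  # scored on training data, so optimistic
```

In this sketch the tree sees NB predictions made on the same data it is trained on, so it may over-trust the NB feature; a more careful stacking setup would feed it NB predictions obtained by cross-validation. The toy score says nothing about the 94.19% F-measure reported in the abstract.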

