
Evaluating Knowledge-Based Semantic Measures on Arabic

Abdulgabbar Saif(1*), Mohd Juzaiddin Ab Aziz(2), Nazlia Omar(3)

(1) Knowledge Technology group, Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Malaysia
(2) Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia (UKM), Malaysia
(3) Center for Artificial Intelligence Technology, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia (UKM), Malaysia
(*) Corresponding author



Semantic measures have received wide attention from researchers as a way to handle various problems in computational linguistics and information retrieval. In this paper, we experimentally investigate the performance of knowledge-based semantic measures on the Arabic language. State-of-the-art semantic measures are adapted to two knowledge sources: a highly structured source (Arabic WordNet) and a semi-structured source (Arabic Wikipedia). The performance of the different measures is evaluated on four Arabic benchmark data sets for the word-to-word semantic similarity/relatedness task. The evaluation results show that Wikipedia is a competitive and promising knowledge source in terms of its high degree of coverage and the variety of semantic features that can be extracted from it.
Copyright © 2014 Praise Worthy Prize - All rights reserved.


Semantic Measures; Arabic WordNet; Wikipedia; Knowledge-Based Methods
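
As an illustration of the kind of experiment summarized above, the following is a minimal sketch, not the authors' implementation. It assumes a path-based measure (Wu-Palmer similarity) over a tiny hypothetical is-a taxonomy, and it assumes that agreement with human judgments is scored by Spearman rank correlation; the abstract names neither the specific measures nor the correlation statistic. The taxonomy, word pairs, and ratings are invented purely for illustration.

```python
# Toy is-a taxonomy (child -> parent). Hypothetical; not taken from Arabic WordNet.
PARENT = {
    "entity": None,
    "animal": "entity",
    "plant": "entity",
    "cat": "animal",
    "dog": "animal",
    "tree": "plant",
}


def path_to_root(concept):
    """Return the concepts from `concept` up to and including the root."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(concept))


def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_a = set(path_to_root(a))
    # The first ancestor of b shared with a is the lowest common subsumer.
    lcs = next(c for c in path_to_root(b) if c in ancestors_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (sd_x * sd_y)


# Hypothetical word pairs with made-up human ratings (illustration only).
pairs = [("cat", "dog"), ("cat", "tree"), ("dog", "animal")]
human = [3.5, 0.5, 3.0]
scores = [wu_palmer(a, b) for a, b in pairs]
print(spearman(scores, human))
```

The same evaluation loop would apply to a Wikipedia-based measure: only the function that scores a word pair changes, while the comparison against human ratings stays the same.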



