A Back Propagation Neural Network for Identifying Multi-Word Biomedical Named Entities

Baydaa Hashim; Nazlia Omar

doi:10.15866/irecos.v11i8.9650

Corresponding author: Baydaa Hashim

Abstract

Biomedical Named Entity Recognition (BNER) is the task of classifying biomedical instances such as genes, proteins, diseases and chemical compounds, among others. Several approaches have been proposed for BNER, in particular supervised machine learning techniques, and most of them have demonstrated reasonable performance. However, a gap remains for multi-word Biomedical Named Entities (BNEs) such as the chemical compound ‘Tri-acetyl-glucal-galactono-lactone’, which classifiers fail to recognize because of the complex characters used to separate the words. These complex characters can be punctuation (e.g. – *_^) or numeric characters. This paper therefore proposes a Back-propagation Neural Network (BPNN) for identifying BNEs. BPNN is able to encode these characters, which facilitates the identification of multi-word BNEs. For this purpose, the study proposes multiple features for the encoding task, including digits, special characters, affixes and capitalization. Experiments were conducted on two benchmark datasets, SCAI and GENIA. SCAI is a corpus of chemical compounds, whereas GENIA is a corpus that contains multiple biomedical instances such as genes, proteins, DNA and RNA. Using 80% of the data for training and 20% for testing, BPNN achieved an f-measure of 90% on the SCAI corpus and 82% on the GENIA corpus. These results show an improvement in f-measure over other related work, which implies that BPNN is effective in classifying BNEs.
Copyright © 2016 Praise Worthy Prize - All rights reserved.
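
As a rough illustration of the feature families named in the abstract (digits, special characters, affixes and capitalization), the following minimal Python sketch encodes a single token as a numeric vector of the kind a neural network could take as input, and computes the balanced f-measure from raw counts. It is an assumption-laden sketch, not the authors' implementation: the function names `encode_token` and `f_measure`, the affix lists, and the exact choice and order of features are all hypothetical, since the abstract does not specify them.

```python
import string

# Hypothetical affix lists, for illustration only; the abstract does not
# give the affix inventory used in the paper.
PREFIXES = ("tri", "di", "acetyl")
SUFFIXES = ("ase", "one", "ine", "al")


def encode_token(token):
    """Map a token to a small numeric feature vector (assumed layout)."""
    lower = token.lower()
    return [
        # 1.0 if the token contains at least one digit
        float(any(ch.isdigit() for ch in token)),
        # 1.0 if the token contains an ASCII punctuation character
        float(any(ch in string.punctuation for ch in token)),
        # 1.0 if the token starts with one of the assumed prefixes
        float(lower.startswith(PREFIXES)),
        # 1.0 if the token ends with one of the assumed suffixes
        float(lower.endswith(SUFFIXES)),
        # 1.0 if the first character is an uppercase letter
        float(token[:1].isupper()),
        # 1.0 if every cased character in the token is uppercase
        float(token.isupper()),
    ]


def f_measure(true_positives, false_positives, false_negatives):
    """Balanced f-measure (F1) from counts; 0.0 when it is undefined."""
    denominator = 2 * true_positives + false_positives + false_negatives
    if denominator == 0:
        return 0.0
    return 2 * true_positives / denominator
```

Under these assumptions, `encode_token("Tri-acetyl-glucal-galactono-lactone")` yields `[0.0, 1.0, 1.0, 1.0, 1.0, 0.0]`: the hyphens register as special characters, so the separators that make the multi-word name hard to recognize become an explicit input signal rather than noise.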

Keywords

Named Entity Recognition; Biomedical Named Entity Recognition; Back-Propagation Neural Network; SCAI; GENIA
