Arabic Speech Recognition for Connected Words Using HTK: Triphones Expanded to GMM Based Quran Recognition


DOI: https://doi.org/10.15866/irecos.v11i12.11091

Abstract


In this paper, the authors propose an efficient and effective method for speaker-independent continuous Arabic speech recognition, based on a phonetically rich speech corpus. The corpus contains two datasets: (1) the Spoken Arabic Digits (SAD) dataset, recorded by 66 speakers (33 male and 33 female), which holds 6600 digit words, with each speaker pronouncing each digit from zero to nine ten times; and (2) a dataset restricted to Quranic sentences from the last 30 chapters (Surat) of the Holy Quran, recited by three famous readers following Tajweed rules (a somewhat sung style of recitation). The problem of co-articulation is handled in general by using triphones as the acoustic model and a bigram as the language model, which proved the most appropriate for this text; the particular problem of adjusting for sound duration is handled by expanding the triphones to Gaussian mixture models (GMMs). The proposed Arabic speech recognition system is built with the Cambridge Hidden Markov Model (HMM) Toolkit (HTK). Experimental tests show that the digit dataset, using 3 emitting states per phone, achieves a word recognition (WR) rate of 97.95% and a sentence recognition (SR) rate of 93.14%. The best result obtained on Quranic phrases is a WR rate of 73.44% and an SR rate of 14.38%, with 66.09% accuracy, for the reader “Elmirigli”. Adapting the basic system to the speaker’s intonation and then applying an appropriate GMM to each tied-list triphone outperforms the HMM-triphone acoustic model experiment by 16.93% in WR rate for the reader “Alsudaissi”, thanks to solving the vowel duration (mudud) problem. The contribution of this work is the triphones-expanded-to-GMMs method, which was compared with Maximum Likelihood Linear Regression (MLLR) and outperformed it by almost 5% for the reader “Alsudaissi”.
Copyright © 2016 Praise Worthy Prize - All rights reserved.
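
The abstract reports three figures per experiment: a word recognition rate, a sentence recognition rate, and an accuracy. The following is a minimal sketch of how such figures are commonly computed, assuming they correspond to the word-correct, sentence-correct, and accuracy percentages that HTK-style scoring reports (this mapping is an assumption, not stated in the abstract). The function names `align_counts` and `score` are hypothetical, the sample transcripts are invented for illustration, and the alignment uses unit edit costs, so counts may differ from HTK's own scorer where several alignments are possible.

```python
from typing import Dict, List, Tuple


def align_counts(ref: List[str], hyp: List[str]) -> Tuple[int, int, int, int]:
    """Return (hits, substitutions, deletions, insertions) for one sentence.

    Uses a unit-cost edit-distance alignment (an assumption; other
    scorers may weight the error types differently).
    """
    rows, cols = len(ref) + 1, len(hyp) + 1
    # Each cell holds (cost, hits, subs, dels, ins) for the best path so far.
    table = [[(0, 0, 0, 0, 0)] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = (i, 0, 0, i, 0)  # only deletions
    for j in range(1, cols):
        table[0][j] = (j, 0, 0, 0, j)  # only insertions
    for i in range(1, rows):
        for j in range(1, cols):
            c, h, s, d, n = table[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (c, h + 1, s, d, n)      # correct word
            else:
                diag = (c + 1, h, s + 1, d, n)  # substitution
            c, h, s, d, n = table[i - 1][j]
            dele = (c + 1, h, s, d + 1, n)      # reference word missing
            c, h, s, d, n = table[i][j - 1]
            ins = (c + 1, h, s, d, n + 1)       # extra hypothesis word
            table[i][j] = min(diag, dele, ins, key=lambda t: t[0])
    _, hits, subs, dels, inss = table[-1][-1]
    return hits, subs, dels, inss


def score(pairs: List[Tuple[str, str]]) -> Dict[str, float]:
    """Aggregate word-correct, accuracy, and sentence-correct percentages."""
    hits = subs = dels = inss = sentences_ok = 0
    for ref_text, hyp_text in pairs:
        ref, hyp = ref_text.split(), hyp_text.split()
        h, s, d, i = align_counts(ref, hyp)
        hits, subs, dels, inss = hits + h, subs + s, dels + d, inss + i
        sentences_ok += int(ref == hyp)
    n_words = hits + subs + dels  # total reference words
    return {
        "word_correct": 100.0 * hits / n_words,
        "accuracy": 100.0 * (hits - inss) / n_words,  # insertions penalised
        "sentence_correct": 100.0 * sentences_ok / len(pairs),
    }


# Invented (reference, hypothesis) transcripts, for illustration only.
sample = [
    ("one two three", "one two three"),
    ("four five six", "four nine six seven"),
]
print(score(sample))
```

Under these definitions, accuracy can fall well below the word-correct rate when the recognizer inserts extra words, and a single word error makes the whole sentence count as wrong, which is one plausible reading of why the sentence rate for long Quranic phrases is so much lower than the word rate.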

Keywords


Arabic Speech Recognition; HMM Toolkit; Acoustic Model; Context Dependent; Context Independent
