Embedded Mel Frequency Cepstral Coefficient Feature Extraction System for Speech Processing

Leonardo Gongora; Olga Ramos; Dario Amaya

doi:10.15866/irecos.v11i3.8786

Corresponding author: Dario Amaya

Abstract

Feature extraction is the main step after speech signal acquisition, and it underpins applications ranging from data compression to complex speech recognition systems. This work describes the development of a system for extracting speech features based on Mel Frequency Cepstral Coefficients (MFCC). The system was designed to run on a portable device built around an embedded system. The paper first describes the general methodology for calculating the MFCC, then explains each phase individually along with the adaptations made for the embedded system. Finally, the performance of the embedded algorithm is compared with that of a MATLAB implementation to validate the results.
Copyright © 2016 Praise Worthy Prize - All rights reserved.
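
As an illustration of the general MFCC methodology the abstract refers to, the following is a minimal sketch of the usual textbook pipeline: pre-emphasis, framing, windowing, power spectrum, Mel filterbank, logarithm, and discrete cosine transform. It is not the authors' embedded implementation. The function names, the pre-emphasis factor of 0.97, the Hamming window, the 256-sample frame with a 128-sample hop, the 20 filters, and the 13 coefficients are all assumptions chosen for illustration, and a naive DFT stands in for the FFT an embedded target would normally use.

```python
import math

def hz_to_mel(f):
    # Common Mel-scale formula (assumed variant): 2595 * log10(1 + f / 700)
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def pre_emphasis(x, alpha=0.97):
    # First-order high-pass filter: y[n] = x[n] - alpha * x[n-1]
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]

def hamming(n_len):
    # Hamming window coefficients (window choice is an assumption)
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_len - 1))
            for n in range(n_len)]

def power_spectrum(frame):
    # Naive O(N^2) DFT over bins 0..N/2; a real system would use an FFT
    n_len = len(frame)
    spec = []
    for k in range(n_len // 2 + 1):
        re = sum(frame[n] * math.cos(2.0 * math.pi * k * n / n_len)
                 for n in range(n_len))
        im = -sum(frame[n] * math.sin(2.0 * math.pi * k * n / n_len)
                  for n in range(n_len))
        spec.append((re * re + im * im) / n_len)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose edges are equally spaced on the Mel scale
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(i * mel_max / (n_filters + 1))
                for i in range(n_filters + 2)]
    edges = [f * n_fft / sample_rate for f in edges_hz]  # fractional bin index
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_bins):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))   # rising slope
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def dct2(values, n_coeffs):
    # Unnormalized DCT-II, keeping the first n_coeffs outputs
    n_len = len(values)
    return [sum(v * math.cos(math.pi * c * (i + 0.5) / n_len)
                for i, v in enumerate(values))
            for c in range(n_coeffs)]

def mfcc_frame(frame, bank, n_coeffs=13):
    # Window -> power spectrum -> Mel energies -> log -> DCT
    window = hamming(len(frame))
    spec = power_spectrum([s * w for s, w in zip(frame, window)])
    energies = [sum(w * p for w, p in zip(row, spec)) for row in bank]
    log_energies = [math.log(max(e, 1e-10)) for e in energies]  # avoid log(0)
    return dct2(log_energies, n_coeffs)

def mfcc(signal, sample_rate, frame_len=256, hop=128, n_filters=20, n_coeffs=13):
    # Returns one list of n_coeffs coefficients per complete frame
    x = pre_emphasis(signal)
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    return [mfcc_frame(x[s:s + frame_len], bank, n_coeffs)
            for s in range(0, len(x) - frame_len + 1, hop)]
```

Calling `mfcc(signal, 8000)` on a list of samples returns one coefficient vector per frame. A comparison such as the one the abstract mentions could then be made by computing the same coefficients with a reference implementation and measuring the difference frame by frame.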

Keywords

Mel Frequency Cepstral Coefficients; Automatic Speech Recognition System; Windowing
