Comparative Analysis of the Mel Frequency Cepstral Coefficients for Voiced and Silent Speech

Leonardo Gongora; Olga Ramos; Joao Mauricio

doi:10.15866/irecap.v6i5.10271

Leonardo Gongora^(1), Olga Ramos^(2), Joao Mauricio^(3*)

^(*) Corresponding author

Abstract

The extraction of representative characteristics from voice signals is the basic step for any application that performs speech signal processing. This paper presents a comparative analysis of the Mel frequency cepstral coefficients extracted from samples of voiced and silent speech. The classical methodology for extracting Mel frequency cepstral coefficients is evaluated to show its implementation steps and the robustness of these features, which capture in a single representation the statistical and representative information of time-varying signals. To achieve this evaluation, the methodology is first placed in its theoretical context to show the mathematical basis of the coefficient extraction. Each step of the process is then analyzed in terms of the information the coefficients represent, and the results are compared against previous developments for data interpretation. The coefficients are extracted using the classical implementation of the Mel frequency cepstral coefficient feature extraction process in a free software framework implemented in Matlab®. The results show the robustness of the Mel frequency cepstral coefficient algorithm in characterizing voiced and silent speech signals, which is an advantage for the construction of speech processing systems.
Copyright © 2016 Praise Worthy Prize - All rights reserved.

Keywords

Acceleration Coefficients; Cepstral Analysis; Mel Frequency Cepstral Coefficients; Speed Coefficients
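
For orientation, the classical extraction chain named in the abstract (pre-emphasis, framing and windowing, power spectrum, Mel filterbank, logarithm, discrete cosine transform) can be sketched as follows. This is a minimal, dependency-free illustration in Python, not the Matlab framework used by the authors. Every parameter value (frame length, hop, number of filters, number of coefficients, pre-emphasis factor) and every function name is an assumption chosen for the example. The sketch also assumes that the "speed" and "acceleration" coefficients in the keywords correspond to the usual delta and delta-delta features.

```python
import math

def hz_to_mel(f):
    # Common analytic form of the Mel scale (assumed; other variants exist).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame, n_fft):
    """Naive DFT power spectrum for bins 0..n_fft/2 (slow, but no dependencies)."""
    spec = []
    for k in range(n_fft // 2 + 1):
        re = im = 0.0
        for i, x in enumerate(frame):
            angle = 2.0 * math.pi * k * i / n_fft
            re += x * math.cos(angle)
            im -= x * math.sin(angle)
        spec.append((re * re + im * im) / n_fft)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centers spaced uniformly on the Mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mels = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):   # rising edge of the triangle
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):   # falling edge of the triangle
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank

def mfcc(signal, sample_rate, frame_len=256, hop=128, n_filters=20, n_ceps=12):
    """Return one list of n_ceps cepstral coefficients per frame."""
    # Pre-emphasis with an assumed factor of 0.97.
    emph = [signal[0]] + [signal[i] - 0.97 * signal[i - 1]
                          for i in range(1, len(signal))]
    # Hamming window.
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    for start in range(0, len(emph) - frame_len + 1, hop):
        frame = [emph[start + i] * window[i] for i in range(frame_len)]
        spec = power_spectrum(frame, frame_len)
        # Log filterbank energies, floored to avoid log(0).
        log_e = [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10))
                 for row in bank]
        # DCT-II of the log energies; coefficient 0 is skipped here.
        features.append([
            sum(log_e[j] * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j in range(n_filters))
            for c in range(1, n_ceps + 1)])
    return features

def delta(features, width=2):
    """Regression-based delta coefficients; applying it twice gives delta-delta."""
    n = len(features)
    denom = 2.0 * sum(d * d for d in range(1, width + 1))
    return [[sum(d * (features[min(t + d, n - 1)][c] - features[max(t - d, 0)][c])
                 for d in range(1, width + 1)) / denom
             for c in range(len(features[0]))]
            for t in range(n)]
```

As a usage example, calling `mfcc(samples, 8000)` on a list of audio samples returns one 12-element vector per frame, and `delta(delta(features))` yields the corresponding acceleration-type coefficients. The naive DFT keeps the sketch self-contained; a practical implementation would use an FFT routine instead.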
