On an Advanced System of Speech Recognition Based on Gammachirp Wavelet Transform as Filter Bank

Khaireddine Salhi; Zied Hajaiej; Noureddine Ellouze

doi:10.15866/irecos.v9i11.4370

Corresponding author: Khaireddine Salhi

Abstract

Several parameterization techniques have been developed in speech recognition to model the human auditory system, such as Mel Frequency Cepstral Coefficients (MFCC) and Perceptual Linear Predictive (PLP) analysis, which dominate the speech recognition field. The success of acoustic features derived from MFCC and PLP has made them a standard choice in specific conditions, but their performance degrades with additive noise. Recently, we proposed an auditory feature extractor based on the gammachirp wavelet transform; the features are obtained by replacing the filter bank used in the above methods with a gammachirp wavelet transform. We found that the proposed features give a significant improvement in robust speech recognition over conventional acoustic features.
Copyright © 2014 Praise Worthy Prize - All rights reserved.
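
As a minimal sketch of the idea described in the abstract, the following Python code builds a small bank of gammachirp filters and computes per-channel energies for one speech frame, the stage at which a mel or Bark filter bank would otherwise be used. It uses the standard gammachirp impulse response g(t) = t^(n-1) exp(-2π b ERB(fc) t) cos(2π fc t + c ln t + φ) with ERB(fc) = 24.7 + 0.108 fc. The function names, the parameter values (n = 4, b = 1.019, c = -2), the 25 ms filter length, and the centre frequencies are illustrative assumptions and are not taken from the paper.

```python
import math


def erb(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore formula)."""
    return 24.7 + 0.108 * fc_hz


def gammachirp(fc_hz, fs_hz, duration_s=0.025, n=4, b=1.019, c=-2.0, phase=0.0):
    """Sampled gammachirp impulse response, normalised to unit peak.

    n, b, c and duration_s are assumed, illustrative values.
    """
    num = int(round(duration_s * fs_hz))
    out = []
    # Start at k = 1 (t = 1/fs) so that ln(t) is defined.
    for k in range(1, num + 1):
        t = k / fs_hz
        envelope = t ** (n - 1) * math.exp(-2.0 * math.pi * b * erb(fc_hz) * t)
        carrier = math.cos(2.0 * math.pi * fc_hz * t + c * math.log(t) + phase)
        out.append(envelope * carrier)
    peak = max(abs(v) for v in out)
    return [v / peak for v in out]


def filter_bank(center_freqs_hz, fs_hz):
    """Map each centre frequency to its gammachirp impulse response."""
    return {fc: gammachirp(fc, fs_hz) for fc in center_freqs_hz}


def band_energies(frame, bank):
    """Energy of the frame after causal filtering by each channel."""
    energies = {}
    for fc, h in bank.items():
        total = 0.0
        for i in range(len(frame)):
            # Direct convolution: y[i] = sum_j h[j] * frame[i - j]
            y = sum(h[j] * frame[i - j] for j in range(min(i + 1, len(h))))
            total += y * y
        energies[fc] = total
    return energies


if __name__ == "__main__":
    fs = 16000
    bank = filter_bank([500.0, 1000.0, 2000.0], fs)
    # A 25 ms frame of a 1 kHz tone stands in for a windowed speech frame.
    tone = [math.sin(2.0 * math.pi * 1000.0 * k / fs) for k in range(400)]
    for fc, energy in band_energies(tone, bank).items():
        print(fc, energy)
```

In an MFCC- or PLP-style front end, these channel energies would then go through the usual later stages (amplitude compression, followed by a cepstral or linear-predictive step). Those stages are omitted here.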

Keywords

Gammachirp; Wavelet Transform; Hidden Markov Models; Perceptual Linear Predictive
