Conventional Acoustic Features Based Gammachirp Filterbank for Text Independent Speaker Identification System in Noisy Environments

Amina Ben Abdallah; Zied Hajaiej; N. Ellouze

doi:10.15866/irecos.v10i3.5235

Amina Ben Abdallah (1), Zied Hajaiej (2, *), N. Ellouze (3)

(*) Corresponding author

Abstract

In this paper we study the performance of a novel feature extraction scheme that computes both MFCC and PLP features from a Gammachirp filterbank, evaluated in a state-of-the-art text-independent speaker identification (SID) system. The scheme is based on an auditory periphery model that mimics characteristics of the human auditory system: the Gammachirp filterbank imitates the frequency resolution of the cochlea, with a nonlinear resolution that follows the equivalent rectangular bandwidth (ERB) scale. Our evaluations show that the proposed features perform considerably better than the conventional acoustic features. We further demonstrate that the proposed features yield promising recognition performance in various noisy environments.
Copyright © 2015 Praise Worthy Prize - All rights reserved.
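
As a rough illustration of the filterbank idea described in the abstract, the following minimal sketch places channel centre frequencies uniformly on the ERB-rate scale and builds a gammachirp impulse response for each channel. It is not the authors' implementation. The ERB formulas are the standard Glasberg and Moore (1990) expressions; the function names, the 100-4000 Hz range, the 8 channels, the 16 kHz sampling rate, and the parameter values n = 4, b = 1.019, c = -2 are all assumptions chosen for illustration.

```python
import math

def erb_bandwidth(f_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore, 1990)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def hz_to_erb_rate(f_hz):
    """Map a frequency in Hz to the ERB-rate scale."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)

def erb_rate_to_hz(e):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

def erb_center_frequencies(f_low, f_high, n_channels):
    """Centre frequencies spaced uniformly on the ERB-rate scale."""
    e_low, e_high = hz_to_erb_rate(f_low), hz_to_erb_rate(f_high)
    step = (e_high - e_low) / (n_channels - 1)
    return [erb_rate_to_hz(e_low + i * step) for i in range(n_channels)]

def gammachirp_ir(fc, fs, duration=0.025, n=4, b=1.019, c=-2.0, phase=0.0):
    """Peak-normalised gammachirp impulse response:
    t^(n-1) * exp(-2*pi*b*ERB(fc)*t) * cos(2*pi*fc*t + c*ln(t) + phase).
    The parameter defaults are illustrative assumptions."""
    n_samples = int(round(duration * fs))
    erb = erb_bandwidth(fc)
    ir = []
    for k in range(1, n_samples + 1):  # start at t > 0 so ln(t) is defined
        t = k / fs
        envelope = t ** (n - 1) * math.exp(-2.0 * math.pi * b * erb * t)
        ir.append(envelope * math.cos(2.0 * math.pi * fc * t + c * math.log(t) + phase))
    peak = max(abs(x) for x in ir)
    return [x / peak for x in ir]

# Hypothetical 8-channel bank between 100 Hz and 4 kHz at fs = 16 kHz.
centers = erb_center_frequencies(100.0, 4000.0, 8)
bank = [gammachirp_ir(fc, fs=16000) for fc in centers]
```

Because the channels are equally spaced in ERB-rate rather than in Hz, the spacing between neighbouring centre frequencies widens as frequency increases, which is the nonlinear resolution the abstract refers to. In a full front end, the channel outputs would replace the mel or Bark filterbank stage of the MFCC or PLP computation.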

Keywords

Speaker Identification; Gammachirp; MFCC; PLP; Feature Extraction; HMM/GMM
