Study of the Effects of Control Factors on Speech Features Using Taguchi Method

Wael Hasan Al-Sawalmeh; Mohd Alrashdan; Mahmoud A. Alnaanah; Haitham A. Alasha'ary; Khaled Daqrouq

doi:10.15866/iree.v17i1.21145

Corresponding author: Wael Hasan Al-Sawalmeh

Abstract

In this paper, the Taguchi optimization method is used for the first time to study how several factors affect the features contained in a speech signal. These factors are ambient temperature, the speaker's age, the number of words, and the number of letters in each word. The Taguchi method requires only a small number of experiments to identify which factors most strongly affect the phenomenon under study. The number of experiments is set by an orthogonal array, whose size depends on the number of control factors and their levels. This reduces the time, cost, and material resources needed compared with methods that rely mainly on a trial-and-error approach. Nine representative experiments have been carried out, each repeated three times. MATLAB has been employed to compute the speech features under study, including entropy and linear predictive coding (LPC). Minitab statistical software has been used to analyze the results based on the Taguchi method. The factors, ordered from highest to lowest effect on these speech features, are number of words, number of letters, age, and ambient temperature. A linear regression analysis has been carried out to validate these results, and it has confirmed those obtained using the Taguchi method. The results can help researchers improve speech-analysis applications by showing which factors have more effect on speech features.
Copyright © 2022 Praise Worthy Prize - All rights reserved.

Keywords

Entropy; Linear Predictive Coding; Speech Analysis; Taguchi Method
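
The factor ranking described in the abstract can be illustrated with a short sketch. It assumes the nine runs follow the standard L9 orthogonal array (four factors at three levels each); the abstract does not state the levels, so this is an inference from the run count. The responses are synthetic values built from arbitrary weights, not the paper's measurements. Factors are ranked by the range (delta) of their level means, as in a Taguchi response table for means; the abstract does not say whether the Minitab analysis used means or signal-to-noise ratios. All names in the code are hypothetical.

```python
from itertools import combinations
from statistics import mean

# Standard L9 orthogonal array: 9 runs, 4 factors, levels coded 0..2.
# ASSUMPTION: the abstract does not name the array; L9 is inferred from
# "nine experiments" with four control factors.
L9 = [
    (0, 0, 0, 0), (0, 1, 1, 1), (0, 2, 2, 2),
    (1, 0, 1, 2), (1, 1, 2, 0), (1, 2, 0, 1),
    (2, 0, 2, 1), (2, 1, 0, 2), (2, 2, 1, 0),
]
FACTORS = ("temperature", "age", "words", "letters")


def is_orthogonal(array, n_levels=3):
    """Check that every pair of columns contains each level pair exactly once."""
    n_cols = len(array[0])
    for a, b in combinations(range(n_cols), 2):
        pairs = [(row[a], row[b]) for row in array]
        if len(pairs) != n_levels ** 2 or len(set(pairs)) != n_levels ** 2:
            return False
    return True


def response_deltas(array, responses, n_levels=3):
    """Return, per factor, the range of the mean response across its levels."""
    run_means = [mean(reps) for reps in responses]  # average the repeats
    deltas = {}
    for col, name in enumerate(FACTORS):
        level_means = [
            mean([m for row, m in zip(array, run_means) if row[col] == lvl])
            for lvl in range(n_levels)
        ]
        deltas[name] = max(level_means) - min(level_means)
    return deltas


# SYNTHETIC data: arbitrary illustrative weights, three repeats per run.
# These are NOT the paper's measurements.
WEIGHTS = (1.0, 2.0, 4.0, 3.0)  # temperature, age, words, letters
responses = [
    [sum(w * lvl for w, lvl in zip(WEIGHTS, row)) + offset
     for offset in (-0.1, 0.0, 0.1)]
    for row in L9
]

deltas = response_deltas(L9, responses)
ranking = sorted(deltas, key=deltas.get, reverse=True)
print(ranking)
```

Because the array is balanced, each factor's level means differ only by that factor's own contribution, so the delta isolates its effect without running all 81 level combinations.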
