Study of the Effects of Control Factors on Speech Features Using Taguchi Method
DOI: https://doi.org/10.15866/iree.v17i1.21145
Abstract
In this paper, the Taguchi optimization method is used for the first time to study the effect of four factors on the features of the speech signal: ambient temperature, speaker's age, number of words, and number of letters in each word. The Taguchi method identifies which factor has the greatest effect on the phenomenon under study while requiring only a small number of experiments. The number of experiments is set by an orthogonal array whose size depends on the number of control factors and their levels. This reduces the time, cost, and material resources required compared with methods that rely mainly on trial and error. Nine representative experiments were carried out, each repeated three times. MATLAB was used to compute the speech features under study, namely entropy and linear predictive coding (LPC), and Minitab statistical software was used to analyze the results according to the Taguchi method. Ranked from most to least influential on these speech features, the factors are number of words, number of letters, age, and ambient temperature. A linear regression analysis carried out to validate these results confirmed the ranking obtained with the Taguchi method. The results can help researchers improve speech analysis applications by showing which factors have the greatest effect on speech features.
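The factor-ranking step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' MATLAB/Minitab workflow. It assumes the standard L9 orthogonal array (four factors at three levels each, which is consistent with nine experiments but is not stated in the abstract), a larger-is-better signal-to-noise ratio (also an assumption), and purely synthetic placeholder responses. The names `FACTORS`, `sn_larger_is_better`, and `rank_factors` are hypothetical.

```python
import math

# Factor names follow the abstract; the three-level coding (0, 1, 2) is an assumption.
FACTORS = ("temperature", "age", "words", "letters")

# Standard L9(3^4) orthogonal array: 9 runs, 4 factors, 3 levels each.
L9 = [
    (0, 0, 0, 0),
    (0, 1, 1, 1),
    (0, 2, 2, 2),
    (1, 0, 1, 2),
    (1, 1, 2, 0),
    (1, 2, 0, 1),
    (2, 0, 2, 1),
    (2, 1, 0, 2),
    (2, 2, 1, 0),
]


def sn_larger_is_better(replicates):
    """Larger-is-better S/N ratio in dB: -10 * log10(mean(1 / y^2))."""
    mean_inv_sq = sum(1.0 / (y * y) for y in replicates) / len(replicates)
    return -10.0 * math.log10(mean_inv_sq)


def rank_factors(responses):
    """Rank factors by delta = max - min of the per-level mean S/N ratio.

    `responses` holds one list of positive replicate measurements per L9 run.
    """
    sn = [sn_larger_is_better(reps) for reps in responses]
    deltas = {}
    for col, name in enumerate(FACTORS):
        level_means = []
        for level in range(3):
            vals = [sn[run] for run, row in enumerate(L9) if row[col] == level]
            level_means.append(sum(vals) / len(vals))
        deltas[name] = max(level_means) - min(level_means)
    # A larger delta means the factor moves the response more.
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


# Synthetic placeholder data (NOT the paper's measurements): three identical
# replicates per run, with the response driven only by the "words" level.
demo_responses = [[1.0 + row[2]] * 3 for row in L9]
ranking = rank_factors(demo_responses)

for name, delta in ranking:
    print(f"{name:12s} delta = {delta:.3f} dB")
```

Because the array is balanced, a response that depends on a single factor yields a nonzero delta for that factor only, which makes this toy case a convenient sanity check before substituting real feature values such as entropy or LPC measurements.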
Copyright © 2022 Praise Worthy Prize - All rights reserved.