
A Text-to-Audiovisual Synthesizer for Indonesian by Morphing Viseme




Much research has been done on text-to-audiovisual synthesis, but little of it addresses the Indonesian language. The results of the present research can be applied in a wide range of fields, e.g. the gaming and animation industries and human-computer interaction systems. A realistic text-to-audiovisual synthesizer requires correspondence among the speech, the mouth movements (visual phonemes, or visemes), and the phonemes spoken. This research aims to develop a text-to-audiovisual synthesizer for Indonesian, driven by input Indonesian text, called TTAVI (Text-To-AudioVisual synthesizer for Indonesian language). The method consists of four major parts: building models of Indonesian visemes, converting text to speech, synchronizing the speech with the visemes, and stringing the visemes together with the morphing viseme algorithm. With the morphing viseme algorithm, the virtual character's pronunciation of the phonemes produced by the TTAVI synthesizer is smoother. Ten Indonesian texts input to the TTAVI synthesizer were evaluated by 30 users, and their ratings were scored with the Mean Opinion Score (MOS) method. The average MOS is 4.106 on a scale of 1 to 5. This indicates that the TTAVI synthesizer is rated good and that the morphing viseme algorithm is able to make its output smoother.
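The synchronization and morphing steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the TTAVI implementation: the phoneme-to-viseme table, the three-parameter mouth shapes (`jaw_open`, `lip_width`, `lip_round`), the `TimedPhoneme` timings (assumed to come from the text-to-speech engine), and the 25 fps frame rate are all hypothetical. Each viseme is placed as a keyframe at the midpoint of its phoneme, and the frames in between are produced by linear interpolation of the shape parameters, a simple stand-in for image-based viseme morphing. A `mean_opinion_score` helper shows the MOS arithmetic, assumed here to be the plain mean of the users' ratings on the 1 to 5 scale.

```python
from dataclasses import dataclass

# Hypothetical phoneme-to-viseme classes for illustration only;
# these are NOT the Indonesian viseme models built for TTAVI.
PHONEME_TO_VISEME = {
    "a": "open", "e": "spread", "i": "spread", "o": "round", "u": "round",
    "m": "closed", "b": "closed", "p": "closed",
    "t": "teeth", "d": "teeth", "s": "teeth", "n": "teeth",
    "k": "back", "g": "back",
}

# Hypothetical mouth shapes: (jaw_open, lip_width, lip_round), each in 0..1.
VISEME_SHAPES = {
    "rest":   (0.1, 0.5, 0.0),
    "closed": (0.0, 0.5, 0.0),
    "open":   (1.0, 0.6, 0.0),
    "spread": (0.4, 1.0, 0.0),
    "round":  (0.4, 0.2, 1.0),
    "teeth":  (0.2, 0.7, 0.0),
    "back":   (0.5, 0.5, 0.2),
}


@dataclass
class TimedPhoneme:
    phoneme: str
    start: float  # seconds; assumed to be reported by the TTS engine
    end: float


def morph(a, b, w):
    """Linearly interpolate two mouth-shape vectors; w=0 gives a, w=1 gives b."""
    return tuple((1.0 - w) * x + w * y for x, y in zip(a, b))


def viseme_track(phonemes, fps=25):
    """Return one mouth-shape vector per video frame, synchronized to the audio."""
    # Keyframes: rest pose at t=0, one viseme at each phoneme's midpoint,
    # and the rest pose again when the speech ends.
    end_time = phonemes[-1].end if phonemes else 0.0
    keys = [(0.0, VISEME_SHAPES["rest"])]
    for p in phonemes:
        viseme = PHONEME_TO_VISEME.get(p.phoneme, "rest")  # unknown -> rest
        keys.append(((p.start + p.end) / 2, VISEME_SHAPES[viseme]))
    keys.append((end_time, VISEME_SHAPES["rest"]))

    frames, seg = [], 0
    for i in range(int(end_time * fps) + 1):
        t = i / fps
        # Advance to the pair of keyframes that brackets time t.
        while seg < len(keys) - 2 and keys[seg + 1][0] <= t:
            seg += 1
        (t0, s0), (t1, s1) = keys[seg], keys[seg + 1]
        w = 0.0 if t1 <= t0 else min(max((t - t0) / (t1 - t0), 0.0), 1.0)
        frames.append(morph(s0, s1, w))
    return frames


def mean_opinion_score(ratings):
    """Arithmetic mean of user ratings on the 1 (bad) to 5 (excellent) scale."""
    if not ratings:
        raise ValueError("no ratings given")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return sum(ratings) / len(ratings)


if __name__ == "__main__":
    # Hypothetical timings for the Indonesian word "mama".
    word = [TimedPhoneme("m", 0.00, 0.08), TimedPhoneme("a", 0.08, 0.20),
            TimedPhoneme("m", 0.20, 0.28), TimedPhoneme("a", 0.28, 0.40)]
    track = viseme_track(word, fps=25)
    print(len(track), "frames; first:", track[0], "last:", track[-1])
    print(mean_opinion_score([4, 5, 4, 3, 5]))  # 4.2
```

Because every intermediate frame is a weighted blend of its two neighbouring visemes, the mouth never jumps abruptly from one shape to the next, which is the smoothing effect the abstract attributes to morphing. A real system would morph face images or blend-shape meshes rather than three scalar parameters.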
Copyright © 2015 Praise Worthy Prize - All rights reserved.


Keywords: Audiovisual; Viseme; A Model of Indonesian Visemes; Indonesian Text; Morphing Viseme





