Research Article

Hidden Markov Model based Speech Synthesis: A Review

by Sangramsing Kayte, Monica Mundada, Jayesh Gujrathi
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 130 - Number 3
Year of Publication: 2015
Authors: Sangramsing Kayte, Monica Mundada, Jayesh Gujrathi
10.5120/ijca2015906965

Sangramsing Kayte, Monica Mundada, Jayesh Gujrathi. Hidden Markov Model based Speech Synthesis: A Review. International Journal of Computer Applications 130, 3 (November 2015), 35-39. DOI=10.5120/ijca2015906965

@article{ 10.5120/ijca2015906965,
author = { Sangramsing Kayte, Monica Mundada, Jayesh Gujrathi },
title = { Hidden Markov Model based Speech Synthesis: A Review },
journal = { International Journal of Computer Applications },
issue_date = { November 2015 },
volume = { 130 },
number = { 3 },
month = { November },
year = { 2015 },
issn = { 0975-8887 },
pages = { 35-39 },
numpages = { 5 },
url = { https://ijcaonline.org/archives/volume130/number3/23191-2015906965/ },
doi = { 10.5120/ijca2015906965 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Sangramsing Kayte
%A Monica Mundada
%A Jayesh Gujrathi
%T Hidden Markov Model based Speech Synthesis: A Review
%J International Journal of Computer Applications
%@ 0975-8887
%V 130
%N 3
%P 35-39
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

A text-to-speech (TTS) synthesis system is the artificial production of human speech. This paper reviews recent research advances in the field of speech synthesis, with a focus on the statistical parametric approach based on hidden Markov models (HMMs). In this approach, HMM-based text-to-speech synthesis (HTS) is reviewed in brief. HTS is based on the generation of an optimal parameter sequence from subword HMMs. The quality of the HTS framework relies on an accurate description of the phone set. The most attractive aspect of the HTS system is that the prosodic characteristics of the voice can be modified simply by varying the HMM parameters, which also keeps the storage requirement small.
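
The parameter generation step at the core of HTS can be illustrated with a small numerical sketch. The Python/NumPy snippet below is a simplified illustration, not the HTS implementation: it assumes a state-level mean/variance sequence is already given, uses only a static and one delta stream with diagonal covariances, and the helper name generate_trajectory and the toy input are hypothetical. It solves the maximum-likelihood parameter generation problem of Tokuda et al. (ICASSP 2000): the static trajectory c satisfying (W^T U^-1 W) c = W^T U^-1 mu, where W stacks the static and delta regression windows.

import numpy as np

def generate_trajectory(means, variances, delta_win=(-0.5, 0.0, 0.5)):
    # means, variances: arrays of shape (T, 2) holding per-frame state
    # statistics for [static, delta] features (hypothetical toy input).
    T = means.shape[0]
    # W maps the static trajectory c (length T) to the stacked
    # [static; delta] observation vector (length 2T); frame boundaries
    # are simply truncated in this sketch.
    W = np.zeros((2 * T, T))
    for t in range(T):
        W[2 * t, t] = 1.0                        # static row: o_t = c_t
        for k, w in zip((-1, 0, 1), delta_win):  # delta row: 0.5*(c_{t+1} - c_{t-1})
            if 0 <= t + k < T:
                W[2 * t + 1, t + k] += w
    mu = means.reshape(-1)              # stacked means [s_0, d_0, s_1, d_1, ...]
    prec = 1.0 / variances.reshape(-1)  # diagonal of U^{-1}
    A = W.T @ (prec[:, None] * W)       # W^T U^{-1} W
    b = W.T @ (prec * mu)               # W^T U^{-1} mu
    return np.linalg.solve(A, b)        # ML static trajectory c

# Toy example: two "states" with static means 1.0 and 3.0; the delta
# constraints turn the step in the means into a smooth trajectory.
means = np.array([[1.0, 0.0]] * 5 + [[3.0, 0.0]] * 5)
variances = np.ones_like(means)
print(generate_trajectory(means, variances).round(2))

This mechanism is also what makes prosody modification cheap in HTS: changing the state means (for example, in the F0 stream) changes the generated trajectory directly, with no stored waveform inventory to edit.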

Index Terms

Computer Science
Information Sciences

Keywords

TTS, speech corpus, Marathi phonemes