Research Article

A Technical Guide to Concatenative Speech Synthesis for Hindi using Festival

by Somnath Roy
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 86 - Number 8
Year of Publication: 2014
Authors: Somnath Roy

Somnath Roy. A Technical Guide to Concatenative Speech Synthesis for Hindi using Festival. International Journal of Computer Applications. 86, 8 (January 2014), 30-34. DOI=10.5120/15008-3287

@article{ 10.5120/15008-3287,
author = { Somnath Roy },
title = { A Technical Guide to Concatenative Speech Synthesis for Hindi using Festival },
journal = { International Journal of Computer Applications },
issue_date = { January 2014 },
volume = { 86 },
number = { 8 },
month = { January },
year = { 2014 },
issn = { 0975-8887 },
pages = { 30-34 },
numpages = {9},
url = { },
doi = { 10.5120/15008-3287 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:03:43.123373+05:30
%A Somnath Roy
%T A Technical Guide to Concatenative Speech Synthesis for Hindi using Festival
%J International Journal of Computer Applications
%@ 0975-8887
%V 86
%N 8
%P 30-34
%D 2014
%I Foundation of Computer Science (FCS), NY, USA

Speech is the most natural and effective medium of communication among human beings, and it has played a great role in the evolution of human civilization. Speech synthesis is the artificial production of speech. Native speakers of any language draw on their knowledge of various prosodic features during speech production. They acquire these features unconsciously in childhood, and with their help they can express both the meaning of an utterance and their emotional state. Bringing similar naturalness to artificial speech production (speech synthesis) remains a challenging task. This paper details how to develop a speech synthesizer using the Festival tool. Two approaches are discussed in detail for Hindi: limited domain synthesis and unit selection synthesis. The paper also describes how to configure Festival on Linux so that one can start working on a text-to-speech (TTS) system. Its purpose is to give novice speech researchers a full view of the technicalities involved in adapting Festival to a new language.

Index Terms

Computer Science
Information Sciences


Speech synthesis, Prosody, Speech production, Linguistic module, Phonemic, Phonetic, F0, Pitch, Festival