A Technical Guide to Concatenative Speech Synthesis for Hindi using Festival

Somnath Roy

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 86 - Number 8

Year of Publication: 2014

Authors: Somnath Roy

10.5120/15008-3287

Somnath Roy. A Technical Guide to Concatenative Speech Synthesis for Hindi using Festival. International Journal of Computer Applications. 86, 8 (January 2014), 30-34. DOI=10.5120/15008-3287

@article{ 10.5120/15008-3287,
  author = { Somnath Roy },
  title = { A Technical Guide to Concatenative Speech Synthesis for Hindi using Festival },
  journal = { International Journal of Computer Applications },
  issue_date = { January 2014 },
  volume = { 86 },
  number = { 8 },
  month = { January },
  year = { 2014 },
  issn = { 0975-8887 },
  pages = { 30-34 },
  numpages = {9},
  url = { https://ijcaonline.org/archives/volume86/number8/15008-3287/ },
  doi = { 10.5120/15008-3287 },
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

Abstract

Speech is the most natural and effective medium of communication among human beings, and it has played a major role in the evolution of human civilization. Speech synthesis is the artificial production of speech. During speech production, native speakers of a language draw on their knowledge of various prosodic features, which they acquire unconsciously in childhood. With the help of these features, they can express both the meaning of an utterance and their emotional state. Bringing similar naturalness to artificial speech production (speech synthesis) remains a challenging task. This paper details how to develop a speech synthesizer using the Festival tool. Two approaches are discussed in detail for Hindi: limited domain synthesis and unit selection synthesis. The paper also describes how to configure Festival on Linux so that one can start working on a text-to-speech (TTS) system. Its purpose is to give speech researchers who are new to the field full insight into the technicalities involved in adapting Festival to a new language.

Index Terms

Computer Science

Information Sciences

Keywords

Speech synthesis, Prosody, Speech production, Linguistic module, Phonemic, Phonetic, F0, Pitch, Festival