Research Article

Concatenative Speech Synthesis: A Review

by Rubeena A. Khan, J. S. Chitode
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 136 - Number 3
Year of Publication: 2016
Authors: Rubeena A. Khan, J. S. Chitode
10.5120/ijca2016907992

Rubeena A. Khan, J. S. Chitode. Concatenative Speech Synthesis: A Review. International Journal of Computer Applications. 136, 3 (February 2016), 1-6. DOI=10.5120/ijca2016907992

@article{ 10.5120/ijca2016907992,
author = { Rubeena A. Khan and J. S. Chitode },
title = { Concatenative Speech Synthesis: A Review },
journal = { International Journal of Computer Applications },
issue_date = { February 2016 },
volume = { 136 },
number = { 3 },
month = { February },
year = { 2016 },
issn = { 0975-8887 },
pages = { 1-6 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume136/number3/24130-2016907992/ },
doi = { 10.5120/ijca2016907992 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:36:00.132091+05:30
%A Rubeena A. Khan
%A J. S. Chitode
%T Concatenative Speech Synthesis: A Review
%J International Journal of Computer Applications
%@ 0975-8887
%V 136
%N 3
%P 1-6
%D 2016
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The primary objective of this paper is to provide an overview of existing concatenative text-to-speech (TTS) synthesis techniques. Concatenative speech synthesis can be broadly divided into three categories: diphone-based, corpus-based, and hybrid. Diphone-based synthesis relies on signal processing techniques such as PSOLA and FD-PSOLA, which introduce unwanted artifacts into the synthesized speech. The most widely used method is unit selection synthesis, a corpus-based method that produces the most natural-sounding synthetic speech.

Index Terms

Computer Science
Information Sciences

Keywords

TTS, PSOLA, TD-PSOLA, FD-PSOLA, ESNOLA, MOS, SUS, DRT, HMM