Research Article

Incorporating Dialectal Features in Synthesized Speech using Voice Conversion Techniques

by Nath Sanghamitra, Sharma Utpal
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 180 - Number 19
Year of Publication: 2018
DOI: 10.5120/ijca2018916443

Nath Sanghamitra, Sharma Utpal. Incorporating Dialectal Features in Synthesized Speech using Voice Conversion Techniques. International Journal of Computer Applications. 180, 19 (Feb 2018), 1-8. DOI=10.5120/ijca2018916443

@article{10.5120/ijca2018916443,
  author     = {Nath Sanghamitra and Sharma Utpal},
  title      = {Incorporating Dialectal Features in Synthesized Speech using Voice Conversion Techniques},
  journal    = {International Journal of Computer Applications},
  issue_date = {Feb 2018},
  volume     = {180},
  number     = {19},
  month      = {Feb},
  year       = {2018},
  issn       = {0975-8887},
  pages      = {1-8},
  numpages   = {8},
  url        = {https://ijcaonline.org/archives/volume180/number19/29037-2018916443/},
  doi        = {10.5120/ijca2018916443},
  publisher  = {Foundation of Computer Science (FCS), NY, USA},
  address    = {New York, USA}
}
%0 Journal Article
%A Nath Sanghamitra
%A Sharma Utpal
%T Incorporating Dialectal Features in Synthesized Speech using Voice Conversion Techniques
%J International Journal of Computer Applications
%@ 0975-8887
%V 180
%N 19
%P 1-8
%D 2018
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The paper explores the extent to which Voice Conversion techniques can help incorporate dialect-specific features into synthesized speech. A popular Voice Conversion technique based on Gaussian Mixture Models is used to develop mapping functions between speech synthesized by a Text-to-Speech system for the standard form of the language and parallel speech recorded from a speaker of the target dialect. Mel Cepstral Coefficients are used to represent the spectral envelope, while pitch, intensity, and duration values represent the prosody of speech.
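The GMM mapping the abstract describes can be illustrated with a joint-density formulation (in the style of Kain and Macon, reference 14): fit a GMM on stacked source-target feature vectors from time-aligned parallel frames, then convert each source frame to the expected target frame under the conditional distribution. The sketch below uses toy synthetic data in place of real aligned Mel Cepstral Coefficient frames; the data, dimensions, and component count are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy stand-ins for time-aligned MCEP frames: `source` from TTS speech of
# the standard language, `target` from a parallel target-dialect recording.
rng = np.random.default_rng(0)
n_frames, d = 500, 4
source = rng.normal(size=(n_frames, d))
target = 0.8 * source + 0.3 + 0.05 * rng.normal(size=(n_frames, d))

# Joint-density GMM: fit on stacked [x; y] vectors.
joint = np.hstack([source, target])
gmm = GaussianMixture(n_components=4, covariance_type="full",
                      random_state=0).fit(joint)

def convert(x):
    """Map one source frame x to the expected target frame E[y | x]."""
    mu_x = gmm.means_[:, :d]               # per-component source means
    mu_y = gmm.means_[:, d:]               # per-component target means
    S_xx = gmm.covariances_[:, :d, :d]     # source-source covariance blocks
    S_yx = gmm.covariances_[:, d:, :d]     # target-source cross-covariances
    diff = x - mu_x                        # (K, d)

    # Component responsibilities under the source marginal p(x); the
    # constant (d/2) log(2*pi) term cancels in the normalization.
    logp = np.empty(gmm.n_components)
    for k in range(gmm.n_components):
        _, logdet = np.linalg.slogdet(S_xx[k])
        maha = diff[k] @ np.linalg.solve(S_xx[k], diff[k])
        logp[k] = np.log(gmm.weights_[k]) - 0.5 * (logdet + maha)
    w = np.exp(logp - logp.max())
    w /= w.sum()

    # Weighted sum of per-component conditional means:
    # mu_y[k] + S_yx[k] S_xx[k]^{-1} (x - mu_x[k]).
    y = np.zeros(d)
    for k in range(gmm.n_components):
        y += w[k] * (mu_y[k] + S_yx[k] @ np.linalg.solve(S_xx[k], diff[k]))
    return y

converted = np.array([convert(f) for f in source])
err_before = np.mean((source - target) ** 2)
err_after = np.mean((converted - target) ** 2)
print(err_after < err_before)  # converted frames should be closer to target
```

In a real pipeline the parallel utterances would first be time-aligned (e.g. by dynamic time warping) before frame pairs are stacked, and the converted MCEP sequence would then be fed to a vocoder for resynthesis.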

References
  1. Dang Cong Zheng. Accent conversion via formant-based spectral mapping and pitch contour modification. 2011.
  2. Elisabeth Zetterholm. Same speaker–different voices. a study of one impersonator and some of his different imitations. In Proceedings of the 11th Australian International Conference on Speech Science & Technology, pages 70–75, 2006.
  3. Yannis Stylianou, Olivier Cappé, and Eric Moulines. Continuous probabilistic transform for voice conversion. Speech and Audio Processing, IEEE Transactions on, 6(2):131–142, 1998.
  4. Srinivas Desai, Alan W Black, B Yegnanarayana, and Kishore Prahallad. Spectral mapping using artificial neural networks for voice conversion. Audio, Speech, and Language Processing, IEEE Transactions on, 18(5):954–964, 2010.
  5. Oytun Türk and Marc Schröder. A comparison of voice conversion methods for transforming voice quality in emotional speech synthesis. In INTERSPEECH, pages 2282–2285, 2008.
  6. Ronanki Srikanth, B Bajibabu, and Kishore Prahallad. Duration modelling in voice conversion using artificial neural networks. In Systems, Signals and Image Processing (IWSSIP), 2012 19th International Conference on, pages 556–559. IEEE, 2012.
  7. Krothapalli S Rao, Shashidhar G Koolagudi, et al. Selection of suitable features for modeling the durations of syllables. Journal of Software Engineering and Applications, 3(12):1107, 2010.
  8. Zeynep Inanoglu. Transforming pitch in a voice conversion framework. St. Edmund's College, University of Cambridge, Tech. Rep., 2003.
  9. Bajibabu Bollepalli, Jonas Beskow, and Joakim Gustafson. Non-linear pitch modification in voice conversion using artificial neural networks. In Advances in Nonlinear Speech Processing, pages 97–103. Springer, 2013.
  10. V Ramu Reddy and K Sreenivasa Rao. Intensity modeling for syllable based text-to-speech synthesis. In Contemporary Computing, pages 106–117. Springer, 2012.
  11. Gunnar Fant. Acoustic theory of speech production: with calculations based on X-ray studies of Russian articulations, volume 2. Walter de Gruyter, 1971.
  12. Masanobu Abe, Satoshi Nakamura, Kiyohiro Shikano, and Hisao Kuwabara. Voice conversion through vector quantization. Journal of the Acoustical Society of Japan (E), 11(2):71–76, 1990.
  13. M Narendranath, Hema A Murthy, S Rajendran, and B Yegnanarayana. Voice conversion using artificial neural networks. In Automatic Speaker Recognition, Identification and Verification, 1994.
  14. Alexander Kain and Michael W Macon. Spectral voice conversion for text-to-speech synthesis. In Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on, volume 1, pages 285–288. IEEE, 1998.
  15. Tomoki Toda, Alan W Black, and Keiichi Tokuda. Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory. Audio, Speech, and Language Processing, IEEE Transactions on, 15(8):2222–2235, 2007.
Index Terms

Computer Science
Information Sciences

Keywords

Voice Conversion, Gaussian Mixture Models, Mel Cepstral Coefficients, Formants, F0, Assamese, Nalbaria Dialect, Pitch, Intensity, Duration, Text-to-Speech System