Incorporating Dialectal Features in Synthesized Speech using Voice Conversion Techniques

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2018
Authors:
Nath Sanghamitra, Sharma Utpal
10.5120/ijca2018916443

Nath Sanghamitra and Sharma Utpal. Incorporating Dialectal Features in Synthesized Speech using Voice Conversion Techniques. International Journal of Computer Applications 180(19):1-8, February 2018.

BibTeX

@article{10.5120/ijca2018916443,
	author = {Nath Sanghamitra and Sharma Utpal},
	title = {Incorporating Dialectal Features in Synthesized Speech using Voice Conversion Techniques},
	journal = {International Journal of Computer Applications},
	issue_date = {February 2018},
	volume = {180},
	number = {19},
	month = {Feb},
	year = {2018},
	issn = {0975-8887},
	pages = {1-8},
	numpages = {8},
	url = {http://www.ijcaonline.org/archives/volume180/number19/29037-2018916443},
	doi = {10.5120/ijca2018916443},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}
}

Abstract

This paper explores the extent to which Voice Conversion techniques can incorporate dialect-specific features into synthesized speech. A popular Voice Conversion technique based on Gaussian Mixture Models is used to develop mapping functions between speech synthesized by a Text-to-Speech System for the standard form of the language and parallel speech recorded from a speaker of the target dialect. Mel Cepstral Coefficients represent the spectral envelope, while pitch, intensity, and duration values represent the prosody of speech.
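
The abstract does not give the form of the mapping functions. As a rough illustration only, the sketch below shows the regression commonly used in joint-density GMM voice conversion, reduced to a single scalar feature: each mixture component contributes a linear prediction of the target value, weighted by the posterior probability of that component given the source value. The function names, the dictionary layout, and the toy parameters are all hypothetical and are not taken from the paper; in practice the parameters would be estimated from time-aligned source and target feature vectors.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmm_convert(x, components):
    """Map a scalar source feature x to its expected target value.

    Each component is a dict (hypothetical layout) with keys:
      weight  - mixture weight
      mu_x    - mean of the source feature
      mu_y    - mean of the target feature
      var_x   - variance of the source feature
      cov_yx  - covariance between target and source features
    """
    # Weighted likelihood of x under each component's source marginal.
    likelihoods = [
        c["weight"] * gaussian_pdf(x, c["mu_x"], c["var_x"]) for c in components
    ]
    total = sum(likelihoods)
    if total == 0.0:
        raise ValueError("x is too far from every component (likelihood underflow)")

    y_hat = 0.0
    for lik, c in zip(likelihoods, components):
        posterior = lik / total  # p(component | x)
        # Per-component linear regression from source to target.
        y_hat += posterior * (c["mu_y"] + c["cov_yx"] / c["var_x"] * (x - c["mu_x"]))
    return y_hat


# Toy parameters, invented purely for illustration.
TOY_COMPONENTS = [
    {"weight": 0.5, "mu_x": 0.0, "mu_y": 1.0, "var_x": 1.0, "cov_yx": 0.5},
    {"weight": 0.5, "mu_x": 4.0, "mu_y": 6.0, "var_x": 1.0, "cov_yx": 0.8},
]

if __name__ == "__main__":
    for x in (0.0, 2.0, 4.0):
        print(x, round(gmm_convert(x, TOY_COMPONENTS), 3))
```

With real spectral features the same formula applies to vectors, with the division by the source variance replaced by multiplication with the inverse of the source covariance matrix.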

Keywords

Voice Conversion, Gaussian mixture models, Mel Cepstral Coefficients, Formants, F0, Assamese, Nalbaria, Dialect, Pitch, Intensity, Duration, Text-to-Speech System