
Incorporating Dialectal Features in Synthesized Speech using Voice Conversion Techniques

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2018
Authors:
Nath Sanghamitra, Sharma Utpal
10.5120/ijca2018916443

Nath Sanghamitra and Sharma Utpal. Incorporating Dialectal Features in Synthesized Speech using Voice Conversion Techniques. International Journal of Computer Applications 180(19):1-8, February 2018. BibTeX

@article{10.5120/ijca2018916443,
	author = {Nath Sanghamitra and Sharma Utpal},
	title = {Incorporating Dialectal Features in Synthesized Speech using Voice Conversion Techniques},
	journal = {International Journal of Computer Applications},
	issue_date = {February 2018},
	volume = {180},
	number = {19},
	month = {Feb},
	year = {2018},
	issn = {0975-8887},
	pages = {1-8},
	numpages = {8},
	url = {http://www.ijcaonline.org/archives/volume180/number19/29037-2018916443},
	doi = {10.5120/ijca2018916443},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}
}

Abstract

This paper explores the extent to which Voice Conversion techniques can help incorporate dialect-specific features into synthesized speech. A popular Voice Conversion technique based on Gaussian Mixture Models is used to develop mapping functions between speech synthesized by a Text-to-Speech system for the standard form of the language and parallel speech recorded from a speaker of the target dialect. Mel Cepstral Coefficients are used to represent the spectral envelope, while pitch, intensity, and duration values are selected to represent the prosody of speech.
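The paper does not include code, but the joint-density GMM mapping it builds on (cf. Kain and Macon, reference 14) can be sketched roughly as follows. This is an illustrative implementation only, using scikit-learn's `GaussianMixture`; the function names, feature dimensions, and component count are assumptions, and the mel-cepstral extraction and time alignment of the parallel utterances (e.g. via dynamic time warping) are assumed to have been done beforehand.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def train_joint_gmm(src_mcc, tgt_mcc, n_components=8, seed=0):
    """Fit a GMM on joint (source, target) mel-cepstral frames.

    src_mcc, tgt_mcc: (n_frames, n_dims) matrices of time-aligned frames
    from the source (synthesized standard-language) and target (dialect)
    speech, respectively.
    """
    joint = np.hstack([src_mcc, tgt_mcc])  # (n_frames, 2 * n_dims)
    gmm = GaussianMixture(n_components=n_components,
                          covariance_type="full", random_state=seed)
    gmm.fit(joint)
    return gmm

def convert(gmm, src_mcc):
    """Map each source frame x to the conditional mean E[y | x]."""
    d = src_mcc.shape[1]
    mu_x, mu_y = gmm.means_[:, :d], gmm.means_[:, d:]
    sxx = gmm.covariances_[:, :d, :d]   # source-source covariance blocks
    syx = gmm.covariances_[:, d:, :d]   # target-source cross-covariances

    out = np.zeros_like(src_mcc, dtype=float)
    for t, x in enumerate(src_mcc):
        # Posterior responsibility of each component given the source frame
        resp = np.array([w * multivariate_normal.pdf(x, mu_x[k], sxx[k])
                         for k, w in enumerate(gmm.weights_)])
        resp /= resp.sum()
        # Responsibility-weighted per-component conditional means
        for k in range(gmm.n_components):
            cond = mu_y[k] + syx[k] @ np.linalg.solve(sxx[k], x - mu_x[k])
            out[t] += resp[k] * cond
    return out
```

The converted mel-cepstral frames would then be passed, together with the transformed prosodic parameters (pitch, intensity, duration), to a vocoder for resynthesis; that step is outside the scope of this sketch.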

References

  1. Dang Cong Zheng. Accent conversion via formant-based spectral mapping and pitch contour modification. 2011.
  2. Elisabeth Zetterholm. Same speaker–different voices. A study of one impersonator and some of his different imitations. In Proceedings of the 11th Australian International Conference on Speech Science & Technology, pages 70–75, 2006.
  3. Yannis Stylianou, Olivier Cappé, and Eric Moulines. Continuous probabilistic transform for voice conversion. IEEE Transactions on Speech and Audio Processing, 6(2):131–142, 1998.
  4. Srinivas Desai, Alan W Black, B Yegnanarayana, and Kishore Prahallad. Spectral mapping using artificial neural networks for voice conversion. IEEE Transactions on Audio, Speech, and Language Processing, 18(5):954–964, 2010.
  5. Oytun Türk and Marc Schröder. A comparison of voice conversion methods for transforming voice quality in emotional speech synthesis. In INTERSPEECH, pages 2282–2285, 2008.
  6. Ronanki Srikanth, B Bajibabu, and Kishore Prahallad. Duration modelling in voice conversion using artificial neural networks. In 2012 19th International Conference on Systems, Signals and Image Processing (IWSSIP), pages 556–559. IEEE, 2012.
  7. Krothapalli S Rao, Shashidhar G Koolagudi, et al. Selection of suitable features for modeling the durations of syllables. Journal of Software Engineering and Applications, 3(12):1107, 2010.
  8. Zeynep Inanoglu. Transforming pitch in a voice conversion framework. St. Edmund's College, University of Cambridge, Tech. Rep., 2003.
  9. Bajibabu Bollepalli, Jonas Beskow, and Joakim Gustafson. Non-linear pitch modification in voice conversion using artificial neural networks. In Advances in Nonlinear Speech Processing, pages 97–103. Springer, 2013.
  10. V Ramu Reddy and K Sreenivasa Rao. Intensity modeling for syllable based text-to-speech synthesis. In Contemporary Computing, pages 106–117. Springer, 2012.
  11. Gunnar Fant. Acoustic theory of speech production: with calculations based on X-ray studies of Russian articulations, volume 2. Walter de Gruyter, 1971.
  12. Masanobu Abe, Satoshi Nakamura, Kiyohiro Shikano, and Hisao Kuwabara. Voice conversion through vector quantization. Journal of the Acoustical Society of Japan (E), 11(2):71–76, 1990.
  13. M Narendranath, Hema A Murthy, S Rajendran, and B Yegnanarayana. Voice conversion using artificial neural networks. In Automatic Speaker Recognition, Identification and Verification, 1994.
  14. Alexander Kain and Michael W Macon. Spectral voice conversion for text-to-speech synthesis. In Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, volume 1, pages 285–288. IEEE, 1998.
  15. Tomoki Toda, Alan W Black, and Keiichi Tokuda. Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory. IEEE Transactions on Audio, Speech, and Language Processing, 15(8):2222–2235, 2007.

Keywords

Voice Conversion, Gaussian mixture models, Mel Cepstral Coefficients, Formants, F0, Assamese, Nalbaria, Dialect, Pitch, Intensity, Duration, Text-to-Speech System