Phonotactic Model for Spoken Language Identification in Indian Language Perspective

Sanghamitra Mohanty

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 19 - Number 9

Year of Publication: 2011

Authors: Sanghamitra Mohanty

10.5120/2389-3164

Sanghamitra Mohanty. Phonotactic Model for Spoken Language Identification in Indian Language Perspective. International Journal of Computer Applications. 19, 9 (April 2011), 18-24. DOI=10.5120/2389-3164

@article{ 10.5120/2389-3164,
author = { Sanghamitra Mohanty },
title = { Phonotactic Model for Spoken Language Identification in Indian Language Perspective },
journal = { International Journal of Computer Applications },
issue_date = { April 2011 },
volume = { 19 },
number = { 9 },
month = { April },
year = { 2011 },
issn = { 0975-8887 },
pages = { 18-24 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume19/number9/2389-3164/ },
doi = { 10.5120/2389-3164 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T20:06:32.285349+05:30
%A Sanghamitra Mohanty
%T Phonotactic Model for Spoken Language Identification in Indian Language Perspective
%J International Journal of Computer Applications
%@ 0975-8887
%V 19
%N 9
%P 18-24
%D 2011
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Indian languages are either Indo-Aryan, influenced by Sanskrit, or Dravidian, influenced by Tamil. Dravidian languages also show the influence of Sanskrit. All Indian languages show the influence of Pali, for which the graphemes are influenced by Brahmi. All Indian languages are phonetic in nature, and each has its own distinctive phone set. North Indian languages are Indo-Aryan and South Indian languages are Dravidian. Considering the phonetic properties of each language as spoken, we examine the characteristic consonant-vowel (CV) behaviour of its syllables and identify the language with an SVM classifier trained on the limited data set available. In the process we analysed the PPR Language Modelling concept for four major Indian languages (Hindi, Bengali, Oriya, and Telugu), and the results are quite encouraging.
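As a rough illustration of the kind of CV-based phonotactic feature the abstract refers to, the sketch below maps a phone sequence to its consonant/vowel pattern and computes bigram relative frequencies. It is only a minimal sketch under stated assumptions: the vowel inventory, the phone labels, and the function names `cv_pattern` and `ngram_features` are hypothetical and are not taken from the paper, and the paper's actual feature set is not described on this page.

```python
from collections import Counter

# Hypothetical vowel inventory, for illustration only (not the paper's phone set).
VOWELS = {"a", "aa", "i", "ii", "u", "uu", "e", "o"}


def cv_pattern(phones):
    """Map each phone label to 'V' if it is in VOWELS, otherwise to 'C'."""
    return ["V" if p in VOWELS else "C" for p in phones]


def ngram_features(symbols, n=2):
    """Return the relative frequency of each n-gram in a symbol sequence."""
    grams = [tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1)]
    total = len(grams)
    if total == 0:
        return {}
    counts = Counter(grams)
    return {gram: count / total for gram, count in counts.items()}


# Made-up phone sequence, standing in for the output of a phone recogniser.
phones = ["k", "a", "m", "a", "l", "a"]
pattern = cv_pattern(phones)
features = ngram_features(pattern)
print(pattern)
print(features)
```

Feature dictionaries of this kind, computed per utterance and converted to fixed-length vectors, are one plausible input to an SVM classifier; whether the paper builds its features this way is an assumption.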

Index Terms

Computer Science

Information Sciences

Keywords

LID, Indian Language, Support Vector Machine, Phonotactic