Research Article

Audio Visual Arabic Speech Recognition using KNN Model by Testing different Audio Features

by Esra J. Harfash, Diyar H. Shakir
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 180 - Number 1
Year of Publication: 2017
Authors: Esra J. Harfash, Diyar H. Shakir
10.5120/ijca2017915901

Esra J. Harfash, Diyar H. Shakir. Audio Visual Arabic Speech Recognition using KNN Model by Testing different Audio Features. International Journal of Computer Applications. 180, 1 (Dec 2017), 33-38. DOI=10.5120/ijca2017915901

@article{ 10.5120/ijca2017915901,
author = { Esra J. Harfash, Diyar H. Shakir },
title = { Audio Visual Arabic Speech Recognition using KNN Model by Testing different Audio Features },
journal = { International Journal of Computer Applications },
issue_date = { Dec 2017 },
volume = { 180 },
number = { 1 },
month = { Dec },
year = { 2017 },
issn = { 0975-8887 },
pages = { 33-38 },
numpages = {6},
url = { https://ijcaonline.org/archives/volume180/number1/28766-2017915901/ },
doi = { 10.5120/ijca2017915901 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Esra J. Harfash
%A Diyar H. Shakir
%T Audio Visual Arabic Speech Recognition using KNN Model by Testing different Audio Features
%J International Journal of Computer Applications
%@ 0975-8887
%V 180
%N 1
%P 33-38
%D 2017
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The most important challenge in audio-visual speech recognition (AVSR), and the focus of most research, is which features to extract and how to combine them to obtain better results. A further challenge is that the resulting features are naturally large, so it is preferable to reduce them with an appropriate method while ensuring their properties are preserved after downsizing. The system presented in this research recognizes a group of spoken Arabic words, the numbers one to ten. In the acoustic part, MFCC, LPC, and FFT coefficients were extracted to determine which type of feature is most effective for AVSR. All of these feature types gave efficient results, but MFCC was the best. The visual features are computed from the DCT matrix and extracted by applying a zigzag scan. In the feature-reduction stage, several data-reduction methods were implemented: LDA, PCA, and SVD, each applied to the data separately. KNN models are used in the recognition stage, where testing is performed on dependent and independent databases of the words one to ten. The final results obtained are efficient and encouraging.
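
A minimal sketch of the pipeline described in the abstract is given below. It is not the authors' code: it assumes librosa for MFCC extraction, SciPy for the 2-D DCT, and scikit-learn for PCA reduction and KNN classification; the number of coefficients, the zigzag cutoff, the value of k, and the toy data are illustrative assumptions standing in for the Arabic word recordings and mouth regions used in the paper. LDA and SVD reduction, and the LPC/FFT acoustic features, would slot in analogously.

# Illustrative sketch only (not the paper's implementation); all parameter
# values below are assumptions for demonstration.
import numpy as np
import librosa                                   # assumed library for MFCC extraction
from scipy.fftpack import dct                    # 2-D DCT of the mouth region
from sklearn.decomposition import PCA            # one reduction method (LDA/SVD analogous)
from sklearn.neighbors import KNeighborsClassifier

def audio_features(wav_path, n_mfcc=13):
    """MFCC coefficients averaged over frames -> one acoustic vector per utterance."""
    y, sr = librosa.load(wav_path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

def zigzag(block):
    """Zigzag scan of a square matrix, low-frequency coefficients first."""
    n = block.shape[0]
    order = sorted(((i, j) for i in range(n) for j in range(n)),
                   key=lambda p: (p[0] + p[1],
                                  p[0] if (p[0] + p[1]) % 2 else p[1]))
    return np.array([block[i, j] for i, j in order])

def visual_features(mouth_roi, keep=64):
    """2-D DCT of the mouth region; keep the first `keep` zigzag coefficients."""
    d = dct(dct(mouth_roi, axis=0, norm='ortho'), axis=1, norm='ortho')
    return zigzag(d)[:keep]

# Toy usage with random stand-in data (real input: word recordings + mouth ROIs).
rng = np.random.default_rng(0)
example_visual = visual_features(rng.normal(size=(16, 16)))   # 64 DCT coefficients
X = rng.normal(size=(100, 13 + 64))      # fused audio + visual feature vectors
y = rng.integers(1, 11, size=100)        # labels for the words "one" .. "ten"

X_red = PCA(n_components=20).fit_transform(X)                 # feature reduction
knn = KNeighborsClassifier(n_neighbors=3).fit(X_red[:80], y[:80])
print("toy accuracy:", knn.score(X_red[80:], y[80:]))

In this sketch the acoustic and visual vectors are simply concatenated before reduction; the choice of reduction method and of k would be tuned on the dependent and independent test sets as in the paper.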

References
  1. Vorwerk A., Wang X., Kolossa D., Zeiler S., and Orglmeister R., "WAPUSK20 – A database for robust audiovisual speech recognition", Chair of Electronics and Medical Signal Processing, EMSP, University of Berlin, Einsteinufer 17, 10587 Berlin, 2011.
  2. Potamianos G., Neti C., Luettin J., and Matthews I., "Audio-visual automatic speech recognition: an overview". Issues in audio-visual speech processing. MIT Press, 2004.
  3. Lucey S., Chen T., Sridharan S., and Chandran V., "Integration Strategies for Audio-visual Speech Processing: Applied to Text Dependent Speaker Recognition", Queensland University of Technology, Australia, 2004.
  4. Pao T.L., and Liao W.Y., "AVSR for Testing AV Database", Department of Computer Science and Engineering, University of Tatung, Taipei, Taiwan, R.O.C, 2006.
  5. Kratt J., Metze F., Stiefelhagen R., and Waibel A., "Large Vocabulary Audio-Visual Speech Recognition Using the Janus Speech Recognition Toolkit", Interactive Systems Laboratories, University of Karlsruhe, Germany, 2004.
  6. Potamianos G., Neti C., and Deligne S., "Joint Audio Visual Speech Processing for Recognition and Enhancement", Proceedings of AVSP, 2003.
  7. Goecke R., Potamianos G., and Neti C., "Noisy Audio Feature Enhancement using Audio-Visual Speech Data", ICASSP '02, 2002.
  8. Bord P., Varp A., Manz R., and Yannawar P., "Recognition of Isolated Words using Zernike and MFCC features for AVSR", Department of Science and Technology (DST), India, 2011.
  9. Gagnon L., Foucher S., Laliberté F., and Boulianne G., "A simplified audiovisual fusion model with application to large-vocabulary recognition of French Canadian speech", Can. J. Elect. Comput. Eng., Vol. 33, No. 2, Spring 2008.
  10. Galatas G, Potamianos G., and Makedon F., "AVSR Incorporating Facial Depth Information Captured by the Kinect", 20th European Signal Processing Conference EUSIPCO, Bucharest, Romania, August 2012.
  11. Silber-Varod V., and Geri N., "Can ASR be Satisficing for Audio/Visual Search? Keyword-Focused Analysis of Hebrew Automatic and Manual Transcription", Online Journal of Applied Knowledge Management, Vol. 2, Issue 1, 2014.
  12. Potamianos G., and Neti C., "AVSR In Challenging Environment", Proceedings of the European Conference on Speech Communication and Technology (EUROSPEECH), pp. 1293-1296, Geneva, Switzerland, Sept. 2003.
  13. Reikeras H., Engelbrecht H., Herbst B., and Preez J.D., "AVSR using SciPy", University of Stellenbosch, http://www.SciPy.org/, 2008.
Index Terms

Computer Science
Information Sciences

Keywords

Audio-Video Speech Processing, Automatic Speech Recognition, Mouth Detection, Discrete Cosine Transformation, Visual Features