Visual Lip Reading using 3D-DCT and 3D-DWT and LSDA

Sunil S. Morade; Suprava Patnaik

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 136 - Number 4

Year of Publication: 2016

Authors: Sunil S. Morade, Suprava Patnaik

10.5120/ijca2016908308

Sunil S. Morade, Suprava Patnaik. Visual Lip Reading using 3D-DCT and 3D-DWT and LSDA. International Journal of Computer Applications. 136, 4 (February 2016), 7-15. DOI=10.5120/ijca2016908308

@article{ 10.5120/ijca2016908308,

author = { Sunil S. Morade and Suprava Patnaik },

title = { Visual Lip Reading using 3D-DCT and 3D-DWT and LSDA },

journal = { International Journal of Computer Applications },

issue_date = { February 2016 },

volume = { 136 },

number = { 4 },

month = { February },

year = { 2016 },

issn = { 0975-8887 },

pages = { 7-15 },

numpages = {9},

url = { https://ijcaonline.org/archives/volume136/number4/24139-2016908308/ },

doi = { 10.5120/ijca2016908308 },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

%0 Journal Article

%1 2024-02-06T23:36:06.430964+05:30

%A Sunil S. Morade

%A Suprava Patnaik

%T Visual Lip Reading using 3D-DCT and 3D-DWT and LSDA

%J International Journal of Computer Applications

%@ 0975-8887

%V 136

%N 4

%P 7-15

%D 2016

%I Foundation of Computer Science (FCS), NY, USA

Abstract

Humans use visual information when trying to understand speech, especially in noisy conditions or when the audio signal is unavailable. Lip reading is the technique of understanding the underlying speech by processing the movement of the lips. Recognizing lip motion is difficult, however, because the region of interest (ROI) is nonlinear and noisy. The proposed lip reading system uses a two-stage feature extraction model that is precise, discriminative, and computationally efficient. The first stage applies a 3D Discrete Wavelet Transform (3D-DWT) or a 3D Discrete Cosine Transform (3D-DCT); the second stage applies Locality Sensitive Discriminant Analysis (LSDA) to reduce the feature dimensionality. Together, these stages yield a novel lip reading system with a small feature vector. In addition to the feature extraction technique, the performance of Naive Bayes and SVM classifiers is compared. The CUAVE database of English utterances of the digits 0 to 9 is used for the experiments. Results of the three-dimensional transforms with LSDA are compared with those of two-dimensional transforms with LSDA. The 3D-DWT+LSDA features are also compared with 3D-DWT features reduced by PCA or LDA, and with 3D-DCT+LSDA features.
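
The first stage described in the abstract can be illustrated with a minimal sketch of a separable 3D-DCT applied to a stack of lip ROI frames, keeping only a low-frequency block of coefficients as the feature vector. This is not the authors' implementation: the axis order (frames, height, width), the 8 x 16 x 16 volume size, the 4 x 4 x 4 block that is kept, and the function names `dct_matrix`, `dct3`, and `low_freq_features` are all assumptions made for illustration. The second stage (LSDA) is not shown.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Return the n x n orthonormal DCT-II matrix."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)  # DC row scaling makes the matrix orthonormal
    return m


def dct3(volume: np.ndarray) -> np.ndarray:
    """Separable 3D DCT-II: apply the 1D transform along each axis in turn."""
    out = np.asarray(volume, dtype=float)
    for axis in range(3):
        d = dct_matrix(out.shape[axis])
        # tensordot puts the transformed axis first; moveaxis restores its position.
        out = np.moveaxis(np.tensordot(d, out, axes=(1, axis)), 0, axis)
    return out


def low_freq_features(volume: np.ndarray, keep=(4, 4, 4)) -> np.ndarray:
    """Keep the low-frequency corner of the 3D-DCT as a flat feature vector.

    `keep` is a hypothetical block size chosen for this sketch.
    """
    t, h, w = keep
    return dct3(volume)[:t, :h, :w].ravel()


# Synthetic stand-in for a lip ROI sequence: 8 frames of 16 x 16 pixels (assumed sizes).
rng = np.random.default_rng(0)
roi_volume = rng.random((8, 16, 16))
features = low_freq_features(roi_volume)
print(features.shape)  # (64,)
```

Because the transform is orthonormal, it preserves signal energy, so truncating to the low-frequency block discards only the energy carried by the dropped coefficients. A feature vector like this would then be passed to a dimensionality reduction stage such as LSDA before classification.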

Index Terms

Computer Science

Information Sciences

Keywords

LSDA, LDA, 3D-DWT, 3D-DCT, SVM, Naive Bayes, Lip reading