Research Article

Article:Hybrid Feature and Decision Fusion Based Audio-Visual Speaker Identification in Challenging Environment

by Md. Rabiul Islam, Md. Fayzur Rahman
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 9 - Number 5
Year of Publication: 2010
Authors: Md. Rabiul Islam, Md. Fayzur Rahman
10.5120/1384-1864

Md. Rabiul Islam, Md. Fayzur Rahman. Article:Hybrid Feature and Decision Fusion Based Audio-Visual Speaker Identification in Challenging Environment. International Journal of Computer Applications. 9, 5 (November 2010), 9-15. DOI=10.5120/1384-1864

@article{ 10.5120/1384-1864,
author = { Md. Rabiul Islam and Md. Fayzur Rahman },
title = { Article:Hybrid Feature and Decision Fusion Based Audio-Visual Speaker Identification in Challenging Environment },
journal = { International Journal of Computer Applications },
issue_date = { November 2010 },
volume = { 9 },
number = { 5 },
month = { November },
year = { 2010 },
issn = { 0975-8887 },
pages = { 9-15 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume9/number5/1384-1864/ },
doi = { 10.5120/1384-1864 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T19:57:50.169685+05:30
%A Md. Rabiul Islam
%A Md. Fayzur Rahman
%T Article:Hybrid Feature and Decision Fusion Based Audio-Visual Speaker Identification in Challenging Environment
%J International Journal of Computer Applications
%@ 0975-8887
%V 9
%N 5
%P 9-15
%D 2010
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper proposes a novel approach to evaluating the performance of a noise-robust audio-visual speaker identification system in challenging environments. Whereas the traditional HMM-based audio-visual speaker identification system is very sensitive to variation in speech parameters, the proposed hybrid feature and decision fusion based system is found to be stable and performs well, improving the robustness and naturalness of human-computer interaction. Linear Prediction Cepstral Coefficients and Mel Frequency Cepstral Coefficients are used to extract the audio features, and the Active Appearance Model and Active Shape Model are used to extract appearance- and shape-based features from the facial image. Principal Component Analysis is used to reduce the dimensionality of the large feature vector, and a vector normalization algorithm is used to normalize it. Features and decisions are fused at two different levels, and finally the outputs of four different classifiers are combined in parallel to obtain the identification result. The performance of all the uni-modal and multi-modal systems has been evaluated and compared on the VALID audio-visual multi-modal database, which contains both vocal and visual biometric modalities.
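
The fusion steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, array sizes, and score values are all hypothetical. Feature-level fusion is assumed here to be plain concatenation followed by PCA and unit-length normalization, and decision-level fusion is simplified to an equally weighted sum of per-classifier log-likelihoods, which stands in for the likelihood ratio based score fusion listed in the keywords.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit length (one possible reading of 'vector normalization')."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.where(norms == 0, 1.0, norms)

def pca_reduce(x, n_components):
    """Project the rows of x onto the top principal components."""
    centered = x - x.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def fuse_features(audio, visual, n_components):
    """Feature-level fusion (assumed): concatenate, reduce with PCA, normalize."""
    joint = np.hstack([audio, visual])
    return l2_normalize(pca_reduce(joint, n_components))

def fuse_decisions(log_likelihoods, weights=None):
    """Decision-level fusion (assumed): weighted sum of classifier log-likelihoods.

    log_likelihoods has shape (n_classifiers, n_speakers).
    Returns the winning speaker index and the fused score vector.
    """
    scores = np.asarray(log_likelihoods, dtype=float)
    if weights is None:
        weights = np.full(scores.shape[0], 1.0 / scores.shape[0])
    fused = np.asarray(weights, dtype=float) @ scores
    return int(np.argmax(fused)), fused

# Synthetic stand-ins for real features: 20 samples each.
rng = np.random.default_rng(0)
audio = rng.normal(size=(20, 12))   # e.g. cepstral coefficients
visual = rng.normal(size=(20, 30))  # e.g. shape and appearance parameters
fused_features = fuse_features(audio, visual, n_components=8)

# Four classifiers scoring three enrolled speakers (made-up numbers).
scores = [[-10.0, -12.0, -15.0],
          [-11.0, -9.0, -14.0],
          [-13.0, -8.0, -12.0],
          [-9.0, -10.0, -16.0]]
speaker, fused_scores = fuse_decisions(scores)
```

With equal weights, the second speaker (index 1) has the highest fused score in this toy example, although two of the four classifiers individually prefer the first speaker. This is the kind of disagreement that parallel combination of classifier outputs is meant to resolve.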

Index Terms

Computer Science
Information Sciences

Keywords

Hybrid Feature and Decision Fusion
Audio-Visual Speaker Identification
Cepstral Base Audio Features
Appearance and Shape Based Facial Features
Likelihood Ratio Based Score Fusion
Discrete Hidden Markov Model