| International Journal of Computer Applications |
| Foundation of Computer Science (FCS), NY, USA |
| Volume 187 - Number 77 |
| Year of Publication: 2026 |
| Authors: Sayyada Sara Banu, Ratnadeep R. Deshmukh |
DOI: 10.5120/ijca2026926227
Sayyada Sara Banu, Ratnadeep R. Deshmukh. Experimental Analysis of an Interactive MFCC + AHC Speaker Diarization Framework Across Multi-Domain Audio Conditions. International Journal of Computer Applications 187, 77 (Jan 2026), 35-43. DOI=10.5120/ijca2026926227
Automatic Speaker Diarization (ASD)—the process of determining “who spoke when”—is essential for transcription, conversational analytics, call-center monitoring, courtroom recordings, and multilingual human–computer interaction. Classical systems based on MFCCs, GMMs, and hierarchical clustering are interpretable but struggle in noisy, overlapping, and diverse audio conditions, while modern deep-learning approaches like x-vectors, ECAPA-TDNN, and Wav2Vec 2.0 offer higher accuracy but lack transparency. This study evaluates a visualization-enhanced MFCC–GMM–AHC diarization framework across AMI, VoxCeleb, CALLHOME, Mozilla Common Voice, and a custom English–Hindi dataset. The system integrates adaptive VAD, MFCC + Δ + Δ² features, GMM modeling, AHC clustering, and Viterbi re-segmentation with rich diagnostic tools. Results show strong segmentation quality and speaker separability, with DER improving from 12.8% (MFCC–GMM) to 4.7% (Wav2Vec 2.0). The framework demonstrates robust, interpretable, and multi-domain performance.
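The AHC stage of the pipeline above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes each speech segment has already been summarized as a fixed-length feature vector (e.g. mean MFCC + Δ + Δ² statistics), and it substitutes synthetic two-speaker data for real audio so the example stays self-contained. The choice of average linkage and Euclidean distance is likewise an assumption for illustration.

```python
# Sketch of the agglomerative-hierarchical-clustering (AHC) stage of a
# diarization pipeline. Real MFCC + delta + delta-delta extraction is
# omitted; each segment is represented by a synthetic 13-dim feature vector.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)

# Two synthetic "speakers": 10 segments each, with well-separated means
# standing in for per-segment mean-MFCC statistics.
speaker_a = rng.normal(loc=0.0, scale=0.3, size=(10, 13))
speaker_b = rng.normal(loc=3.0, scale=0.3, size=(10, 13))
segments = np.vstack([speaker_a, speaker_b])

# Build the dendrogram with average linkage on Euclidean distance, then
# cut it into 2 clusters = 2 hypothesized speakers. In practice the cut
# is chosen by a distance threshold or a stopping criterion, not a fixed k.
Z = linkage(segments, method="average", metric="euclidean")
labels = fcluster(Z, t=2, criterion="maxclust")

print(labels)  # segments from the same speaker end up with the same label
```

In a full system these cluster labels would then seed per-speaker GMMs and a Viterbi re-segmentation pass to refine the segment boundaries.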