| International Journal of Computer Applications |
| Foundation of Computer Science (FCS), NY, USA |
| Volume 187 - Number 77 |
| Year of Publication: 2026 |
| Authors: Sayyada Sara Banu, Ratnadeep R. Deshmukh |
| DOI: 10.5120/ijca2026926226 |
Sayyada Sara Banu, Ratnadeep R. Deshmukh. An Interactive MFCC-Driven Hierarchical Clustering Framework for Automatic Speaker Diarization with Visual Analytics. International Journal of Computer Applications. 187, 77 (Jan 2026), 28-34. DOI=10.5120/ijca2026926226
Automatic Speaker Diarization (ASD) is the task of determining “who spoke when” in multi-speaker audio recordings without prior speaker labels. This paper presents a transparent, tunable, GUI-driven diarization framework that combines MFCC + Δ + Δ² embeddings, adaptive percentile-based Voice Activity Detection (VAD), and Agglomerative Hierarchical Clustering (AHC) with configurable distance metrics and linkage strategies. The system exposes every stage of the pipeline (preprocessing, segmentation, clustering, and post-processing) to user control, and provides visual analytics including waveform-aligned speaker timelines, spectrograms, MFCC heatmaps, PCA-based embedding scatter plots, Silhouette-driven cluster diagnostics, and conversational metrics. Experimental evaluation shows that the proposed MFCC + AHC pipeline produces stable speaker groupings with clear cluster separation and reduced fragmentation after post-processing, with a diarization error rate between 5.8% and 8.1% on the test recordings. The tool supports RTTM, CSV, and JSON export and is suitable for research, education, conversational analysis, and domain-specific diarization studies that require interpretability and flexibility.
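The two core stages named in the abstract, percentile-based VAD and agglomerative clustering, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 40th-percentile threshold, cosine distance, and average linkage are all assumptions chosen for illustration, and short hand-made vectors stand in for the MFCC + Δ + Δ² segment embeddings that a real pipeline would compute with an audio library.

```python
import math

def frame_energies(samples, frame_len, hop):
    """Short-time log energy (dB) of each frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(power + 1e-12))
    return energies

def percentile(values, q):
    """Linearly interpolated percentile, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def percentile_vad(energies, q=40.0):
    """Mark a frame as speech if its energy exceeds the q-th percentile.
    The threshold adapts to each recording; q=40 is an assumed default."""
    threshold = percentile(energies, q)
    return [e > threshold for e in energies]

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b + 1e-12)

def ahc(vectors, n_clusters, dist=cosine_distance):
    """Average-linkage agglomerative clustering (assumed linkage choice).
    Returns one cluster label per input vector."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between the two clusters.
                total = sum(dist(vectors[a], vectors[b])
                            for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for m in members:
            labels[m] = label
    return labels

# Synthetic signal: four silent frames followed by four loud frames.
signal = [0.0] * 400 + [1.0, -1.0] * 200
speech_mask = percentile_vad(frame_energies(signal, frame_len=100, hop=100))

# Toy stand-ins for per-segment embeddings from two speakers.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
speaker_labels = ahc(embeddings, n_clusters=2)
```

In this sketch the VAD mask flags only the loud frames, and the clustering step groups the first two and last two embeddings into separate speakers. Swapping `cosine_distance` for another metric, or changing how inter-cluster distance is computed, corresponds to the configurable distance and linkage options the framework describes.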