Automatic Multi-Label Stuttering Detection from Speech using Attention-Enhanced Deep Neural Networks

Ali Diyaa; Engy Refaai; Alyaa Tamer; Soher Mohamed; Aya Adel Muhammed Hassan; Rana Ehab; Mohamed AbdelFattah

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 187 - Number 110

Year of Publication: 2026

10.5120/ijca28d82db3210b

Ali Diyaa, Engy Refaai, Alyaa Tamer, Soher Mohamed, Aya Adel Muhammed Hassan, Rana Ehab, Mohamed AbdelFattah. Automatic Multi-Label Stuttering Detection from Speech using Attention-Enhanced Deep Neural Networks. International Journal of Computer Applications. 187, 110 (May 2026), 1-8. DOI=10.5120/ijca28d82db3210b

Abstract

Stuttering is a speech disorder that interferes with normal speech patterns. It manifests as repeating sounds, syllables, or words; prolonging sounds for an excessive amount of time; becoming trapped in silent blocks where no sound is produced despite the speaker's best efforts to speak; or employing interjections. The speech muscles do not work properly, even though the speaker usually knows exactly what they want to say. Stuttering, which affects almost 80 million people globally, can make everyday communication challenging and frustrating. Left untreated, it frequently causes problems with social connections and self-confidence. This work presents a hybrid deep learning system for automatically identifying stuttering disfluencies in speech recordings. The method combines convolutional neural networks (CNN) for local acoustic feature extraction, bidirectional long short-term memory (BiLSTM) layers, and an attention mechanism (AM). The acoustic features used by the model include thirteen Mel-frequency cepstral coefficients (MFCCs) together with their first-order delta and second-order delta-delta derivatives. Evaluations on benchmark datasets, such as SEP-28K and FluencyBank, yield F1 scores of 97.3% to 98.9% for important disfluency types and accuracy between 97.0% and 98.2%. These results are comparable to human expert agreement.
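To make the MFCC-based part of the feature set concrete, the following is a minimal sketch of how thirteen MFCCs can be extended with first-order delta and second-order delta-delta derivatives to give 39 values per frame. It is not the authors' implementation: the regression window width of 2, the function names `delta` and `stack_mfcc_deltas`, and the random placeholder standing in for a real MFCC matrix are all assumptions made for illustration, and the abstract does not state the extraction settings actually used.

```python
import numpy as np


def delta(features: np.ndarray, width: int = 2) -> np.ndarray:
    """Regression-based delta along the time axis.

    features: array of shape (frames, coefficients).
    width: frames taken on each side (an assumed value, not from the paper).
    """
    num_frames = features.shape[0]
    # Repeat the first and last frame so edge frames have neighbours.
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(features, dtype=float)
    for n in range(1, width + 1):
        ahead = padded[width + n : width + n + num_frames]
        behind = padded[width - n : width - n + num_frames]
        out += n * (ahead - behind)
    return out / denom


def stack_mfcc_deltas(mfcc: np.ndarray, width: int = 2) -> np.ndarray:
    """Concatenate MFCCs with their delta and delta-delta per frame."""
    d1 = delta(mfcc, width)   # first-order derivative
    d2 = delta(d1, width)     # second-order derivative
    return np.concatenate([mfcc, d1, d2], axis=1)


# Placeholder input: 300 frames of 13 coefficients (random, not real speech).
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(300, 13))
features = stack_mfcc_deltas(mfcc)
print(features.shape)  # (300, 39)
```

In practice the MFCC matrix would come from an audio front end rather than a random generator, and a per-frame matrix of this shape is the kind of input a CNN followed by BiLSTM layers could consume.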

Index Terms

Computer Science

Information Sciences

Keywords

Stuttering Detection, Deep Learning, CNN, BiLSTM, Attention Mechanism, Speech Disfluency Classification