Research Article

A Revisit to Speech Processing and Analysis

by Aniruddha Mohanty, Ravindranath C. Cherukuri
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 175 - Number 30
Year of Publication: 2020
DOI: 10.5120/ijca2020920840

Aniruddha Mohanty, Ravindranath C. Cherukuri. A Revisit to Speech Processing and Analysis. International Journal of Computer Applications 175, 30 (Nov 2020), 1-6. DOI=10.5120/ijca2020920840

@article{10.5120/ijca2020920840,
  author     = {Aniruddha Mohanty and Ravindranath C. Cherukuri},
  title      = {A Revisit to Speech Processing and Analysis},
  journal    = {International Journal of Computer Applications},
  issue_date = {Nov 2020},
  volume     = {175},
  number     = {30},
  month      = {Nov},
  year       = {2020},
  issn       = {0975-8887},
  pages      = {1-6},
  numpages   = {6},
  url        = {https://ijcaonline.org/archives/volume175/number30/31638-2020920840/},
  doi        = {10.5120/ijca2020920840},
  publisher  = {Foundation of Computer Science (FCS), NY, USA},
  address    = {New York, USA}
}
%0 Journal Article
%A Aniruddha Mohanty
%A Ravindranath C. Cherukuri
%T A Revisit to Speech Processing and Analysis
%J International Journal of Computer Applications
%@ 0975-8887
%V 175
%N 30
%P 1-6
%D 2020
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Speech recognition is an active area of signal processing. Researchers have investigated many facets of speech recognition systems, including feature extraction techniques, speech classifiers, statistical analysis, mathematical models, signal processing and transformations, databases, and performance evaluation. In the current era, multi-speaker analysis is a newly focused area in speech processing and analysis. It involves audio segmentation, extraction of relevant features, classification of features, template generation, and training. Techniques such as the bank-of-filters, the Linear Predictive Coding model, Vector Quantization, the Hidden Markov Model, and the Gaussian Mixture Model are also applied to obtain better results. In this paper, various approaches are analyzed based on acoustic and articulatory features, with a focus on the Human Auditory System (HAS). Cross-functional approaches that use machine learning, artificial intelligence-based techniques, and neural networks are also examined.
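
As a concrete illustration of the pipeline the abstract describes (feature extraction followed by statistical modeling and matching), the sketch below enrolls two speakers by fitting one Gaussian Mixture Model per speaker on MFCC frames and then identifies a test utterance by maximum average log-likelihood. This is a minimal sketch, not the paper's implementation: it assumes the librosa and scikit-learn Python libraries, and the fake_speaker helper is a hypothetical stand-in that substitutes filtered noise for real enrollment audio.

import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

SR = 16000  # sample rate in Hz

def fake_speaker(speaker_seed, noise_seed, seconds=2.0):
    # Hypothetical stand-in for real speech: white noise shaped by a fixed,
    # speaker-specific filter, so each "speaker" has a stable spectral envelope.
    kernel = np.random.default_rng(speaker_seed).standard_normal(64) * np.hanning(64)
    noise = np.random.default_rng(noise_seed).standard_normal(int(SR * seconds))
    return np.convolve(noise, kernel, mode="same").astype(np.float32)

def mfcc_frames(y):
    # 13 MFCCs per frame; each frame is one training/scoring sample for the GMM.
    return librosa.feature.mfcc(y=y, sr=SR, n_mfcc=13).T

# Enrollment: fit one GMM per speaker on that speaker's MFCC frames.
models = {}
for name, speaker_seed, noise_seed in [("speaker_a", 0, 100), ("speaker_b", 1, 101)]:
    gmm = GaussianMixture(n_components=4, covariance_type="diag", random_state=0)
    models[name] = gmm.fit(mfcc_frames(fake_speaker(speaker_seed, noise_seed)))

# Identification: score an unseen utterance under each model, pick the most likely.
test = mfcc_frames(fake_speaker(0, 999, seconds=1.0))  # new audio from speaker_a
print(max(models, key=lambda name: models[name].score(test)))  # expected: speaker_a

Swapping each GMM for a neural classifier trained on the same MFCC frames gives the machine-learning variant the abstract points to; the enrollment-and-scoring structure is unchanged.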

References
  1. Praphulla A Sawakare, Ratnadeep R Deshmukh, and Pukhraj P Shrishrimal. Speech recognition techniques: A review. International Journal of Scientific & Engineering Research, 6(8):1693–1698, 2015.
  2. MA Anusuya and Shriniwas K Katti. Speech recognition by machine, a review. arXiv preprint arXiv:1001.2267, 2010.
  3. UG Patil, SD Shirbahadurkar, and AN Paithane. Automatic speech recognition of isolated words in Hindi language using MFCC. In 2016 International Conference on Computing, Analytics and Security Trends (CAST), pages 433–438. IEEE, 2016.
  4. Shobha Bhatt, Amita Dev, and Anurag Jain. Confusion analysis in phoneme-based speech recognition in Hindi. Journal of Ambient Intelligence and Humanized Computing, pages 1–26, 2020.
  5. PS Praveen Kumar and HS Jayanna. Performance analysis of hybrid automatic continuous speech recognition framework for Kannada dialect. In 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT), pages 1–6. IEEE, 2019.
  6. Mahda Nasrolahzadeh, Zeynab Mohammadpoory, and Javad Haddadnia. Higher-order spectral analysis of spontaneous speech signals in Alzheimer's disease. Cognitive Neurodynamics, 12(6):583–596, 2018.
  7. Muhammad Atif Imtiaz and Gulistan Raja. Isolated word automatic speech recognition (ASR) system using MFCC, DTW & KNN. In 2016 Asia Pacific Conference on Multimedia and Broadcasting (APMediaCast), pages 106–110. IEEE, 2016.
  8. Ratnadeep Deshmukh and Abdulmalik Alasadi. Automatic speech recognition techniques: A review, 2018.
  9. Aidar Khusainov and Alfira Khusainova. Speech analysis and synthesis systems for the Tatar language. In 2016 IEEE Artificial Intelligence and Natural Language Conference (AINL), pages 1–6. IEEE, 2016.
  10. Vinod Chandran and Boualem Boashash. Time-frequency methods in radar, sonar, and acoustics. Time-frequency signal analysis and processing (Second Edition): A comprehensive reference, pages 793–856, 2016.
  11. Udo Zölzer. Digital audio signal processing, volume 9. Wiley Online Library, 2008.
  12. Jibin Wu, Yansong Chua, and Haizhou Li. A biologically plausible speech recognition framework based on spiking neural networks. In 2018 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2018.
  13. Mehmet Berkehan Akçay and Kaya Oğuz. Speech emotion recognition: Emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers. Speech Communication, 116:56–76, 2020.
  14. Zulfiqar Ali, M Shamim Hossain, Ghulam Muhammad, and Arun Kumar Sangaiah. An intelligent healthcare system for detection and classification to discriminate vocal fold disorders. Future Generation Computer Systems, 85:19–28, 2018.
  15. Ronald Böck, Olga Egorow, Ingo Siegert, and Andreas Wendemuth. Comparative study on normalisation in emotion recognition from speech. In International Conference on Intelligent Human Computer Interaction, pages 189–201. Springer, 2017.
  16. Win Lai Lai Phyu and Win Pa Pa. Building speaker identification dataset for noisy conditions. In 2020 IEEE Conference on Computer Applications (ICCA), pages 1–6. IEEE, 2020.
  17. Chengli Sun, Qi Zhu, and Minghua Wan. A novel speech enhancement method based on constrained low-rank and sparse matrix decomposition. Speech Communication, 60:44–55, 2014.
  18. Bo Zheng, Jinsong Hu, Ge Zhang, Yuling Wu, and Jianshuang Deng. Analysis of noise reduction techniques in speech recognition. In 2020 IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), volume 1, pages 928–933. IEEE, 2020.
  19. Atreyee Khan and Uttam Kumar Roy. Emotion recognition using prosodic and spectral features of speech and naïve Bayes classifier. In 2017 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), pages 1017–1021. IEEE, 2017.
  20. Leena Mary. Extraction and representation of prosody for speaker, language, emotion, and speech recognition. In Extraction of Prosody for Automatic Speaker, Language, Emotion and Speech Recognition, pages 23–43. Springer, 2019.
  21. Safa Chebbi and Sofia Ben Jebara. On the use of pitch-based features for fear emotion detection from speech. In 2018 4th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP), pages 1–6. IEEE, 2018.
  22. Kasiprasad Mannepalli, Panyam Narahari Sastry, and Maloji Suman. Analysis of emotion recognition system for Telugu using prosodic and formant features. In Speech and Language Processing for Human-Machine Communications, pages 137–144. Springer, 2018.
  23. Abdullah I Al-Shoshan. Speech and music classification and separation: a review. Journal of King Saud University - Engineering Sciences, 19(1):95–132, 2006.
  24. Priyanka Gupta and S Sengupta. Voiced/unvoiced decision with a comparative study of two pitch detection techniques. 2018.
  25. Garima Sharma, Kartikeyan Umapathy, and Sridhar Krishnan. Trends in audio signal feature extraction methods. Applied Acoustics, 158:107020, 2020.
  26. Paavo Alku and Rahim Saeidi. The linear predictive modeling of speech from higher-lag autocorrelation coefficients applied to noise-robust speaker recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(8):1606–1617, 2017.
  27. Mohammed Arif Mazumder and Rosalina Abdul Salam. Feature extraction techniques for speech processing: A review. 2019.
  28. Khushboo S Desai and Heta Pujara. Speaker recognition from the mimicked speech: A review. In 2016 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), pages 2254–2258. IEEE, 2016.
  29. Yong Yin, Yu Bai, Fei Ge, Huichun Yu, and Yunhong Liu. Long-term robust identification potential of a wavelet packet decomposition based recursive drift correction of e-nose data for Chinese spirits. Measurement, 139:284–292, 2019.
  30. Yu Zhou, Yanqing Sun, Jianping Zhang, and Yonghong Yan. Speech emotion recognition using both spectral and prosodic features. In 2009 International Conference on Information Engineering and Computer Science, pages 1–4. IEEE, 2009.
  31. Agustinus Bimo Gumelar, Afid Kurniawan, Adri Gabriel Sooai, Mauridhi Hery Purnomo, Eko Mulyanto Yuniarno, Indar Sugiarto, Agung Widodo, Andreas Agung Kristanto, and Tresna Maulana Fahrudin. Human voice emotion identification using prosodic and spectral feature extraction based on deep neural networks. In 2019 IEEE 7th International Conference on Serious Games and Applications for Health (SeGAH), pages 1–8. IEEE, 2019.
  32. Mohit Dua, Rajesh Kumar Aggarwal, and Mantosh Biswas. GFCC based discriminatively trained noise robust continuous ASR system for Hindi language. Journal of Ambient Intelligence and Humanized Computing, 10(6):2301–2314, 2019.
  33. Abdul Malik Badshah, Nasir Rahim, Noor Ullah, Jamil Ahmad, Khan Muhammad, Mi Young Lee, Soonil Kwon, and Sung Wook Baik. Deep features-based speech emotion recognition for smart affective services. Multimedia Tools and Applications, 78(5):5571–5589, 2019.
  34. Moataz El Ayadi, Mohamed S Kamel, and Fakhri Karray. Survey on speech emotion recognition: Features, classification schemes, and databases. Pattern Recognition, 44(3):572–587, 2011.
  35. Bhagyalaxmi Jena and Sudhansu Sekhar Singh. Analysis of stressed speech on Teager energy operator (TEO). International Journal of Pure and Applied Mathematics, 118(16):667–680, 2018.
  36. Antonio Camarena-Ibarrola, Fernando Luque, and Edgar Chavez. Speaker identification through spectral entropy analysis. In 2017 IEEE International Autumn Meeting on Power, Electronics and Computing (ROPEC), pages 1–6. IEEE, 2017.
  37. Ankita N Chadha, Mukesh A Zaveri, and Jignesh N Sarvaiya. Optimal feature extraction and selection techniques for speech processing: A review. In 2016 International Conference on Communication and Signal Processing (ICCSP), pages 1669–1673. IEEE, 2016.
  38. Ali Mirzaei, Vahid Pourahmadi, Mehran Soltani, and Hamid Sheikhzadeh. Deep feature selection using a teacher-student network. Neurocomputing, 383:396–408, 2020.
  39. Jie Cai, Jiawei Luo, Shulin Wang, and Sheng Yang. Feature selection in machine learning: A new perspective. Neurocomputing, 300:70–79, 2018.
  40. Ismail El Moudden, Mounir Ouzir, and Souad ElBernoussi. Automatic speech analysis in patients with Parkinson’s disease using feature dimension reduction. In Proceedings of the 3rd International Conference on Mechatronics and Robotics Engineering, pages 167–171, 2017.
  41. Zhen-Tao Liu, Qiao Xie, Min Wu, Wei-Hua Cao, Ying Mei, and Jun-Wei Mao. Speech emotion recognition based on an improved brain emotion learning model. Neurocomputing, 309:145–156, 2018.
  42. Qipei Mei, Mustafa Gül, and Marcus Boay. Indirect health monitoring of bridges using mel-frequency cepstral coefficients and principal component analysis. Mechanical Systems and Signal Processing, 119:523–546, 2019.
  43. Ana Rodríguez-Hoyos, David Rebollo-Monedero, José Estrada-Jiménez, Jordi Forné, and Luis Urquiza-Aguiar. Preserving empirical data utility in k-anonymous microaggregation via linear discriminant analysis. Engineering Applications of Artificial Intelligence, 94:103787, 2020.
  44. Chun-Na Li, Yuan-Hai Shao, Wotao Yin, and Ming-Zeng Liu. Robust and sparse linear discriminant analysis via an alternating direction method of multipliers. IEEE Transactions on Neural Networks and Learning Systems, 31(3):915–926, 2019.
  45. Xiaowei Zhao, Jun Guo, Feiping Nie, Ling Chen, Zhihui Li, and Huaxiang Zhang. Joint principal component and discriminant analysis for dimensionality reduction. IEEE Transactions on Neural Networks and Learning Systems, 31(2):433–444, 2019.
  46. Lawrence R Rabiner. Speech recognition based on pattern recognition approaches. In Digital Speech Processing, pages 111–126. Springer, 1992.
  47. Zuzanna Miodońska, Marcin D Bugdol, and Michał Kręcichwost. Dynamic time warping in phoneme modeling for fast pronunciation error detection. Computers in Biology and Medicine, 69:277–285, 2016.
  48. Usman Khan, Muhammad Sarim, Maaz Bin Ahmad, and Farhan Shafiq. Feature extraction and modeling techniques in speech recognition: A review. In 2019 4th International Conference on Information Systems Engineering (ICISE), pages 63–67. IEEE, 2019.
  49. S Shaikh Naziya and RR Deshmukh. Speech recognition system: A review. IOSR Journal of Computer Engineering, 18(4):3–8, 2016.
  50. Trishna Barman and Nabamita Deb. State of the art review of speech recognition using genetic algorithm. In 2017 IEEE International Conference on Power, Control, Signals and Instrumentation Engineering (ICPCSI), pages 2944–2946. IEEE, 2017.
  51. Lawrence Rabiner and Biing-Hwang Juang. Fundamentals of Speech Recognition. Prentice Hall, 1993.
  52. Kennedy Okokpujie, Etinosa Noma-Osaghae, Samuel John, and Prince C Jumbo. Automatic home appliance switching using speech recognition software and embedded system. In 2017 International Conference on Computing, Networking and Informatics (ICCNI), pages 1–4. IEEE, 2017.
  53. Matt Shannon. Optimizing expected word error rate via sampling for speech recognition. arXiv preprint arXiv:1706.02776, 2017.
  54. Dat Tat Tran. Fuzzy approaches to speech and speaker recognition. PhD thesis, University of Canberra, 2000.
Index Terms

Computer Science
Information Sciences

Keywords

Automatic speech recognition, feature extraction, dimension reduction, modeling and matching techniques