A Survey on Techniques for Enhancing Speech

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2018
Authors: Tayseer M. F. Taha, Amir Hussain
DOI: 10.5120/ijca2018916290

Tayseer M F Taha and Amir Hussain. A Survey on Techniques for Enhancing Speech. International Journal of Computer Applications 179(17):1-14, February 2018.

BibTeX

@article{10.5120/ijca2018916290,
	author = {Tayseer M. F. Taha and Amir Hussain},
	title = {A Survey on Techniques for Enhancing Speech},
	journal = {International Journal of Computer Applications},
	issue_date = {February 2018},
	volume = {179},
	number = {17},
	month = {Feb},
	year = {2018},
	issn = {0975-8887},
	pages = {1-14},
	numpages = {14},
	url = {http://www.ijcaonline.org/archives/volume179/number17/28957-2018916290},
	doi = {10.5120/ijca2018916290},
	publisher = {Foundation of Computer Science (FCS), NY, USA},
	address = {New York, USA}
}

Abstract

Speech enhancement is used in almost all modern communication systems. When speech is transmitted, its quality may degrade because of interference in the environment it passes through. Interference that may affect speech quality in transit includes acoustic additive noise, acoustic reverberation, and white Gaussian noise. This paper focuses on the techniques that have appeared in the literature for enhancing the speech signal. The methods covered include the Wiener filter, statistical methods, the subspace method, and spectral subtraction, including its basic form. The paper discusses these methods along with their advantages and disadvantages. It also reviews studies by other researchers on machine learning techniques, such as neural networks, deep neural networks, and convolutional neural networks, and on optimization techniques that are used for speech enhancement.
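
As a rough illustration of one method named in the abstract, the sketch below applies basic power spectral subtraction to a single frame corrupted by additive white Gaussian noise. It is not taken from the paper. The toy signal, the frame length, the spectral floor of 0.01, and all function names are assumptions made for this example. The noise power spectrum is estimated from separate noise-only frames, and the noisy phase is reused unchanged.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(N^2) discrete Fourier transform (adequate for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse of dft(); returns the real part of each time-domain sample."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def spectral_subtraction(noisy, noise_power, floor=0.01):
    """Subtract an estimated noise power spectrum from one noisy frame.

    Each bin keeps the noisy phase. The subtracted power is floored at
    floor * (noisy power) so that it never becomes negative.
    """
    enhanced_spec = []
    for y, pn in zip(dft(noisy), noise_power):
        p = abs(y) ** 2
        clean_p = max(p - pn, floor * p)
        gain = math.sqrt(clean_p / p) if p > 0 else 0.0
        enhanced_spec.append(gain * y)
    return idft(enhanced_spec)


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


# Toy data (hypothetical): a sinusoid stands in for speech.
rng = random.Random(0)
N, SIGMA = 64, 0.3
clean = [math.sin(2 * math.pi * 4 * t / N) for t in range(N)]
noisy = [s + rng.gauss(0.0, SIGMA) for s in clean]

# Estimate the noise power spectrum by averaging over noise-only frames.
noise_frames = [[rng.gauss(0.0, SIGMA) for _ in range(N)] for _ in range(8)]
noise_power = [0.0] * N
for frame in noise_frames:
    for k, v in enumerate(dft(frame)):
        noise_power[k] += abs(v) ** 2 / len(noise_frames)

enhanced = spectral_subtraction(noisy, noise_power)
print(f"MSE noisy:    {mse(noisy, clean):.4f}")
print(f"MSE enhanced: {mse(enhanced, clean):.4f}")
```

A practical system would process overlapping windowed frames and track the noise estimate over time. The single-frame version here only shows the subtract-and-floor step.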

Keywords

Conventional speech enhancement methods, Adaptive filtering methods, Multi-modal methods