Research Article

Subspace-based Representations for Acoustic Scene Classification

by Akansha Tyagi, Padmanabhan Rajan
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Number 3
Year of Publication: 2025
Authors: Akansha Tyagi, Padmanabhan Rajan
10.5120/ijca2025924777

Akansha Tyagi, Padmanabhan Rajan. Subspace-based Representations for Acoustic Scene Classification. International Journal of Computer Applications. 187, 3 (May 2025), 1-8. DOI=10.5120/ijca2025924777

@article{ 10.5120/ijca2025924777,
author = { Akansha Tyagi and Padmanabhan Rajan },
title = { Subspace-based Representations for Acoustic Scene Classification },
journal = { International Journal of Computer Applications },
issue_date = { May 2025 },
volume = { 187 },
number = { 3 },
month = { May },
year = { 2025 },
issn = { 0975-8887 },
pages = { 1-8 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume187/number3/subspace-based-representations-for-acoustic-scene-classification/ },
doi = { 10.5120/ijca2025924777 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2025-05-17T02:45:46.437466+05:30
%A Akansha Tyagi
%A Padmanabhan Rajan
%T Subspace-based Representations for Acoustic Scene Classification
%J International Journal of Computer Applications
%@ 0975-8887
%V 187
%N 3
%P 1-8
%D 2025
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Real-world acoustic scene data has a complex structure that leads to high levels of overlap within an acoustic scene class. This overlap stems from factors such as differing recording devices and recording locations or cities, which act as confounders. At the same time, the same set of confounding factors is present across different acoustic scene classes and can be treated as a common link between them. Utilizing this common structure, it is possible to perform multi-block analysis to learn a representation of these common links. Two formulations are proposed for the multi-block analysis of acoustic scene data, employing a common orthogonal basis extraction algorithm. The proposed formulations enhance the performance of the acoustic scene classification system by reducing the information pertaining to recording devices and cities in the learnt acoustic scene representations. Experiments were conducted on five standard Detection and Classification of Acoustic Scenes and Events (DCASE) datasets. Across all datasets, the classification performance achieved using features derived from the multi-block formulations surpassed that obtained with features that do not incorporate them.
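
To make the idea of a common basis concrete, the following is a minimal NumPy sketch and not the authors' formulation. It assumes each block holds the feature vectors of one scene class arranged as features x samples, and it uses a simplified closed-form stand-in for common orthogonal basis extraction: the directions shared by all blocks are taken as the leading left singular vectors of the stacked per-block bases. The function names, block layout, and synthetic data are all illustrative assumptions.

```python
import numpy as np


def block_basis(Y, tol=1e-8):
    """Orthonormal basis for the column space of one block (features x samples)."""
    U, s, _ = np.linalg.svd(Y, full_matrices=False)
    rank = int(np.sum(s > tol * s[0]))  # numerical rank of the block
    return U[:, :rank]


def common_basis(blocks, n_common):
    """Estimate directions that lie (nearly) in every block's column space.

    Simplified stand-in for common orthogonal basis extraction: the leading
    left singular vectors of the stacked bases are the top eigenvectors of
    the sum of the per-block projectors. A score of 1.0 means the direction
    is contained in every block's subspace.
    """
    Q = np.hstack([block_basis(Y) for Y in blocks])
    U, s, _ = np.linalg.svd(Q, full_matrices=False)
    scores = s[:n_common] ** 2 / len(blocks)
    return U[:, :n_common], scores


def remove_common(X, A):
    """Project feature vectors (columns of X) onto the complement of span(A)."""
    return X - A @ (A.T @ X)


# Synthetic demo (hypothetical data): three "scene" blocks share two common
# directions C (playing the role of device/city factors) and each has three
# individual directions of its own.
rng = np.random.default_rng(0)
n_feat, n_common, n_indiv, n_samp = 20, 2, 3, 50
C, _ = np.linalg.qr(rng.standard_normal((n_feat, n_common)))
blocks = []
for _ in range(3):
    U_k = rng.standard_normal((n_feat, n_indiv))
    mix = rng.standard_normal((n_common + n_indiv, n_samp))
    blocks.append(np.hstack([C, U_k]) @ mix)

A, scores = common_basis(blocks, n_common)
cleaned = [remove_common(Y, A) for Y in blocks]

# How well span(A) recovers span(C), and how much of C survives the removal.
recovery_error = np.linalg.norm(C - A @ (A.T @ C))
leakage = max(np.linalg.norm(C.T @ Z) for Z in cleaned)
```

In this toy setting the shared directions are exact, so the estimated basis spans the planted common subspace and the cleaned blocks carry essentially no energy along it. With real features the shared structure is only approximate, and the number of common components would need to be chosen empirically.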

Index Terms

Computer Science
Information Sciences

Keywords

Acoustic Scene Classification, Multi-block Analysis, Subspace-based representations, Intra-scene variation, Recording device, Recording city, Detection and Classification of Acoustic Scenes and Events