Research Article

Identifying Relevant and Non-Redundant Features in High Dimensional Data using Automated Unsupervised Feature Selection Techniques

by Suman Laha, Utpal Roy
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Number 17
Year of Publication: 2025
DOI: 10.5120/ijca2025925227

Suman Laha, Utpal Roy. Identifying Relevant and Non-Redundant Features in High Dimensional Data using Automated Unsupervised Feature Selection Techniques. International Journal of Computer Applications. 187, 17 (Jul 2025), 36-46. DOI=10.5120/ijca2025925227

@article{10.5120/ijca2025925227,
author = {Suman Laha and Utpal Roy},
title = {Identifying Relevant and Non-Redundant Features in High Dimensional Data using Automated Unsupervised Feature Selection Techniques},
journal = {International Journal of Computer Applications},
issue_date = {Jul 2025},
volume = {187},
number = {17},
month = {Jul},
year = {2025},
issn = {0975-8887},
pages = {36-46},
numpages = {11},
url = {https://ijcaonline.org/archives/volume187/number17/identifying-relevant-and-non-redundant-features-in-highdimensional-data-using-automated-unsupervised-feature-selection-techniques/},
doi = {10.5120/ijca2025925227},
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Suman Laha
%A Utpal Roy
%T Identifying Relevant and Non-Redundant Features in High Dimensional Data using Automated Unsupervised Feature Selection Techniques
%J International Journal of Computer Applications
%@ 0975-8887
%V 187
%N 17
%P 36-46
%D 2025
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Automated unsupervised feature selection extracts relevant and non-redundant features from high-dimensional data using algorithms that examine the dataset's intrinsic structure, with the goal of improving model performance and interpretability. During pre-processing, Weighted Graph Formation (WGF) builds a graph in which features are nodes and edges are weighted by feature similarity or relevance. The Unified Dense Subgraph Detection Algorithm (UDSDA) then detects dense subgraphs in this weighted graph, uncovering clusters of related features and emphasizing the most meaningful feature connections. The Shrinking and Expansion Algorithm (SEA) refines the candidate subsets by shrinking away irrelevant features and expanding relevant ones, while Normalized Mutual Information (NMI) quantifies the information shared between features, separating relevant features from redundant ones. Experiments, implemented in Python, show that with feature selection the model achieves an accuracy of 0.92, precision of 0.91, recall of 0.93, F1 score of 0.92, a training time of 5, and a testing time of 1, whereas without feature selection it achieves an accuracy of 0.88, precision of 0.87, recall of 0.89, F1 score of 0.88, a training time of 10, and a testing time of 2. Future work includes scaling these algorithms to larger high-dimensional data, improving accuracy, and handling more diverse datasets across different fields.
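
The abstract does not spell out the WGF and NMI computations, so the following is a minimal sketch of the general idea: features become graph nodes, and pairwise NMI between discretized features (via scikit-learn's normalized_mutual_info_score) serves as the edge weight. The binning scheme and all names here are illustrative assumptions, not the authors' definitions.

```python
# Sketch: Weighted Graph Formation (WGF) with NMI edge weights.
# Assumption (not from the paper): each feature is discretized into
# equal-frequency bins before computing pairwise NMI.
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

def build_feature_graph(X, n_bins=10):
    """Return a (d x d) matrix W where W[i, j] is the NMI between
    discretized features i and j (nodes = features, weights = NMI)."""
    d = X.shape[1]
    binned = np.empty_like(X, dtype=int)
    for j in range(d):
        # Interior quantile points define equal-frequency bin edges.
        edges = np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1)[1:-1])
        binned[:, j] = np.searchsorted(edges, X[:, j])
    W = np.zeros((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            w = normalized_mutual_info_score(binned[:, i], binned[:, j])
            W[i, j] = W[j, i] = w
    return W
```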
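UDSDA itself is only named in the abstract; as a stand-in, the sketch below uses the classic greedy peeling heuristic for densest-subgraph detection (repeatedly remove the node with minimum weighted degree and keep the densest intermediate subgraph). It illustrates how a dense cluster of strongly connected features can be extracted from W, but it is not the paper's algorithm.

```python
# Sketch: dense-subgraph detection on the feature graph, using greedy
# peeling as a stand-in for the paper's UDSDA.
import numpy as np

def densest_feature_cluster(W):
    """Greedily peel minimum-degree nodes; return the node set whose
    average weighted degree (density) was highest along the way."""
    nodes = list(range(W.shape[0]))
    degrees = W.sum(axis=1)
    best_density, best_nodes = -1.0, list(nodes)
    while len(nodes) > 1:
        # Density = total intra-subgraph edge weight / number of nodes.
        density = degrees[nodes].sum() / (2 * len(nodes))
        if density > best_density:
            best_density, best_nodes = density, list(nodes)
        # Remove the node with the smallest weighted degree.
        v = min(nodes, key=lambda u: degrees[u])
        nodes.remove(v)
        for u in nodes:
            degrees[u] -= W[u, v]
    return best_nodes
```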
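Likewise, SEA's shrink and expand steps are not detailed in the abstract. A plausible minimal sketch, assuming "shrink" drops features that are highly redundant (high NMI) with an already-kept feature and "expand" adds back outside features that bring new information, with illustrative thresholds and a mean-NMI relevance proxy that are my assumptions:

```python
# Sketch: SEA-style refinement of a candidate feature subset.
# Assumptions (not from the paper): redundancy = max NMI to the kept
# set; relevance = mean NMI to all features; thresholds are illustrative.
import numpy as np

def shrink_and_expand(W, candidate, redundancy_thr=0.8, relevance_thr=0.2):
    relevance = W.mean(axis=1)  # proxy for per-feature relevance
    # Shrink: keep a feature only if it is not redundant with kept ones.
    kept = []
    for f in sorted(candidate, key=lambda f: -relevance[f]):
        if all(W[f, g] < redundancy_thr for g in kept):
            kept.append(f)
    # Expand: add outside features that are relevant and non-redundant.
    for f in range(W.shape[0]):
        if f not in kept and relevance[f] >= relevance_thr \
                and all(W[f, g] < redundancy_thr for g in kept):
            kept.append(f)
    return sorted(kept)
```

Under these assumptions the three sketches chain naturally: W = build_feature_graph(X), cluster = densest_feature_cluster(W), selected = shrink_and_expand(W, cluster).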
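Finally, the abstract compares models trained with and without the selected features, but does not state the classifier or timing protocol. The sketch below shows one common way to run such a comparison; RandomForestClassifier, the macro-averaged metrics, and wall-clock timing are assumptions for illustration only.

```python
# Sketch: comparing a model with and without feature selection.
# The paper's classifier and datasets are not named in the abstract;
# the classifier and metric choices here are assumptions.
import time
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split

def evaluate(X, y, selected=None):
    if selected is not None:
        X = X[:, selected]  # restrict to the selected features
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    clf = RandomForestClassifier(random_state=0)
    t0 = time.time()
    clf.fit(X_tr, y_tr)
    train_t = time.time() - t0
    t0 = time.time()
    pred = clf.predict(X_te)
    test_t = time.time() - t0
    return {"accuracy": accuracy_score(y_te, pred),
            "precision": precision_score(y_te, pred, average="macro"),
            "recall": recall_score(y_te, pred, average="macro"),
            "f1": f1_score(y_te, pred, average="macro"),
            "train_time": train_t, "test_time": test_t}
```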

References
  1. Barbieri, M.C., Grisci, B.I. and Dorn, M. 2024. Analysis and comparison of feature selection methods towards performance and stability. Expert Systems with Applications, 123667.
  2. Efrem, N.H. 2024. Data-Driven Supervised Classifiers in High-Dimensional Spaces: Application on Gene Expression Data.
  3. Aljawarneh, M., Hamdaoui, R., Zouinkhi, A., Alangari, S. and Abdelkrim, M.N. 2024. Energy optimization for wireless sensor network using minimum redundancy maximum relevance feature selection and classification techniques. PeerJ Computer Science, 10, e1997.
  4. Wang, H., Zhang, Y., Li, W., Wang, Z., Li, Z. and Yang, M. 2024. CLCluster: a redundancy-reduction contrastive learning-based clustering method of cancer subtype based on multi-omics data. bioRxiv, 2024-03.
  5. Lv, J., Xia, S., Liang, D. and Chen, W. 2024. EasyFS: An Efficient Model-free Feature Selection Framework via Elastic Transformation of Features. arXiv preprint arXiv:2402.05954.
  6. Robert Vincent, A.C.S. and Sengan, S. 2024. Effective clinical decision support implementation using a multi-filter and wrapper optimisation model for the Internet of Things-based healthcare data. Scientific Reports, 14(1), 21820.
  7. Das, A.K., Goswami, S., Chakrabarti, A. and Chakraborty, B. 2024. Semi-supervised feature selection using maximum mutual information and minimum correlated feature set retrieved by augmented learning. Authorea Preprints.
  8. Saranya, G., Rajendran, R., Jaganathan, S.C.B. and Pandimurugan, V. 2024. Leveraging Feature Sensitivity and Relevance: A Hybrid Feature Selection Approach for Improved Model Performance in Supervised Classification.
  9. Zhai, W., Shi, X., Wong, Y.D., Han, Q. and Chen, L. 2024. Explainable AutoML (xAutoML) with adaptive modelling for yield enhancement in semiconductor smart manufacturing. arXiv preprint arXiv:2403.12381.
  10. Shahar, N., As’ari, M.A., Swee, T.T., Ghazali, N.F., Ibrahim, B.K.K., Hisyam, A.R. and Mansor, M.A. 2024. Optimal Activity Recognition Framework based on Improvement of Regularized Neighborhood Component Analysis (RNCA). IEEE Access.
  11. El-Mageed, A.A.A., Elkhouli, A.E., Abohany, A.A. and Gafar, M. 2024. Gene selection via improved nuclear reaction optimization algorithm for cancer classification in high-dimensional data. Journal of Big Data, 11(1), 46.
  12. Hasan, S.N.S. and Jamil, N.W. 2024. A Review Study of Microarray Data Classification with the Application of Dimension Reduction. Journal of Computing Research and Innovation, 9(1), 235-256.
  13. Singh, K.N. and Mantri, J.K. 2024. A clinical decision support system using rough set theory and machine learning for disease prediction. Intelligent Medicine.
  14. Xu, X., Zhuo, L., Lu, J. and Wu, X. 2024. WSEL: EEG feature selection with weighted self-expression learning for incomplete multi-dimensional emotion recognition. In ACM Multimedia.
  15. Benghazouani, S., Nouh, S., Zakrani, A., Haloum, I. and Jebbar, M. 2024. Enhancing feature selection with a novel hybrid approach incorporating genetic algorithms and swarm intelligence techniques. International Journal of Electrical & Computer Engineering (2088-8708), 14(1).
  16. Bach, J. and Böhm, K. 2024. Alternative feature selection with user control. International Journal of Data Science and Analytics, 1-23.
  17. Elkabalawy, M., Al-Sakkaf, A., Mohammed Abdelkader, E. and Alfalah, G. 2024. CRISP-DM-Based Data-Driven Approach for Building Energy Prediction Utilizing Indoor and Environmental Factors. Sustainability, 16(17), 7249.
  18. Balestra, C. 2024. Rankings and importance scores as multi-facets of explainable machine learning (Doctoral dissertation, Technische Universität Dortmund).
  19. Diwu, P.X., Zhao, B., Wang, H., Wen, C., Nie, S., Wei, W., Li, A.Q., Xu, J. and Zhang, F. 2024. Machine learning classification algorithm screening for the main controlling factors of heavy oil CO2 huff and puff. Petroleum Research.
  20. Rebbah, F.E., Chamlal, H. and Ouaderhman, T. 2024. Accurate analysis for univariate-based filter methods for microarray data classification. Journal of Algorithms & Computational Technology, 18, 17483026241232295.
  21. Ghosh, S. and Kaur, A. 2023. Deep Unsupervised Feature Selection via Sparse Autoencoders. Journal of Machine Learning Research, 24(5), 1-20. http://www.jmlr.org/papers/volume24/ghosh23a/ghosh23a.pdf
  22. Yang, Y. and Li, X. 2022. Graph-Based Unsupervised Feature Selection. IEEE Transactions on Neural Networks and Learning Systems, 33(12), 6504-6515. DOI: 10.1109/TNNLS.2022.3154256.
  23. Chen, W. and Zhang, Z. 2024. Kernel-Based Unsupervised Feature Selection with Density Estimation. Data Mining and Knowledge Discovery, 38(1), 150-174. DOI: 10.1007/s10618-023-00812-9.
  24. Patel, V. and Kumar, R. 2022. Ensemble Learning for Unsupervised Feature Selection. Pattern Recognition Letters, 160, 21-28. DOI: 10.1016/j.patrec.2022.04.013.
  25. Lee, J. and Kim, S. 2023. Unsupervised Feature Selection via Variational Inference. Journal of Statistical Computation and Simulation, 93(2), 341-355. DOI: 10.1080/00949655.2022.2110258.
  26. Zhao, Y. and Wang, J. 2023. Reinforcement Learning for Unsupervised Feature Selection. Neural Networks, 146, 153-167. DOI: 10.1016/j.neunet.2022.11.008.
  27. Liu, H. and Yang, Y. 2022. Multi-View Unsupervised Feature Selection. Machine Learning, 111(4), 849-867. DOI: 10.1007/s10994-022-06140-6.
  28. Zhang, T. and Chen, X. 2024. Hierarchical Clustering for Feature Selection in Unsupervised Learning. Knowledge-Based Systems, 261, 109933. DOI: 10.1016/j.knosys.2023.109933.
  29. Koren, Y. and Shalev-Shwartz, S. 2023. Unsupervised Feature Selection via Information-Theoretic Measures. Entropy, 25(3), 450. DOI: 10.3390/e25030450.
  30. Wu, Y. and Chen, J. 2024. Self-Supervised Learning for Unsupervised Feature Selection. Artificial Intelligence, 57(1), 123-142. DOI: 10.1007/s10462-023-10326-5.
Index Terms

Computer Science
Information Sciences

Keywords

Weighted Graph Formation, Normalized Mutual Information, Shrinking and Expansion Algorithm, Unified Dense Subgraph Detection Algorithm, Feature Selection, High-Dimensional Data