International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Number 17
Year of Publication: 2025
Authors: Suman Laha, Utpal Roy
Suman Laha, Utpal Roy. Identifying Relevant and Non-Redundant Features in High Dimensional Data using Automated Unsupervised Feature Selection Techniques. International Journal of Computer Applications. 187, 17 (Jul 2025), 36-46. DOI=10.5120/ijca2025925227
Automated unsupervised feature selection extracts relevant and non-redundant features from high-dimensional data through algorithms that examine the dataset's intrinsic structure; the goal is to improve model performance and interpretability. In data pre-processing, Weighted Graph Formation (WGF) builds a graph in which features are nodes and edges are weighted by feature similarity or relevance. The Unified Dense Subgraph Detection Algorithm (UDSDA) then detects dense subgraphs in this weighted graph, uncovering clusters of strongly connected features and emphasizing the most meaningful feature relationships. The Shrinking and Expansion Algorithm (SEA) refines candidate subsets by shrinking away irrelevant features and expanding the subset with relevant ones. Finally, Normalized Mutual Information (NMI) quantifies the information shared between features, helping to separate relevant features from redundant ones. Illustrative sketches of each stage follow below.

The approach, implemented in Python, was evaluated with and without feature selection:

| Metric | With feature selection | Without feature selection |
| --- | --- | --- |
| Accuracy | 0.92 | 0.88 |
| Precision | 0.91 | 0.87 |
| Recall | 0.93 | 0.89 |
| F1 score | 0.92 | 0.88 |
| Training time | 5 | 10 |
| Testing time | 1 | 2 |

Future work includes scaling these algorithms to larger high-dimensional datasets, improving accuracy, and handling more diverse datasets across different fields.
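A minimal sketch of the WGF step, in Python since that is the paper's implementation language. The paper does not name the similarity measure, so absolute Pearson correlation between feature columns is assumed here as the edge weight; the function name and toy data are illustrative, not the authors' code.

```python
import numpy as np

def weighted_graph(X: np.ndarray) -> np.ndarray:
    """Return a d x d weighted adjacency matrix over the d columns of X.

    Nodes are features; the edge weight between features i and j is
    |corr(X[:, i], X[:, j])|, with the diagonal zeroed (no self-loops).
    """
    W = np.abs(np.corrcoef(X, rowvar=False))  # feature-to-feature similarity
    np.fill_diagonal(W, 0.0)
    return W

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 6))                    # 200 samples, 6 features
    X[:, 5] = X[:, 0] + 0.05 * rng.normal(size=200)  # make feature 5 redundant
    print(np.round(weighted_graph(X), 2))            # strong edge between 0 and 5
```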
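The paper does not spell out UDSDA's internal steps, so the sketch below stands in the classical greedy peeling heuristic for dense subgraph detection (Charikar's algorithm): repeatedly remove the node with the smallest weighted degree and keep the densest intermediate subgraph. The function name and toy graph are hypothetical.

```python
import numpy as np

def densest_subgraph(W: np.ndarray) -> list[int]:
    """Greedy peeling: return the node set of the densest subgraph found."""
    alive = list(range(W.shape[0]))
    best_nodes, best_density = list(alive), -1.0
    while alive:
        sub = W[np.ix_(alive, alive)]
        density = sub.sum() / (2.0 * len(alive))    # total edge weight per node
        if density > best_density:
            best_density, best_nodes = density, list(alive)
        alive.pop(int(np.argmin(sub.sum(axis=1))))  # peel the weakest node
    return best_nodes

if __name__ == "__main__":
    # Toy feature graph: features 0-2 are tightly linked, 3-4 only weakly.
    W = np.array([[0.0, 0.9, 0.8, 0.1, 0.1],
                  [0.9, 0.0, 0.85, 0.1, 0.1],
                  [0.8, 0.85, 0.0, 0.1, 0.1],
                  [0.1, 0.1, 0.1, 0.0, 0.2],
                  [0.1, 0.1, 0.1, 0.2, 0.0]])
    print(densest_subgraph(W))  # -> [0, 1, 2]
```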
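SEA is described only at a high level, so this sketch assumes a simple relevance-minus-redundancy objective over the feature graph: a subset scores well when its features have high weighted degree overall but low pairwise similarity to each other. Both the objective and the alternating shrink/expand passes are one plausible reading, not the authors' exact formulation.

```python
import numpy as np

def score(W: np.ndarray, subset: list[int]) -> float:
    """Assumed subset quality: average relevance minus average redundancy."""
    if not subset:
        return 0.0
    relevance = W[subset].sum(axis=1).mean() / W.shape[0]  # mean weighted degree
    n = len(subset)
    redundancy = W[np.ix_(subset, subset)].sum() / (n * (n - 1)) if n > 1 else 0.0
    return relevance - redundancy

def sea(W: np.ndarray, subset: list[int], max_rounds: int = 10) -> list[int]:
    """Alternate shrink (drop harmful features) and expand (add helpful ones)."""
    subset = list(subset)
    for _ in range(max_rounds):
        changed = False
        for f in list(subset):  # shrink pass
            trial = [g for g in subset if g != f]
            if score(W, trial) > score(W, subset):
                subset, changed = trial, True
        for f in range(W.shape[0]):  # expand pass
            if f not in subset and score(W, subset + [f]) > score(W, subset):
                subset, changed = subset + [f], True
        if not changed:
            break
    return subset

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.random((6, 6))
    W = np.triu(A, 1) + np.triu(A, 1).T  # random symmetric weights, zero diagonal
    print(sea(W, [0]))                   # refined subset grown from a single seed
```

Seeded with the dense cluster found by the previous sketch, the same routine keeps or prunes features until the score stops improving.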
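NMI itself is standard; the sketch below computes it from scratch with NumPy after quantile-binning continuous features, normalizing by the geometric mean of the two entropies. The binning choice is an assumption. A high NMI between two features flags a redundant pair; a low NMI suggests the features carry distinct information.

```python
import numpy as np

def nmi(x: np.ndarray, y: np.ndarray, bins: int = 10) -> float:
    """Normalized mutual information of two 1-D arrays after quantile binning."""
    edges = np.linspace(0, 1, bins + 1)[1:-1]        # interior quantile levels
    xb = np.digitize(x, np.quantile(x, edges))
    yb = np.digitize(y, np.quantile(y, edges))
    joint, _, _ = np.histogram2d(xb, yb, bins=bins)
    pxy = joint / joint.sum()                        # joint distribution
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)        # marginals
    nz = pxy > 0
    mi = (pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])).sum()
    hx = -(px[px > 0] * np.log(px[px > 0])).sum()    # marginal entropies
    hy = -(py[py > 0] * np.log(py[py > 0])).sum()
    return mi / np.sqrt(hx * hy) if hx > 0 and hy > 0 else 0.0

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=1000)
    print(round(nmi(a, a + 0.01 * rng.normal(size=1000)), 2))  # high: redundant pair
    print(round(nmi(a, rng.normal(size=1000)), 2))             # near 0: unrelated pair
```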