| International Journal of Computer Applications |
| Foundation of Computer Science (FCS), NY, USA |
| Volume 187 - Number 58 |
| Year of Publication: 2025 |
| Authors: Ramtin Dabiri |
10.5120/ijca2025925985
Ramtin Dabiri. Non-Invasive Abalone Sex Classification from External Measurements using Interpretable Machine Learning. International Journal of Computer Applications. 187, 58 (Nov 2025), 65-72. DOI=10.5120/ijca2025925985
Accurate sex classification of abalone is essential for selective breeding and ethical harvesting, yet many existing studies rely on invasive measurements (e.g., internal weights), limiting real-world deployment. This study contributes two innovations motivated by practical field constraints. First, a strictly non-invasive framework is adopted, using only external traits (length, diameter, height, and whole weight), so specimens are not opened. Second, instead of the common rank-then-select approach, a ranking-guided combinatorial search over polynomial and interaction terms (degree ≤ 5) is applied for multinomial logistic regression. This design is motivated by three considerations: (1) standard ranking methods (ANOVA, Mutual Information, Random Forest) evaluate variables largely in isolation, whereas the sex signal emerges from feature–feature interactions; (2) relationships among external measurements are partly non-linear, so higher-order terms capture structure missed by base features or linear models; and (3) rankings can be unstable under collinearity and outliers, so empirically validating candidate feature sets is more robust than relying on rankings alone. Under an outlier-inclusive protocol, a compact model excluding diameter attains 0.5689 test accuracy, while an all-four-measurements model reaches 0.5641; both exceed the commonly reported 0.50–0.55 range for this dataset while avoiding invasive measurements. The curated interaction design enables logistic regression to outperform more complex models (e.g., tuned SVM and XGBoost), indicating that interaction construction, rather than model complexity, is the key driver of accuracy under non-invasive constraints. The resulting pipeline is interpretable, field-deployable, and supported by fully reproducible code.
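To make the idea of a ranking-guided combinatorial search concrete, the following is a minimal sketch and not the paper's actual code. It enumerates every polynomial and interaction term of degree ≤ 5 over the four external measurements, then exhaustively scores small subsets drawn from the top-ranked terms. The names `monomials`, `evaluate`, `guided_search`, `top_k`, `max_size`, and `score_fn` are all assumptions introduced for illustration. In a real pipeline `score_fn` would presumably fit a multinomial logistic regression and return validation accuracy; here it is replaced by a toy placeholder, and the "ranking" is simply the enumeration order.

```python
from itertools import combinations, combinations_with_replacement
from math import prod

# The four non-invasive measurements named in the abstract.
BASE = ("length", "diameter", "height", "whole_weight")

def monomials(features, max_degree):
    """All polynomial/interaction terms of total degree 1..max_degree.

    Each term is a tuple of feature names, e.g.
    ("length", "length", "height") stands for length**2 * height.
    """
    terms = []
    for degree in range(1, max_degree + 1):
        terms.extend(combinations_with_replacement(features, degree))
    return terms

def evaluate(term, row):
    """Value of one monomial for a specimen given as a dict of measurements."""
    return prod(row[name] for name in term)

def guided_search(ranked_terms, score_fn, top_k=8, max_size=3):
    """Score every subset (size 1..max_size) of the top_k ranked terms.

    ranked_terms: terms ordered best-first by any ranking method.
    score_fn: hypothetical callable mapping a tuple of terms to a
              validation score (assumed: higher is better).
    """
    pool = ranked_terms[:top_k]
    best_score, best_subset = float("-inf"), ()
    for size in range(1, max_size + 1):
        for subset in combinations(pool, size):
            score = score_fn(subset)
            if score > best_score:
                best_score, best_subset = score, subset
    return best_subset, best_score

def toy_score(subset):
    # Placeholder only, NOT a real model: rewards cross-feature terms
    # and penalises subset size. Swap in cross-validated accuracy.
    interactions = sum(len(set(term)) > 1 for term in subset)
    return interactions - 0.1 * len(subset)

terms = monomials(BASE, 5)                # candidate terms, degree 1..5
best_subset, best_score = guided_search(terms, toy_score)
print(len(terms), best_subset)
```

Four base features at degree ≤ 5 yield 125 candidate terms, so scoring every possible subset is infeasible. Restricting the exhaustive search to a small pool of top-ranked terms is one plausible way to keep the number of model fits manageable while still validating feature sets empirically.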