Feature Ranking in Machine Learning

FEATURE RANKING and FEATURE SELECTION

Feature Selection (FS) and Feature Ranking (FR) both aim to identify the subset of a dataset's available features that is associated with the response variable, while excluding irrelevant and redundant features. An effective feature selection or feature ranking process mitigates the problems associated with large datasets, in that it results in:

1- Better classification performance,
2- Reduced storage and computational cost,
3- Models that generalize better and are easier to interpret.

The recent emergence of datasets with very large numbers of features has made pattern recognition increasingly challenging. In particular, such high numbers of features give rise to issues such as:

1- Overfitting, poor generalization, and inferior prediction performance,
2- Slow and computationally expensive predictors,
3- Difficulty in comprehending the underlying process.

Feature Ranking (FR) selects the n most significant features for a problem by ranking features according to their importance in the model. An alternative approach to dimensionality reduction is feature extraction (FE), in which the original features are first combined and then projected into a new feature space of lower dimensionality. A major downside of feature extraction is that the transformed features lose their physical meaning, which complicates further analysis and makes the model difficult to interpret. Thus, feature selection is superior to feature extraction in terms of readability and interpretability.
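The idea of ranking features and keeping the top n can be sketched in a few lines of Python. This is only an illustration: the function names `pearson` and `rank_features` are hypothetical, and the score used here (absolute Pearson correlation with the response) is an assumption standing in for whatever importance measure a given model provides.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def rank_features(columns, y, n):
    """Return the names of the n features with the highest score.

    columns: dict mapping feature name -> list of values (one per sample).
    Score (an assumption for this sketch): |Pearson correlation| with y.
    """
    scores = {name: abs(pearson(values, y)) for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n]


# Toy data: one informative feature, one weakly related, one constant.
columns = {
    "signal": [1, 2, 3, 4, 5],
    "noise": [3, 1, 4, 1, 5],
    "constant": [1, 1, 1, 1, 1],
}
y = [2, 4, 6, 8, 10]
print(rank_features(columns, y, 2))
```

Because the selected features keep their original names, the result remains directly interpretable, which is the advantage over feature extraction noted above.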

BSPSA is a wrapper feature selection algorithm developed by Vural Aksakalli; the complete study can be found in [1], [2]. The study introduces a new wrapper approach for FS based on a pseudo-gradient descent stochastic optimization algorithm called Binary Simultaneous Perturbation Stochastic Approximation (BSPSA). The algorithm starts with a random solution vector and moves toward the optimal solution vector through successive iterations, in each of which the individual components of the current solution vector are perturbed simultaneously by random offsets drawn from a qualified probability distribution.
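The iteration described above can be sketched roughly as follows. This is a simplified illustration of a simultaneous-perturbation update on a binary selection problem, not the published algorithm from [1], [2]: the function name `spsa_binary_sketch`, the constant gains `a` and `c`, the ±1 perturbation, the clipping to [0, 1], and the rounding at 0.5 are all assumptions made for this example.

```python
import random


def spsa_binary_sketch(loss, p, iterations=100, a=0.5, c=0.3, seed=0):
    """Simplified SPSA-style search over binary feature masks of length p.

    loss: callable taking a list of 0/1 flags and returning a number to minimize.
    a, c: step size and perturbation size (illustrative constants, assumed).
    Returns the best mask seen and its loss.
    """
    rng = random.Random(seed)

    def clip(x):
        return min(1.0, max(0.0, x))

    def to_mask(w):
        # Round the continuous solution vector to a 0/1 selection.
        return [1 if x >= 0.5 else 0 for x in w]

    # Start from a random solution vector in [0, 1]^p.
    w = [rng.random() for _ in range(p)]
    best_mask = to_mask(w)
    best_loss = loss(best_mask)

    for _ in range(iterations):
        # Perturb all components simultaneously with random +/-1 offsets.
        delta = [rng.choice((-1, 1)) for _ in range(p)]
        w_plus = [clip(wi + c * di) for wi, di in zip(w, delta)]
        w_minus = [clip(wi - c * di) for wi, di in zip(w, delta)]
        y_plus = loss(to_mask(w_plus))
        y_minus = loss(to_mask(w_minus))

        # Pseudo-gradient estimate from just two loss evaluations.
        grad = [(y_plus - y_minus) / (2 * c * di) for di in delta]

        # Descent step, kept inside [0, 1].
        w = [clip(wi - a * gi) for wi, gi in zip(w, grad)]

        mask = to_mask(w)
        current = loss(mask)
        if current < best_loss:
            best_mask, best_loss = mask, current

    return best_mask, best_loss


# Toy loss for demonstration: number of positions that differ from a target mask.
target = [1, 0, 1, 0, 0]
toy_loss = lambda mask: sum(m != t for m, t in zip(mask, target))
print(spsa_binary_sketch(toy_loss, p=len(target)))
```

In a wrapper setting, `loss` would typically be the cross-validated error of a classifier trained on the selected features; the toy loss here only keeps the example self-contained.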