What is Feature Selection and Dimensionality Reduction In Machine Learning? | Realcode4you

realcode4you
Jan 14, 2023

Definition

A process that chooses an optimal subset of features according to an objective function.

Objectives

To reduce dimensionality and remove noise
To improve mining performance: speed of learning, predictive accuracy, and simplicity and comprehensibility of mined results

Feature Selection and dimensionality reduction:

Improve performance (speed, predictive power, simplicity of the model).
Visualize the data for model selection.
Reduce dimensionality and remove noise.

Feature selection (FS) is a process that selects an optimal subset of features according to a certain criterion.

Other reasons for performing FS may include:

Removing irrelevant data and noise.
Increasing the accuracy of learned models.
Reducing the complexity of the resulting model description, which improves understanding of both the data and the model.

Dimensionality reduction is an efficient approach to downsizing data.
It also supports visualization: high-dimensional data can be projected onto 2D or 3D.
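As a rough illustration of projecting high-dimensional data onto 2D, the sketch below applies principal component analysis to the four numeric columns of the built-in iris data. PCA is only one of several possible projection methods and is an assumption here, as are the variable names; the original text does not name a specific technique.

```r
# Project the four numeric iris measurements onto two principal components.
X <- iris[, 1:4]

# Center and scale so that each feature contributes on a comparable scale.
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Scores on the first two principal components: a 150 x 2 matrix.
proj <- pca$x[, 1:2]

# Share of the total variance carried by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(sum(var_explained[1:2]), 3))

# Scatter plot of the 2D projection, colored by species.
plot(proj, col = as.integer(iris$Species), pch = 19,
     xlab = "PC1", ylab = "PC2")
```

The printed number is the fraction of variance retained by the 2D view, which gives a quick check of how much information the projection discards.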

Applications of Dimensionality Reduction

Customer relationship management
Text mining
Image retrieval
Handwritten digit recognition
Intrusion detection

How It Works

Searching for the best subset of features.
Criteria for evaluating different subsets.

Different Aspects of Search

Search starting points

Empty set
Full set
Random point

Search directions

Sequential forward selection
Sequential backward elimination
Bidirectional generation
Random generation
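The search directions listed above can be made concrete with a small sketch of sequential forward selection, which starts from the empty set and adds one feature at a time. The scoring function `loo_1nn_accuracy` and the helper `forward_select` are hypothetical names introduced for illustration, and leave-one-out 1-nearest-neighbour accuracy is just one possible evaluation criterion.

```r
# Hypothetical scoring function: leave-one-out accuracy of a
# 1-nearest-neighbour classifier restricted to the given feature columns.
loo_1nn_accuracy <- function(X, y, features) {
  D <- as.matrix(dist(X[, features, drop = FALSE]))
  diag(D) <- Inf                      # a point may not be its own neighbour
  nearest <- apply(D, 1, which.min)   # index of the closest other point
  mean(y[nearest] == y)
}

# Sequential forward selection: start from the empty set and greedily add
# the feature that raises the score the most; stop when nothing improves it.
forward_select <- function(X, y, score_fn) {
  selected <- character(0)
  remaining <- colnames(X)
  best_score <- -Inf
  while (length(remaining) > 0) {
    scores <- sapply(remaining, function(f) score_fn(X, y, c(selected, f)))
    if (max(scores) <= best_score) break
    best_score <- max(scores)
    pick <- remaining[which.max(scores)]
    selected <- c(selected, pick)
    remaining <- setdiff(remaining, pick)
  }
  list(features = selected, score = best_score)
}

result <- forward_select(iris[, 1:4], iris$Species, loo_1nn_accuracy)
print(result)
```

Sequential backward elimination would follow the same pattern in reverse: start from the full set and drop the feature whose removal hurts the score the least.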

Other Types of High-Dimensional Data

Face Images

Models of Feature Selection

Filter model

Separating feature selection from classifier learning
Relying on general characteristics and statistics of data (correlation, distance, dependence, consistency)

Wrapper model

Relying on a predetermined classification algorithm
Using predictive accuracy as goodness measure
High accuracy, but computationally expensive

Filter algorithms

Example: a filter algorithm based on entropy measure or information gain
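A minimal sketch of such a filter is shown below: each feature is scored by its information gain with respect to the class, without training any classifier. The function names `entropy` and `info_gain` and the equal-width discretisation into three bins are assumptions made for illustration.

```r
# Shannon entropy (in bits) of a discrete variable.
entropy <- function(x) {
  p <- table(x) / length(x)
  p <- p[p > 0]                       # ignore empty levels
  -sum(p * log2(p))
}

# Information gain of a discretised feature with respect to the class:
# IG(y; x) = H(y) - H(y | x)
info_gain <- function(x, y, bins = 3) {
  xb <- cut(x, breaks = bins)         # equal-width bins (an assumption)
  cond <- sum(sapply(split(y, xb), function(ys) {
    if (length(ys) == 0) return(0)    # an empty bin contributes nothing
    length(ys) / length(y) * entropy(ys)
  }))
  entropy(y) - cond
}

# Score every numeric feature of iris and rank them.
scores <- sapply(iris[, 1:4], info_gain, y = iris$Species)
print(sort(scores, decreasing = TRUE))
```

A filter would then keep the top-ranked features, for example those above a chosen threshold, before any model is trained.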

Wrapper algorithms

Example: a wrapper algorithm based on clustering or classification accuracy

Wrapper-based methods tend to give better performance because they use the target classifier inside the feature selection algorithm, but they are computationally expensive.

Filter methods are usually less accurate but faster to compute.

[Figure: filter approach]

[Figure: wrapper approach]

Drawbacks of Feature Selection in Some Cases

The resulting subsets of many FS models depend strongly on the training set size.
In some data sets many features are relevant, and removing any of them will seriously affect the learning performance.
A backward removal strategy is very slow when working with large-scale data sets.
In some cases, the FS outcome will still be left with a relatively large number of relevant features.

Example of Feature Selection in R: Wrapper Approach

In this example we will use the Boruta package.
Boruta is a feature selection algorithm. It works as a wrapper around random forest.
Random forests are an ensemble learning method for classification, regression and other tasks. They construct a multitude of decision trees at training time and output the mode of the classes (classification) or the mean prediction (regression) of the individual trees.

How does the Boruta algorithm work?

First, it adds randomness to the given data set by creating shuffled copies of all features (called shadow features).
Then, it trains a random forest classifier on the extended data set and applies a feature importance measure.
At every iteration, it checks whether each real feature has a higher importance than the best of the shadow features, and it progressively removes features deemed unimportant.
Finally, the algorithm stops either when all features are confirmed or rejected, or when it reaches a specified limit of random forest runs.
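The shadow-feature idea in the steps above can be sketched as a single round in R. This is a simplified illustration and not the Boruta implementation itself: it runs one comparison only, while Boruta repeats the procedure many times and applies a statistical test to the accumulated hits. It assumes the randomForest package is installed; the variable names are made up for the example.

```r
library(randomForest)

set.seed(111)
X <- iris[, 1:4]
y <- iris$Species

# Shadow features: a copy of every column with its values shuffled, which
# breaks any link to the class while keeping each column's distribution.
shadow <- as.data.frame(lapply(X, sample))
names(shadow) <- paste0("shadow_", names(X))
extended <- cbind(X, shadow)

# Random forest on the extended data, with permutation importance enabled.
rf <- randomForest(x = extended, y = y, importance = TRUE, ntree = 500)
imp <- importance(rf, type = 1)[, 1]   # mean decrease in accuracy

# A real feature scores a "hit" if it beats the best shadow feature.
shadow_max <- max(imp[names(shadow)])
hits <- imp[names(X)] > shadow_max
print(hits)
```

Features that keep collecting hits over many such rounds are the ones Boruta ends up confirming; those that rarely beat the best shadow feature are rejected.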

Application of Boruta algorithm and Random forest in R

Required libraries :

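library(Boruta)        # feature selection wrapper used below
library(mlbench)       # benchmark data sets (not used by the iris snippet below)
library(caret)         # model training utilities (not used by the iris snippet below)
library(randomForest)  # random forest implementation
library(reprtree)      # representative trees; not on CRAN, so it must be installed separately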

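Code Implementation

set.seed(111)  # make the random forest runs reproducible
# doTrace controls verbosity; maxRuns caps the number of random forest runs
boruta <- Boruta(Species ~ ., data = iris, doTrace = 2, maxRuns = 500)
print(boruta)  # summary of confirmed, tentative and rejected features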
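Once the run has finished, the result object can be inspected further. The sketch below repeats the call with tracing switched off so that it is self-contained, then uses `attStats`, `getSelectedAttributes` and the `plot` method from the Boruta package; which of these outputs matters most depends on the task, so treat this as one possible follow-up.

```r
library(Boruta)

set.seed(111)
boruta <- Boruta(Species ~ ., data = iris, doTrace = 0, maxRuns = 500)

# Per-feature importance statistics and the final decision.
stats <- attStats(boruta)
print(stats)

# Names of the features confirmed as important.
selected <- getSelectedAttributes(boruta, withTentative = FALSE)
print(selected)

# Box plot of importance for real and shadow features.
plot(boruta)
```

If some features remain tentative after the run limit is reached, `TentativeRoughFix(boruta)` offers a simple way to force a decision on them.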

