Random forest is a supervised machine learning algorithm that can be used for both classification and regression, although in most cases it is used for classification.
It builds a decision tree on each of several random samples of the data and combines the trees' predictions to pick the final result.
How does it work?
Step 1: Select random samples from the dataset.
Step 2: Construct a decision tree for each sample and get a prediction from each tree.
Step 3: Hold a vote over the predicted results.
Step 4: Select the most-voted prediction as the final result.
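The four steps above can be sketched in plain Python. This is only a rough illustration, not how sklearn implements it: each "tree" here is a one-split decision stump, while a real random forest grows deeper trees and also randomizes the features considered at each split. The helper names (fit_stump, fit_forest, predict_forest) and the toy data are made up for this example.

```python
import random
from collections import Counter


def majority(labels):
    """Return the most common label in a list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels):
    """Fit a one-split decision tree (a "stump") that minimises training errors."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for feature in range(len(rows[0])):
        values = sorted({row[feature] for row in rows})
        # candidate thresholds are midpoints between consecutive distinct values
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
            left_label, right_label = majority(left), majority(right)
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, left_label, right_label)
    if best is None:
        # every sampled row is identical, so always predict the majority label
        label = majority(labels)
        return (0, float("inf"), label, label)
    return best[1:]


def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label


def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Step 1: draw a random sample of the data, with replacement
        picks = [rng.randrange(len(rows)) for _ in rows]
        sample_rows = [rows[i] for i in picks]
        sample_labels = [labels[i] for i in picks]
        # Step 2: build one tree for this sample
        forest.append(fit_stump(sample_rows, sample_labels))
    return forest


def predict_forest(forest, row):
    # Steps 3 and 4: collect one vote per tree, return the most-voted label
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


# toy data (made up for illustration): one feature, two classes
rows = [[1.0], [2.0], [3.0], [7.0], [8.0], [9.0]]
labels = ["low", "low", "low", "high", "high", "high"]

forest = fit_forest(rows, labels)
print(predict_forest(forest, [0.5]), predict_forest(forest, [9.5]))
```

A single stump trained on an unlucky sample can be wrong, but the vote across many stumps smooths those mistakes out, which is the main idea behind the algorithm.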
We can easily understand it with the help of the sklearn example below:
# importing required libraries
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
# read the train and test dataset
train_data = pd.read_csv('train-data.csv')
test_data = pd.read_csv('test-data.csv')
# view the top 3 rows of the dataset
print(train_data.head(3))
# shape of the dataset
print('\nShape of training data :', train_data.shape)
print('\nShape of testing data :', test_data.shape)
# target variable – Income
# separate the independent and target variables on the training data
x_train = train_data.drop(columns=['Income'])
y_train = train_data['Income']
# separate the independent and target variables on the testing data
x_test = test_data.drop(columns=['Income'])
y_test = test_data['Income']
# create the random forest classifier with its default settings
model = RandomForestClassifier()
# fit the model with the training data
model.fit(x_train, y_train)
# number of trees used
print('Number of Trees used : ', model.n_estimators)
# predict the target on the train dataset
predict_train = model.predict(x_train)
print('\nTarget on train data', predict_train)
# Accuracy Score on train dataset
accuracy_train = accuracy_score(y_train, predict_train)
print('\naccuracy_score on train dataset : ', accuracy_train)
# predict the target on the test dataset
predict_test = model.predict(x_test)
print('\nTarget on test data', predict_test)
# Accuracy Score on test dataset
accuracy_test = accuracy_score(y_test, predict_test)
print('\naccuracy_score on test dataset : ', accuracy_test)
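
The script above depends on train-data.csv and test-data.csv being available. As a rough, self-contained variation for trying the same workflow without those files, the sketch below swaps in synthetic data from make_classification and splits it with train_test_split. The synthetic data is only a stand-in (it is not the Income dataset), and the values chosen for n_samples, n_features, test_size, n_estimators and random_state are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# synthetic stand-in for the train/test CSV files (not the Income dataset)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# random_state fixes the random sampling so that repeated runs give the same forest
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(x_train, y_train)

# compare accuracy on the data the model has seen with data it has not seen
accuracy_train = accuracy_score(y_train, model.predict(x_train))
accuracy_test = accuracy_score(y_test, model.predict(x_test))
print('accuracy_score on train dataset : ', accuracy_train)
print('accuracy_score on test dataset : ', accuracy_test)
```

The train accuracy of a random forest is usually very high because each tree is grown deep on its own sample, so the test accuracy is the more useful number when judging the model.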