Gradient boosting is a powerful machine learning algorithm used for both classification and regression. It builds a predictive model by boosting, a method for combining many weak learners into a single strong learner.
Gradient boosting trains its models in a gradual, additive, and sequential manner: each new model is fit to reduce the errors left by the models before it.
It involves three elements:
A loss function to be optimized.
A weak learner to make predictions.
An additive model that adds weak learners one at a time to minimize the loss function.
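To make the three elements concrete, here is a minimal from-scratch sketch for one-dimensional regression. It rests on several assumptions chosen for illustration: squared error is the loss, a one-split decision stump is the weak learner, and the function names (fit_stump, fit_gradient_boosting, predict, training_mse) and toy data are hypothetical. It is not how scikit-learn implements gradient boosting.

```python
# Minimal gradient boosting sketch: squared-error loss, decision stumps,
# and an additive model. All names and data here are illustrative assumptions.

def fit_stump(xs, residuals):
    """Weak learner: one threshold with a constant prediction on each side."""
    best = None
    distinct = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive distinct x values.
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        sse = (sum((r - left_value) ** 2 for r in left)
               + sum((r - right_value) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_gradient_boosting(xs, ys, n_estimators=50, learning_rate=0.1):
    """Additive model: start from the mean, then add one stump per round."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_estimators):
        # For the loss 0.5 * (y - F)^2, the negative gradient with respect
        # to the prediction F is the residual y - F.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x)
                 for x, p in zip(xs, preds)]
    return {'base': base, 'learning_rate': learning_rate, 'stumps': stumps}


def predict(model, x):
    correction = sum(stump_predict(s, x) for s in model['stumps'])
    return model['base'] + model['learning_rate'] * correction


def training_mse(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data (an assumption for the demo): y = x^2 on ten points.
xs = [float(i) for i in range(10)]
ys = [x * x for x in xs]

for n in (0, 1, 50):
    model = fit_gradient_boosting(xs, ys, n_estimators=n)
    print('stumps:', n, ' training MSE:', training_mse(model, xs, ys))
```

The training error should shrink as stumps are added, because each stump is fit to the residuals of the ensemble built so far. The learning rate scales each stump's contribution, so smaller values need more stumps to reach the same fit.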
# importing required libraries
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
# read the train and test dataset
train_data = pd.read_csv('train-data.csv')
test_data = pd.read_csv('test-data.csv')
# shape of the dataset
print('Shape of training data :',train_data.shape)
print('Shape of testing data :',test_data.shape)
# separate the independent and target variables on training data
x_train = train_data.drop(columns=['Income'])
y_train = train_data['Income']
# separate the independent and target variables on testing data
x_test = test_data.drop(columns=['Income'])
y_test = test_data['Income']
# create the model: 100 boosting stages, each tree at most 5 levels deep
model = GradientBoostingClassifier(n_estimators=100, max_depth=5)
# fit the model with the training data
model.fit(x_train,y_train)
# predict the target on the train dataset
predict_train = model.predict(x_train)
print('\nTarget on train data',predict_train)
# Accuracy Score on train dataset
accuracy_train = accuracy_score(y_train, predict_train)
print('\naccuracy_score on train dataset : ', accuracy_train)
# predict the target on the test dataset
predict_test = model.predict(x_test)
print('\nTarget on test data',predict_test)
# Accuracy Score on test dataset
accuracy_test = accuracy_score(y_test,predict_test)
print('\naccuracy_score on test dataset : ', accuracy_test)
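Because the model is built additively, one tree at a time, it can be useful to see how accuracy changes as trees are added. The sketch below uses scikit-learn's staged_predict for this. It assumes synthetic data from make_classification as a stand-in for the CSV files above, and the variable names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; the tutorial's CSV files are not used here).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = GradientBoostingClassifier(n_estimators=100, max_depth=5, random_state=0)
model.fit(X_train, y_train)

# staged_predict yields the ensemble's predictions after 1, 2, ..., 100 trees.
staged_accuracy = [accuracy_score(y_test, stage_pred)
                   for stage_pred in model.staged_predict(X_test)]

print('Test accuracy after 1 tree    :', staged_accuracy[0])
print('Test accuracy after all trees :', staged_accuracy[-1])
```

If accuracy on held-out data levels off or drops well before the last stage, a smaller n_estimators may be enough. Choosing that value is better done on a separate validation split than on the test set.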