Naive Bayes is a statistical classification technique based on Bayes' theorem that is often used for very high-dimensional datasets.
Naive Bayes classifiers are built on Bayesian classification methods, which rely on Bayes' theorem to describe the relationship between the conditional probabilities of statistical quantities.
The formula for calculating the conditional probability is:
P(A|B) = P(B|A) P(A) / P(B)
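As a small worked illustration of the formula above, the sketch below computes the probability that a message is spam given that it contains a certain word. The numbers and the helper name `posterior` are made up for this example and do not come from any dataset; the denominator P(B) is expanded with the law of total probability.

```python
def posterior(prior_a, likelihood_b_given_a, likelihood_b_given_not_a):
    """Return the probability of A given B using Bayes' theorem.

    P(B) is obtained from the law of total probability:
    P(B) = P(B given A) * P(A) + P(B given not A) * P(not A)
    """
    evidence = (likelihood_b_given_a * prior_a
                + likelihood_b_given_not_a * (1 - prior_a))
    return likelihood_b_given_a * prior_a / evidence


# Hypothetical numbers, chosen only for illustration
p_spam = 0.2              # P(A): a message is spam
p_word_given_spam = 0.6   # P(B given A): the word appears in a spam message
p_word_given_ham = 0.05   # P(B given not A): the word appears in a normal message

p_spam_given_word = posterior(p_spam, p_word_given_spam, p_word_given_ham)
print(round(p_spam_given_word, 2))  # 0.75
```

With these assumed values, seeing the word raises the probability of spam from 0.2 to 0.75, which is the kind of update a Naive Bayes classifier performs for every feature.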
Applications of Naive Bayes algorithms:
Real-time Prediction
Multi-class Prediction
Text Classification / Spam Filtering / Sentiment Analysis
Recommendation System
And more
Example using sklearn:
# importing required libraries
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
# read the train and test dataset
train_data = pd.read_csv('train-data.csv')
test_data = pd.read_csv('test-data.csv')
# shape of the dataset
print('Shape of training data :',train_data.shape)
print('Shape of testing data :',test_data.shape)
# target variable - Income
# separate the independent and target variables in the training data
x_train = train_data.drop(columns=['Income'])
y_train = train_data['Income']
# separate the independent and target variables in the testing data
x_test = test_data.drop(columns=['Income'])
y_test = test_data['Income']
model = GaussianNB()
# fit the model with the training data
model.fit(x_train,y_train)
# predict the target on the train dataset
predict_train = model.predict(x_train)
print('Target on train data',predict_train)
# Accuracy Score on train dataset
accuracy_train = accuracy_score(y_train,predict_train)
print('accuracy_score on train dataset : ', accuracy_train)
# predict the target on the test dataset
predict_test = model.predict(x_test)
print('Target on test data',predict_test)
# Accuracy Score on test dataset
accuracy_test = accuracy_score(y_test,predict_test)
print('accuracy_score on test dataset : ', accuracy_test)
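To give a rough idea of what a Gaussian Naive Bayes model computes, here is a minimal from-scratch sketch in plain Python. It is not scikit-learn's implementation: the function names `fit_gaussian_nb` and `predict_one`, the toy data, and the simple additive `var_smoothing` constant are all assumptions made for illustration. Each class gets a prior plus a per-feature mean and variance, and prediction picks the class with the highest log posterior, treating the features as independent given the class.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels, var_smoothing=1e-9):
    """Estimate the prior, feature means and feature variances for each class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)

    model = {}
    for label, members in grouped.items():
        n = len(members)
        columns = list(zip(*members))  # one tuple of values per feature
        means = [sum(col) / n for col in columns]
        # Small constant added so a variance is never exactly zero (assumption)
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_smoothing
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(rows), means, variances)
    return model


def predict_one(model, row):
    """Return the class with the highest log posterior (up to a constant)."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        # "Naive" step: add the log Gaussian likelihood of each feature separately
        for x, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, made-up data: two numeric features and two classes
rows = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 4.1], [3.2, 3.9], [2.9, 4.0]]
labels = ["low", "low", "low", "high", "high", "high"]

model = fit_gaussian_nb(rows, labels)
print(predict_one(model, [1.1, 2.0]))  # low
print(predict_one(model, [3.1, 4.0]))  # high
```

Working with log probabilities instead of multiplying raw probabilities avoids numerical underflow when there are many features.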