Realcode4you Data Science Assignment Help | Data Science Homework Help



In this blog we will learn how to analyze Amazon reviews by training a model and evaluating it on train and test data:


---

# Abstract

---

The multilayer perceptron (MLP) has a wide range of classification and regression applications in many fields, such as pattern recognition, voice recognition and other classification problems. However, the choice of architecture has a great impact on how well these networks converge. In this post we introduce an approach to classifying the Amazon review data: to solve the resulting model we use a genetic algorithm, and we train the network on the Amazon reviews.


# Introduction


Here we will analyze the positive and negative reviews in the Amazon dataset and measure the model's accuracy on the train and test data.


---

# Part I - Data preparation

---

This part covers importing the libraries, reading, cleaning and splitting the data.


Data Source

http://jmcauley.ucsd.edu/data/amazon/index_2014.html


Importing Libraries:


```python
import gzip
import itertools
import numpy as np
import pandas as pd

import datetime as dt
import matplotlib.pyplot as plt

from collections import Counter
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

# Jupyter/IPython magic: render matplotlib plots inline in the notebook
%matplotlib inline
```


Reviews into Pandas DataFrame


First we parse the dataset, which is distributed as a gzip-compressed file, with the parse_gz() generator, and then we convert the parsed records into a DataFrame with convert_to_DataFrame().


Code to decompress the file:

It reads the gzip file line by line, parses each line into a dictionary, and collects the records into a DataFrame.


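```python
import ast  # gzip and pandas (pd) are imported above


def parse_gz(file_path):
    """Yield one review record (a dict) per line of a gzip-compressed file."""
    with gzip.open(file_path, 'rb') as g:
        for line in g:
            # ast.literal_eval parses the literal record without executing
            # arbitrary code, unlike eval()
            yield ast.literal_eval(line.decode('utf-8'))


def convert_to_DataFrame(file_path):
    """Collect the parsed records into a DataFrame, one row per review."""
    i = 0
    df = {}
    for d in parse_gz(file_path):
        df[i] = d
        i += 1
    return pd.DataFrame.from_dict(df, orient='index')
```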
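If your copy of the reviews file happens to be strict JSON (one JSON object per line), pandas can read it directly without a custom parser. This is an alternative to parse_gz()/convert_to_DataFrame(), not the method used in this post. The sketch writes a tiny made-up sample file (the file name and records are hypothetical) so that it runs without downloading the dataset.

```python
import gzip
import json
import os
import tempfile

import pandas as pd

# Hypothetical records mimicking a few fields of the review files
sample = [
    {"reviewerID": "A1", "asin": "B001", "reviewText": "Great ball", "overall": 5.0},
    {"reviewerID": "A2", "asin": "B002", "reviewText": "Too small", "overall": 2.0},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample_reviews.json.gz")
    # Write one JSON object per line, gzip-compressed
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for record in sample:
            f.write(json.dumps(record) + "\n")
    # lines=True reads newline-delimited JSON; compression handles the .gz file
    df = pd.read_json(path, lines=True, compression="gzip")

print(df)
```

If the file is not strict JSON (for example, single-quoted Python-style dicts), read_json will fail and a literal-parsing approach like parse_gz() is needed instead.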



We are going to classify Amazon product reviews as positive or negative. Each review carries a star rating (1 star, 2 stars, etc.) in the `overall` column, which we use as the reference to compare our predictions against.
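The post does not show how the star ratings are turned into positive/negative labels. One common convention, shown here purely as an assumption, treats 4 to 5 stars as positive, 1 to 2 stars as negative, and drops the neutral 3-star reviews. The helper add_sentiment_labels and the column name sentiment are hypothetical, not part of the original code.

```python
import pandas as pd


def add_sentiment_labels(df: pd.DataFrame, rating_col: str = "overall") -> pd.DataFrame:
    """Hypothetical helper: map star ratings to binary sentiment labels.

    Assumption: 4-5 stars -> positive (1), 1-2 stars -> negative (0);
    3-star reviews are treated as neutral and dropped.
    """
    labeled = df[df[rating_col] != 3].copy()
    labeled["sentiment"] = (labeled[rating_col] >= 4).astype(int)
    return labeled


# Made-up reviews for illustration
reviews = pd.DataFrame({
    "reviewText": ["Love it", "Works fine", "It's okay", "Broke quickly", "Awful"],
    "overall": [5.0, 4.0, 3.0, 2.0, 1.0],
})
labeled = add_sentiment_labels(reviews)
print(labeled[["overall", "sentiment"]])
```

Dropping 3-star reviews makes the two classes easier to separate; keeping them would require either a third class or a decision about which side they belong to.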


Next we split the data. The steps for loading, cleaning and preparing the data for the model are not covered in this post; if you need the complete details, or any other help related to machine learning and data science, contact us.


Split data:


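```python
# sports_data comes from the loading and cleaning steps not shown in this post.
# By default, train_test_split holds out 25% of the rows as the test set.
x_train, x_test, y_train, y_test = train_test_split(sports_data.reviewText, sports_data.review_in_float, random_state=0)
```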
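To see what the split does, here is a toy sketch with eight made-up reviews standing in for sports_data (the data is hypothetical). It shows the default 75/25 proportion and the optional stratify argument, which keeps the positive/negative ratio the same in both splits; the post itself does not use stratify.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for sports_data.reviewText and its labels
texts = pd.Series(["great", "love it", "works well", "perfect",
                   "broke", "awful", "too small", "returned it"])
labels = pd.Series([1, 1, 1, 1, 0, 0, 0, 0])

# Default split: 25% of the rows go to the test set
x_train, x_test, y_train, y_test = train_test_split(texts, labels, random_state=0)
print(len(x_train), len(x_test))  # 6 2

# stratify=labels preserves the class balance in both splits
xs_train, xs_test, ys_train, ys_test = train_test_split(
    texts, labels, random_state=0, stratify=labels)
print(ys_test.value_counts().to_dict())
```

Fixing random_state makes the split reproducible across runs.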


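How to use CountVectorizer()


CountVectorizer converts each review's text into a vector of integer word counts (a bag-of-words representation) that the model can use as numeric input.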


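```python
# Learn the vocabulary from the training reviews and count word occurrences
cv = CountVectorizer()
X_traincv = cv.fit_transform(x_train)

# Reuse the training vocabulary for the test reviews (no refitting)
X_testcv = cv.transform(x_test)
```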



After this, we fit the vectorized data to a model.


Here we use scikit-learn's MLPClassifier.


```python
# Import the MLP classifier and evaluation helpers
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report, confusion_matrix
```


```python
# Train the MLP on the bag-of-words training features (default hyperparameters)
mlp = MLPClassifier()
mlp.fit(X_traincv, y_train)

# Predict the target on the train dataset
pred_train = mlp.predict(X_traincv)
pred_train
```
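MLPClassifier accepts the sparse matrix produced by CountVectorizer directly, so there is no need to convert it to a dense array. The toy sketch below uses made-up reviews; hidden_layer_sizes=(16,), max_iter=1000 and random_state=0 are illustrative assumptions, not settings from this post (which uses the defaults). On real data the default max_iter=200 may stop before convergence and emit a ConvergenceWarning.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neural_network import MLPClassifier

# Hypothetical labeled reviews (1 = positive, 0 = negative)
texts = ["great product", "love it", "works great",
         "bad quality", "broke fast", "very bad"]
labels = [1, 1, 1, 0, 0, 0]

cv = CountVectorizer()
X = cv.fit_transform(texts)  # scipy sparse matrix of token counts

# Illustrative hyperparameters (assumptions, not the post's settings)
mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
mlp.fit(X, labels)  # sparse input is accepted as-is

preds = mlp.predict(cv.transform(["great quality", "bad product"]))
print(preds)
```

Setting random_state fixes the weight initialization, so repeated runs give the same model.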


```python
# Accuracy score on the train dataset
accur_train = accuracy_score(y_train, pred_train)
print('accuracy_score on train dataset : ', accur_train)
```
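Accuracy is simply the fraction of predictions that match the true labels. Because the model has already seen the training reviews, training accuracy usually overstates how well it does on new reviews, so it should be compared with the test-set results. A worked example on hypothetical labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical true labels and predictions (1 = positive, 0 = negative)
y_true = np.array([1, 1, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 1])

acc = accuracy_score(y_true, y_pred)
manual = np.mean(y_true == y_pred)  # 3 of 5 predictions match
print(acc, manual)  # 0.6 0.6
```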


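```python
# Predict the target on the held-out test dataset
predictions = mlp.predict(X_testcv)

# Confusion matrix: rows are true labels, columns are predicted labels
cnf = confusion_matrix(y_test, predictions)
cnf

# Per-class precision, recall, F1 score and overall accuracy on the test dataset
print(classification_report(y_test, predictions))
```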
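To read the confusion matrix and the report, it helps to compute them by hand on a small case. In scikit-learn's confusion matrix, rows are the true labels and columns the predicted labels, both in sorted order; for a binary problem, ravel() returns the counts as tn, fp, fn, tp. The labels below are hypothetical.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical true labels and predictions (1 = positive, 0 = negative)
y_true = np.array([1, 1, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 1])

cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[1 1]
#  [1 2]]

tn, fp, fn, tp = cm.ravel()
precision_pos = tp / (tp + fp)  # 2 / 3: share of predicted positives that are positive
recall_pos = tp / (tp + fn)     # 2 / 3: share of actual positives that were found

# output_dict=True returns the same numbers as a nested dict keyed by label string
report = classification_report(y_true, y_pred, output_dict=True)
print(report["1"]["precision"], report["1"]["recall"], report["accuracy"])
```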




