realcode4you
- Dec 4, 2019

Realcode4you Data Science Assignment Help | Data Science Homework Help

In this blog post we will learn how to analyze Amazon reviews using training and test data:

---

The multilayer perceptron (MLP) has a wide range of classification and regression applications in many fields, such as pattern recognition and voice recognition. However, the choice of architecture has a great impact on the convergence of these networks. In this post we introduce an approach to modeling the Amazon review data: we use a genetic algorithm to solve the resulting model, and we train it on the Amazon reviews.

# Introduction

Here we will analyze positive and negative reviews from the Amazon dataset and measure the accuracy on the training and test data.

---

# Part I - Data preparation

---

# Importing, reading, cleaning, splitting, etc.

Data Source

#### http://jmcauley.ucsd.edu/data/amazon/index_2014.html

Importing Libraries:

###

import gzip

import itertools

import numpy as np

import pandas as pd

import datetime as dt

import matplotlib.pyplot as plt

from collections import Counter

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

from sklearn.model_selection import train_test_split, cross_val_score

from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

%matplotlib inline

###

Reviews into Pandas DataFrame

Here we first parse the dataset, which is provided in gzip format, with the parse_gz() method, and then convert it into a DataFrame with the convert_to_DataFrame() method.

Code to unzip the file:

It is used to unzip the file and then convert it into a DataFrame.

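import ast

def parse_gz(file_path):
    # Stream the gzip-compressed file and yield one review record per line.
    with gzip.open(file_path, 'rt', encoding='utf-8') as g:
        for l in g:
            # ast.literal_eval parses the dict literal without running arbitrary code, unlike eval.
            yield ast.literal_eval(l)

def convert_to_DataFrame(file_path):
    # Collect the records keyed by row index, then build the DataFrame.
    df = {i: d for i, d in enumerate(parse_gz(file_path))}
    return pd.DataFrame.from_dict(df, orient='index')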
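As a quick sanity check, a loader like this can be tried on a tiny gzip file before pointing it at the real download. The sketch below is self-contained: the file name `sample_reviews.json.gz` and the two records are made up for illustration, and it uses `json.loads`, which assumes every line is strict JSON (this may not hold for every file in the dataset).

```python
import gzip
import json
import os
import tempfile

import pandas as pd


def parse_gz_json(file_path):
    # Yield one dict per line of a gzip-compressed, line-delimited JSON file.
    with gzip.open(file_path, 'rt', encoding='utf-8') as g:
        for line in g:
            yield json.loads(line)


def to_dataframe(file_path):
    # Build a DataFrame with one row per review record.
    return pd.DataFrame(list(parse_gz_json(file_path)))


# Two made-up records that mimic the reviewText / overall fields.
sample = [
    {"reviewText": "Great product, works well", "overall": 5.0},
    {"reviewText": "Broke after one day", "overall": 1.0},
]

# Write the sample to a temporary gzip file (hypothetical file name).
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample_reviews.json.gz")
with gzip.open(sample_path, 'wt', encoding='utf-8') as f:
    for record in sample:
        f.write(json.dumps(record) + "\n")

sample_df = to_dataframe(sample_path)
print(sample_df.shape)
```

Checking the shape and column names on a small file like this makes it easier to spot parsing problems early.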

We are going to classify Amazon product reviews as positive or negative. Each review has a star rating (1 star, 2 stars, etc.), which is given in the overall column. We will use it to check our predictions.
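One possible way to turn the star rating into a positive/negative label is sketched below. The thresholds (4 to 5 stars positive, 1 to 2 stars negative, 3 stars dropped) and the toy DataFrame are assumptions for illustration. The column name `review_in_float` is the one used in the split step of this post, but how that column was actually built is not shown here.

```python
import pandas as pd

# Toy data standing in for the real reviews (values are made up).
toy = pd.DataFrame({
    "reviewText": ["Love it", "Terrible", "It is okay", "Very good", "Bad fit"],
    "overall": [5.0, 1.0, 3.0, 4.0, 2.0],
})

# Assumption: drop neutral 3-star reviews, then map
# 4-5 stars to 1.0 (positive) and 1-2 stars to 0.0 (negative).
labeled = toy[toy["overall"] != 3.0].copy()
labeled["review_in_float"] = (labeled["overall"] >= 4.0).astype(float)

print(labeled[["overall", "review_in_float"]])
```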

Now we move on to splitting the data. The steps for loading, cleaning, and preparing the data to fit the model are not shown here; if you need the complete details, or any help related to machine learning and data science, contact us.

Split data:

x_train, x_test, y_train, y_test = train_test_split(sports_data.reviewText,sports_data.review_in_float, random_state=0)
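When no `test_size` is passed, `train_test_split` holds out 25% of the rows for testing, and `random_state=0` makes the shuffle reproducible. A minimal sketch on made-up data (the texts and labels below are placeholders, not the review dataset):

```python
from sklearn.model_selection import train_test_split

# Eight placeholder reviews with alternating labels.
texts = [f"review number {i}" for i in range(8)]
labels = [1.0, 0.0] * 4

# Default split: 75% train, 25% test; fixed seed for reproducibility.
x_tr, x_te, y_tr, y_te = train_test_split(texts, labels, random_state=0)
print(len(x_tr), len(x_te))

# The same seed gives the same split on a second call.
x_tr2, x_te2, _, _ = train_test_split(texts, labels, random_state=0)
print(x_te == x_te2)
```

If the classes are imbalanced, passing `stratify=labels` keeps the class proportions similar in both parts.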

How to use CountVectorizer()

It is used to convert the review text into numeric features: each review becomes a row of integer token counts.

cv = CountVectorizer()

X_traincv = cv.fit_transform(x_train)

X_testcv = cv.transform(x_test)
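To see what the vectorizer produces, here is a small sketch on a made-up corpus. The vocabulary is learned from the training text only, so words that appear only in the test text are ignored by `transform`.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Made-up documents for illustration.
train_docs = ["good product", "bad product", "good good value"]
test_docs = ["good unknown word"]

vec = CountVectorizer()
train_counts = vec.fit_transform(train_docs)  # learn the vocabulary and count tokens
test_counts = vec.transform(test_docs)        # reuse the learned vocabulary

# Column order follows the alphabetically sorted vocabulary.
vocab = sorted(vec.vocabulary_)
print(vocab)
print(train_counts.toarray())
print(test_counts.toarray())
```

Both results are sparse matrices with one column per vocabulary word, which is why the test set must go through `transform` rather than a second `fit_transform`.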

After this, we fit the vectorized data to the model.

Here we use the MLP classifier.

## import mlp classifier libraries

from sklearn.preprocessing import StandardScaler

# Training the model

from sklearn.neural_network import MLPClassifier

from sklearn.metrics import classification_report,confusion_matrix

mlp = MLPClassifier()

mlp.fit(X_traincv,y_train)
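`MLPClassifier()` with no arguments uses scikit-learn's defaults (a single hidden layer of 100 units). Since the architecture affects convergence, it can help to set it explicitly. The sketch below trains on a tiny made-up corpus; the layer size, `max_iter`, and `random_state` values are illustrative choices, not the settings behind any result in this post.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neural_network import MLPClassifier

# Made-up reviews and labels (1.0 = positive, 0.0 = negative).
docs = ["great quality love it", "love this great buy",
        "terrible waste of money", "awful terrible product",
        "great love", "awful waste"]
y = [1.0, 1.0, 0.0, 0.0, 1.0, 0.0]

vec = CountVectorizer()
X = vec.fit_transform(docs)

# Illustrative settings: one hidden layer of 16 units, fixed seed.
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
clf.fit(X, y)

# Predict the label of a new, unseen review.
new_pred = clf.predict(vec.transform(["love great quality"]))
print(new_pred)
```

`hidden_layer_sizes` takes one entry per hidden layer, so `(16, 8)` would add a second layer of 8 units.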

# predict the target on the train dataset
pred_train = mlp.predict(X_traincv)
pred_train

# accuracy score on the train dataset
accur_train = accuracy_score(y_train, pred_train)
print('accuracy_score on train dataset : ', accur_train)

# predict the target on the test dataset
predictions = mlp.predict(X_testcv)

# confusion matrix comparing the true and predicted labels on the test dataset
cnf = confusion_matrix(y_test, predictions)
cnf

# per-class precision, recall and F1 score on the test dataset
print(classification_report(y_test, predictions))
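To make the metrics easier to read, here is a small worked example on made-up labels. In scikit-learn's `confusion_matrix`, rows are the true classes and columns are the predicted classes, both in sorted label order, and accuracy is the sum of the diagonal divided by the total count.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up true and predicted labels (1.0 = positive, 0.0 = negative).
y_true = [1.0, 1.0, 1.0, 0.0, 0.0, 1.0]
y_pred = [1.0, 0.0, 1.0, 0.0, 1.0, 1.0]

cm = confusion_matrix(y_true, y_pred)
acc = accuracy_score(y_true, y_pred)

# Accuracy recomputed from the matrix: correct predictions lie on the diagonal.
acc_from_cm = cm.trace() / cm.sum()

print(cm)
print(acc, acc_from_cm)
```

Here the first row counts the negative reviews (one predicted correctly, one mistaken for positive) and the second row counts the positive reviews (one mistaken for negative, three predicted correctly).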
