
Important Research Datasets For Recommender System To Analyse Amazon Review



Description

This dataset contains product reviews and metadata from Amazon, including 142.8 million reviews spanning May 1996 - July 2014.


This dataset includes reviews (ratings, text, helpfulness votes), product metadata (descriptions, category information, price, brand, and image features), and links (also viewed/also bought graphs).


Files

"Small" subsets for experimentation

If you're using this data for a class project (or similar) please consider using one of these smaller datasets below before requesting the larger files. To obtain the larger files you will need to contact me to obtain access.


K-cores (i.e., dense subsets): These data have been reduced to extract the k-core, such that each of the remaining users and items has at least k reviews.
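
To make the k-core idea concrete, here is a minimal sketch of how such a filter could work on a list of (user, item) pairs. It is not the code used to build the files listed here; the function name k_core and the toy data are hypothetical, and it assumes each (user, item) pair appears at most once.

```python
from collections import Counter

def k_core(pairs, k=5):
    """Return the (user, item) pairs that survive k-core filtering.

    Repeatedly drops users and items with fewer than k remaining
    reviews until nothing more can be removed.
    """
    pairs = list(pairs)
    while True:
        user_counts = Counter(u for u, _ in pairs)
        item_counts = Counter(i for _, i in pairs)
        kept = [(u, i) for u, i in pairs
                if user_counts[u] >= k and item_counts[i] >= k]
        if len(kept) == len(pairs):
            return kept
        pairs = kept

# Toy data (hypothetical): u1 and u2 each review i1 and i2; u3 reviews only i1
toy = [("u1", "i1"), ("u1", "i2"), ("u2", "i1"), ("u2", "i2"), ("u3", "i1")]
core = k_core(toy, k=2)
```

The loop is needed because removing a sparse user can push an item below k reviews, which in turn can affect other users.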

Ratings only: These datasets include no metadata or reviews, but only (user,item,rating,timestamp) tuples. Thus they are suitable for use with MyMediaLite (or similar) packages.
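
As a rough sketch, the ratings-only tuples could be loaded as follows. This assumes the file is comma-separated with no header row and the columns in the order (user,item,rating,timestamp); the Rating type and the read_ratings function are hypothetical names for illustration.

```python
import csv
from typing import Iterator, NamedTuple

class Rating(NamedTuple):
    user: str
    item: str
    rating: float
    timestamp: int

def read_ratings(path: str) -> Iterator[Rating]:
    """Yield one Rating per row of a header-less, comma-separated file."""
    with open(path, newline="") as f:
        for user, item, rating, timestamp in csv.reader(f):
            yield Rating(user, item, float(rating), int(timestamp))
```

Because the function is a generator, the larger ratings files can be processed row by row without loading everything into memory.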


Books 5-core (8,898,041 reviews) ratings only (22,507,155 ratings)

Electronics 5-core (1,689,188 reviews) ratings only (7,824,482 ratings)

Movies and TV 5-core (1,697,533 reviews) ratings only (4,607,047 ratings)

CDs and Vinyl 5-core (1,097,592 reviews) ratings only (3,749,004 ratings)

Clothing, Shoes and Jewelry 5-core (278,677 reviews) ratings only (5,748,920 ratings)

Home and Kitchen 5-core (551,682 reviews) ratings only (4,253,926 ratings)

Kindle Store 5-core (982,619 reviews) ratings only (3,205,467 ratings)

Sports and Outdoors 5-core (296,337 reviews) ratings only (3,268,695 ratings)

Cell Phones and Accessories 5-core (194,439 reviews) ratings only (3,447,249 ratings)

Health and Personal Care 5-core (346,355 reviews) ratings only (2,982,326 ratings)

Toys and Games 5-core (167,597 reviews) ratings only (2,252,771 ratings)

Video Games 5-core (231,780 reviews) ratings only (1,324,753 ratings)

Tools and Home Improvement 5-core (134,476 reviews) ratings only (1,926,047 ratings)

Beauty 5-core (198,502 reviews) ratings only (2,023,070 ratings)

Apps for Android 5-core (752,937 reviews) ratings only (2,638,172 ratings)

Office Products 5-core (53,258 reviews) ratings only (1,243,186 ratings)

Pet Supplies 5-core (157,836 reviews) ratings only (1,235,316 ratings)

Automotive 5-core (20,473 reviews) ratings only (1,373,768 ratings)

Grocery and Gourmet Food 5-core (151,254 reviews) ratings only (1,297,156 ratings)

Patio, Lawn and Garden 5-core (13,272 reviews) ratings only (993,490 ratings)

Baby 5-core (160,792 reviews) ratings only (915,446 ratings)

Digital Music 5-core (64,706 reviews) ratings only (836,006 ratings)

Musical Instruments 5-core (10,261 reviews) ratings only (500,176 ratings)

Amazon Instant Video 5-core (37,126 reviews) ratings only (583,933 ratings)



Code to Read Data and Implement

Reading the data

Data can be treated as Python dictionary objects. A simple script to read any of the above data is as follows:


import gzip

def parse(path):
    # eval() executes each line as Python code, so only use it on trusted files
    with gzip.open(path, 'rb') as g:
        for l in g:
            yield eval(l)

Convert to 'strict' json

The above data can be read with Python's 'eval', but is not strict JSON. If you'd like to use a language other than Python, you can convert the data to strict JSON as follows:


import json
import gzip

def parse(path):
    with gzip.open(path, 'rb') as g:
        for l in g:
            # eval() reads the Python-literal line; json.dumps() re-emits it as strict JSON
            yield json.dumps(eval(l))

with open("output.strict", 'w') as f:
    for l in parse("reviews_Video_Games.json.gz"):
        f.write(l + '\n')
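
Once a file such as output.strict exists, it can be read back without 'eval'. The following is a small sketch, assuming the file holds one JSON object per line as written by the conversion script; read_strict is a hypothetical helper name.

```python
import json

def read_strict(path):
    """Yield one dict per non-empty line of a strict-JSON file."""
    with open(path, "r") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)
```

Unlike 'eval', json.loads only parses data and never executes code, so this reader is also safe to use on files from other sources.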

Pandas data frame

This code reads the data into a pandas data frame:

import pandas as pd
import gzip

def parse(path):
    with gzip.open(path, 'rb') as g:
        for l in g:
            yield eval(l)

def getDF(path):
    # Key each review by its row number, then build the frame from the dict
    df = {}
    for i, d in enumerate(parse(path)):
        df[i] = d
    return pd.DataFrame.from_dict(df, orient='index')

df = getDF('reviews_Video_Games.json.gz')

Read image features

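import array

def readImageFeatures(path):
    with open(path, 'rb') as f:
        while True:
            asin = f.read(10)  # 10-byte product ID
            if not asin: break  # end of file (b'' in Python 3, '' in Python 2)
            a = array.array('f')
            a.fromfile(f, 4096)  # 4096 floats per product
            yield asin, a.tolist()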
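
The reader above implies a fixed record size: 10 bytes of product ID followed by 4096 floats. Under the assumption that the file has no header and the records sit back to back in the machine's native float format, the sketch below counts the records and jumps straight to the n-th one. The names count_records and read_record are hypothetical.

```python
import array
import os

ASIN_BYTES = 10      # length of the product ID, as in the reader above
FEATURE_DIM = 4096   # floats per record, as in the reader above
RECORD_BYTES = ASIN_BYTES + FEATURE_DIM * array.array('f').itemsize

def count_records(path):
    """Number of complete records, assuming no header or padding."""
    return os.path.getsize(path) // RECORD_BYTES

def read_record(path, index):
    """Read only the record at position `index` (0-based)."""
    with open(path, 'rb') as f:
        f.seek(index * RECORD_BYTES)
        asin = f.read(ASIN_BYTES)
        a = array.array('f')
        a.fromfile(f, FEATURE_DIM)
        return asin.decode('ascii'), a.tolist()
```

Seeking avoids scanning the whole file when only a few products are needed, which can matter for large feature files.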

Example: compute average rating

ratings = []
for review in parse("reviews_Video_Games.json.gz"):
    ratings.append(review['overall'])
print(sum(ratings) / len(ratings))
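
The same idea extends to an average per product. The sketch below works on any iterable of review dictionaries, so it can be fed the output of a reader such as parse. It assumes each review carries the product ID under the key 'asin' alongside 'overall'; the function name and the sample reviews are made up for illustration.

```python
from collections import defaultdict

def average_rating_per_item(reviews, item_key="asin", rating_key="overall"):
    """Return {item: mean rating}, reading the reviews one at a time."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for review in reviews:
        totals[review[item_key]] += review[rating_key]
        counts[review[item_key]] += 1
    return {item: totals[item] / counts[item] for item in totals}

# Made-up sample reviews, only to show the call
sample = [
    {"asin": "B0001", "overall": 5.0},
    {"asin": "B0001", "overall": 3.0},
    {"asin": "B0002", "overall": 4.0},
]
averages = average_rating_per_item(sample)
```

Only running totals and counts are kept, so memory use grows with the number of products rather than the number of reviews.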



Get help with Big Data assignments, projects, and other Amazon review related work at an affordable price.


Contact or Send Your Requirement Details at:


realcode4you@gmail.com