realcode4you
May 29, 2020

What Is Data Wrangling? Data Wrangling in Machine Learning | Machine Learning Homework Help | Realcode4you

in Machine Learning Tutorial

Data wrangling is the process of converting data from its initial format into one that is easier to read and better suited to analysis.


This tutorial uses the following data set:


https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data


Import pandas

Open a Jupyter notebook, or any online Jupyter notebook editor, and import pandas:


import pandas as pd
import matplotlib.pyplot as plt


Read the data and add headers


filename = "https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DA0101EN/auto.csv"


headers = ["symboling", "normalized-losses", "make", "fuel-type", "aspiration",
           "num-of-doors", "body-style", "drive-wheels", "engine-location",
           "wheel-base", "length", "width", "height", "curb-weight",
           "engine-type", "num-of-cylinders", "engine-size", "fuel-system",
           "bore", "stroke", "compression-ratio", "horsepower", "peak-rpm",
           "city-mpg", "highway-mpg", "price"]




Read CSV

df = pd.read_csv(filename, names = headers) 
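To illustrate what the names argument does, here is a minimal sketch that reads a tiny in-memory file with no header line. The two rows and the three column names are made up for the illustration and are not taken from the data set above. When names is passed and no header is specified, pandas treats the first line as data rather than as column titles.

```python
import io
import pandas as pd

# Hypothetical two-row file without a header line
raw = io.StringIO("3,alfa-romero,13495\n1,audi,13950\n")
names = ["symboling", "make", "price"]

# With names given, the first line is read as data, not as a header
small_df = pd.read_csv(raw, names=names)
print(small_df)
```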

Show data in tabular form

df.head()

The data is displayed in tabular form. Looking at it reveals a few challenges:

  • identify missing data

  • deal with missing data

  • correct data format


Identify and handle missing values

Identify missing values


Convert "?" to NaN

In this data set, missing values are marked with a question mark "?". We replace "?" with NaN (Not a Number), the marker pandas uses for missing values.


Example:

import numpy as np
# replace "?" with NaN
df.replace("?", np.nan, inplace = True)
df.head(5)

The first five rows now show NaN wherever "?" appeared.
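The same step can be tried on a small, self-contained frame. This is only a sketch: the rows below are made up for illustration, and the variable names sample and cleaned are not part of the tutorial.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; "?" marks a missing value
sample = pd.DataFrame({
    "normalized-losses": ["?", "164", "?"],
    "make": ["alfa-romero", "audi", "audi"],
    "price": ["13495", "?", "17450"],
})

# Return a new frame in which every "?" is replaced by NaN
cleaned = sample.replace("?", np.nan)
print(cleaned)
```

Here replace returns a new frame and leaves sample unchanged, which makes it easy to compare the data before and after.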


How to detect missing data:

There are two methods for detecting missing data.

  • .isnull() - returns True where data is missing and False everywhere else.

  • .notnull() - returns True where data is present and False where it is missing.


Example:

mis_value = df.isnull()
mis_value.head(5)
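As a quick illustration of how the two methods mirror each other, the sketch below applies both to a made-up column with one missing entry. The values and variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing entry
horsepower = pd.Series([111.0, np.nan, 154.0], name="horsepower")

missing_mask = horsepower.isnull()    # True where the value is missing
present_mask = horsepower.notnull()   # True where the value is present

print(missing_mask.tolist())  # [False, True, False]
print(present_mask.tolist())  # [True, False, True]
```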

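Count missing values in each column

Using a for loop:


Example:

Run this for loop to print the counts for each column. In the output, True marks a missing value and False a present one.

for column in mis_value.columns.values.tolist():
    # count the True (missing) and False (present) entries in this column
    print(column)
    print(mis_value[column].value_counts())
    print("")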
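A more compact way to get per-column totals is to sum the Boolean mask, because True counts as 1 and False as 0. This is an alternative to the loop, not the tutorial's own method, and the sketch below uses a small made-up frame with illustrative values.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a few missing entries
sample = pd.DataFrame({
    "bore": [3.47, np.nan, 2.68],
    "stroke": [np.nan, np.nan, 3.47],
    "make": ["audi", "bmw", "audi"],
})

# Summing the True/False mask gives the number of missing values per column
missing_per_column = sample.isnull().sum()
print(missing_per_column)
```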


How to deal with missing data

Drop data

  • drop the whole row - if an essential value such as price is missing from a row, remove that entire row.

  • drop the whole column - if a column has too many missing values to be useful, remove the entire column. Avoid doing this to a column the analysis depends on, such as price.
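Both ways of dropping data can be sketched with pandas as follows. The frame is made up, and the column mostly_empty is a hypothetical example of a column too sparse to keep.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one row lacks a price, one column is mostly empty
sample = pd.DataFrame({
    "make": ["audi", "bmw", "dodge"],
    "price": [13950.0, np.nan, 6377.0],
    "mostly_empty": [np.nan, np.nan, 1.0],
})

# Drop every row whose price is missing, then renumber the index
rows_kept = sample.dropna(subset=["price"], axis=0).reset_index(drop=True)

# Drop a column that is too sparse to be useful
columns_kept = sample.drop(columns=["mostly_empty"])

print(rows_kept)
print(columns_kept)
```

Resetting the index after dropping rows keeps the row labels consecutive, which avoids surprises when rows are later accessed by position.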


Replace data

  • replace it by mean

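  • replace it by frequency - fill the missing entries with the most frequent value in the column. For example, if "good" accounts for 84% of the known values and "bad" for 16%, the missing entries are replaced with "good".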

  • replace it based on other functions
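Replacement by frequency could look like the sketch below, which fills a missing entry in a made-up categorical column with its most common value. The data and variable names are assumptions made for the illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical categorical column with one missing entry
doors = pd.Series(["four", "two", "four", np.nan, "four"], name="num-of-doors")

# value_counts() ignores NaN; idxmax() returns the most frequent value
most_common = doors.value_counts().idxmax()

# Fill the missing entry with that value
doors_filled = doors.fillna(most_common)
print(most_common)
print(doors_filled.tolist())
```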


Calculate the average of any column

Example

avg = df["column_name"].astype("float").mean(axis=0)
print("Average of column_name:", avg)


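Replace NaN with the mean value of any column

Example

# assign the result back to the column; inplace=True on a selected column may not update df in recent pandas versions
df["column_name"] = df["column_name"].replace(np.nan, avg)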
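Putting the two steps together, here is a minimal end-to-end sketch of mean replacement on a made-up column. The column name bore and its values are illustrative, and fillna is used here as an equivalent of replacing np.nan.

```python
import numpy as np
import pandas as pd

# Hypothetical column stored as text, with one missing entry
sample = pd.DataFrame({"bore": ["3.0", np.nan, "4.0"]})

# Convert to float first; mean() skips NaN by default
avg = sample["bore"].astype("float").mean()

# Assign the filled column back to the frame
sample["bore"] = sample["bore"].astype("float").fillna(avg)
print(avg)
print(sample["bore"].tolist())
```

Converting to float before filling also leaves the column with a numeric type, so later calculations work without another conversion.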


How to count the values in each column separately

Use the value_counts() function. It returns how many times each distinct value occurs in a column.
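For instance, the sketch below counts the distinct values in a made-up column. The values are illustrative only.

```python
import pandas as pd

# Hypothetical categorical column
drive_wheels = pd.Series(["fwd", "rwd", "fwd", "4wd", "fwd"], name="drive-wheels")

# Counts are sorted from most to least frequent
counts = drive_wheels.value_counts()
print(counts)
```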