
What Is the Pearson's-r Correlation Coefficient in Machine Learning? - Data Dependency

realcode4you

Data Dependency


Pearson's-r Correlation coefficient

Import Libraries

import numpy as np
import csv
import matplotlib.pyplot as plt
import scipy.stats
import pandas as pd

%matplotlib inline

Install wget to download the dataset from GitHub


!pip install wget

import wget

link_to_data = 'https://github.com/tuliplab/mds/raw/master/Jupyter/data/Auto.csv'
DataSet = wget.download(link_to_data)

Read Dataset


data = pd.read_csv('Auto.csv')

Previewing the first records


data.head()


Describe Dataset


data.describe()


Selecting two features

miles = data['miles']
weights = data['Weight']
print(miles[:10])
print(weights[:10])

# sample covariance divided by the product of the sample standard deviations
pearson_r = np.cov(miles, weights)[0, 1] / (miles.std() * weights.std())
print(pearson_r)
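The expression above divides the sample covariance by the product of the two sample standard deviations. As a minimal sketch, the same computation can be checked against `np.corrcoef` on synthetic data. The arrays `x` and `y` and the helper `pearson_r` are hypothetical stand-ins, not values from Auto.csv. One detail worth noting: `np.cov` and pandas `Series.std()` both normalize by N - 1 by default, while NumPy's `ndarray.std()` normalizes by N, so plain arrays need `ddof=1` for the ratio to be consistent.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical stand-ins for the two columns (not the Auto.csv values)
x = rng.normal(3.0, 0.8, size=200)
y = -7.0 * x + 45.0 + rng.normal(0.0, 4.0, size=200)


def pearson_r(a, b):
    """Sample covariance divided by the product of sample standard deviations."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cov_ab = np.cov(a, b)[0, 1]  # np.cov normalizes by N - 1 by default
    return cov_ab / (a.std(ddof=1) * b.std(ddof=1))  # ddof=1 to match np.cov


r_manual = pearson_r(x, y)
r_numpy = np.corrcoef(x, y)[0, 1]
print(r_manual, r_numpy)
```

The two printed values should agree up to floating-point rounding.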


Finding the correlation coefficient between pairs of features

print(np.corrcoef(miles, weights))
horse = data['Horse power']
print(np.corrcoef(weights, horse))
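Rather than calling `np.corrcoef` one pair at a time, the full correlation matrix can be computed in one step with pandas `DataFrame.corr()`, which uses Pearson's r by default. The sketch below assumes a small synthetic DataFrame `df`. Its column names mirror the ones used in this tutorial, but the values are made up and the `name` column is hypothetical, included only to show why non-numeric columns are filtered out first.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 100
weight = rng.uniform(1.5, 5.0, size=n)

# Synthetic stand-in for the dataset (values are made up for illustration)
df = pd.DataFrame({
    'miles': 45.0 - 7.0 * weight + rng.normal(0.0, 3.0, size=n),
    'Weight': weight,
    'Horse power': 40.0 * weight + rng.normal(0.0, 15.0, size=n),
    'name': ['car%d' % i for i in range(n)],  # hypothetical text column
})

# Keep only numeric columns, then compute pairwise Pearson correlations
numeric = df.select_dtypes(include='number')
corr_matrix = numeric.corr()
print(corr_matrix.round(3))
```

Each off-diagonal entry matches the value `np.corrcoef` would return for that pair of columns.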

Plot

# plotting
fig, ax = plt.subplots(figsize=(7, 5), dpi=300)
ax.scatter(weights,miles, alpha=0.6, edgecolor='none', s=100)
ax.set_xlabel('Car Weight (tons)')
ax.set_ylabel('Miles Per Gallon')

line_coef = np.polyfit(weights, miles, 1)
xx = np.arange(1, 5, 0.1)
yy = line_coef[0]*xx + line_coef[1]

ax.plot(xx, yy, 'r', lw=2)

Output: scatter plot of miles per gallon against car weight, with the fitted line drawn in red.
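The fitted line and Pearson's r are closely related: for a least-squares line, the slope equals r multiplied by the ratio of the standard deviations of y and x. The following is a minimal sketch of that identity on synthetic data. The arrays `x` and `y` are hypothetical and only loosely imitate a weight versus mileage relationship.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data with a negative linear trend plus noise
x = rng.uniform(1.0, 5.0, size=150)
y = 40.0 - 6.0 * x + rng.normal(0.0, 3.0, size=150)

# Least-squares line, as in the plotting code
slope, intercept = np.polyfit(x, y, 1)

# Slope recovered from Pearson's r and the two standard deviations
r = np.corrcoef(x, y)[0, 1]
slope_from_r = r * y.std(ddof=1) / x.std(ddof=1)

print(slope, slope_from_r)
```

So r carries the sign of the slope, while its magnitude says how tightly the points cluster around the line.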




Practice Exercise

  1. Find the Pearson's-r coefficient for two linearly dependent variables. Add some noise and see the effect of varying the noise.

  2. Simulate and visualize some data with positive linear correlation.

  3. Simulate and visualize some data with negative linear correlation.


xx = np.arange(-5, 5, 0.1)
pp = 1.5  # level of noise
yy = xx + np.random.normal(0, pp, size=len(xx))


# visualize the data
fig, ax = plt.subplots()
ax.scatter(xx, yy, c='r', edgecolor='none')
ax.set_xlabel('X data')
ax.set_ylabel('Y data')

line_coef = np.polyfit(xx, yy, 1)
line_xx = np.arange(-5, 5, 0.1)
line_yy = line_coef[0]*line_xx + line_coef[1]

ax.plot(line_xx, line_yy, 'b', lw=2)

# pearsonr returns the correlation coefficient and the two-sided p-value
print(scipy.stats.pearsonr(xx, yy))

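Output: scatter plot of the noisy linear data with the fitted line drawn in blue, followed by the printed correlation coefficient and p-value.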
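One possible way to approach the exercise is to sweep the noise level and record r for both a positive and a negative slope. This is only a sketch under stated assumptions: the helper `r_for_noise`, the list `noise_levels`, and the slopes of 1 and -1 are illustrative choices, not part of the original exercise.

```python
import numpy as np

rng = np.random.default_rng(3)  # fixed seed for reproducibility
x = np.arange(-5, 5, 0.1)


def r_for_noise(slope, noise_std):
    """Pearson's r between x and slope * x plus Gaussian noise."""
    y = slope * x + rng.normal(0.0, noise_std, size=len(x))
    return np.corrcoef(x, y)[0, 1]


noise_levels = [0.1, 0.5, 1.5, 5.0, 20.0]  # illustrative values
r_positive = [r_for_noise(1.0, s) for s in noise_levels]
r_negative = [r_for_noise(-1.0, s) for s in noise_levels]

for s, rp, rn in zip(noise_levels, r_positive, r_negative):
    print('noise=%5.1f  r(+slope)=%+.3f  r(-slope)=%+.3f' % (s, rp, rn))
```

With little noise, r is close to +1 or -1 depending on the sign of the slope. As the noise grows relative to the spread of the signal, r drifts toward zero.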


Pearson's r coefficient only measures the linear correlation between two variables. It cannot reveal a non-linear dependency. Investigate the Pearson's r coefficient between two variables that are correlated non-linearly.



# generate some data, first for X
xx = np.arange(-5, 5, 0.1)

# assume Y = X^2 + some perturbation
pp = 1.1  # level of noise
yy = xx**2 + np.random.normal(0, pp, size=len(xx))

# visualize the data
fig, ax = plt.subplots()
ax.scatter(xx, yy, c='r', edgecolor='b')
ax.set_xlabel('X data')
ax.set_ylabel('Y data')
ax.set_title(r'$Y = X^2+\epsilon$', size=16)  # raw string keeps \epsilon intact

Output: scatter plot of the noisy quadratic data, titled Y = X^2 + ε.




The Pearson's-r correlation is near zero, which means there is no linear correlation. But what about non-linear correlation? After all, y = x^2 by construction, so y depends strongly on x.


np.corrcoef(xx,yy)

Output:

array([[ 1.        , -0.04489687],
       [-0.04489687,  1.        ]])
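One way to see that the dependency is still there is to correlate y with a transformed feature instead of with x itself. The sketch below regenerates data in the same way as above, with a fixed seed added for reproducibility, and compares r for `xx` and for `xx**2`. Using the squared feature is an illustrative choice that works because the form of the relationship is known here. It is not a general test for non-linear dependence.

```python
import numpy as np

rng = np.random.default_rng(4)  # seed added here; the original data is unseeded

xx = np.arange(-5, 5, 0.1)
yy = xx**2 + rng.normal(0.0, 1.1, size=len(xx))

# Linear correlation between x and y: close to zero
r_linear = np.corrcoef(xx, yy)[0, 1]

# Correlation between x^2 and y: close to one, exposing the dependency
r_squared_feature = np.corrcoef(xx**2, yy)[0, 1]

print('r(x, y)   =', round(r_linear, 3))
print('r(x^2, y) =', round(r_squared_feature, 3))
```

A near-zero r therefore rules out a linear relationship only. It says nothing about whether the variables are independent.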

© 2023 IT Services provided by Realcode4you.com
