# Implementing Linear Regression and Multiple Linear Regression in Python

#### Linear Regression

Linear regression is perhaps one of the most well known and well understood algorithms in statistics and machine learning.

Importing Related Libraries

```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

%matplotlib inline
```

We start with a simple case: fitting a linear regression to four data points. First we simulate the data:

```python
# simulating the data
x = np.c_[0, 1, 2, 1.5].T
y = [1, 1.5, 3.1, 1.5]

print(x)
print(y)
```

Output:

[[0. ] [1. ] [2. ] [1.5]] [1, 1.5, 3.1, 1.5]
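scikit-learn expects the feature array to be two-dimensional, with one row per sample and one column per feature, which is why `x` is built as a column. As a minimal sketch, the same column can also be produced with `reshape`; the name `x_alt` is introduced here purely for illustration.

```python
import numpy as np

# Column vector built the same way as in the tutorial
x = np.c_[0, 1, 2, 1.5].T

# Equivalent construction (x_alt is an illustrative name, not from the tutorial)
x_alt = np.array([0, 1, 2, 1.5]).reshape(-1, 1)

print(x.shape)                    # (n_samples, n_features)
print(np.array_equal(x, x_alt))   # True if both arrays hold the same values
```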

```python
# plotting the data
fig, ax = plt.subplots(figsize=(5, 5), dpi=150)
ax.scatter(x, y, c='r')
ax.set_title('simulated data')
ax.set_xlabel('x')
ax.set_ylabel('y')
```

Output: [scatter plot of the simulated data]

Now we fit the linear regression:

```python
from sklearn import linear_model

# instantiate the model
lr = linear_model.LinearRegression()

# fit the model
lr.fit(x, y)
```

```python
print("Coefficients:", lr.coef_)
print("   Intercept:", lr.intercept_)
# (older scikit-learn versions also exposed lr.residues_)
```

Output:

Coefficients: [0.89714286] Intercept: 0.7657142857142858
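For a single feature, the least-squares line has a well-known closed form, so the values reported by scikit-learn can be cross-checked by hand. The sketch below uses only NumPy; the names `x1d`, `y1d`, `slope` and `intercept` are illustrative and do not appear in the tutorial.

```python
import numpy as np

# Same simulated data as above, kept one-dimensional for the formula
x1d = np.array([0, 1, 2, 1.5])
y1d = np.array([1, 1.5, 3.1, 1.5])

# Closed-form least-squares estimates for one feature:
#   slope     = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
#   intercept = mean_y - slope * mean_x
x_centered = x1d - x1d.mean()
slope = np.sum(x_centered * (y1d - y1d.mean())) / np.sum(x_centered ** 2)
intercept = y1d.mean() - slope * x1d.mean()

print("slope:", slope)
print("intercept:", intercept)
```

Both numbers should agree with `lr.coef_` and `lr.intercept_` up to floating-point rounding.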

Let's plot the line to see how it estimates our data:

```python
yhat = lr.predict(x)

fig, ax = plt.subplots(figsize=(5, 3), dpi=150)
ax.scatter(x, y, c='r')
ax.plot(x, yhat)

ax.set_title('simulated data and the estimated line')
ax.set_xlabel('x')
ax.set_ylabel('y')
```

Output: [plot of the simulated data and the estimated line]

We can use the `predict()` method to predict y for new values of x:

```python
x_test = np.c_[4, 2.3].T
y_test = lr.predict(x_test)

print(x_test.T)
print(y_test)
```

Output:

[[4. 2.3]] [4.35428571 2.82914286]
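For a fitted linear model, `predict()` amounts to evaluating the line at the new inputs. A minimal sketch, assuming the coefficient and intercept are copied by hand from the output printed earlier (the variable names are illustrative):

```python
import numpy as np

# Values copied from the printed fit results
slope = 0.89714286
intercept = 0.7657142857142858

# Evaluate the fitted line at the same new inputs
x_new = np.array([4, 2.3])
y_manual = intercept + slope * x_new

print(y_manual)
```

The result should match `y_test` up to the rounding of the copied coefficient.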

#### Multiple Linear Regression

Multiple linear regression models the relationship between two or more explanatory variables and a response variable by fitting a linear equation to observed data. Every observation of the explanatory variables x is associated with a value of the dependent variable y. For example, if we have two explanatory variables (attributes, features), our data has the following form:

```python
# simulate the data
x = np.c_[[0, 0], [0, 1], [1, 1], [1, 0]].T
y = [1.5, 3.2, 4, 2]

print(x)
print(y)
```

Output:

[[0 0] [0 1] [1 1] [1 0]] [1.5, 3.2, 4, 2]
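For data of this shape, with two columns per observation, the linear equation being fitted can be written as below. The symbols are notation introduced here for illustration, not taken from the original text: $\beta_0$ is the intercept, $\beta_1$ and $\beta_2$ are the coefficients of the two features, and $\varepsilon$ is the error term.

$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon
$$

In scikit-learn terms, $\beta_0$ corresponds to `intercept_` and $(\beta_1, \beta_2)$ to `coef_`.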

```python
mlr = linear_model.LinearRegression(fit_intercept=True)
mlr.fit(x, y)
```

```python
print(mlr.coef_)
print(mlr.intercept_)
```

Output:

[0.65 1.85] 1.425

`print(mlr.predict(x))`

Output:

[1.425 3.275 3.925 2.075]
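The same coefficients can be recovered without scikit-learn by solving the least-squares problem directly. The sketch below is one way to do this with `np.linalg.lstsq`; the names `X_mlr`, `y_mlr`, `X_design` and `beta` are illustrative. The column of ones plays the role of `fit_intercept=True`.

```python
import numpy as np

# Same simulated data as above
X_mlr = np.array([[0, 0], [0, 1], [1, 1], [1, 0]], dtype=float)
y_mlr = np.array([1.5, 3.2, 4, 2])

# Prepend a column of ones so the first coefficient acts as the intercept
X_design = np.column_stack([np.ones(len(X_mlr)), X_mlr])

# Solve min ||X_design @ beta - y_mlr||^2
beta, *_ = np.linalg.lstsq(X_design, y_mlr, rcond=None)

print(beta)             # [intercept, coefficient 1, coefficient 2]
print(X_design @ beta)  # fitted values for the four training points
```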

#### Regression for median house prices

We are going to use the package pandas for reading and storing the data.

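```python
import wget

# download the CSV file
filename = wget.download('https://github.com/tuliplab/mds/raw/master/Jupyter/data/housing_300.csv')
# assumed loading step: read the downloaded CSV into `data` with pandas
data = pd.read_csv(filename)
```

`data.head()`

`data.describe()`

Plot the number of rooms against the median house prices: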

```python
fig, ax = plt.subplots(figsize=(5, 5), dpi=150)
median_prices = data['MEDV']
avg_rooms = data['RM']
scales = 50*np.ones(len(median_prices))
ax.scatter(avg_rooms, median_prices, color='b', s=scales, alpha=0.7, edgecolor='r')
ax.set_xlabel('$X$ (number of rooms)')
ax.set_ylabel('$Y$ (median house prices)')
```

Output: [scatter plot of the number of rooms vs. the median house prices]

```python
print(avg_rooms.shape)
print(median_prices.shape)
```

Output:

(300,) (300,)

How correlated are the number of rooms and the price of the house?

`np.corrcoef(avg_rooms, median_prices)`

Output:

array([[1. , 0.89804265], [0.89804265, 1. ]])
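`np.corrcoef` returns the full correlation matrix, so the correlation between the two variables is the off-diagonal entry. The sketch below shows how to extract it and checks it against the definition of Pearson's correlation coefficient. The arrays are small made-up values used only for illustration; they are not taken from the housing dataset.

```python
import numpy as np

# Illustrative values only, not from the housing data
rooms = np.array([5.0, 6.0, 6.5, 7.0, 8.0])
prices = np.array([15.0, 21.0, 24.0, 30.0, 41.0])

corr_matrix = np.corrcoef(rooms, prices)
r = corr_matrix[0, 1]   # correlation between rooms and prices

# Pearson's r computed directly from its definition
dr = rooms - rooms.mean()
dp = prices - prices.mean()
r_manual = np.sum(dr * dp) / np.sqrt(np.sum(dr ** 2) * np.sum(dp ** 2))

print(r, r_manual)
```

The diagonal entries are always 1, since each variable is perfectly correlated with itself.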

Now we want to fit a linear regression model to the data.

```python
# prepare the data
x = np.c_[avg_rooms.values]
y = median_prices.tolist()
```

```python
from sklearn import linear_model

lr = linear_model.LinearRegression()
lr.fit(x, y)
```

```python
print(lr.coef_)
print(lr.intercept_)
# (older scikit-learn versions also exposed lr.residues_)
```

Output:

[11.30440747] -47.09339739688137

```python
# obtain the model parameters
print(lr.coef_, lr.intercept_)
```

Output:

[11.30440747] -47.09339739688137

```python
# predict
yhat = lr.predict(x)
print(x[:10])
print(yhat[:10])
```

Output:

[[6.575] [6.421] [7.185] [6.998] [7.147] [6.43 ] [6.012] [6.172] [5.631] [6.004]] [27.23308169 25.49220294 34.12877024 32.01484605 33.69920276 25.59394261 20.86870028 22.67740548 16.56172104 20.77826503]
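The coefficient can be read as the change in the predicted median price for one additional room. A minimal sketch, assuming the coefficient and intercept are copied by hand from the output above; `predict_price` is a hypothetical helper written for this illustration, not part of scikit-learn.

```python
import numpy as np

# Parameters copied from the printed output above
coef = 11.30440747
intercept = -47.09339739688137

def predict_price(rooms):
    """Hypothetical helper: evaluate the fitted line by hand."""
    return intercept + coef * np.asarray(rooms, dtype=float)

# First two rows of x, for comparison with the first two entries of yhat
print(predict_price([6.575, 6.421]))

# Difference in prediction for one extra room equals the coefficient
print(predict_price(7.0) - predict_price(6.0))
```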

