Linear Regression
Regression analysis is a predictive modelling technique that investigates the relationship between a dependent variable and an independent variable.
Mathematical Part:
$$Y = aX + b$$
$Y$: random variable (response, dependent)
$X$: random variable (predictor, independent)
$a, b$: regression coefficients, which are to be learned
$$a = \frac{\sum_{i=1}^{s} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{s} (x_i - \bar{x})^2}, \qquad b = \bar{y} - a\bar{x},$$
where $\bar{x}$ is the average of $x_1, x_2, \dots, x_s$ and $\bar{y}$ is the average of $y_1, y_2, \dots, y_s$, given sample data points $(x_1, y_1), (x_2, y_2), \dots, (x_s, y_s)$.
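The two formulas for a and b can be tried on a handful of points before moving to a real dataset. The following is a minimal sketch, not part of the original script: the helper name `fit_line` and the five sample points are made up for illustration.

```python
import numpy as np

def fit_line(x, y):
    """Return (a, b) for the least-squares line y = a*x + b (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar = x.mean()
    y_bar = y.mean()
    # a = sum((x_i - x_bar) * (y_i - y_bar)) / sum((x_i - x_bar)^2)
    a = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # b = y_bar - a * x_bar
    b = y_bar - a * x_bar
    return a, b

# Made-up sample points, chosen only so the arithmetic is easy to follow
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)
print("a =", a, "b =", b)
```

By hand, the averages are 3 and 4, the numerator is 4 + 0 + 0 + 0 + 2 = 6 and the denominator is 4 + 1 + 0 + 1 + 4 = 10, so a = 0.6 and b = 4 - 0.6 * 3 = 2.2. The printed values should match these up to floating-point rounding.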
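Dataset: The dataset we use is headbrain.csv. Its Head Size(cm^3) column is the predictor and its Brain Weight(grams) column is the response.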
Source Code:
# -*- coding: utf-8 -*-
"""
Created on Mon Sep 30 03:12:40 2019
@author: nowshad
"""
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Load the data and take a first look at it
dataset = pd.read_csv('headbrain.csv')
print(dataset.shape)
print(dataset.head(5))

# Predictor (X) and response (Y) as NumPy arrays
X = dataset['Head Size(cm^3)'].values
Y = dataset['Brain Weight(grams)'].values

mean_x = np.mean(X)
mean_y = np.mean(Y)
n = len(X)

# Least-squares estimates: m is the slope (a), c is the intercept (b)
numer = 0
denom = 0
for i in range(n):
    numer += (X[i] - mean_x) * (Y[i] - mean_y)
    denom += (X[i] - mean_x) ** 2
m = numer / denom
c = mean_y - (m * mean_x)
print("m= ", m)
print("c= ", c)

# Plot the fitted line over the observed points
max_x = np.max(X) + 1
min_x = np.min(X) - 1
x = np.linspace(min_x, max_x, 5)
y = m * x + c
plt.plot(x, y, color='green', label='Regression Line')
plt.scatter(X, Y, color='red', label='Scatter Plot')
plt.xlabel('Head Size')
plt.ylabel('Brain Weight')
plt.legend()
plt.show()

# R^2 = 1 - (residual sum of squares / total sum of squares), as a percentage
err_nm = 0
err_dn = 0
for i in range(n):
    y_predict = m * X[i] + c
    err_nm += (Y[i] - y_predict) ** 2
    err_dn += (Y[i] - mean_y) ** 2
r2 = (1 - (err_nm / err_dn)) * 100
print("\nR2 = ", r2, "%")
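The explicit loops in the script can also be written with NumPy array operations, which compute the same slope, intercept and R2 score. The sketch below is one possible way to do that, not the author's method. The function name `fit_and_score` is hypothetical, and the data is synthetic (a known line plus random noise) so the example runs without headbrain.csv. It also compares the result against `np.polyfit`, NumPy's own degree-1 least-squares fit.

```python
import numpy as np

def fit_and_score(x, y):
    """Fit y = m*x + c by least squares; return (m, c, r2_percent). Hypothetical helper."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    m = np.sum(dx * dy) / np.sum(dx ** 2)   # slope
    c = y.mean() - m * x.mean()             # intercept
    residuals = y - (m * x + c)
    # R2 as a percentage: 1 - (residual sum of squares / total sum of squares)
    r2 = (1 - np.sum(residuals ** 2) / np.sum(dy ** 2)) * 100
    return m, c, r2

# Synthetic stand-in data (NOT the headbrain.csv values): a known line plus noise
rng = np.random.default_rng(0)
x_demo = np.linspace(2500, 4500, 50)
y_demo = 0.25 * x_demo + 300 + rng.normal(0, 40, size=x_demo.size)

m, c, r2 = fit_and_score(x_demo, y_demo)
m_ref, c_ref = np.polyfit(x_demo, y_demo, 1)  # returns [slope, intercept] for degree 1
print("m =", m, "c =", c, "R2 =", r2, "%")
print("polyfit agrees:", np.allclose([m, c], [m_ref, c_ref]))
```

To apply this to the real data, pass the `X` and `Y` arrays from the script in place of `x_demo` and `y_demo`.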