Linear Regression

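Regression analysis is a predictive modelling technique that investigates the relationship between a dependent variable and one or more independent variables. Simple linear regression, covered here, models that relationship as a straight line with a single independent variable.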

Mathematical Part:

$$Y = aX + b$$

where
    Y: random variable (response, dependent)
    X: random variable (predictor, independent)
    a, b: regression coefficients that are to be learned

$$a = \frac{\sum_{i=1}^{s} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{s} (x_i - \bar{x})^2}$$

$$b = \bar{y} - a\bar{x}$$

where
    $\bar{x}$ = average of $x_1, x_2, \dots, x_s$
    $\bar{y}$ = average of $y_1, y_2, \dots, y_s$
given sample data points $(x_1, y_1), (x_2, y_2), \dots, (x_s, y_s)$.
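
As a quick check of the formulas above, here is a minimal sketch that applies them to a small made-up sample. The function name `fit_line` and the data points are illustrative assumptions, not part of the original program.

```python
import numpy as np

def fit_line(x, y):
    """Return (a, b) for the least-squares line y = a*x + b.

    Hypothetical helper: it applies the closed-form formulas for
    a and b directly, using vectorised sums instead of a loop.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar = x.mean()
    y_bar = y.mean()
    # a = sum((x_i - x_bar) * (y_i - y_bar)) / sum((x_i - x_bar)^2)
    a = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # b = y_bar - a * x_bar
    b = y_bar - a * x_bar
    return a, b

# Made-up sample that lies exactly on the line y = 2x + 1
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
a, b = fit_line(xs, ys)
print("a =", a, "b =", b)
```

Because the sample points sit exactly on y = 2x + 1, the formulas recover a = 2 and b = 1, which is an easy way to confirm the implementation before running it on real data.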

Dataset: The dataset we use is headbrain.csv, from which the program reads the columns Head Size(cm^3) and Brain Weight(grams).

Source Code: 

# -*- coding: utf-8 -*-
"""
Created on Mon Sep 30 03:12:40 2019

@author: nowshad
"""
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Load the data and take a first look at it
dataset = pd.read_csv('headbrain.csv')
print(dataset.shape)
print(dataset.head(5))

# Predictor (X) and response (Y) as NumPy arrays
X = dataset['Head Size(cm^3)'].values
Y = dataset['Brain Weight(grams)'].values
mean_x = np.mean(X)
mean_y = np.mean(Y)

n = len(X)

# Accumulate the numerator and denominator of the slope formula
numer = 0
denom = 0
for i in range(n):
    numer += (X[i] - mean_x) * (Y[i] - mean_y)
    denom += (X[i] - mean_x) ** 2

# m is the slope (a in the formula) and c is the intercept (b)
m = numer / denom
c = mean_y - (m * mean_x)
print("m= ", m)
print("c= ", c)

# Points along the fitted line, slightly beyond the data range
max_x = np.max(X) + 1
min_x = np.min(X) - 1
x = np.linspace(min_x, max_x, 5)
y = m * x + c

# Plot the fitted line over the raw data
plt.plot(x, y, color='green', label='Regression Line')
plt.scatter(X, Y, color='red', label='Scatter Plot')
plt.xlabel('Head Size')
plt.ylabel('Brain Weight')
plt.legend()
plt.show()

# R^2 = 1 - (residual sum of squares / total sum of squares)
err_nm = 0
err_dn = 0
for i in range(n):
    y_predict = m * X[i] + c
    err_nm += (Y[i] - y_predict) ** 2
    err_dn += (Y[i] - mean_y) ** 2
r2 = (1 - (err_nm / err_dn)) * 100  # expressed as a percentage
print("\nR2 = ", r2, "%")

Output

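The R square value is more than 0.50 (the program prints it as a percentage, so above 50%), which means the fitted line explains more than half of the variation in brain weight. That is good enough for a model with a single predictor.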
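
To make the R square calculation easier to reuse and to sanity-check a hand-written fit, the sketch below wraps it in a small function and compares against NumPy's built-in `np.polyfit`. The helper name `r_squared` and the sample values are assumptions made up for illustration; they are not taken from headbrain.csv.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, returned as a fraction (not a percentage)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

# Made-up, slightly noisy sample (illustration only)
x_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_demo = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

# np.polyfit with degree 1 returns [slope, intercept]
slope, intercept = np.polyfit(x_demo, y_demo, 1)
score = r_squared(y_demo, slope * x_demo + intercept)
print("slope =", slope, "intercept =", intercept)
print("R2 =", score)
```

A score close to 1 means the line explains almost all of the variation in the response, while a model that always predicts the mean of Y scores 0. Multiplying the returned fraction by 100 gives the percentage form printed by the program above.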

Reference

https://www.edureka.co/python
