K Nearest Neighbors

K Nearest Neighbors (KNN) is a simple algorithm that stores all the available cases and classifies a new case based on a similarity measure, such as the distance between data points.
K = the number of nearest neighbors considered

How it works

Let’s assume we have two classes of data, class A and class B. Now we get a new data point, the star, and we have to predict which class it belongs to. Take k=3, so we look at the 3 nearest neighbours of the star. If more of those neighbours belong to class A than to class B, the star is assigned to class A; otherwise it is assigned to class B. It is that simple.
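
The voting step described above can be sketched in a few lines. This is a minimal illustration and not the code used in this post: the points, their labels and the coordinates of `star` are made-up values, chosen so that two of the three nearest neighbours belong to class A.

```python
import math
from collections import Counter

# Hypothetical 2-D points with their class labels (illustrative values only).
points = [
    ((1.0, 1.0), 'A'),
    ((1.5, 2.0), 'A'),
    ((2.0, 1.5), 'A'),
    ((5.0, 5.0), 'B'),
    ((3.0, 3.0), 'B'),
    ((6.0, 5.5), 'B'),
]
star = (2.5, 2.5)  # the new point whose class we want to predict
k = 3


def classify(star, points, k):
    """Return the majority class among the k points closest to star."""
    # Sort by Euclidean distance to the star and keep the k closest points.
    nearest = sorted(
        points,
        key=lambda p: math.hypot(p[0][0] - star[0], p[0][1] - star[1]),
    )[:k]
    # Count the labels of those neighbours and return the most frequent one.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(classify(star, points, k))  # prints A
```

Here the single closest point is class B, but with k=3 the two class A neighbours outvote it, which is why the choice of k matters.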

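Dataset: The dataset we use is headbrain.csv
From https://www.kaggle.com/saarthaksangam/headbrain
Note: the source code below reads a file named iris.data and expects four numeric columns followed by a class label.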

Source Code: 


# -*- coding: utf-8 -*-
"""
Created on Mon Sep 30 03:12:40 2019

@author: nowshad
"""

import csv
import random
import math
import operator


def loadDataset(filename, split, trainingSet, testSet):
    """Read the CSV file and randomly split its rows into the two given lists."""
    with open(filename, 'r') as csvfile:
        dataset = list(csv.reader(csvfile))
    for row in dataset:
        if not row:
            continue  # skip blank lines, e.g. at the end of the file
        for y in range(4):
            row[y] = float(row[y])  # the first four columns are numeric features
        # Each row goes to the training set with probability `split`.
        if random.random() < split:
            trainingSet.append(row)
        else:
            testSet.append(row)


def euclideanDistance(instance1, instance2, length):
    """Euclidean distance over the first `length` values of both instances."""
    distance = 0
    for x in range(length):
        distance += pow((instance1[x] - instance2[x]), 2)
    return math.sqrt(distance)


def getNeighbours(trainingSet, testInstance, k):
    """Return the k training rows closest to testInstance."""
    distance = []
    length = len(testInstance) - 1  # the last column is the class label
    for x in range(len(trainingSet)):
        dist = euclideanDistance(testInstance, trainingSet[x], length)
        distance.append((trainingSet[x], dist))
    distance.sort(key=operator.itemgetter(1))
    neighbors = []
    for x in range(k):
        neighbors.append(distance[x][0])
    return neighbors


def getResponse(neighbors):
    """Return the class label that occurs most often among the neighbours."""
    classVotes = {}
    for x in range(len(neighbors)):
        response = neighbors[x][-1]
        if response in classVotes:
            classVotes[response] += 1
        else:
            classVotes[response] = 1
    sortedVotes = sorted(classVotes.items(), key=operator.itemgetter(1), reverse=True)
    return sortedVotes[0][0]


def getAccuracy(testSet, predictions):
    """Return the percentage of test rows whose label was predicted correctly."""
    if not testSet:
        return 0.0  # avoid dividing by zero when the test set is empty
    correct = 0
    for x in range(len(testSet)):
        if testSet[x][-1] == predictions[x]:
            correct += 1
    return (correct / float(len(testSet))) * 100.0


# main
trainingSet = []
testSet = []
split = 0.67  # approximate fraction of rows used for training
# iris.data must be in the working directory; adjust the path if it is stored elsewhere.
loadDataset('iris.data', split, trainingSet, testSet)
predictions = []
k = 3
for x in range(len(testSet)):
    neighbors = getNeighbours(trainingSet, testSet[x], k)
    result = getResponse(neighbors)
    predictions.append(result)
    # print('Predicted=', result, ', actual=', testSet[x][-1])
accuracy = getAccuracy(testSet, predictions)
print('Accuracy= ', accuracy, '%')
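
For readers on Python 3.8 or later, the neighbour search and the vote can be written more compactly with `math.dist` and `collections.Counter` from the standard library. The sketch below is an alternative formulation, not the listing above. The function name `predict` and the small `train` list are hypothetical and used only for illustration. Rows are assumed to follow the same layout as in the listing: numeric features first, class label last.

```python
import math
from collections import Counter


def predict(train, row, k):
    """Predict the label of row from its k nearest rows in train.

    Each row is a list of numeric features followed by a class label;
    the label slot of `row` itself is ignored.
    """
    features = row[:-1]
    # Sort training rows by Euclidean distance between feature vectors.
    nearest = sorted(train, key=lambda t: math.dist(t[:-1], features))[:k]
    # Majority vote over the labels of the k nearest rows.
    return Counter(t[-1] for t in nearest).most_common(1)[0][0]


# Illustrative rows in the assumed layout (four features, then the label).
train = [
    [5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
    [4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
    [7.0, 3.2, 4.7, 1.4, 'Iris-versicolor'],
    [6.4, 3.2, 4.5, 1.5, 'Iris-versicolor'],
]
print(predict(train, [5.0, 3.4, 1.5, 0.2, None], k=3))  # prints Iris-setosa
```

Sorting the whole training set for every query is fine for a small dataset like this one; for larger data a spatial index or a library implementation would usually be a better fit.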

Output: 

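The maximum accuracy we got is 98.14%. Because the rows are split into training and test sets at random, the accuracy varies from run to run. The screenshot is given below.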
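
Since `loadDataset` assigns rows to the training or test set at random, a single run gives only one sample of the accuracy. One way to get a steadier figure is to repeat the split with different seeds and look at the minimum, mean and maximum. The sketch below illustrates the idea on synthetic two-class data rather than on the dataset used in this post. `make_data`, `knn_accuracy` and the seed values are hypothetical, and the numbers it prints are not the results reported here.

```python
import math
import random
from collections import Counter


def make_data(rng, n_per_class=50):
    """Generate synthetic 2-D points in two overlapping Gaussian clusters."""
    data = []
    for label, centre in (('A', 0.0), ('B', 2.0)):
        for _ in range(n_per_class):
            data.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0), label])
    return data


def knn_accuracy(data, split, k, rng):
    """Split data at random, classify the test rows with k-NN, return accuracy in percent."""
    train, test = [], []
    for row in data:
        (train if rng.random() < split else test).append(row)
    if not train or not test:
        raise ValueError('split produced an empty training or test set')
    correct = 0
    for row in test:
        # k nearest training rows by Euclidean distance over the features.
        nearest = sorted(train, key=lambda t: math.dist(t[:-1], row[:-1]))[:k]
        predicted = Counter(t[-1] for t in nearest).most_common(1)[0][0]
        correct += predicted == row[-1]
    return 100.0 * correct / len(test)


data = make_data(random.Random(0))
# One accuracy value per seed; the spread shows how much a single run can vary.
scores = [knn_accuracy(data, 0.67, 3, random.Random(seed)) for seed in range(10)]
print('min = %.2f, mean = %.2f, max = %.2f'
      % (min(scores), sum(scores) / len(scores), max(scores)))
```

Passing a seeded `random.Random` instance instead of using the module-level functions keeps each run reproducible, so a reported figure can be regenerated later.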
