The original k-nearest neighbor (KNN) algorithm applies no optimization to the search: it simply traverses the whole training set, finds the k training instances nearest to the input instance, counts their classes, and uses the most frequent class as the prediction for the input instance. For the underlying model theory, see: Statistical Learning Methods: K-Nearest Neighbor Method (Original Method)
In this implementation, Euclidean distance is used to measure the distance between points. The data operations do not use numpy; all computations are done with Python's built-in list.
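As a quick illustration of the distance measure, the sketch below computes the Euclidean distance between two points stored as plain lists and checks it against math.dist from the standard library (available from Python 3.8). The function name euclidean_distance and the two sample points are chosen here for illustration only.

```python
import math

def euclidean_distance(p, q):
    """Euclidean distance between two equal-length lists of numbers."""
    if len(p) != len(q):
        raise ValueError("points must have the same dimension")
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

p = [18, 90]
q = [2, 100]
d = euclidean_distance(p, q)  # sqrt(16**2 + 10**2) = sqrt(356)
print(round(d, 3))            # 18.868

# Cross-check against the standard library (Python 3.8+)
print(math.isclose(d, math.dist(p, q)))  # True
```

Using zip instead of indexing avoids an explicit dimension counter, but the result is the same as summing the squared coordinate differences in a loop.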
def knn(x, dataSet, labels, k):
    distanceMemories = {}  # Maps each training-sample index to its distance from x
    for i in range(len(dataSet)):
        distance = euDis(x, dataSet[i])
        distanceMemories[i] = distance
    # Sort the (index, distance) pairs by distance in ascending order
    sortResult = sorted(distanceMemories.items(), key=lambda item: item[1])
    distance_min_k = sortResult[:k]
    classCount = {}  # Counts how often each class appears among the k nearest samples
    for i in range(len(distance_min_k)):
        label = labels[distance_min_k[i][0]]
        if label not in classCount:
            classCount[label] = 0
        classCount[label] += 1
    # Sort the counts by dictionary value in descending order
    result = sorted(classCount.items(), key=lambda item: item[1], reverse=True)
    return result[0][0]
def euDis(x, y):  # Compute the Euclidean distance between x and y
    dim = len(x)
    temp = 0
    for i in range(dim):
        temp += (x[i] - y[i]) ** 2
    return temp ** 0.5
dataSet = [[3, 104], [2, 100], [1, 81], [101, 10], [99, 5], [98, 2]]
# A small example from the book Machine Learning
labels = ["Love story", "Love story", "Love story", "Action movies", "Action movies", "Action movies"]
print(knn([18, 90], dataSet, labels, 3))
# Output: Love story
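To see why the query point is classified this way, it can help to look at the k nearest neighbors themselves. The following is a minimal sketch of the same brute-force procedure, not the code above: it assumes the standard-library helpers heapq.nsmallest and collections.Counter, and the names nearest_neighbors and majority_label are hypothetical.

```python
import heapq
from collections import Counter

def nearest_neighbors(x, data_set, labels, k):
    """Return the k (distance, label) pairs closest to x, nearest first."""
    pairs = []
    for point, label in zip(data_set, labels):
        dist = sum((a - b) ** 2 for a, b in zip(x, point)) ** 0.5
        pairs.append((dist, label))
    return heapq.nsmallest(k, pairs, key=lambda pair: pair[0])

def majority_label(neighbors):
    """Most frequent label among the given (distance, label) pairs."""
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

data_set = [[3, 104], [2, 100], [1, 81], [101, 10], [99, 5], [98, 2]]
labels = ["Love story", "Love story", "Love story",
          "Action movies", "Action movies", "Action movies"]

neighbors = nearest_neighbors([18, 90], data_set, labels, 3)
for dist, label in neighbors:
    print(round(dist, 2), label)
print(majority_label(neighbors))  # Love story
```

For the query point [18, 90], the three nearest training samples all carry the label "Love story", so the vote is unanimous. heapq.nsmallest avoids sorting the full list of distances, which only matters for larger training sets; every distance is still computed, so this remains the unoptimized traversal described at the start.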