 # K-Nearest Neighbors Algorithm (k-NN)

• NOTE - This is an incomplete stub article, published in its current state in case it is useful.


• lazy learning - no actual training step; all computation is deferred until classification

• “sensitive to the local structure of the data”

• input consists of the k closest training examples in the feature space

• output:
• k-NN classification - class membership - the most common class among the k nearest neighbors
• k is a small positive integer
• k-NN regression - value - the average of the values of the k nearest neighbors
• neighbors can be weighted
• closer neighbors get a higher weight
• counteracts the skew that arises when one class has a large number of examples among the neighbors
• weight = 1/d
• d = distance to the neighbor
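The 1/d weighting above can be sketched in a few lines of plain Python. The function name and toy data here are my own, for illustration only:

```python
import math
from collections import defaultdict

def weighted_knn_classify(query, training, k=3):
    """Classify `query` by a 1/d-weighted vote among its k nearest
    training examples; `training` is a list of (point, label) pairs."""
    nearest = sorted(training, key=lambda pl: math.dist(query, pl[0]))[:k]
    votes = defaultdict(float)
    for point, label in nearest:
        d = math.dist(query, point)
        if d == 0:
            return label  # exact match: that label wins outright
        votes[label] += 1.0 / d  # weight = 1/d, closer neighbors count more
    return max(votes, key=votes.get)
```

Note the d == 0 guard: an exact hit would otherwise divide by zero, so it is treated as an infinitely-weighted vote.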

• neighbors - taken from the set of objects whose class or value is known (the training set)
• don't use every data point if you have a large dataset; this is too computationally expensive
• nearest neighbor search (NNS) algorithms can be used to select neighbors efficiently for large datasets
• feature space - the n-dimensional space in which examples are represented, one dimension per feature
• feature vectors - the numeric representation of a single example as a point in that space
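A toy 1-D illustration of why NNS helps: on sorted data, a binary search finds the nearest neighbor in O(log n) instead of scanning every point. Real NNS structures such as k-d trees and ball trees generalize this idea to many dimensions. The function name here is invented for the sketch:

```python
import bisect

def nearest_1d(sorted_vals, q):
    """Nearest neighbor of `q` in a sorted 1-D list via binary search,
    avoiding a full O(n) scan of the training set."""
    i = bisect.bisect_left(sorted_vals, q)
    # Only the values immediately left and right of the insertion
    # point can be the nearest neighbor.
    candidates = sorted_vals[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda v: abs(v - q))
```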

k - user-defined constant

• larger values of k
• reduce the effect of noise
• but make class boundaries less distinct
• query / test point - an unlabeled vector, classified using the k nearest labeled training examples
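The noise-vs-boundary trade-off can be demonstrated with a tiny majority-vote classifier. The data below is invented: one mislabeled ("noisy") point sits right next to the query, so k=1 is fooled while k=3 recovers the correct class:

```python
import math
from collections import Counter

def knn_label(query, training, k):
    """Plain majority-vote k-NN over (point, label) training pairs."""
    nearest = sorted(training, key=lambda pl: math.dist(query, pl[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

training = [
    ((1.0, 1.0), "b"),   # noise: a stray "b" inside the "a" region
    ((1.2, 1.0), "a"),
    ((0.8, 1.1), "a"),
    ((5.0, 5.0), "b"),
]
print(knn_label((1.0, 1.05), training, k=1))  # noisy neighbor wins: "b"
print(knn_label((1.0, 1.05), training, k=3))  # majority of 3 recovers: "a"
```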

## Distance Metric

• multiple ways to measure the distance between points

• continuous variables - can take any value within a range
• discrete variables - take only specific, separate values

• common distance metric options:
• Use Euclidean distance for continuous variables
• 1D: dist = |a - b|
• 2D: dist = sqrt((a1 - b1)^2 + (a2 - b2)^2), first point: (a1, a2), second point: (b1, b2); this is the Pythagorean theorem: a^2 + b^2 = c^2
• Use the overlap metric (or Hamming distance) for discrete variables
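Both metrics are a few lines of plain Python (the function names are my own):

```python
import math

def euclidean(a, b):
    """Euclidean distance for continuous feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hamming(a, b):
    """Overlap / Hamming distance for discrete features: the number
    of positions where the two vectors disagree."""
    return sum(x != y for x, y in zip(a, b))
```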

To improve accuracy, learn the distance metric with a metric-learning algorithm such as:

• Large Margin Nearest Neighbor
• Neighbourhood Components Analysis

• metric (distance function) - a function giving the distance between any two elements of a set
• metric space - the set together with its metric
• pseudo-metric - like a metric, except two distinct elements may be at distance 0

Metric learning

• supervised metric learning can improve performance

• Feature extraction - “Transforming the input data into the set of features is called feature extraction”
• remove redundant data

• Dimension reduction - reduce the number of features (e.g. with PCA) before running k-NN, to avoid the curse of dimensionality
• Decision boundary - the surface separating the classes; k-NN computes it implicitly
• Data reduction - reduce the number of stored training examples, keeping only the prototypes needed for accurate classification

=======================

• OpenCV - real-time computer vision library

• face recognition example:

• Haar face detection
• Mean-shift tracking analysis
• PCA or Fisher LDA projection into feature space, followed by k-NN classification
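The final k-NN step of the pipeline above might look like this minimal sketch. The gallery vectors and names are invented for illustration; in a real system the embeddings would come from the PCA / Fisher LDA projection:

```python
import math
from collections import Counter

# Hypothetical gallery of already-projected feature vectors -> person labels.
gallery = [
    ((0.9, 0.1, 0.0), "alice"),
    ((0.8, 0.2, 0.1), "alice"),
    ((0.1, 0.9, 0.2), "bob"),
    ((0.2, 0.8, 0.1), "bob"),
]

def identify(embedding, k=3):
    """k-NN majority vote over the gallery to name the face in `embedding`."""
    nearest = sorted(gallery, key=lambda e: math.dist(embedding, e[0]))[:k]
    return Counter(name for _, name in nearest).most_common(1)[0][0]
```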