K-Nearest Neighbors Algorithm (k-NN)
- NOTE - This is an incomplete stub article that I’m displaying for now just in case it is somehow useful in its current state.
- no actual training step - k-NN is a "lazy learning" method, so all computation is deferred until a query is classified
- "sensitive to the local structure of the data"
- input consists of the k closest training examples in the feature space
- output:
- k-NN classification - class membership - the object is assigned to the most common class among its k nearest neighbors (k is a small positive integer)
- k-NN regression - a value - the average of the values of the k nearest neighbors
- neighbors can be weighted
- closer neighbors get a higher weight
- weighting counteracts skew when the class distribution is imbalanced (a very frequent class otherwise tends to dominate the vote)
- common scheme: weight = 1/d
- d = distance to the neighbor
- neighbors are taken from the set of objects whose class or value is already known (the training set)
- don't compare the query against every data point in a large dataset - too computationally expensive
- a nearest neighbor search (NNS) index can be used to find the neighbors efficiently for large datasets (see the sketch below)
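A minimal sketch of indexed neighbor search, assuming SciPy's cKDTree (the notes don't name a specific library):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
train = rng.random((100_000, 3))    # 100k training points in a 3-D feature space

tree = cKDTree(train)               # build the index once

query = np.array([0.5, 0.5, 0.5])
dist, idx = tree.query(query, k=5)  # 5 nearest neighbors without a full scan
print(dist, idx)
```

In low dimensions each query then costs roughly O(log n) instead of a scan over all n points.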
- feature space - the abstract n-dimensional space in which each example is a point
- feature vectors - the numeric vectors (one per example) giving coordinates in that space
- k - a user-defined constant
- larger values of k:
- reduce the effect of noise
- make class boundaries less distinct
- query / test point - an unlabeled vector, classified using the labels of its k nearest training examples (a minimal sketch of the whole step follows)
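Putting the pieces above together (majority vote, optional 1/d weighting, a small k), here is a minimal pure-Python sketch of the classification step; all names are illustrative:

```python
import math
from collections import defaultdict

def knn_classify(train, query, k=3, weighted=True):
    """train: list of (feature_vector, label) pairs; query: a feature vector."""
    # Brute-force distance from the query to every training example.
    dists = sorted((math.dist(query, x), label) for x, label in train)
    votes = defaultdict(float)
    for d, label in dists[:k]:            # only the k closest examples vote
        votes[label] += 1 / d if weighted and d > 0 else 1.0
    return max(votes, key=votes.get)      # heaviest (or most common) class

train = [((1, 1), "a"), ((1, 2), "a"), ((5, 5), "b"), ((6, 5), "b")]
print(knn_classify(train, (2, 2), k=3))   # -> "a"
```

For k-NN regression the same loop would average the neighbors' values instead of voting.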
Distance Metric:
- there are multiple ways to measure the distance between two points
- continuous variables - can take any value in a range
- discrete variables - can take only specific values
- common distance metric options (see the sketch below):
- use Euclidean distance for continuous variables
- 1D: dist = |a - b|
- 2D: dist = sqrt((a1 - b1)^2 + (a2 - b2)^2), for first point (a1, a2) and second point (b1, b2) - this is just the Pythagorean theorem a^2 + b^2 = c^2
- use the overlap metric (or Hamming distance) for discrete variables
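A small sketch of both metrics in plain Python (the function names are my own):

```python
import math

def euclidean(a, b):
    """Euclidean distance for continuous feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hamming(a, b):
    """Overlap / Hamming distance for discrete feature vectors:
    the number of positions where the two vectors differ."""
    return sum(x != y for x, y in zip(a, b))

print(euclidean((1.0, 2.0), (4.0, 6.0)))                      # sqrt(3^2 + 4^2) = 5.0
print(hamming(("red", "S", "wool"), ("red", "M", "cotton")))  # 2
```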
To improve accuracy, the distance metric itself can be learned with one of these algorithms (see the sketch after this list):
- Large Margin Nearest Neighbor (LMNN)
- Neighbourhood Components Analysis (NCA)
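As a concrete example, scikit-learn ships an NCA implementation that can be chained with its k-NN classifier; a sketch, assuming scikit-learn >= 0.21:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Learn a linear transformation of the feature space that improves
# k-NN accuracy, then classify in the transformed space.
model = Pipeline([
    ("nca", NeighborhoodComponentsAnalysis(random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=3)),
])
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```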
- metric (distance function) - a function that gives the distance between any two elements of a set
- metric space - the set together with its metric
- pseudo-metric - a relaxed metric where two distinct elements are allowed to have distance 0
Metric learning
- supervised metric learning can improve performance
- Feature extraction - “Transforming the input data into the set of features is called feature extraction”
- removes redundant data
- Dimension reduction - usually performed on high-dimensional data before k-NN (e.g., with PCA) to avoid the curse of dimensionality (sketch below)
- Decision boundary - k-NN in effect computes the class decision boundary implicitly
- Data reduction - replacing the full training set with a smaller set of representative prototypes that classify nearly as well
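A sketch of dimension reduction before k-NN, using PCA via scikit-learn (the dataset and component count are arbitrary choices):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

X, y = load_digits(return_X_y=True)  # 64-dimensional inputs
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Project onto 16 principal components before running k-NN,
# so distances are computed in a much smaller space.
model = Pipeline([
    ("pca", PCA(n_components=16)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
])
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```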
=======================
- OpenCV - real-time computer vision library
- face recognition example (a hedged sketch follows the list):
- Haar face detection
- Mean-shift tracking analysis
- PCA or Fisher LDA projection into feature space, followed by k-NN classification
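A hedged sketch of part of this pipeline using OpenCV's Python bindings: Haar detection, PCA projection, then cv2.ml.KNearest. The mean-shift tracking step is omitted, and train_paths / train_labels / test_paths are hypothetical placeholders:

```python
import cv2
import numpy as np

# Haar cascade face detector that ships with OpenCV.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def face_vectors(image_paths, size=(32, 32)):
    """Detect the first face in each image and flatten it into a feature vector."""
    vectors = []
    for path in image_paths:
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        x, y, w, h = cascade.detectMultiScale(gray, 1.3, 5)[0]  # assumes a face is found
        face = cv2.resize(gray[y:y + h, x:x + w], size)
        vectors.append(face.flatten().astype(np.float32))
    return np.array(vectors)

train = face_vectors(train_paths)   # train_paths, train_labels, test_paths
test = face_vectors(test_paths)     # are hypothetical inputs

# PCA projection into a low-dimensional feature space.
mean, eigvecs = cv2.PCACompute(train, mean=None, maxComponents=20)
train_proj = cv2.PCAProject(train, mean, eigvecs)
test_proj = cv2.PCAProject(test, mean, eigvecs)

# k-NN classification with OpenCV's built-in implementation.
knn = cv2.ml.KNearest_create()
knn.train(train_proj, cv2.ml.ROW_SAMPLE,
          np.array(train_labels, dtype=np.float32).reshape(-1, 1))
_, results, _, _ = knn.findNearest(test_proj, 3)
print(results.ravel())  # one predicted label per test face
```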