The Data Science Lab

Weighted k-Nearest Neighbors Regression Using C#

Console.WriteLine("Explaining for " +
  "x = (0.5, -0.5, 0.5, -0.5, 0.5) ");
double[] x = 
  new double[] { 0.5, -0.5, 0.5, -0.5, 0.5 };
knnr_model.Explain(x);
      
Console.WriteLine("End demo ");
Console.ReadLine();

The Explain() method computes a predicted y value, but also prints messages that show how the prediction is made. In essence, Explain() is a verbose version of Predict(), so an alternative design is to combine the Predict() and Explain() methods into a single method.

The Predict Method
The implementation of the Predict() method begins with:

public double Predict(double[] x)
{
  if (this.trainX == null)
    Console.WriteLine("Error: Store() not yet called ");

  // 0. set up ordering/indices
  int n = this.trainX.Length;
  int[] indices = new int[n];
  for (int i = 0; i < n; ++i)
    indices[i] = i;
. . .

The purpose of the indices array will become clear in a moment. Next, the distances from the input vector x to all training data items are computed and then sorted from closest to farthest:

  double[] distances = new double[n];
  for (int i = 0; i < n; ++i)
    distances[i] = EucDistance(x, this.trainX[i]);
  Array.Sort(distances, indices);

The C# language Array.Sort() method physically sorts its first argument, the distances, and also sorts its second argument, the indices of the training items, in parallel. It would be possible to sort the distances, the training x data, and the target y values all in parallel, but that approach involves a lot of unnecessary data movement.
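The EucDistance() helper method isn't shown in the code above. Assuming it computes ordinary Euclidean distance between two vectors of equal length, a minimal sketch looks like:

  private static double EucDistance(double[] v1,
    double[] v2)
  {
    // Euclidean distance = square root of the sum
    // of squared component differences
    double sum = 0.0;
    for (int i = 0; i < v1.Length; ++i)
      sum += (v1[i] - v2[i]) * (v1[i] - v2[i]);
    return Math.Sqrt(sum);
  }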

The weights are computed by two statements:

  double[]? wts = null;
  if (this.weighting == "uniform")
    wts = UniformWts(this.k);
  else if (this.weighting == "skewed")
    wts = SkewedWts(this.k);

The weighting scheme is specified by an explicit string of "uniform" or "skewed." Using magic strings is often a bad design choice, but in my opinion the technique is simpler than using an enumeration in this case.
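The UniformWts() helper isn't shown either. Assuming it simply returns k equal weights that sum to 1.0, a minimal sketch is:

  private static double[] UniformWts(int k)
  {
    // each of the k nearest neighbors gets weight 1/k
    double[] result = new double[k];
    for (int i = 0; i < k; ++i)
      result[i] = 1.0 / k;
    return result;
  }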


The Predict() method finishes by computing the weighted sum of the k-nearest target y values:

. . .
  double result = 0.0;
  for (int i = 0; i < this.k; ++i)
    result += wts[i] * this.trainY[indices[i]];

  return result;
} // Predict
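Calling the completed Predict() method follows the same pattern as the Explain() call shown earlier. For example, assuming the training data has already been stored (the xNew and y variable names are just for illustration):

  double[] xNew =
    new double[] { 0.5, -0.5, 0.5, -0.5, 0.5 };
  double y = knnr_model.Predict(xNew);
  Console.WriteLine("Predicted y = " +
    y.ToString("F4"));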

Computing Skewed Weights
The demo program computes skewed weights using a micro-algorithm I devised a few years ago. It's best explained by example. Suppose k = 5, as in the demo. A uniform weight scheme would return 1/5 = 0.20 for each of the five weight values. The demo program gives more weight to the closest training item by multiplying a uniform weight by 1.5, and gives less weight to the farthest item by multiplying by 0.5. So weight[0] = 0.20 * 1.5 = 0.30 and weight[4] = 0.20 * 0.5 = 0.10.

Because the weights must sum to 1.0, this leaves 1 - (0.30 + 0.10) = 0.60 to be shared by the remaining three weights. Therefore weight[1] = weight[2] = weight[3] = 0.60 / 3 = 0.20.

This skewed weighting technique doesn't work for k = 1, 2 or 3, so for those values of k the weights are returned explicitly:

if (k == 1) {
  result[0] = 1.0;
}
else if (k == 2) {
  result[0] = 0.60; result[1] = 0.40;
}
else if (k == 3) {
  result[0] = 0.40; result[1] = 0.35; result[2] = 0.25;
}

To be sure, these weight values are arbitrary to some extent, but they have worked well for me in practice and in a series of experiments with synthetic data. A perfectly good alternative design is to use uniform weighting for small values of k and use skewed weighting for k = 4 or larger.
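Putting the pieces together, a complete SkewedWts() method based on the description above might look like this. The 1.5 and 0.5 multipliers are the ones used by the demo; the rest of the structure is a sketch rather than the exact demo code:

  private static double[] SkewedWts(int k)
  {
    double[] result = new double[k];
    if (k == 1) {
      result[0] = 1.0;
    }
    else if (k == 2) {
      result[0] = 0.60; result[1] = 0.40;
    }
    else if (k == 3) {
      result[0] = 0.40; result[1] = 0.35;
      result[2] = 0.25;
    }
    else {  // k >= 4
      double unif = 1.0 / k;       // uniform weight
      result[0] = unif * 1.5;      // closest gets more
      result[k - 1] = unif * 0.5;  // farthest gets less
      double rem = 1.0 - (result[0] + result[k - 1]);
      for (int i = 1; i < k - 1; ++i)
        result[i] = rem / (k - 2); // middle items share
    }
    return result;
  }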

In some machine learning libraries, notably scikit-learn, the k-nearest neighbors regression module allows you to weight neighbors by inverse distance. Based on my experience, this weighting technique usually works poorly: when a distance is very small, its inverse is huge and the resulting weight overwhelms all the other weights.
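To see the problem concretely, here is a small hypothetical example of inverse distance weighting (not part of the demo program). Each raw weight is 1 over the distance, and the raw weights are normalized so they sum to 1.0:

  // hypothetical k = 3 nearest distances, one tiny
  double[] dists = new double[] { 0.0001, 0.50, 0.60 };
  double[] wts = new double[dists.Length];
  double sum = 0.0;
  for (int i = 0; i < dists.Length; ++i) {
    wts[i] = 1.0 / dists[i];  // inverse distance
    sum += wts[i];
  }
  for (int i = 0; i < dists.Length; ++i)
    wts[i] /= sum;  // normalize so weights sum to 1.0

The resulting weights are approximately (0.9996, 0.0002, 0.0002), so the single closest training item almost completely determines the prediction.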

Wrapping Up
Weighted k-nearest neighbors regression is arguably the simplest technique for regression. The main strengths of weighted KNNR are simplicity, interpretability and the ability to scale to large (but not huge) datasets. The main weaknesses are that it works only with strictly numeric data and that it is sensitive to the value of the k parameter.

In practice, weighted KNNR is often less accurate than more sophisticated techniques such as kernel ridge regression and neural network regression. But sometimes weighted KNNR works better than more sophisticated techniques. A common approach is to apply weighted KNNR and one or more other regression techniques to see if they all give consistent results.

Most machine learning libraries have a k-nearest neighbors regression module. But implementing weighted KNNR from scratch allows you to fully customize your code, makes it easy to integrate your prediction model with other systems, and gives you a complete understanding of how results are computed.


About the Author

Dr. James McCaffrey works for Microsoft Research in Redmond, Wash. He has worked on several Microsoft products including Azure and Bing. James can be reached at [email protected].
