The Data Science Lab
Kernel Ridge Regression Using C#
The model is trained like so:
Console.WriteLine("Training model ");
krr.Train(trainX, trainY);
Console.WriteLine("Done ");
// Console.WriteLine("Model weights: ");
// Utils.VecShow(krr.wts, 4, 9, true);
Behind the scenes, the Train() method uses the algorithm explained in the previous section. Recall that training produces one weight for each training item, so the demo data would generate 40 weights. After the model has been trained, it is evaluated:
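The training algorithm itself was covered in the previous section and isn't repeated here. For orientation only, a minimal sketch of a Train() method that solves w = inverse(K(X', X') + alpha * I) * y might look like the code below. The Utils.MatInverse() and Utils.MatVecProduct() helpers, and the alpha field name, are assumptions for the sketch, not necessarily the demo's actual names.

// sketch only: solve w = inverse(K + alpha*I) * y
public void Train(double[][] trainX, double[] trainY)
{
  this.trainX = trainX;  // Predict() needs the training data
  int N = trainX.Length;

  // kernel matrix K(X', X') where K[i][j] = RBF(x_i, x_j)
  double[][] K = Utils.MatCreate(N, N);
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
      K[i][j] = this.Rbf(trainX[i], trainX[j]);

  // regularize: K + alpha * I
  for (int i = 0; i < N; ++i)
    K[i][i] += this.alpha;

  // weights = inverse(K + alpha*I) * y
  // Utils.MatInverse() and Utils.MatVecProduct() are assumed helpers
  double[][] Kinv = Utils.MatInverse(K);
  this.wts = Utils.MatVecProduct(Kinv, trainY);
}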
Console.WriteLine("Computing model accuracy" +
" (within 0.10) ");
double trainAcc = Accuracy(krr, trainX, trainY, 0.10);
double testAcc = Accuracy(krr, testX, testY, 0.10);
Console.WriteLine("Train acc = " +
trainAcc.ToString("F4"));
Console.WriteLine("Test acc = " +
testAcc.ToString("F4"));
Because there's no inherent definition of regression accuracy, it's necessary to implement a program-defined function. How close a predicted value must be to the true value in order to be considered a correct prediction will vary from problem to problem.
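The Accuracy() function isn't listed here, but based on how it's called, a plausible implementation looks like the following sketch. It assumes the kernel ridge regression class is named KRR and that "within 0.10" means within 10 percent of the true target value; both are assumptions.

// sketch: a prediction is correct if it's within pctClose
// (e.g., 0.10 = 10 percent) of the true target value
static double Accuracy(KRR model, double[][] dataX,
  double[] dataY, double pctClose)
{
  int numCorrect = 0; int numWrong = 0;
  for (int i = 0; i < dataX.Length; ++i) {
    double predY = model.Predict(dataX[i]);
    double actualY = dataY[i];
    if (Math.Abs(predY - actualY) <
      Math.Abs(pctClose * actualY))
      ++numCorrect;
    else
      ++numWrong;
  }
  return (numCorrect * 1.0) / (numCorrect + numWrong);
}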
The demo computes the root mean squared error of the model on the training and test data like so:
double trainRMSE = RootMSE(krr, trainX, trainY);
double testRMSE = RootMSE(krr, testX, testY);
Console.WriteLine("Train RMSE = " +
trainRMSE.ToString("F4"));
Console.WriteLine("Test RMSE = " +
testRMSE.ToString("F4"));
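The RootMSE() function isn't shown either. Under the same assumptions as the Accuracy() sketch above, a straightforward implementation might be:

// sketch: root mean squared error of the model on a dataset
static double RootMSE(KRR model, double[][] dataX,
  double[] dataY)
{
  double sum = 0.0;
  int n = dataX.Length;
  for (int i = 0; i < n; ++i) {
    double predY = model.Predict(dataX[i]);
    double err = predY - dataY[i];
    sum += err * err;
  }
  return Math.Sqrt(sum / n);
}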
The demo concludes by using the trained model to predict for X = (0.5, -0.5, 0.5, -0.5, 0.5) using these statements:
Console.WriteLine("Predicting for x = " +
"[0.5, -0.5, 0.5, -0.5, 0.5] ");
double[] x =
new double[] { 0.5, -0.5, 0.5, -0.5, 0.5 };
double y = krr.Predict(x);
Console.WriteLine("Predicted y = " + y.ToString("F4"));
The Predict() method expects a single vector of predictor values. A common design alternative is to implement Predict() to accept multiple inputs as an array of vectors, which is equivalent to a matrix.
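For example, a multi-input version could be written as a thin wrapper over the single-vector version. This is just a sketch of one possible design:

// sketch: predict for several input vectors at once
public double[] Predict(double[][] X)
{
  double[] result = new double[X.Length];
  for (int i = 0; i < X.Length; ++i)
    result[i] = this.Predict(X[i]);  // single-vector version
  return result;
}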
The Predict Method
The Predict() method is simple. It computes and returns the weighted sum of the RBF kernel values between the input vector x and each training item, as described previously.
public double Predict(double[] x)
{
  int N = this.trainX.Length;
  double sum = 0.0;
  for (int i = 0; i < N; ++i) {
    double[] xx = this.trainX[i];
    double k = this.Rbf(x, xx);
    sum += this.wts[i] * k;
  }
  return sum;
}
Instead of using an explicit for-loop, an alternative design is to use matrix operations:
double[][] K = Utils.MatCreate(N, 1);  // K(x, X')
for (int i = 0; i < N; ++i)
  K[i][0] = this.Rbf(x, this.trainX[i]);
double[][] y = Utils.MatProduct(
  Utils.VecToMat(this.wts, 1, N), K);  // w * K
return y[0][0];
The matrix operations approach matches the underlying math equations but is a bit messier in my opinion.
Finding Good KRR Parameters
The key to a good KRR model is finding good values for the RBF gamma parameter and the alpha regularization parameter. The demo uses a simple grid search, expressed in pseudo-code:
set gammas = [0.1, 0.2, 0.3, 0.5, 1.0, 1.5, 2.0]
set alphas = [0.0001, 0.001, 0.01, 0.05, 0.1, 0.5]
for-each g in gammas
for-each a in alphas
create KRR(g, a) and train model
compute accuracy on train and test data
compute RMSE on train and test data
display g, a, accuracies, RMSEs
end-alphas
end-gammas
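A C# realization of the grid search might look like the sketch below. It assumes the KRR class has a constructor that accepts gamma and alpha (as in the pseudo-code) and reuses the Accuracy() and RootMSE() functions described earlier; none of this is taken verbatim from the demo program.

// sketch: grid search over candidate gamma and alpha values
double[] gammas = new double[] { 0.1, 0.2, 0.3, 0.5, 1.0, 1.5, 2.0 };
double[] alphas = new double[] { 0.0001, 0.001, 0.01, 0.05, 0.1, 0.5 };
foreach (double g in gammas) {
  foreach (double a in alphas) {
    KRR krr = new KRR(g, a);  // assumed constructor signature
    krr.Train(trainX, trainY);
    double trainAcc = Accuracy(krr, trainX, trainY, 0.10);
    double testAcc = Accuracy(krr, testX, testY, 0.10);
    double trainRMSE = RootMSE(krr, trainX, trainY);
    double testRMSE = RootMSE(krr, testX, testY);
    Console.WriteLine("gamma = " + g.ToString("F4") +
      "  alpha = " + a.ToString("F4") +
      "  train acc = " + trainAcc.ToString("F4") +
      "  test acc = " + testAcc.ToString("F4") +
      "  train RMSE = " + trainRMSE.ToString("F4") +
      "  test RMSE = " + testRMSE.ToString("F4"));
  }
}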
The idea is to find values of gamma and alpha that balance training and test accuracy (a very coarse metric of quality) and RMSE (a very fine-grained metric of quality). Instead of using a grid search, it is possible to programmatically compute values of gamma and alpha that minimize some program-defined loss function. This approach is a lot of work -- roughly equivalent to the entire KRR system itself -- and doesn't always give good results.
Kernel ridge regression computes the inverse of a matrix that has size n x n where n is the number of training data items. Therefore, KRR doesn't scale well to very large datasets. In such situations, it's still possible to use KRR by estimating the model weights using stochastic gradient descent.
The demo program uses a standard general-purpose algorithm called Crout's decomposition to invert the K(X', X') matrix. Because K(X', X') is a square, symmetric, positive definite matrix, it's possible to use a specialized, slightly more efficient algorithm called Cholesky decomposition. However, the difference of a few milliseconds isn't important in most machine learning scenarios.
Wrapping Up
The kernel ridge regression system presented in this article uses a hard-coded radial basis function as the kernel function. There are about a dozen other common kernel functions, such as the polynomial, Laplacian, and sigmoid kernels. You can experiment with these kernel functions, but there are no good rules of thumb for picking one based on the nature of your data.
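For example, a polynomial kernel could be swapped in for the RBF. The following is a minimal sketch, not part of the demo program; the gamma, coef0, and degree parameters are illustrative.

// sketch: a polynomial kernel as an alternative to the RBF kernel
private double PolyKernel(double[] v1, double[] v2,
  double gamma, double coef0, int degree)
{
  double dot = 0.0;  // v1 (dot) v2
  for (int i = 0; i < v1.Length; ++i)
    dot += v1[i] * v2[i];
  return Math.Pow(gamma * dot + coef0, degree);
}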
The demo program uses predictor variables that are strictly numeric. KRR can handle categorical data by using one-hot encoding. For example, if you had a predictor variable color with possible values red, blue, green, you could encode red = (1, 0, 0), blue = (0, 1, 0), green = (0, 0, 1).
Kernel ridge regression is a relatively common machine learning technique, so it's sometimes called just kernel regression. However, there is an older, relatively rare, simpler related technique that is also called kernel regression, and so the two techniques are sometimes confused.
A technique that is very closely related to kernel ridge regression is Gaussian process regression (GPR). GPR starts with what appear to be different math assumptions than KRR, but it ends up with the same prediction equation. However, GPR makes more restrictive assumptions, which allow it to produce both a predicted y value and a standard deviation measure of the confidence of that prediction.
About the Author
Dr. James McCaffrey works for Microsoft Research in Redmond, Wash. He has worked on several Microsoft products including Azure and Bing. James can be reached at [email protected].