The Data Science Lab
Regression Using scikit Kernel Ridge Regression
The scikit KernelRidge class constructor signature is:
# KernelRidge(alpha=1.0, *, kernel='linear', gamma=None,
#   degree=3, coef0=1, kernel_params=None)
The nine supported built-in kernel function names are:
# ['additive_chi2', 'chi2', 'linear', 'poly', 'polynomial',
#   'rbf', 'laplacian', 'sigmoid', 'cosine']
When working with scikit, you'll spend most of your time reading the documentation and trying to figure out what each model parameter does. The key to a successful kernel ridge regression model is understanding kernel functions. The demo program uses the radial basis function (RBF) kernel with a gamma value of 1.0.
Basic linear regression can fit data that lies on a straight line (or hyperplane when there are two or more predictors). The "kernel" part of kernel ridge regression means that KRR uses a behind-the-scenes kernel function that transforms the raw data so that non-linear data can be handled. The "ridge" part of kernel ridge regression means that KRR adds an L2 penalty to the model weights so that model overfitting is discouraged. The alpha parameter, set to 0.1 in the demo, controls the strength of the ridge penalty. Larger values of alpha discourage overfitting more than smaller values do. Finding a good value for alpha is a matter of trial and error. As a general rule of thumb, the more training data you have, the smaller alpha should be -- but there are many exceptions.
The scikit KernelRidge module supports nine built-in kernel functions. Based on my experience, the two most commonly used are the radial basis function and the polynomial function. Each kernel function has one or more parameters. For example, the polynomial kernel function needs values for the degree and coef0 parameters. The radial basis function needs a value for the gamma parameter.
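To make the parameters concrete, here is a minimal sketch (not part of the demo program) of how KernelRidge models with different kernels might be instantiated; the train_X and train_y names are hypothetical:
# instantiating KRR models -- a sketch, not the demo code
from sklearn.kernel_ridge import KernelRidge
krr_rbf = KernelRidge(alpha=0.1, kernel='rbf', gamma=1.0)
krr_poly = KernelRidge(alpha=0.1, kernel='poly', degree=3, coef0=1.0)
# krr_rbf.fit(train_X, train_y)  # train_X, train_y are hypothetical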
Experimenting with different kernel functions and their parameters is a matter of trial and error guided by experience. This is one reason that machine learning is sometimes said to be part art and part science.
Evaluating the Trained Model
The demo computes the accuracy of the trained model like so:
# 3. compute model accuracy
print("Computing accuracy (within 0.10 of true) ")
acc_train = accuracy(model, train_X, train_y, 0.10)
print("Accuracy on train data = %0.4f " % acc_train)
acc_test = accuracy(model, test_X, test_y, 0.10)
print("Accuracy on test data = %0.4f " % acc_test)
For most scikit classifiers, the built-in score() function computes a simple accuracy, which is just the number of correct predictions divided by the total number of predictions. But for a KernelRidge regression model you must define what a correct prediction is and write a program-defined accuracy function. The accuracy() function in Listing 1 defines a correct income prediction as one that is within a specified percentage of the true target income value. The key statement is:
if np.abs(pred - y) < np.abs(pct_close * y):
  n_correct += 1
else:
  n_wrong += 1
The pred variable is the predicted income and y is the known correct target income. In words, "if the difference between the predicted and actual incomes is less than x percent of the actual income, then the prediction is counted as correct."
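Listing 1 is not reproduced here, but a minimal sketch of an accuracy function along these lines (not the demo code verbatim) looks like:
# a sketch of a program-defined accuracy function --
# not the demo's Listing 1 verbatim
import numpy as np
def accuracy(model, data_X, data_y, pct_close):
  # a prediction is correct if it's within pct_close
  # (e.g. 0.10 = 10 percent) of the true target value
  n_correct = 0; n_wrong = 0
  preds = model.predict(data_X)
  for pred, y in zip(preds, data_y):
    if np.abs(pred - y) < np.abs(pct_close * y):
      n_correct += 1
    else:
      n_wrong += 1
  return (n_correct * 1.0) / (n_correct + n_wrong)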
Using and Saving the Trained Model
The demo uses the trained model like so:
# 4. make a prediction
print("Predicting income for M 34 Oklahoma moderate: ")
X = np.array([[0, 0.34, 0,0,1, 0,1,0]],
  dtype=np.float32)
pred_inc = model.predict(X)
print("$%0.2f" % (pred_inc * 100_000)) # un-normalized
Because the KRR model was trained using normalized and encoded data, the X-input must be normalized and encoded in the same way. Notice the double square brackets on the X-input. The predict() method expects a matrix rather than a vector. The predicted income value is normalized so the demo multiplies the prediction by 100,000 to de-normalize and make the result easier to read.
In most situations you will want to save the trained regression model so that it can be used by other programs. There are several ways to save a scikit model but using the pickle library ("pickle" means to preserve in ordinary English) is the simplest and most common. The demo code is:
# 5. save model
print("Saving model ")
fn = ".\\Models\\krr_model.pkl"
with open(fn,'wb') as f:
  pickle.dump(model, f)
This code assumes there is a subdirectory named Models. The saved model can be loaded into memory and used like this:
X = np.array([[0, 0.34, 0,0,1, 0,1,0]], dtype=np.float64)
with open(path, 'rb') as f:
  loaded_model = pickle.load(f)
inc = loaded_model.predict(X)
A pickle file uses a proprietary binary format. An alternative is to write a custom save() function that saves model information as plain text. This is useful if a trained model is going to be consumed by a non-Python program.
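For example, a minimal sketch of a text-based save -- under the assumption that the fitted KernelRidge model's dual_coef_ and X_fit_ attributes hold the values a consuming program needs to reproduce predictions -- might look like:
# a sketch of a custom text-based save function; assumes the
# fitted model exposes dual_coef_ and X_fit_
import numpy as np
def save_as_text(model, fn_prefix):
  np.savetxt(fn_prefix + "_dual_coef.txt", model.dual_coef_, delimiter=",")
  np.savetxt(fn_prefix + "_x_fit.txt", model.X_fit_, delimiter=",")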
Wrapping Up
The scikit KernelRidge module creates models that are often surprisingly effective, especially with small datasets. A technique that is closely related to, but definitely different from, kernel ridge regression is plain kernel regression. The use of plain kernel regression is quite rare, so the term "kernel regression" is often used to refer to kernel ridge regression.
Another technique that is very closely related to KRR is Gaussian process regression (GPR). KRR and GPR have different mathematical assumptions but, surprisingly, end up with the same prediction equation. The GPR assumptions are more restrictive and so GPR can generate a metric (standard deviation / variance) of prediction uncertainty.
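A minimal sketch (not part of the demo) of how a scikit GaussianProcessRegressor can return a standard deviation along with each prediction; the train_X, train_y and X names are hypothetical:
# a sketch showing the GPR uncertainty metric mentioned above
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=0.1)
# gpr.fit(train_X, train_y)  # hypothetical training data
# means, stds = gpr.predict(X, return_std=True)  # stds = uncertainty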
About the Author
Dr. James McCaffrey works for Microsoft Research in Redmond, Wash. He has worked on several Microsoft products including Azure and Bing. James can be reached at [email protected].