The Data Science Lab
Neural Network Regression from Scratch Using C#
The 40-item test data is loaded in the same way:
string testFile =
"..\\..\\..\\Data\\people_test.txt";
double[][] testX = Utils.MatLoad(testFile,
new int[] { 0, 1, 2, 3, 4, 6, 7, 8 }, ',', "#");
double[] testY =
Utils.MatToVec(Utils.MatLoad(testFile,
new int[] { 5 }, ',', "#"));
Next, the first three lines of predictor values and the corresponding target income values are displayed:
Console.WriteLine("First three X data: ");
for (int i = 0; i < 3; ++i)
Utils.VecShow(trainX[i], 2, 6, true);
Console.WriteLine("First three target Y: ");
for (int i = 0; i < 3; ++i)
Console.WriteLine(trainY[i].ToString("F5"));
In a non-demo scenario, you'd probably want to display all the training and test data to make sure it has been loaded correctly. Next, the neural network is created using this statement:
NeuralNetwork nn =
new NeuralNetwork(8, 100, 1, seed: 0);
The network has 8 input nodes, 100 hidden processing nodes and 1 output node. The number of input and output nodes is determined by the data. The number of hidden nodes is a hyperparameter that must be determined by trial and error. The seed parameter is fed to an internal C# Random generator object, which is used to randomly initialize the values of the network weights and biases, and to randomly shuffle the order of the training data during training.
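For this 8-100-1 architecture, the total number of weights and biases can be computed directly (the same formula appears in the initialization code later in this article):

```csharp
using System;

// Number of weights and biases for an 8-100-1 network:
// input-to-hidden weights + hidden-to-output weights
// + hidden node biases + output node biases.
int ni = 8; int nh = 100; int no = 1;
int numWts = (ni * nh) + (nh * no) + nh + no;
Console.WriteLine("Total weights and biases = " + numWts);  // 1001
```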
Training the Neural Network Regression Model
The neural network training parameters are set like so:
int maxEpochs = 2000;
double lrnRate = 0.01;
int batSize = 10;
A training epoch is one pass through the training dataset. The max epochs value must be determined by trial and error. If you use too few training epochs, the resulting model will underfit and predict poorly. If you use too many training epochs, the model will overfit, where prediction accuracy on the training data is high but accuracy on new, previously unseen data is low.
The learning rate controls how much the weight and bias values change on each training iteration. The learning rate must be determined by trial and error. If you use a rate that is too small, training will be too slow. If you use a rate that is too high, training can jump past good weight and bias values.
The batch size controls how many training items are analyzed to estimate the overall error gradient before weights and biases are updated. It is good practice to use a batch size that evenly divides the number of training items so that all batches have the same number of data items.
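The batch arithmetic can be sanity-checked with a short calculation. The 200-item training set size below is a hypothetical value for illustration:

```csharp
using System;

// Sketch: verify that the batch size evenly divides the training data.
// The 200-item training set size is a hypothetical value.
int numTrain = 200;
int batSize = 10;
int maxEpochs = 2000;

if (numTrain % batSize != 0)
  Console.WriteLine("Warning: last batch will be smaller");

int batchesPerEpoch = numTrain / batSize;        // 20 updates per epoch
int totalUpdates = batchesPerEpoch * maxEpochs;  // 40,000 weight updates
Console.WriteLine("Batches per epoch: " + batchesPerEpoch);
Console.WriteLine("Total weight updates: " + totalUpdates);
```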
The neural network is trained using these statements:
Console.WriteLine("Starting (batch) training ");
nn.TrainBatch(trainX, trainY, lrnRate,
batSize, maxEpochs);
Console.WriteLine("Done ");
The max epochs, learning rate and batch size interact in complex ways, and so when you search for good values, it's not possible to optimize each parameter in a sequential way. Searching for good training parameter values is often time-consuming. Typically, my colleagues and I begin by manually searching for good combinations of values. Once we get a rough idea of what the ranges of good values might be, we set up arrays of possible values and then programmatically try all possible combinations. This is called grid search.
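The grid search structure can be sketched as three nested loops over candidate values. The candidate arrays below are hypothetical, and the commented-out statements assume the NeuralNetwork class presented in this article:

```csharp
using System;

// Sketch of a grid search over the three training hyperparameters.
// The candidate values are hypothetical; in practice they come from
// a preliminary manual search.
int[] epochCands = new int[] { 1000, 2000, 4000 };
double[] lrnCands = new double[] { 0.005, 0.01, 0.02 };
int[] batCands = new int[] { 5, 10, 20 };

int numCombos = 0;
foreach (int me in epochCands)
  foreach (double lr in lrnCands)
    foreach (int bs in batCands)
    {
      // NeuralNetwork nn = new NeuralNetwork(8, 100, 1, seed: 0);
      // nn.TrainBatch(trainX, trainY, lr, bs, me);
      // double acc = nn.Accuracy(testX, testY, 0.10);
      ++numCombos;
    }
Console.WriteLine(numCombos + " combinations evaluated");
```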
Evaluating the Neural Network Regression Model
During training, the neural network is monitored by computing and displaying the mean squared error between computed output values and correct target values. For example, with just three data items, if the correct target income values are (0.5000, 0.9200, 0.6800) and the associated predicted income values are (0.4800, 0.9600, 0.6700), then the mean squared error is ((0.5000 - 0.4800)^2 + (0.9200 - 0.9600)^2 + (0.6800 - 0.6700)^2) / 3 = (0.0004 + 0.0016 + 0.0001) / 3 = 0.0007. A common alternative, which conveys the same information, is root mean squared error, which is just the square root of mean squared error.
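The worked example above can be verified with a few lines of code:

```csharp
using System;

// Compute mean squared error for the three-item example in the text.
double[] targets = new double[] { 0.5000, 0.9200, 0.6800 };
double[] preds = new double[] { 0.4800, 0.9600, 0.6700 };

double sum = 0.0;
for (int i = 0; i < targets.Length; ++i)
{
  double diff = targets[i] - preds[i];
  sum += diff * diff;
}
double mse = sum / targets.Length;   // 0.0007
double rmse = Math.Sqrt(mse);        // same info, in original units
Console.WriteLine("MSE  = " + mse.ToString("F6"));
Console.WriteLine("RMSE = " + rmse.ToString("F6"));
```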
Because there is no inherent definition of regression model accuracy, it's necessary to implement a program-defined accuracy method. The demo network defines a correct prediction as one that is within a specified percentage of the true target income value. The demo uses 10 percent, but a reasonable percentage will vary from problem to problem. The calling statements are:
double trainAcc = nn.Accuracy(trainX, trainY, 0.10);
Console.WriteLine("Accuracy on train data = " +
trainAcc.ToString("F4"));
double testAcc = nn.Accuracy(testX, testY, 0.10);
Console.WriteLine("Accuracy on test data = " +
testAcc.ToString("F4"));
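A minimal sketch of such a program-defined accuracy method, written here as a standalone function over arrays of predicted and target values rather than as a method of the NeuralNetwork class:

```csharp
using System;

// Sketch: fraction of predictions within pctClose of the target value.
// The demo implements this logic as a NeuralNetwork method that first
// computes the predicted values.
static double Accuracy(double[] preds, double[] targets, double pctClose)
{
  int numCorrect = 0;
  for (int i = 0; i < preds.Length; ++i)
  {
    if (Math.Abs(preds[i] - targets[i]) <
        Math.Abs(pctClose * targets[i]))
      ++numCorrect;
  }
  return (numCorrect * 1.0) / preds.Length;
}

// Example: two of three predictions are within 10 percent.
double[] y = new double[] { 0.5000, 0.9200, 0.6800 };
double[] p = new double[] { 0.4800, 0.9600, 0.9000 };
Console.WriteLine(Accuracy(p, y, 0.10).ToString("F4"));
```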
Model accuracy is ultimately what you're after in most situations, but accuracy is too coarse to use for training. Mean squared error is finer-grained than accuracy, but used alone it can be misleading when evaluating a regression model.
Using the Neural Network Regression Model
After the neural network model has been evaluated, the demo concludes by using the model to predict the income of a previously unseen person who is male, age 34, lives in Oklahoma and is a political moderate:
Console.WriteLine("Predicting income for male" +
" 34 Oklahoma moderate ");
double[] X = new double[] { 0, 0.34, 0,0,1, 0,1,0 };
double y = nn.ComputeOutput(X);
Console.WriteLine("Predicted income = " +
y.ToString("F5"));
Console.WriteLine("End demo ");
Console.ReadLine(); // keep shell open
Predictions must be made using the same encoding and normalization that's used when training the model. The output of 0.44382 is normalized so the actual predicted income is $44,382.
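The implied denormalization is simple: because income was divided by 100,000 for training, a raw network output maps back to dollars by multiplying by 100,000:

```csharp
using System;

// The demo normalizes income by dividing by 100,000, so a raw
// network output maps back to dollars by multiplying by 100,000.
double normalizedOutput = 0.44382;  // value from nn.ComputeOutput(X)
double predictedIncome = normalizedOutput * 100000;
Console.WriteLine("Predicted income = $" +
  predictedIncome.ToString("N0"));
```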
In a non-demo scenario, you might want to save the trained model weights and biases so that the model can be used without retraining it. Saving would look like:
string fn = "..\\..\\..\\Models\\people_wts.txt";
nn.SaveWeights(fn);
Then another system could use the trained model like so:
string fn = "..\\..\\..\\Models\\people_wts.txt";
NeuralNetwork nn2 = new NeuralNetwork(8, 100, 1, 0);
nn2.LoadWeights(fn);
Notice that this approach assumes the system that uses the trained model has access to the NeuralNetwork class definition.
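One simple way to implement weight serialization is to store the values as text, one per line. The sketch below is not the demo's actual SaveWeights() and LoadWeights() code; it assumes hypothetical helpers that flatten the network's weights and biases to a double[] array and back:

```csharp
using System;
using System.IO;

// Sketch of text-file weight serialization, one value per line.
// A flat double[] of weights and biases is assumed; in the demo,
// producing and consuming that array is the network's job.
static void SaveWeights(double[] wts, string fn)
{
  using (StreamWriter sw = new StreamWriter(fn))
    foreach (double w in wts)
      sw.WriteLine(w.ToString("R"));  // round-trip format
}

static double[] LoadWeights(string fn)
{
  string[] lines = File.ReadAllLines(fn);
  double[] wts = new double[lines.Length];
  for (int i = 0; i < lines.Length; ++i)
    wts[i] = double.Parse(lines[i]);
  return wts;
}

// Demo round trip using a temp file.
string demoFn = Path.Combine(Path.GetTempPath(), "people_wts_demo.txt");
double[] demoWts = new double[] { 0.01, -0.02, 0.03 };
SaveWeights(demoWts, demoFn);
double[] reloaded = LoadWeights(demoFn);
Console.WriteLine("Reloaded " + reloaded.Length + " weight values");
```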
Wrapping Up
The demo program can be used as a template for most regression problems. The architecture parameter to explore is the number of hidden nodes. The training hyperparameters to explore are the max epochs, learning rate, and batch size.
The demo program uses random uniform initialization for the network weights and biases. The key code is:
double lo = -0.01; double hi = +0.01;
int numWts = (this.ni * this.nh) +
(this.nh * this.no) + this.nh + this.no;
double[] initialWeights = new double[numWts];
for (int i = 0; i < initialWeights.Length; ++i)
initialWeights[i] =
(hi - lo) * rnd.NextDouble() + lo;
The range is specified as [-0.01, +0.01). Somewhat surprisingly, weight and bias initialization values have a big impact on a neural network model. You might want to experiment with other initialization range values; however, this introduces two more hyperparameters to deal with. Large neural networks often use sophisticated initialization schemes based on the network architecture, such as Glorot initialization and He initialization, but these can be overkill for relatively simple neural networks like the one presented in this article.
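For comparison, here is a sketch of Glorot (also called Xavier) uniform initialization applied to the input-to-hidden weights. The layer sizes match the demo network, but this code is illustrative and not part of the demo program:

```csharp
using System;

// Sketch of Glorot uniform initialization for the input-to-hidden
// weights: range limit = sqrt(6 / (fanIn + fanOut)).
int fanIn = 8;      // number of input nodes
int fanOut = 100;   // number of hidden nodes
double limit = Math.Sqrt(6.0 / (fanIn + fanOut));

Random rnd = new Random(0);
double[] wts = new double[fanIn * fanOut];
for (int i = 0; i < wts.Length; ++i)
  wts[i] = (2.0 * limit) * rnd.NextDouble() - limit;  // [-limit, +limit)
Console.WriteLine("Glorot limit = " + limit.ToString("F4"));
```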
The demo neural network uses a single hidden layer. It is possible to extend the demo network architecture to multiple hidden layers, but this requires a huge effort. Theoretically, a neural network with a single hidden layer and enough hidden nodes can compute anything that a neural network with multiple hidden layers can compute. This fact comes from what is called the Universal Approximation Theorem.
The demo uses tanh() hidden node activation and identity() output node activation. The training method assumes these two activation functions are used, so you shouldn't change activation functions unless you understand the deep theory of back-propagation and can modify the training code.
Note: My thanks to Thorsten Kleppe who reviewed the code presented in this article and made valuable suggestions.
About the Author
Dr. James McCaffrey works for Microsoft Research in Redmond, Wash. He has worked on several Microsoft products including Azure and Bing. James can be reached at [email protected].