The Data Science Lab
Neural Network Regression from Scratch Using C#
Compared to other regression techniques, a well-tuned neural network regression system can produce the most accurate prediction model, says Dr. James McCaffrey of Microsoft Research in presenting this full-code, step-by-step tutorial.
The goal of a machine learning regression problem is to predict a single numeric value. For example, you might want to predict the annual income of a person based on their sex (male or female), age, State of residence and political leaning (conservative, moderate, liberal).
There are roughly a dozen major regression techniques, and each technique has several variations. Among the most common techniques are linear regression, linear ridge regression, k-nearest neighbors regression, kernel ridge regression, Gaussian process regression, decision tree regression and neural network regression. Each technique has pros and cons. This article explains how to implement neural network regression from scratch, using the C# language.
Compared to other regression techniques, a well-tuned neural network regression system can produce the most accurate prediction model. However, neural networks are complex, sometimes don't work well with small (fewer than 100 items) datasets, and can be very difficult to tune.
A good way to see where this article is headed is to take a look at the screenshot of a demo program in Figure 1. The demo program uses a 200-item set of training data and a 40-item set of test data that look like:
1, 0.24, 1, 0, 0, 0.2950, 0, 0, 1
0, 0.39, 0, 0, 1, 0.5120, 0, 1, 0
1, 0.63, 0, 1, 0, 0.7580, 1, 0, 0
. . .
The fields are sex, age, State, income and political leaning. The goal is to predict income from the other four variables.
The demo program creates an 8-100-1 neural network regression model, which means there are 8 input nodes, 100 hidden processing nodes and 1 output node. The program trains the network for 2,000 epochs. The trained prediction model scores 0.9200 accuracy on the training data (184 out of 200 correct) and 0.9500 accuracy on a test dataset (38 out of 40 correct).
The demo program concludes by predicting the income for a new, previously unseen person who is male, age 34, lives in Oklahoma and is a political moderate. The predicted income is $44,382.
This article assumes you have intermediate or better programming skill but doesn't assume you know anything about neural network regression. The demo is implemented using C#, but with a bit of effort you should be able to refactor the code to a different C-family language if you wish.
The source code for the demo program is too long to be presented in its entirety in this article. The complete code is available in the accompanying file download. The demo code and data are also available online.
Understanding the Data
To create a neural network regression system, the training data must be prepared by encoding categorical predictor variables and normalizing numeric predictor variables. For the demo, the raw data looks like:
F, 24, michigan, 29500.00, liberal
M, 39, oklahoma, 51200.00, moderate
F, 63, nebraska, 75800.00, conservative
M, 36, michigan, 44500.00, moderate
. . .
The normalized and encoded data looks like:
1, 0.24, 1, 0, 0, 0.2950, 0, 0, 1
0, 0.39, 0, 0, 1, 0.5120, 0, 1, 0
1, 0.63, 0, 1, 0, 0.7580, 1, 0, 0
0, 0.36, 1, 0, 0, 0.4450, 0, 1, 0
. . .
Binary predictors, such as sex (M, F), can be zero-one encoded or minus-one-plus-one encoded. In theory, minus-one-plus-one encoding is slightly superior to zero-one encoding, but in practice there is usually no significant difference. The demo uses zero-one encoding where male = 0 and female = 1.
Numeric predictor data should be normalized so that all values have roughly the same range. The three most common techniques for numeric normalization are divide-by-k, min-max and z-score. I recommend using the divide-by-k technique when possible. The age values are all divided by 100 so that the normalized age values are between 0 and 1. For regression problems, the target numeric variable can be normalized in the same way as predictor variables. The target income values are divided by 100,000 so they're all between 0 and 1.
Categorical predictor data should be one-hot encoded. The State predictor values are encoded so that Michigan = 100, Nebraska = 010 and Oklahoma = 001. If there were four State values, they would be encoded as 1000, 0100, 0010 and 0001. The political leaning values are encoded as conservative = 100, moderate = 010 and liberal = 001. The order in which categorical values are one-hot encoded is arbitrary.
The demo does not have ordinal predictor data such as a height variable with possible values short, medium and tall. In theory you can encode ordinal data using a scheme that retains order information, such as short = 0.3, medium = 0.5 and tall = 0.7, but in practice ordinal data is usually one-hot encoded.
Data normalization and encoding is a surprisingly complex and subtle topic. In a non-demo scenario, data preparation can be tedious and time-consuming. It is possible to use raw training and test data and then programmatically normalize and encode the data, but in practice, data is usually preprocessed.
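The encoding and normalization rules described in this section can be sketched in code. The following is a minimal illustration, not the demo program's code: the EncodeLine() helper and the EncodeDemo class are hypothetical names, and the sketch assumes the raw line format, the divide-by-100 and divide-by-100,000 constants, and the category orderings given above.

```csharp
using System;
using System.Globalization;

public static class EncodeDemo
{
  // hypothetical helper, not part of the demo program
  // raw line format: "F, 24, michigan, 29500.00, liberal"
  public static double[] EncodeLine(string line)
  {
    string[] t = line.Split(',');
    if (t.Length != 5)
      throw new ArgumentException("expected 5 fields");
    for (int i = 0; i < t.Length; ++i)
      t[i] = t[i].Trim();

    double[] result = new double[9];  // all cells start at 0.0

    // sex: male = 0, female = 1
    if (t[0] != "M" && t[0] != "F")
      throw new ArgumentException("unknown sex: " + t[0]);
    result[0] = (t[0] == "F") ? 1.0 : 0.0;

    // age: divide-by-k normalization with k = 100
    result[1] = double.Parse(t[1],
      CultureInfo.InvariantCulture) / 100.0;

    // State: one-hot, michigan = 100, nebraska = 010, oklahoma = 001
    string[] states = { "michigan", "nebraska", "oklahoma" };
    int si = Array.IndexOf(states, t[2]);
    if (si < 0)
      throw new ArgumentException("unknown State: " + t[2]);
    result[2 + si] = 1.0;

    // income (the target): divide by 100,000
    result[5] = double.Parse(t[3],
      CultureInfo.InvariantCulture) / 100000.0;

    // politics: conservative = 100, moderate = 010, liberal = 001
    string[] leanings = { "conservative", "moderate", "liberal" };
    int li = Array.IndexOf(leanings, t[4]);
    if (li < 0)
      throw new ArgumentException("unknown leaning: " + t[4]);
    result[6 + li] = 1.0;

    return result;
  }

  public static void Main()
  {
    double[] v =
      EncodeLine("F, 24, michigan, 29500.00, liberal");
    Console.WriteLine(string.Join(", ", v));
  }
}
```

Applied to the first raw line shown above, the sketch should produce the same nine values as the first normalized and encoded line.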
Understanding Neural Network Regression
A neural network is essentially a complex math function. The neural network input-output mechanism is illustrated in Figure 2. The figure shows a simple neural network regression system with three input nodes, four hidden processing nodes, and one output node.
Each pair of nodes in adjacent layers is conceptually connected by a weight value. There are 3 * 4 = 12 input-to-hidden weights, and 4 * 1 = 4 hidden-to-output weights. Each hidden node and output node has a special weight called a bias, and so there are 4 + 1 = 5 bias values.
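The counting rule above generalizes to any network with a single hidden layer. As a small sketch (the NumWeights() helper is a hypothetical name, not a method of the demo's NeuralNetwork class), the total number of weights and biases can be computed like so:

```csharp
using System;

public static class CountDemo
{
  // total number of weights and biases for an
  // ni-nh-no network with one hidden layer
  public static int NumWeights(int ni, int nh, int no)
  {
    int ihWeights = ni * nh;  // input-to-hidden
    int hBiases = nh;         // one per hidden node
    int hoWeights = nh * no;  // hidden-to-output
    int oBiases = no;         // one per output node
    return ihWeights + hBiases + hoWeights + oBiases;
  }

  public static void Main()
  {
    // 3-4-1 network: 12 + 4 + 4 + 1 = 21
    Console.WriteLine(NumWeights(3, 4, 1));
    // 8-100-1 network: 800 + 100 + 100 + 1 = 1001
    Console.WriteLine(NumWeights(8, 100, 1));
  }
}
```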
The values of the hidden nodes are computed as the hyperbolic tangent (tanh) of the sum of the products of each input node times its weight, plus the bias. For example, to compute the value of the top-most hidden node, if the three input values are 1.0, 2.0 and 3.0, and the three associated input-to-hidden weights are 0.01, 0.05 and 0.09, and the bias value is 0.13, then:
hNode0 = tanh( (1.0 * 0.01) + (2.0 * 0.05) +
               (3.0 * 0.09) + 0.13 )
       = tanh( 0.01 + 0.10 + 0.27 + 0.13 )
       = tanh(0.51)
       = 0.4699
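The same calculation can be expressed as a short sketch. The HiddenNode() helper below is a hypothetical name used for illustration only; it computes one hidden node from the input values, the associated weights and the bias given in the example.

```csharp
using System;
using System.Globalization;

public static class HiddenNodeDemo
{
  // tanh of (sum of input * weight products, plus bias)
  public static double HiddenNode(double[] inputs,
    double[] weights, double bias)
  {
    double sum = bias;
    for (int i = 0; i < inputs.Length; ++i)
      sum += inputs[i] * weights[i];
    return Math.Tanh(sum);
  }

  public static void Main()
  {
    double[] x = { 1.0, 2.0, 3.0 };
    double[] w = { 0.01, 0.05, 0.09 };
    double h0 = HiddenNode(x, w, 0.13);
    // tanh(0.51), which rounds to 0.4699
    Console.WriteLine(
      h0.ToString("F4", CultureInfo.InvariantCulture));
  }
}
```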
The tanh() function is called the hidden node activation. Other hidden node activation functions include logistic sigmoid (formerly quite common but now rarely used) and relu ("rectified linear unit"), which is most often used for very large neural networks with multiple hidden layers but is rarely used for neural regression systems.
The single output node is computed in the same way as each hidden node except that the activation function is identity() instead of tanh(). If the values of the four hidden nodes are (0.4699, 0.5227, 0.5717 and 0.6169) and the four associated hidden-to-output weights are (0.17, 0.18, 0.19 and 0.20), and the output node bias is 0.21, then the output node value is computed as:
oNode = identity( (0.4699 * 0.17) + (0.5227 * 0.18) +
                  (0.5717 * 0.19) + (0.6169 * 0.20) + 0.21 )
      = identity( 0.0799 + 0.0941 + 0.1086 + 0.1234 + 0.21 )
      = identity(0.6160)
      = 0.6160
The identity() function just returns its input. For neural networks that perform classification rather than regression, the most common output activation functions are softmax() for multi-class classification and sigmoid() for binary classification.
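The output node calculation can be sketched the same way. The OutputNode() helper is a hypothetical name; it takes the four hidden node values from the example as given, rather than recomputing them from the inputs.

```csharp
using System;
using System.Globalization;

public static class OutputNodeDemo
{
  public static double Identity(double x)
  {
    return x;  // regression output is not squashed
  }

  // identity of (sum of hidden * weight products, plus bias)
  public static double OutputNode(double[] hidden,
    double[] weights, double bias)
  {
    double sum = bias;
    for (int j = 0; j < hidden.Length; ++j)
      sum += hidden[j] * weights[j];
    return Identity(sum);
  }

  public static void Main()
  {
    double[] h = { 0.4699, 0.5227, 0.5717, 0.6169 };
    double[] w = { 0.17, 0.18, 0.19, 0.20 };
    double o = OutputNode(h, w, 0.21);
    // rounds to 0.6160
    Console.WriteLine(
      o.ToString("F4", CultureInfo.InvariantCulture));
  }
}
```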
To recap, a neural network regression system is a complex math function where the output value depends on the input values, the hidden layer activation function, the values of the input-hidden weights, the hidden biases, the hidden-output weights and the output bias. But where do the values of the weights and biases come from?
Neural network weights and biases are computed by looking at training data with known input values and known correct output values. The idea is to find values of the weights and biases so that computed output values are as close as possible to the target correct values. Put another way, the values of the weights and biases are those that minimize the error between computed and target values.
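One common way to measure the error between computed and target values is mean squared error. The sketch below assumes that measure for illustration; this section does not state which error function the demo uses, and the MeanSquaredError() helper and the computed values are hypothetical. The target values are the first three normalized incomes shown earlier.

```csharp
using System;

public static class ErrorDemo
{
  // average of squared differences between
  // computed outputs and target values
  public static double MeanSquaredError(double[] computed,
    double[] targets)
  {
    if (computed.Length != targets.Length)
      throw new ArgumentException("length mismatch");
    double sum = 0.0;
    for (int i = 0; i < targets.Length; ++i)
    {
      double diff = computed[i] - targets[i];
      sum += diff * diff;
    }
    return sum / targets.Length;
  }

  public static void Main()
  {
    // hypothetical computed outputs
    double[] computed = { 0.3000, 0.5000, 0.7500 };
    // normalized target incomes
    double[] targets = { 0.2950, 0.5120, 0.7580 };
    Console.WriteLine(MeanSquaredError(computed, targets));
  }
}
```

Training adjusts the weights and biases so that a measure like this one gets smaller.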
Although there are several algorithms to find the values of neural network weights and biases, by far the most common technique is called stochastic gradient descent (SGD). There are many variations of SGD, including Adam (adaptive moment estimation), RMSProp (root mean square propagation) and others. SGD needs the gradient of the error with respect to each weight and bias, and the algorithm used to compute those gradients is called back-propagation (often spelled without the hyphen).
Training a neural network is one of the most complex topics in machine learning -- even explaining the complexity of neural network training is complex. That said, you don't need to have a complete understanding of how neural network training works -- which would literally take months of dedicated study -- in order to use neural networks effectively. An analogy is that you don't need to have a complete understanding of internal combustion engines in order to use an automobile effectively.
Overall Program Structure
I used Visual Studio 2022 (Community Free Edition) for the demo program. I created a new C# console application named NeuralNetworkRegression and checked the "Place solution and project in the same directory" option. I specified .NET version 6.0. I checked the "Do not use top-level statements" option to avoid the program entry point shortcut syntax.
The demo program has no significant .NET dependencies and any relatively recent version of Visual Studio with .NET (Core) or the older .NET Framework will work fine. You can also use the Visual Studio Code program if you like.
After the template code loaded into the editor, I right-clicked on file Program.cs in the Solution Explorer window and renamed the file to the more descriptive NeuralRegressionProgram.cs. I allowed Visual Studio to automatically rename class Program.
The overall program structure is presented in Listing 1. The demo program has three primary classes. The Program class holds all the control logic. The NeuralNetwork class houses all the neural network functionality. The Utils class houses functionality to load data from file into memory.
Listing 1: Overall Program Structure
using System;
using System.IO;
namespace NeuralNetworkRegression
{
  internal class NeuralRegressionProgram
  {
    static void Main(string[] args)
    {
      Console.WriteLine("Neural network " +
        "regression C# ");
      Console.WriteLine("Predict income from sex," +
        " age, State, political leaning ");
      // 1. load data from file into memory
      // 2. create neural network
      // 3. train neural network
      // 4. evaluate neural network accuracy
      // 5. use neural network to make a prediction
      Console.WriteLine("End demo ");
      Console.ReadLine();
    } // Main
  } // Program
  public class NeuralNetwork
  {
    private int ni; // number input nodes
    private int nh; // number hidden nodes
    private int no; // number output nodes
    private double[] iNodes;
    private double[][] ihWeights; // input-hidden
    private double[] hBiases;
    private double[] hNodes;
    private double[][] hoWeights; // hidden-output
    private double[] oBiases;
    private double[] oNodes; // single val as array
    private Random rnd;
    public NeuralNetwork(int numIn, int numHid,
      int numOut, int seed) { . . }
    private void InitWeights() { . . }
    public void SetWeights(double[] wts) { . . }
    public double[] GetWeights() { . . }
    public double ComputeOutput(double[] x) { . . }
    private static double HyperTan(double x) { . . }
    private static double Identity(double x) { . . }
    public void TrainBatch(double[][] trainX,
      double[] trainY, double lrnRate, int batSize,
      int maxEpochs) { . . }
    private void Shuffle(int[] sequence) { . . }
    public double Error(double[][] trainX,
      double[] trainY) { . . }
    public double Accuracy(double[][] dataX,
      double[] dataY, double pctClose) { . . }
  } // NeuralNetwork class
  public class Utils
  {
    public static double[][] VecToMat(double[] vec,
      int rows, int cols) { . . }
    public static double[][] MatCreate(int rows,
      int cols) { . . }
    static int NumNonCommentLines(string fn,
      string comment) { . . }
    public static double[][] MatLoad(string fn,
      int[] usecols, char sep, string comment) { . . }
    public static double[] MatToVec(double[][] m) { . . }
    public static void MatShow(double[][] m,
      int dec, int wid) { . . }
    public static void VecShow(int[] vec, int wid) { . . }
    public static void VecShow(double[] vec,
      int dec, int wid, bool newLine) { . . }
  } // Utils class
} // ns
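The pctClose parameter of the Accuracy() method in Listing 1 suggests that a regression prediction is scored as correct when it falls within a given percentage of the true target value. The sketch below works under that assumption; the body of the demo's Accuracy() method is not shown in this section, so the logic here is illustrative only, and it operates on precomputed predictions (hypothetical values) instead of a trained network.

```csharp
using System;

public static class AccuracyDemo
{
  // fraction of predictions within pctClose of target
  // (assumed meaning of "correct" for this sketch)
  public static double Accuracy(double[] predicted,
    double[] targets, double pctClose)
  {
    int numCorrect = 0;
    for (int i = 0; i < targets.Length; ++i)
    {
      double allowed = Math.Abs(pctClose * targets[i]);
      if (Math.Abs(predicted[i] - targets[i]) < allowed)
        ++numCorrect;
    }
    return (double)numCorrect / targets.Length;
  }

  public static void Main()
  {
    // hypothetical predictions
    double[] predicted = { 0.30, 0.45, 0.76, 0.60 };
    // normalized target incomes
    double[] targets = { 0.2950, 0.5120, 0.7580, 0.4450 };
    // items 0 and 2 are within 10 percent, so 2 of 4
    Console.WriteLine(Accuracy(predicted, targets, 0.10));
  }
}
```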
Creating the Neural Network Regression Model
The Main method begins by loading the 200-item training data from file into memory:
string trainFile =
  "..\\..\\..\\Data\\people_train.txt";
// sex, age, State, income, politics
// 0 0.32 1 0 0 0.65400 0 0 1
double[][] trainX = Utils.MatLoad(trainFile,
  new int[] { 0, 1, 2, 3, 4, 6, 7, 8 }, ',', "#");
double[] trainY =
  Utils.MatToVec(Utils.MatLoad(trainFile,
  new int[] { 5 }, ',', "#"));
The code assumes that the data file is in a directory named Data in the root project directory. The MatLoad() arguments instruct the method to load comma-separated columns 0, 1, 2, 3, 4, 6, 7 and 8 as predictors, where lines beginning with "#" are interpreted as comments. The income values in column 5 are loaded into a C# matrix and then converted to an array/vector.
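To illustrate the column-selection idea, here is a minimal sketch that parses selected columns from lines already in memory. The ParseColumns() helper is a hypothetical stand-in and is not the demo's MatLoad() implementation, which reads from a file and is not shown in this section. The parameter meanings (columns to use, separator, comment marker) mirror the MatLoad() call above.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class LoadDemo
{
  // hypothetical helper: keep only the requested columns,
  // skipping blank lines and comment lines
  public static double[][] ParseColumns(string[] lines,
    int[] useCols, char sep, string comment)
  {
    List<double[]> rows = new List<double[]>();
    foreach (string line in lines)
    {
      if (line.Trim().Length == 0 ||
        line.StartsWith(comment, StringComparison.Ordinal))
        continue;
      string[] tokens = line.Split(sep);
      double[] row = new double[useCols.Length];
      for (int j = 0; j < useCols.Length; ++j)
        row[j] = double.Parse(tokens[useCols[j]].Trim(),
          CultureInfo.InvariantCulture);
      rows.Add(row);
    }
    return rows.ToArray();
  }

  public static void Main()
  {
    string[] lines = {
      "# sex, age, State, income, politics",
      "1, 0.24, 1, 0, 0, 0.2950, 0, 0, 1",
      "0, 0.39, 0, 0, 1, 0.5120, 0, 1, 0" };
    double[][] x = ParseColumns(lines,
      new int[] { 0, 1, 2, 3, 4, 6, 7, 8 }, ',', "#");
    double[][] y = ParseColumns(lines,
      new int[] { 5 }, ',', "#");
    Console.WriteLine(x.Length + " rows, " +
      x[0].Length + " predictors");
    Console.WriteLine("first target = " + y[0][0]);
  }
}
```

Selecting column 5 separately yields a one-column matrix, which is why the demo passes the result through MatToVec() to get a plain array of target values.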