The Data Science Lab

How to Do Machine Learning Perceptron Classification Using C#

Dr. James McCaffrey of Microsoft Research uses code samples and screen shots to explain perceptron classification, a machine learning technique that can be used for predicting if a person is male or female based on numeric predictors such as age, height, weight, and so on. It's mostly useful to provide a baseline result for comparison with more powerful ML techniques such as logistic regression and k-nearest neighbors.

Perceptron classification is arguably the most rudimentary machine learning (ML) technique. The perceptron technique can be used for binary classification, for example predicting if a person is male or female based on numeric predictors such as age, height, weight, and so on. From a practical point of view, perceptron classification is useful mostly to provide a baseline result for comparison with more powerful ML techniques such as logistic regression and k-nearest neighbors.

From a conceptual point of view, understanding how perceptron classification works is often considered fundamental knowledge for ML engineers, is interesting historically, and contains important techniques used by logistic regression and neural network classification. In fact, the simplest type of neural network is often called a multi-layer perceptron.

Additionally, understanding exactly how perceptron classification works by coding a system from scratch allows you to understand the system's strengths and weaknesses in case you encounter the technique in an ML code library. For example, the Microsoft ML.NET library has a perceptron classifier, but the library documentation doesn't fully explain how the technique works or when to use it.

A good way to get a feel for what perceptron classification is and to see where this article is headed is to take a look at the screenshot of a demo program in Figure 1. The goal of the demo is to create a model that predicts if a banknote (think dollar bill or euro) is authentic or a forgery.

Figure 1: Perceptron Binary Classification Demo

The demo program sets up a tiny set of 10 items to train the model. Each data item has four predictor variables (often called features in ML terminology) that are characteristics of a digital image of each banknote: variance, skewness, kurtosis, and entropy. Each data item is labeled as -1 (authentic) or +1 (forgery).

Behind the scenes, the demo program uses the 10-item training dataset to create a perceptron prediction model. The final model scores 0.6000 accuracy on the training data (6 correct predictions, 4 wrong). The demo concludes by using the perceptron model to predict the authenticity of a new, previously unseen banknote with predictor values (0.00, 2.00, -1.00, 1.00). The computed output is -1 (authentic).

This article assumes you have intermediate or better skill with C# but doesn’t assume you know anything about perceptron classification. The complete code for the demo program shown is presented in this article. The code is also available in the file download that accompanies this article.

Understanding the Data
The demo program uses a tiny 10-item subset of a well-known benchmark collection of data called the Banknote Authentication Dataset. The full dataset has 1,372 items, with 762 authentic and 610 forgery items. You can find the complete dataset in many places on the Internet, including the UCI Machine Learning Repository.

Most versions of the dataset encode authentic as 0 and forgery as 1. For perceptron classification, it's much more convenient to encode the two possible class labels as -1 and +1 instead of 0 and 1. Which class is encoded as -1 and which is encoded as +1 is arbitrary, but it's up to you to keep track of what each value means.
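
For example, if a raw data file uses the 0 and 1 encoding, the re-encoding could look like the following sketch (the rawLabels array here is hypothetical, not part of the demo):

// hypothetical raw labels using the common encoding:
// 0 = authentic, 1 = forgery
int[] rawLabels = new int[] { 0, 0, 1, 1 };
int[] yTrain = new int[rawLabels.Length];
for (int i = 0; i < rawLabels.Length; ++i)
  yTrain[i] = (rawLabels[i] == 0) ? -1 : +1;  // -1 = authentic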

Because the data has four dimensions, it's not possible to display the data in a two-dimensional graph. However, you can get an idea of what the data is like by taking a look at a graph of partial data shown in Figure 2.

Figure 2: Partial Graph of the Banknote Authentication Data

The graph plots just the skewness and entropy of the 10 items. The key point is that perceptron classifiers only work well with data that is linearly separable. For data that is linearly separable, it's possible to draw a line (or hyperplane for three or more dimensions) that separates the data so that all of one class is on one side of the line and all of the other class is on the other side. You can see in Figure 2 that no line will perfectly separate the two classes. In general, you won't know in advance if your data is linearly separable or not.

Understanding How Perceptron Classification Works
Perceptron classification is very simple. For a dataset with n predictor variables, there will be n weights plus one special weight called a bias. The weights and bias are just numeric constants with values like -1.2345 and 0.9876. To make a prediction, you sum the products of each predictor value and its associated weight and then add the bias. If the sum is negative the prediction is class -1 and if the sum is positive the prediction is class +1.

For example, suppose you have a dataset with three predictor variables and suppose that the three associated weight values are (0.20, -0.50, 0.40) and the bias value is 1.10. If the item to predict has values (-7.0, 3.0, 9.0) then the computed output is (0.20 * -7.0) + (-0.50 * 3.0) + (0.40 * 9.0) + 1.10 = -1.4 + (-1.5) + 3.6 + 1.1 = +1.8 and therefore the predicted class is +1.
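
Expressed in code, the output computation is just a weighted sum followed by a sign check. A minimal sketch, assuming the bias is stored in the last cell of the weights array (consistent with how the demo displays the weights and the bias together), is:

static int ComputeOutput(double[] x, double[] wts)
{
  // weighted sum of each predictor times its weight
  double sum = 0.0;
  for (int i = 0; i < x.Length; ++i)
    sum += x[i] * wts[i];
  sum += wts[wts.Length - 1];  // the bias is the last cell
  if (sum < 0.0) return -1;
  return +1;  // a sum of exactly 0.0 arbitrarily maps to +1
}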

Of course the tricky part is determining the weights and bias values of a perceptron classifier. This is called training the model. Briefly, training is an iterative process that tries different values for the model's weights and the bias until the computed outputs closely match the known correct class values in the training data.
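
Concretely, the classic perceptron update leaves the weights unchanged when an item is predicted correctly, and when an item is predicted incorrectly it nudges each weight by an amount proportional to the learning rate, the prediction error, and the associated input value. A simplified sketch of such a Train() function, using the ComputeOutput() sketched above and omitting the item-order shuffling that the demo performs, is:

static double[] Train(double[][] xData, int[] yData,
  double lr, int maxEpochs, int seed)
{
  // seed would drive the item-order shuffling omitted here
  int n = xData[0].Length;  // number of predictors
  double[] wts = new double[n + 1];  // n weights plus the bias
  for (int epoch = 0; epoch < maxEpochs; ++epoch)
  {
    for (int i = 0; i < xData.Length; ++i)
    {
      int output = ComputeOutput(xData[i], wts);
      int delta = output - yData[i];  // 0 if correct, +2 or -2 if wrong
      for (int j = 0; j < n; ++j)
        wts[j] -= lr * delta * xData[i][j];
      wts[n] -= lr * delta;  // bias update (implicit input of 1.0)
    }
  }
  return wts;
}

Because the class labels are -1 and +1, the delta term is 0 when a prediction is correct, so only misclassified items change the model.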

Because of the way perceptron classification output is computed, it's usually a good idea to normalize the training data so that small predictor values (such as a GPA of 3.15) aren't overwhelmed by large predictor values (such as an annual income of 65,000.00). The demo program doesn't use normalized data because all the predictor values are roughly in the same range (about -15.0 to +15.0). The three most common normalization techniques are min-max normalization, z-score normalization, and order of magnitude normalization.
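
For example, min-max normalization scales every value in a predictor column to the range [0, 1]. A sketch of min-max normalizing one column of a training matrix, shown for illustration only (recall the demo doesn't normalize), is:

static void MinMaxNormalize(double[][] data, int col)
{
  // find the smallest and largest values in the column
  double min = data[0][col];
  double max = data[0][col];
  for (int i = 0; i < data.Length; ++i)
  {
    if (data[i][col] < min) min = data[i][col];
    if (data[i][col] > max) max = data[i][col];
  }
  // scale each value to [0, 1]; assumes max > min,
  // that is, the column isn't constant
  for (int i = 0; i < data.Length; ++i)
    data[i][col] = (data[i][col] - min) / (max - min);
}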

The Demo Program
To create the demo program, I launched Visual Studio 2019. I used the Community (free) edition but any relatively recent version of Visual Studio will work fine. From the main Visual Studio start window I selected the "Create a new project" option. Next, I selected C# from the Language dropdown control and Console from the Project Type dropdown, and then picked the "Console App (.NET Core)" item.

The code presented in this article will run as a .NET Core console application or as a .NET Framework application. Many of the newer Microsoft technologies, such as the ML.NET code library, specifically target .NET Core so it makes sense to develop most new C# machine learning code in that environment.

I entered "Perceptron" as the Project Name, specified C:\VSM on my local machine as the Location (you can use any convenient directory), and checked the "Place solution and project in the same directory" box.

After the template code loaded into Visual Studio, at the top of the editor window I removed all using statements to unneeded namespaces, leaving just the reference to the top-level System namespace. The demo needs no other assemblies and uses no external code libraries.

In the Solution Explorer window, I renamed file Program.cs to the more descriptive PerceptronProgram.cs and then in the editor window I renamed class Program to class PerceptronProgram to match the file name. The structure of the demo program, with a few minor edits to save space, is shown in Listing 1.

Listing 1. Perceptron Classification Demo Program Structure

using System;
namespace Perceptron
{
  class PerceptronProgram
  {
  static void Main(string[] args)
  {
    Console.WriteLine("Begin perceptron demo");
    Console.WriteLine("Authentic (-1) fake (+1)");
    Console.WriteLine("Data looks like: ");
    Console.WriteLine(" 3.6216,  8.6661," +
      " -2.8073, -0.44699, -1");
    Console.WriteLine("-2.0631, -1.5147," +
      "  1.219,   0.44524, +1");
    
    Console.WriteLine("Loading data");
    double[][] xTrain = new double[10][];
    xTrain[0] = new double[] { 3.6216, 8.6661,
      -2.8073, -0.44699 }; // auth
    xTrain[1] = new double[] { 4.5459, 8.1674,
      -2.4586, -1.4621 };
    xTrain[2] = new double[] { 3.866, -2.6383,
      1.9242, 0.10645 };
    xTrain[3] = new double[] { 2.0922, -6.81,
      8.4636, -0.60216 };
    xTrain[4] = new double[] { 4.3684, 9.6718,
      -3.9606, -3.1625 };

    xTrain[5] = new double[] { -2.0631, -1.5147,
      1.219, 0.44524 }; // forgeries
    xTrain[6] = new double[] { -4.4779, 7.3708,
      -0.31218, -6.7754 };
    xTrain[7] = new double[] { -3.8483, -12.8047,
      15.6824, -1.281 };
    xTrain[8] = new double[] { -2.2804, -0.30626,
      1.3347, 1.3763 };
    xTrain[9] = new double[] { -1.7582, 2.7397,
      -2.5323, -2.234 };

    int[] yTrain = new int[] { -1, -1, -1, -1, -1,
      1, 1, 1, 1, 1 }; // -1 = auth, 1 = forgery

    int maxIter = 100;
    double lr = 0.01;
    Console.WriteLine("Starting training");
    double[] wts =
      Train(xTrain, yTrain, lr, maxIter, 0);
    Console.WriteLine("Training complete");

    double acc = Accuracy(xTrain, yTrain, wts);
    Console.WriteLine("Accuracy = ");
    Console.WriteLine(acc.ToString("F4"));

    Console.WriteLine("Weights and bias: ");
    for (int i = 0; i < wts.Length; ++i)
      Console.Write(wts[i].ToString("F4").PadLeft(8));
    Console.WriteLine("");

    Console.WriteLine("Note (0.00 2.00 -1.00 1.00)");
    double[] unknown = new double[] { 0.00, 2.00,
      -1.00, 1.00 };
    int z = ComputeOutput(unknown, wts);
    Console.WriteLine("Computed output = ");
    Console.WriteLine(z);  // -1 or +1

    Console.WriteLine("End perceptron demo ");
    Console.ReadLine();
  } // Main

  static int ComputeOutput(double[] x,
    double[] wts) { . . }
  
  static double[] Train(double[][] xData,
    int[] yData, double lr, int maxEpochs,
    int seed) { . . }
  
  static void Shuffle(int[] indices,
    Random rnd) { . . }
  
  static double Accuracy(double[][] xData,
    int[] yData, double[] wts) { . . }
  
  } // PerceptronProgram class
} // ns

All of the program logic is contained in the Main method. The demo uses a static method approach rather than an OOP approach for simplicity. All normal error checking has been removed to keep the main ideas as clear as possible.

The demo begins by setting up the training data:

double[][] xTrain = new double[10][];
xTrain[0] = new double[] { 3.6216, 8.6661,
  -2.8073, -0.44699 };
. . .

int[] yTrain = new int[] { -1, -1, -1, -1, -1,
  1, 1, 1, 1, 1 };
  

The predictor values are hard-coded and stored into an array-of-arrays style matrix. The class labels are stored in a single integer array. In a non-demo scenario you'd likely want to store your training data as a text file:

  
  3.6216, 8.6661, -2.8073, -0.44699, -1
  4.5459, 8.1674, -2.4586, -1.4621, -1
  . . . 
  -1.7582, 2.7397, -2.5323, -2.234, 1
      

And then you'd read the training data into memory using helper functions along the lines of:

double[][] xTrain = MatLoad("..\\data.txt",
  new int[] { 0, 1, 2, 3 }, ",");
int[] yTrain = VecLoad("..\\data.txt", 4, ",");
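
The MatLoad() and VecLoad() helpers aren't part of the demo program. A possible implementation, presented as an assumption about what such helpers might look like rather than as the article's actual code, is:

static double[][] MatLoad(string fn, int[] cols, string sep)
{
  // read the specified columns of a delimited text file
  // into an array-of-arrays matrix
  string[] lines = System.IO.File.ReadAllLines(fn);
  double[][] result = new double[lines.Length][];
  for (int i = 0; i < lines.Length; ++i)
  {
    string[] tokens = lines[i].Split(new string[] { sep },
      StringSplitOptions.None);
    result[i] = new double[cols.Length];
    for (int j = 0; j < cols.Length; ++j)
      result[i][j] = double.Parse(tokens[cols[j]].Trim());
  }
  return result;
}

static int[] VecLoad(string fn, int col, string sep)
{
  // read a single column of a delimited text file into an int array
  string[] lines = System.IO.File.ReadAllLines(fn);
  int[] result = new int[lines.Length];
  for (int i = 0; i < lines.Length; ++i)
  {
    string[] tokens = lines[i].Split(new string[] { sep },
      StringSplitOptions.None);
    result[i] = int.Parse(tokens[col].Trim());
  }
  return result;
}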

In many scenarios you'd want to set aside some of your source data as a test dataset. After training you'd compute the prediction accuracy of the model on the held-out dataset. This accuracy metric would be a rough estimate of the accuracy you could expect on new, previously unseen data.
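
A simple hold-out split along those lines could look like the following sketch (illustration only; the tiny demo uses all 10 items for training):

static void SplitTrainTest(double[][] xAll, int[] yAll,
  double trainPct, out double[][] xTrain, out int[] yTrain,
  out double[][] xTest, out int[] yTest)
{
  // assumes the rows of xAll and yAll are already in random order
  int nTrain = (int)(trainPct * xAll.Length);
  int nTest = xAll.Length - nTrain;
  xTrain = new double[nTrain][]; yTrain = new int[nTrain];
  xTest = new double[nTest][]; yTest = new int[nTest];
  for (int i = 0; i < nTrain; ++i)
  {
    xTrain[i] = xAll[i]; yTrain[i] = yAll[i];
  }
  for (int i = 0; i < nTest; ++i)
  {
    xTest[i] = xAll[nTrain + i]; yTest[i] = yAll[nTrain + i];
  }
}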

After setting up the training data, the demo program trains the model using these statements:

int maxIter = 100;
double lr = 0.01;
Console.WriteLine("Starting training");
double[] wts = Train(xTrain, yTrain,
  lr, maxIter, 0);

The maxIter variable holds the number of training iterations to perform and the lr variable holds the learning rate. Both of these values are hyperparameters that must be determined using trial and error. The learning rate influences how much the weights and bias change on each training iteration.

The 0 argument passed to the Train() function is the seed value for a Random object that is used to scramble the order in which training items are processed. The Train() function returns an array that holds the weights and the bias, which essentially defines the perceptron classification model.
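
A standard way to implement the Shuffle() helper declared in Listing 1 is the Fisher-Yates mini-algorithm. A plausible sketch is:

static void Shuffle(int[] indices, Random rnd)
{
  // Fisher-Yates: walk the array, exchanging each cell with
  // a randomly selected cell at or after the current position
  for (int i = 0; i < indices.Length; ++i)
  {
    int r = rnd.Next(i, indices.Length);
    int tmp = indices[r];
    indices[r] = indices[i];
    indices[i] = tmp;
  }
}

Shuffling an array of indices, rather than the training matrix itself, lets the training loop visit items in a different random order on each pass without physically moving the data.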

After training, the demo program computes the model's accuracy on the training data, and then displays the values of the weights and bias:

double acc = Accuracy(xTrain, yTrain, wts);
Console.WriteLine(acc.ToString("F4"));

Console.WriteLine("\nModel weights and bias: ");
for (int i = 0; i < wts.Length; ++i)
  Console.Write(wts[i].ToString("F4").PadLeft(8));
Console.WriteLine("");

The demo concludes by making a prediction for a new banknote item:

double[] unknown =
  new double[] { 0.00, 2.00, -1.00, 1.00 };
int z = ComputeOutput(unknown, wts);
Console.WriteLine("Computed output = ");
Console.WriteLine(z);
      

The Accuracy() function computes the number of correct and incorrect predictions on the training data. Because the training data has five authentic and five forgery items, just by guessing either class you would get 50 percent accuracy. Therefore the 60 percent accuracy of the demo model isn't very strong and in a non-demo scenario you'd likely next try a more powerful approach such as logistic regression, k-nearest neighbors, numeric naive Bayes, or a neural network.
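
A plausible sketch of an Accuracy() function that matches this description is:

static double Accuracy(double[][] xData, int[] yData, double[] wts)
{
  // proportion of items whose predicted class matches the known class
  int numCorrect = 0;
  int numWrong = 0;
  for (int i = 0; i < xData.Length; ++i)
  {
    int output = ComputeOutput(xData[i], wts);
    if (output == yData[i]) ++numCorrect;
    else ++numWrong;
  }
  return (numCorrect * 1.0) / (numCorrect + numWrong);
}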

