The Data Science Lab

Neural Networks Using Python and NumPy

With Python and NumPy getting lots of exposure lately, I'll show how to use those tools to build a simple feed-forward neural network.

Over the past few months, the use of the Python programming language has increased greatly, at least among my colleagues who do data science and machine learning. I suspect this increase is due in large part to the release of two very powerful code libraries: Microsoft CNTK and Google TensorFlow. Both libraries have a Python API, which is the preferred interface.

Both libraries work at a fairly low level of abstraction, which means that a solid knowledge of Python is important when working with either library. Additionally, both libraries make extensive use of the NumPy ("numerical Python") add-on package to create vectors and matrices, which typically offer better performance than Python's built-in list type.

In this article I'll explain how to implement a simple feed-forward neural network from scratch, using just Python 3.x and NumPy. After reading this article you should have a solid grasp of neural network fundamentals, as well as knowledge of Python and NumPy techniques that would be useful when working with CNTK or TensorFlow.

The best way to get a feel for where this article is headed is to take a look at the screenshot of a demo program shown in Figure 1. The demo Python program is designed to illustrate how the neural network input-output mechanism works. The demo does not create a prediction model as you'd do in a realistic scenario.

Figure 1. Python Neural Network IO Demo

The demo creates a neural network with three input nodes, four hidden processing nodes and two output nodes. If you're new to neural networks you can think of a neural network as a complex math function that accepts a set of numeric inputs and produces one or more numeric outputs.


The demo input values are (1.0, 2.0, 3.0) and the output values are (0.4920, 0.5080). The demo program displays the values of the neural net's so-called weights and bias values, which determine the output values for a given set of input values. The demo also displays some intermediate values (pre-activation hidden node values, and pre-activation output node values) that are calculated inside the neural net.

This article assumes you have a basic familiarity with Python or any C-family language such as C#, C/C++, JavaScript or Java, but does not assume you know anything about neural networks. The demo program is a bit too long to present in its entirety in this article, but the complete source code is available in the accompanying file download.

I wrote the demo using the 3.5 version of Python and the 1.11.1 version of NumPy. It is possible to install Python and NumPy separately; however, if you're new to Python and NumPy, I recommend installing the Anaconda distribution of Python, which simplifies installation and gives you many additional useful packages.

Understanding Neural Network Input-Output
Before looking at the demo code, it's important to understand the neural network input-output mechanism. The diagram in Figure 2 corresponds to the demo program.

Figure 2. Neural Network Input-Output

The input node values are (1.0, 2.0, 3.0). Each blue line connecting input-to-hidden and hidden-to-output nodes represents a numeric constant called a weight. If nodes are zero-based indexed with node [0] at the top of the diagram, then the weight from input[0] to hidden[0] is 0.01, and the weight from hidden[3] to output[1] is 0.24.

Each hidden node and each output node (but not the input nodes) has an additional special weight called a bias. The bias value for hidden[3] is 0.16 and the bias for output[0] is 0.25.

Notice that if there are ni input nodes, nh hidden nodes and no output nodes, then there are a total of (ni * nh) + (nh * no) + nh + no weights and biases. For the demo 3-4-2 neural net there are (3 * 4) + (4 * 2) + 4 + 2 = 26 weights and bias values.
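You can verify the count with a couple of lines of Python (these lines are just a check, not part of the demo program):

ni = 3; nh = 4; no = 2
numWts = (ni * nh) + (nh * no) + nh + no
print(numWts)  # 26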

In words, to compute the value of a hidden node, you multiply each input value by its associated input-to-hidden weight, sum the products, add the bias value, and then apply the hyperbolic tangent function to the sum. For hidden[0] this is:

sum[0] = (1.0)(0.01) + (2.0)(0.05) + (3.0)(0.09) + 0.13
            = 0.5100

hidden[0] = tanh(0.5100)
                = 0.4699

The tanh function is called the neural network hidden layer activation function. The tanh function forces all hidden node values to be between -1.0 and +1.0. A common alternative activation function is the logistic sigmoid function.
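If you want to check the hidden[0] calculation, a few lines of NumPy (not part of the demo program) reproduce it:

import numpy as np

xValues = np.array([1.0, 2.0, 3.0], dtype=np.float32)
ihWts = np.array([0.01, 0.05, 0.09], dtype=np.float32)  # weights into hidden[0]
hBias = 0.13
preActivation = np.dot(xValues, ihWts) + hBias  # 0.5100
hidden0 = np.tanh(preActivation)  # 0.4699
print(hidden0)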

The output nodes are calculated similarly, but instead of using tanh or logistic sigmoid activation, a special function called softmax is used for activation. The preliminary sum of products plus bias values of output[0] and output[1] are:

sum[0] = (0.4699)(0.17) + (0.5227)(0.19) + (0.5717)(0.21) + (0.6169)(0.23) + 0.25
            = 0.6911

sum[1] = (0.4699)(0.18) + (0.5227)(0.20) + (0.5717)(0.22) + (0.6169)(0.24) + 0.26
            = 0.7229

The softmax function is perhaps best explained by example:

divisor = exp(0.6911) +  exp(0.7229)
            = 1.9960 + 2.0604
            = 4.0563

output[0] = 1.9960 / 4.0563
               = 0.4920

output[1] = 2.0604 / 4.0563
              = 0.5080

Notice that softmax uses the exp(x) function, which can be astronomically large for even moderate values of x. The demo program uses a clever algebra trick ("the softmax max trick") to reduce the possibility of arithmetic overflow.
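One way to code softmax with the max trick looks like this (a minimal sketch; the demo's softmax method may differ in its details):

import numpy as np

def softmax(oSums):
  # subtracting the largest value before calling exp() avoids overflow
  m = np.max(oSums)
  exps = np.exp(oSums - m)
  return exps / np.sum(exps)

print(softmax(np.array([0.6911, 0.7229])))  # approximately [0.4920  0.5080]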

The purpose of softmax activation is to scale output values so that they sum to 1.0 and can be interpreted as probabilities. Suppose the demo corresponded to a problem where the goal is to predict if a person is male or female based on three predictor variables such as annual income, years of education and height. If male is encoded as (1, 0) and female is encoded as (0, 1), then the prediction is female because the second output value (0.5080) is larger than the first (0.4920). Note: It's relatively uncommon to use (1, 0) and (0, 1) encoding for a binary classification problem, but I used this encoding in the explanation to match the demo neural network architecture.
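In code, turning the output values into a prediction is just a matter of finding the index of the largest value. For example (the label strings here are for illustration only):

import numpy as np

labels = ["male", "female"]  # (1, 0) = male, (0, 1) = female
yValues = np.array([0.4920, 0.5080])
print(labels[np.argmax(yValues)])  # female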

Overall Program Structure
The overall program structure is presented in Listing 1. To edit the demo program I used the simple Notepad++ program. Most of my colleagues prefer using one of the many nice Python editors that are available.

I added a comment giving the name of the program and indicating the Python version used. I added three import statements to gain access to the NumPy package and to the math and random modules. As I mentioned earlier, the use of the NumPy package is becoming increasingly common.

Listing 1: Overall Program Structure
# nn_io.py
# Python 3.x

import numpy as np
import random
import math

# ------------------------------------

def showVector(v, dec): . . .
def showMatrix(m, dec): . . .

# ------------------------------------

class NeuralNetwork: . . .
  
# ------------------------------------

def main():
  print("\nBegin NN demo \n")
  
  # np.random.seed(0)  # does not affect the NN
  numInput = 3
  numHidden = 4
  numOutput = 2
  print("Creating a %d-%d-%d neural network " %
    (numInput, numHidden, numOutput) )
  nn = NeuralNetwork(numInput, numHidden, numOutput)
  
  print("\nSetting weights and biases ")
  numWts = NeuralNetwork.totalWeights(numInput, numHidden,
    numOutput)
  wts = np.zeros(shape=[numWts], dtype=np.float32)
  for i in range(len(wts)):
    wts[i] = ((i+1) * 0.01)  # [0.01, 0.02, . . . ]
  nn.setWeights(wts)
  
  # wts = nn.getWeights()  # verify weights 
  # showVector(wts, 2)

  xValues = np.array([1.0, 2.0, 3.0], dtype=np.float32)
  print("\nInput values are: ")
  showVector(xValues, 1)
  
  yValues = nn.computeOutputs(xValues)
  print("\nOutput values are: ")
  showVector(yValues, 4)

  print("\nEnd demo \n")
   
if __name__ == "__main__":
  main()

# end script

The demo program consists mostly of a program-defined NeuralNetwork class. I created a main function to hold all program control logic. I set up the demo neural network like so:

def main():
  print("\nBegin NN demo \n")
  # np.random.seed(0)  # does not affect the NN
  numInput = 3
  numHidden = 4
  numOutput = 2
  print("Creating a %d-%d-%d neural network " %
    (numInput, numHidden, numOutput) )
  nn = NeuralNetwork(numInput, numHidden, numOutput)
...

As you'll see shortly, the NeuralNetwork class has a self-contained random number generator so there's no need to set the seed for the Python global RNG. The seed value in the NeuralNetwork class definition is hardcoded to 0. An alternative design is to pass a seed value to the constructor function in addition to the number of input, hidden and output nodes.
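For example, a constructor that accepts a seed might begin like this (a sketch of the alternative design, not the demo code):

def __init__(self, numInput, numHidden, numOutput, seed=0):
  self.ni = numInput
  self.nh = numHidden
  self.no = numOutput
  # ... allocate node, weight and bias arrays as before ...
  self.rnd = random.Random(seed)  # seed supplied by the caller
  self.initializeWeights()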

The weights and bias values are set using these statements:

print("\nSetting weights and biases ")
numWts = NeuralNetwork.totalWeights(numInput, numHidden, numOutput)
wts = np.zeros(shape=[numWts], dtype=np.float32)  # 26 cells
for i in range(len(wts)):
  wts[i] = ((i+1) * 0.01)  # [0.01, 0.02, . . 0.26 ]
nn.setWeights(wts)

The totalWeights method is defined as a static method so it’s called on the class name (NeuralNetwork) rather than the object instance name (nn). A NumPy one-dimensional array named wts is created to hold the 26 weights and bias values, and then these values are set to (0.01, 0.02, . . . 0.26), as shown in Figure 2, by calling the setWeights method. Note that I'm using the relatively uncommon camel-case style rather than the more usual underscore style for variable and method names.
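The body of totalWeights isn't shown in this article, but an implementation consistent with the weight-count formula given earlier could be as simple as:

  @staticmethod
  def totalWeights(nInput, nHidden, nOutput):
    return (nInput * nHidden) + (nHidden * nOutput) + nHidden + nOutput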

The neural network is exercised like this:

...
  xValues = np.array([1.0, 2.0, 3.0], dtype=np.float32)
  print("\nInput values are: ")
  showVector(xValues, 1)
  
  yValues = nn.computeOutputs(xValues)
  print("\nOutput values are: ")
  showVector(yValues, 4)

  print("\nEnd demo \n")
   
if __name__ == "__main__":
  main()

# end script

The computeOutputs method implements the neural network input-output process described in the previous section of this article. The input (xValues) and output (yValues) vectors are displayed using a program-defined showVector function rather than the built-in print function, just to demonstrate that you have a great deal of flexibility when implementing a neural network from scratch.
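The showVector function isn't presented in this article. One simple implementation that displays a vector using a specified number of decimals (the version in the download may format things a bit differently) is:

def showVector(v, dec):
  fmt = "%." + str(dec) + "f"  # for example, "%.4f"
  for x in v:
    print(fmt % x, end=" ")
  print("")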

The Neural Network Class
The structure of the Python neural network class is presented in Listing 2. Python function and method definitions begin with the def keyword. All class methods and data members have essentially public scope, as opposed to languages like Java and C#, which can impose private scope. The built-in __init__ method (with two leading and two trailing underscore characters) can be loosely thought of as a constructor. All class method definitions must include the "self" keyword as the first parameter, except for methods that are decorated with the @staticmethod attribute.

Listing 2: NeuralNetwork Class Structure
class NeuralNetwork:
  def __init__(self, numInput, numHidden, numOutput): ... 
  def setWeights(self, weights): ... 
  def getWeights(self): ... 
  def initializeWeights(self): ... 
  def computeOutputs(self, xValues): ... 

  @staticmethod
  def hypertan(x): ...

  @staticmethod	  
  def softmax(oSums): ...

  @staticmethod
  def totalWeights(nInput, nHidden, nOutput): ...

# end class NeuralNetwork

The definition of the __init__ method begins with:

def __init__(self, numInput, numHidden, numOutput):
  self.ni = numInput
  self.nh = numHidden
  self.no = numOutput
...

Unlike most other programming languages, Python uses indentation, rather than begin-end keywords or begin-end curly brace characters, to establish the beginning and ending of a code block. In the demo program I use two spaces for indentation rather than the more usual four spaces, in order to save space.

Notice that you don't have to explicitly declare class variables. Even though all class method definitions (except those marked with @staticmethod) must include the "self" parameter, when a class method is called, the "self" argument is not passed explicitly. But when accessing a class method or variable from inside the class, the "self" keyword with dot notation must precede the method or variable name.

Next, three NumPy vectors are created to hold the input, hidden and output nodes:

self.iNodes = np.zeros(shape=[self.ni], dtype=np.float32)
self.hNodes = np.zeros(shape=[self.nh], dtype=np.float32)
self.oNodes = np.zeros(shape=[self.no], dtype=np.float32)

The syntax should be easy to interpret if you have experience with a C-family language. The default data type for the NumPy zeros function is "float64" (equivalent to the C# type "double"). Specifying float32 is common for neural networks because 64-bit precision is rarely needed. Next, matrices for the node-to-node weights and vectors for the node bias values are initialized:

self.ihWeights = np.zeros(shape=[self.ni,self.nh], dtype=np.float32)
self.hoWeights = np.zeros(shape=[self.nh,self.no], dtype=np.float32)
self.hBiases = np.zeros(shape=[self.nh], dtype=np.float32)
self.oBiases = np.zeros(shape=[self.no], dtype=np.float32)

For the ihWeights ("input-to-hidden") matrix, the first (row) index references an input node and the second (column) index references a hidden node. Similarly, the row index of hoWeights references a hidden node and the column index references an output node. For example, hoWeights[2,0] holds the weight that connects hidden node [2] with output node [0].
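Using the weight values from Figure 2, after setWeights has run, a few example cells hold these values (shown here for illustration only):

self.ihWeights[0, 0]  # 0.01, input[0] to hidden[0]
self.ihWeights[2, 1]  # 0.10, input[2] to hidden[1]
self.hoWeights[2, 0]  # 0.21, hidden[2] to output[0]
self.hoWeights[3, 1]  # 0.24, hidden[3] to output[1]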

The __init__ method concludes with:

...
  self.rnd = random.Random(0) 
  self.initializeWeights()
# end __init__
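The initializeWeights method isn't presented in this article. A typical implementation sets all the weights and biases to small random values using the self-contained generator, along these lines (a sketch; the version in the download may differ):

def initializeWeights(self):
  numWts = NeuralNetwork.totalWeights(self.ni, self.nh, self.no)
  wts = np.zeros(shape=[numWts], dtype=np.float32)
  lo = -0.01; hi = 0.01
  for i in range(len(wts)):
    wts[i] = (hi - lo) * self.rnd.random() + lo  # random value in [lo, hi)
  self.setWeights(wts)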
