{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tutorial: Implement a Neural Network from Scratch with Python\n", "\n", "In this tutorial, we will see how to write code to run a neural network model that can be used for regression or classification problems.\n", "We will **NOT** use fancy libraries like Keras, Pytorch or Tensorflow. Instead the neural network will be implemented using only numpy for numerical computation and scipy for the training process.\n", "\n", "The necessary libraries are:\n", "- Numpy: for numerical computation;\n", "- Scipy.optimize.minimize: to train the neural network;\n", "- Scipy.stats.pearsonr: to test goodness of fit.\n", "\n", "I will go over these libraries with a bit more detail later. First, import all the libraries mentioned above." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "from scipy.optimize import minimize\n", "from scipy.stats import pearsonr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The dataset used consists in a simple way, containing the following data, stored in *data.csv*:\n", "\n", "| x_1 | x_2 | y |\n", "|---- | ----| --- |\n", "| 1 | 0 | 1|\n", "| 0 | 1 | 0|\n", "| 1 | 1 | 0|\n", "| 0 | 0 | 1|" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[1., 0., 1.],\n", " [0., 1., 0.],\n", " [1., 1., 0.],\n", " [0., 0., 1.]])" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = []\n", "with open('data.csv','r') as f:\n", " f.readline()\n", " for line in f:\n", " line = [float(val) for val in line[:-1].split(',')]\n", " data.append(line)\n", "data = np.array(data)\n", "data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Separate the columns, using the first two for the input `X`, while the last column is considered the output `y`." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[1. 0.]\n", " [0. 1.]\n", " [1. 1.]\n", " [0. 0.]]\n", "[[1.]\n", " [0.]\n", " [0.]\n", " [1.]]\n" ] } ], "source": [ "X = data[:,0:2]\n", "y = data[:,2].reshape(-1,1)\n", "print(X)\n", "print(y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Structuring the Neural Network\n", "\n", "The neural network consists in a mathematical model that mimics the human brain, through the concepts of connected nodes in a network, with a propagation of signal. Each neuron contains an activation function, which may vary depending on the problem and on the programmer. A very common function used due to its felixibility and capablity of generation is the sigmoid (or logistic) function, which can be written as:\n", "\n", "$$\n", "y = \\frac{1}{1+e^{-w x}}\n", "$$\n", "\n", "Where $w$ are the weights (parameters) of the nerual network which should be optimized. The following function can be used in Python to define the sigmoid function." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "def sigmoid(w,x):\n", " # x - Nx2 input matrix\n", " # w - 2x1 parameter vector\n", " # output should be 4x1\n", " return 1/(1+np.exp(x.dot(w)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following cell is used to test the function above for correctnes fo the result. A dummy weight vector `w` is created for the purpose of this testing." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0.37754067]\n", " [0.62245933]\n", " [0.5 ]\n", " [0.5 ]]\n" ] } ], "source": [ "# test sigmoid function\n", "w = np.array([0.5,-0.5]).reshape(-1,1)\n", "print(sigmoid(w,X))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will implement the neural network using an Object-Oriented approach. This means we will write a class which will emulathe the model, and it will contain the functions to optimize its parameters and to test it, i.e perform predictions. Start by writing the class definition using the keyword `class` and the initialization function, contained in the `__init__()` function." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "class NeuralNetwork:\n", " def __init__(self,x,y,neurons):\n", " self.input = x\n", " self.obsOutput = y\n", " self.output = np.zeros((y.shape[0],1))\n", " self.inputWeights = np.random.rand(x.shape[1],neurons)\n", " self.outputWeights = np.random.rand(neurons,1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The class `NeuralNetwork` contain the following properties, as specified above:\n", "+ It stores the inputs in a property `input`\n", "+ It stores the observed outputs (targets) in a property `y`\n", "+ The shape of the targets is used to generate the output array, stored in `output` property. It is initialized with zeros.\n", "+ The weights of the inputs are stored in `inputWeights`. It is initialized with random values, and its shape is properly configured using the input `x` array and the number of neurons `neurons`.\n", "+ The weights of the hidden layer are stored in `outputWeights`. It is initialized with random values, and its shape is properly configured using the number of neurons `neurons`.\n", "\n", "Let's test and see if the `NeuralNetwork` class is not throwing any errors up to this point." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[0.90518084 0.71031147]\n", " [0.75555662 0.19589718]]\n", "[[0.00156605]\n", " [0.64047559]]\n" ] } ], "source": [ "net = NeuralNetwork(X,y,2)\n", "print(net.inputWeights)\n", "print(net.outputWeights)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that the values of weights, printed above, are simple generated randomly and do not hold any meaning up to this point.\n", "\n", "We can obtain the output of the hidden layer by applying the `sigmoid` function to the input weights of the neural network and the input array, as shown below." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0.287987 , 0.32953002],\n", " [0.31961175, 0.45118172],\n", " [0.15966303, 0.28777629],\n", " [0.5 , 0.5 ]])" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# How to obtain the output of the hidden layer neurons\n", "sigmoid(net.inputWeights,X)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To extend it, the output of the complete network is obtained by doing matrix multiplication of the `outputWeights` with the output of the hidden layer, shown above." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0.21150694],\n", " [0.28947141],\n", " [0.18456373],\n", " [0.32102082]])" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# How to obtain the output of the output layer neuron (assuming linear)\n", "sigmoid(net.inputWeights,X).dot(net.outputWeights)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's make this code a bit more permanent, by writing a function `predict`, which we will use to generate the output of the neural network given an array of input. This function can be appended to the class `NeuralNetwork`, by using the `setattr` function of Python. Notice that this function work in the following way:\n", "\n", "```python\n", "'''\n", "setattr(class_name,method_to_be_created,function_name)\n", "\n", "class_name -> name of the class which the function will be appended to.\n", "method_to_be_created -> a string defining the name of the method implemented in the class, so it can be called by\n", " using class_name.method_name()\n", "function_name -> a call to the function to be appended, in this case predict.\n", "'''\n", "```" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "# Implementation of the prediction process of ANN\n", "def predict(self,X):\n", " # hidden layer output HL\n", " self.HL = sigmoid(self.inputWeights,X)\n", " # output layer (network output)\n", " self.OL = self.HL.dot(self.outputWeights)\n", " return self.OL\n", "\n", "# give the method to the class Neural Network\n", "setattr(NeuralNetwork,'predict',predict)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A very interesting fact is that, once the new method is appended to the class, every object previously created will already contain the method. We can test it by calling the `predict` method from the object `net` already created." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[0.21150694],\n", " [0.28947141],\n", " [0.18456373],\n", " [0.32102082]])" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "net.predict(X)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Training the neural network" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here comes what I see as the most complex part to be implemented in the model, the training part. It consists in basic two main things:\n", "+ cost function - a way of evaluating how good the model is with its current parameters;\n", "+ optimization function - a way of reducing the cost by modifying the value of the parameters;\n", "\n", "To avoid growing too much the complexity of this exercise, we will use the [`minimize`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html) from the package `scipy.optimize`. This function will take as arguments the cost function, which we will implement, and the parameters.\n", "\n", "The issue here is that the `minimize` function is build to take the parameters to be optimized as a single vector. However, you saw that in the structure of our neural network we have the neural networ as matrices stored in two properties: `inputWeights` and `outputWeights`. So we need to join them to (using [`np.concatenate()`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.concatenate.html)) and them reshape it to one row, using [`np.reshape`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html).\n", "\n", "Also, after the optimization we have to get the optimum parameters and reshape them again to place back in `inputWeights` and `outputWeights`.\n", "\n", "Again, we define the function and append it to the class `NeuralNetwork` using the `setattr` function." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "def fit(self):\n", " x0 = np.concatenate([net.inputWeights,net.outputWeights.T]).reshape(-1,)\n", " res = minimize(self.cost,x0)\n", " \n", " self.inputWeights = res.x.reshape(3,-1)[:-1,:]\n", " self.outputWeights = res.x.reshape(3,-1)[-1,:].reshape(-1,1)\n", " return res\n", "\n", "setattr(NeuralNetwork,'fit',fit)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To calculate the **cost**, we need to define a way to define a value that will tell us how good the model is. Intuitively, we can think in terms of the **errors**. If errors are high, then the cost should be high. On the other hand, a low cost should indicate a low error. So let's start defining what is an error.\n", "*The error consists in the absolute difference between the expected value (target) and the predicted value.*\n", "This difference can be mathematically expressed as:\n", "\n", "$$\n", "(y_{obs} - y_{pred})\n", "$$\n", "\n", "Where $y_{obs}$ is the observed value and expected result, while $y_{pred}$ is the output of the neural network, the predicted result. Suppose that we have the following observations $y_{obs}$:\n", "\n", "$$\n", "y_{obs} = [1,-1,-1]\n", "$$\n", "\n", "And the corresponding predictions are:\n", "\n", "$$\n", "y_{obs} = [1,1,-3]\n", "$$\n", "\n", "Obtaining the errors\n", "\n", "$$\n", "(y_{obs} - y_{pred}) = [(1-1),(-1-(1)),(-1-(-3))] = [0,-2,2]\n", "$$\n", "\n", "We obtain a vector with 3 errors, each one corresponding to a pair expected/predicted value. But the cost needs to be a single value. Again, intuitively, we could think of obtaining a single error value by summing up all the errors. Doing that for the vector above produces,\n", "\n", "$$\n", "0 + (-2) + 2 = 0\n", "$$\n", "\n", "The sum of errors is 0! This result make it looks like there is no error, though we already know that there is differences between prediction and target. This can be corrected by squaring each value and them summing it up.\n", "\n", "$$\n", "0^2 + (-2)^2 + 2^2 = 8\n", "$$\n", "\n", "If this method, we avoid error cancellation, because any value will become positive. This metric is called the sum of squared errors (SSE), and can be expressed generically as:\n", "\n", "$$\n", "J = \\sum (y_{obs} - y_{pred})^2\n", "$$\n", "\n", "If we desire to obtain an average squared error instead of the sum, we can divide the SSE by the number of points, thus obtaining the mean of squared errors (MSE), expressed as:\n", "\n", "$$\n", "J = \\frac{1}{n}\\sum (y_{obs} - y_{pred})^2\n", "$$\n", "\n", "For this exercise let's use the SSE as our cost. We define a function to calcualte that and append it to the class, as it was done with previous methods." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "def cost(self,x):\n", " self.inputWeights = x.reshape(3,-1)[:-1,:]\n", " self.outputWeights = x.reshape(3,-1)[-1,:].reshape(-1,1)\n", " \n", " ypred = self.predict(self.input)\n", " \n", " J = np.sum((self.obsOutput - ypred)**2)\n", " \n", " return J\n", "\n", "setattr(NeuralNetwork,'cost',cost)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Test the cost by calling it once (remember to concatenate and reshape the weights so it works properly!)" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.2005915066240083" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "net.cost(np.concatenate([net.inputWeights,net.outputWeights.T]))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can call the fit() method on the neural network to find the optimum parameters that will show the minimum error of the predictions and observations. This is done in the following code. Notice that the output shown is the result from the minimize function on scipy.optimize." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " fun: 2.255136144862212e-09\n", " hess_inv: array([[ 4.14164529e+03, 7.06430001e+03, 1.78716005e+05,\n", " 2.61958344e+05, -2.18298665e+04, 2.18235283e+04],\n", " [ 7.06430001e+03, 1.20546383e+04, 3.04875328e+05,\n", " 4.46879826e+05, -3.72395856e+04, 3.72291055e+04],\n", " [ 1.78716005e+05, 3.04875328e+05, 7.71449754e+06,\n", " 1.13077626e+07, -9.42327432e+05, 9.42039179e+05],\n", " [ 2.61958344e+05, 4.46879826e+05, 1.13077626e+07,\n", " 1.65747049e+07, -1.38124557e+06, 1.38082329e+06],\n", " [-2.18298665e+04, -3.72395856e+04, -9.42327432e+05,\n", " -1.38124557e+06, 1.15106690e+05, -1.15070568e+05],\n", " [ 2.18235283e+04, 3.72291055e+04, 9.42039179e+05,\n", " 1.38082329e+06, -1.15070568e+05, 1.15035689e+05]])\n", " jac: array([-2.44188025e-08, -3.92603844e-06, -5.85603566e-10, -3.89227306e-09,\n", " 2.09829900e-06, 3.32048000e-06])\n", " message: 'Optimization terminated successfully.'\n", " nfev: 264\n", " nit: 32\n", " njev: 33\n", " status: 0\n", " success: True\n", " x: array([ 6.56030353e-01, -4.35259814e-03, 7.77890879e+00, 1.11330054e+01,\n", " 1.36193289e-02, 1.98637937e+00])" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "net.fit()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To finish the exercise, let's generate a vector of predictions using the optimum parameters and compare it with the observations. We test how good this prediction is by using the pearson coefficient, also from scipy package.\n", "\n", "Pearson coefficient (also known as R-squared) is a value that ranges from 0-1, where 1 is a perfect prediction (all the predictions match the targets), while 0 means that the predictions are as good as using the average target as a predictor (a quite poor model). Therefore our objective is to obtain a pearson coefficient as near to 1 as possible.\n", "\n", "The pearsonr() function also returns a value which is the p-value. I will not enter into details about this statistical parameter here, but what is important to mention is that it should be as low as possible, ideally less than 0.05 or 0.01, so we can statistically accept the predictions." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "ypred = net.predict(X)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(array([1.]), array([7.01427805e-12]))" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pearsonr(y,ypred)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.2" } }, "nbformat": 4, "nbformat_minor": 2 }