# Deep Learning Overview – 2

This article is a continuation of our article titled “Deep Learning Overview”. We recommend reading that article first.

## Feed Forward: Printing Punched Cards in a 1960s IBM Computer

The purpose of feedforward is to make a prediction. Each guess is a new spot the prediction ball passes through on its way from the edge of the bowl to the error-free bottom. The neural network’s first guess as to which customers will buy the cat hygiene product is the top of the white dashed line in the bowl with the red bump. The prediction ball starts its journey from here. Feedforward is the process of generating the first prediction, placing the prediction ball at this starting point. The next point on the dashed white line corresponds to the second guess; the prediction ball rolls toward it. Then comes the third guess, and so on.

Our red nodular bowl is not suspended in the vacuum of space. It is located in an area with the X and Y axes. The location of the prediction ball is also shown with these coordinates.

The simple bowl we used to describe the global minimum was on a table. This is similar: the bowl with the red nodules sits on the grid formed by the X and Y axes. The white grid represents the table surface, and the “perfectly accurate guess” corresponds to where the red bowl touches it. The bowl’s only point of contact with the table surface is its bottom, where the dashed white line ends. This is the global minimum.

Each stop of the prediction ball in the red bowl is identified by 3 coordinates. The X and Y axes define the location on the grid (table surface), while the Z coordinate represents the distance of the prediction ball from the grid.

I know it’s a bit confusing. What do these 3 coordinates in 3D space mean in real life?

We said that each red dot on the surface of the bowl with red nodules corresponds to an experiment by the neural network with a certain weighted combination of the three survey questions. How does the network run this experiment? It proposes a specific location on the white grid, given by X and Y coordinates, that represents a specific combination of the survey questions. The X and Y axes essentially ask: “Hey, Z axis! How good is the combination with this emphasis/weighting?” The Z coordinate answers by telling us how wrong this particular combination is. The Z coordinate measures the error.

Let’s think again. We said that the point of the bottom of the bowl touching the white grid is the perfect correct guess. So each red dot in the bowl indicates the error. The further away the red dot is from the white grid, the larger the error.

Let’s repeat this, because it is very important. The white grid represents every question combination and emphasis the network can try. For each point on the white grid, the red dot directly above it shows the amount of error for that particular combination. The red bowl is really an error bowl. Only at the bottom of the bowl, where the red dot touches the white grid, is the distance between them zero, and only there is there no error.

Note the cone-shaped yellow arrow in the chart. This yellow arrow represents the Z axis, and the Z axis is the measure of error. The amount of error the yellow arrow denotes is very useful, because the network can learn from it and make a smaller mistake next time.

Now let’s imagine the prediction ball at each point of the dashed white line, with the yellow arrow moving together with it, always sitting under the ball. The ball represents each guess made by the network, while the yellow arrow below the ball measures the distance from the prediction ball to the white grid. That is, it measures the error of each prediction. So how does this happen?

We said that the feedforward process takes the first three survey questions and combines them in different ways to make a prediction, producing a number between 0 and 1 that expresses a probability. Suppose the prediction ball is located at a point on the dashed white line. We know that the answer to the fourth question is “yes”, or 1: in reality, the customer did buy My Cat Mis!. The truth corresponds to the global minimum, so the goal we want to reach by improving our predictions is the point where the red bowl touches the white grid. Our network therefore subtracts the predicted probability from 1. If the initial prediction is 0.5, the network thinks the customer has a 50% probability of purchasing the product. The length of the yellow arrow is the error measurement: in this example, 1 – 0.5 = 0.5. The error is 0.5 (error is always expressed as an absolute value, so it cannot be negative).
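The arithmetic above can be sketched in a couple of lines of Python, using the numbers from the text:

```python
# Error measurement for a single prediction.
truth = 1.0        # answer to the fourth survey question: the customer did buy
prediction = 0.5   # the network's first feedforward guess

# The length of the yellow arrow: error is always an absolute value
error = abs(truth - prediction)
print(error)  # 0.5
```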

Have you noticed? You cannot train a network without the fourth survey question, because you need a ground truth to compare your predictions to. The yellow arrow measures the extent to which the network misses the truth, that is, its error. Our initial guess of 0.5 is not very good. An estimate of 0.99, on the other hand, would be far more accurate, as it is much closer to the truth.

Now, let’s share some information that you may not fully grasp on a first reading. The two orange arrows representing the X and Y axes on the grid are actually the syn0 and syn1 synapses, which we’ll learn about later. For now, let’s say syn0 corresponds to the X axis with a value of 3, and syn1 corresponds to the Y axis with a value of 3.

Do you remember the Yes / No / Yes answers the first customer gave to the first three questions? Our network takes these as 1, 0, 1 for the first customer, multiplies them by the numbers at the syn0 and syn1 synapses (and does some other cool stuff), and makes a guess. We said that this prediction is 0.5. This is our first feedforward prediction, and it is where our prediction ball sits on the dashed white line on the red bowl.
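As a rough sketch of that feedforward step in Python: the “other cool stuff” is assumed here to be a sigmoid squashing function and a small hidden layer, and the weight values and layer sizes are made up for illustration, not taken from the article’s network:

```python
import numpy as np

def sigmoid(x):
    # squashes any number into a value between 0 and 1 (a probability)
    return 1.0 / (1.0 + np.exp(-x))

# First customer's answers to the three survey questions: Yes, No, Yes -> 1, 0, 1
answers = np.array([1.0, 0.0, 1.0])

# Illustrative synapse weights (a real network would start from random values)
syn0 = np.array([[ 0.1, -0.2],
                 [ 0.4,  0.3],
                 [-0.5,  0.2]])      # 3 questions -> 2 hidden nodes
syn1 = np.array([0.3, -0.1])         # 2 hidden nodes -> 1 output

hidden = sigmoid(answers @ syn0)     # weighted combinations of the questions
prediction = sigmoid(hidden @ syn1)  # final guess: a probability between 0 and 1
print(prediction)
```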

## Finding the Global Minimum

We said that the purpose of training our network is to find the fastest way to reduce the error in our predictions. In other words, we want the yellow arrow to shorten. For this, our prediction ball needs to roll from its starting point to the lowest point of the curved bowl (the point where it touches the table) in the most efficient way. We can call that point Utopia. In Utopia, the Z axis that measures the error of our predictions has almost zero length: this is the global minimum. Our predictions contain minimal error and our network is stunningly accurate. The most efficient route by which our prediction ball rolls down the surface of the sloping red bowl is represented by the dashed white line. When we have completed our training with the first data set, our network will be able to accurately predict which of the potential customers obtained from the veterinarian is most likely to buy My Cat Mis!.


Now consider this: our guess ball starts at the top of the dashed white line, where the error of our initial prediction was 0.5. So how do we roll the ball down to where the error approaches zero? How are we going to fine-tune our network so that our prediction ball, which is now at (3, 3) in X, Y coordinates, travels along the dashed white line to reach the bottom of the bowl, at approximately (3, 0)? So far we have made only one guess, based on the first customer’s answers. How will we improve each follow-up prediction with each customer’s answers and bring our margin of error to almost zero?

Finding the way to the bottom of the bowl, step by step, from our unimpressive initial guess of 0.5 to an accuracy of 0.9999 after the 60,000th guess, is a process called “gradient descent”.

Gradient descent is a fancy term for the network’s trial-and-error learning. It is the master plan of varying the weights of the synapses to reduce the amount of error on each trial. Put differently: follow the steepest downhill path to get the prediction ball to the bottom as quickly as possible. Backpropagation is the method used to calculate the slope. Backpropagation tells us the slope of the bowl surface under the prediction ball.

Yes, it’s all about the slope. “Gradient descent” is a cool term, but it really just describes following the slope. Finding the slope of the bowl surface where our prediction ball sits tells us the direction in which the ball should roll, and that is how it makes the fastest descent to the bottom point, where the margin of error approaches zero.

Let us now walk through the gradient descent process. First, the computer makes the feedforward prediction. It then subtracts this prediction from the “truth” (the global minimum, where the bowl touches the grid) and obtains the error (the length of the yellow arrow). Backpropagation then uses this error to calculate the slope of the bowl surface. The slope determines the direction and speed at which the prediction ball should roll, that is, how much and in which direction to adjust the numbers in the network.
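A minimal sketch of this loop in Python, assuming a single layer of synapses, a sigmoid output, and the simplest possible weight update (no learning rate); the structure and numbers are illustrative, not the article’s exact network:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_slope(output):
    # derivative of the sigmoid, written in terms of its output
    return output * (1.0 - output)

# One customer: answers to the first three questions, plus the known truth
x = np.array([1.0, 0.0, 1.0])
truth = 1.0

rng = np.random.default_rng(0)
weights = rng.normal(size=3)  # start from random synapse weights

for step in range(1000):
    prediction = sigmoid(x @ weights)           # 1. feedforward: make a guess
    error = truth - prediction                  # 2. compare guess with the truth
    slope = error * sigmoid_slope(prediction)   # 3. backpropagation: slope under the ball
    weights += x * slope                        # 4. roll downhill: adjust the weights

final_error = abs(truth - sigmoid(x @ weights))
print(final_error)  # close to zero after training
```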

Finding the slope is exactly what backpropagation does. If gradient descent is the master plan, then backpropagation is the main tool used to carry out that plan. Let’s expand a little more:

### Gradient Descent is the Master Plan

What exactly does gradient descent do? It reduces the error of our prediction on each trial by changing the weights.

So far we’ve talked about how the network experiments with different combinations of survey questions, and different emphases on those questions, to make the best possible prediction. So what is a weight? A weight is a number that determines how important a question is for the prediction. This is a very important concept. The weights are the values (numbers) in our two groups of synapses. By playing with these synapse numbers, their weights, the network experiments with different combinations of questions and with the emphasis each question gets within those combinations.


Each line connecting neurons in the yellow diagram is a synapse, or weight. If a weight has a large value, it means “this survey question has a large impact on the network’s prediction”; in other words, the question contributes greatly to guessing correctly. If the weight is close to zero, the question has little to do with making a correct guess.
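A toy illustration in Python; the weight values below are hypothetical, chosen only to show the contrast:

```python
# A large weight makes a question's answer dominate the weighted sum;
# a near-zero weight makes it almost irrelevant.
answer = 1.0               # the customer said "yes" to this question
important_weight = 2.5     # hypothetical weight of a highly predictive question
irrelevant_weight = 0.01   # hypothetical weight of a barely relevant question

print(answer * important_weight)   # 2.5: a big push on the prediction
print(answer * irrelevant_weight)  # 0.01: almost no effect
```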

### Backpropagation is the Main Tool for Achieving Gradient Descent

Backpropagation is the method we use to calculate the slope. This slope tells us how much we should re-weight each synapse on the next try. Each point where the prediction ball stops on the dashed white line on its way down is one trial, or iteration, of the neural network.
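For a sigmoid output (an assumption here; the article has not named the activation function), this slope can be written in terms of the output itself, which is why uncertain guesses get big corrections and confident guesses get small ones:

```python
def slope(output):
    # Slope (derivative) of the sigmoid, expressed via its output:
    # steepest at 0.5 (an uncertain guess gets a big correction),
    # nearly flat close to 0 or 1 (a confident guess gets a tiny correction).
    return output * (1.0 - output)

print(slope(0.5))   # 0.25, the steepest point
print(slope(0.99))  # about 0.0099, almost flat
```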

So what is a trial? This is a very important question. A trial is one round of the trial-and-error learning process. The first part of a trial is the feedforward step. With feedforward, our prediction ball is placed at its starting position at the top of the dashed white line. From there, it rolls across the surface of the red bowl toward the bottom, the global minimum, where the error is almost zero. Each dot on the white dashed line represents an update, an adjustment to the network’s proposed combination of survey questions. By changing the weights of the questions, a slightly better prediction can be made.

The downward path of the prediction ball across the curved red bowl surface is uneven. The ball can roll over bumps, dip into valleys, and change direction abruptly. To see what is happening, let’s first consider the Z coordinate, the vertical axis. The yellow arrow denoting the Z coordinate moves with the ball, always just below it, and its length varies continually. As the ball approaches the bottom of the bowl, the yellow arrow shortens toward zero: the difference between prediction and reality becomes almost nothing, so our guesses are correct. When the yellow arrow’s length reaches zero, the ball sits directly on the white grid, at the global minimum.

So far, we’ve tried to outline how a neural network trains itself, using a simple survey of pet store customers. The next step is to open the hood and look at the code that makes this trial-and-error learning possible. Being able to picture in three dimensions what the neural network does helps us understand what the math and the code are doing. If you don’t fully understand the concepts we’ve covered so far, don’t worry. Deep learning is a complex subject and you shouldn’t expect to master it in one reading. Your understanding will improve as you meet more concepts and see different examples.