In this article, we will discuss artificial intelligence and deep learning. These topics, which we have heard about frequently in recent years, seem strange and mysterious to many people. With the right explanation, however, the basic concepts become understandable, and you may even find yourself able to contribute to the development of deep learning and artificial intelligence. Without further ado, let's start learning.
What is Deep Learning?
Deep learning describes a popular type of multilayer neural network. In this article, we will use a three-layer neural network as an example.
Deep learning is a subset of machine learning.
Machine learning is a subset of artificial intelligence.
There are four main concepts in deep learning:
- Feed forward
- Global minimum
- Gradient descent
- Backpropagation
These concepts may seem complicated and hard to grasp at first, but we will explain each of them with long, detailed examples.
Let's say you own a pet store. A month ago, you started selling a new hygiene product for cats. The name of this new product is "My Cat Mis!" (never mind the name). A significant part of your business success may depend on using artificial intelligence wisely: you can build a deep learning neural network to determine who to send your ads to.
Your secret weapon is your dataset. During the month you have been selling "My Cat Mis!", you collected data by surveying your pet shop customers. In the survey, you asked the following questions:
1- Do you have a cat that poops?
2- Do you drink imported beer?
3- Have you visited our website Kedimmis.com in the last month?
4- Did you buy "My Cat Mis!" for your pooping cat in the last month?
The answers to these four questions make up the features (characteristics) of your past customers.
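To make this concrete, here is a minimal sketch of how those survey answers could be encoded as data, assuming NumPy and a simple yes = 1 / no = 0 scheme. The specific customers and their answers below are made up for illustration.

```python
import numpy as np

# Hypothetical encoding: each customer's yes/no answers become a row of 1s and 0s.
# Columns: has a cat, drinks imported beer, visited the website.
surveys = np.array([
    [1, 0, 1],   # customer 1: yes, no, yes
    [0, 1, 0],   # customer 2: no, yes, no
    [1, 1, 1],   # customer 3: yes, yes, yes
    [0, 0, 0],   # customer 4: no, no, no
])

# Answers to question 4 ("did you buy?") are the labels the network learns from.
bought = np.array([[1], [0], [1], [0]])

print(surveys.shape, bought.shape)  # four customers, three features, one label each
```

The first three answers are the inputs; the fourth answer is kept separate because it is the truth the network will compare itself against.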
An important question: why is AI so powerful? Because it can train itself on your past customers' survey answers and then accurately predict the purchasing behavior of your future customers.
First, you train your neural network on your past customers' answers to the first three survey questions. The network uses this data to predict whether each customer bought "My Cat Mis!". It then compares its prediction with the answer to the fourth question. The answers to the fourth question are the ground truth against which your network checks its predictions. For example, your network will analyze the first three answers and say, "This customer probably bought My Cat Mis!" If the answer to the fourth question is indeed "yes", you have a successful neural network.
The network trains itself by trial and error. It makes predictions, then compares those predictions with the actual answers to question four. After many iterations, it learns from its mistakes and improves its predictions.
It is important to understand that a neural network trains on one dataset and then makes predictions on another. Once your network has learned to predict "My Cat Mis!" buyers from your customer surveys, it can be let loose on a new dataset, that is, on new customers.
For example, if you get a dataset from a veterinarian in your area in which people answered the first three questions of the survey, your neural network can predict which of those people you should send your ads to. Ingenious, isn't it? So how does this actually happen?
The Big Picture: An Analogy of a Brain with Neurons and Synapses
Below is a schematic of the three-layer neural network we will build, drawn in the form of "neurons and synapses".
This figure shows the neural network you will use to predict which people to target with your ads. It is a three-layer feed-forward neural network. The input layer is on the left; its three circles represent neurons (also called nodes or features). In the diagram, this column of three circles holds one customer's answers to the three questions. For example, for someone who answers "yes" to the first question, "no" to the second, and "yes" to the third, the top circle gets the value 1, the middle circle 0, and the bottom circle 1.
The synapses (the lines connecting the circles) and the hidden layer are where the neural network's "brain" does its thinking. The single circle attached to the four synapses on the right represents the network's prediction: "Given this combination of input features, this is the probability that the customer bought (or did not buy) My Cat Mis!"
The stand-alone circle labeled "y" on the far right represents the truth, that is, each customer's answer to the fourth question. There are only two options for this circle: "0" stands for "no, I did not buy", while "1" means "yes, I bought". Our neural network makes a probability estimate, compares it with reality to see how accurate it was, learns from its mistakes, and tries to do better on the next attempt. This trial-and-error process repeats thousands of times in seconds.
To summarize, the diagram above is a commonly used way of depicting neural networks. It is essentially a picture of feed forward, one of our four core concepts. You might think the neurons are the most important part here, but they are not. It is actually the synapses that drive the four main concepts of deep learning. So the key lesson of this chapter is that the synapses are the most important factor in the prediction process. Below we explain this process with the analogy of a ping pong ball rolling around a bowl.
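A single feed-forward pass through this diagram can be sketched in a few lines. This is only an illustrative sketch, assuming NumPy, a sigmoid activation, and randomly initialized synapse weights; the variable names (`w_hidden`, `w_output`) are our own, not from the article.

```python
import numpy as np

def sigmoid(x):
    # Squashes any number into the 0..1 range, so outputs read as probabilities.
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
w_hidden = rng.standard_normal((3, 4))   # synapses: 3 input neurons -> 4 hidden neurons
w_output = rng.standard_normal((4, 1))   # synapses: 4 hidden neurons -> 1 output neuron

answers = np.array([[1, 0, 1]])          # one customer: yes, no, yes

hidden = sigmoid(answers @ w_hidden)     # hidden-layer activations
prediction = sigmoid(hidden @ w_output)  # probability that the customer bought

print(prediction)  # a single number between 0 and 1
```

With random synapses the prediction is meaningless; training (covered in the next sections) is what makes it useful.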
The Big Picture: Bowl and Ball Parable
In this section, we will explain why artificial intelligence is so powerful: AI uses probability to make incremental improvements to its next prediction. This takes learning by trial and error to a whole new level.
Let's think about how a human would make these predictions. You have a stack of surveys from your old customers on your desk. Next to it is another pile of surveys from the potential new customers you got from the vet. How would a person predict the purchasing behavior of future customers using the past surveys? Maybe like this: "My gut says there is probably no relationship between drinking imported beer and buying My Cat Mis!. On the other hand, browsing through the customer surveys, I noticed that people who have cats and who visit my website are more likely to buy My Cat Mis!."
Nice. Congratulations. With only four customers answering a three-question survey, making such inferences may not be difficult. But what if 4,000 people answer a 40-question survey? It would then be very difficult for a person to identify the key factors that lead to accurate predictions.
When making its first guess about whether a customer bought My Cat Mis!, the neural network does not limit itself to black-and-white answers such as "1: yes" or "0: no". Instead, it picks a number on the spectrum from zero to one. For example, 0.67 means "the probability that this customer bought My Cat Mis! is 67%", and 0.13 means the probability of buying is 13%.
Once the prediction is a probability between zero and one, it no longer matters how bad the computer's initial guess is. What matters is comparing the prediction with reality. For example, the network compares 0.13 with the actual result of 1, sees that its estimate is off by 0.87, and makes corrections. It can increase or decrease certain values to find the combination that will make the next prediction more accurate. After tens of thousands of repetitions, the difference between the computer's prediction and reality approaches zero. At this point, we can say that the neural network makes correct predictions.
What makes deep learning neural networks powerful is that they can make incremental improvements to their next prediction using probability and trial and error.
The network's trial-and-error learning process can be compared to a ping pong ball rolling down the side of a bowl and eventually coming to rest at the bottom. The bottom of the bowl represents the perfectly correct guess. The first guess is where the ball starts on the side. On the second guess, the ball slides slightly down the edge, closer to its final resting point. On the third guess, it slips a little lower, and so on. With each guess, the ball gets a little closer to the perfect spot at the bottom of the bowl.
It actually takes four steps for the ball to roll down and stop at the perfect spot. Let's briefly walk through them:
1- Feed Forward: Think of the old room-sized IBM computers of the 1960s. Data cards were fed into the machine at one end, and miraculous results came out the other. Our network likewise takes the data from the first three survey questions and feeds it forward to produce a prediction.
2- Global Minimum: Imagine the bowl above standing on a table. The surface of the table represents the perfect prediction with almost zero error. The bottom of the bowl is the point closest to this perfect prediction. Compared to the entire surface of the bowl (the global surface), the bottom is the closest to perfection. It is the "global minimum" of the error.
As the network's guesses improve, the ball rolls down the side of the bowl, approaching the global minimum of error at the bottom. After each prediction, the network compares its prediction with the ground truth of the fourth question. This is like measuring how far the ball is from the bottom at a given moment. Measuring the difference between prediction and reality is called finding the error. With each prediction, the network's goal is to drive this error toward the global minimum.
3- Backpropagation: Consider a juggler performing with 16 pins of different sizes and weights. Because he can predict the weight and size of each pin while it is in the air, he can keep them all up without dropping any. Similarly, after making a prediction, the network works backwards through the prediction process to see which changes would reduce the error on the next prediction.
4- Gradient Descent: Imagine the ball descending toward the global minimum in the bowl above. Gradient descent can be likened to the prediction ball rolling down the surface of the bowl.
Let's restate these concepts in different words. Gradient descent is the trial-and-error process of correcting the network's predictions until they become more accurate. Feed forward means making a guess; a prediction can be thought of as a snapshot of where the ball is in the bowl at a given moment. The global minimum is the point of perfection, the point at the bottom of the bowl with no margin of error; the goal is to reach it. The network measures its error by comparing its prediction to reality. Backpropagation is the process of working backwards through the previous guess to find and correct what went wrong. It means figuring out how to move the ball from its current position closer to the bottom of the bowl.
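The four steps can be sketched end-to-end in code. This is a minimal illustrative network, not the article's exact implementation: it assumes NumPy, sigmoid activations, a made-up dataset of four customers, and a plain weight update with no learning rate, all chosen for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)

# Survey answers (3 questions) and purchases (question 4) for four past customers.
X = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 1, 1],
              [0, 0, 0]])
y = np.array([[1], [0], [1], [0]])

w0 = rng.standard_normal((3, 4))   # input -> hidden synapses
w1 = rng.standard_normal((4, 1))   # hidden -> output synapses

for _ in range(20_000):
    # 1. Feed forward: push the answers through the network to get a guess.
    hidden = sigmoid(X @ w0)
    pred = sigmoid(hidden @ w1)

    # 2. Measure the error against the truth (question 4).
    error = y - pred

    # 3. Backpropagation: work backwards to see how each synapse contributed.
    delta_out = error * pred * (1 - pred)
    delta_hid = (delta_out @ w1.T) * hidden * (1 - hidden)

    # 4. Gradient descent: nudge the synapses to shrink the next error.
    w1 += hidden.T @ delta_out
    w0 += X.T @ delta_hid

print(np.round(pred, 2))  # predictions should now sit close to y
```

After many iterations the predictions land near 1 for buyers and near 0 for non-buyers, which is exactly the ball settling at the bottom of the bowl.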
Here we are describing the concepts only in general terms, so it is perfectly normal if some things remain unclear. The bowl analogy above illustrates the four steps of the neural network training process, but it is too simplistic: the ball rolls down a single, straight path to the bottom of a simple bowl. That might represent a single-question survey, such as "Do you have a cat?". In our survey, however, we want to find the best combination of three questions for the best guess. The bowl that represents a survey with several questions therefore has a lumpy shape, with many pits and bumps.
The red bowl above has serious pits and bumps. Why? Looking at the figure, you might think the red bowl is made of plastic. It actually consists of millions of red dots. Each dot corresponds to a location in three-dimensional space with X, Y, and Z axes. Each dot is a possible combination of our survey questions, and also a possible hierarchy of which questions the network should prioritize when making predictions. The lumpy bowl represents the landscape of possible permutations that a four-question survey offers for guessing the truth.
Note that the red bowl is the landscape of all possible permutations. The network does not actually evaluate every one of them, because computing them all would require enormous computing power. Instead, it starts wherever the probability ball is randomly dropped and works its way down to the valley floor, which forms the global minimum.
The top of the dashed white line in the upper right corner of the figure corresponds to the network's initial guess. The dashed line traces the network's predictions on their way down to the global minimum at the bottom. Over a large number of predictions, our ball travels along this line, reaching minimum error and maximum prediction accuracy.
So why is the white-lined path of the prediction ball so crooked? The crookedness comes from the network's constant experimentation with which questions to combine and how much weight (emphasis) to give each question in order to reduce the error. The network's goal is to reduce the error as fast as possible, so from the ball's current position it picks the direction of steepest descent. Because the slope keeps changing, the path curves.
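"Steepest descent" has a precise meaning: repeatedly step against the slope of the error surface. Here is a toy one-weight version of the bowl, with a made-up error function chosen so its minimum sits at w = 3; the names and numbers are purely illustrative.

```python
# A toy one-weight "bowl": error(w) = (w - 3)**2, with its global minimum at w = 3.
def error(w):
    return (w - 3) ** 2

def slope(w):
    # Derivative of the error: it points in the direction of steepest ascent,
    # so we step the opposite way to go downhill.
    return 2 * (w - 3)

w = 10.0             # the ball starts high on the side of the bowl
learning_rate = 0.1  # how far the ball slides on each guess

for _ in range(50):
    w -= learning_rate * slope(w)   # roll a little way downhill

print(round(w, 3))  # → 3.0, the bottom of the bowl
```

In the real network the "bowl" has one dimension per synapse, so the slope keeps changing direction as the weights change, which is why the white path curves.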
At first glance, we might say that the questions "Do you have a cat?" and "Have you visited our website?" should carry the most weight in our predictions. But which question is the bigger factor in predicting who buys the cat hygiene product? Is owning a cat the bigger determinant? Should the two questions be weighted 50/50?
The network experiments with many different weightings of these two questions and finds which combination is more accurate. Each bump we see in the red bowl represents a trial where the questions were weighted incorrectly, because the bumps push the ball away from the global minimum at the bottom. Each pit represents a weighting in the right direction, because the pits bring the ball closer to the bottom. Each pit is called a local minimum. One of the problems in gradient descent is distinguishing local minimums from the global minimum. This problem is usually addressed by a technique called "momentum", which is beyond the scope of this article.
But what if our network has found the perfect weighting of the questions, yet its predictions still reach only, say, 60% accuracy? In this situation, our network has one more tool: inference correlations.
To understand inference correlations, let's go back to the imported beer question. The network constantly tries different combinations of questions. For example, wealthy people may have more money to spend on imported beer. Cat owners also tend to be wealthy people with excellent taste in pets, right? So, by combining the questions about drinking imported beer and owning a cat, the network can increase their weights in its calculations and improve its predictions. The inference correlation here is: "Wealthy people are more inclined to buy My Cat Mis!"
While making this inference correlation about wealthy people buying My Cat Mis!, our prediction ball (our neural network) may run into a bump. That would mean the inference correlation about wealthy customers does not help improve prediction accuracy, perhaps because My Cat Mis! is a necessity product, like paper towels or socks, rather than a luxury product. In that case, on subsequent attempts the prediction ball will roll away from the bump, the useless inference correlation about wealthy customers, while still using the useful inference correlations that bring the ball closer to the bottom.
The example above shows how inference correlations are tested for their usefulness in accurate prediction. People use logic to search for inference correlations; the network simply tries permutation after permutation. If a trial gives slightly better results than before, it is kept. If it performs worse, it is discarded.
To summarize: each dot on the lumpy red bowl surface represents a trial of the network with a particular combination and weighting of questions. Each pit corresponds to a step in the right direction, and the network keeps those steps. Each bump is a move in the wrong direction, and the network discards them. The path of the prediction ball curves slightly as the ball avoids bumps and seeks out pits. The white-lined path is the ball's most efficient route, improvement upon improvement, to the most accurate prediction among all possible outcomes.