How Neural Networks Learn
A 5-Minute Breakdown of How Neural Networks Actually Work
Hey 👋! I’m Arjun, a 16-year-old working at the intersection of AI and cancer treatment and diagnostics. Welcome to import learn ai: a one-stop shop for everything AI/ML/DL. Enjoy!
The explosion of AI in the last few years has changed the course of humanity forever. New models like ChatGPT, GPT-4, and Med-PaLM-2 are capable of things we couldn’t imagine just a decade ago.
But how do these models even work? The answer: Neural Networks.
Neural Networks are the Brain of AI
Neural Networks are Function Approximators
At a high-level, neural networks act as functions.
If we know the function and have it defined, we can calculate the output for any given input. In other words, for every input, there exists an output.
Let’s say we are given some data that contains inputs and outputs, but not a function. Think of these as points on a graph that are not connected.
With this information, we could create a function that would model these inputs and outputs. And, we could use that function to calculate outputs for inputs not included in our dataset. Hence, a prediction.
This function approximator is a neural network.
So, more precisely, neural networks are function approximators.
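To make this concrete, here’s a minimal sketch of the idea using scikit-learn’s ready-made MLPRegressor; the data points and network size here are made up for illustration:

```python
# A minimal sketch of "a neural network as a function approximator,"
# assuming scikit-learn and NumPy are installed.
import numpy as np
from sklearn.neural_network import MLPRegressor

# Disconnected points on a graph: inputs and outputs, but no function.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([0.0, 0.8, 0.9, 0.1, -0.8])  # roughly follows sin(x)

# A small neural network learns a function that models these points.
model = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=5000, random_state=0)
model.fit(X, y)

# Now we can calculate an output for an input NOT in our dataset.
print(model.predict([[2.5]]))  # a prediction
```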
A Neural Network is Modelled Loosely After the Brain
When you hear the term “Neural Networks,” the first thing that probably pops into your mind is the brain. It turns out, Neural Networks are loosely modelled after the brain.
Essentially, a neural network is a network of neurons. For now, we will define a neuron as something that can store a number between 0 and 1. This value between 0 and 1 is called an Activation.
All of the neurons are densely interconnected and are organized into layers. Even basic neural networks can have thousands or even millions of these neurons.
Different layers in the network have different functions, but collectively, act as a function. In almost every neural network there are 3 primary layers:
Input Layer: This is where the inputs are fed in. Inputs are usually the data that the machine must classify, such as the pixels of an image.
Hidden Layers: These layers sit between the input and output, each transforming the activations it receives into new activations for the next layer.
Output Layer: The activation of each neuron in the output layer corresponds to how much confidence the system has in its prediction.
Each neuron in every layer takes in the outputs of the previous layer as its input and spits out a computed output (activation) that will be sent to the next neuron(s).
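To see how activations flow through these layers, here’s a minimal NumPy sketch; the weights it uses are explained in the next section, and biases and activation functions are omitted for now:

```python
import numpy as np

rng = np.random.default_rng(0)

inputs = rng.random(4)   # input layer: 4 neuron activations
W1 = rng.random((8, 4))  # connection strengths into 8 hidden neurons
W2 = rng.random((3, 8))  # connection strengths into 3 output neurons

hidden = W1 @ inputs     # each hidden neuron combines all 4 inputs
outputs = W2 @ hidden    # output layer: 3 confidence-like scores
print(outputs)
```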
Weights, Biases, and Activations
Each incoming connection to a neuron is assigned a weight, and each neuron has a bias of its own.
Weights are just numbers that determine the strength of the connections between the neurons. Every weight is multiplied by the activation of the previous neuron, scaling the incoming data. During the training of a neural network, the weights are adjusted so there is less deviation between the network’s predicted values and the actual ones.
Biases are just constants that are added to (or subtracted from) the weighted sum of a neuron’s inputs.
To summarize, the weights are multiplied by the activations, those products are summed together, and the bias is added to the sum.
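In code, that summary is a single line; a minimal sketch for one neuron with three incoming connections (all numbers made up):

```python
import numpy as np

activations = np.array([0.2, 0.9, 0.5])  # outputs of the previous layer
weights = np.array([0.4, -0.6, 1.1])     # one weight per connection
bias = 0.1                               # one bias for this neuron

# Weights multiplied by activations, summed, plus the bias.
z = np.dot(weights, activations) + bias
print(z)  # this raw sum can be any number (the next section fixes that)
```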
But wait, there’s more.
When this sum is plotted on a number line, any number can come out. So how do neural networks fit these values between 0 and 1?
Neural Networks make use of an activation function. This is usually a logistic curve that squishes all of the outputs to a value between 0 and 1. In the case of the logistic curve called “the sigmoid function,” our weighted sum is passed through σ rather than used directly.
The sigmoid function is the classic choice, but a newer activation function called ReLU (which serves a similar purpose) is more efficient.
Mathematically, the computation from one layer’s neurons to the next is as follows:

a′ = σ(w₁a₁ + w₂a₂ + … + wₙaₙ + b)

where a₁, …, aₙ are the activations of the previous layer, w₁, …, wₙ are the weights on those connections, b is the bias, and σ is the activation function.
If the output of a neuron is below a certain threshold, the neuron effectively does not pass data to the next layer. If the output is greater than that threshold, the neuron “fires,” sending its activation off to the next interconnected layer.
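Putting the weighted sum and the activation function together, here’s a minimal sketch of a single neuron in NumPy, with both sigmoid and ReLU shown for comparison:

```python
import numpy as np

def sigmoid(z):
    # Squishes any real number into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # The newer alternative: 0 below zero, the raw sum above it.
    return np.maximum(0.0, z)

def neuron(activations, weights, bias, activation_fn=sigmoid):
    # a' = activation_fn(w1*a1 + w2*a2 + ... + wn*an + b)
    z = np.dot(weights, activations) + bias
    return activation_fn(z)

prev = np.array([0.2, 0.9, 0.5])  # activations from the previous layer
w = np.array([0.4, -0.6, 1.1])    # one weight per connection
print(neuron(prev, w, 0.1))        # sigmoid: a value between 0 and 1
print(neuron(prev, w, 0.1, relu))  # ReLU: 0, or the positive raw sum
```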
Training a Neural Network is an Iterative Process
At the beginning of training, all of the parameters a neural network contains are randomized. Over many passes through the training data, called epochs, these parameters are updated to fit the data as well as possible.
Remember, neural networks are function approximators; they try to predict labels for the data that’s fed into them.
Neural Networks Learn Through Gradient Descent and Backpropagation
Understanding how neural networks actually learn boils down to 2 things you need to know:
The Goal of Neural Networks is to Minimize The Cost Function
The Cost Function is a measure of how closely the predictions a neural network makes match the expected predictions. Essentially, it’s a quantification of the error between the predicted and expected values.
So, the smaller the cost function, the better our neural network.
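One common choice of cost function (though not the only one) is mean squared error; a minimal sketch:

```python
import numpy as np

def mse_cost(predicted, expected):
    # Average squared difference between the network's predictions
    # and the values they should have been.
    return np.mean((predicted - expected) ** 2)

predicted = np.array([0.9, 0.2, 0.7])  # the network's outputs
expected = np.array([1.0, 0.0, 1.0])   # the expected outputs
print(mse_cost(predicted, expected))   # smaller is better
```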
A process called Gradient Descent is used to minimize the cost function. Without getting into too much math, it does this by numerically estimating where a function outputs its lowest values.
Over many iterations, steps proportional to the negative of the gradient are taken (i.e., the network moves in the direction opposite to the gradient). Derivatives (which give the slope of the cost function at a given point) are calculated to determine the direction of these steps.
Think of it as a ball rolling down a hill; it will ultimately lead to the lowest point on the hill due to gravity. Gradient Descent works in the same way!
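Here’s that idea as a minimal sketch, using a made-up one-parameter cost function cost(w) = (w − 3)², whose lowest point sits at w = 3:

```python
def cost(w):
    return (w - 3.0) ** 2

def gradient(w):
    # Derivative of the cost: d/dw (w - 3)^2 = 2(w - 3).
    return 2.0 * (w - 3.0)

w = 10.0             # randomized starting point
learning_rate = 0.1  # how big each step is

for _ in range(100):
    # Step proportional to the NEGATIVE of the gradient.
    w -= learning_rate * gradient(w)

print(w)  # has rolled down to ~3.0, the bottom of the hill
```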
Backpropagation Adjusts the Weights During Training
Backpropagation is a complicated word for fine-tuning the weights based on the error rate from the previous iteration. That was a mouthful. Let me explain.
Since the weights are randomized at the beginning of training, backpropagation changes them so the outputs more accurately reflect the data. Fine-tuning these weights leads to lower error rates and, hence, more accurate predictions.
For those experienced in calculus, backpropagation calculates the gradients (the partial derivatives of the cost function with respect to each weight) using the chain rule. The weights are then shifted based on this calculation.
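To make the chain rule concrete, here’s a minimal sketch that trains a single sigmoid neuron on one made-up example with a squared-error cost; the starting weight, learning rate, and target are all illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, target = 1.5, 1.0  # one input and its expected output
w, b = 0.2, 0.0       # randomized starting parameters
lr = 0.5              # learning rate

for _ in range(200):
    # Forward pass.
    z = w * x + b
    a = sigmoid(z)

    # Backward pass: chain rule, dCost/dw = dCost/da * da/dz * dz/dw.
    dcost_da = 2 * (a - target)  # from the squared-error cost
    da_dz = a * (1 - a)          # derivative of the sigmoid
    w -= lr * dcost_da * da_dz * x    # dz/dw = x
    b -= lr * dcost_da * da_dz * 1.0  # dz/db = 1

print(round(sigmoid(w * x + b), 3))  # prediction has moved toward 1.0
```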
And That’s It. That’s How Neural Networks Learn.
If you understood everything mentioned, you’ve officially got a solid understanding of how neural networks work.
If you enjoyed this article, I would appreciate you subscribing for more machine learning content to pop up in your inbox! ‘Till next time!