While earlier machine learning models provide powerful tools for many tasks, neural networks, inspired by the structure of the human brain, have enabled significant progress on more complex problems, particularly in areas like image and speech recognition. This section revisits the foundational elements of neural networks, preparing you to implement these models using Julia.
At its core, a neural network is composed of interconnected processing units called neurons (or nodes). A single neuron takes multiple input values, performs a calculation, and produces an output. Think of it as a small computational function.
Each input $x_i$ to a neuron is associated with a weight $w_i$. The neuron sums all of these weighted inputs and adds a bias term $b$. This sum, often called the weighted sum or affine transformation, is then passed through an activation function $f$. The output $y$ of the neuron can be expressed as:

$$y = f\left(\sum_i w_i x_i + b\right)$$
The weights $w_i$ determine the importance of each input signal, while the bias $b$ acts as an offset, allowing the neuron to activate even if all inputs are zero, or shifting the activation function's effective range. These weights and biases are the parameters that the network "learns" during the training process.
A single neuron computes a weighted sum of its inputs, adds a bias, and then applies an activation function to produce an output.
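The neuron equation above can be sketched directly in Julia. This is a minimal illustration, not a production implementation; the names `neuron`, `w`, `x`, and `b` are illustrative.

```julia
# The sigmoid activation, defined later in this section.
sigmoid(x) = 1 / (1 + exp(-x))

# A single neuron: weighted sum of the inputs, plus a bias,
# passed through an activation function f.
neuron(x, w, b, f) = f(sum(w .* x) + b)

x = [0.5, -1.2, 3.0]   # inputs
w = [0.4, 0.1, -0.7]   # weights (one per input)
b = 0.2                # bias
y = neuron(x, w, b, sigmoid)   # a value between 0 and 1
```

Because the weighted sum here is negative, the sigmoid output lands below 0.5; changing the weights or bias shifts it.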
Neurons are typically organized into layers:

- Input layer: receives the raw feature values; it performs no computation itself.
- Hidden layers: one or more intermediate layers whose neurons transform the data through weighted sums and activation functions.
- Output layer: produces the network's final prediction.

Information flows from the input layer, through one or more hidden layers, to the output layer. Each connection between neurons has an associated weight.
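A whole layer of neurons can be computed at once as a matrix-vector product: each row of a weight matrix `W` holds one neuron's weights. The sketch below assumes a 3-neuron layer over 2 inputs, with placeholder values.

```julia
relu(x) = max(0, x)

# One dense layer: W is (n_out × n_in), b has length n_out,
# and the activation f is applied elementwise.
dense(x, W, b, f) = f.(W * x .+ b)

W = [ 0.2  -0.5;    # neuron 1's weights
      0.7   0.1;    # neuron 2's weights
     -0.3   0.4]    # neuron 3's weights
b = [0.1, 0.0, -0.2]
x = [1.0, 2.0]

h = dense(x, W, b, relu)   # hidden activations, length 3
```

Stacking several such calls, each feeding its output to the next, gives a multi-layer network.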
Activation functions are a critical part of a neural network. If neurons only performed weighted sums, the entire network, no matter how many layers it had, would behave like a single linear model. Activation functions introduce non-linearities, enabling the network to learn complex patterns and relationships in the data that go beyond simple linear combinations.
Here are a few common activation functions:
Sigmoid: As mentioned in the chapter introduction, the sigmoid function is defined as: $\sigma(x) = \frac{1}{1 + e^{-x}}$ It squashes its input into a range between 0 and 1. This makes it suitable for output neurons in binary classification problems, where the output can be interpreted as a probability. However, sigmoid functions can suffer from the "vanishing gradient" problem during training, especially in deep networks, which can slow down learning.
Hyperbolic Tangent (tanh): $\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$ The tanh function is similar to sigmoid but squashes values to a range between -1 and 1. It is often preferred over sigmoid for hidden layers as its output is zero-centered. It also suffers from the vanishing gradient problem.
Rectified Linear Unit (ReLU): ReLU(x)=max(0,x) ReLU is currently one of the most popular activation functions for hidden layers. It outputs the input directly if it is positive, and zero otherwise. It is computationally efficient and helps mitigate the vanishing gradient problem for positive inputs. Variations like Leaky ReLU or Parametric ReLU (PReLU) exist to address the "dying ReLU" problem (where neurons can become inactive if their input is always negative).
Softmax: For multi-class classification problems, the softmax function is commonly used in the output layer. It takes a vector of arbitrary real-valued scores (logits) and converts them into a probability distribution over K classes, where each probability is between 0 and 1, and all probabilities sum to 1. For an input vector $z = [z_1, z_2, \ldots, z_K]$, the softmax output for the j-th element is: $\text{softmax}(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}$
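The activation functions above translate into one-line Julia definitions; softmax needs a few more lines. This sketch subtracts the maximum logit before exponentiating, a standard numerical-stability trick that leaves the result unchanged.

```julia
sigmoid(x) = 1 / (1 + exp(-x))
relu(x)    = max(0, x)
# tanh is built into Julia, so no definition is needed.

# Softmax over a vector of logits. Subtracting maximum(z) avoids
# overflow in exp without changing the resulting probabilities.
function softmax(z)
    e = exp.(z .- maximum(z))
    e ./ sum(e)
end

p = softmax([2.0, 1.0, 0.1])   # entries are positive and sum to 1
```

Note that `sigmoid` and `relu` are scalar functions; applying them to a vector uses Julia's broadcast dot, as in `relu.(v)`.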
Comparison of Sigmoid and ReLU activation functions. Sigmoid maps inputs to (0,1), suitable for probabilities. ReLU outputs max(0,x), promoting sparsity and alleviating vanishing gradients for positive inputs.
In a standard feedforward neural network, information flows in one direction: from the input layer, through any hidden layers, to the output layer. This process is called a feedforward pass or forward propagation.
During a feedforward pass:

- Each neuron in a layer computes its weighted sum plus bias from the previous layer's outputs.
- The result is passed through the layer's activation function.
- The activated outputs become the inputs to the next layer, until the output layer produces the prediction.

For example, the output $\hat{y}$ (the prediction) is generated by processing the input $X$ through the network's functions and learned parameters (weights $W$ and biases $b$).
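A complete forward pass through a small two-layer network can be sketched as follows. The weights here are arbitrary placeholders, not trained values; a real network would learn them.

```julia
relu(x) = max(0, x)
function softmax(z)
    e = exp.(z .- maximum(z))
    e ./ sum(e)
end

# Layer 1: 2 inputs → 3 hidden units; Layer 2: 3 hidden units → 2 outputs.
W1 = [0.2 -0.5; 0.7 0.1; -0.3 0.4];  b1 = [0.1, 0.0, -0.2]
W2 = [0.5 -0.2 0.1; -0.4 0.3 0.6];   b2 = [0.0, 0.1]

# Forward propagation: hidden layer with ReLU, output layer with softmax.
function forward(x)
    h = relu.(W1 * x .+ b1)       # hidden activations
    softmax(W2 * h .+ b2)         # class probabilities
end

ŷ = forward([1.0, 2.0])   # a probability distribution over 2 classes
```

Training adjusts `W1`, `b1`, `W2`, and `b2` so that these probabilities match the true labels; the forward computation itself stays exactly as shown.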
While the simple feedforward structure described above is fundamental, various specialized architectures have been developed for different types of data and tasks:

- Convolutional Neural Networks (CNNs): designed for grid-structured data such as images.
- Recurrent Neural Networks (RNNs): designed for sequential data such as text, speech, or time series.
This chapter will focus on feedforward neural networks, providing the basis for understanding more complex architectures later.
The "learning" in neural networks refers to the process of finding the optimal set of weights and biases that allow the network to map input data to correct outputs. This is typically achieved by:
You'll explore loss functions, optimizers, and the training process, including automatic differentiation with Zygote.jl, in the subsequent sections of this chapter. For now, understand that these fundamental components, neurons, layers, activation functions, and a forward flow of information, are the building blocks upon which the learning process operates.
© 2025 ApX Machine Learning