Feedforward Neural Networks (FNNs), also known as Multi-Layer Perceptrons (MLPs), are a fundamental type of artificial neural network where connections between nodes do not form a cycle. Information moves in only one direction: from the input nodes, through any hidden layers, to the output nodes. In this section, we will focus on how to construct these networks using Flux.jl.
Dense Layer: A Core Building Block
The primary component for building most layers in an FNN is the Dense layer. A Dense layer implements the operation output = activation(W⋅input + b), where W is a weight matrix, b is a bias vector, and activation is an element-wise activation function.
To create a Dense layer in Flux, you need to specify the number of input features and the number of output features (neurons) for that layer. You also typically provide an activation function.
using Flux
# A Dense layer with 10 input features and 5 output neurons, using ReLU activation
layer1 = Dense(10 => 5, relu)
# Another Dense layer, perhaps for binary classification output, with 5 inputs and 1 output, using sigmoid
# The input to this layer (5) must match the output of the previous layer.
output_layer = Dense(5 => 1, sigmoid)
In the example Dense(10 => 5, relu), 10 => 5 is a Pair indicating that the layer transforms an input vector of length 10 into an output vector of length 5. relu is the Rectified Linear Unit activation function, defined as f(x) = max(0, x). The chapter introduction mentioned the sigmoid function, σ(x) = 1/(1 + e^(-x)), which is often used in output layers for binary classification problems to squash the output to a probability between 0 and 1.
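As a quick check, you can apply a freshly constructed layer to a random input and inspect its parameters. The short sketch below assumes the weight and bias field names that current Flux versions use for Dense layers:
using Flux
layer1 = Dense(10 => 5, relu)
x = rand(Float32, 10)          # a single sample with 10 features
y = layer1(x)                  # forward pass through the layer
println(size(y))               # (5,) - one value per output neuron
println(size(layer1.weight))   # (5, 10) - W maps 10 inputs to 5 outputs
println(size(layer1.bias))     # (5,) - one bias per output neuron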
Flux provides many common activation functions:
- relu: Rectified Linear Unit, f(x) = max(0, x).
- sigmoid: Sigmoid function, σ(x) = 1/(1 + e^(-x)).
- tanh: Hyperbolic tangent, tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)).
- softmax: Used for multi-class classification, normalizes outputs into a probability distribution.
- identity: No activation, f(x) = x. This is the default if no activation function is specified.
Choosing the right activation function depends on the layer's position in the network and the nature of the problem you are solving. For hidden layers, relu is a very common and effective choice. For output layers, sigmoid is common for binary classification, softmax for multi-class classification, and identity (or no activation) for regression tasks.
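The difference between element-wise activations and softmax is worth seeing directly. In the short sketch below, relu and sigmoid are broadcast over each element with a dot, while softmax is applied to the whole vector:
using Flux
x = Float32[-2.0, 0.0, 3.0]
relu.(x)      # element-wise: [0.0, 0.0, 3.0]
sigmoid.(x)   # element-wise values squashed into (0, 1)
softmax(x)    # acts on the whole vector; the three results sum to 1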
Combining Layers with Chain
Most useful neural networks consist of multiple layers stacked one after another. Flux.jl uses the Chain constructor to combine multiple layers into a single, callable model. The output of one layer in the Chain automatically becomes the input to the next.
Let's build a simple FNN with an input layer, one hidden layer, and an output layer.
Suppose we have input data with 784 features (like a flattened 28x28 image), we want a hidden layer with 128 neurons using relu activation, and an output layer with 10 neurons (e.g., for classifying digits 0-9) whose outputs are turned into probabilities by softmax. Because softmax operates on a whole vector rather than element-wise, it is added to the Chain as its own final step instead of being passed as the Dense layer's activation.
using Flux
model = Chain(
    Dense(784 => 128, relu),  # Input: 784 features, Output: 128 features
    Dense(128 => 10),         # Input: 128 features, Output: 10 raw scores
    softmax                   # Normalizes the 10 scores into a probability distribution
)
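A Chain also behaves like a collection of its layers, which is handy for checking that the dimensions line up. The sketch below assumes the usual Flux behavior that a Chain can be indexed and that printing a model lists its layers with their parameter counts:
println(model)                   # lists each layer, e.g. Dense(784 => 128, relu), with parameter counts
println(size(model[1].weight))   # (128, 784): the first layer's weight matrix
hidden_part = model[1:2]         # indexing with a range returns a sub-Chain of the two Dense layers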
When constructing a Chain, it's important that the output dimension of a layer matches the input dimension of the subsequent layer. In the example above:
- The first Dense layer takes 784 inputs and produces 128 outputs.
- The second Dense layer takes 128 inputs (matching the previous layer's output) and produces 10 outputs, which the final softmax step normalizes into probabilities.
You can create more complex FNNs by adding more Dense layers (or other types of layers, which we'll see later) to the Chain:
using Flux
# A deeper FNN with two hidden layers
model_deep = Chain(
    Dense(784 => 256, relu),  # Hidden Layer 1
    Dense(256 => 128, relu),  # Hidden Layer 2
    Dense(128 => 10),         # Output Layer: 10 raw scores
    softmax                   # Converts the scores to class probabilities
)
The model and model_deep objects are now callable Flux models. You can pass data through them to get predictions, and their parameters (weights and biases) can be trained.
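One way to confirm this is to count the trainable parameters. The sketch below uses Flux.params to collect every weight and bias array (newer Flux releases also offer explicit-style alternatives):
ps = Flux.params(model_deep)   # all weight and bias arrays in the model
println(sum(length, ps))       # 235146 for the 784 -> 256 -> 128 -> 10 architecture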
Understanding the architecture of your neural network is helpful. While Flux itself doesn't include a built-in visualizer, you can picture a simple FNN as data flowing from the input features through one or more hidden layers to the output layer, with each layer applying a Dense transformation followed by an activation function.
Once a model is constructed, you can perform a "forward pass" by simply calling the model with input data. The input data should be a matrix where columns represent individual samples and rows represent features. If you have a single sample, it should be a column vector.
Let's use model_deep from the previous example with some random input data representing a batch of 5 images, each with 784 features:
# Example: 5 samples, each with 784 features
dummy_input = rand(Float32, 784, 5)
# Perform a forward pass
predictions = model_deep(dummy_input)
# `predictions` will be a 10x5 matrix,
# where each column contains 10 probabilities (due to softmax) for one input sample.
println("Output dimensions: ", size(predictions)) # Expected: (10, 5)
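Because the final softmax step produces a probability distribution for each sample, every column of predictions should sum to 1. A quick check, continuing from the code above:
println(sum(predictions, dims = 1))   # a 1x5 matrix where every entry is approximately 1.0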
This forward pass computes the output of the network given the input. The next step in the deep learning process, which we'll cover in subsequent sections, is to define a loss function to measure how far these predictions are from the true values, and then use optimizers to adjust the model's weights and biases to minimize this loss.
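As a small preview of that next step, a classification loss such as cross-entropy compares the predicted probabilities against one-hot encoded labels. The sketch below uses made-up labels for the 5 dummy samples; loss functions and training are covered properly in the upcoming sections:
labels = [3, 7, 1, 0, 9]                        # hypothetical class labels for the 5 samples
targets = Flux.onehotbatch(labels, 0:9)         # 10x5 one-hot matrix over the classes 0-9
loss = Flux.crossentropy(predictions, targets)  # average cross-entropy; lower is better
println(loss)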
You now have the foundational knowledge to construct FNN architectures in Flux.jl. By combining Dense layers within a Chain, you can define networks of varying depths and widths, tailored to the complexity of the machine learning task at hand.