Flux.jl is Julia's primary library for deep learning. It's designed with flexibility and extensibility in mind, allowing you to express complex models concisely. Unlike some frameworks that introduce many new data types, Flux.jl integrates smoothly with Julia's existing ecosystem, particularly its powerful array manipulation capabilities. This section will guide you through the fundamental building blocks of Flux.jl: tensors, which are essentially multi-dimensional arrays, and layers, which are the operational units within a neural network.
In the context of deep learning, a tensor is a generalization of vectors and matrices to an arbitrary number of dimensions. Flux.jl doesn't introduce a special "tensor" type; instead, it operates directly on Julia's built-in Array type for CPU computations or CuArray (from CUDA.jl) for GPU computations. This seamless integration means you can leverage all of Julia's rich array functionalities.
Let's look at how we represent data using these arrays:
0D Tensor (Scalar): A single number.
scalar_value = 5.0f0 # A Float32 scalar
# To store it in a 0-dimensional array:
scalar_array = fill(5.0f0)
# ndims(scalar_array) == 0
Neural networks often use 32-bit floats (Float32, indicated by the f0 suffix) for a balance between precision and computational performance.
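If your data starts out as Float64 (Julia's default for rand and literals like 2.0), you can convert it element-wise before feeding it to a model. A small illustration:
# rand defaults to Float64; broadcast Float32 over it to convert
raw_data = rand(3)
data32 = Float32.(raw_data)
# eltype(data32) == Float32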
1D Tensor (Vector): A sequence of numbers. Useful for representing a single data sample with multiple features, or the biases in a layer.
vector_data = [1.0f0, 2.0f0, 3.0f0, 4.0f0] # A 4-element vector
# size(vector_data) will be (4,)
2D Tensor (Matrix): A grid of numbers. For tabular data, this might represent a batch of samples, where rows are features and columns are individual samples. In Flux.jl, the convention for dense layers is often (features, batch_size).
# 3 features, 5 samples in a batch
matrix_data = rand(Float32, 3, 5)
# size(matrix_data) will be (3, 5)
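Since each column is one sample in this convention, you can assemble a batch from individual feature vectors, for example with hcat:
# Two samples with 3 features each, concatenated column-wise
sample1 = [0.1f0, 0.2f0, 0.3f0]
sample2 = [0.4f0, 0.5f0, 0.6f0]
batch = hcat(sample1, sample2)
# size(batch) will be (3, 2): 3 features, 2 samples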
3D Tensor: Can be used for sequence data, for example (features, sequence_length, batch_size), or for data like a batch of grayscale images (height, width, batch_size).
4D Tensor: Standard for batches of color images, typically with dimensions (width, height, channels, batch_size).
# A batch of 16 color images, each 28x28 pixels
# (width, height, color_channels, batch_size)
images_batch = rand(Float32, 28, 28, 3, 16)
# size(images_batch) will be (28, 28, 3, 16)
Understanding the expected shape of tensors for different operations and layers is important for building models correctly. Flux.jl operations are generally designed to work with these standard conventions. Basic array operations like addition, multiplication, and element-wise functions from Julia work directly on these tensors.
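For example, Julia's broadcasting (the dot syntax) applies operations element-wise to whole tensors:
a = rand(Float32, 3, 5)
b = rand(Float32, 3, 5)
c = a .+ b        # element-wise addition
d = a .* b        # element-wise multiplication
e = exp.(a)       # apply exp to every element
scaled = 2f0 .* a # scale every element by 2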
Layers are the core components of a neural network. Each layer performs a specific transformation on its input data, typically involving learnable parameters (weights and biases). Flux.jl provides a rich collection of pre-defined layers, making it easy to construct common network architectures.
A layer in Flux is essentially a Julia structure that is callable, meaning it behaves like a function. It takes an input tensor and produces an output tensor.
Dense Layer
The most fundamental layer is the Dense layer, also known as a fully connected layer. It applies a linear transformation to the input followed by an optional activation function. Its operation can be described by the equation y = σ(Wx + b), where W is the weight matrix, b is the bias vector, x is the input, and σ is the activation function (like the sigmoid function σ(z) = 1 / (1 + e^(-z))).
You create a Dense layer by specifying the number of input features and output features. An activation function can also be provided.
using Flux
# Create a Dense layer:
# Takes 3 input features, produces 2 output features
# Uses the sigmoid activation function (σ)
input_features = 3
output_features = 2
layer = Dense(input_features, output_features, sigmoid)
# layer.weight is a (output_features x input_features) matrix, i.e., 2x3
# layer.bias is a (output_features)-element vector, i.e., 2-element
# layer.σ is the sigmoid function
# Let's create some sample input data (3 features, 1 sample in a batch)
# Input dimensions: (features, batch_size)
input_data = rand(Float32, 3, 1)
# Pass the data through the layer
output_data = layer(input_data)
println("Input size: ", size(input_data)) # Expected: (3, 1)
println("Output size: ", size(output_data)) # Expected: (2, 1)
The layer.weight and layer.bias are the learnable parameters of this layer. Flux.jl automatically initializes them, often with random values drawn from a suitable distribution (like Glorot uniform by default for Dense layers).
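As a quick check that the layer really computes y = σ(Wx + b), you can reproduce its output by hand from these parameters (reusing layer and input_data from the snippet above):
# Manual computation: broadcast sigmoid over W*x .+ b
manual_output = sigmoid.(layer.weight * input_data .+ layer.bias)
# manual_output ≈ layer(input_data)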
Transformation of an input tensor by a Dense layer. The input tensor with D_in features is transformed into an output tensor with D_out features. The layer internally performs a linear transformation followed by an activation.
Flux provides many other layer types, such as:
Conv: For convolutional neural networks, primarily used in image processing.
RNN, LSTM, GRU: For recurrent neural networks, used in sequence modeling tasks like natural language processing.
MaxPool, MeanPool: Pooling layers that downsample their input.
BatchNorm: For batch normalization.
Activation functions themselves (e.g., relu, tanh, softmax) are also available in Flux. If an activation function is not specified in a layer like Dense, it defaults to identity (meaning no activation is applied, y = Wx + b).
You can apply an activation function after a layer explicitly, although it's often more convenient to include it in the layer definition:
layer_no_activation = Dense(5, 10) # No activation specified, defaults to identity (σ=identity)
input_for_layer = rand(Float32, 5, 3) # 5 features, 3 samples
output_linear = layer_no_activation(input_for_layer)
# Apply ReLU activation element-wise
output_activated = relu.(output_linear)
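As a sanity check of this equivalence, a layer built from the same weight and bias but with relu baked in gives the same result (this sketch assumes the Dense(weight, bias, activation) constructor available in recent Flux versions):
# Same parameters, but the activation is applied inside the layer
layer_with_relu = Dense(layer_no_activation.weight, layer_no_activation.bias, relu)
# layer_with_relu(input_for_layer) ≈ output_activated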
Chain
Individual layers are powerful, but the true strength of deep learning comes from stacking them to form deeper architectures. Flux.jl uses Chain to combine multiple layers (or other functions that operate on tensors) sequentially. A Chain takes any number of layers as arguments and applies them in the order they are provided.
using Flux
# Define a simple two-layer network
model = Chain(
Dense(10, 20, relu), # 1st layer: 10 inputs, 20 outputs, ReLU activation
Dense(20, 5, sigmoid) # 2nd layer: 20 inputs, 5 outputs, Sigmoid activation
)
# `model` is now also a callable struct.
# It expects input compatible with the first layer (10 features).
# Create some sample input data
# (features, batch_size)
input_to_model = rand(Float32, 10, 32) # 10 features, 32 samples in batch
# Pass data through the entire model
final_output = model(input_to_model)
println("Model input size: ", size(input_to_model)) # Expected: (10, 32)
println("Model output size: ", size(final_output)) # Expected: (5, 32)
The output of one layer in the Chain becomes the input to the next. This simple and elegant way of composing models is a characteristic of Flux.jl. You can even nest Chains or include any Julia function that performs array operations within a Chain, offering considerable flexibility in model design.
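For example, a plain function and a nested Chain can both appear as stages (a small sketch; the names here are just for illustration):
using Flux
# A reusable block of two layers
feature_block = Chain(Dense(10, 20, relu), Dense(20, 20, relu))
model2 = Chain(
    x -> Float32.(x),  # any function of the input can be a stage
    feature_block,     # a nested Chain
    Dense(20, 5),
    softmax            # applied to the final output, column-wise
)
x = rand(10, 4)        # Float64 input, converted by the first stage
# size(model2(x)) will be (5, 4)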
A simple Chain model in Flux.jl, illustrating data flowing sequentially through two Dense layers, transforming from 10 features to 20, and then to 5 features.
When you create layers like Dense or combine them into a Chain, Flux.jl keeps track of all their learnable parameters (weights and biases). You can inspect these using Flux.params():
# For the 'model' defined in the Chain example:
parameters = Flux.params(model)
# `parameters` is an object that allows iteration over all weights and biases in the model.
# For example, to access the weights of the first Dense layer in the Chain:
# model[1].weight
# To access the biases of the second Dense layer:
# model[2].bias
These parameters are what your optimization algorithm will adjust during the training process. The gradients of the loss function with respect to these parameters are computed using automatic differentiation, primarily via Zygote.jl, which we will cover in a subsequent section. For now, it's sufficient to know that Flux.jl manages these parameters for you.
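For instance, one quick way to see how many learnable scalars the model from the Chain example contains:
# Total number of learnable values: (10*20 + 20) + (20*5 + 5) = 325
total_parameters = sum(length, Flux.params(model))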
This foundation in tensors and layers sets the stage for building and training neural networks. You've seen how to represent data and how to define transformations on that data. Next, you'll learn how to combine these into complete models and train them by defining loss functions and using optimizers.