While Flux.jl offers a rich set of pre-built layers sufficient for many common neural network architectures, there will inevitably be times when you need something more specific. Perhaps you're implementing a novel layer from a research paper, designing a unique data transformation, or require a layer with a specialized structure for its learnable parameters. Fortunately, Flux.jl is designed with extensibility in mind, making the creation of custom layers a relatively straightforward process, largely thanks to Julia's powerful features like multiple dispatch and a flexible type system.
At its core, a Flux layer is typically a Julia struct that holds the layer's state, primarily its learnable parameters (like weights and biases), and any fixed hyperparameters. To integrate with Flux's ecosystem, this struct needs to be callable (i.e., act like a function to perform its forward pass) and its learnable parameters must be discoverable by Flux's optimization machinery.
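In outline, the pattern has three parts: a struct for the state, a registration step so Flux can find the parameters, and a callable method for the forward pass. The sketch below uses placeholder names (MyLayer and its weight field are illustrative, not the layer we build next):
using Flux, Functors
struct MyLayer          # 1. a struct holding learnable parameters and fixed hyperparameters
    weight
end
Functors.@functor MyLayer        # 2. make the parameters discoverable by Flux
(m::MyLayer)(x) = m.weight * x   # 3. make the struct callable: this is the forward pass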
Let's walk through creating a custom layer. We'll build a simple layer called BiasedActivation which adds a learnable bias vector to its input and then applies a user-specified activation function.
First, we define the struct that will hold our layer's data. For BiasedActivation, this includes the learnable bias vector and the activation_fn (which is fixed after construction).
using Flux, Functors
struct BiasedActivation
bias # Learnable bias vector
activation_fn # User-specified activation function (e.g., relu, sigmoid)
end
Next, we need a constructor to create instances of our layer. This constructor will initialize the bias vector. We'll initialize it with zeros, which is a common practice for biases. The size of the bias vector will depend on the number of output features this layer is intended to produce or match. The activation function will be passed as an argument.
# Constructor
function BiasedActivation(output_dims::Int, activation_fn=identity)
# Initialize the bias as a vector of zeros, one element per output feature
bias_init = zeros(Float32, output_dims)
return BiasedActivation(bias_init, activation_fn)
end
Here, output_dims determines the size of the bias vector. If the input x to this layer has dimensions (features, batch_size), then bias should have dimensions (features, 1) or just (features,) to be broadcastable. Our constructor initializes bias as a vector of output_dims elements.
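To see the shape rule in isolation, here is a quick illustration with throwaway names (x_demo and b_demo are not part of the layer):
x_demo = randn(Float32, 4, 3)   # 4 features, batch of 3
b_demo = zeros(Float32, 4)      # one bias value per feature
size(x_demo .+ b_demo)          # (4, 3): the bias is added to every column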
For Flux to recognize and train the bias parameter, we need to tell the Functors.jl package about it. This is done using the @functor macro. We explicitly list bias as a trainable parameter. activation_fn is not listed, so Flux will treat it as a fixed part of the layer's structure, not something to be updated during training.
Functors.@functor BiasedActivation (bias,)
By specifying (bias,), we are telling Flux that bias is a field containing parameters that should be managed (e.g., moved to GPU, gradients calculated for). If a layer had multiple parameter fields, say weights and bias, we would list them like (weights, bias).
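For example, a hypothetical layer with both a weight matrix and a bias could register the two fields together (AffineExample is purely illustrative):
struct AffineExample
    weights
    bias
end
# Both fields hold learnable parameters
Functors.@functor AffineExample (weights, bias)
If every field of a struct holds trainable parameters, the field list can be omitted entirely and @functor will include them all.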
To make our layer usable, it needs to be callable. This means we define a method that allows an instance of BiasedActivation to be called like a function, taking an input x and returning the transformed output. This method implements the layer's forward pass logic.
function (layer::BiasedActivation)(x::AbstractArray)
# Add bias (element-wise, broadcasting over batches if necessary)
# Then apply the activation function
return layer.activation_fn.(x .+ layer.bias)
end
In this forward pass, x .+ layer.bias performs element-wise addition. If x is a matrix of size (features, batch_size) and layer.bias is a vector of size (features,) (or a column vector (features,1)), Julia's broadcasting rules will handle the addition correctly. The result is then passed through the layer.activation_fn element-wise.
It's good practice to define how your layer should be displayed, for example, when printing a Chain containing it. We can do this by overloading Base.show.
function Base.show(io::IO, l::BiasedActivation)
print(io, "BiasedActivation(output_dims=", size(l.bias, 1),
", activation=", nameof(l.activation_fn), ")")
end
This will give a cleaner representation, for instance: BiasedActivation(output_dims=10, activation=relu).
Now let's see our BiasedActivation layer in action.
# Create an instance of our custom layer
# Let's say it operates on 5 features and uses relu activation
custom_layer = BiasedActivation(5, relu)
# Check its parameters
params_found = Flux.params(custom_layer)
println("Parameters found by Flux: ", params_found)
# Output should show the bias vector
# Create some dummy input data
# (features, batch_size)
dummy_input = randn(Float32, 5, 3)
# Perform a forward pass
output = custom_layer(dummy_input)
println("Output shape: ", size(output))
# Custom layers can be part of a Chain
model = Chain(
Dense(10, 5), # Standard Dense layer
custom_layer, # Our custom layer
Dense(5, 2),
softmax
)
println("\nModel structure:")
println(model)
# Test with some data through the model
test_data = randn(Float32, 10, 4) # Input to the model
model_output = model(test_data)
println("Model output shape: ", size(model_output))
When you run this, Flux.params(custom_layer) will correctly identify the bias vector as a trainable parameter. Zygote, Flux's default automatic differentiation engine, will be able to compute gradients for bias as long as the operations within the forward pass (like .+ and relu) are differentiable, which they are.
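Because the bias is discoverable and the forward pass is differentiable, the layer trains like any built-in one. Below is a minimal single-step sketch in the implicit Flux.params style used above; y_target is made-up data purely for illustration, and newer Flux versions favor explicit optimiser state via Flux.setup.
# One illustrative gradient-descent step on the model defined above
y_target = rand(Float32, 2, 4)   # fake targets matching the model's (2, 4) output
ps  = Flux.params(model)         # includes custom_layer's bias
opt = Flux.Descent(0.1)          # plain gradient descent
gs  = Flux.gradient(() -> Flux.Losses.mse(model(test_data), y_target), ps)
Flux.Optimise.update!(opt, ps, gs)   # updates the Dense weights and our bias alike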
The following diagram illustrates the main components involved in defining and using a custom Flux layer like our BiasedActivation.
Components of defining and using a custom BiasedActivation layer within the Flux framework.
A few practical notes are worth keeping in mind. While zeros are a common and reasonable default for biases, weights usually require more sophisticated initialization (e.g., Glorot/Xavier or He initialization) to aid training; Flux provides functions like Flux.glorot_uniform and Flux.kaiming_uniform which you can use in your constructors.
For GPU execution, the layer's parameters must be CuArrays (or convertible to them) and every operation in the forward pass must be compatible with CUDA.jl. Flux's gpu(layer) function will attempt to move any parameters registered via @functor to the GPU. If x becomes a CuArray, operations like .+ and the standard activation functions provided by Flux or NNlib will work on the GPU, so your custom layer's code (l::MyLayer)(x) usually doesn't need explicit GPU logic if it relies on these standard, overloaded operations.
Finally, it's worth verifying that gradients flow through your custom layer as expected. You can use Zygote.gradient for this:
using Zygote  # Flux's default AD backend; needed to call Zygote.gradient directly
# Example: test the gradient for 'bias'
layer_instance = BiasedActivation(3, tanh)
input_data = randn(Float32, 3, 2)
# Differentiate a scalar loss with respect to the layer
# (all of its fields, including bias) and the input x_val
grads = Zygote.gradient((l, x_val) -> sum(l(x_val)), layer_instance, input_data)
println("Gradient for bias: ", grads[1].bias) # Accessing gradient for the 'bias' field
# grads[1] contains gradients for the layer's parameters
# grads[2] contains gradients for input_data
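To make this check concrete, you can assert the two conditions directly; the snippet below reuses layer_instance and grads from the example above.
@assert grads[1].bias !== nothing                           # the bias received a gradient
@assert size(grads[1].bias) == size(layer_instance.bias)    # here, shape (3,)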
As these checks emphasize, grads[1].bias should not be nothing and should have the expected shape.
Creating custom layers in Flux allows you to extend the framework to meet nearly any modeling requirement. By following the pattern of defining a struct, a constructor, registering parameters with @functor, and implementing the forward pass as a callable method, you can integrate your own innovative components into Flux's powerful deep learning ecosystem. This flexibility is a significant advantage when exploring new architectures or adapting existing ones to unique problem domains.