While Flux.jl offers a rich set of pre-built layers sufficient for many common neural network architectures, there will inevitably be times when you need something more specific. Perhaps you're implementing a novel layer from a research paper, designing a unique data transformation, or require a layer with a specialized structure for its learnable parameters. Fortunately, Flux.jl is designed with extensibility in mind, making the creation of custom layers a relatively straightforward process, largely thanks to Julia's powerful features like multiple dispatch and a flexible type system.
At its core, a Flux layer is typically a Julia struct that holds the layer's state, primarily its learnable parameters (like weights and biases), and any fixed hyperparameters. To integrate with Flux's ecosystem, this struct needs to be callable (i.e., act like a function to perform its forward pass) and its learnable parameters must be discoverable by Flux's optimization machinery.
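In outline, the pattern has three parts: a struct for the state, a registration step so Flux can find the parameters, and a callable method for the forward pass. The sketch below uses placeholder names (MyLayer and its weight field are illustrative, not the layer we build next):
using Flux, Functors
struct MyLayer          # 1. a struct holding learnable parameters and fixed hyperparameters
    weight
end
Functors.@functor MyLayer        # 2. make the parameters discoverable by Flux
(m::MyLayer)(x) = m.weight * x   # 3. make the struct callable: this is the forward pass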
Let's walk through creating a custom layer. We'll build a simple layer called BiasedActivation which adds a learnable bias vector to its input and then applies a user-specified activation function.
First, we define the struct that will hold our layer's data. For BiasedActivation, this includes the learnable bias vector and the activation_fn (which is fixed after construction).
using Flux, Functors
struct BiasedActivation
bias # Learnable bias vector
activation_fn # User-specified activation function (e.g., relu, sigmoid)
end
Next, we need a constructor to create instances of our layer. This constructor will initialize the bias vector. We'll initialize it with zeros, which is a common practice for biases. The size of the bias vector will depend on the number of output features this layer is intended to produce or match. The activation function will be passed as an argument.
# Constructor
function BiasedActivation(output_dims::Int, activation_fn=identity)
# Initialize the bias as a vector of zeros, one element per output feature
bias_init = zeros(Float32, output_dims)
return BiasedActivation(bias_init, activation_fn)
end
Here, output_dims determines the size of the bias vector. If the input x to this layer has dimensions (features, batch_size), then bias should have dimensions (features, 1) or just (features,) to be broadcastable. Our constructor initializes bias as a vector of output_dims elements.
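To see the shape rule in isolation, here is a quick illustration with throwaway names (x_demo and b_demo are not part of the layer):
x_demo = randn(Float32, 4, 3)   # 4 features, batch of 3
b_demo = zeros(Float32, 4)      # one bias value per feature
size(x_demo .+ b_demo)          # (4, 3): the bias is added to every column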
For Flux to recognize and train the bias parameter, we need to tell the Functors.jl package about it. This is done using the @functor macro. We explicitly list bias as a trainable parameter. activation_fn is not listed, so Flux will treat it as a fixed part of the layer's structure, not something to be updated during training.
Functors.@functor BiasedActivation (bias,)
By specifying (bias,), we are telling Flux that bias is a field containing parameters that should be managed (e.g., moved to GPU, gradients calculated for). If a layer had multiple parameter fields, say weights and bias, we would list them like (weights, bias).
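For example, a hypothetical layer with both a weight matrix and a bias could register the two fields together (AffineExample is purely illustrative):
struct AffineExample
    weights
    bias
end
# Both fields hold learnable parameters
Functors.@functor AffineExample (weights, bias)
If every field of a struct holds trainable parameters, the field list can be omitted entirely and @functor will include them all.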
To make our layer usable, it needs to be callable. This means we define a method that allows an instance of BiasedActivation to be called like a function, taking an input x and returning the transformed output. This method implements the layer's forward pass logic.
function (layer::BiasedActivation)(x::AbstractArray)
# Add bias (element-wise, broadcasting over batches if necessary)
# Then apply the activation function
return layer.activation_fn.(x .+ layer.bias)
end
In this forward pass, x .+ layer.bias performs element-wise addition. If x is a matrix of size (features, batch_size) and layer.bias is a vector of size (features,) (or a column vector (features,1)), Julia's broadcasting rules will handle the addition correctly. The result is then passed through the layer.activation_fn element-wise.
It's good practice to define how your layer should be displayed, for example, when printing a Chain containing it. We can do this by overloading Base.show.
function Base.show(io::IO, l::BiasedActivation)
print(io, "BiasedActivation(output_dims=", size(l.bias, 1),
", activation=", nameof(l.activation_fn), ")")
end
This will give a cleaner representation, for instance: BiasedActivation(output_dims=10, activation=relu).
Now let's see our BiasedActivation layer in action.
# Create an instance of our custom layer
# Let's say it operates on 5 features and uses relu activation
custom_layer = BiasedActivation(5, relu)
# Check its parameters
params_found = Flux.params(custom_layer)
println("Parameters found by Flux: ", params_found)
# Output should show the bias vector
# Create some dummy input data
# (features, batch_size)
dummy_input = randn(Float32, 5, 3)
# Perform a forward pass
output = custom_layer(dummy_input)
println("Output shape: ", size(output))
# Custom layers can be part of a Chain
model = Chain(
Dense(10, 5), # Standard Dense layer
custom_layer, # Our custom layer
Dense(5, 2),
softmax
)
println("\nModel structure:")
println(model)
# Test with some data through the model
test_data = randn(Float32, 10, 4) # Input to the model
model_output = model(test_data)
println("Model output shape: ", size(model_output))
When you run this, Flux.params(custom_layer) will correctly identify the bias vector as a trainable parameter. Zygote, Flux's default automatic differentiation engine, will be able to compute gradients for bias as long as the operations within the forward pass (like .+ and relu) are differentiable, which they are.
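Because the bias is discoverable and the forward pass is differentiable, the layer trains like any built-in one. Below is a minimal single-step sketch in the implicit Flux.params style used above; y_target is made-up data purely for illustration, and newer Flux versions favor explicit optimiser state via Flux.setup.
# One illustrative gradient-descent step on the model defined above
y_target = rand(Float32, 2, 4)   # fake targets matching the model's (2, 4) output
ps  = Flux.params(model)         # includes custom_layer's bias
opt = Flux.Descent(0.1)          # plain gradient descent
gs  = Flux.gradient(() -> Flux.Losses.mse(model(test_data), y_target), ps)
Flux.Optimise.update!(opt, ps, gs)   # updates the Dense weights and our bias alike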
The following diagram illustrates the main components involved in defining and using a custom Flux layer like our BiasedActivation.
Components of defining and using a custom BiasedActivation layer within the Flux framework.
A few practical notes are worth keeping in mind. While zeros are a common and reasonable default for biases, weights usually require more sophisticated initialization (e.g., Glorot/Xavier or He initialization) to aid training; Flux provides functions like Flux.glorot_uniform and Flux.kaiming_uniform which you can use in your constructors.
For GPU execution, the layer's parameters must be CuArrays (or convertible to them) and every operation in the forward pass must be compatible with CUDA.jl. Flux's gpu(layer) function will attempt to move any parameters registered via @functor to the GPU. If x becomes a CuArray, operations like .+ and the standard activation functions provided by Flux or NNlib will work on the GPU, so your custom layer's code (l::MyLayer)(x) usually doesn't need explicit GPU logic if it relies on these standard, overloaded operations.
Finally, it's worth verifying that gradients flow through your custom layer as expected. You can use Zygote.gradient for this:
using Zygote  # Flux's default AD backend; needed to call Zygote.gradient directly
# Example: test the gradient for 'bias'
layer_instance = BiasedActivation(3, tanh)
input_data = randn(Float32, 3, 2)
# Differentiate a scalar loss with respect to the layer
# (all of its fields, including bias) and the input x_val
grads = Zygote.gradient((l, x_val) -> sum(l(x_val)), layer_instance, input_data)
println("Gradient for bias: ", grads[1].bias) # Accessing gradient for the 'bias' field
# grads[1] contains gradients for the layer's parameters
# grads[2] contains gradients for input_data
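To make this check concrete, you can assert the two conditions directly; the snippet below reuses layer_instance and grads from the example above.
@assert grads[1].bias !== nothing                           # the bias received a gradient
@assert size(grads[1].bias) == size(layer_instance.bias)    # here, shape (3,)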
As these checks emphasize, grads[1].bias should not be nothing and should have the expected shape.
Creating custom layers in Flux allows you to extend the framework to meet nearly any modeling requirement. By following the pattern of defining a struct, a constructor, registering parameters with @functor, and implementing the forward pass as a callable method, you can integrate your own innovative components into Flux's powerful deep learning ecosystem. This flexibility is a significant advantage when exploring new architectures or adapting existing ones to unique problem domains.