While Keras provides a rich set of built-in layers (Dense, Conv2D, LSTM, etc.), you'll inevitably encounter situations where you need a layer that performs a specific, non-standard operation. This could involve implementing a novel transformation from a research paper, combining existing operations in a unique sequence, or creating a stateful layer with custom logic. TensorFlow allows you to define your own layers by subclassing the tf.keras.layers.Layer base class.
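As a first taste (a minimal sketch of our own, not part of the Dense example developed below; the class name and the factor of two are arbitrary), even a stateless one-method subclass is a fully working layer:

import tensorflow as tf

class ScaleByTwo(tf.keras.layers.Layer):
    """Stateless layer: no weights, just a forward computation."""

    def call(self, inputs):
        return inputs * 2.0

print(ScaleByTwo()(tf.constant([1.0, 2.0])))  # tf.Tensor([2. 4.], shape=(2,), ...)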
The tf.keras.layers.Layer Base Class

At its core, a Keras layer encapsulates two primary components: state, the weights (variables) the layer owns, and computation, the transformation from inputs to outputs defined in the call method.

To create a custom layer, you inherit from tf.keras.layers.Layer and typically implement three key methods:
__init__(self, ...): The constructor. Use this to define layer-specific configuration arguments (like the number of units in a dense layer) and perform initial setup that doesn't depend on the input shape. Always call super().__init__(**kwargs) first.

build(self, input_shape): The standard place to create the layer's weights (variables) using self.add_weight(). It's called automatically by Keras the first time the layer is invoked with an input. The input_shape argument (a tf.TensorShape object or a structure of them) allows you to create weights whose dimensions depend on the input dimensions. You don't need to call super().build(input_shape) explicitly in most cases, but ensure weights are created here.

call(self, inputs, ...): This method defines the layer's forward pass logic. It takes input tensors (and potentially other arguments, such as training for layers that behave differently during training vs. inference) and returns the output tensor(s). All of the layer's computation happens here.

Why build?

You might wonder why weights aren't typically created in __init__. Using build provides flexibility. Often, the exact shape of a layer's weights depends on the shape of its input. For example, a Dense layer's kernel matrix needs input_shape[-1] rows. Delaying weight creation until build means the layer doesn't need to know the input shape during instantiation, only when it's first used. This simplifies model construction, especially in the functional or sequential APIs, where input shapes might not be immediately known.
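You can observe this deferral with the built-in Dense layer (a short sketch; the shapes in the comments follow from the input width chosen here):

import tensorflow as tf

layer = tf.keras.layers.Dense(units=4)
print(layer.weights)           # [] -- build has not run yet

_ = layer(tf.ones((2, 16)))    # first call triggers build
print(layer.kernel.shape)      # (16, 4): rows match input_shape[-1]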
[Figure: Simplified lifecycle of a custom Keras layer, showing the order in which __init__, build, and call are typically executed.]
Let's implement a simplified version of a Dense layer to illustrate these concepts. Our layer will perform the core operation output = activation(dot(input, kernel) + bias).
import tensorflow as tf

class SimpleDense(tf.keras.layers.Layer):
    """A basic fully connected layer implementation.

    Args:
        units: Positive integer, dimensionality of the output space.
        activation: Activation function to use. If you don't specify anything,
            no activation is applied (i.e. 'linear' activation: `a(x) = x`).
        name: String, name of the layer.
    """

    def __init__(self, units, activation=None, name=None, **kwargs):
        super().__init__(name=name, **kwargs)
        self.units = units
        # Get the activation function; tf.keras.activations.get converts
        # string identifiers or functions into activation functions.
        self.activation = tf.keras.activations.get(activation)
        # We defer weight creation to the build method.

    def build(self, input_shape):
        """Creates the weights of the layer."""
        # input_shape is a tf.TensorShape object.
        # The last dimension of the input shape is the number of input features.
        input_dim = input_shape[-1]
        # self.add_weight creates the layer's variables (weights).
        self.kernel = self.add_weight(
            name='kernel',
            shape=(input_dim, self.units),
            initializer='glorot_uniform',  # Common initializer
            trainable=True)                # Weight is trainable
        self.bias = self.add_weight(
            name='bias',
            shape=(self.units,),
            initializer='zeros',           # Initialize bias to zeros
            trainable=True)
        # The 'built' attribute is set to True automatically after build()
        # completes successfully, so you don't typically need to set
        # self.built = True manually.

    def call(self, inputs):
        """Defines the computation performed by the layer."""
        # Perform the matrix multiplication.
        z = tf.matmul(inputs, self.kernel)
        # Add the bias.
        z = z + self.bias
        # Apply the activation function if specified.
        if self.activation is not None:
            return self.activation(z)
        return z

    # Optional: implement get_config for serialization.
    def get_config(self):
        config = super().get_config()
        config.update({
            'units': self.units,
            # Serialize the activation function identifier (e.g., 'relu').
            'activation': tf.keras.activations.serialize(self.activation)
        })
        return config

    # Optional: implement from_config for deserialization.
    @classmethod
    def from_config(cls, config):
        # Deserialize the activation function.
        config['activation'] = tf.keras.activations.deserialize(config['activation'])
        return cls(**config)
Let's break down the build method's use of self.add_weight:

name: Provides a meaningful name for the variable, useful for debugging and saving.

shape: Defines the dimensions of the weight tensor. Here, the kernel shape depends on input_dim (derived from input_shape) and self.units; the bias shape depends only on self.units.

initializer: Specifies how to initialize the weight values (e.g., 'zeros', 'glorot_uniform'). Keras provides many standard initializers.

trainable: A boolean indicating whether the variable's value should be updated by the optimizer during training. Most weights are trainable, but sometimes you might want non-trainable state variables, as in the sketch after this list.

The call method implements the actual math using standard TensorFlow operations (tf.matmul, +). It takes the inputs tensor and returns the transformed output.
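To illustrate the trainable flag, here is a hypothetical layer of our own (the name CallCounter and its logic are not part of the SimpleDense example) that keeps non-trainable state: the optimizer never updates the variable, but the layer mutates it manually on each forward pass.

class CallCounter(tf.keras.layers.Layer):
    """Hypothetical layer holding non-trainable state."""

    def build(self, input_shape):
        # trainable=False: the optimizer ignores this variable,
        # but it is still tracked and saved with the layer.
        self.calls = self.add_weight(
            name='calls',
            shape=(),
            initializer='zeros',
            trainable=False)

    def call(self, inputs):
        # Update the state manually on every forward pass.
        self.calls.assign_add(1.0)
        return inputs

counter = CallCounter()
counter(tf.zeros((1, 3)))
print(counter.calls.numpy())  # 1.0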
You can now use this custom layer just like any built-in Keras layer:
# Example Usage
# Create some dummy data
input_data = tf.random.normal(shape=(32, 64)) # Batch of 32 samples, 64 features each
# Instantiate the custom layer
custom_dense_layer = SimpleDense(units=128, activation='relu')
# Call the layer on the data (this triggers the build method implicitly)
output_data = custom_dense_layer(input_data)
# Check the output shape
print(f"Input shape: {input_data.shape}")
print(f"Output shape: {output_data.shape}") # Should be (32, 128)
# Inspect the created weights
print(f"Kernel shape: {custom_dense_layer.kernel.shape}") # Should be (64, 128)
print(f"Bias shape: {custom_dense_layer.bias.shape}") # Should be (128,)
get_config and from_config

For a model containing your custom layer to be saved (e.g., using model.save()) and loaded (tf.keras.models.load_model()), the layer needs to be serializable. This usually involves implementing the get_config method.
get_config(self): This method should return a JSON-serializable dictionary containing the configuration arguments needed to recreate the layer instance. Typically, you call the parent class's get_config and update the resulting dictionary with the arguments specific to your layer (like units and activation in our SimpleDense example). Note that complex objects like activation functions should be serialized using Keras utilities (e.g., tf.keras.activations.serialize).

from_config(cls, config): If your layer has complex initialization logic or requires custom deserialization (beyond just passing the config dictionary to the constructor), you might also need to implement this class method. Keras calls it with the dictionary returned by get_config to reconstruct the layer object during model loading. Often, the default implementation (which passes config to the constructor as cls(**config)) is sufficient if get_config is implemented correctly, but it's shown in the example for completeness, especially for handling deserialization of objects like activations.
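A sketch of the round trip (our own example; the filename is arbitrary, and we assume a TF version that supports the .keras format). Note that load_model must be told how to map the serialized class name back to your Python class, either via the custom_objects argument shown here or by decorating the class with @tf.keras.utils.register_keras_serializable():

# Build and save a model that contains the custom layer.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(64,)),
    SimpleDense(units=128, activation='relu'),
])
model.save('simple_dense_model.keras')

# Loading requires mapping the name 'SimpleDense' back to the class.
restored = tf.keras.models.load_model(
    'simple_dense_model.keras',
    custom_objects={'SimpleDense': SimpleDense})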
Some layers, like Dropout or Batch Normalization, behave differently during training (e.g., applying dropout, updating moving averages) than during inference (e.g., disabling dropout, using frozen statistics). Keras handles this through an optional training argument in the call method.

If your custom layer needs distinct behaviors, accept a training argument in call:
import tensorflow as tf

class CustomLayerWithTrainingArg(tf.keras.layers.Layer):
    # ... __init__ and build would go here as needed ...

    def call(self, inputs, training=None):  # Accept the training argument
        if training:
            # Behavior during training, e.g., apply dropout or update state.
            # Here: a simplified dropout as an illustration.
            return tf.nn.dropout(inputs, rate=0.5)
        # Behavior during inference: dropout disabled,
        # inputs pass through unchanged.
        return inputs

# Keras automatically passes the correct boolean value for 'training'
# when you use model.fit(), model.evaluate(), or model.predict().
When you use standard Keras training loops (model.fit()), evaluation (model.evaluate()), or prediction (model.predict()), Keras automatically passes the correct boolean value (True or False) for the training argument to your layer's call method. If you are writing a custom training loop, you will need to pass this argument explicitly.
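In a custom loop, that explicit pass looks like this (a minimal sketch using the layer defined above; the tensor shape is arbitrary):

layer = CustomLayerWithTrainingArg()
x = tf.random.normal(shape=(8, 16))

train_out = layer(x, training=True)    # takes the dropout branch
infer_out = layer(x, training=False)   # passes inputs through unchanged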
Creating custom layers is a powerful technique for extending TensorFlow's capabilities. By understanding the roles of __init__, build, and call, and how to manage weights and serialization, you can implement virtually any layer architecture required for your specific machine learning tasks.