Let's put the concepts from this chapter into practice. We'll build a small machine learning pipeline featuring several custom components: a custom layer, a custom model structure defined via subclassing, a custom loss function, and a manually implemented training loop. This exercise demonstrates how to combine these elements to gain fine-grained control over your model's architecture and training dynamics.
Imagine you have a binary classification problem where you need a specific type of layer interaction and a loss function tailored to handle potential class imbalance or specific error costs. We'll simulate this with synthetic data.
First, let's generate some data:
```python
import tensorflow as tf
import numpy as np
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

# Generate synthetic data (non-linearly separable)
X, y = make_circles(n_samples=1000, noise=0.1, factor=0.5, random_state=42)

# Scale features and cast to float32 to match TensorFlow's default dtype
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X).astype(np.float32)

# Reshape y to be a column vector for TF
y = y.reshape(-1, 1).astype(np.float32)

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42
)

# Convert to TensorFlow Datasets
BATCH_SIZE = 32
train_dataset = tf.data.Dataset.from_tensor_slices((X_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=len(X_train)).batch(BATCH_SIZE)
test_dataset = tf.data.Dataset.from_tensor_slices((X_test, y_test))
test_dataset = test_dataset.batch(BATCH_SIZE)

print(f"X_train shape: {X_train.shape}, y_train shape: {y_train.shape}")
print(f"Data sample: {X_train[0]}, Label: {y_train[0]}")
```
Let's create a simple custom dense layer. While Keras provides `tf.keras.layers.Dense`, building our own helps illustrate the mechanics of subclassing `tf.keras.layers.Layer`. We'll call it `MySimpleDense`.
```python
class MySimpleDense(tf.keras.layers.Layer):
    """A basic dense layer implementation for demonstration."""

    def __init__(self, units, activation=None, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.activation = tf.keras.activations.get(activation)
        print(f"Initializing MySimpleDense with {units} units.")

    def build(self, input_shape):
        """Create the layer's weights. Called the first time the layer is used."""
        input_dim = input_shape[-1]
        # Add weight variable
        self.w = self.add_weight(
            shape=(input_dim, self.units),
            initializer="glorot_uniform",  # Xavier uniform initializer
            trainable=True,
            name="kernel"  # Standard name
        )
        # Add bias variable
        self.b = self.add_weight(
            shape=(self.units,),
            initializer="zeros",
            trainable=True,
            name="bias"  # Standard name
        )
        print(f"Building MySimpleDense: Input shape {input_shape}, Weight shape {self.w.shape}")
        super().build(input_shape)  # Ensure the parent class's build method is called

    def call(self, inputs):
        """Defines the forward pass logic of the layer."""
        # Affine transformation: inputs @ w + b
        z = tf.matmul(inputs, self.w) + self.b
        if self.activation is not None:
            return self.activation(z)
        return z

    def get_config(self):
        """Enables serialization."""
        config = super().get_config()
        config.update({
            "units": self.units,
            "activation": tf.keras.activations.serialize(self.activation)
        })
        return config
```
Key points:

- `__init__`: Stores configuration such as the number of units and the activation function. It doesn't create weights.
- `build`: Creates the trainable weights (`w` and `b`) using `add_weight`. Keras calls this method automatically the first time the layer processes an input, inferring the input dimension.
- `call`: Defines the layer's computation using the input tensor and the created weights.
- `get_config`: Important for saving and loading models containing this custom layer. A quick round-trip check appears below.
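As a sanity check, here is a minimal sketch of that config round-trip. The `from_config` classmethod is inherited from `tf.keras.layers.Layer`, and the exact contents of the config dictionary vary slightly across Keras versions:

```python
# Hypothetical round-trip: serialize the layer's config and rebuild it
layer = MySimpleDense(4, activation="relu")
config = layer.get_config()
clone = MySimpleDense.from_config(config)
print(config["units"], clone.units)  # 4 4
```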
Now, we'll build our model by subclassing `tf.keras.Model`. This gives us maximum flexibility in defining the forward pass. Our model will use our `MySimpleDense` layer.
```python
class CustomClassifier(tf.keras.Model):
    """A simple classifier model using our custom dense layer."""

    def __init__(self, num_hidden_units, name="custom_classifier", **kwargs):
        super().__init__(name=name, **kwargs)
        self.num_hidden_units = num_hidden_units
        # Instantiate layers in __init__
        self.hidden_layer = MySimpleDense(num_hidden_units, activation="relu")
        self.output_layer = tf.keras.layers.Dense(1, activation="sigmoid")  # Standard Dense for output
        print("Initializing CustomClassifier model.")

    def call(self, inputs, training=None):
        """Defines the forward pass logic of the model."""
        x = self.hidden_layer(inputs)
        # You could add more complex logic here if needed
        return self.output_layer(x)

    # Optional: define build if needed for complex input shape logic,
    # but often __init__ and the first call are sufficient.

    # Optional: customize train_step, test_step, predict_step if not using a
    # custom loop (we will use a custom loop below, so we don't override these here)

    def get_config(self):
        """Enables serialization."""
        config = super().get_config()
        config.update({"num_hidden_units": self.num_hidden_units})
        return config

    @classmethod
    def from_config(cls, config):
        # Custom layer deserialization may need handling here; for simple
        # cases Keras resolves it automatically if the custom layer is
        # registered or passed via custom_objects.
        return cls(**config)


# Instantiate the model
model = CustomClassifier(num_hidden_units=10)

# Build the model by calling it once (or use model.build).
# This triggers the build methods of the internal layers.
_ = model(tf.keras.Input(shape=(X_train.shape[1],)))
model.summary()
```
Here, we define the layers in `__init__` and specify how data flows through them in the `call` method. `model.summary()` confirms that our custom layer is part of the architecture.
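If you want to persist this model, the `get_config`/`from_config` pair pays off at load time. A hedged sketch of what that might look like (the filename is arbitrary, and serialization details for subclassed models vary across Keras versions, so you may also need `tf.keras.utils.register_keras_serializable`):

```python
# Sketch: save, then reload with the custom classes made known to Keras
model.save("custom_classifier.keras")  # hypothetical filename
reloaded = tf.keras.models.load_model(
    "custom_classifier.keras",
    custom_objects={
        "MySimpleDense": MySimpleDense,
        "CustomClassifier": CustomClassifier,
    },
)
```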
Let's define a simple custom loss function. We'll implement basic binary cross-entropy manually. While `tf.keras.losses.BinaryCrossentropy` exists, this shows the process.
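For reference, the quantity implemented below is the standard binary cross-entropy averaged over a batch of $N$ examples:

$$\text{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\left[\,y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\,\right]$$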
```python
def manual_binary_crossentropy(y_true, y_pred):
    """Calculates binary cross-entropy loss manually."""
    # Add a small epsilon to prevent log(0)
    epsilon = tf.keras.backend.epsilon()
    y_pred = tf.clip_by_value(y_pred, epsilon, 1. - epsilon)
    # Loss term for positive instances
    loss_pos = y_true * tf.math.log(y_pred)
    # Loss term for negative instances
    loss_neg = (1. - y_true) * tf.math.log(1. - y_pred)
    # Combine and compute the mean loss over the batch
    loss = -tf.reduce_mean(loss_pos + loss_neg)
    return loss


# Example usage with dummy data:
y_true_ex = tf.constant([[1.], [0.], [1.], [0.]], dtype=tf.float32)
y_pred_ex = tf.constant([[0.9], [0.2], [0.8], [0.1]], dtype=tf.float32)
loss_value = manual_binary_crossentropy(y_true_ex, y_pred_ex)
print(f"\nCustom Loss Example: {loss_value.numpy()}")

# Compare with the Keras implementation (should be very close)
bce = tf.keras.losses.BinaryCrossentropy()
keras_loss_value = bce(y_true_ex, y_pred_ex)
print(f"Keras BCE Loss Example: {keras_loss_value.numpy()}")
```
This function takes true labels and predictions, computes the cross-entropy term by term, and averages over the batch, mirroring the standard definition. For more complex losses involving layer weights or internal model state, you might subclass `tf.keras.losses.Loss`.
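As a sketch of that subclassing route, here is a hypothetical weighted variant that penalizes errors on the positive class more heavily, the kind of adjustment the class-imbalance scenario mentioned earlier might call for (the class name and `pos_weight` parameter are our own inventions):

```python
class WeightedBinaryCrossentropy(tf.keras.losses.Loss):
    """Hypothetical BCE variant that up-weights positive-class errors."""

    def __init__(self, pos_weight=2.0, name="weighted_bce", **kwargs):
        super().__init__(name=name, **kwargs)
        self.pos_weight = pos_weight

    def call(self, y_true, y_pred):
        epsilon = tf.keras.backend.epsilon()
        y_pred = tf.clip_by_value(y_pred, epsilon, 1. - epsilon)
        # Return per-example losses; the base class applies the reduction
        loss_pos = self.pos_weight * y_true * tf.math.log(y_pred)
        loss_neg = (1. - y_true) * tf.math.log(1. - y_pred)
        return -(loss_pos + loss_neg)
```

With `pos_weight=1.0` this reduces to the manual function above.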
Now, we orchestrate the training process using `tf.GradientTape`, which gives us explicit control over each step.
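Before the full loop, a toy example of the mechanism itself: the tape records operations on watched variables inside its context so gradients can be computed afterwards.

```python
# Minimal illustration: d(x^2)/dx at x = 3 is 6
x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x ** 2
print(tape.gradient(y, x))  # tf.Tensor(6.0, shape=(), dtype=float32)
```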
```python
# Hyperparameters
learning_rate = 0.01
epochs = 20

# Optimizer
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)

# Metrics to track
train_loss_metric = tf.keras.metrics.Mean(name='train_loss')
train_accuracy_metric = tf.keras.metrics.BinaryAccuracy(name='train_accuracy')
test_loss_metric = tf.keras.metrics.Mean(name='test_loss')
test_accuracy_metric = tf.keras.metrics.BinaryAccuracy(name='test_accuracy')


# The core training step, decorated with tf.function for performance
@tf.function
def train_step(features, labels):
    with tf.GradientTape() as tape:
        # Forward pass
        predictions = model(features, training=True)
        # Calculate loss using our custom function
        loss = manual_binary_crossentropy(labels, predictions)
        # Add potential regularization losses from the model/layers
        if model.losses:  # Important if layers add regularization losses
            loss += tf.add_n(model.losses)
    # Calculate gradients
    gradients = tape.gradient(loss, model.trainable_variables)
    # Apply gradients to update weights
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    # Update training metrics
    train_loss_metric.update_state(loss)
    train_accuracy_metric.update_state(labels, predictions)


# The testing/evaluation step
@tf.function
def test_step(features, labels):
    # Forward pass in inference mode
    predictions = model(features, training=False)
    # Calculate loss
    loss = manual_binary_crossentropy(labels, predictions)
    # Update test metrics
    test_loss_metric.update_state(loss)
    test_accuracy_metric.update_state(labels, predictions)


# History dictionary to store metrics per epoch
history = {'loss': [], 'accuracy': [], 'val_loss': [], 'val_accuracy': []}

# The main training loop
print("\nStarting Custom Training Loop...")
for epoch in range(epochs):
    # Reset metrics at the start of each epoch
    train_loss_metric.reset_state()
    train_accuracy_metric.reset_state()
    test_loss_metric.reset_state()
    test_accuracy_metric.reset_state()

    # Iterate over training batches
    for batch_features, batch_labels in train_dataset:
        train_step(batch_features, batch_labels)

    # Iterate over testing batches for validation
    for batch_features, batch_labels in test_dataset:
        test_step(batch_features, batch_labels)

    # Get metric results as Python floats (raw tensors don't support
    # format specs like :.4f in f-strings)
    epoch_loss = float(train_loss_metric.result())
    epoch_acc = float(train_accuracy_metric.result())
    epoch_val_loss = float(test_loss_metric.result())
    epoch_val_acc = float(test_accuracy_metric.result())

    # Store history
    history['loss'].append(epoch_loss)
    history['accuracy'].append(epoch_acc)
    history['val_loss'].append(epoch_val_loss)
    history['val_accuracy'].append(epoch_val_acc)

    # Print progress
    print(f"Epoch {epoch + 1}/{epochs} - "
          f"Loss: {epoch_loss:.4f} - Accuracy: {epoch_acc:.4f} - "
          f"Val Loss: {epoch_val_loss:.4f} - Val Accuracy: {epoch_val_acc:.4f}")

print("Custom Training Loop Finished.")
```
Key aspects of the custom loop:

- `tf.GradientTape`: Records operations executed within its context to enable automatic differentiation.
- Forward pass: `model(features, training=True)` executes the model's `call` method. Setting `training=True` matters for layers like Dropout or BatchNormalization, which behave differently during training and inference.
- Loss calculation: The loss comes from our `manual_binary_crossentropy` function. We also check for and add any regularization losses defined within the model or its layers (`model.losses`).
- Gradient computation: `tape.gradient(loss, model.trainable_variables)` computes the gradients of the loss with respect to the model's trainable parameters.
- Weight update: `optimizer.apply_gradients()` applies the computed gradients to update the model's weights according to the optimizer's algorithm (Adam, in this case).
- Metric tracking: `tf.keras.metrics` objects accumulate statistics (such as mean loss or accuracy) across batches. Remember to call `reset_state()` at the beginning of each epoch.
- `@tf.function` decorator: Compiles the Python functions (`train_step`, `test_step`) into callable TensorFlow graphs. This generally provides significant performance improvements by reducing Python overhead and enabling graph optimizations; see the retracing note after this list.
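One practical caveat with `@tf.function`: each new input shape can trigger a retrace (for example, the smaller final batch of an epoch). A hedged sketch of how you might pin the signature to avoid this; `train_step_fixed` is a hypothetical variant, and the feature width of 2 matches our dataset:

```python
# Sketch: fixing the input signature so varying batch sizes reuse one graph
@tf.function(input_signature=[
    tf.TensorSpec(shape=(None, 2), dtype=tf.float32),  # features
    tf.TensorSpec(shape=(None, 1), dtype=tf.float32),  # labels
])
def train_step_fixed(features, labels):
    return train_step(features, labels)  # delegate to the step defined above
```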
We can use the `history` dictionary to plot the training and validation metrics.
```python
epochs_range = range(1, epochs + 1)

plt.figure(figsize=(12, 5))

# Plotting loss
plt.subplot(1, 2, 1)
plt.plot(epochs_range, history['loss'], label='Training Loss', color='#1c7ed6', marker='o')
plt.plot(epochs_range, history['val_loss'], label='Validation Loss', color='#f76707', marker='x')
plt.title('Training and Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.grid(True, linestyle='--', alpha=0.6)

# Plotting accuracy
plt.subplot(1, 2, 2)
plt.plot(epochs_range, history['accuracy'], label='Training Accuracy', color='#1c7ed6', marker='o')
plt.plot(epochs_range, history['val_accuracy'], label='Validation Accuracy', color='#f76707', marker='x')
plt.title('Training and Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True, linestyle='--', alpha=0.6)

plt.tight_layout()
plt.show()
```
Training and validation loss and accuracy curves over epochs.
This visualization helps assess model convergence and identify potential overfitting (where training performance keeps improving, but validation performance stagnates or worsens).
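Because we own the loop, remedies for overfitting are also ours to implement. A minimal, self-contained sketch of early-stopping logic you could fold into the epoch loop; the loss values below are invented for illustration:

```python
# Hypothetical early stopping: stop when validation loss hasn't improved
# for `patience` consecutive epochs
val_losses = [0.60, 0.45, 0.40, 0.41, 0.42, 0.43]  # pretend per-epoch values
best, wait, patience = float("inf"), 0, 2
for epoch, vl in enumerate(val_losses, start=1):
    if vl < best:
        best, wait = vl, 0  # improvement: record it and reset the counter
    else:
        wait += 1
        if wait >= patience:
            print(f"Stopping at epoch {epoch}: no improvement for {patience} epochs.")
            break
```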
This practical exercise demonstrated how to integrate several advanced TensorFlow/Keras features:

- A custom `MySimpleDense` layer, built by subclassing `tf.keras.layers.Layer`, managing its own weights and defining its forward pass.
- A `CustomClassifier` model, built by subclassing `tf.keras.Model`, incorporating our custom layer and defining the model's structure.
- A `manual_binary_crossentropy` function, showing how custom loss calculations can be integrated.
- A custom training loop using `tf.GradientTape`, controlling gradient computation, weight updates, and metric tracking explicitly.

Mastering these techniques provides the foundation for implementing the highly customized architectures, loss functions, and training procedures needed for cutting-edge research or specialized applications that go beyond the standard `model.fit()` workflow.