Let's put the concepts from this chapter into practice. We'll build a small machine learning pipeline featuring several custom components: a custom layer, a custom model structure defined via subclassing, a custom loss function, and a manually implemented training loop. This exercise demonstrates how to combine these elements to gain fine-grained control over your model's architecture and training dynamics.
Imagine you have a binary classification problem where you need a specific type of layer interaction and a loss function tailored to handle potential class imbalance or specific error costs. We'll simulate this with synthetic data.
First, let's generate some data:
```python
import tensorflow as tf
import numpy as np
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

# Generate synthetic data (non-linearly separable)
X, y = make_circles(n_samples=1000, noise=0.1, factor=0.5, random_state=42)

# Scale features and cast to float32 to match TensorFlow's default dtype
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X).astype(np.float32)

# Reshape y to be a column vector for TF
y = y.reshape(-1, 1).astype(np.float32)

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, random_state=42
)

# Convert to TensorFlow Datasets
BATCH_SIZE = 32
train_dataset = tf.data.Dataset.from_tensor_slices((X_train, y_train))
train_dataset = train_dataset.shuffle(buffer_size=len(X_train)).batch(BATCH_SIZE)
test_dataset = tf.data.Dataset.from_tensor_slices((X_test, y_test))
test_dataset = test_dataset.batch(BATCH_SIZE)

print(f"X_train shape: {X_train.shape}, y_train shape: {y_train.shape}")
print(f"Data sample: {X_train[0]}, Label: {y_train[0]}")
```
Let's create a simple custom dense layer. While Keras provides `tf.keras.layers.Dense`, building our own helps illustrate the mechanics of subclassing `tf.keras.layers.Layer`. We'll call it `MySimpleDense`.
```python
class MySimpleDense(tf.keras.layers.Layer):
    """A basic dense layer implementation for demonstration."""

    def __init__(self, units, activation=None, **kwargs):
        super().__init__(**kwargs)
        self.units = units
        self.activation = tf.keras.activations.get(activation)
        print(f"Initializing MySimpleDense with {units} units.")

    def build(self, input_shape):
        """Create the layer's weights. Called the first time the layer is used."""
        input_dim = input_shape[-1]
        # Add weight variable
        self.w = self.add_weight(
            shape=(input_dim, self.units),
            initializer="glorot_uniform",  # Xavier uniform initializer
            trainable=True,
            name="kernel"  # Standard name
        )
        # Add bias variable
        self.b = self.add_weight(
            shape=(self.units,),
            initializer="zeros",
            trainable=True,
            name="bias"  # Standard name
        )
        print(f"Building MySimpleDense: Input shape {input_shape}, Weight shape {self.w.shape}")
        super().build(input_shape)  # Ensure the parent class's build method is called

    def call(self, inputs):
        """Defines the forward pass logic of the layer."""
        # Affine transformation: inputs @ w + b
        z = tf.matmul(inputs, self.w) + self.b
        if self.activation is not None:
            return self.activation(z)
        return z

    def get_config(self):
        """Enables serialization."""
        config = super().get_config()
        config.update({
            "units": self.units,
            "activation": tf.keras.activations.serialize(self.activation)
        })
        return config
```
Key points:

- `__init__`: Stores configuration such as the number of units and the activation function. It doesn't create weights.
- `build`: Creates the trainable weights (`w` and `b`) using `add_weight`. Keras calls this method automatically the first time the layer processes an input, inferring the input dimension.
- `call`: Defines the layer's computation using the input tensor and the created weights.
- `get_config`: Important for saving and loading models containing this custom layer. A quick round-trip check appears below.
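As a sanity check, here is a minimal sketch of that config round-trip. The `from_config` classmethod is inherited from `tf.keras.layers.Layer`, and the exact contents of the config dictionary vary slightly across Keras versions:

```python
# Hypothetical round-trip: serialize the layer's config and rebuild it
layer = MySimpleDense(4, activation="relu")
config = layer.get_config()
clone = MySimpleDense.from_config(config)
print(config["units"], clone.units)  # 4 4
```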
Now, we'll build our model by subclassing `tf.keras.Model`. This gives us maximum flexibility in defining the forward pass. Our model will use our `MySimpleDense` layer.
```python
class CustomClassifier(tf.keras.Model):
    """A simple classifier model using our custom dense layer."""

    def __init__(self, num_hidden_units, name="custom_classifier", **kwargs):
        super().__init__(name=name, **kwargs)
        self.num_hidden_units = num_hidden_units
        # Instantiate layers in __init__
        self.hidden_layer = MySimpleDense(num_hidden_units, activation="relu")
        self.output_layer = tf.keras.layers.Dense(1, activation="sigmoid")  # Standard Dense for output
        print("Initializing CustomClassifier model.")

    def call(self, inputs, training=None):
        """Defines the forward pass logic of the model."""
        x = self.hidden_layer(inputs)
        # You could add more complex logic here if needed
        return self.output_layer(x)

    # Optional: define build if needed for complex input shape logic,
    # but often __init__ and the first call are sufficient.

    # Optional: customize train_step, test_step, predict_step if not using a
    # custom loop (we will use a custom loop below, so we don't override these here)

    def get_config(self):
        """Enables serialization."""
        config = super().get_config()
        config.update({"num_hidden_units": self.num_hidden_units})
        return config

    @classmethod
    def from_config(cls, config):
        # Custom layer deserialization may need handling here; for simple
        # cases Keras resolves it automatically if the custom layer is
        # registered or passed via custom_objects.
        return cls(**config)


# Instantiate the model
model = CustomClassifier(num_hidden_units=10)

# Build the model by calling it once (or use model.build).
# This triggers the build methods of the internal layers.
_ = model(tf.keras.Input(shape=(X_train.shape[1],)))
model.summary()
```
Here, we define the layers in `__init__` and specify how data flows through them in the `call` method. `model.summary()` confirms that our custom layer is part of the architecture.
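If you want to persist this model, the `get_config`/`from_config` pair pays off at load time. A hedged sketch of what that might look like (the filename is arbitrary, and serialization details for subclassed models vary across Keras versions, so you may also need `tf.keras.utils.register_keras_serializable`):

```python
# Sketch: save, then reload with the custom classes made known to Keras
model.save("custom_classifier.keras")  # hypothetical filename
reloaded = tf.keras.models.load_model(
    "custom_classifier.keras",
    custom_objects={
        "MySimpleDense": MySimpleDense,
        "CustomClassifier": CustomClassifier,
    },
)
```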
Let's define a simple custom loss function. We'll implement basic binary cross-entropy manually. While `tf.keras.losses.BinaryCrossentropy` exists, this shows the process.
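For reference, the quantity implemented below is the standard binary cross-entropy averaged over a batch of $N$ examples:

$$\text{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\left[\,y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\,\right]$$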
```python
def manual_binary_crossentropy(y_true, y_pred):
    """Calculates binary cross-entropy loss manually."""
    # Add a small epsilon to prevent log(0)
    epsilon = tf.keras.backend.epsilon()
    y_pred = tf.clip_by_value(y_pred, epsilon, 1. - epsilon)
    # Loss term for positive instances
    loss_pos = y_true * tf.math.log(y_pred)
    # Loss term for negative instances
    loss_neg = (1. - y_true) * tf.math.log(1. - y_pred)
    # Combine and compute the mean loss over the batch
    loss = -tf.reduce_mean(loss_pos + loss_neg)
    return loss


# Example usage with dummy data:
y_true_ex = tf.constant([[1.], [0.], [1.], [0.]], dtype=tf.float32)
y_pred_ex = tf.constant([[0.9], [0.2], [0.8], [0.1]], dtype=tf.float32)
loss_value = manual_binary_crossentropy(y_true_ex, y_pred_ex)
print(f"\nCustom Loss Example: {loss_value.numpy()}")

# Compare with the Keras implementation (should be very close)
bce = tf.keras.losses.BinaryCrossentropy()
keras_loss_value = bce(y_true_ex, y_pred_ex)
print(f"Keras BCE Loss Example: {keras_loss_value.numpy()}")
```
This function takes true labels and predictions, computes the cross-entropy term by term, and averages over the batch, mirroring the standard definition. For more complex losses involving layer weights or internal model state, you might subclass `tf.keras.losses.Loss`.
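As a sketch of that subclassing route, here is a hypothetical weighted variant that penalizes errors on the positive class more heavily, the kind of adjustment the class-imbalance scenario mentioned earlier might call for (the class name and `pos_weight` parameter are our own inventions):

```python
class WeightedBinaryCrossentropy(tf.keras.losses.Loss):
    """Hypothetical BCE variant that up-weights positive-class errors."""

    def __init__(self, pos_weight=2.0, name="weighted_bce", **kwargs):
        super().__init__(name=name, **kwargs)
        self.pos_weight = pos_weight

    def call(self, y_true, y_pred):
        epsilon = tf.keras.backend.epsilon()
        y_pred = tf.clip_by_value(y_pred, epsilon, 1. - epsilon)
        # Return per-example losses; the base class applies the reduction
        loss_pos = self.pos_weight * y_true * tf.math.log(y_pred)
        loss_neg = (1. - y_true) * tf.math.log(1. - y_pred)
        return -(loss_pos + loss_neg)
```

With `pos_weight=1.0` this reduces to the manual function above.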
Now, we orchestrate the training process using `tf.GradientTape`, which gives us explicit control over each step.
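Before the full loop, a toy example of the mechanism itself: the tape records operations on watched variables inside its context so gradients can be computed afterwards.

```python
# Minimal illustration: d(x^2)/dx at x = 3 is 6
x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x ** 2
print(tape.gradient(y, x))  # tf.Tensor(6.0, shape=(), dtype=float32)
```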
```python
# Hyperparameters
learning_rate = 0.01
epochs = 20

# Optimizer
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)

# Metrics to track
train_loss_metric = tf.keras.metrics.Mean(name='train_loss')
train_accuracy_metric = tf.keras.metrics.BinaryAccuracy(name='train_accuracy')
test_loss_metric = tf.keras.metrics.Mean(name='test_loss')
test_accuracy_metric = tf.keras.metrics.BinaryAccuracy(name='test_accuracy')


# The core training step, decorated with tf.function for performance
@tf.function
def train_step(features, labels):
    with tf.GradientTape() as tape:
        # Forward pass
        predictions = model(features, training=True)
        # Calculate loss using our custom function
        loss = manual_binary_crossentropy(labels, predictions)
        # Add potential regularization losses from the model/layers
        if model.losses:  # Important if layers add regularization losses
            loss += tf.add_n(model.losses)
    # Calculate gradients
    gradients = tape.gradient(loss, model.trainable_variables)
    # Apply gradients to update weights
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    # Update training metrics
    train_loss_metric.update_state(loss)
    train_accuracy_metric.update_state(labels, predictions)


# The testing/evaluation step
@tf.function
def test_step(features, labels):
    # Forward pass in inference mode
    predictions = model(features, training=False)
    # Calculate loss
    loss = manual_binary_crossentropy(labels, predictions)
    # Update test metrics
    test_loss_metric.update_state(loss)
    test_accuracy_metric.update_state(labels, predictions)


# History dictionary to store metrics per epoch
history = {'loss': [], 'accuracy': [], 'val_loss': [], 'val_accuracy': []}

# The main training loop
print("\nStarting Custom Training Loop...")
for epoch in range(epochs):
    # Reset metrics at the start of each epoch
    train_loss_metric.reset_state()
    train_accuracy_metric.reset_state()
    test_loss_metric.reset_state()
    test_accuracy_metric.reset_state()

    # Iterate over training batches
    for batch_features, batch_labels in train_dataset:
        train_step(batch_features, batch_labels)

    # Iterate over testing batches for validation
    for batch_features, batch_labels in test_dataset:
        test_step(batch_features, batch_labels)

    # Get metric results as Python floats (raw tensors don't support
    # format specs like :.4f in f-strings)
    epoch_loss = float(train_loss_metric.result())
    epoch_acc = float(train_accuracy_metric.result())
    epoch_val_loss = float(test_loss_metric.result())
    epoch_val_acc = float(test_accuracy_metric.result())

    # Store history
    history['loss'].append(epoch_loss)
    history['accuracy'].append(epoch_acc)
    history['val_loss'].append(epoch_val_loss)
    history['val_accuracy'].append(epoch_val_acc)

    # Print progress
    print(f"Epoch {epoch + 1}/{epochs} - "
          f"Loss: {epoch_loss:.4f} - Accuracy: {epoch_acc:.4f} - "
          f"Val Loss: {epoch_val_loss:.4f} - Val Accuracy: {epoch_val_acc:.4f}")

print("Custom Training Loop Finished.")
```
Key aspects of the custom loop:

- `tf.GradientTape`: Records operations executed within its context to enable automatic differentiation.
- Forward pass: `model(features, training=True)` executes the model's `call` method. Setting `training=True` matters for layers like Dropout or BatchNormalization, which behave differently during training and inference.
- Loss calculation: The loss comes from our `manual_binary_crossentropy` function. We also check for and add any regularization losses defined within the model or its layers (`model.losses`).
- Gradient computation: `tape.gradient(loss, model.trainable_variables)` computes the gradients of the loss with respect to the model's trainable parameters.
- Weight update: `optimizer.apply_gradients()` applies the computed gradients to update the model's weights according to the optimizer's algorithm (Adam, in this case).
- Metric tracking: `tf.keras.metrics` objects accumulate statistics (such as mean loss or accuracy) across batches. Remember to call `reset_state()` at the beginning of each epoch.
- `@tf.function` decorator: Compiles the Python functions (`train_step`, `test_step`) into callable TensorFlow graphs. This generally provides significant performance improvements by reducing Python overhead and enabling graph optimizations; see the retracing note after this list.
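One practical caveat with `@tf.function`: each new input shape can trigger a retrace (for example, the smaller final batch of an epoch). A hedged sketch of how you might pin the signature to avoid this; `train_step_fixed` is a hypothetical variant, and the feature width of 2 matches our dataset:

```python
# Sketch: fixing the input signature so varying batch sizes reuse one graph
@tf.function(input_signature=[
    tf.TensorSpec(shape=(None, 2), dtype=tf.float32),  # features
    tf.TensorSpec(shape=(None, 1), dtype=tf.float32),  # labels
])
def train_step_fixed(features, labels):
    return train_step(features, labels)  # delegate to the step defined above
```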
We can use the `history` dictionary to plot the training and validation metrics.
```python
epochs_range = range(1, epochs + 1)

plt.figure(figsize=(12, 5))

# Plotting loss
plt.subplot(1, 2, 1)
plt.plot(epochs_range, history['loss'], label='Training Loss', color='#1c7ed6', marker='o')
plt.plot(epochs_range, history['val_loss'], label='Validation Loss', color='#f76707', marker='x')
plt.title('Training and Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.grid(True, linestyle='--', alpha=0.6)

# Plotting accuracy
plt.subplot(1, 2, 2)
plt.plot(epochs_range, history['accuracy'], label='Training Accuracy', color='#1c7ed6', marker='o')
plt.plot(epochs_range, history['val_accuracy'], label='Validation Accuracy', color='#f76707', marker='x')
plt.title('Training and Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True, linestyle='--', alpha=0.6)

plt.tight_layout()
plt.show()
```
Training and validation loss and accuracy curves over epochs.
This visualization helps assess model convergence and identify potential overfitting (where training performance keeps improving, but validation performance stagnates or worsens).
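Because we own the loop, remedies for overfitting are also ours to implement. A minimal, self-contained sketch of early-stopping logic you could fold into the epoch loop; the loss values below are invented for illustration:

```python
# Hypothetical early stopping: stop when validation loss hasn't improved
# for `patience` consecutive epochs
val_losses = [0.60, 0.45, 0.40, 0.41, 0.42, 0.43]  # pretend per-epoch values
best, wait, patience = float("inf"), 0, 2
for epoch, vl in enumerate(val_losses, start=1):
    if vl < best:
        best, wait = vl, 0  # improvement: record it and reset the counter
    else:
        wait += 1
        if wait >= patience:
            print(f"Stopping at epoch {epoch}: no improvement for {patience} epochs.")
            break
```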
This practical exercise demonstrated how to integrate several advanced TensorFlow/Keras features:

- A custom `MySimpleDense` layer, built by subclassing `tf.keras.layers.Layer`, managing its own weights and defining its forward pass.
- A `CustomClassifier` model, built by subclassing `tf.keras.Model`, incorporating our custom layer and defining the model's structure.
- A `manual_binary_crossentropy` function, showing how custom loss calculations can be integrated.
- A custom training loop using `tf.GradientTape`, controlling gradient computation, weight updates, and metric tracking explicitly.

Mastering these techniques provides the foundation for implementing the highly customized architectures, loss functions, and training procedures needed for cutting-edge research or specialized applications that go beyond the standard `model.fit()` workflow.