While the Sequential API provides a straightforward way to build models where data flows linearly through a stack of layers, many real-world applications require more complex architectures. You might need models that:
- accept multiple inputs (for example, text plus tabular metadata),
- produce multiple outputs (for example, a class label and a bounding box),
- share layers (and their weights) across different parts of the network, or
- contain non-linear topologies such as residual (skip) connections.
For these scenarios, Keras offers the Functional API. It's a more flexible way to define models where you treat layers as functions that operate on tensors and connect them directly to build a graph.
Think of a Keras layer instance as a callable object. You pass it an input tensor (or tensors), and it returns an output tensor (or tensors).
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
# Example: A Dense layer instance
dense_layer = layers.Dense(units=64, activation='relu')
# We need an input tensor first (symbolic tensor)
# Shape: (batch_size, input_features) - batch_size is often None here
input_tensor = keras.Input(shape=(784,))
# Call the layer on the input tensor
output_tensor = dense_layer(input_tensor)
print(f"Input Tensor Shape: {input_tensor.shape}")
print(f"Output Tensor Shape: {output_tensor.shape}")
The keras.Input call creates a symbolic, tensor-like object that represents the model's entry point. It defines the expected shape and data type (dtype) of the input data. You then connect layers by calling them sequentially, passing the output tensor of one layer as the input tensor to the next.
Finally, you define a keras.Model
by specifying the model's input(s) and output(s).
# Define the complete model
model = keras.Model(inputs=input_tensor, outputs=output_tensor)
# Display the model structure
model.summary()
This simple example using the Functional API creates the exact same single-layer architecture as keras.Sequential([layers.Dense(64, activation='relu', input_shape=(784,))])
. The power comes when we move beyond simple linear stacks.
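For reference, a minimal sketch of that equivalent Sequential definition (same single Dense layer and input shape; the variable name here is just illustrative) would be:
# Equivalent single-layer model built with the Sequential API
sequential_equivalent = keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=(784,))
])
sequential_equivalent.summary()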
Imagine you want to build a model that predicts the priority of a support ticket based on both its textual description (processed perhaps through an embedding and LSTM) and some categorical metadata (like ticket type, source). The Functional API makes this natural.
The general approach is:
1. Define a separate Input layer for each type of input.
2. Process each input through its own branch of layers.
3. Merge the branches (for example, using layers.Concatenate).
4. Define the Model with a list of inputs and the final output(s).
# Define inputs
text_input = keras.Input(shape=(None,), dtype='int32', name='text') # Variable length sequence of integers
metadata_input = keras.Input(shape=(5,), dtype='float32', name='metadata') # 5 metadata features
# Text processing branch
text_features = layers.Embedding(input_dim=10000, output_dim=64)(text_input)
text_features = layers.LSTM(32)(text_features)
# Metadata processing branch (optional, could directly concatenate)
metadata_features = layers.Dense(16, activation='relu')(metadata_input)
# Combine branches
combined_features = layers.Concatenate()([text_features, metadata_features])
# Output layer
priority_output = layers.Dense(1, activation='sigmoid', name='priority')(combined_features)
# Create the model
multi_input_model = keras.Model(
inputs=[text_input, metadata_input],
outputs=priority_output
)
# Visualize the model structure (requires pydot and graphviz)
# keras.utils.plot_model(multi_input_model, "multi_input_model.png", show_shapes=True)
A model architecture with two distinct input branches that are later merged.
When training this model using model.fit()
, you would provide input data as a list or dictionary matching the defined inputs:
# Dummy data generation (replace with actual data loading)
import numpy as np
num_samples = 100
dummy_text = np.random.randint(1, 10000, size=(num_samples, 50)) # Max sequence length 50
dummy_metadata = np.random.rand(num_samples, 5)
dummy_priority = np.random.randint(0, 2, size=(num_samples, 1))
# Training call structure (illustrative)
# multi_input_model.compile(optimizer='adam', loss='binary_crossentropy')
# multi_input_model.fit(
# {'text': dummy_text, 'metadata': dummy_metadata}, # Input dictionary
# {'priority': dummy_priority}, # Output dictionary
# epochs=5,
# batch_size=32
# )
# Alternatively, use a list for inputs if the order is consistent:
# multi_input_model.fit([dummy_text, dummy_metadata], dummy_priority, ...)
Similarly, a model might need to predict multiple things from the same input. For instance, an image analysis model could classify the main object and predict its bounding box coordinates.
The approach mirrors the multi-input case:
1. Define a shared base of layers that processes the input.
2. Add a separate output layer (head) for each prediction task.
3. Define the Model with the input(s) and a list of outputs.
# Input
image_input = keras.Input(shape=(128, 128, 3), name='image')
# Shared convolutional base
x = layers.Conv2D(32, 3, activation='relu')(image_input)
x = layers.MaxPooling2D(2)(x)
x = layers.Conv2D(64, 3, activation='relu')(x)
x = layers.MaxPooling2D(2)(x)
base_output = layers.Flatten()(x) # Common features
# Branch 1: Classification head
class_output = layers.Dense(10, activation='softmax', name='class_label')(base_output)
# Branch 2: Bounding box regression head
bbox_output = layers.Dense(4, name='bounding_box')(base_output) # 4 coords: x, y, width, height
# Create the model
multi_output_model = keras.Model(
inputs=image_input,
outputs=[class_output, bbox_output]
)
# Visualize
# keras.utils.plot_model(multi_output_model, "multi_output_model.png", show_shapes=True)
A model with a shared convolutional base splitting into two output heads: classification and regression.
When compiling this model, you typically provide separate loss functions and potentially loss weights for each output.
# Dummy data
num_samples = 100
dummy_images = np.random.rand(num_samples, 128, 128, 3)
dummy_classes = np.random.randint(0, 10, size=(num_samples, 1))
dummy_classes_one_hot = tf.keras.utils.to_categorical(dummy_classes, num_classes=10)
dummy_bboxes = np.random.rand(num_samples, 4)
# Compile with multiple losses and potentially weights
# multi_output_model.compile(
# optimizer='adam',
# loss={
# 'class_label': 'categorical_crossentropy',
# 'bounding_box': 'mse' # Mean Squared Error for regression
# },
# loss_weights={'class_label': 1.0, 'bounding_box': 0.5} # Example weighting
# )
# Training call structure (illustrative)
# multi_output_model.fit(
# {'image': dummy_images},
# {'class_label': dummy_classes_one_hot, 'bounding_box': dummy_bboxes},
# epochs=5,
# batch_size=16
# )
# Alternatively, use a list for outputs if the order is consistent:
# multi_output_model.fit(dummy_images, [dummy_classes_one_hot, dummy_bboxes], ...)
The Functional API naturally supports layer sharing. You simply instantiate a layer once and call it multiple times on different inputs. The layer reuses the same set of weights for each call. This is common in models like Siamese networks or when applying the same processing to different inputs.
# Input tensors for two text sequences
input_a = keras.Input(shape=(None,), dtype='int32', name='text_a')
input_b = keras.Input(shape=(None,), dtype='int32', name='text_b')
# Shared embedding layer
shared_embedding = layers.Embedding(input_dim=10000, output_dim=128, name='shared_embed')
# Apply the shared layer to both inputs
encoded_a = shared_embedding(input_a)
encoded_b = shared_embedding(input_b)
# Example: Calculate cosine similarity after some processing (e.g., LSTM)
lstm_layer = layers.LSTM(64, name='lstm') # Can also be shared if needed
vector_a = lstm_layer(encoded_a)
# To share LSTM weights: vector_b = lstm_layer(encoded_b)
# To use separate LSTM weights:
lstm_layer_b = layers.LSTM(64, name='lstm_b')
vector_b = lstm_layer_b(encoded_b)
# Example output: Cosine similarity (requires custom layer or tf operations)
# For illustration, just concatenate
concatenated_vectors = layers.Concatenate()([vector_a, vector_b])
output = layers.Dense(1, activation='sigmoid', name='similarity')(concatenated_vectors)
# Create the model
shared_layer_model = keras.Model(inputs=[input_a, input_b], outputs=output)
# Visualize
# keras.utils.plot_model(shared_layer_model, "shared_layer_model.png", show_shapes=True)
A model using a single Embedding layer instance (shared_embed) for two different text inputs. Note that separate LSTMs are used here for illustration, but they could also be shared.
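If you want an actual cosine similarity score rather than the concatenation used above, one built-in option is the Dot layer with normalize=True, which L2-normalizes the two encoded vectors before taking their dot product. A minimal sketch building on the tensors defined above (the names cosine_similarity and similarity_model are illustrative):
# Cosine similarity between the two encoded sequence vectors
# Dot(axes=1, normalize=True) normalizes each vector, so the dot product equals the cosine similarity
cosine_similarity = layers.Dot(axes=1, normalize=True, name='cosine_similarity')(
    [vector_a, vector_b]
)
similarity_model = keras.Model(inputs=[input_a, input_b], outputs=cosine_similarity)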
A prevalent pattern in deep learning is the residual connection (or skip connection), famously used in ResNet architectures. It involves adding the input of a block of layers to its output, helping gradients flow more easily during training and enabling deeper networks.
# Input
input_tensor = keras.Input(shape=(32, 32, 3))
# Initial convolution
x = layers.Conv2D(64, 3, padding='same', activation='relu')(input_tensor)
# Residual Block
residual = x # Store the input to the block
x = layers.Conv2D(64, 3, padding='same', activation='relu')(x)
x = layers.Conv2D(64, 3, padding='same')(x) # No activation before adding
# Add the residual connection
x = layers.Add()([x, residual])
x = layers.Activation('relu')(x) # Apply activation after adding
# Final layers (example)
x = layers.GlobalAveragePooling2D()(x)
output_tensor = layers.Dense(10, activation='softmax')(x)
# Model
resnet_like_model = keras.Model(inputs=input_tensor, outputs=output_tensor)
# Visualize
# keras.utils.plot_model(resnet_like_model, "resnet_like_model.png", show_shapes=True)
A simplified model structure demonstrating a residual connection, where the output of Conv2D_Initial is added to the output of Conv2D_2.
The layers.Add()
layer performs element-wise addition of a list of tensors (which must have compatible shapes).
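If the block changes the number of filters (or the spatial size), the shapes no longer match and the residual must be projected first, typically with a 1x1 convolution, before the addition. This is the "projection shortcut" used in ResNet. A minimal sketch, reusing input_tensor from above (the other variable names are illustrative):
# Residual block where the filter count changes from 64 to 128
shortcut_in = layers.Conv2D(64, 3, padding='same', activation='relu')(input_tensor)
y = layers.Conv2D(128, 3, padding='same', activation='relu')(shortcut_in)
y = layers.Conv2D(128, 3, padding='same')(y)
# Project the residual to 128 channels with a 1x1 convolution so the shapes match
projected_residual = layers.Conv2D(128, 1, padding='same')(shortcut_in)
y = layers.Add()([y, projected_residual])
y = layers.Activation('relu')(y)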
The Functional API provides the flexibility needed to construct these sophisticated model architectures. While slightly more verbose than the Sequential
API for simple linear stacks, its ability to define complex graphs of layers makes it indispensable for advanced deep learning tasks. Once a Model
is defined using the Functional API, compiling it with losses/optimizers/metrics and training it with fit()
follows the same process you'll learn about in the next chapter.
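As a brief preview, and following the same illustrative (commented-out) style used above, compiling and training the small residual model defined earlier could look like this, with random data standing in for a real dataset:
# Illustrative compile/train structure for a Functional model (same workflow as Sequential)
# resnet_like_model.compile(
#     optimizer='adam',
#     loss='sparse_categorical_crossentropy',
#     metrics=['accuracy']
# )
# dummy_rgb = np.random.rand(32, 32, 32, 3)            # 32 images of shape 32x32x3
# dummy_labels = np.random.randint(0, 10, size=(32,))  # integer class labels 0-9
# resnet_like_model.fit(dummy_rgb, dummy_labels, epochs=1, batch_size=8)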