
Model Summary and Visualization

Once you have defined the architecture of your neural network using either the Sequential or Functional API, it's often helpful to inspect its structure. Understanding the layers, their connections, the shape of the data as it flows through the network, and the number of parameters is important for debugging, verifying your design, and estimating computational complexity. Keras provides convenient utilities for this purpose.

Using model.summary()

The most straightforward way to get an overview of your model is the summary() method. It provides a text-based description of the model, layer by layer. Let's consider a simple Sequential model:

import keras
from keras import layers

# Define a simple Sequential model
model = keras.Sequential(
    [
        keras.Input(shape=(784,), name="input_layer"), # Input layer specifying the shape
        layers.Dense(128, activation="relu", name="hidden_layer_1"),
        layers.Dense(64, activation="relu", name="hidden_layer_2"),
        layers.Dense(10, activation="softmax", name="output_layer"),
    ],
    name="simple_classifier",
)

# Print the model summary
model.summary()

Running model.summary() will output something like this:

Model: "simple_classifier"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Layer (type)                    ┃ Output Shape              ┃    Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ hidden_layer_1 (Dense)          │ (None, 128)               │    100,480 │
├─────────────────────────────────┼───────────────────────────┼────────────┤
│ hidden_layer_2 (Dense)          │ (None, 64)                │      8,256 │
├─────────────────────────────────┼───────────────────────────┼────────────┤
│ output_layer (Dense)            │ (None, 10)                │        650 │
└─────────────────────────────────┴───────────────────────────┴────────────┘
 Total params: 109,386 (427.29 KB)
 Trainable params: 109,386 (427.29 KB)
 Non-trainable params: 0 (0.00 B)

Let's break down this output:

Model Name: The name given to the model (simple_classifier in this case).
Layer Name and Type: Each row corresponds to a layer in the model, showing its user-defined name (if provided, otherwise Keras generates one) and its type (e.g., Dense). Note that the Input object itself isn't listed as a layer in the summary table, but the input shape is used to calculate the parameters of the first connected layer.
Output Shape: This column shows the shape of the output tensor produced by each layer. The None dimension represents the batch size, which is left unspecified so that the model can accept batches of any size. For example, (None, 128) means the first Dense layer outputs a tensor where each sample in the batch has a shape of (128,).
Param #: This shows the total number of parameters (weights and biases) in each layer, whether trainable or not. Let's see how these are calculated for Dense layers:
- hidden_layer_1: Input shape is (784,). The layer has 128 units. The number of weights is $\text{input\_units} \times \text{layer\_units} = 784 \times 128 = 100352$. Each unit also has a bias term, so total biases = 128. Total parameters = $100352 + 128 = 100480$.
- hidden_layer_2: Input shape is the output shape of the previous layer, (128,). The layer has 64 units. Weights = $128 \times 64 = 8192$. Biases = 64. Total parameters = $8192 + 64 = 8256$.
- output_layer: Input shape is (64,). The layer has 10 units. Weights = $64 \times 10 = 640$. Biases = 10. Total parameters = $640 + 10 = 650$.
Total Params: The sum of parameters across all layers. The value in parentheses is the approximate memory those parameters occupy.
Trainable Params: The number of parameters that will be updated during training via backpropagation.
Non-trainable Params: Parameters that are not updated during training (e.g., parameters in a frozen layer, often used in transfer learning).
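The arithmetic above can be reproduced in a few lines of plain Python. This is a minimal sketch rather than anything Keras runs internally: the helper dense_param_count and the layer_dims dictionary are names introduced here for illustration, and the memory estimate assumes each parameter is stored as a 4-byte float32 value.

```python
def dense_param_count(input_units: int, layer_units: int) -> int:
    """Parameters in a Dense layer: one weight per input/unit pair plus one bias per unit."""
    return input_units * layer_units + layer_units


# (input_units, layer_units) for each Dense layer of simple_classifier
layer_dims = {
    "hidden_layer_1": (784, 128),
    "hidden_layer_2": (128, 64),
    "output_layer": (64, 10),
}

param_counts = {
    name: dense_param_count(n_in, n_out) for name, (n_in, n_out) in layer_dims.items()
}
total_params = sum(param_counts.values())

# Assumption: parameters are stored as float32 (4 bytes each).
BYTES_PER_PARAM = 4
total_kb = total_params * BYTES_PER_PARAM / 1024

for name, count in param_counts.items():
    print(f"{name}: {count:,}")
print(f"Total: {total_params:,} ({total_kb:.2f} KB)")
```

Under the float32 assumption, the totals agree with the figures in the summary table: 109,386 parameters and about 427.29 KB.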

The summary() method is invaluable for quickly checking if your layers are connected as expected, if the output shapes make sense, and for getting a feel for the model's size.
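The same numbers can also be read programmatically, which is handy in tests or scripts. The sketch below rebuilds the classifier, freezes hidden_layer_1 purely for illustration (the original example does not freeze anything), and counts trainable and non-trainable parameters. The helper count_weights is a name introduced here, and the code assumes Keras 3, where each weight's shape is a tuple of integers.

```python
import math

import keras
from keras import layers

model = keras.Sequential(
    [
        keras.Input(shape=(784,), name="input_layer"),
        layers.Dense(128, activation="relu", name="hidden_layer_1"),
        layers.Dense(64, activation="relu", name="hidden_layer_2"),
        layers.Dense(10, activation="softmax", name="output_layer"),
    ],
    name="simple_classifier",
)

# Illustration only: freeze the first hidden layer so its weights are not trained.
model.get_layer("hidden_layer_1").trainable = False


def count_weights(weights) -> int:
    """Sum the number of scalar values across a list of weight variables."""
    return sum(math.prod(w.shape) for w in weights)


total = model.count_params()
trainable = count_weights(model.trainable_weights)
non_trainable = count_weights(model.non_trainable_weights)

print(f"Total: {total:,}")
print(f"Trainable: {trainable:,}")
print(f"Non-trainable: {non_trainable:,}")
```

With the first layer frozen, its 100,480 parameters move to the non-trainable count while the total stays the same. Calling model.summary() afterwards should reflect the same split.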

Visualizing Model Architecture

While summary() is useful, a visual diagram can often provide a clearer picture of the model's structure, especially for more complex architectures built with the Functional API involving multiple inputs, outputs, or shared layers. Keras provides the keras.utils.plot_model function for this.

To use plot_model, you need the pydot Python package and the Graphviz software. You can typically install the Python packages using pip:

pip install pydot graphviz

(Note: The graphviz package on PyPI is only a Python interface. plot_model also needs the Graphviz binaries, including the dot executable, which you usually have to install separately on your operating system. Check the Graphviz documentation for installation instructions.)
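Before calling plot_model, it can save time to confirm that both pieces are present. The following is a small sketch using only the standard library. The function name plot_model_dependencies is introduced here for illustration, and the check assumes the Graphviz executable is named dot and is discoverable on your PATH.

```python
import importlib.util
import shutil


def plot_model_dependencies() -> dict:
    """Report whether the pydot package and the Graphviz 'dot' executable can be found."""
    return {
        "pydot": importlib.util.find_spec("pydot") is not None,
        "graphviz_dot": shutil.which("dot") is not None,
    }


status = plot_model_dependencies()
for name, available in status.items():
    print(f"{name}: {'found' if available else 'missing'}")
```

If either entry is reported as missing, install it before plotting.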

Once the dependencies are ready, you can plot your model:

# Assuming the 'model' variable holds the Keras model defined earlier
keras.utils.plot_model(
    model,
    to_file="simple_classifier_model.png", # Save the plot to a file
    show_shapes=True,             # Display shape information
    show_layer_names=True,        # Display layer names
    show_layer_activations=True,  # Display activation functions
    rankdir="TB"                  # Orientation: TB=Top-to-Bottom, LR=Left-to-Right
)

This code generates an image file (simple_classifier_model.png) containing a diagram of the network. The show_shapes, show_layer_names, and show_layer_activations arguments add useful details to the nodes in the graph.

Figure: A diagram of the simple classifier model as plot_model might render it. Nodes show layer names, types, output shapes, and activations. Arrows indicate the flow of data.
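For a text version of the same graph, the sketch below writes a Graphviz DOT description of the classifier by hand. It is an approximation for illustration, not the DOT source that plot_model produces: the layers_info list and the to_dot function are names introduced here, and the node labels are a simplified layout.

```python
# (name, type, activation, output shape) for each layer of simple_classifier
layers_info = [
    ("hidden_layer_1", "Dense", "relu", "(None, 128)"),
    ("hidden_layer_2", "Dense", "relu", "(None, 64)"),
    ("output_layer", "Dense", "softmax", "(None, 10)"),
]


def to_dot(model_name: str, layers_info: list, rankdir: str = "TB") -> str:
    """Build a DOT digraph with one box per layer and edges between consecutive layers."""
    lines = [f'digraph "{model_name}" {{', f"  rankdir={rankdir};", "  node [shape=box];"]
    for name, kind, activation, out_shape in layers_info:
        # "\\n" emits a literal backslash-n, which DOT renders as a line break in the label.
        label = f"{name} ({kind})\\nactivation: {activation}\\noutput: {out_shape}"
        lines.append(f'  "{name}" [label="{label}"];')
    for src, dst in zip(layers_info, layers_info[1:]):
        lines.append(f'  "{src[0]}" -> "{dst[0]}";')
    lines.append("}")
    return "\n".join(lines)


dot_source = to_dot("simple_classifier", layers_info)
print(dot_source)
```

The resulting text can be saved to a .dot file and rendered with the Graphviz dot command, for example with dot -Tpng.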

Visualizing the model is particularly beneficial when working with the Functional API, as it clearly illustrates the connections between layers, which need not form a single chain. It helps confirm that you've connected the layers correctly, especially in models with branches or multiple inputs/outputs.
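As an illustration of such a branched model, here is a minimal sketch of a two-input network built with the Functional API. The input names, layer names, and sizes are hypothetical and chosen only for the example. The plot_model call is wrapped in a try block because, outside a notebook, it raises ImportError when pydot or Graphviz is unavailable.

```python
import os
import tempfile

import keras
from keras import layers

# Two hypothetical inputs, each processed by its own branch.
input_a = keras.Input(shape=(16,), name="numeric_features")
input_b = keras.Input(shape=(4,), name="category_features")

branch_a = layers.Dense(8, activation="relu", name="branch_a")(input_a)
branch_b = layers.Dense(8, activation="relu", name="branch_b")(input_b)

# Merge the branches and produce a single output.
merged = layers.Concatenate(name="merge")([branch_a, branch_b])
output = layers.Dense(1, activation="sigmoid", name="prediction")(merged)

model = keras.Model(inputs=[input_a, input_b], outputs=output, name="two_branch_model")
model.summary()

try:
    out_path = os.path.join(tempfile.gettempdir(), "two_branch_model.png")
    keras.utils.plot_model(model, to_file=out_path, show_shapes=True)
    print(f"Diagram written to {out_path}")
except ImportError as err:
    print(f"Skipping plot: {err}")
```

In the rendered diagram, the two input nodes should feed separate branches that meet at the merge layer, which is much easier to verify visually than from the summary table.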

Both summary() and plot_model are essential tools in your Keras toolbox for inspecting, understanding, and debugging the neural network architectures you build.
