Layers are the fundamental building blocks of Keras models, much like bricks in a wall. Each layer performs a specific transformation on the data passing through it. By stacking these layers in meaningful ways, you construct your neural network architecture. Keras provides a comprehensive set of pre-built layers, simplifying the implementation of even complex models. Let's look at some of the most frequently used ones.
The Dense layer, also known as a fully connected layer, is one of the most basic and common layer types. Each neuron in a Dense layer receives input from all neurons in the previous layer (hence "fully connected"). It computes a weighted sum of its inputs, adds a bias, and then typically applies an activation function.
Mathematically, the operation performed by a Dense layer can be represented as:
output = activation(dot(input, kernel) + bias)
Where:
- input represents the input tensor.
- kernel is the weights matrix created and managed by the layer.
- bias is a bias vector created and managed by the layer (unless use_bias is set to False).
- activation is the element-wise activation function applied.
Common uses for Dense layers include hidden layers in feed-forward networks and the output layer of a classification network, where a softmax activation outputs a probability for each class.
You configure a Dense layer primarily by specifying the number of output neurons (units) and the activation function.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
# Example: A Dense layer with 64 output units and ReLU activation
dense_layer = layers.Dense(units=64, activation='relu')
# Example: The final Dense layer for a 10-class classification problem
output_layer = layers.Dense(units=10, activation='softmax')
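To connect this back to the formula above, the following minimal sketch (assuming the imports shown earlier, plus NumPy) reproduces the Dense computation by hand and checks it against the layer's own output; the shapes and unit counts are arbitrary illustrations:
import numpy as np
# Run a small dummy batch through a Dense layer, then redo the math manually
x = np.random.rand(2, 8).astype("float32")       # 2 samples, 8 features each
dense = layers.Dense(units=4, activation='relu')
y_layer = dense(x).numpy()                       # calling the layer creates kernel and bias
kernel, bias = dense.get_weights()               # kernel shape (8, 4), bias shape (4,)
y_manual = np.maximum(x @ kernel + bias, 0.0)    # relu(dot(input, kernel) + bias)
print(np.allclose(y_layer, y_manual))            # True (up to floating-point rounding)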
Convolutional layers, particularly Conv2D, are the cornerstone of modern computer vision models. They are designed specifically to process grid-like data, such as images (2D grids of pixels).
Instead of being fully connected, a Conv2D layer uses learnable filters (or kernels) that slide across the input's spatial dimensions (height and width), applying a convolution operation. Each filter detects specific local patterns (like edges, corners, or textures) in its receptive field. The layer outputs feature maps, where each map corresponds to the responses of one filter across the input.
Key parameters for Conv2D:
- filters: The number of output filters (feature maps) the layer should learn.
- kernel_size: An integer or tuple specifying the height and width of the convolutional window (filter). Common sizes are (3, 3) or (5, 5).
- strides: An integer or tuple specifying the step size the kernel takes as it slides across the input. The default is (1, 1).
- padding: Either 'valid' (no padding) or 'same' (padding is added as needed so that, with strides of 1, the output has the same spatial dimensions as the input).
- activation: The activation function to use after the convolution.
Convolutional layers excel at capturing spatial hierarchies; layers closer to the input learn simple patterns, while deeper layers combine these to learn more complex structures.
# Example: A 2D convolutional layer
conv_layer = layers.Conv2D(filters=32, kernel_size=(3, 3), activation='relu', padding='same')
# Example for input shape (batch_size, height, width, channels)
# Assuming input shape is (None, 28, 28, 1) for MNIST images
input_shape = (28, 28, 1)
model = keras.Sequential([
layers.Input(shape=input_shape),
layers.Conv2D(filters=32, kernel_size=(3, 3), activation='relu')
# ... other layers
])
# print(model.summary()) # Would show output shape
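To see how kernel_size, strides, and padding interact to determine the output shape, a quick check like the one below can help (this assumes the imports above; the input is just a dummy 28x28 grayscale tensor):
x = tf.zeros((1, 28, 28, 1))                                           # one dummy 28x28x1 image
print(layers.Conv2D(32, (3, 3), padding='valid')(x).shape)             # (1, 26, 26, 32)
print(layers.Conv2D(32, (3, 3), padding='same')(x).shape)              # (1, 28, 28, 32)
print(layers.Conv2D(32, (3, 3), strides=2, padding='same')(x).shape)   # (1, 14, 14, 32)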
Pooling layers are often used in conjunction with convolutional layers to downsample the feature maps. MaxPooling2D is a common choice. It reduces the spatial dimensions (height and width) of the input while retaining the most salient information (the maximum value) within each pooling window.
Benefits of Max Pooling:
- It shrinks the feature maps, which reduces the computation and number of parameters in subsequent layers.
- It provides a degree of local translation invariance, since small shifts of a feature within a pooling window do not change the maximum.
Key parameters for MaxPooling2D:
- pool_size: An integer or tuple specifying the size of the pooling window. A common choice is (2, 2).
- strides: An integer or tuple specifying the step size of the pooling window. If None, it defaults to pool_size.
- padding: 'valid' or 'same'.
# Example: A Max Pooling layer
max_pool_layer = layers.MaxPooling2D(pool_size=(2, 2))
# Often used after a Conv2D layer
model = keras.Sequential([
layers.Input(shape=(128, 128, 3)), # Example input: 128x128 RGB image
layers.Conv2D(filters=64, kernel_size=(3, 3), activation='relu', padding='same'),
layers.MaxPooling2D(pool_size=(2, 2)) # Output feature map size becomes 64x64
# ... other layers
])
# print(model.summary()) # Would show reduced dimensions
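As a quick illustration of the downsampling (again assuming the imports above and a dummy input), note that pooling shrinks height and width but leaves the channel dimension untouched:
x = tf.zeros((1, 128, 128, 3))                          # one dummy 128x128 RGB image
print(layers.MaxPooling2D(pool_size=(2, 2))(x).shape)   # (1, 64, 64, 3): spatial dims halved, channels unchanged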
The Flatten layer serves a simple but important purpose: it transforms a multi-dimensional tensor into a one-dimensional tensor (a vector). This is typically needed when transitioning from convolutional/pooling layers, which operate on spatial data, to Dense layers, which operate on vectors.
It takes the input tensor and reshapes it by unrolling all dimensions except the batch dimension (usually the first dimension). For example, an input of shape (batch_size, height, width, channels) would be flattened to (batch_size, height * width * channels).
# Example: Flattening the output of Conv/Pool layers before a Dense layer
model = keras.Sequential([
layers.Input(shape=(28, 28, 1)),
layers.Conv2D(32, (3,3), activation='relu'),
layers.MaxPooling2D((2,2)),
layers.Flatten(), # Flattens the 3D feature map to 1D
layers.Dense(10, activation='softmax') # Connect to a Dense layer
])
# print(model.summary()) # Shows the shape change after Flatten
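To follow the shape change concretely, you could trace a dummy batch through the model defined above. The loop below is a small sketch (the exact layer list can vary slightly between Keras versions, but the shape progression is the point):
x = tf.zeros((1, 28, 28, 1))                 # one dummy MNIST-sized image
for layer in model.layers:
    x = layer(x)
    print(layer.__class__.__name__, x.shape)
# Conv2D        (1, 26, 26, 32)
# MaxPooling2D  (1, 13, 13, 32)
# Flatten       (1, 5408)   <- 13 * 13 * 32 values unrolled into a single vector
# Dense         (1, 10)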
Dropout is a regularization technique used to prevent overfitting in neural networks. During training, the Dropout layer randomly sets a fraction of its input units to 0 at each update step. This forces the network to learn more robust features that are not overly reliant on any single neuron.
The key parameter is rate, which specifies the fraction of input units to drop (e.g., rate=0.25 means 25% of inputs are dropped).
Importantly, Dropout is only active during training. During evaluation or inference (model.evaluate() or model.predict()), the layer simply passes all inputs through unchanged. Keras compensates during training instead: the units that are kept are scaled up by 1 / (1 - rate), so the expected sum of the activations stays the same and no rescaling is needed at inference time.
# Example: Applying Dropout after a Dense layer
dropout_layer = layers.Dropout(rate=0.5) # Drops 50% of inputs during training
model = keras.Sequential([
layers.Input(shape=(784,)),
layers.Dense(128, activation='relu'),
layers.Dropout(0.3), # Apply 30% dropout
layers.Dense(64, activation='relu'),
layers.Dropout(0.3), # Apply 30% dropout
layers.Dense(10, activation='softmax')
])
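A quick way to observe this training-versus-inference behavior is to call a Dropout layer directly with the training flag set explicitly (a small sketch, assuming the imports above):
x = tf.ones((1, 10))
drop = layers.Dropout(rate=0.5)
print(drop(x, training=True))    # roughly half the values are 0.0, the rest are scaled up to 2.0
print(drop(x, training=False))   # identical to the input: dropout is a no-op at inference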
The LSTM (Long Short-Term Memory) layer is a type of Recurrent Neural Network (RNN) layer specifically designed to handle sequential data, such as time series or natural language. Standard RNNs can struggle to capture long-range dependencies in sequences due to the vanishing gradient problem.
LSTMs address this with an internal cell state and gating mechanisms (input gate, forget gate, output gate). These gates control the flow of information, allowing the LSTM to selectively remember relevant information over long periods and forget irrelevant details.
While the internal workings are complex, using an LSTM layer in Keras is straightforward. You primarily specify the number of units (the dimensionality of the output space, which is also the size of the internal hidden state).
# Example: An LSTM layer for sequence processing
lstm_layer = layers.LSTM(units=64)
# Often used for tasks like text classification or time series forecasting
# Input shape might be (batch_size, timesteps, features)
model = keras.Sequential([
layers.Input(shape=(None, 10)), # Variable sequence length, 10 features per step
layers.LSTM(32), # 32 LSTM units
layers.Dense(1, activation='sigmoid') # Example: binary classification output
])
# print(model.summary())
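As a rough sketch of the shapes involved (assuming the imports above), an LSTM consumes a batch of sequences and, by default, returns one vector per sequence:
x = tf.zeros((4, 20, 10))         # 4 sequences, 20 timesteps, 10 features per step
print(layers.LSTM(32)(x).shape)   # (4, 32): one 32-dimensional summary vector per sequence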
These are just some of the workhorse layers available in tf.keras.layers. Keras offers many others, including different types of recurrent layers (GRU, SimpleRNN), convolutional layers (Conv1D, Conv3D), normalization layers (BatchNormalization), and attention layers. Understanding these common layers provides a solid foundation for building a wide variety of neural network models using Keras. As you progress, you'll learn how to combine them creatively to tackle different machine learning problems.