Now that we understand the conceptual mechanics of a simple Recurrent Neural Network and how it learns via Backpropagation Through Time, let's translate this into practice using popular deep learning frameworks. Writing RNNs from scratch using only basic matrix operations is instructive but rarely done in practice. Frameworks like TensorFlow (using the Keras API) and PyTorch provide highly optimized, pre-built layers that handle the complexities of state management and gradient calculation for us.
These high-level APIs allow us to define recurrent layers with just a few lines of code, specifying the essential configuration parameters without needing to implement the underlying recurrence relation or BPTT manually. This abstraction significantly speeds up development and allows us to focus on model architecture and application.
tf.keras.layers.SimpleRNN
TensorFlow, through its high-level Keras API, provides the SimpleRNN layer. This layer implements the basic recurrent cell structure we discussed previously.
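As a quick refresher on what this layer computes, the per-step update of the simple recurrent cell can be written as follows (the symbol names are just one common notation for illustration, not identifiers from the Keras API):

h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h)

where x_t is the input at time step t, h_{t-1} is the previous hidden state, and tanh is the default activation.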
To use it, you typically import it and instantiate it within a Keras model (usually a Sequential model or using the Functional API).
import tensorflow as tf
# Define a SimpleRNN layer
# 'units' specifies the dimensionality of the hidden state and output space.
rnn_layer = tf.keras.layers.SimpleRNN(units=64)
# Example: Adding it as the first layer in a Sequential model
# Requires specifying the input shape: (time_steps, features)
# Note: Batch size is usually omitted here, handled implicitly.
model = tf.keras.Sequential([
    tf.keras.layers.SimpleRNN(units=32, input_shape=(10, 5))  # 10 time steps, 5 features per step
    # ... potentially add more layers like Dense for output
])
# You can inspect the layer configuration
# print(rnn_layer.get_config())
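The text above also mentions the Functional API. The following is a minimal sketch of that usage; the Dense output head and the (10, 5) input shape are illustrative assumptions, not part of the original example.

import tensorflow as tf

# Functional API sketch (assumed input shape and output head, for illustration only)
inputs = tf.keras.Input(shape=(10, 5))                        # 10 time steps, 5 features per step
x = tf.keras.layers.SimpleRNN(units=32)(inputs)               # output shape: (batch_size, 32)
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(x)   # e.g., a binary classification head
functional_model = tf.keras.Model(inputs=inputs, outputs=outputs)
# functional_model.summary()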
Key parameters for tf.keras.layers.SimpleRNN:

units: This is a required positive integer defining the number of hidden units in the RNN cell. It also determines the dimension of the output vector produced by the layer at each time step (if return_sequences=True) or at the final time step (if return_sequences=False). Think of this as the size of the network's "memory".

activation: The activation function applied to the hidden state. The default is 'tanh', which is commonly used in simple RNNs because its output range (-1 to 1) can help mitigate exploding gradients to some extent compared to ReLU, although it is still susceptible to vanishing gradients. Other options like 'relu' can also be used.

return_sequences: A boolean value (default: False). If False, the layer only returns the output from the last time step, with a shape of (batch_size, units). This is suitable when you only need a final summary of the sequence, for example in sequence classification. If True, the layer returns the full sequence of outputs for every time step, with a shape of (batch_size, time_steps, units). This is necessary when stacking multiple recurrent layers or for sequence-to-sequence tasks where an output is needed at each step. The difference in output shape is shown in the short sketch after this list.

return_state: A boolean value (default: False). If True, the layer returns, in addition to the main output, the hidden state(s) from the final time step. For SimpleRNN, this is just one final hidden state tensor. This is less commonly needed for simple RNNs but is useful in more complex models like encoder-decoder architectures.

input_shape: Required only for the first layer in a Sequential model. It is a tuple specifying the shape of the input sequence, excluding the batch size; the format is typically (time_steps, features). For subsequent layers, Keras automatically infers the input shape.
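To make the effect of return_sequences concrete, here is a small sketch (the batch size of 4 and the input dimensions are assumptions chosen for illustration) that passes the same dummy batch through two SimpleRNN layers differing only in this flag:

import tensorflow as tf

# Assumed dummy batch: 4 sequences, 10 time steps, 5 features per step
dummy_batch = tf.random.normal((4, 10, 5))

last_only = tf.keras.layers.SimpleRNN(units=32)                         # return_sequences=False (default)
full_seq = tf.keras.layers.SimpleRNN(units=32, return_sequences=True)   # output at every time step

print(last_only(dummy_batch).shape)   # (4, 32)     -> only the final time step
print(full_seq(dummy_batch).shape)    # (4, 10, 32) -> one output per time step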
torch.nn.RNN
PyTorch provides the torch.nn.RNN module to create simple recurrent layers.
import torch
import torch.nn as nn
# Define an RNN layer
# input_size: Number of features in the input x at each time step
# hidden_size: Number of features in the hidden state h (equivalent to 'units' in Keras)
# batch_first=True makes input/output tensors shape (batch, seq_len, features)
rnn_layer = nn.RNN(input_size=10, hidden_size=20, batch_first=True)
# Example usage with dummy input:
# Input shape: (batch_size, sequence_length, input_features)
batch_size = 5
sequence_length = 15
input_features = 10
dummy_input = torch.randn(batch_size, sequence_length, input_features)
# Initialize hidden state (optional, defaults to zeros if not provided)
# Shape: (num_layers * num_directions, batch_size, hidden_size) -> (1, 5, 20) for this layer
initial_hidden_state = torch.randn(1, batch_size, 20)
# Pass input and initial hidden state through the layer
# Output: contains output features for each time step
# Final hidden state: contains the hidden state for the last time step
output, final_hidden_state = rnn_layer(dummy_input, initial_hidden_state)
print("Input shape:", dummy_input.shape) # torch.Size([5, 15, 10])
print("Output shape:", output.shape) # torch.Size([5, 15, 20])
print("Final hidden state shape:", final_hidden_state.shape) # torch.Size([1, 5, 20])
Key parameters for torch.nn.RNN:

input_size: The number of expected features in the input x for each time step.

hidden_size: The number of features in the hidden state h. This defines the dimensionality of the layer's internal memory and its output.

num_layers: The number of recurrent layers stacked vertically (default: 1). We'll discuss stacking later, but this allows creating deeper RNNs easily.

nonlinearity: The activation function, either 'tanh' (the default) or 'relu'. Similar considerations apply as with Keras's activation.

batch_first: A boolean value (default: False). This is an important parameter affecting the expected shape of input and output tensors. If False, inputs and outputs are expected in the shape (sequence_length, batch_size, features). If True, inputs and outputs use the shape (batch_size, sequence_length, features), which is often more convenient as it aligns with how data is typically loaded and processed by other layer types. It's generally recommended to set batch_first=True for consistency.

bias: Whether to include bias terms in the calculations (default: True).

When you pass an input tensor through a PyTorch RNN layer, it returns two outputs:

output: A tensor containing the output hidden state h_t for each time step from the final recurrent layer. If batch_first=True, its shape is (batch_size, sequence_length, hidden_size).

h_n: A tensor containing the final hidden state(s) at t = sequence_length (the last time step) for each layer in the stack. Its shape is (num_layers * num_directions, batch_size, hidden_size); even for a single layer (num_layers=1), it has this 3D shape.

Note that unlike Keras, where return_sequences controls the output shape, PyTorch's nn.RNN always provides the full sequence of outputs in the output tensor. If you only need the final time step's output (equivalent to Keras's return_sequences=False), you would manually select it from the output tensor (e.g., output[:, -1, :] if batch_first=True), as shown in the sketch below. The h_n output directly gives you the final hidden state.
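As a quick check of these points, the following sketch reuses the shapes from the earlier example (a batch of 5 sequences, 15 time steps, 10 features, carried over as assumptions), selects the last time step from output, and confirms it matches the final hidden state in h_n for a single-layer, unidirectional RNN:

import torch
import torch.nn as nn

rnn_layer = nn.RNN(input_size=10, hidden_size=20, batch_first=True)
dummy_input = torch.randn(5, 15, 10)   # (batch_size, sequence_length, input_features)

output, h_n = rnn_layer(dummy_input)   # initial hidden state defaults to zeros

# Keras-style return_sequences=False: select the last time step manually
last_step = output[:, -1, :]           # shape: (5, 20)

# For a single-layer, unidirectional RNN this equals the final hidden state
print(torch.allclose(last_step, h_n[0]))   # True

# With num_layers=2, h_n holds one final state per stacked layer
stacked = nn.RNN(input_size=10, hidden_size=20, num_layers=2, batch_first=True)
_, h_n_stacked = stacked(dummy_input)
print(h_n_stacked.shape)               # torch.Size([2, 5, 20])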
Using these high-level APIs in TensorFlow/Keras and PyTorch allows us to readily incorporate simple RNN capabilities into our models. The next step involves understanding how to correctly shape our input data and manage the output shapes produced by these layers, which we will cover in the following section.