Now that we understand the fundamental concept of Recurrent Neural Networks processing sequences step-by-step while maintaining a memory, let's look at the simplest implementation available in Keras: the SimpleRNN layer. This layer serves as a good starting point for understanding recurrent operations before moving to more complex variants.
The SimpleRNN layer processes sequences by iterating through the time steps of the input sequence. At each time step $t$, it takes the input for that step, $x_t$, and the hidden state from the previous step, $h_{t-1}$, to compute the current hidden state, $h_t$, and optionally an output, $y_t$.
The core operation can be conceptually represented by the following update rule for the hidden state $h_t$:

$$h_t = \text{activation}(W_{xh} x_t + W_{hh} h_{t-1} + b_h)$$

And if an output is produced at each step (controlled by a parameter we'll discuss soon), it's typically calculated as:

$$y_t = \text{activation}_{out}(W_{hy} h_t + b_y)$$

Where:

- $x_t$ is the input vector at time step $t$.
- $h_{t-1}$ is the hidden state carried over from the previous time step.
- $W_{xh}$, $W_{hh}$, and $W_{hy}$ are weight matrices, and $b_h$, $b_y$ are bias vectors, all learned during training.
- activation is the activation function applied to the hidden state; for SimpleRNN, the default hidden state activation is hyperbolic tangent ('tanh').

The key idea is the reuse of the weight matrices ($W_{xh}$, $W_{hh}$, $W_{hy}$) and biases ($b_h$, $b_y$) across all time steps. This parameter sharing makes RNNs efficient for sequences of varying lengths and allows them to generalize patterns learned at one point in the sequence to others.
Think of the hidden state $h_t$ as the network's memory at time step $t$. It captures information from all previous steps ($0$ to $t-1$) that the network deems relevant for processing the current step $t$ and future steps.
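To make the update rule concrete, here is a minimal NumPy sketch of the recurrence. The sizes, initialization, and variable names are illustrative only, not the actual Keras internals:

import numpy as np

timesteps, input_dim, units = 10, 8, 32          # illustrative sizes

# Parameters shared across every time step
W_xh = np.random.randn(input_dim, units) * 0.01  # input-to-hidden weights
W_hh = np.random.randn(units, units) * 0.01      # hidden-to-hidden weights
b_h = np.zeros(units)                            # hidden bias

inputs = np.random.randn(timesteps, input_dim)   # one example sequence
h_t = np.zeros(units)                            # initial hidden state

hidden_states = []
for x_t in inputs:
    # h_t = tanh(x_t W_xh + h_{t-1} W_hh + b_h)
    h_t = np.tanh(x_t @ W_xh + h_t @ W_hh + b_h)
    hidden_states.append(h_t)

final_state = hidden_states[-1]  # analogous to what SimpleRNN returns by default
print(final_state.shape)         # (32,)

Notice that the same three parameter tensors are applied at every iteration of the loop; that is exactly the parameter sharing described above.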
SimpleRNN in Keras

You can add a SimpleRNN layer to your Keras models just like any other layer. It's available under keras.layers.SimpleRNN.
import keras
from keras import layers
# Example: Adding a SimpleRNN layer to a Sequential model
model = keras.Sequential()
# Define the input shape: (timesteps, features_per_timestep)
# For example, a sequence of 10 time steps, each with 8 features.
model.add(keras.Input(shape=(10, 8)))
# Add a SimpleRNN layer with 32 units (dimensionality of hidden state/output)
# By default, it only returns the output of the *last* time step.
model.add(layers.SimpleRNN(units=32))
# You might add Dense layers after the RNN for classification/regression
model.add(layers.Dense(units=1, activation='sigmoid')) # Example for binary classification
model.summary()
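The summary also gives a quick check on the parameter sharing discussed earlier. The SimpleRNN layer reports 1,312 trainable parameters, which matches the shared weights in the update rule: $W_{xh}$ is $8 \times 32$ (256 weights), $W_{hh}$ is $32 \times 32$ (1,024 weights), and $b_h$ has 32 entries, so $256 + 1024 + 32 = 1312$. The Dense layer adds just $32 + 1 = 33$ more.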
Let's go over the main arguments of the SimpleRNN layer:

- units: This is the most important argument. It defines the dimensionality of the hidden state and, consequently, the output space if return_sequences=False. Choosing the right number of units is problem-dependent and often requires experimentation. A larger number allows the network to potentially store more complex patterns but increases computational cost and the risk of overfitting.
- activation: Specifies the activation function to use for the hidden state computation. The default is 'tanh' (hyperbolic tangent), which outputs values between -1 and 1. Other activations like 'relu' can sometimes be used, but 'tanh' is traditional for simple RNNs.
- input_shape: As with other Keras layers, you need to specify the shape of the input for the first layer in your model. For RNNs, this shape is typically a tuple (timesteps, features), where timesteps is the length of the sequence (it can be None for variable-length sequences) and features is the number of features at each time step. For example, text data might have (sequence_length, embedding_dimension), while time series might have (time_periods, num_sensor_readings).
- return_sequences: This boolean argument controls what the layer outputs (a short shape comparison follows this list):
  - False (default): The layer only outputs the hidden state for the final time step. The output shape will be (batch_size, units). This is common when you only need a summary of the entire sequence, for example, before a final Dense layer for classification.
  - True: The layer outputs the hidden state for every time step in the sequence. The output shape will be (batch_size, timesteps, units). This is necessary if you want to stack multiple RNN layers (the next RNN layer needs a sequence as input) or if you are building sequence-to-sequence models (like machine translation or time series forecasting where you predict values at each step).
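To see the difference in output shapes directly, here is a small sketch that applies both settings to a random batch; the array sizes are arbitrary:

import numpy as np
import keras
from keras import layers

# A random batch of 4 sequences, each with 10 time steps of 8 features
x = np.random.random((4, 10, 8)).astype("float32")

# return_sequences=False (the default): one hidden state per sequence
last_only = layers.SimpleRNN(units=32)(x)
print(last_only.shape)  # (4, 32)

# return_sequences=True: a hidden state for every time step
full_sequence = layers.SimpleRNN(units=32, return_sequences=True)(x)
print(full_sequence.shape)  # (4, 10, 32)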
To stack SimpleRNN layers, all preceding RNN layers must return their full sequence of outputs.
import keras
from keras import layers
# Example: Stacking SimpleRNN layers
model = keras.Sequential(name="Stacked_SimpleRNN")
model.add(keras.Input(shape=(None, 10))) # Variable length sequence, 10 features per step
# First SimpleRNN layer: MUST return sequences to feed the next RNN layer
model.add(layers.SimpleRNN(units=64, return_sequences=True))
# Second SimpleRNN layer: Can return only the last output if it's the final RNN layer
# before a Dense layer, or return sequences if followed by another RNN.
model.add(layers.SimpleRNN(units=32, return_sequences=False)) # Only last output
# Add a Dense layer for classification
model.add(layers.Dense(1, activation='sigmoid'))
model.summary()
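In the summary for this stacked model, the first SimpleRNN reports an output shape of (None, None, 64), one 64-dimensional hidden state per time step, while the second reports (None, 32), a single vector summarizing each sequence, which the final Dense layer maps to one prediction.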
Capabilities and Limitations of SimpleRNN
The SimpleRNN layer is conceptually straightforward and provides a basic mechanism for processing sequential information. It can work reasonably well for tasks where the relevant information is contained within relatively short-term dependencies in the sequence.
However, SimpleRNN suffers from a significant limitation known as the vanishing gradient problem. During backpropagation, gradients can become exponentially smaller as they are propagated back through time. This makes it very difficult for the network to learn connections between events that are far apart in the sequence. Effectively, the network struggles to "remember" information from many time steps ago. Conversely, gradients can also explode (become very large), though this is often easier to manage with techniques like gradient clipping.
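If exploding gradients do become an issue, Keras optimizers support clipping directly through the clipnorm and clipvalue arguments. The values below are illustrative rather than recommendations:

import keras

# Clip each gradient tensor so its L2 norm is at most 1.0 (illustrative value)
optimizer = keras.optimizers.Adam(learning_rate=1e-3, clipnorm=1.0)

# Alternatively, clip each gradient element to the range [-0.5, 0.5]:
# optimizer = keras.optimizers.Adam(learning_rate=1e-3, clipvalue=0.5)

# 'model' here refers to the stacked SimpleRNN model defined above
model.compile(optimizer=optimizer, loss="binary_crossentropy", metrics=["accuracy"])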
Because of the vanishing gradient issue, SimpleRNN is often less effective than more advanced recurrent layers like LSTMs and GRUs for tasks requiring the capture of long-range dependencies, which are common in natural language processing, complex time series analysis, and more. We will explore these more powerful layers in the upcoming sections.