Long Short-Term Memory (LSTM) cells are recurrent neural network units designed to handle sequence data effectively. They achieve this through an internal structure that includes forget, input, and output gates (f_t, i_t, o_t) and a separate cell state (c_t) for managing information over time. In deep learning frameworks, applying this architecture is streamlined by libraries such as TensorFlow (with its Keras API) and PyTorch. These libraries offer high-level abstractions, enabling the integration of LSTMs into models without requiring manual implementation of the gate logic.
Both frameworks provide pre-built LSTM layers that encapsulate the gate computations discussed earlier. Our focus therefore shifts from the gate equations themselves to correctly configuring and connecting these layers within a larger neural network.
TensorFlow, through its user-friendly Keras API, provides the tf.keras.layers.LSTM layer. Instantiating this layer is straightforward.
import tensorflow as tf
# Example: Creating an LSTM layer
lstm_layer = tf.keras.layers.LSTM(units=64)
In this example, units=64 specifies the dimensionality of the output space, which also corresponds to the size of the hidden state (h_t) and the cell state (c_t). Let's look at some significant parameters for the LSTM layer:
- units: (Required) Positive integer, dimensionality of the output space (and of the hidden/cell state).
- activation: Activation function applied to the candidate cell state and to the cell state when producing the hidden state output. Defaults to 'tanh', which keeps these values bounded between -1 and 1.
- recurrent_activation: Activation function used for the input, forget, and output gates. Defaults to 'sigmoid'. The sigmoid function is suitable here because gates output values between 0 and 1, representing proportions (e.g., how much to forget).
- return_sequences: Boolean. If True, the layer returns the full sequence of hidden states, one for each time step. If False (the default), it returns only the final hidden state. Returning the full sequence is necessary when stacking LSTM layers or when the output requires information from every time step (e.g., sequence-to-sequence tasks). The short check after this list illustrates the difference.
- return_state: Boolean. If True, the layer returns the last hidden state and the last cell state in addition to the outputs. This is useful for initializing the state of another LSTM layer, particularly in encoder-decoder architectures. Defaults to False.
- input_shape: (Optional, typically needed for the first layer in a Sequential model) A tuple specifying the shape of the input, excluding the batch size. For sequence data, this is usually (timesteps, features). For example, input_shape=(10, 32) means sequences of 10 time steps, each with 32 features.
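To make the effect of return_sequences concrete, here is a quick illustrative check on random data (the tensors and shapes are only an example):
import tensorflow as tf
# Same random input, two different return_sequences settings
x = tf.random.normal((4, 10, 8))  # (batch_size=4, timesteps=10, features=8)
print(tf.keras.layers.LSTM(64)(x).shape)                         # (4, 64): final hidden state only
print(tf.keras.layers.LSTM(64, return_sequences=True)(x).shape)  # (4, 10, 64): one hidden state per time step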
Input Shape: Keras LSTM layers expect input data as a 3D tensor of shape (batch_size, timesteps, features), where:
- batch_size: The number of sequences processed concurrently.
- timesteps: The length of each sequence.
- features: The number of features representing the input at each time step.
Output Shape:
- return_sequences=False (default): The output is a 2D tensor of shape (batch_size, units).
- return_sequences=True: The output is a 3D tensor of shape (batch_size, timesteps, units).
- return_state=True: The layer returns a list containing [outputs, final_hidden_state, final_cell_state]. The shape of outputs depends on return_sequences, while final_hidden_state and final_cell_state both have shape (batch_size, units).
Here's a minimal example of using an LSTM layer within a Keras Sequential model:
# Define sample input shape (e.g., 32 sequences, 10 time steps, 8 features)
batch_size = 32
timesteps = 10
features = 8
input_data = tf.random.normal((batch_size, timesteps, features))
# Create a simple model with one LSTM layer
model = tf.keras.Sequential([
tf.keras.layers.LSTM(units=64, input_shape=(timesteps, features), return_sequences=True),
# Potentially add more layers here
tf.keras.layers.Dense(1) # Example output layer
])
# Get the output
output = model(input_data)
print("Input shape:", input_data.shape)
# The LSTM uses return_sequences=True, so the Dense(1) layer is applied at every time step
print("LSTM Output shape:", model.layers[0].output.shape)  # (None, 10, 64); batch dim is symbolic here
print("Final Output shape:", output.shape)  # (32, 10, 1)
PyTorch provides the torch.nn.LSTM layer. Its initialization differs slightly from Keras but serves the same purpose.
import torch
import torch.nn as nn
# Example: Creating an LSTM layer
# input_size = number of features per time step
# hidden_size = number of units in the hidden/cell state
input_size = 8
hidden_size = 64
lstm_layer = nn.LSTM(input_size=input_size, hidden_size=hidden_size, batch_first=True)
Significant parameters for torch.nn.LSTM:
- input_size: (Required) The number of expected features in the input at each time step.
- hidden_size: (Required) The number of features in the hidden state (and the cell state). This corresponds to units in Keras.
- num_layers: Number of stacked recurrent layers. Defaults to 1. The short sketch after this list shows how stacking affects the state shapes.
- batch_first: Boolean. If True (recommended and common), the input and output tensors are provided as (batch_size, seq_len, features). If False (the default), the format is (seq_len, batch_size, features). Using batch_first=True is often more intuitive, matching how data is typically handled elsewhere in the pipeline and Keras' default layout.
- dropout: If non-zero, introduces a Dropout layer on the outputs of each LSTM layer except the last, with dropout probability equal to dropout. Defaults to 0.
- bidirectional: If True, becomes a bidirectional LSTM. Defaults to False. We will discuss this later.
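As a quick illustration of num_layers (an illustrative sketch with arbitrary names and sizes), stacking two LSTM layers changes the leading dimension of the returned states:
import torch
import torch.nn as nn
# Two stacked LSTM layers; dropout is applied between layer 1 and layer 2
stacked_lstm = nn.LSTM(input_size=8, hidden_size=64, num_layers=2,
                       dropout=0.2, batch_first=True)
x = torch.randn(4, 10, 8)  # (batch_size=4, seq_len=10, input_size=8)
output, (h_n, c_n) = stacked_lstm(x)
print(output.shape)  # (4, 10, 64): hidden states of the top layer only
print(h_n.shape)     # (2, 4, 64): one final hidden state per layer
print(c_n.shape)     # (2, 4, 64): one final cell state per layer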
Input Shape:
- batch_first=True: Input shape is (batch_size, seq_len, input_size).
- batch_first=False: Input shape is (seq_len, batch_size, input_size).
Output Shape: The nn.LSTM layer returns a tuple: (output, (h_n, c_n)).
- output: Contains the hidden state output (h_t) from the last layer of the LSTM for every time step.
  - batch_first=True: Shape is (batch_size, seq_len, num_directions * hidden_size).
  - batch_first=False: Shape is (seq_len, batch_size, num_directions * hidden_size). (num_directions is 2 if bidirectional=True, else 1.)
- h_n: Contains the final hidden state for each element in the batch. Shape is (num_layers * num_directions, batch_size, hidden_size).
- c_n: Contains the final cell state for each element in the batch. Shape is (num_layers * num_directions, batch_size, hidden_size).
Note that output in PyTorch always contains the hidden states for all time steps (similar to return_sequences=True in Keras). If you only need the final hidden state, you typically index into the output tensor (e.g., output[:, -1, :] if batch_first=True) or use h_n.
Here's a minimal PyTorch example:
# Define sample input shape (batch_first=True)
batch_size = 32
seq_len = 10
input_size = 8 # features
hidden_size = 64
input_data = torch.randn(batch_size, seq_len, input_size)
# Create an LSTM layer
lstm_layer = nn.LSTM(input_size=input_size, hidden_size=hidden_size, batch_first=True)
# Pass data through the layer
# We can optionally provide initial hidden/cell states (h_0, c_0)
# If not provided, they default to zeros.
output, (h_n, c_n) = lstm_layer(input_data)
print("Input shape:", input_data.shape)
print("Output shape (all timesteps):", output.shape) # (batch, seq_len, hidden_size)
print("Final hidden state shape (h_n):", h_n.shape) # (num_layers*num_directions, batch, hidden_size)
print("Final cell state shape (c_n):", c_n.shape) # (num_layers*num_directions, batch, hidden_size)
# To get only the last time step's output from the 'output' tensor:
last_step_output = output[:, -1, :]
print("Last time step output shape:", last_step_output.shape) # (batch, hidden_size)
By leveraging these high-level LSTM layers, we can easily incorporate the power of LSTMs into our sequence models. The frameworks handle the intricate gate calculations, allowing us to focus on the overall model architecture, parameter tuning (like the number of units or hidden_size), and preparing the data in the expected (batch, timesteps, features) format. The next sections will build upon this by exploring GRU layers, stacking recurrent layers, and implementing bidirectional processing.