Information follows a specific path through a Long Short-Term Memory (LSTM) cell during a single time step. Understanding this flow is essential for seeing how LSTMs maintain context over long sequences, effectively addressing the shortcomings of simple Recurrent Neural Networks (RNNs).
At each time step $t$, the LSTM cell receives three inputs:

- The current input vector $x_t$
- The previous hidden state $h_{t-1}$
- The previous cell state $C_{t-1}$

These inputs interact with the gates and the cell state to produce two outputs:

- The new hidden state $h_t$
- The new cell state $C_t$
Let's follow the data path:
The first step involves the forget gate ($f_t$). Its job is to decide which parts of the old cell state are no longer relevant and should be discarded. It looks at the previous hidden state $h_{t-1}$ and the current input $x_t$. A sigmoid activation function ($\sigma$) squashes the output to a value between 0 and 1 for each number in the cell state vector.

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$

Here, $[h_{t-1}, x_t]$ represents the concatenation of the two vectors. $W_f$ and $b_f$ are the weight matrix and bias vector for the forget gate, learned during training.
An output of 1 means "completely keep this information," while an output of 0 means "completely get rid of this information." This gate's output is then multiplied element-wise ($\odot$) with the previous cell state $C_{t-1}$.
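To make the shapes concrete, here is a minimal NumPy sketch of the forget gate. The 3-dimensional input, 4-unit hidden state, and random weights are hypothetical placeholders, not trained parameters.

```python
import numpy as np

def sigmoid(z):
    # Logistic function, squashing each entry to the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical sizes: 3 input features, 4 hidden units
rng = np.random.default_rng(0)
x_t = rng.normal(size=3)        # current input x_t
h_prev = rng.normal(size=4)     # previous hidden state h_{t-1}
C_prev = rng.normal(size=4)     # previous cell state C_{t-1}

W_f = rng.normal(size=(4, 7))   # forget-gate weights over [h_{t-1}, x_t]
b_f = np.zeros(4)               # forget-gate bias

concat = np.concatenate([h_prev, x_t])   # [h_{t-1}, x_t], length 4 + 3 = 7
f_t = sigmoid(W_f @ concat + b_f)        # one value in (0, 1) per cell-state entry

# Element-wise multiplication keeps, shrinks, or erases each memory entry
kept_memory = f_t * C_prev
```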
Next, the cell needs to determine what new information from the current input and previous hidden state should be added to the cell state. This involves two parts:

1. An input gate layer ($i_t$): a sigmoid layer that decides which values will be updated.
2. A candidate layer ($\tilde{C}_t$): a $\tanh$ layer that creates a vector of new candidate values that could be added to the state.

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$
$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$

$W_i, b_i$ and $W_C, b_C$ are the respective weights and biases for these layers. The $\tanh$ function outputs values between -1 and 1.
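In the same hedged style, a sketch of the input gate and candidate computation, again with hypothetical dimensions and random placeholder weights:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x_t = rng.normal(size=3)
h_prev = rng.normal(size=4)
concat = np.concatenate([h_prev, x_t])   # [h_{t-1}, x_t]

W_i = rng.normal(size=(4, 7))   # input-gate parameters (placeholder values)
b_i = np.zeros(4)
W_C = rng.normal(size=(4, 7))   # candidate-layer parameters (placeholder values)
b_C = np.zeros(4)

i_t = sigmoid(W_i @ concat + b_i)       # which candidate entries to admit, in (0, 1)
C_tilde = np.tanh(W_C @ concat + b_C)   # candidate values, in (-1, 1)
```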
Now we update the old cell state $C_{t-1}$ to the new cell state $C_t$. We combine the results from the forget gate and the input gate:

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$

This additive interaction is a significant difference from the repeated matrix multiplications in simple RNNs. It allows gradients to flow more easily through time during backpropagation, reducing the vanishing gradient problem. The cell state acts like a conveyor belt, carrying information along with only minor linear interactions (multiplication by $f_t$ and addition of $i_t \odot \tilde{C}_t$), making it easier to preserve context over many steps.
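A tiny worked example of this update, using hand-picked stand-in values for the gate outputs to show how forgetting and admitting combine element-wise:

```python
import numpy as np

# Toy values standing in for the gate outputs at one time step
f_t = np.array([1.0, 0.0, 0.5])       # keep, drop, halve the old memory
C_prev = np.array([2.0, -3.0, 4.0])   # previous cell state C_{t-1}
i_t = np.array([0.0, 1.0, 0.5])       # admit none, all, half of the candidate
C_tilde = np.array([0.9, -0.7, 0.2])  # candidate values from the tanh layer

# Additive update: old memory is scaled, new content is added
C_t = f_t * C_prev + i_t * C_tilde
print(C_t)   # [ 2.  -0.7  2.1]
```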
Finally, we need to decide what the hidden state $h_t$ (and potentially the output for this time step) should be. This output will be a filtered version of the cell state $C_t$. First, an output gate ($o_t$) decides which parts of the cell state to expose; then the cell state is pushed through $\tanh$ and scaled by the gate:

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$
$$h_t = o_t \odot \tanh(C_t)$$

The resulting $h_t$ is the hidden state passed to the next time step. It also serves as the cell's output at time step $t$ if needed for prediction. $W_o$ and $b_o$ are the weights and bias for the output gate.
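A sketch of this output step under the same hypothetical dimensions; `W_o` and `b_o` are placeholder parameters:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x_t = rng.normal(size=3)
h_prev = rng.normal(size=4)
C_t = rng.normal(size=4)                 # new cell state from the update step
concat = np.concatenate([h_prev, x_t])

W_o = rng.normal(size=(4, 7))            # output-gate weights (placeholder values)
b_o = np.zeros(4)

o_t = sigmoid(W_o @ concat + b_o)        # which parts of the state to expose
h_t = o_t * np.tanh(C_t)                 # filtered view of C_t becomes h_t
```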
The following diagram illustrates how these components connect and how data flows through an LSTM cell during one time step:
Diagram illustrating the flow of information and calculations within an LSTM cell for a single time step $t$. Inputs $x_t$, $h_{t-1}$, $C_{t-1}$ are processed through forget ($f_t$), input ($i_t$), and output ($o_t$) gates, along with a candidate state ($\tilde{C}_t$), to compute the new cell state $C_t$ and hidden state $h_t$. Sigmoid ($\sigma$) and $\tanh$ activations control the gating and state updates. Element-wise multiplication ($\odot$) and addition (+) combine the intermediate results. Dashed red arrows indicate the passing of $C_t$ and $h_t$ to the next time step.
By carefully regulating what information is kept, discarded, added, and output at each step, the LSTM cell creates pathways for gradients to flow more effectively during training. The cell state acts as an explicit memory channel, protected by the gates, allowing the network to learn and remember information over extended periods, which is fundamental for tackling complex sequence modeling tasks.
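Putting the pieces together, here is a minimal single-step forward function in the same sketch style. The `params` tuple of eight NumPy arrays and all names and shapes are illustrative, not taken from any particular library.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, params):
    """One LSTM forward step; a sketch assuming params holds
    (W_f, b_f, W_i, b_i, W_C, b_C, W_o, b_o) with matching shapes."""
    W_f, b_f, W_i, b_i, W_C, b_C, W_o, b_o = params
    z = np.concatenate([h_prev, x_t])      # [h_{t-1}, x_t]

    f_t = sigmoid(W_f @ z + b_f)           # forget gate
    i_t = sigmoid(W_i @ z + b_i)           # input gate
    C_tilde = np.tanh(W_C @ z + b_C)       # candidate state
    C_t = f_t * C_prev + i_t * C_tilde     # additive state update
    o_t = sigmoid(W_o @ z + b_o)           # output gate
    h_t = o_t * np.tanh(C_t)               # filtered cell state

    return h_t, C_t

# Hypothetical usage with tiny random parameters (3 inputs, 4 hidden units)
rng = np.random.default_rng(0)
shapes = [(4, 7), 4, (4, 7), 4, (4, 7), 4, (4, 7), 4]
params = [rng.normal(size=s) if isinstance(s, tuple) else np.zeros(s) for s in shapes]
h_t, C_t = lstm_step(rng.normal(size=3), np.zeros(4), np.zeros(4), params)
```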