Having grasped the fundamental components of an LSTM cell in the previous sections, let's trace how information actually moves through it during a single time step. Understanding this flow is essential to appreciating how LSTMs manage to maintain context over long sequences, addressing the shortcomings of simple RNNs.
At each time step $t$, the LSTM cell receives three inputs:

- The current input vector $x_t$
- The previous hidden state $h_{t-1}$
- The previous cell state $c_{t-1}$

These inputs interact with the gates and the cell state to produce two outputs:

- The new cell state $c_t$
- The new hidden state $h_t$
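To make the shapes concrete, here is a minimal NumPy sketch of these inputs. The dimensions (8 for the input, 16 for the hidden and cell state) and the variable names are purely illustrative and not tied to any particular library; the steps below extend this sketch one gate at a time.

```python
import numpy as np

rng = np.random.default_rng(0)

input_dim, hidden_dim = 8, 16               # hypothetical sizes, chosen only for illustration

x_t    = rng.standard_normal(input_dim)     # current input vector x_t
h_prev = rng.standard_normal(hidden_dim)    # previous hidden state h_{t-1}
c_prev = rng.standard_normal(hidden_dim)    # previous cell state c_{t-1}
```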
Let's follow the data path:
The first step involves the forget gate ($f_t$). Its job is to decide which parts of the old cell state $c_{t-1}$ are no longer relevant and should be discarded. It looks at the previous hidden state $h_{t-1}$ and the current input $x_t$. A sigmoid activation function ($\sigma$) squashes the output to a value between 0 and 1 for each number in the cell state vector.
$$f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)$$

Here, $[h_{t-1}, x_t]$ represents the concatenation of the two vectors. $W_f$ and $b_f$ are the weight matrix and bias vector for the forget gate, learned during training.
An output of 1 means "completely keep this information," while an output of 0 means "completely get rid of this information." This gate's output $f_t$ is then multiplied element-wise ($\odot$) with the previous cell state $c_{t-1}$.
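Continuing the NumPy sketch above, the forget gate can be computed as follows. The weights here are random placeholders purely for demonstration; in a real network they would be learned during training.

```python
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forget gate parameters: W_f maps the concatenated [h_{t-1}, x_t] to one value per cell-state entry.
W_f = rng.standard_normal((hidden_dim, hidden_dim + input_dim)) * 0.1
b_f = np.zeros(hidden_dim)

concat = np.concatenate([h_prev, x_t])   # [h_{t-1}, x_t]
f_t = sigmoid(W_f @ concat + b_f)        # each entry lies in (0, 1)
kept_memory = f_t * c_prev               # element-wise scaling of the old cell state
```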
Next, the cell needs to determine what new information from the current input $x_t$ and previous hidden state $h_{t-1}$ should be added to the cell state. This involves two parts:

- The input gate ($i_t$), a sigmoid layer, decides which values will be updated: $i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)$
- A tanh layer creates a vector of new candidate values ($\tilde{c}_t$) that could be added to the state: $\tilde{c}_t = \tanh(W_C [h_{t-1}, x_t] + b_C)$
$W_i, b_i$ and $W_C, b_C$ are the respective weights and biases for these layers. The tanh function outputs values between -1 and 1.
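Continuing the same sketch, the input gate and the candidate values follow the same pattern, again with illustrative random parameters standing in for learned ones:

```python
# Input gate: decides which cell-state entries will be updated.
W_i = rng.standard_normal((hidden_dim, hidden_dim + input_dim)) * 0.1
b_i = np.zeros(hidden_dim)
i_t = sigmoid(W_i @ concat + b_i)        # values in (0, 1)

# Candidate values: new content that could be written into the cell state.
W_C = rng.standard_normal((hidden_dim, hidden_dim + input_dim)) * 0.1
b_C = np.zeros(hidden_dim)
c_tilde = np.tanh(W_C @ concat + b_C)    # values in (-1, 1)
```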
Now we update the old cell state $c_{t-1}$ to the new cell state $c_t$ by combining the results from the forget gate and the input gate:

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
This additive interaction is a significant difference from the repeated matrix multiplications in simple RNNs. It allows gradients to flow more easily through time during backpropagation, reducing the vanishing gradient problem. The cell state acts like a conveyor belt, carrying information along with only minor linear interactions (multiplication by $f_t$ and addition of $i_t \odot \tilde{c}_t$), making it easier to preserve context over many steps.
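In the running sketch, this update is a single line of element-wise operations, with no matrix multiplication applied to the old cell state itself:

```python
# Keep what the forget gate allows, add what the input gate admits.
c_t = f_t * c_prev + i_t * c_tilde
```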
Finally, we need to decide what the hidden state $h_t$ (and potentially the output for this time step) should be. This output will be a filtered version of the cell state $c_t$:

- The output gate ($o_t$), a sigmoid layer, decides which parts of the cell state to output: $o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)$
- The cell state is pushed through tanh (so its values lie between -1 and 1) and multiplied element-wise by the output gate: $h_t = o_t \odot \tanh(c_t)$
The resulting $h_t$ is the hidden state passed to the next time step. It also serves as the cell's output at time step $t$ if needed for prediction. $W_o$ and $b_o$ are the weights and bias for the output gate.
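The final piece of the sketch computes the output gate and the new hidden state, again with placeholder random weights:

```python
# Output gate: decides which parts of the (squashed) cell state become the hidden state.
W_o = rng.standard_normal((hidden_dim, hidden_dim + input_dim)) * 0.1
b_o = np.zeros(hidden_dim)
o_t = sigmoid(W_o @ concat + b_o)

h_t = o_t * np.tanh(c_t)   # hidden state passed on to time step t+1 (and usable for prediction)
```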
The following diagram illustrates how these components connect and how data flows through an LSTM cell during one time step:
Diagram illustrating the flow of information and calculations within an LSTM cell for a single time step $t$. Inputs $x_t$, $h_{t-1}$, and $c_{t-1}$ are processed through the forget ($f_t$), input ($i_t$), and output ($o_t$) gates, along with a candidate state ($\tilde{c}_t$), to compute the new cell state $c_t$ and hidden state $h_t$. Sigmoid ($\sigma$) and tanh activations control the gating and state updates. Element-wise multiplication ($\odot$) and addition ($+$) combine the intermediate results. Dashed red arrows indicate the passing of $c_t$ and $h_t$ to the next time step.
By carefully regulating what information is kept, discarded, added, and output at each step, the LSTM cell creates pathways for gradients to flow more effectively during training. The cell state acts as an explicit memory channel, protected by the gates, allowing the network to learn and remember information over extended periods, which is fundamental for tackling complex sequence modeling tasks.