Having determined what information to discard from the previous state using the forget gate, and what new information might be relevant using the input gate, the LSTM cell needs to combine these results to form the new cell state $C_t$. This update process is central to the LSTM's ability to maintain long-range dependencies.
Recall the results of the previous steps: the forget gate produced $f_t$, a vector of values between 0 and 1 indicating how much of each component of $C_{t-1}$ to keep; the input gate produced $i_t$, a vector of values between 0 and 1 indicating how much of each candidate component to admit; and the tanh layer produced the candidate values $\tilde{C}_t$.
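To make these recalled quantities concrete, here is a minimal NumPy sketch of how they are computed in the standard LSTM formulation, where each gate sees the previous hidden state concatenated with the current input. The weight matrices `W_f`, `W_i`, `W_c`, the biases, and the dimensions are hypothetical placeholders for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical randomly initialized parameters; each gate operates on the
# concatenation [h_prev, x] of the previous hidden state and current input.
rng = np.random.default_rng(0)
n_hidden, n_input = 4, 3
W_f, W_i, W_c = (rng.normal(size=(n_hidden, n_hidden + n_input)) for _ in range(3))
b_f = b_i = b_c = np.zeros(n_hidden)

h_prev = rng.normal(size=n_hidden)
x      = rng.normal(size=n_input)
hx     = np.concatenate([h_prev, x])

f_t     = sigmoid(W_f @ hx + b_f)   # forget gate output, values in (0, 1)
i_t     = sigmoid(W_i @ hx + b_i)   # input gate output, values in (0, 1)
c_tilde = np.tanh(W_c @ hx + b_c)   # candidate values, in (-1, 1)
```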
The cell state update combines these pieces in a straightforward yet effective manner. First, the old cell state $C_{t-1}$ is multiplied element-wise by the output of the forget gate $f_t$. This selectively drops the information marked for forgetting.
$$\text{Forgotten State} = f_t \odot C_{t-1}$$
Second, the candidate values $\tilde{C}_t$ are multiplied element-wise by the output of the input gate $i_t$. This selects only the relevant parts of the new candidate information.
$$\text{Selected New Information} = i_t \odot \tilde{C}_t$$
Finally, these two results are added together element-wise to create the updated cell state $C_t$:
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
Here, $\odot$ denotes element-wise multiplication (the Hadamard product).
Diagram illustrating the update mechanism for the LSTM cell state ($C_t$). The previous state ($C_{t-1}$) is scaled by the forget gate ($f_t$), the candidate state ($\tilde{C}_t$) is scaled by the input gate ($i_t$), and the results are added.
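The update itself is just two element-wise multiplications and an addition. Here is a minimal sketch in NumPy, assuming the gate activations and candidate values have already been computed (as in the sketch above); the specific numbers are illustrative only.

```python
import numpy as np

# Previously computed quantities for a cell state of dimension 4
# (illustrative values, not the output of any real network).
c_prev  = np.array([ 0.8, -0.3,  0.5,  0.1])   # previous cell state C_{t-1}
f_t     = np.array([ 0.9,  0.1,  1.0,  0.5])   # forget gate output, in [0, 1]
i_t     = np.array([ 0.2,  0.8,  0.0,  0.5])   # input gate output, in [0, 1]
c_tilde = np.array([ 0.4,  0.7, -0.6,  0.2])   # candidate values, in [-1, 1]

# C_t = f_t ⊙ C_{t-1} + i_t ⊙ C~_t  (element-wise throughout)
c_t = f_t * c_prev + i_t * c_tilde
print(c_t)  # [0.8  0.53 0.5  0.15]
```

Note how each component is handled independently: the third component keeps its old value exactly ($f_t = 1$, $i_t = 0$), while the second is almost entirely replaced by new candidate information.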
This additive update mechanism is a significant departure from the update rule in a simple RNN, which overwrites its entire hidden state at every step through a matrix multiplication followed by a squashing nonlinearity. The cell state acts like a conveyor belt: information can travel along it largely undisturbed for components where the forget gate is close to 1 and the input gate is close to 0. Conversely, old information can be entirely dropped ($f_t \approx 0$) and new information fully integrated ($i_t \approx 1$).
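A quick numerical sketch of this conveyor belt behavior, assuming constant gate values over 100 steps for illustration: one component is configured to be preserved, the other to be overwritten at every step.

```python
import numpy as np

# Component 0: forget gate near 1, input gate 0  -> information is carried along.
# Component 1: forget gate 0, input gate 1       -> state is replaced each step.
f_t     = np.array([0.999, 0.0])
i_t     = np.array([0.0,   1.0])
c_tilde = np.array([0.5,   0.5])    # fixed candidate, illustrative only

c = np.array([1.0, 1.0])            # initial cell state
for _ in range(100):
    c = f_t * c + i_t * c_tilde

print(c)  # component 0: 0.999**100 ≈ 0.905 (nearly intact); component 1: 0.5
```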
This structure makes it much easier for gradients to flow backward through time without vanishing or exploding as rapidly as they do in a simple RNN. Along the cell-state path, the dominant term in the gradient $\partial C_t / \partial C_{t-1}$ is simply the element-wise factor $f_t$, so wherever the forget gate stays close to 1, error signals pass through each step almost unchanged. By controlling the flow via additions and gated element-wise multiplications, LSTMs can preserve error signals over longer durations, enabling the learning of dependencies across extended time intervals. The cell state essentially carries the long-term memory, which is selectively modified at each time step based on the current input and the previous hidden state.
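The following sketch contrasts how a backpropagated signal scales over many steps under these two regimes. It makes simplifying assumptions: the LSTM path ignores the gates' own dependence on the state (so the per-step factor is just $f_t$), and the simple RNN path ignores the nonlinearity, multiplying by the same hypothetical recurrent weight matrix `W` at every step.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 50, 8

# LSTM cell-state path: per-step gradient factor is element-wise f_t,
# so over T steps each component is scaled by f_t**T.
f_t = np.full(n, 0.98)
lstm_grad = f_t ** T                 # 0.98**50 ≈ 0.364 per component

# Simple RNN path: repeated multiplication by the recurrent matrix W
# (a random matrix with spectral radius below 1 in this sketch).
W = rng.normal(scale=0.3, size=(n, n))
rnn_grad = np.linalg.matrix_power(W.T, T)

print(lstm_grad[0])                  # ≈ 0.364: signal survives 50 steps
print(np.abs(rnn_grad).max())        # typically tiny: gradient has all but vanished
```

The point of the comparison is the functional form: an element-wise factor near 1 decays slowly and is directly learnable via the forget gate, while a repeated matrix product shrinks (or blows up) geometrically with its spectral radius.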