Within the GRU cell at a given time step t, computing the final hidden state ht is the last operation. This computation combines the update gate (zt) with the candidate hidden state (h~t), which the reset gate (rt) has already shaped. The resulting hidden state carries the information the GRU passes forward to the next time step.
The calculation of the final hidden state ht is where the update gate zt plays its central role. Recall that zt determines how much of the previous hidden state ht−1 should be maintained and how much of the new candidate state h~t should be incorporated. The GRU achieves this through a direct interpolation mechanism.
The equation for the final hidden state ht is:
$$h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t$$

Let's break down this equation:
1. zt⊙ht−1: This term calculates how much of the previous hidden state ht−1 to keep. The update gate zt contains values between 0 and 1. If a value in zt is close to 1, the corresponding element from the previous state ht−1 is largely preserved. This aligns with zt's role as the "retention" gate. Conversely, if zt is close to 0, that part of the previous state is mostly forgotten. The symbol ⊙ represents element-wise multiplication (Hadamard product).
2. (1−zt)⊙h~t: This term calculates how much of the candidate hidden state h~t to incorporate. If a value in zt is close to 0, the corresponding value in (1−zt) is close to 1, meaning the element from the candidate state h~t (which contains the new information) is strongly included. If zt is close to 1, this term is close to 0, and the new information is mostly ignored.
3. Addition (+): The two resulting vectors are added together element-wise. This addition completes the interpolation. Each element of the final hidden state ht is a weighted sum of the corresponding elements from the previous state ht−1 and the candidate state h~t, with the weights controlled by the update gate zt.
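To make the interpolation concrete, here is a minimal NumPy sketch of the equation. The specific vectors for ht−1, h~t, and zt are made-up illustrative values, not outputs of a real gate computation:

```python
import numpy as np

# Illustrative values for a hidden size of 4; in a real GRU these would
# come from the gate and candidate-state computations.
h_prev = np.array([0.5, -0.3, 0.8, 0.1])   # previous hidden state h_{t-1}
h_cand = np.array([0.9, 0.2, -0.4, 0.7])   # candidate hidden state h~_t
z = np.array([0.9, 0.1, 0.5, 0.0])         # update gate z_t, values in (0, 1)

# Element-wise interpolation: z_t * h_{t-1} + (1 - z_t) * h~_t
h_t = z * h_prev + (1.0 - z) * h_cand
print(h_t)  # [0.54, 0.15, 0.2, 0.7]
```

Note how the first element (zt close to 1) stays near its previous value of 0.5, while the last element (zt = 0) is taken entirely from the candidate.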
This mechanism allows the GRU cell to dynamically adjust how much information flows from the past versus how much new information is introduced at each time step. If the update gate determines that the previous state is still relevant (high zt values), it can pass it through largely unchanged. If it decides new information is more important (low zt values), it incorporates more of the candidate state. This gating mechanism is simpler than the LSTM's separate cell state and output gate but provides a powerful way to manage information flow and combat vanishing gradient issues.
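This extreme behavior is easy to verify numerically. In a small self-contained check (again with illustrative vectors), a gate saturated at 1 reproduces the previous state exactly, the direct pass-through that helps gradients survive across many time steps, while a gate saturated at 0 adopts the candidate state wholesale:

```python
import numpy as np

h_prev = np.array([0.5, -0.3, 0.8, 0.1])  # illustrative h_{t-1}
h_cand = np.array([0.9, 0.2, -0.4, 0.7])  # illustrative h~_t

# z_t = 1 everywhere: the previous state passes through unchanged.
z_keep = np.ones(4)
assert np.allclose(z_keep * h_prev + (1 - z_keep) * h_cand, h_prev)

# z_t = 0 everywhere: the cell adopts the candidate state entirely.
z_drop = np.zeros(4)
assert np.allclose(z_drop * h_prev + (1 - z_drop) * h_cand, h_cand)
```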
The following diagram illustrates how the previous state, candidate state, and update gate combine to form the final hidden state:
The final hidden state ht is computed by interpolating between the previous hidden state ht−1 and the candidate hidden state h~t. The update gate zt controls the balance of this interpolation.
The resulting vector ht serves two purposes: it is the output of the GRU cell for the current time step t (often passed to subsequent layers or used for prediction), and it becomes the "previous hidden state" ht−1 for the next time step t+1. This recurrent connection allows the GRU to process sequences step-by-step, maintaining and updating its internal state based on the sequence elements it encounters.
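As a closing sketch, the snippet below runs this recurrence over a short sequence, implementing the standard GRU equations in NumPy. The weight names (W_*, U_*, b_*), the random initialization, and the sizes are all illustrative assumptions, not taken from any particular library:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, params):
    """One GRU time step; returns the new hidden state h_t."""
    W_z, U_z, b_z, W_r, U_r, b_r, W_h, U_h, b_h = params
    z = sigmoid(x_t @ W_z + h_prev @ U_z + b_z)              # update gate z_t
    r = sigmoid(x_t @ W_r + h_prev @ U_r + b_r)              # reset gate r_t
    h_cand = np.tanh(x_t @ W_h + (r * h_prev) @ U_h + b_h)   # candidate h~_t
    return z * h_prev + (1 - z) * h_cand                     # interpolation

rng = np.random.default_rng(0)
input_size, hidden_size, seq_len = 3, 4, 5
shapes = [(input_size, hidden_size), (hidden_size, hidden_size), (hidden_size,)] * 3
params = [rng.normal(size=s) for s in shapes]

# Each step's output becomes the "previous hidden state" of the next step.
h = np.zeros(hidden_size)
for x_t in rng.normal(size=(seq_len, input_size)):
    h = gru_step(x_t, h, params)
print(h)  # final hidden state after processing the sequence
```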