After calculating the update gate (zt), the reset gate (rt), and the candidate hidden state (h~t), the final step within the GRU cell for a given time step t is to compute the actual hidden state, ht. This state represents the information the GRU will carry forward to the next time step.
The calculation of the final hidden state ht is where the update gate zt plays its central role. Recall that zt determines how much of the previous hidden state ht−1 should be maintained and how much of the new candidate state h~t should be incorporated. The GRU achieves this through a direct interpolation mechanism.
The equation for the final hidden state ht is:
ht = (1 − zt) ⊙ ht−1 + zt ⊙ h~t

Let's break down this equation:
(1−zt)⊙ht−1: This term calculates how much of the previous hidden state ht−1 to keep. The update gate zt contains values between 0 and 1 (due to the sigmoid activation). If a value in zt is close to 0, the corresponding value in (1−zt) will be close to 1, meaning the corresponding element from the previous state ht−1 is largely preserved. Conversely, if a value in zt is close to 1, the corresponding value in (1−zt) is close to 0, effectively forgetting that part of the previous state. The symbol ⊙ represents element-wise multiplication (Hadamard product).
zt⊙h~t: This term calculates how much of the candidate hidden state h~t to incorporate. If a value in zt is close to 1, the corresponding element from the candidate state h~t (which contains the new information proposed for this time step, influenced by the reset gate) is strongly included. If a value in zt is close to 0, the corresponding element from the candidate state is mostly ignored.
Addition (+): The two resulting vectors are added together element-wise. This addition completes the interpolation. Each element of the final hidden state ht is a weighted sum of the corresponding elements from the previous state ht−1 and the candidate state h~t, with the weights controlled by the update gate zt.
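The interpolation above is straightforward to compute with element-wise array operations. The following sketch uses small hypothetical vectors (the numbers are purely illustrative) to show how each element of ht becomes a weighted average of ht−1 and h~t:

```python
import numpy as np

# Hypothetical values for a GRU with 3 hidden units (illustrative only).
h_prev = np.array([0.5, -0.3, 0.8])   # previous hidden state h_{t-1}
h_cand = np.array([0.9, 0.1, -0.4])   # candidate hidden state h~_t
z_t    = np.array([0.1, 0.5, 0.9])    # update gate, values in (0, 1)

# Element-wise interpolation: h_t = (1 - z_t) * h_prev + z_t * h_cand
h_t = (1 - z_t) * h_prev + z_t * h_cand
print(h_t)  # approximately [0.54, -0.10, -0.28]
```

Note how the first element (zt = 0.1) stays close to h_prev, while the third (zt = 0.9) is dominated by the candidate state.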
This mechanism allows the GRU cell to dynamically adjust how much information flows from the past versus how much new information is introduced at each time step. If the update gate determines that the previous state is still relevant (low zt values), it can pass it through largely unchanged. If it decides new information is more important (high zt values), it incorporates more of the candidate state. This gating mechanism is simpler than the LSTM's separate cell state and output gate but provides a powerful way to manage information flow and combat vanishing gradient issues.
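The two extremes of this gating behavior can be checked directly. In this sketch (reusing the illustrative vectors from above), a gate of all zeros passes the previous state through unchanged, while a gate of all ones replaces it entirely with the candidate:

```python
import numpy as np

h_prev = np.array([0.5, -0.3])
h_cand = np.array([0.9, 0.1])

def gru_update(z, h_prev, h_cand):
    # GRU state update: h_t = (1 - z) * h_prev + z * h_cand
    return (1 - z) * h_prev + z * h_cand

# Gate fully closed to new information: previous state is preserved.
kept = gru_update(np.zeros(2), h_prev, h_cand)      # equals h_prev
# Gate fully open: candidate state replaces the previous state.
replaced = gru_update(np.ones(2), h_prev, h_cand)   # equals h_cand
print(kept, replaced)
```

Because the z ≈ 0 path copies the state almost verbatim, gradients flowing backward along it are not repeatedly squashed, which is how this mechanism mitigates vanishing gradients.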
The following diagram illustrates how the previous state, candidate state, and update gate combine to form the final hidden state:
The final hidden state ht is computed by interpolating between the previous hidden state ht−1 and the candidate hidden state h~t. The update gate zt controls the balance of this interpolation.
The resulting vector ht serves two purposes: it is the output of the GRU cell for the current time step t (often passed to subsequent layers or used for prediction), and it becomes the "previous hidden state" ht−1 for the next time step t+1. This recurrent connection allows the GRU to process sequences step-by-step, maintaining and updating its internal state based on the sequence elements it encounters.
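The full step-by-step recurrence can be sketched as a minimal GRU cell. This is an illustrative implementation, not any particular library's API; the weight names (Wz, Wr, Wh) and the choice to concatenate the input with the hidden state are assumptions for compactness:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell sketch (illustrative weight layout, no biases)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        # One weight matrix per gate, acting on the concatenation [x_t, h_{t-1}].
        shape = (hidden_size, input_size + hidden_size)
        self.Wz = rng.standard_normal(shape) * 0.1   # update gate weights
        self.Wr = rng.standard_normal(shape) * 0.1   # reset gate weights
        self.Wh = rng.standard_normal(shape) * 0.1   # candidate weights
        self.hidden_size = hidden_size

    def step(self, x_t, h_prev):
        xh = np.concatenate([x_t, h_prev])
        z = sigmoid(self.Wz @ xh)                          # update gate z_t
        r = sigmoid(self.Wr @ xh)                          # reset gate r_t
        h_cand = np.tanh(self.Wh @ np.concatenate([x_t, r * h_prev]))
        return (1 - z) * h_prev + z * h_cand               # final state h_t

# Process a short random sequence: each output h_t becomes the
# "previous hidden state" for the next time step.
cell = GRUCell(input_size=3, hidden_size=4)
h = np.zeros(4)
for x_t in np.random.default_rng(1).standard_normal((5, 3)):
    h = cell.step(x_t, h)
print(h.shape)  # (4,)
```

The loop makes the recurrent connection explicit: the same `step` function is applied at every time step, with the returned state fed back in as `h_prev`.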
© 2025 ApX Machine Learning