Having introduced the GRU cell and its two main gates, the reset gate ($r_t$) and the update gate ($z_t$), let's focus on how the GRU generates a proposal for the new hidden state. This proposal, known as the candidate hidden state (often denoted $\tilde{h}_t$), represents the new information the cell considers adding to its memory.
The calculation of the candidate hidden state is where the reset gate ($r_t$) plays its part. Remember, the reset gate determines how much of the previous hidden state ($h_{t-1}$) should influence the current candidate calculation. If the reset gate output is close to 0 for certain dimensions, the corresponding dimensions of the previous hidden state are effectively "forgotten" or ignored when computing the new candidate state. Conversely, if the reset gate output is close to 1, the previous state information is passed through.
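To make this element-wise filtering concrete, here is a tiny NumPy sketch with hand-picked illustrative values (the state vector and gate values are assumptions for demonstration, not taken from the text):

```python
import numpy as np

# Illustrative previous hidden state and reset gate output (4 dimensions).
# The first two gate values are near 0 (forget), the last two near 1 (keep).
h_prev = np.array([0.9, -0.5, 0.3, 0.7])   # h_{t-1}
r_t = np.array([0.01, 0.05, 0.98, 0.95])   # reset gate output

# Element-wise product r_t ⊙ h_{t-1}: dimensions gated near 0 are
# suppressed; dimensions gated near 1 pass through almost unchanged.
gated = r_t * h_prev
print(gated)  # approximately [0.009, -0.025, 0.294, 0.665]
```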
The core idea is to combine the current input ($x_t$) with a version of the previous hidden state ($h_{t-1}$) that has been selectively filtered by the reset gate ($r_t$). This combination is then passed through a hyperbolic tangent ($\tanh$) activation function, similar to how the hidden state is calculated in a simple RNN.
Mathematically, the candidate hidden state $\tilde{h}_t$ at time step $t$ is computed as follows:
$$\tilde{h}_t = \tanh\left(W_{\tilde{h}} x_t + U_{\tilde{h}} (r_t \odot h_{t-1}) + b_{\tilde{h}}\right)$$

Let's break down this equation:

- $x_t$ is the current input, and $r_t \odot h_{t-1}$ is the previous hidden state filtered element-wise by the reset gate, as described above.
- The weight matrices ($W_{\tilde{h}}$, $U_{\tilde{h}}$) and the bias ($b_{\tilde{h}}$) are learned during the training process. They determine how the current input and the relevant parts of the past (as determined by the reset gate) are combined to form the candidate state.
- The $\tanh$ activation squashes each dimension of the result into the range $(-1, 1)$, just as in a simple RNN's hidden state update.
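To ground the equation in code, here is a minimal NumPy sketch of the candidate-state computation. The sizes, the random stand-in parameters, and the way $r_t$ is produced are assumptions for illustration; in a trained GRU, the weights are learned and $r_t$ comes from the reset gate's own sigmoid layer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions for this sketch).
input_size, hidden_size = 3, 4

# Learned parameters (random stand-ins for trained values).
W_h = rng.normal(scale=0.1, size=(hidden_size, input_size))   # W_h~: input weights
U_h = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # U_h~: recurrent weights
b_h = np.zeros(hidden_size)                                   # b_h~: bias

def candidate_hidden_state(x_t, h_prev, r_t):
    """Compute h~_t = tanh(W x_t + U (r_t ⊙ h_{t-1}) + b)."""
    return np.tanh(W_h @ x_t + U_h @ (r_t * h_prev) + b_h)

x_t = rng.normal(size=input_size)      # current input
h_prev = rng.normal(size=hidden_size)  # previous hidden state h_{t-1}
r_t = rng.uniform(size=hidden_size)    # stand-in for the reset gate's sigmoid output

h_cand = candidate_hidden_state(x_t, h_prev, r_t)
print(h_cand)  # every value lies in (-1, 1) because of the final tanh
```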
The diagram below illustrates the data flow for calculating the candidate hidden state $\tilde{h}_t$.
Flow diagram showing the computation of the candidate hidden state ($\tilde{h}_t$) using the current input ($x_t$), the previous hidden state ($h_{t-1}$), and the reset gate ($r_t$).
Essentially, $\tilde{h}_t$ represents what the GRU cell could update its state to, based purely on the current input and the selectively remembered parts of the previous state. It's a proposal for the new memory content.
It's important to remember that this candidate state $\tilde{h}_t$ is not the final hidden state $h_t$ for the current time step. The next step, which involves the update gate $z_t$, will determine how much of this new candidate state $\tilde{h}_t$ is actually mixed with the previous hidden state $h_{t-1}$ to produce the final output $h_t$. We will cover that combination process in the section "Calculating the Final Hidden State".
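As a brief preview of that combination (covered properly in the next section), here is a one-line sketch using the common GRU convention in which $z_t$ gates the previous state; some presentations swap the roles of $z_t$ and $1 - z_t$:

```python
def final_hidden_state(h_prev, h_cand, z_t):
    # Common convention: h_t = z_t ⊙ h_{t-1} + (1 - z_t) ⊙ h~_t.
    # z_t near 1 keeps the old state; z_t near 0 adopts the candidate.
    return z_t * h_prev + (1.0 - z_t) * h_cand
```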