The reset gate in a Gated Recurrent Unit (GRU) plays a specific and important role in managing information flow. It determines how much of the past information, carried by the previous hidden state ($h_{t-1}$), should be ignored or "reset" when calculating the new candidate hidden state ($\tilde{h}_t$). This gate acts as a filter, deciding the relevance of past context for proposing an updated memory state.
Like the update gate, the reset gate's activation, denoted as $r_t$, is computed based on the current input $x_t$ and the previous hidden state $h_{t-1}$. It uses a sigmoid activation function, ensuring its output values are between 0 and 1.
The calculation involves learning separate weight matrices ($W_r$ and $U_r$) and a bias term ($b_r$):

$$r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)$$

Here:

- $x_t$ is the input vector at the current time step.
- $h_{t-1}$ is the hidden state from the previous time step.
- $W_r$ and $U_r$ are the weight matrices applied to the input and the previous hidden state, respectively.
- $b_r$ is the bias vector.
- $\sigma$ is the sigmoid function, squashing each element into the range $(0, 1)$.
The output $r_t$ is a vector of the same dimension as the hidden state. Each element in $r_t$ corresponds to a dimension of the hidden state, acting as a gate value for that specific dimension.
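To make this concrete, here is a minimal NumPy sketch of the reset gate computation from the formula above. The layer sizes, random initialization, and variable names are arbitrary choices for illustration; in a real GRU, $W_r$, $U_r$, and $b_r$ would be learned from data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative dimensions, chosen arbitrarily for this sketch
input_size, hidden_size = 4, 3
rng = np.random.default_rng(0)

# Reset gate parameters: W_r, U_r, and bias b_r (randomly initialized here)
W_r = rng.normal(size=(hidden_size, input_size))
U_r = rng.normal(size=(hidden_size, hidden_size))
b_r = np.zeros(hidden_size)

x_t = rng.normal(size=input_size)       # current input x_t
h_prev = rng.normal(size=hidden_size)   # previous hidden state h_{t-1}

# r_t = sigmoid(W_r x_t + U_r h_{t-1} + b_r); every element lies in (0, 1)
r_t = sigmoid(W_r @ x_t + U_r @ h_prev + b_r)
print(r_t.shape)  # (3,) -> same dimension as the hidden state
```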
The values in the reset gate vector $r_t$ directly control the influence of the previous hidden state $h_{t-1}$ when computing the candidate hidden state $\tilde{h}_t$. A value close to 0 in $r_t$ for a particular dimension effectively "resets" or nullifies the contribution from the corresponding dimension in $h_{t-1}$. Conversely, a value close to 1 allows that part of the previous hidden state to pass through mostly unchanged.
This mechanism is applied via element-wise multiplication ($\odot$) between the reset gate $r_t$ and the previous hidden state $h_{t-1}$. This modulated previous state is then used in the calculation of the candidate hidden state:

$$\tilde{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h)$$
Notice how $r_t$ determines exactly which parts of the previous state $h_{t-1}$ are combined with the current input $x_t$ to form the candidate state $\tilde{h}_t$. If an element in $r_t$ is 0, the corresponding element in $h_{t-1}$ is effectively zeroed out before the weighted sum inside the $\tanh$ function.
Diagram: flow showing the calculation of the reset gate $r_t$ and its element-wise multiplication ($\odot$) with the previous hidden state $h_{t-1}$ to influence the candidate hidden state $\tilde{h}_t$.
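As a small sketch of the candidate-state step, the modulated previous state can be computed directly in NumPy. The function name and parameter names below are illustrative rather than taken from any specific library; the weight matrices $W_h$, $U_h$, and bias $b_h$ are the candidate state's own learned parameters, separate from those of the reset gate.

```python
import numpy as np

def candidate_state(x_t, h_prev, r_t, W_h, U_h, b_h):
    # The reset gate r_t scales h_{t-1} element-wise before it enters the tanh
    return np.tanh(W_h @ x_t + U_h @ (r_t * h_prev) + b_h)
```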
The reset gate gives the GRU unit the ability to dynamically adjust how much the proposed new state ($\tilde{h}_t$) should depend on the immediate past state ($h_{t-1}$). If the current input suggests a significant shift in context or topic compared to what was encoded in $h_{t-1}$, the reset gate can learn to produce values close to 0. This effectively allows the unit to "start fresh" in computing the candidate state, focusing more on the current input rather than blending it with potentially irrelevant past information.
For example, in language modeling, if the network encounters the end of a sentence (signaled perhaps by punctuation in $x_t$), the reset gate values might drop close to 0 to diminish the influence of the previous sentence's hidden state when calculating the candidate state for the beginning of the next sentence.
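The following self-contained NumPy sketch illustrates this "start fresh" behavior in the extreme case where the reset gate is all zeros. The dimensions and random weights are arbitrary and chosen only for illustration: with $r_t = 0$, the candidate state comes out identical no matter what the previous hidden state contained.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 3

# Candidate-state parameters (random, for illustration only)
W_h = rng.normal(size=(hidden_size, input_size))
U_h = rng.normal(size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

x_t = rng.normal(size=input_size)
r_t = np.zeros(hidden_size)              # reset gate fully closed

# Two very different previous hidden states...
h_prev_a = rng.normal(size=hidden_size)
h_prev_b = rng.normal(size=hidden_size)

# ...produce the same candidate state, because r_t zeroes out the past
cand_a = np.tanh(W_h @ x_t + U_h @ (r_t * h_prev_a) + b_h)
cand_b = np.tanh(W_h @ x_t + U_h @ (r_t * h_prev_b) + b_h)
print(np.allclose(cand_a, cand_b))       # True: h_{t-1} has no influence
```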
In summary, the reset gate acts as a controller, selectively diminishing parts of the previous hidden state before calculating the candidate hidden state. This allows the GRU to effectively forget information that is deemed irrelevant for the immediate next step, contributing to its ability to handle dependencies over time.