Gated Recurrent Units (GRUs) are a type of recurrent neural network architecture designed to effectively capture dependencies in sequential data. They achieve this by employing gating mechanisms that regulate information flow. A GRU cell features two primary gates: the reset gate and the update gate. These gates control how new input data and previous hidden state information are used to update the current hidden state. The GRU simplifies its internal structure by directly combining the functions of a cell state and hidden state into a single hidden state vector, $h_t$. This design often results in a simpler model compared to architectures like LSTMs, which typically use three gates and separate cell and hidden states.
Let's examine the components and the flow of information within a single GRU cell at time step $t$. The cell receives the current input $x_t$ and the hidden state $h_{t-1}$ from the previous time step $t-1$. It then computes the new hidden state $h_t$, which also serves as the output for this time step.
The components are:

- $x_t$: the input vector at the current time step
- $h_{t-1}$: the hidden state from the previous time step
- $r_t$: the reset gate
- $z_t$: the update gate
- $\tilde{h}_t$: the candidate hidden state
- $h_t$: the new hidden state, which also serves as the cell's output
Here is a view of the GRU cell's architecture:
A simplified view of the GRU cell structure. Inputs $x_t$ and $h_{t-1}$ feed into the reset ($r_t$) and update ($z_t$) gates. The reset gate modulates the influence of $h_{t-1}$ on the candidate state $\tilde{h}_t$. The update gate controls the mix between $h_{t-1}$ and $\tilde{h}_t$ to produce the final output $h_t$.
Now, let's look at the calculations performed within the cell.
The reset gate $r_t$ decides how much information from the previous hidden state $h_{t-1}$ should be disregarded when computing the candidate hidden state $\tilde{h}_t$. It takes the current input $x_t$ and the previous hidden state $h_{t-1}$ as inputs.
The calculation is:

$$r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)$$

Here, $W_r$ and $U_r$ are weight matrices, $b_r$ is a bias vector, and $\sigma$ is the sigmoid activation function. The sigmoid function outputs values between 0 and 1. A value close to 0 means "reset" or ignore the corresponding element in the previous state, while a value close to 1 means "keep" it when calculating the candidate state.
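To make the computation concrete, here is a minimal NumPy sketch of the reset gate. The dimensions, the random initialization, and the names `W_r`, `U_r`, and `b_r` are illustrative assumptions for this sketch, not values from a trained model.

```python
import numpy as np

def sigmoid(x):
    # Logistic function: maps any real value into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical sizes: a 10-dimensional input and a 20-dimensional hidden state.
input_size, hidden_size = 10, 20
rng = np.random.default_rng(0)

# Reset gate parameters (randomly initialized here purely for illustration).
W_r = rng.standard_normal((hidden_size, input_size)) * 0.1
U_r = rng.standard_normal((hidden_size, hidden_size)) * 0.1
b_r = np.zeros(hidden_size)

x_t = rng.standard_normal(input_size)      # current input x_t
h_prev = rng.standard_normal(hidden_size)  # previous hidden state h_{t-1}

# Reset gate: r_t = sigma(W_r x_t + U_r h_{t-1} + b_r), each element in (0, 1).
r_t = sigmoid(W_r @ x_t + U_r @ h_prev + b_r)
```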
The update gate $z_t$ controls the extent to which the hidden state is updated with new information versus retaining old information. It determines how much of the previous hidden state $h_{t-1}$ carries over to the final hidden state $h_t$. Similar to the reset gate, it uses the current input $x_t$ and the previous hidden state $h_{t-1}$.
The calculation is:

$$z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)$$

Again, $W_z$, $U_z$, and $b_z$ are learned parameters (weights and bias), and $\sigma$ is the sigmoid function. A value of $z_t$ close to 1 indicates that the new hidden state should primarily be based on the candidate state $\tilde{h}_t$, while a value close to 0 suggests retaining most of the previous state $h_{t-1}$.
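Continuing the same sketch (and reusing `sigmoid`, `rng`, `x_t`, and `h_prev` from above), the update gate has the same functional form as the reset gate but its own hypothetical parameters `W_z`, `U_z`, and `b_z`:

```python
# Update gate parameters (illustrative, randomly initialized).
W_z = rng.standard_normal((hidden_size, input_size)) * 0.1
U_z = rng.standard_normal((hidden_size, hidden_size)) * 0.1
b_z = np.zeros(hidden_size)

# Update gate: z_t = sigma(W_z x_t + U_z h_{t-1} + b_z).
z_t = sigmoid(W_z @ x_t + U_z @ h_prev + b_z)
```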
The candidate hidden state $\tilde{h}_t$ is calculated similarly to the hidden state in a simple RNN, but with a modification involving the reset gate. It aims to capture the new information from the current input $x_t$, potentially tempered by the relevant parts of the previous state $h_{t-1}$.
The calculation involves the current input $x_t$ and the previous hidden state $h_{t-1}$, element-wise multiplied ($\odot$) by the reset gate's output $r_t$:

$$\tilde{h}_t = \tanh\left(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\right)$$

Here, $W_h$, $U_h$, and $b_h$ are learned parameters. The tanh activation function squashes the output to be between -1 and 1. The element-wise multiplication $r_t \odot h_{t-1}$ is significant: if an element in $r_t$ is close to 0, the corresponding element from $h_{t-1}$ contributes very little to the calculation of $\tilde{h}_t$, effectively allowing the cell to "forget" irrelevant past information when generating the candidate state.
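Extending the sketch once more, the candidate state combines the current input with the reset-gated previous state; `W_h`, `U_h`, and `b_h` are again illustrative placeholders:

```python
# Candidate state parameters (illustrative, randomly initialized).
W_h = rng.standard_normal((hidden_size, input_size)) * 0.1
U_h = rng.standard_normal((hidden_size, hidden_size)) * 0.1
b_h = np.zeros(hidden_size)

# r_t * h_prev is the element-wise product: elements of h_prev whose reset
# value is near 0 are largely excluded from the candidate computation.
h_candidate = np.tanh(W_h @ x_t + U_h @ (r_t * h_prev) + b_h)
```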
The final hidden state $h_t$ for the current time step is computed by linearly interpolating between the previous hidden state $h_{t-1}$ and the candidate hidden state $\tilde{h}_t$. The update gate $z_t$ controls this interpolation.
The calculation is:

$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$

This equation shows how the GRU updates its state. The vector $z_t$ acts element-wise:

- Where an element of $z_t$ is close to 1, the corresponding element of $h_t$ is drawn mostly from the candidate state $\tilde{h}_t$.
- Where an element of $z_t$ is close to 0, the corresponding element of $h_{t-1}$ is carried over largely unchanged.
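Putting the pieces together, a single GRU step can be sketched as one function. This assumes the `sigmoid` helper and NumPy import from the earlier snippets, plus a hypothetical `params` dictionary holding the nine learned arrays:

```python
def gru_cell_step(x_t, h_prev, params):
    """One GRU time step following the equations above.

    params holds the learned arrays W_r, U_r, b_r, W_z, U_z, b_z, W_h, U_h, b_h.
    """
    r_t = sigmoid(params["W_r"] @ x_t + params["U_r"] @ h_prev + params["b_r"])
    z_t = sigmoid(params["W_z"] @ x_t + params["U_z"] @ h_prev + params["b_z"])
    h_candidate = np.tanh(
        params["W_h"] @ x_t + params["U_h"] @ (r_t * h_prev) + params["b_h"]
    )
    # Interpolation: z_t near 1 -> mostly the candidate; z_t near 0 -> mostly h_prev.
    h_t = (1.0 - z_t) * h_prev + z_t * h_candidate
    return h_t

# Processing a sequence is just repeated application, carrying h_t forward:
# h = np.zeros(hidden_size)
# for x in sequence:              # sequence: an iterable of input vectors
#     h = gru_cell_step(x, h, params)
```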
This mechanism allows the GRU to maintain information over long sequences (when $z_t$ is close to 0 for many steps) or update rapidly based on new inputs (when $z_t$ is close to 1). Notably, the GRU does not have a separate cell state like the LSTM; the hidden state $h_t$ carries all the necessary information forward.
This architecture, with its two gates and combined state representation, provides a powerful yet potentially more computationally efficient way to handle sequential data compared to LSTMs, which we will compare more directly later in this chapter.
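As a rough illustration of the efficiency point, the short sketch below compares parameter counts of PyTorch's built-in `nn.GRU` and `nn.LSTM` layers at the same, arbitrarily chosen sizes. The exact numbers depend on the configuration, but the GRU comes out smaller because it uses three blocks of weights per layer (reset, update, candidate) where the LSTM uses four (input, forget, cell, output).

```python
import torch.nn as nn

# Hypothetical sizes chosen only for this comparison.
gru = nn.GRU(input_size=128, hidden_size=256)
lstm = nn.LSTM(input_size=128, hidden_size=256)

def count_params(module):
    # Total number of learnable scalars in the module.
    return sum(p.numel() for p in module.parameters())

print(f"GRU parameters:  {count_params(gru):,}")
print(f"LSTM parameters: {count_params(lstm):,}")
```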