In the previous chapter, we saw that simple RNNs struggle with long sequences because gradients can vanish or explode during backpropagation through time. LSTMs introduce gating mechanisms precisely to address this. These gates are like regulators, carefully controlling how information flows into, through, and out of the core memory component of the LSTM, the cell state.
One of these critical regulators is the input gate. Its job is to decide what new information from the current input ($x_t$) and the previous hidden state ($h_{t-1}$) should be stored in the cell state ($C_t$). It doesn't operate in isolation; it works alongside the forget gate (which decides what to throw away from the old cell state) to manage the cell's memory effectively.
The input gate's decision process involves two main parts:
Deciding Which Values to Update: First, a sigmoid layer determines which parts of the cell state should be updated. The sigmoid function, often denoted as $\sigma$, squashes its input to a range between 0 and 1. A value close to 1 means "let this information through," while a value close to 0 means "block this information." This layer takes the previous hidden state ($h_{t-1}$) and the current input ($x_t$) and produces an output vector, denoted $i_t$.
The calculation is:
$$
i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)
$$

Here, $[h_{t-1}, x_t]$ represents the concatenation of the previous hidden state and the current input vector. $W_i$ is the weight matrix and $b_i$ is the bias vector specific to this part of the input gate. The sigmoid function ($\sigma$) is applied element-wise. Each element in $i_t$ corresponds to an element in the cell state, acting as a filter or gate value for that specific element.
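To make this step concrete, here is a minimal NumPy sketch of the gate computation. The dimensions, random initialization, and variable names are illustrative assumptions only; in a real LSTM, $W_i$ and $b_i$ are learned during training.

```python
import numpy as np

def sigmoid(z):
    # Element-wise logistic function; output lies strictly between 0 and 1
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes and randomly initialized parameters (assumed, not trained values)
hidden_size, input_size = 4, 3
rng = np.random.default_rng(0)

h_prev = rng.standard_normal(hidden_size)  # previous hidden state h_{t-1}
x_t = rng.standard_normal(input_size)      # current input x_t

W_i = rng.standard_normal((hidden_size, hidden_size + input_size))  # input gate weights
b_i = np.zeros(hidden_size)                                         # input gate bias

# i_t = sigmoid(W_i . [h_{t-1}, x_t] + b_i)
concat = np.concatenate([h_prev, x_t])  # [h_{t-1}, x_t], shape (hidden_size + input_size,)
i_t = sigmoid(W_i @ concat + b_i)       # shape (hidden_size,), each entry in (0, 1)
print(i_t)
```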
Creating Candidate Values: Concurrently, a tanh layer creates a vector of new candidate values, denoted $\tilde{C}_t$ (pronounced "C-tilde sub t"). These are the potential values that could be added to the cell state. Like the sigmoid layer, this layer also uses the previous hidden state ($h_{t-1}$) and the current input ($x_t$). The tanh activation function squashes its input to a range between -1 and 1.
The calculation is:
$$
\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)
$$

Again, $W_C$ and $b_C$ are the weight matrix and bias vector for this specific layer. The output $\tilde{C}_t$ represents the new information extracted from the current input and previous context, scaled between -1 and 1.
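Continuing the same illustrative sketch (it reuses `rng`, `concat`, and the assumed sizes from the snippet above), the candidate layer differs only in its activation function and in having its own parameters $W_C$ and $b_C$:

```python
# Continuation of the previous snippet: reuses rng, concat, hidden_size, input_size.
W_C = rng.standard_normal((hidden_size, hidden_size + input_size))  # candidate layer weights
b_C = np.zeros(hidden_size)                                         # candidate layer bias

# C~_t = tanh(W_C . [h_{t-1}, x_t] + b_C)
C_tilde = np.tanh(W_C @ concat + b_C)   # shape (hidden_size,), each entry in (-1, 1)
print(C_tilde)
```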
Think of $i_t$ as the gatekeeper deciding how much of each potential new piece of information ($\tilde{C}_t$) should actually be considered for adding to the memory. $\tilde{C}_t$ holds the potential updates, and $i_t$ holds the filtering values (between 0 and 1) that scale these candidates.
Diagram illustrating the two components of the input gate. It takes the current input ($x_t$) and previous hidden state ($h_{t-1}$), processes them through parallel sigmoid and tanh layers, and produces the gate activation ($i_t$) and candidate values ($\tilde{C}_t$). These are then combined element-wise ($i_t * \tilde{C}_t$) to form the update information for the cell state.
The crucial step, which connects the input gate to the cell state, involves combining the outputs of these two layers. This is done using element-wise multiplication: $i_t * \tilde{C}_t$. This product represents the filtered candidate values, the new information that has been selected and scaled by the input gate.
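With $i_t$ and $\tilde{C}_t$ from the snippets above, this combination is a single line; entries of $i_t$ near 0 suppress a candidate value, while entries near 1 let it through almost unchanged.

```python
# Continuation: the gate i_t scales each candidate value in C~_t element-wise.
update = i_t * C_tilde
print(update)   # the filtered new information destined for the cell state
```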
This resulting vector ($i_t * \tilde{C}_t$) is what gets added to the (appropriately forgotten) previous cell state $C_{t-1}$ to form the new cell state $C_t$. We will examine this addition process in detail when we discuss updating the cell state in the next section. For now, the significant point is that the input gate provides the mechanism for selectively incorporating new information into the LSTM's memory.