Before diving into the convenient APIs provided by deep learning frameworks like TensorFlow or PyTorch, it's instructive to see how the core logic of a simple RNN cell can be implemented using fundamental operations. This helps solidify the understanding gained from the mathematical formulation discussed in the previous chapter and bridges the gap between theory and high-level library usage.
Recall the basic equations governing a simple RNN cell at a single time step $t$:

$$h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h)$$

$$\hat{y}_t = W_{hy} h_t + b_y$$

Here, $W_{xh}$, $W_{hh}$, and $W_{hy}$ (written Wxh, Whh, and Why in the code below) are weight matrices, while $b_h$ and $b_y$ are bias vectors. The tanh function is a common choice of activation for the hidden state.
Let's translate this into a simplified Python implementation using NumPy, the standard Python library for numerical array operations.
Imagine we want to represent this computation in a function. This function would take the current input and the previous hidden state as arguments and return the new hidden state and the output for that time step. It would also need access to the network's parameters (weights and biases).
import numpy as np
def simple_rnn_cell_forward(xt, h_prev, parameters):
    """
    Performs a single forward step for a simple RNN cell.

    Arguments:
    xt -- Input data for the current time step, shape (n_features,)
    h_prev -- Hidden state from the previous time step, shape (n_hidden,)
    parameters -- Python dictionary containing:
        Wxh -- Weight matrix multiplying the input, shape (n_hidden, n_features)
        Whh -- Weight matrix multiplying the hidden state, shape (n_hidden, n_hidden)
        Why -- Weight matrix relating hidden state to output, shape (n_output, n_hidden)
        bh -- Bias for the hidden state, shape (n_hidden,)
        by -- Bias for the output, shape (n_output,)

    Returns:
    h_next -- Next hidden state, shape (n_hidden,)
    yt_pred -- Prediction at this time step, shape (n_output,)
    """
    # Retrieve parameters
    Wxh = parameters['Wxh']
    Whh = parameters['Whh']
    Why = parameters['Why']
    bh = parameters['bh']
    by = parameters['by']

    # Ensure inputs are column vectors if they are 1D arrays
    xt = xt.reshape(-1, 1)          # Shape (n_features, 1)
    h_prev = h_prev.reshape(-1, 1)  # Shape (n_hidden, 1)

    # Calculate the new hidden state
    # Note: np.dot performs matrix multiplication
    h_next = np.tanh(np.dot(Wxh, xt) + np.dot(Whh, h_prev) + bh.reshape(-1, 1))

    # Calculate the output
    yt_pred = np.dot(Why, h_next) + by.reshape(-1, 1)

    # Return flattened arrays for consistency if needed elsewhere
    return h_next.flatten(), yt_pred.flatten()
# Example Usage (Illustrative - requires defined parameters and data)
# n_features = 10 # Number of input features
# n_hidden = 20 # Number of hidden units
# n_output = 5 # Number of output units
# # Initialize parameters (randomly for demonstration)
# parameters = {
# 'Wxh': np.random.randn(n_hidden, n_features) * 0.01,
# 'Whh': np.random.randn(n_hidden, n_hidden) * 0.01,
# 'Why': np.random.randn(n_output, n_hidden) * 0.01,
# 'bh': np.zeros((n_hidden, 1)),
# 'by': np.zeros((n_output, 1))
# }
# # Example input and previous hidden state
# xt_sample = np.random.randn(n_features)
# h_prev_sample = np.zeros((n_hidden,)) # Initial hidden state often starts as zeros
# # Perform one step
# h_next_sample, yt_pred_sample = simple_rnn_cell_forward(xt_sample, h_prev_sample, parameters)
# print("Next Hidden State Shape:", h_next_sample.shape)
# print("Output Prediction Shape:", yt_pred_sample.shape)
To process an entire sequence, you would iterate this simple_rnn_cell_forward function over each time step. The hidden state h_next calculated at time step t becomes h_prev for time step t+1.
def simple_rnn_forward(x_sequence, h0, parameters):
    """
    Performs the forward pass for a sequence using a simple RNN.

    Arguments:
    x_sequence -- Input sequence, shape (n_features, sequence_length)
    h0 -- Initial hidden state, shape (n_hidden,)
    parameters -- Dictionary of parameters (Wxh, Whh, Why, bh, by)

    Returns:
    h -- Hidden states for all time steps, shape (n_hidden, sequence_length)
    y_pred -- Predictions for all time steps, shape (n_output, sequence_length)
    """
    # Get dimensions
    n_features, sequence_length = x_sequence.shape
    n_hidden = parameters['Whh'].shape[0]
    n_output = parameters['Why'].shape[0]

    # Initialize hidden states and predictions arrays
    h = np.zeros((n_hidden, sequence_length))
    y_pred = np.zeros((n_output, sequence_length))

    # Initialize the first hidden state
    h_next = h0.copy()  # Start with the initial hidden state

    # Loop over time steps
    for t in range(sequence_length):
        # Get the input for the current time step
        xt = x_sequence[:, t]
        # Update the hidden state and get the prediction
        h_next, yt = simple_rnn_cell_forward(xt, h_next, parameters)
        # Store the results
        h[:, t] = h_next
        y_pred[:, t] = yt

    return h, y_pred
# Example Usage (Illustrative)
# sequence_length = 15
# x_seq_sample = np.random.randn(n_features, sequence_length)
# h0_sample = np.zeros((n_hidden,))
# h_all, y_pred_all = simple_rnn_forward(x_seq_sample, h0_sample, parameters)
# print("All Hidden States Shape:", h_all.shape)
# print("All Predictions Shape:", y_pred_all.shape)
This conceptual implementation highlights several aspects:

- Parameter sharing: the same weights (Wxh, Whh, Why) and biases (bh, by) are reused at every time step; only the input and hidden state change.
- Recurrence: the hidden state produced at step t is fed back in as the previous hidden state at step t+1, allowing information to propagate along the sequence.
- Sequential computation: because each step depends on the result of the previous one, the loop over time steps cannot simply be parallelized across time.
Keep in mind that this is a simplified view. Real-world framework implementations handle batches of sequences simultaneously, incorporate performance optimizations, manage parameter initialization, and provide mechanisms for backpropagation through time (BPTT). However, understanding this basic forward pass provides a solid foundation before using these more abstract tools, which we will cover next.
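To make the batching point concrete, here is a minimal sketch (an illustration of the idea, not a framework API; the function name is hypothetical) of the same cell operating on a whole mini-batch at once. The inputs and hidden states simply gain a batch dimension, and the matrix products and bias broadcasting handle every example in the batch in one shot.

def simple_rnn_cell_forward_batch(Xt, H_prev, parameters):
    """
    Illustrative sketch: the same cell computation applied to a mini-batch.

    Xt -- Inputs for one time step, shape (n_features, batch_size)
    H_prev -- Previous hidden states, shape (n_hidden, batch_size)

    Returns:
    H_next -- Next hidden states, shape (n_hidden, batch_size)
    Yt_pred -- Predictions, shape (n_output, batch_size)
    """
    Wxh, Whh, Why = parameters['Wxh'], parameters['Whh'], parameters['Why']
    bh = parameters['bh'].reshape(-1, 1)
    by = parameters['by'].reshape(-1, 1)
    # Each column of Xt and H_prev is one example; the column-vector biases
    # broadcast across the batch dimension.
    H_next = np.tanh(np.dot(Wxh, Xt) + np.dot(Whh, H_prev) + bh)
    Yt_pred = np.dot(Why, H_next) + by
    return H_next, Yt_pred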