We'll implement the forward propagation steps for a small feedforward neural network using Python and NumPy. This example will solidify your understanding of how input data travels through the network to produce an output.

### Scenario: A Simple Two-Layer Network

Imagine a network designed for a binary classification task. It takes 2 input features, has one hidden layer with 3 neurons (using the ReLU activation function), and one output neuron (using the Sigmoid activation function to produce a probability between 0 and 1).

Here's a visualization of our simple network:

```dot
digraph G {
  rankdir=LR;
  splines=line;
  node [shape=circle, style=filled, fillcolor="#a5d8ff", fixedsize=true, width=0.5];
  edge [color="#868e96"];

  subgraph cluster_0 {
    label = "Input Layer";
    style=filled;
    color="#e9ecef";
    node [shape=circle, fillcolor="#fab005"];
    x1 [label="x1"];
    x2 [label="x2"];
  }

  subgraph cluster_1 {
    label = "Hidden Layer (ReLU)";
    style=filled;
    color="#e9ecef";
    node [shape=circle, fillcolor="#74c0fc"];
    h1 [label="h1"];
    h2 [label="h2"];
    h3 [label="h3"];
  }

  subgraph cluster_2 {
    label = "Output Layer (Sigmoid)";
    style=filled;
    color="#e9ecef";
    node [shape=circle, fillcolor="#69db7c"];
    o1 [label="y"];
  }

  x1 -> h1; x1 -> h2; x1 -> h3;
  x2 -> h1; x2 -> h2; x2 -> h3;
  h1 -> o1; h2 -> o1; h3 -> o1;
}
```

*Network architecture: 2 input neurons, 3 hidden neurons with ReLU activation, 1 output neuron with Sigmoid activation.*

### Setup: Libraries and Parameters

First, we need NumPy for numerical operations. We'll also define our network parameters (weights and biases) and some sample input data. For reproducibility, we'll use fixed values for the weights and biases. In a real training scenario, these would be initialized randomly and then learned.

```python
import numpy as np

# --- Activation Functions ---
def sigmoid(z):
    """Computes the sigmoid activation."""
    return 1 / (1 + np.exp(-z))

def relu(z):
    """Computes the ReLU activation."""
    return np.maximum(0, z)

# --- Network Parameters ---
# Weights connecting Input Layer to Hidden Layer (shape: features x hidden_neurons)
W1 = np.array([[ 0.5, -0.2,  0.8],
               [-0.3,  0.7, -0.1]])  # (2x3)

# Biases for Hidden Layer (shape: 1 x hidden_neurons)
b1 = np.array([[0.1, -0.4, 0.2]])  # (1x3)

# Weights connecting Hidden Layer to Output Layer (shape: hidden_neurons x output_neurons)
W2 = np.array([[ 0.6],
               [-0.4],
               [ 0.9]])  # (3x1)

# Bias for Output Layer (shape: 1 x output_neurons)
b2 = np.array([[-0.1]])  # (1x1)

# --- Sample Input Data ---
# A single data point with 2 features (shape: 1 x features)
X = np.array([[0.8, 0.2]])  # (1x2)

print("Input X (1x2):\n", X)
print("\nWeights W1 (2x3):\n", W1)
print("Biases b1 (1x3):\n", b1)
print("\nWeights W2 (3x1):\n", W2)
print("Biases b2 (1x1):\n", b2)
```
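The fixed values above keep the walkthrough reproducible. Purely to illustrate the "initialized randomly" remark, and not used anywhere else in this example, a simple random initialization might look like the following sketch (the 0.01 scale and the seed are arbitrary choices, not a recommendation):

```python
# Illustrative sketch only -- the fixed W1, b1, W2, b2 above are what this example uses.
# Small random weights and zero biases are a simple, common starting point.
rng = np.random.default_rng(seed=0)                      # seed chosen arbitrarily
W1_init = rng.normal(loc=0.0, scale=0.01, size=(2, 3))   # features x hidden_neurons
b1_init = np.zeros((1, 3))                               # 1 x hidden_neurons
W2_init = rng.normal(loc=0.0, scale=0.01, size=(3, 1))   # hidden_neurons x output_neurons
b2_init = np.zeros((1, 1))                               # 1 x output_neurons
```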
### Step 1: Calculate Hidden Layer Input (Linear Transformation)

We compute the weighted sum of inputs plus the bias for the hidden layer. Using matrix multiplication, this is $Z_1 = X \cdot W_1 + b_1$.

- $X$ has shape (1, 2)
- $W_1$ has shape (2, 3)
- $X \cdot W_1$ will have shape (1, 3)
- $b_1$ has shape (1, 3) (NumPy handles broadcasting correctly here)

```python
# Calculate the linear combination for the hidden layer
Z1 = np.dot(X, W1) + b1

print("Shape of X:", X.shape)
print("Shape of W1:", W1.shape)
print("Shape of b1:", b1.shape)
print("\nLinear combination Z1 (X * W1 + b1) (1x3):\n", Z1)
print("Shape of Z1:", Z1.shape)
```

The result $Z_1$ contains the input values for the activation function of each neuron in the hidden layer.

### Step 2: Apply Activation Function to Hidden Layer

Now, we apply the ReLU activation function element-wise to $Z_1$ to get the hidden layer's output, $A_1 = \text{ReLU}(Z_1)$.

```python
# Apply ReLU activation function
A1 = relu(Z1)

print("Hidden Layer Activation A1 = ReLU(Z1) (1x3):\n", A1)
print("Shape of A1:", A1.shape)
```

$A_1$ represents the output signals from the hidden layer neurons. Notice how any negative values in $Z_1$ have been replaced by 0.

### Step 3: Calculate Output Layer Input (Linear Transformation)

Next, we compute the weighted sum for the output layer using the activations from the hidden layer ($A_1$) as input: $Z_2 = A_1 \cdot W_2 + b_2$.

- $A_1$ has shape (1, 3)
- $W_2$ has shape (3, 1)
- $A_1 \cdot W_2$ will have shape (1, 1)
- $b_2$ has shape (1, 1)

```python
# Calculate the linear combination for the output layer
Z2 = np.dot(A1, W2) + b2

print("Shape of A1:", A1.shape)
print("Shape of W2:", W2.shape)
print("Shape of b2:", b2.shape)
print("\nLinear combination Z2 (A1 * W2 + b2) (1x1):\n", Z2)
print("Shape of Z2:", Z2.shape)
```

$Z_2$ is the input to the final activation function in the output layer.

### Step 4: Apply Activation Function to Output Layer

Finally, we apply the Sigmoid activation function to $Z_2$ to get the network's final output (prediction), $A_2 = \text{Sigmoid}(Z_2)$.

```python
# Apply Sigmoid activation function
A2 = sigmoid(Z2)

print("Output Layer Activation (Prediction) A2 = Sigmoid(Z2) (1x1):\n", A2)
print("Shape of A2:", A2.shape)
```

The value $A_2$ is the network's prediction for the input $X$. Since we used a Sigmoid function, this value is between 0 and 1, often interpreted as a probability in classification tasks. For our specific input `[[0.8, 0.2]]` and the defined weights/biases, the network predicts approximately 0.71.

### Encapsulating Forward Propagation in a Function

We can wrap these steps into a reusable function:

```python
def forward_propagation(X, W1, b1, W2, b2):
    """
    Performs forward propagation for a 2-layer network.

    Args:
        X (np.array): Input data (batch_size x num_features).
        W1 (np.array): Weights from input to hidden layer (num_features x num_hidden).
        b1 (np.array): Biases for hidden layer (1 x num_hidden).
        W2 (np.array): Weights from hidden to output layer (num_hidden x num_output).
        b2 (np.array): Biases for output layer (1 x num_output).

    Returns:
        tuple: (A1, A2) where A1 is the hidden layer activation and
               A2 is the output prediction.
    """
    # Hidden Layer
    Z1 = np.dot(X, W1) + b1
    A1 = relu(Z1)

    # Output Layer
    Z2 = np.dot(A1, W2) + b2
    A2 = sigmoid(Z2)

    return A1, A2

# --- Test the function with our data ---
hidden_output, final_prediction = forward_propagation(X, W1, b1, W2, b2)

print("\n--- Using the forward_propagation function ---")
print("Input X:\n", X)
print("Hidden Layer Output (A1):\n", hidden_output)
print("Final Prediction (A2):\n", final_prediction)
```

This function performs the complete forward pass.
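Because every step is a plain matrix operation, the same function can also process a batch of inputs in one call. Here is a minimal sketch, assuming a hypothetical batch of three samples and the common 0.5 decision threshold for binary classification (neither is part of the original walkthrough):

```python
# Sketch: run the forward pass on a small hypothetical batch (3 samples x 2 features).
X_batch = np.array([[0.8, 0.2],
                    [0.1, 0.9],
                    [0.5, 0.5]])

_, batch_probs = forward_propagation(X_batch, W1, b1, W2, b2)   # shape (3, 1)

# Convert probabilities to class labels with a 0.5 threshold (a common convention).
batch_classes = (batch_probs >= 0.5).astype(int)

print("Batch probabilities (3x1):\n", batch_probs)
print("Predicted classes (3x1):\n", batch_classes)
```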
During training, this output ($A_2$) would be compared against the true label using a loss function (a small illustrative sketch follows below), and the difference would guide the backpropagation process to update $W_1$, $b_1$, $W_2$, and $b_2$.

You have now successfully implemented the forward propagation mechanism, calculating how a neural network generates a prediction from a given input and set of parameters. This is a fundamental building block for understanding how networks operate and learn.
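As a preview of that comparison, here is a minimal sketch of the binary cross-entropy loss that is commonly paired with a sigmoid output. The true label of 1 is an arbitrary assumption made only for this illustration:

```python
# Illustrative sketch: binary cross-entropy for the single prediction computed above.
y_true = 1.0                 # hypothetical true label, assumed only for this sketch
y_pred = final_prediction    # A2 from the forward pass, roughly 0.71

eps = 1e-12                  # small constant to avoid log(0)
bce = -(y_true * np.log(y_pred + eps) + (1 - y_true) * np.log(1 - y_pred + eps))
print("Binary cross-entropy loss:", bce)
```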