Machine learning models, especially neural networks, are often built by composing functions together. Imagine a simple process: an input $x$ goes into a function $g$, producing an output $u$. This output $u$ then becomes the input to another function $f$, yielding the final result $y$. Mathematically, we write this as $y = f(u)$ where $u = g(x)$, or more concisely as $y = f(g(x))$.
Now, suppose we want to know how a small change in the initial input $x$ affects the final output $y$. This requires finding the derivative of the composite function with respect to $x$, denoted as $\frac{dy}{dx}$ or $(f \circ g)'(x)$. We already know how to find $\frac{dy}{du}$ (how $y$ changes with its direct input $u$) and $\frac{du}{dx}$ (how $u$ changes with its input $x$). The chain rule provides the connection.
The chain rule for single-variable functions states that the derivative of a composite function $y = f(g(x))$ is the product of the derivative of the outer function $f$ with respect to its argument $u$ (evaluated at the inner function $g(x)$) and the derivative of the inner function $g$ with respect to $x$.
Using Leibniz notation, which is often helpful for visualizing the "chain" effect, we write:

$$\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}$$
Alternatively, using Lagrange notation:

$$(f \circ g)'(x) = f'(g(x)) \cdot g'(x)$$
Let's break down $f'(g(x)) \cdot g'(x)$. It means:

First, take the derivative of the outer function $f$ and evaluate it at the inner function's output, $g(x)$. This gives $f'(g(x))$.

Then, multiply this result by the derivative of the inner function, $g'(x)$.
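To see the rule in action before the worked examples, here is a small Python sketch (added for illustration; the pair $f(u) = e^u$ and $g(x) = 3x + 2$ is an assumed example, not from the text). It compares a finite-difference estimate of the composite's derivative against the chain-rule product $f'(g(x)) \cdot g'(x)$:

```python
import math

# Assumed example pair: outer f(u) = e^u, inner g(x) = 3x + 2
f = lambda u: math.exp(u)
g = lambda x: 3 * x + 2
f_prime = lambda u: math.exp(u)  # derivative of the outer function
g_prime = lambda x: 3.0          # derivative of the inner function

x, h = 0.4, 1e-6

# Finite-difference estimate of d/dx f(g(x))
numeric = (f(g(x + h)) - f(g(x - h))) / (2 * h)

# Chain rule: f'(g(x)) * g'(x)
chain_rule = f_prime(g(x)) * g_prime(x)

print(numeric, chain_rule)  # the two values agree to several decimal places
```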
Consider the function $y = (x^2 + 1)^3$. This is a composition of functions. Let the inner function be $u = g(x) = x^2 + 1$. Let the outer function be $f(u) = u^3$.

First, find the derivatives of the individual functions:

$$g'(x) = \frac{d}{dx}(x^2 + 1) = 2x$$

$$f'(u) = \frac{d}{du}(u^3) = 3u^2$$

Now, apply the chain rule: $\frac{dy}{dx} = f'(u) \cdot g'(x)$. Substitute $f'(u) = 3u^2$ and $g'(x) = 2x$:

$$\frac{dy}{dx} = 3u^2 \cdot 2x$$

Finally, substitute the expression for $u$ back into the equation, as the derivative needs to be evaluated at the inner function's output ($u = x^2 + 1$):

$$\frac{dy}{dx} = 3(x^2 + 1)^2 \cdot 2x = 6x(x^2 + 1)^2$$
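As a quick sanity check (a sketch added here, using the example function above), we can compare the closed-form derivative against a finite-difference estimate:

```python
# Check dy/dx = 6x(x^2 + 1)^2 for y = (x^2 + 1)^3 by finite differences
y = lambda x: (x**2 + 1)**3
dydx = lambda x: 6 * x * (x**2 + 1)**2  # chain-rule result

x, h = 0.7, 1e-6
numeric = (y(x + h) - y(x - h)) / (2 * h)
print(numeric, dydx(x))  # both are approximately 9.3244 at x = 0.7
```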
Let's look at $y = \sin(x^2)$. We can see this as $f(u) = \sin(u)$ and $u = g(x) = x^2$.

Find the individual derivatives:

$$f'(u) = \cos(u)$$

$$g'(x) = 2x$$

Apply the chain rule:

$$\frac{dy}{dx} = f'(g(x)) \cdot g'(x) = \cos(u) \cdot 2x$$

Substitute $u = x^2$:

$$\frac{dy}{dx} = 2x\cos(x^2)$$
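The same numeric check (again a sketch, using the example just above) confirms the result:

```python
import math

y = lambda x: math.sin(x**2)
dydx = lambda x: 2 * x * math.cos(x**2)  # chain-rule result

x, h = 1.2, 1e-6
numeric = (y(x + h) - y(x - h)) / (2 * h)
print(numeric, dydx(x))  # both are approximately 0.313 at x = 1.2
```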
This rule is fundamental because it allows us to break down the differentiation of complex, nested functions into manageable steps. We compute the rate of change layer by layer, multiplying these rates together to find the overall rate of change. This exact principle, extended to functions with multiple variables, forms the core mechanism of backpropagation in neural networks, allowing us to calculate how changes in weights deep inside the network affect the final output or error. We'll explore this multivariable extension next.
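To preview that connection, here is a minimal Python sketch (an assumed one-weight "network", not the text's backpropagation treatment) of $y = \tanh(w \cdot x)$: the gradient with respect to the weight $w$ is the local derivative of the nonlinearity multiplied by the input, which is exactly the chain rule.

```python
import math

# Tiny "network": u = w * x, then y = tanh(u).
# Chain rule: dy/dw = dy/du * du/dw = tanh'(u) * x
def forward(w, x):
    u = w * x
    return math.tanh(u)

def grad_w(w, x):
    u = w * x
    local = 1 - math.tanh(u)**2  # derivative of tanh at u
    return local * x             # multiply by du/dw = x

w, x, h = 0.5, 2.0, 1e-6
numeric = (forward(w + h, x) - forward(w - h, x)) / (2 * h)
print(numeric, grad_w(w, x))  # the two gradients agree
```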