As we discussed, machine learning models, especially neural networks, are often built by composing functions together. Imagine a simple process: an input $x$ goes into a function $g$, producing an output $u$. This output $u$ then becomes the input to another function $f$, yielding the final result $y$. Mathematically, we write this as $y = f(u)$ where $u = g(x)$, or more concisely as $y = h(x) = f(g(x))$.
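To make the idea of composition concrete, here is a minimal sketch in Python. The particular functions `g` and `f` are hypothetical choices, used only to show how one function's output feeds the next:

```python
# A small sketch of function composition: h(x) = f(g(x)).
# g and f are arbitrary example functions, not from the text above.
def g(x):
    return 2 * x + 1        # inner function: u = g(x)

def f(u):
    return u ** 2           # outer function: y = f(u)

def h(x):
    return f(g(x))          # composite function: y = h(x) = f(g(x))

print(h(2.0))               # g(2) = 5, f(5) = 25 -> prints 25.0
```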
Now, suppose we want to know how a small change in the initial input $x$ affects the final output $y$. This requires finding the derivative of the composite function $h(x)$ with respect to $x$, denoted as $\frac{dy}{dx}$ or $h'(x)$. We already know how to find $\frac{df}{du}$ (how $f$ changes with its direct input $u$) and $\frac{dg}{dx}$ (how $g$ changes with its input $x$). The chain rule provides the connection.
The chain rule for single-variable functions states that the derivative of a composite function $h(x) = f(g(x))$ is the product of the derivative of the outer function $f$ with respect to its argument (evaluated at the inner function $g(x)$) and the derivative of the inner function $g$ with respect to $x$.
Using Leibniz notation, which is often helpful for visualizing the "chain" effect, we write:
$$\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}$$
Alternatively, using Lagrange notation:
$$h'(x) = f'(g(x)) \cdot g'(x)$$
Let's break down $f'(g(x))$. It means: first, take the derivative of the outer function $f$ with respect to its argument, and evaluate that derivative at the inner function's output, $u = g(x)$. Then, multiply this result by the derivative of the inner function, $g'(x)$.
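This two-step recipe translates directly into code. The sketch below uses hypothetical example functions ($g(x) = x^2$ and $f(u) = \sin u$) purely to make the steps concrete:

```python
import math

# Chain rule as code: h'(x) = f'(g(x)) * g'(x).
# The functions f and g here are illustrative placeholders.
def g(x):
    return x ** 2            # inner function

def g_prime(x):
    return 2 * x             # derivative of the inner function

def f(u):
    return math.sin(u)       # outer function

def f_prime(u):
    return math.cos(u)       # derivative of the outer function

def h_prime(x):
    # Step 1: evaluate the outer derivative at the inner function's output, g(x).
    # Step 2: multiply by the derivative of the inner function.
    return f_prime(g(x)) * g_prime(x)

print(h_prime(1.5))          # cos(1.5**2) * 2 * 1.5 ≈ -1.88
```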
Consider the function $h(x) = (x^2 + 5)^3$. This is a composition of functions. Let the inner function be $g(x) = u = x^2 + 5$. Let the outer function be $f(u) = y = u^3$.
First, find the derivatives of the individual functions: the derivative of the outer function is $f'(u) = \frac{dy}{du} = 3u^2$, and the derivative of the inner function is $g'(x) = \frac{du}{dx} = 2x$.
Now, apply the chain rule: $\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}$. Substitute $f'(u)$ and $g'(x)$:
$$\frac{dy}{dx} = (3u^2) \cdot (2x)$$
Finally, substitute the expression for $u$ back into the equation, as the derivative $f'(u)$ needs to be evaluated at the inner function's output ($u = g(x)$):
$$\frac{dy}{dx} = 3(x^2 + 5)^2 \cdot (2x) = 6x(x^2 + 5)^2$$
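One way to sanity-check this result is to compare the analytic derivative $6x(x^2 + 5)^2$ against a finite-difference approximation of $h(x) = (x^2 + 5)^3$. A small sketch (the helper names are ours, not from any library):

```python
# Numerical check of d/dx (x**2 + 5)**3 = 6*x*(x**2 + 5)**2
def h(x):
    return (x ** 2 + 5) ** 3

def h_prime_analytic(x):
    return 6 * x * (x ** 2 + 5) ** 2

def h_prime_numeric(x, eps=1e-6):
    # central finite-difference approximation of the derivative
    return (h(x + eps) - h(x - eps)) / (2 * eps)

x = 2.0
print(h_prime_analytic(x))   # 6 * 2 * 81 = 972.0
print(h_prime_numeric(x))    # ≈ 972.0
```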
Now let's look at $h(x) = e^{3x}$. We can see this as $y = f(u) = e^u$ and $u = g(x) = 3x$. Find the individual derivatives: the derivative of the outer function is $f'(u) = \frac{dy}{du} = e^u$, and the derivative of the inner function is $g'(x) = \frac{du}{dx} = 3$.
Apply the chain rule: $\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}$
$$\frac{dy}{dx} = (e^u) \cdot (3)$$
Substitute $u = 3x$:
$$\frac{dy}{dx} = e^{3x} \cdot 3 = 3e^{3x}$$
This rule is fundamental because it allows us to break down the differentiation of complex, nested functions into manageable steps. We compute the rate of change layer by layer, multiplying these rates together to find the overall rate of change. This exact principle, extended to functions with multiple variables, forms the core mechanism of backpropagation in neural networks, allowing us to calculate how changes in weights deep inside the network affect the final output or error. We'll explore this multivariable extension next.
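As a small preview of that layer-by-layer view, here is a sketch that evaluates $h(x) = e^{3x}$ in two "layers", where each layer reports both its output and its local derivative, and the overall derivative is the product of the local rates. The layer names are invented for illustration, not part of any framework:

```python
import math

# Layer-by-layer chain rule for h(x) = exp(3*x), written in a backpropagation-like style.
# Each layer returns its output and its local derivative; the overall derivative
# is the product of the local derivatives along the chain.
def scale_by_3(x):
    return 3 * x, 3                  # u = 3x,  du/dx = 3

def exp_layer(u):
    y = math.exp(u)
    return y, y                      # y = e^u, dy/du = e^u

def h_and_grad(x):
    u, du_dx = scale_by_3(x)
    y, dy_du = exp_layer(u)
    return y, dy_du * du_dx          # chain rule: dy/dx = dy/du * du/dx

x = 0.5
y, dy_dx = h_and_grad(x)
print(y, dy_dx)                      # e^1.5 ≈ 4.4817,  3*e^1.5 ≈ 13.445
```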