Calculating derivatives using the limit definition, $\lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$, works, but it can be tedious for complex functions. Imagine applying this definition repeatedly within a machine learning optimization loop. It would be computationally inefficient. Fortunately, calculus provides a set of rules that allow us to find derivatives much more easily by breaking down complex functions into simpler parts. These rules are the building blocks for differentiating the kinds of functions we encounter when defining model costs or activation functions in machine learning.
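To make the contrast concrete, here is a minimal Python sketch (the function name numerical_derivative is illustrative, not from any library) comparing a finite-difference approximation of the limit definition with the exact answer a rule gives instantly:

```python
# Approximating f'(x) via the limit definition with a small step h.
def numerical_derivative(f, x, h=1e-6):
    return (f(x + h) - f(x)) / h

f = lambda x: x ** 3
print(numerical_derivative(f, 2.0))  # ~12.000006, an approximation
print(3 * 2.0 ** 2)                  # 12.0 exactly, via the Power Rule below
```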
Let's explore the most common and useful differentiation rules. Assume f(x) and g(x) are differentiable functions, and c is a constant.
The Constant Rule
The simplest rule involves functions that always output the same value, like f(x)=5. Since the function's value doesn't change, its rate of change (the derivative) is always zero.
Rule: If f(x)=c, where c is a constant, then f′(x)=0.
In Leibniz notation: $\frac{d}{dx}(c) = 0$
Example: If f(x)=10, then f′(x)=0.
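As a quick check, symbolic math libraries encode this rule directly; here is a one-line verification using sympy (assuming it is installed):

```python
import sympy as sp

x = sp.symbols('x')
print(sp.diff(10, x))  # 0 -- a constant has zero rate of change
```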
The Power Rule
This is one of the most frequently used rules, especially for polynomial functions common in modeling. It applies to functions of the form x raised to a power.
Rule: If $f(x) = x^n$, where n is any real number, then $f'(x) = nx^{n-1}$.
In Leibniz notation: $\frac{d}{dx}(x^n) = nx^{n-1}$
Explanation: To differentiate $x^n$, you bring the exponent n down as a multiplier and then decrease the original exponent by 1.
Examples:
If $f(x) = x^3$, then $f'(x) = 3x^{3-1} = 3x^2$.
If $g(x) = x$ (which is $x^1$), then $g'(x) = 1 \cdot x^{1-1} = 1 \cdot x^0 = 1$. This makes sense: the slope of the line $y = x$ is 1.
If $h(x) = \sqrt{x} = x^{1/2}$, then $h'(x) = \frac{1}{2}x^{1/2 - 1} = \frac{1}{2}x^{-1/2} = \frac{1}{2\sqrt{x}}$.
If $k(x) = \frac{1}{x^2} = x^{-2}$, then $k'(x) = -2x^{-2-1} = -2x^{-3} = -\frac{2}{x^3}$.
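Each of these examples can be verified symbolically; a short sympy sketch (again assuming the library is available):

```python
import sympy as sp

x = sp.symbols('x', positive=True)  # positive=True keeps sqrt(x) well-behaved
print(sp.diff(x**3, x))             # 3*x**2
print(sp.diff(x, x))                # 1
print(sp.diff(sp.sqrt(x), x))       # 1/(2*sqrt(x))
print(sp.diff(x**-2, x))            # -2/x**3
```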
The Constant Multiple Rule
What happens if the function is multiplied by a constant, like $f(x) = 5x^3$? This rule states that you can pull the constant out, differentiate the rest of the function, and then multiply the constant back in.
Rule: If $h(x) = c \cdot f(x)$, then $h'(x) = c \cdot f'(x)$.
In Leibniz notation: $\frac{d}{dx}(c \cdot f(x)) = c \cdot \frac{d}{dx}(f(x))$
Example: Let $f(x) = 5x^3$. We know $\frac{d}{dx}(x^3) = 3x^2$. Using the rule:
$f'(x) = 5 \cdot \frac{d}{dx}(x^3) = 5 \cdot (3x^2) = 15x^2$
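The same one-liner style of sympy check confirms the result:

```python
import sympy as sp

x = sp.symbols('x')
print(sp.diff(5 * x**3, x))  # 15*x**2 -- the constant 5 simply scales 3*x**2
```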
The Sum/Difference Rule
Machine learning cost functions are often sums of terms (like the sum of squared errors). This rule allows us to differentiate such functions term by term.
Rule: If h(x)=f(x)+g(x), then h′(x)=f′(x)+g′(x). Similarly, if h(x)=f(x)−g(x), then h′(x)=f′(x)−g′(x).
In Leibniz notation: $\frac{d}{dx}(f(x) \pm g(x)) = \frac{d}{dx}(f(x)) \pm \frac{d}{dx}(g(x))$
Explanation: The derivative of a sum (or difference) is the sum (or difference) of the derivatives.
Example: Let $h(x) = x^4 + 6x^2 - 7$. We can differentiate term by term using the Power Rule, Constant Multiple Rule, and Constant Rule:
$h'(x) = \frac{d}{dx}(x^4) + \frac{d}{dx}(6x^2) - \frac{d}{dx}(7)$
$h'(x) = (4x^3) + (6 \cdot 2x) - (0)$
$h'(x) = 4x^3 + 12x$
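Verifying the term-by-term result with sympy (assuming it is installed):

```python
import sympy as sp

x = sp.symbols('x')
print(sp.diff(x**4 + 6*x**2 - 7, x))  # 4*x**3 + 12*x, differentiated term by term
```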
The Product Rule
This rule is needed when differentiating a function that is the product of two other functions, like h(x)=f(x)g(x). It's important not to simply multiply the individual derivatives.
Rule: If h(x)=f(x)g(x), then h′(x)=f′(x)g(x)+f(x)g′(x).
In Leibniz notation: $\frac{d}{dx}(f(x)g(x)) = \frac{df}{dx}g(x) + f(x)\frac{dg}{dx}$
Explanation: The derivative of a product is the derivative of the first function times the second function plus the first function times the derivative of the second function.
Example: Let $h(x) = (x^2 + 1)(x^3 - x)$.
Let $f(x) = x^2 + 1$, so $f'(x) = 2x$.
Let $g(x) = x^3 - x$, so $g'(x) = 3x^2 - 1$.
Applying the Product Rule:
$h'(x) = f'(x)g(x) + f(x)g'(x)$
$h'(x) = (2x)(x^3 - x) + (x^2 + 1)(3x^2 - 1)$
$h'(x) = (2x^4 - 2x^2) + (3x^4 - x^2 + 3x^2 - 1)$
$h'(x) = 2x^4 - 2x^2 + 3x^4 + 2x^2 - 1$
$h'(x) = 5x^4 - 1$
(Self-check: Expand $h(x)$ first: $h(x) = x^5 - x^3 + x^3 - x = x^5 - x$. Then $h'(x) = 5x^4 - 1$. The rule works!)
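The same self-check can be automated: the sketch below (assuming sympy is available) assembles the Product Rule by hand and compares it against differentiating the product directly:

```python
import sympy as sp

x = sp.symbols('x')
f = x**2 + 1
g = x**3 - x
by_rule = sp.diff(f, x) * g + f * sp.diff(g, x)  # f'g + f g'
direct = sp.diff(f * g, x)                       # differentiate the product directly
print(sp.expand(by_rule))                        # 5*x**4 - 1
print(sp.simplify(by_rule - direct))             # 0 -- the two answers agree
```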
The Quotient Rule
This rule handles functions that are formed by dividing one function by another, $h(x) = \frac{f(x)}{g(x)}$. Like the product rule, it has a specific structure.
Rule: If $h(x) = \frac{f(x)}{g(x)}$, where $g(x) \neq 0$, then $h'(x) = \frac{f'(x)g(x) - f(x)g'(x)}{[g(x)]^2}$.
In Leibniz notation: $\frac{d}{dx}\left(\frac{f(x)}{g(x)}\right) = \frac{\frac{df}{dx}g(x) - f(x)\frac{dg}{dx}}{[g(x)]^2}$
Explanation: A common mnemonic is "low d-high minus high d-low, square the bottom and away we go," where "low" is g(x), "high" is f(x), and "d-" means "the derivative of".
Example: Let $h(x) = \frac{x^2}{x - 1}$.
Let $f(x) = x^2$ (high), so $f'(x) = 2x$.
Let $g(x) = x - 1$ (low), so $g'(x) = 1$.
Applying the Quotient Rule:
$h'(x) = \frac{f'(x)g(x) - f(x)g'(x)}{[g(x)]^2}$
$h'(x) = \frac{(2x)(x - 1) - (x^2)(1)}{(x - 1)^2}$
$h'(x) = \frac{2x^2 - 2x - x^2}{(x - 1)^2}$
$h'(x) = \frac{x^2 - 2x}{(x - 1)^2}$
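As with the Product Rule, a sympy sketch (assuming the library is installed) confirms the hand-assembled quotient against direct differentiation:

```python
import sympy as sp

x = sp.symbols('x')
f = x**2   # "high"
g = x - 1  # "low"
by_rule = (sp.diff(f, x) * g - f * sp.diff(g, x)) / g**2  # (f'g - f g') / g**2
print(sp.simplify(by_rule))   # (x**2 - 2*x)/(x - 1)**2 (sympy may print it factored)
print(sp.simplify(sp.diff(f / g, x) - by_rule))  # 0 -- matches direct differentiation
```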
These rules, particularly the Power, Constant Multiple, and Sum/Difference rules, are fundamental for differentiating the polynomial-like terms often found in cost functions (like Mean Squared Error). The Product and Quotient rules become essential when dealing with more complex model structures or activation functions. Mastering these rules allows us to analytically compute the derivatives needed for optimization without resorting to the limit definition every time. Later, we'll see how the Chain Rule combines with these to handle even more complex, nested functions, which are characteristic of neural networks.
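To close the loop with optimization, here is a toy gradient descent sketch on a hypothetical one-parameter cost $J(w) = (w - 3)^2$; the cost, variable names, and learning rate are all illustrative. Expanding gives $J(w) = w^2 - 6w + 9$, so the Power, Constant Multiple, Sum/Difference, and Constant rules yield $J'(w) = 2w - 6 = 2(w - 3)$ analytically:

```python
# Toy gradient descent on the illustrative cost J(w) = (w - 3)**2.
# The rules above give J'(w) = 2*(w - 3) analytically,
# so no limit computation is needed inside the loop.
w = 0.0
learning_rate = 0.1
for _ in range(50):
    grad = 2 * (w - 3)        # analytic derivative of the cost
    w -= learning_rate * grad
print(w)  # ~3.0, the minimizer of J
```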