In the previous chapter, we focused on functions with a single input variable, like f(x). We learned how derivatives tell us about the rate of change of these functions. However, many situations in machine learning involve functions that depend on multiple inputs simultaneously. Think about predicting a house price. The price doesn't just depend on the square footage; it might also depend on the number of bedrooms, the age of the house, and its location. Similarly, the cost function we aim to minimize during model training usually depends on all the model's parameters (weights and biases), which can number in the thousands or even millions for complex models.
To handle these scenarios, we need to extend our calculus toolkit to functions of multiple variables.
A function of multiple variables takes more than one input value and produces an output value (often a single scalar value in our context). If a function $f$ depends on $n$ variables, say $x_1, x_2, \ldots, x_n$, we write it as:
$$f(x_1, x_2, \ldots, x_n)$$
Alternatively, we can group the input variables into a vector $\mathbf{x} = [x_1, x_2, \ldots, x_n]^T$. Then, the function can be written more compactly as:
$$f(\mathbf{x})$$
The output is typically a single real number, especially when dealing with cost functions in machine learning, where the output represents a measure of error or 'cost'.
Example: Consider a simple function of two variables, $x$ and $y$:
$$f(x, y) = x^2 + 2y^2$$
Here, the input consists of a pair of values $(x, y)$, and the output is a single number calculated using the formula. For instance, $f(1, 2) = 1^2 + 2(2^2) = 1 + 8 = 9$.
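If you want to check this numerically, here is a minimal Python sketch (using NumPy for the vector form) that writes the same function both with explicit arguments and with a single vector input, mirroring the $f(\mathbf{x})$ notation above.

```python
import numpy as np

def f(x, y):
    # f(x, y) = x^2 + 2y^2, written with explicit arguments
    return x**2 + 2 * y**2

def f_vec(x):
    # The same function, taking a single vector x = [x1, x2]
    return x[0]**2 + 2 * x[1]**2

print(f(1, 2))                      # 9
print(f_vec(np.array([1.0, 2.0])))  # 9.0
```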
Visualizing functions becomes more challenging as the number of input variables increases. With two input variables, however, we can still plot the output as a surface in three dimensions.
Let's visualize the function $z = x^2 + y^2$. This equation describes an upward-opening paraboloid whose minimum value of 0 occurs at $(x, y) = (0, 0)$.
A 3D surface plot representing the function $z = x^2 + y^2$. The height ($z$) corresponds to the function's value for each pair of $(x, y)$ coordinates. The minimum value (0) occurs at $(x, y) = (0, 0)$.
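For reference, a surface plot like the one described above can be produced with a short Matplotlib sketch along these lines (the grid range and resolution are arbitrary choices, not values from the text):

```python
import numpy as np
import matplotlib.pyplot as plt

# Evaluate z = x^2 + y^2 over a grid of (x, y) values.
x = np.linspace(-3, 3, 100)
y = np.linspace(-3, 3, 100)
X, Y = np.meshgrid(x, y)
Z = X**2 + Y**2

# Draw the surface; the minimum (z = 0) sits at the origin.
fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot_surface(X, Y, Z, cmap="viridis")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z = x^2 + y^2")
plt.show()
```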
Functions of multiple variables are fundamental in machine learning:
Cost Functions: As mentioned, the cost function $J$ measures how well our model performs based on its parameters (weights $w$ and bias $b$). For a model with $n$ weights, the cost function is $J(w_1, w_2, \ldots, w_n, b)$. Training the model involves finding the values of $w_1, \ldots, w_n, b$ that minimize this multivariable function. For example, the mean squared error cost for linear regression with two features $(x_1, x_2)$ and parameters $w_1, w_2, b$ is:
$$J(w_1, w_2, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( (w_1 x_1^{(i)} + w_2 x_2^{(i)} + b) - y^{(i)} \right)^2$$
Here, $m$ is the number of training examples, $(x_1^{(i)}, x_2^{(i)})$ are the features of the $i$-th example, and $y^{(i)}$ is its true label. The function $J$ depends on the three parameters $w_1$, $w_2$, and $b$ (see the code sketch after these examples).
Model Predictions: The prediction function $h(\mathbf{x})$ itself is often a multivariable function. For linear regression, $h_{w,b}(\mathbf{x}) = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + b$. The prediction depends on the input features $x_1, \ldots, x_n$.
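To make both roles concrete, here is a small NumPy sketch that evaluates the linear prediction $h_{w,b}(\mathbf{x})$ for each training example and then the cost $J(w_1, w_2, b)$ from the formula above. The dataset and parameter values are made up purely for illustration.

```python
import numpy as np

# Tiny illustrative dataset: m = 3 examples, each with n = 2 features.
X = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [3.0, 1.5]])       # row i holds (x1^(i), x2^(i))
y = np.array([5.0, 4.0, 8.0])    # true labels y^(i)

# Arbitrary parameter values to evaluate the cost at.
w = np.array([1.5, 0.5])         # (w1, w2)
b = 1.0

# Prediction h_{w,b}(x) = w1*x1 + w2*x2 + b for every example at once.
predictions = X @ w + b

# Mean squared error cost J(w1, w2, b) = (1 / 2m) * sum of squared errors.
m = len(y)
J = np.sum((predictions - y) ** 2) / (2 * m)

print(predictions)  # one prediction per training example
print(J)            # a single number: the cost for this (w1, w2, b)
```

Changing $w$ or $b$ changes the value of $J$; training is the search for the parameter values that make this single number as small as possible.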
Understanding how to analyze these multivariable functions, particularly how they change as we adjust their inputs (the parameters), is essential for optimization. This naturally leads us to the concept of partial derivatives, which measure the rate of change with respect to one variable while holding the others constant. We will explore this next.