So far, we've mostly worked with functions like $f(x) = x^2$ or $f(x) = 3x + 5$. These are single-variable functions: they take one input ($x$) and produce one output. You can easily visualize them as a curve on a 2D graph, showing how the output changes as the single input changes.
However, many real-world scenarios, especially in machine learning, involve more than one influencing factor. Think about predicting a house's price. Is it based only on its size in square feet? Probably not. You'd likely also consider the number of bedrooms, the age of the house, or the neighborhood's quality score. Each of these is an independent input variable.
Similarly, when we train a machine learning model, like the simple linear regression model $y = mx + b$ we'll see later, we need to find the best values for both $m$ (the slope) and $b$ (the intercept). The model's performance, often measured by a "cost" or "error" function, depends on the specific values chosen for both $m$ and $b$.
This leads us to functions of multiple variables. Instead of just $f(x)$, we now deal with functions such as $f(x, y)$, $g(w, b)$, or more generally $f(x_1, x_2, \dots, x_n)$.
All these functions still produce a single output value. The output depends on the specific combination of all the input values provided.
A function of multiple variables maps a set of input values to a single output value. We can write this as $z = f(x, y)$ for two variables, or $y_{\text{output}} = f(x_1, x_2, \dots, x_n)$ for $n$ variables. Here, $x_1, x_2, \dots, x_n$ are called the independent variables, and the resulting output ($z$ or $y_{\text{output}}$) is the dependent variable.
Let's look at a simple example:
$$f(x, y) = x^2 + y^2$$

This function takes two inputs, $x$ and $y$. To find the output, we square each input and add the results.
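As a quick sanity check, here is a minimal Python sketch of this function; the function name `f` and the sample inputs are illustrative choices, not anything prescribed by the math:

```python
def f(x, y):
    """Compute f(x, y) = x^2 + y^2."""
    return x**2 + y**2

print(f(0, 0))   # 0  -> the smallest possible output, at the origin
print(f(3, 4))   # 25 -> 3^2 + 4^2
print(f(-2, 1))  # 5  -> (-2)^2 + 1^2
```

Notice that each call supplies a specific combination of inputs and gets back exactly one number.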
Another example, perhaps closer to something you might see related to model error:
$$g(w, b) = (w \cdot 5 + b - 12)^2$$

Here, the function $g$ depends on two inputs, $w$ and $b$. If $w = 2$ and $b = 3$, then:
$$g(2, 3) = (2 \cdot 5 + 3 - 12)^2 = (10 + 3 - 12)^2 = (1)^2 = 1$$

In machine learning, $w$ and $b$ might represent the weights or parameters of a simple model, and the value 5 might be an input feature from our data, while 12 could be the true target value. The function $g(w, b)$ could then represent the squared error for that single data point. The goal of training the model would be to find the values of $w$ and $b$ that make this error (and the error across all data points) as small as possible.
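A small Python sketch makes that connection explicit. The names `squared_error`, `x`, and `y_true` are illustrative, not part of any particular library:

```python
def squared_error(w, b, x, y_true):
    """Squared error of the prediction w*x + b against the true value y_true."""
    prediction = w * x + b
    return (prediction - y_true) ** 2

# Reproduce the worked example: g(2, 3) with feature x = 5 and target y_true = 12
print(squared_error(w=2, b=3, x=5, y_true=12))    # 1

# A different parameter choice gives a different error
print(squared_error(w=2.4, b=0, x=5, y_true=12))  # 0.0 -> a perfect prediction here
```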
We saw that functions of a single variable, $f(x)$, can be visualized as a 2D curve. What about functions of two variables, like $z = f(x, y)$?
Since we have two inputs ($x$, $y$) and one output ($z$), we need three dimensions to plot the function. We can imagine an $xy$-plane representing the possible input combinations, and a vertical $z$-axis representing the output. For each point $(x, y)$ in the input plane, the function $f(x, y)$ gives a height $z$. Plotting these points $(x, y, z)$ typically creates a surface in 3D space.
Consider our example $f(x, y) = x^2 + y^2$. This function describes a bowl shape, called a paraboloid, opening upwards. Its lowest point is at $(x, y) = (0, 0)$, where $f(0, 0) = 0$.
The surface plot of $z = x^2 + y^2$. The inputs $x$ and $y$ form the base plane, and the height $z$ represents the function's output. Notice the minimum value occurs at the origin $(0, 0)$.
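If you want to reproduce a plot like this yourself, here is a minimal sketch using NumPy and Matplotlib; the grid range and styling are assumptions, not taken from the original figure:

```python
import numpy as np
import matplotlib.pyplot as plt

# Evaluate f(x, y) = x^2 + y^2 on a grid of input combinations
x = np.linspace(-3, 3, 100)
y = np.linspace(-3, 3, 100)
X, Y = np.meshgrid(x, y)
Z = X**2 + Y**2

# Plot the resulting surface; the minimum sits at the origin
fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot_surface(X, Y, Z, cmap="viridis")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z = f(x, y)")
plt.show()
```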
What about functions with more than two inputs? Visualizing $f(x_1, x_2, x_3)$ would require 4 dimensions (3 for the inputs, 1 for the output), and $f(x_1, \dots, x_n)$ would require $n + 1$ dimensions. Our brains aren't equipped to directly visualize spaces beyond three dimensions!
However, the mathematical concepts we develop work just fine in higher dimensions. Even if we can't draw a picture of a cost function that depends on thousands of model parameters, we can still analyze how the cost changes when we adjust those parameters.
Understanding functions of multiple variables is essential because most machine learning models have multiple adjustable parameters (like the weights and biases in a neural network) and often process input data with multiple features. Cost functions, which measure how well the model is doing, are therefore naturally functions of many variables (the parameters).
Our goal in training is often to minimize this cost function. Since the cost depends on multiple parameters, we need a way to figure out how changing each individual parameter affects the cost. This is precisely where the concept of partial derivatives, which we'll explore next, comes into play. It allows us to examine the rate of change with respect to one variable at a time, even when the function depends on many others.
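As a preview of that idea, here is a small numerical sketch: nudge one parameter at a time, hold the other fixed, and watch how the cost responds. The cost function below reuses the earlier squared-error example, and the step size is an illustrative assumption:

```python
def cost(w, b):
    """Illustrative cost: squared error for one point with feature 5 and target 12."""
    return (w * 5 + b - 12) ** 2

w, b = 2.0, 3.0
h = 1e-5  # a small nudge applied to one variable at a time

# Approximate rate of change with respect to each variable, the other held fixed
rate_w = (cost(w + h, b) - cost(w, b)) / h
rate_b = (cost(w, b + h) - cost(w, b)) / h

print(rate_w)  # close to 10: increasing w raises this cost quickly
print(rate_b)  # close to 2:  increasing b raises it more slowly
```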