At its heart, a machine learning model is a mathematical construct designed to learn patterns from data. The most direct way to think about many supervised learning models is as functions. Just like a mathematical function f(x)=y takes an input x and produces an output y, a machine learning model takes input data (features) and produces an output (a prediction or decision).
Let's break down the components:
Inputs (Features): These are the measurable characteristics of the data you feed into the model. If you're predicting house prices, inputs might include the square footage, number of bedrooms, and age of the house. In machine learning, we often represent these inputs as a vector, typically denoted by x. Each element in the vector corresponds to a specific feature. For example, x = [square footage, number of bedrooms, age].
Outputs (Predictions): This is the result the model generates based on the inputs. For house price prediction, the output would be the predicted price (a continuous value). For image classification, it might be a label like 'cat' or 'dog' (a discrete category), or probabilities for each category. We often denote the model's prediction as ŷ (pronounced "y-hat") to distinguish it from the true target value, y.
Model Parameters: These are the internal variables that the model learns from the training data. They define the specific transformation from inputs to outputs. Think of them as the "knobs" that are tuned during the training process to make the model's predictions ŷ as close as possible to the actual target values y. Parameters are often represented by Greek letters like θ (theta) or specific symbols like w for weights and b for biases.
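To make these three components concrete, here is a minimal Python sketch (using NumPy; all values are invented for illustration) of how one training example and a set of parameters might be represented:

```python
import numpy as np

# Inputs (features) for one house: [square footage, number of bedrooms, age].
x = np.array([1500.0, 3.0, 20.0])

# The true target value y for this example, e.g. the actual sale price.
y = 310_000.0

# Parameters theta that the model will learn during training:
# here, one weight per feature plus a single bias term.
theta = {
    "w": np.array([0.0, 0.0, 0.0]),  # weights, starting values chosen arbitrarily
    "b": 0.0,                        # bias
}
```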
We can express the relationship between inputs, parameters, and outputs using function notation:
ŷ = f(x; θ)

This equation states that the prediction ŷ is generated by a function f, which takes the input features x and is configured by the parameters θ. The specific form of the function f defines the type of model.
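As a rough sketch of what this notation can look like in code, the model itself can be written as an ordinary Python function of x and θ. The linear form used in the body is just one possible choice of f, and the values below are purely illustrative:

```python
import numpy as np

def f(x, theta):
    """Compute the prediction y_hat = f(x; theta)."""
    # This particular f is linear: a weighted sum of the features plus a bias.
    # Swapping in a different body (for example, a neural network) changes the
    # model, but the interface y_hat = f(x; theta) stays the same.
    return np.dot(theta["w"], x) + theta["b"]

# The house example again, with made-up parameter values.
x = np.array([1500.0, 3.0, 20.0])
theta = {"w": np.array([100.0, 20_000.0, -500.0]), "b": 50_000.0}
y_hat = f(x, theta)  # the model's predicted price for this house
```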
Consider one of the simplest models: linear regression with a single input feature. The goal is to find a line that best fits the relationship between the input x (e.g., years of experience) and the output y (e.g., salary).
The function for this model is the equation of a line:
ŷ = wx + b

Here:

ŷ is the predicted output (the salary, in our example).
x is the input feature (years of experience).
w is the weight, which sets the slope of the line.
b is the bias, which sets the intercept (the value of ŷ when x = 0).
The learning process involves finding the optimal values for w and b that make the predicted salaries ŷ closely match the actual salaries y in the training data.
A simple linear regression model f(x; w, b) = wx + b represented as a line fitting data points. The parameters w and b define the line's slope and intercept.
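As an illustration of that learning step, the sketch below fits w and b to a small, made-up experience/salary dataset using an ordinary least-squares fit via np.polyfit. This closed-form fit is just one way to obtain the parameters; the following sections develop the more general, gradient-based view of optimization:

```python
import numpy as np

# Made-up training data: years of experience (x) and observed salaries (y).
x = np.array([1.0, 2.0, 3.0, 5.0, 7.0, 10.0])
y = np.array([45_000.0, 50_000.0, 62_000.0, 75_000.0, 90_000.0, 110_000.0])

# Fit the line y_hat = w*x + b by ordinary least squares.
# np.polyfit returns the best-fitting coefficients, highest degree first.
w, b = np.polyfit(x, y, deg=1)

y_hat = w * x + b                   # predicted salaries on the training inputs
print(f"w = {w:.1f}, b = {b:.1f}")  # the learned slope and intercept
```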
While linear regression provides a clear example, this functional representation applies to far more complex models as well. A deep neural network, for instance, is still a function f(x; θ); it simply composes many simple transformations, and θ contains a much larger collection of weights and biases rather than just two parameters.
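For instance, a small one-hidden-layer network can be wrapped in exactly the same interface; only the body of f and the contents of θ change. The shapes and random values below are arbitrary, chosen purely to show the shape of the idea:

```python
import numpy as np

def f(x, theta):
    """A tiny one-hidden-layer network; still just y_hat = f(x; theta)."""
    h = np.tanh(theta["W1"] @ x + theta["b1"])   # hidden layer activations
    return theta["w2"] @ h + theta["b2"]         # single output value

rng = np.random.default_rng(0)
theta = {
    "W1": rng.normal(size=(4, 3)),  # 3 input features -> 4 hidden units
    "b1": np.zeros(4),
    "w2": rng.normal(size=4),
    "b2": 0.0,
}
x = np.array([1500.0, 3.0, 20.0])
y_hat = f(x, theta)
```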
Understanding models as functions f(x; θ) is fundamental. It frames the machine learning task as finding the best parameters θ for a chosen function f so that its outputs ŷ accurately reflect the real-world outcomes y. This perspective naturally leads to the concept of optimization, where we need tools to measure how changes in θ affect the model's performance. This is precisely where calculus, specifically derivatives and gradients, becomes indispensable, as we'll see in the following sections.