Okay, you've successfully trained your machine learning model! It has learned patterns from the training data. But the real test of a model isn't how well it knows the data it trained on; it's how well it performs on new, unseen data. This is where making predictions comes in. The purpose of building a model, after all, is usually to apply it to future situations where the outcome is unknown.
Once a model is trained (often using a fit() method in libraries like Scikit-learn), it stores the learned patterns internally (as parameters). To get predictions, you typically use a method called predict(). This method takes new input data (features only, no labels) and uses the learned patterns to generate an output, which is the model's prediction for each input instance.
Imagine you trained a linear regression model to predict house prices based on square footage. The training process found the best line (defined by a slope and an intercept) through your training data points. Now, if you have a new house with a known square footage (but unknown price), you can feed this square footage into the predict() method of your trained model. The model will use its learned line equation (y = mx + b) to calculate the predicted price (y) for that square footage (x).
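Here is a minimal sketch of that workflow in Scikit-learn. The square footages, prices, and the new house are invented illustrative values, not data from any real dataset:
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data: square footage (feature) and sale price (label)
X_train = np.array([[1000], [1500], [2000], [2500]])
y_train = np.array([200000, 260000, 330000, 400000])

# fit() learns the slope and intercept of the best line
model = LinearRegression()
model.fit(X_train, y_train)

# predict() applies the learned line to a new, unseen house
X_new = np.array([[1800]])
predicted_price = model.predict(X_new)
print(predicted_price)  # one predicted price for the 1800 sq ft house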
A very important point: the new data you provide to the predict() method must have the exact same structure and preprocessing as the data the model was trained on. This means:
- the same features, in the same order, as the training data;
- the same preprocessing steps (such as scaling numeric features or encoding categorical ones), applied using the transformers fitted on the training data.
Failure to prepare the input data correctly is a common source of errors and nonsensical predictions. The model only understands data in the format it learned from, as the sketch below illustrates.
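For example, if the features were standardized during training, the same fitted scaler must be used to transform the new data (with transform(), not fit_transform()). A small sketch with invented data:
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Invented training data: two numeric features and a binary label
X_train = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0], [4.0, 500.0]])
y_train = np.array([0, 0, 1, 1])

# Fit the scaler on the training data only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

model = LogisticRegression()
model.fit(X_train_scaled, y_train)

# New data must pass through the SAME fitted scaler before predict()
X_new = np.array([[2.5, 350.0]])
X_new_scaled = scaler.transform(X_new)
print(model.predict(X_new_scaled))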
Let's assume you have a trained model object named model (it could be a LinearRegression, LogisticRegression, KNeighborsClassifier, etc.) and your new, unseen data (features only) is stored in a variable called X_new. This X_new should be formatted appropriately (e.g., a NumPy array or pandas DataFrame with the correct columns and scaling).
Making predictions is usually straightforward:
# Assuming 'model' is your trained Scikit-learn model
# Assuming 'X_new' is your new data, preprocessed correctly
predictions = model.predict(X_new)
# Print the predictions
print(predictions)
The predictions variable will now hold the model's output for each instance in X_new.
The nature of the values in predictions depends on the type of problem your model was trained for:
- If model is a regression model (like LinearRegression), predictions will contain continuous numerical values. For example, [250000.50, 180000.75, 310000.00] might be predicted house prices.
- If model is a classification model (like LogisticRegression or KNeighborsClassifier), predictions will contain predicted class labels. For example, ['spam', 'not spam', 'spam'] or [1, 0, 1] (where 1 might represent 'spam' and 0 'not spam').
For many classification algorithms, you can get more nuanced information than just the predicted class. Instead of asking "Which class does the model predict?", you can ask "What probability does the model assign to each class?". This is often done using a predict_proba() method.
# Assuming 'model' is a trained classification model
# Assuming 'X_new' is your new data, preprocessed correctly
probabilities = model.predict_proba(X_new)
# Print the probabilities
print(probabilities)
If you have two classes (e.g., 0 and 1), the output of predict_proba() might look something like this for three input instances:
[[0.1 0.9] # Instance 1: 10% prob of class 0, 90% prob of class 1
[0.8 0.2] # Instance 2: 80% prob of class 0, 20% prob of class 1
[0.4 0.6]] # Instance 3: 40% prob of class 0, 60% prob of class 1
Each row corresponds to an instance in X_new. Each column corresponds to a class (usually in sorted order, e.g., class 0 then class 1), and the values in each row sum to 1. The standard predict() method typically just chooses the class with the highest probability. For instance 1 above, predict() would output class 1 (since 0.9 > 0.1). Knowing the probabilities can be valuable, especially if you want to understand the model's confidence or set different decision thresholds.
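As an illustration, here is one way to apply your own decision threshold using those probabilities. This sketch assumes model is a trained binary classifier and X_new is correctly preprocessed new data; the 0.8 threshold is just an example choice:
# Assuming 'model' is a trained binary classification model
# Assuming 'X_new' is your new data, preprocessed correctly
probabilities = model.predict_proba(X_new)

# The column order matches model.classes_ (e.g., array([0, 1]))
print(model.classes_)

# Probability assigned to class 1 for each instance
prob_class_1 = probabilities[:, 1]

# Only predict class 1 when the model is at least 80% confident
custom_predictions = (prob_class_1 >= 0.8).astype(int)
print(custom_predictions)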
Now that you know how to use your trained model to make predictions on new data, the next logical step is to figure out how good those predictions actually are. We need ways to measure the model's performance, which is exactly what we'll cover in the next section on evaluation.