Okay, let's bring our prepared data and chosen algorithm together. We've reached the central step in the machine learning workflow: training the model. This is where the "learning" in machine learning actually happens.
Think of an untrained model like a student before class. It has the potential to learn, represented by the algorithm's structure (like Linear Regression or K-Nearest Neighbors), but it doesn't yet know anything about the specific problem we want it to solve. Training is the process of showing the model our prepared data (X_train and y_train) and letting the algorithm figure out the relationship between the input features and the output target.
The fit() Method: Teaching the Model

Luckily, popular machine learning libraries like Scikit-learn provide a consistent and straightforward way to train models. The most common method you'll use is called fit(). Conceptually, you can think of fit() as the command that tells the model: "Learn from this data."
To train a model using Scikit-learn, you typically need two main ingredients that you prepared in the previous steps:

- X_train: the prepared training features, usually a 2D structure with one row per sample and one column per feature.
- y_train: the prepared training target. For regression, y_train will be a 1D array of continuous numbers (e.g., house prices). For classification, y_train will be a 1D array of category labels (e.g., 'spam'/'not spam', 0/1, 'cat'/'dog').

The fit() method takes these two pieces of data (X_train, y_train) and runs the chosen algorithm's learning process.
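Before calling fit(), it's worth checking that your data has the expected shapes. A minimal sketch with hypothetical toy values (the numbers are made up purely for illustration):

```python
import numpy as np

# X_train must be 2D: (n_samples, n_features).
# y_train must be 1D: (n_samples,).
X_train = np.array([[1.0, 2.0],
                    [2.0, 1.0],
                    [3.0, 4.0]])
y_train = np.array([5.0, 4.0, 11.0])

print(X_train.shape)  # (3, 2) -> 3 samples, 2 features
print(y_train.shape)  # (3,)   -> one target value per sample
```

A shape mismatch between the two (different numbers of samples) is one of the most common causes of errors when calling fit().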
What Happens During fit()?

When you call model.fit(X_train, y_train), the algorithm gets to work. While the specifics depend on the algorithm (we saw Gradient Descent for Linear Regression earlier, for instance), the general idea is the same:

- The algorithm examines the input features (X_train) and the associated target values (y_train).
- It adjusts its internal parameters so that its predictions for X_train get as close as possible to the actual y_train values. How "close" is measured depends on the algorithm and the problem (e.g., minimizing Mean Squared Error in regression).

The crucial outcome is that the model object itself is modified. After fit() completes, the object now contains the learned parameters. It has transformed from a generic algorithm blueprint into a trained model specific to our dataset.
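You can see these learned parameters directly. A small sketch using a hypothetical toy dataset that follows the exact line y = 2x + 1, so Linear Regression can recover the slope and intercept:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data generated from y = 2*x + 1 (no noise).
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([1.0, 3.0, 5.0, 7.0])

model = LinearRegression()
model.fit(X_train, y_train)

# After fit(), the model object holds the learned parameters.
print(round(float(model.coef_[0]), 2))    # slope, close to 2.0
print(round(float(model.intercept_), 2))  # intercept, close to 1.0
```

For Linear Regression the learned parameters are the coefficients (coef_) and intercept (intercept_); other algorithms store different things (K-Nearest Neighbors, for example, essentially stores the training data itself).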
Let's see how simple this is in code. Assume you've already:

- Imported your chosen algorithm class (e.g., from sklearn.linear_model import LinearRegression or from sklearn.neighbors import KNeighborsClassifier).
- Instantiated the model (e.g., model = LinearRegression() or model = KNeighborsClassifier(n_neighbors=3)).
- Prepared your training data, X_train and y_train.

Training the model is then just one line of code:
# X_train: Your prepared training features (e.g., a NumPy array or Pandas DataFrame)
# y_train: Your prepared training target variable (e.g., a NumPy array or Pandas Series)
# model: An instance of your chosen algorithm (e.g., LinearRegression, KNeighborsClassifier)
# Train the model
model.fit(X_train, y_train)
# The 'model' object is now trained and contains the learned parameters!
print("Model training complete.")
That's it! The fit() method handles all the underlying calculations. In Scikit-learn, fit() returns the model object itself (which allows method chaining), but the important effect is that it modifies the model in place. This trained model is now ready for the next steps: making predictions on new data and evaluating how well it learned.
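To preview what comes next, here is a minimal sketch of a trained classifier making predictions, using a hypothetical toy dataset where small inputs belong to class 0 and large inputs to class 1:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy classification data: small x -> class 0, large x -> class 1.
X_train = np.array([[0.0], [1.0], [10.0], [11.0]])
y_train = np.array([0, 0, 1, 1])

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# The trained model can now label inputs it has never seen.
print(model.predict(np.array([[0.5], [10.5]])))  # [0 1]
```

Evaluating how trustworthy such predictions are is exactly what the evaluation step covers.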
© 2025 ApX Machine Learning