Okay, let's bring our prepared data and chosen algorithm together. We've reached the central step in the machine learning workflow: training the model. This is where the "learning" in machine learning actually happens.
Think of an untrained model like a student before class. It has the potential to learn, represented by the algorithm's structure (like Linear Regression or K-Nearest Neighbors), but it doesn't yet know anything about the specific problem we want it to solve. Training is the process of showing the model our prepared data (X_train and y_train) and letting the algorithm figure out the relationship between the input features and the output target.
The fit() Method: Teaching the Model

Luckily, popular machine learning libraries like Scikit-learn provide a consistent and straightforward way to train models. The most common method you'll use is called fit(). Conceptually, you can think of fit() as the command that tells the model: "Learn from this data."
To train a model using Scikit-learn, you typically need two main ingredients that you prepared in the previous steps:

- X_train: the prepared training features, usually a 2D structure with one row per sample and one column per feature.
- y_train: the prepared training target. For regression, y_train will be a 1D array of continuous numbers (e.g., house prices). For classification, y_train will be a 1D array of category labels (e.g., 'spam'/'not spam', 0/1, 'cat'/'dog').

The fit() method takes these two pieces of data (X_train, y_train) and runs the chosen algorithm's learning process.
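Before calling fit(), it's worth checking that your data has the expected shapes. A minimal sketch with hypothetical toy values (the numbers are made up purely for illustration):

```python
import numpy as np

# X_train must be 2D: (n_samples, n_features).
# y_train must be 1D: (n_samples,).
X_train = np.array([[1.0, 2.0],
                    [2.0, 1.0],
                    [3.0, 4.0]])
y_train = np.array([5.0, 4.0, 11.0])

print(X_train.shape)  # (3, 2) -> 3 samples, 2 features
print(y_train.shape)  # (3,)   -> one target value per sample
```

A shape mismatch between the two (different numbers of samples) is one of the most common causes of errors when calling fit().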
What Happens During fit()?

When you call model.fit(X_train, y_train), the algorithm gets to work. While the specifics depend on the algorithm (we saw Gradient Descent for Linear Regression earlier, for instance), the general idea is the same:

- The algorithm examines the input features (X_train) and the associated target values (y_train).
- It adjusts its internal parameters so that its predictions for X_train get as close as possible to the actual y_train values. How "close" is measured depends on the algorithm and the problem (e.g., minimizing Mean Squared Error in regression).

The crucial outcome is that the model object itself is modified. After fit() completes, the object now contains the learned parameters. It has transformed from a generic algorithm blueprint into a trained model specific to our dataset.
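You can see these learned parameters directly. A small sketch using a hypothetical toy dataset that follows the exact line y = 2x + 1, so Linear Regression can recover the slope and intercept:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data generated from y = 2*x + 1 (no noise).
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([1.0, 3.0, 5.0, 7.0])

model = LinearRegression()
model.fit(X_train, y_train)

# After fit(), the model object holds the learned parameters.
print(round(float(model.coef_[0]), 2))    # slope, close to 2.0
print(round(float(model.intercept_), 2))  # intercept, close to 1.0
```

For Linear Regression the learned parameters are the coefficients (coef_) and intercept (intercept_); other algorithms store different things (K-Nearest Neighbors, for example, essentially stores the training data itself).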
Let's see how simple this is in code. Assume you've already:

- Imported your chosen algorithm class (e.g., from sklearn.linear_model import LinearRegression or from sklearn.neighbors import KNeighborsClassifier).
- Instantiated the model (e.g., model = LinearRegression() or model = KNeighborsClassifier(n_neighbors=3)).
- Prepared your training data, X_train and y_train.

Training the model is then just one line of code:
# X_train: Your prepared training features (e.g., a NumPy array or Pandas DataFrame)
# y_train: Your prepared training target variable (e.g., a NumPy array or Pandas Series)
# model: An instance of your chosen algorithm (e.g., LinearRegression, KNeighborsClassifier)
# Train the model
model.fit(X_train, y_train)
# The 'model' object is now trained and contains the learned parameters!
print("Model training complete.")
That's it! The fit() method handles all the underlying calculations. In Scikit-learn, fit() returns the model object itself (which allows method chaining), but the important effect is that it modifies the model in place. This trained model is now ready for the next steps: making predictions on new data and evaluating how well it learned.
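To preview what comes next, here is a minimal sketch of a trained classifier making predictions, using a hypothetical toy dataset where small inputs belong to class 0 and large inputs to class 1:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy classification data: small x -> class 0, large x -> class 1.
X_train = np.array([[0.0], [1.0], [10.0], [11.0]])
y_train = np.array([0, 0, 1, 1])

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# The trained model can now label inputs it has never seen.
print(model.predict(np.array([[0.5], [10.5]])))  # [0 1]
```

Evaluating how trustworthy such predictions are is exactly what the evaluation step covers.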
© 2025 ApX Machine Learning