Okay, we've successfully partitioned our dataset into two distinct parts: the training set and the test set. The next logical step in our evaluation workflow involves using the training set to actually teach our machine learning model. This phase is known as model training or fitting the model.
Think of the training set as the textbook and practice problems you use to study for an exam. The model, like a student, examines this material to learn the underlying concepts and patterns.
During training, a machine learning algorithm takes the training data (both the input features and the corresponding known outcomes or labels) and iteratively adjusts its internal settings, often called parameters or weights. The specific algorithm used depends on the type of problem (classification or regression) and the chosen model (like Linear Regression, Logistic Regression, Decision Tree, etc.).
For example:
The core idea is that the model attempts to find the best possible mapping from the input features to the output target based only on the examples provided in the training set.
The training process takes the training data and uses a machine learning algorithm to produce a trained model capable of making predictions.
The output of this training phase isn't data; it's the model itself, now configured with the learned parameters. This "trained model" encapsulates the patterns discovered in the training data. It's now ready to be applied to new, unseen data to make predictions.
It is absolutely essential to remember that the test set is never shown to the model during training. The test set acts like the final exam. If the student (our model) got to see the exam questions (the test data) while studying (training), their performance on the exam wouldn't be a true reflection of their learning. Similarly, exposing the model to the test data during training would lead to an overly optimistic and unreliable evaluation of its performance on genuinely new data.
This training step is purely about learning from the designated training portion of the data. Once the model is trained, we proceed to the next step in our workflow: using this trained model to make predictions on the unseen test set, which will allow us to evaluate how well it has truly learned to generalize.
© 2025 ApX Machine Learning