In this exercise, you will build a simple CI pipeline that puts Continuous Integration principles into practice. The pipeline uses GitHub Actions, an automation tool integrated directly into GitHub, to test machine learning code automatically whenever changes are pushed.
This hands-on exercise will guide you through setting up a workflow that checks for code quality and runs basic tests, ensuring that our project remains stable and reliable.
To follow along, you will need a GitHub account and a new, empty repository. We will create a small but complete machine learning project inside this repository.
First, create these three files in your project's root directory:
A data file (data.csv): A small sample of data. For this example, we can use a few lines from the famous Iris dataset.
sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
6.2,2.9,4.3,1.3,versicolor
5.9,3.0,5.1,1.8,virginica
A Python training script (train.py): This script trains a simple model on our data. It doesn't do much; its main purpose is to be a runnable piece of code for our CI pipeline to check.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression


def run_training():
    """
    A simple function to load data and train a model.
    """
    # Load the dataset
    df = pd.read_csv('data.csv')

    # Define features and target
    X = df[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']]
    y = df['species']

    # Split data
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42
    )

    # Initialize and train the model
    model = LogisticRegression(max_iter=200)
    model.fit(X_train, y_train)

    print("Training completed successfully.")


if __name__ == '__main__':
    run_training()
A dependencies file (requirements.txt): This file lists the Python libraries our project needs. Our CI pipeline will use this file to create a consistent environment. We are including flake8 for code linting and pytest for testing.
pandas
scikit-learn
flake8
pytest
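Before committing, it can help to confirm that your local environment has everything requirements.txt lists. The following is a minimal sketch using only the standard library. The helper name `missing_packages` is hypothetical, and it assumes each line holds a bare package name with no version specifier, as in the file above.

```python
from importlib import metadata


def missing_packages(requirement_lines):
    """Return the names from requirement_lines that are not installed."""
    missing = []
    for line in requirement_lines:
        name = line.strip()
        if not name or name.startswith("#"):
            continue  # skip blank lines and comments
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# The same four names that appear in requirements.txt
requirements = ["pandas", "scikit-learn", "flake8", "pytest"]
print("Missing packages:", missing_packages(requirements))
```

An empty list means every listed package is installed locally. Otherwise, running pip install -r requirements.txt should fill the gaps.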
Commit these three files to the main branch of your GitHub repository. With our project structure in place, we can now define our automation.
A GitHub Actions workflow is an automated process defined in a YAML file. You store these workflow files in a special directory within your repository: .github/workflows/. When an event occurs in your repository, like a code push, GitHub can trigger the corresponding workflow.
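Our goal is to create a workflow that performs three main tasks:
Set up a Python environment and install the project's dependencies.
Lint the code with flake8.
Run the tests with pytest.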
In your repository, create a new directory named .github, and inside it, another directory named workflows. Inside .github/workflows/, create a new file named ci-pipeline.yml.
Copy the following YAML content into ci-pipeline.yml. We will then break down what each part does.
# A descriptive name for your workflow
name: Basic ML Code CI

# Trigger the workflow on pushes to the main branch
on:
  push:
    branches: [ "main" ]

# Define the jobs to be run
jobs:
  test-and-lint:
    # Use the latest version of Ubuntu as the runner environment
    runs-on: ubuntu-latest

    # Define the sequence of steps in the job
    steps:
      # Step 1: Check out your repository code
      - name: Check out repository code
        uses: actions/checkout@v4

      # Step 2: Set up a specific version of Python
      - name: Set up Python 3.9
        uses: actions/setup-python@v5
        with:
          python-version: '3.9'

      # Step 3: Install project dependencies from requirements.txt
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt

      # Step 4: Lint code with flake8 to check for style issues
      - name: Lint with flake8
        run: |
          # stop the build if there are Python syntax errors or undefined names
          flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
          # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
          flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics

      # Step 5: Run a basic test to ensure the script executes
      - name: Test with pytest
        run: |
          pytest
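The first flake8 command in the workflow above stops the build on syntax errors without ever running the code. The sketch below illustrates that idea of static checking using only the standard library's `ast` module. It is not how flake8 is implemented, and the function name `find_syntax_error` is a hypothetical helper for illustration.

```python
import ast


def find_syntax_error(source, filename="<string>"):
    """Return a short message if `source` fails to parse, else None."""
    try:
        # Parsing builds a syntax tree but does not execute the code
        ast.parse(source, filename=filename)
    except SyntaxError as exc:
        return f"{filename}:{exc.lineno}: {exc.msg}"
    return None


print(find_syntax_error("def ok():\n    return 1\n"))
print(find_syntax_error("def broken(:\n    pass\n"))
```

The first call returns None because the source parses cleanly. The second returns a message pointing at the offending line.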
name: This is a simple, human-readable name that will appear in the "Actions" tab of your GitHub repository.
on: push: branches: [ "main" ]: This is the trigger. It tells GitHub to run this workflow every time someone pushes a commit to the main branch.
jobs: Workflows are made up of one or more jobs. Our workflow has a single job named test-and-lint.
runs-on: ubuntu-latest: This specifies that the job will run on a fresh virtual machine hosted by GitHub, using the latest version of the Ubuntu operating system.
steps: This is the most important part. It defines a sequence of tasks that the job will execute.
uses: actions/checkout@v4: This is a pre-built action that downloads a copy of your repository's code onto the runner, so the subsequent steps can access your files.
uses: actions/setup-python@v5: Another pre-built action that installs a specific version of Python, in this case, version 3.9.
run: pip install -r requirements.txt: The run command executes command-line instructions. Here, we use pip to install all the libraries listed in our requirements.txt file.
run: flake8 . --count ...: This step runs the flake8 linter. A linter analyzes code for potential errors and stylistic issues without actually running it. This is a standard practice for maintaining code quality.
run: pytest: This final step executes the pytest testing framework. We haven't created any tests yet, so let's do that now.
For our CI pipeline to be meaningful, it needs something to test. Let's create a very basic test that confirms our train.py script can be imported and executed without crashing.
Create a new file in your project's root directory named test_script.py:
import pytest
from train import run_training


def test_training_runs():
    """
    Tests if the training function executes without raising an exception.
    """
    try:
        run_training()
    except Exception as e:
        pytest.fail(f"run_training() raised an exception: {e}")
This test is simple but effective for a CI check. It imports the run_training function and calls it. If any error occurs during its execution, pytest.fail() will be triggered, causing the test step in our pipeline to fail.
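A slightly stronger variant would also assert that the script reports success. Pytest already fails a test on any uncaught exception, so the check below needs no try/except. This is only a sketch: the `run_training` defined here is a stand-in so the example runs on its own, and the test name is hypothetical. In the real project you would import the function from train instead.

```python
import contextlib
import io


def run_training():
    # Stand-in for train.run_training so this sketch is self-contained;
    # in the project, use: from train import run_training
    print("Training completed successfully.")


def test_training_reports_success():
    """Check that run_training() prints its success message."""
    buffer = io.StringIO()
    # Capture everything the function writes to standard output
    with contextlib.redirect_stdout(buffer):
        run_training()
    assert "Training completed successfully." in buffer.getvalue()
```

Because the function name starts with test_, pytest collects it automatically when it lives in a file such as test_script.py.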
The workflow we've defined follows a clear, linear sequence of steps. This process ensures that before any code is considered "good," it has been checked out, its environment has been built, and it has passed both quality and functional checks.
The sequence of automated steps in our Continuous Integration workflow. The process begins with a code push and proceeds through setup, linting, and testing.
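The linear, stop-on-first-failure behavior of the job can be imitated locally. The sketch below runs a list of commands in order and reports the first one that exits with a non-zero status, which is how a job's steps behave by default. The function `run_steps` and the demo commands are hypothetical stand-ins, not part of GitHub Actions.

```python
import subprocess
import sys


def run_steps(steps):
    """Run (name, command) pairs in order; stop at the first failure.

    Returns the name of the failing step, or None if every step passed.
    """
    for name, command in steps:
        print(f"--- {name}")
        result = subprocess.run(command)
        if result.returncode != 0:
            return name
    return None


# Stand-in commands so the sketch runs anywhere Python is available
demo_steps = [
    ("passing step", [sys.executable, "-c", "print('ok')"]),
    ("failing step", [sys.executable, "-c", "import sys; sys.exit(1)"]),
    ("never reached", [sys.executable, "-c", "print('skipped')"]),
]
print("First failing step:", run_steps(demo_steps))
```

To mirror the workflow itself, the demo commands could be replaced with `[sys.executable, "-m", "flake8", "."]` and `[sys.executable, "-m", "pytest"]`, run from the repository root.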
You are now ready to see your pipeline in action.
Commit and push the ci-pipeline.yml and test_script.py files to your GitHub repository.
Open the "Actions" tab of your repository. You should see a workflow run triggered by your push to the main branch.
Click on the workflow run to see the details. You can expand the test-and-lint job to see the log output for each step. If all steps complete successfully, you will see a green checkmark next to them. If a step fails, for instance, if flake8 finds a syntax error or a pytest test fails, the step will be marked with a red "X," and the entire workflow run will be marked as failed.
You have now successfully built a foundational CI pipeline. This simple automation adds a significant layer of safety and quality control to your project. It acts as an automated gatekeeper, ensuring that every change to your main branch is automatically vetted, freeing you to focus on developing new features. This is a fundamental practice in building reliable and maintainable machine learning systems.
© 2026 ApX Machine Learning