A trained machine learning model, typically saved as a file, exists in isolation. To make it useful, other applications need a way to communicate with it, sending it new data and receiving predictions. This communication is accomplished by wrapping the model in an Application Programming Interface, or API. An API defines a standard contract for how software components should interact. The most common way to expose a model over a network is to build a web API, which lets applications access its prediction capabilities using standard web protocols like HTTP.
This approach decouples the model from the application that uses it. The application, whether it's a mobile app, a web dashboard, or another backend service, doesn't need to know anything about Python or scikit-learn. It only needs to know how to send an HTTP request to a specific URL (an endpoint) and how to parse the response.
To build this web API, we will use Flask, a popular "micro" web framework for Python. The term "micro" doesn't mean it's lacking in features; it means Flask aims to keep its core simple and extensible. It provides the essential tools for building web applications and APIs without imposing a lot of structure or dependencies. This makes it an excellent choice for creating a lightweight service whose main job is to serve model predictions.
Before you begin, you will need to install Flask. You can do this using pip:
pip install Flask scikit-learn joblib
We include scikit-learn and joblib because the example below loads a model that was trained with scikit-learn and persisted with joblib.
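If you don't already have a saved model, the following sketch produces one. The dataset (Iris) and the choice of classifier are purely illustrative; any fitted scikit-learn estimator saved with joblib.dump will work with the API built below.

```python
# train_model.py
# Illustrative only: trains a small classifier and saves it as model.joblib.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
import joblib

# Load a toy dataset with four numeric features per sample
X, y = load_iris(return_X_y=True)

# Fit a simple classifier
model = LogisticRegression(max_iter=200)
model.fit(X, y)

# Persist the fitted model to disk; the Flask app loads this file at startup
joblib.dump(model, "model.joblib")
print("Saved model.joblib")
```

Running this script once creates the model.joblib file that the API expects to find in its working directory.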
Essentially, a Flask API consists of a few main parts:
- An application instance, the central object representing your web application.
- Routes, which map URL paths, such as /predict or /health, to Python functions.
- View functions, which run when their route receives a request and return the response.

Let's start by building the simplest possible Flask application to see these parts in action.
# app.py
from flask import Flask

# 1. Create the Flask application instance
app = Flask(__name__)

# 2. Define a route and its corresponding view function
@app.route("/")
def index():
    # 3. The function returns the response
    return "Model API is running."
In this code, @app.route("/") is a Python decorator that tells Flask that the index() function should be triggered whenever a web request is made to the root URL (/).
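The same decorator pattern extends to any number of routes. For example, a hypothetical /health endpoint (not part of the main example) is a common convention that lets load balancers and orchestrators check whether the service is alive:

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/")
def index():
    return "Model API is running."

# A hypothetical health-check route; monitoring tools can poll it
# to verify the service is up without triggering a prediction.
@app.route("/health")
def health():
    return jsonify({"status": "ok"})
```

Each decorated function is independent: Flask dispatches an incoming request to whichever function's route matches the URL.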
Now, let's evolve this basic structure into a functional prediction service. A critical performance consideration is to load the model into memory only once when the application starts, not every time a prediction request comes in. Loading a model from disk can be a slow operation, and doing it repeatedly would create a significant bottleneck.
We will create a new endpoint, /predict, that accepts data via an HTTP POST request. The input data will be in JSON format, a standard for sending structured data over the web.
Here is the complete code for a simple prediction API. Assume you have a trained scikit-learn model saved as model.joblib.
# app.py
from flask import Flask, request, jsonify
import joblib
import numpy as np

# Create the Flask application instance
app = Flask(__name__)

# Load the trained machine learning model
# This is done once when the application starts
model = joblib.load("model.joblib")

@app.route("/")
def index():
    return "Model API is running."

@app.route("/predict", methods=['POST'])
def predict():
    # Get the JSON data from the request
    data = request.get_json()

    # Basic validation
    if not data or 'features' not in data:
        return jsonify({"error": "Invalid input: 'features' is missing."}), 400

    try:
        # Extract features and convert to a NumPy array for the model
        features = np.array(data['features']).reshape(1, -1)

        # Get a prediction from the model
        prediction = model.predict(features)

        # Convert the prediction to a standard Python type
        output = prediction.tolist()

        # Return the prediction as a JSON response
        return jsonify({"prediction": output})
    except Exception as e:
        # Handle potential errors during prediction
        return jsonify({"error": str(e)}), 500

if __name__ == '__main__':
    # Run the app on host 0.0.0.0 to make it accessible from outside the container
    app.run(host='0.0.0.0', port=5000)
Let's break down the /predict function:
- methods=['POST']: This specifies that the endpoint only responds to POST requests, which is the standard method for sending data to a server to create or update a resource.
- request.get_json(): Flask's request object gives us access to the incoming HTTP request. The get_json() method parses the request body as JSON and returns it as a Python dictionary.
- np.array(data['features']).reshape(1, -1): We extract the list of features from the input JSON and convert it into the 2D NumPy array format that scikit-learn models expect.
- model.predict(features): This is where we use our pre-loaded model to make the actual prediction.
- jsonify({"prediction": output}): We wrap our prediction result in a dictionary and use Flask's jsonify utility to properly format it as a JSON response with the correct HTTP headers.

The diagram below illustrates the flow of a request through our API.
A client application sends features in a JSON payload to the Flask API. The API uses the loaded model to generate a prediction and returns it to the client as JSON.
After saving the code as app.py, you can run it from your terminal:
python app.py
You should see output indicating that the server is running, usually on port 5000. Now your API is listening for requests. You can test it from another terminal using a command-line tool like curl. The following command sends a JSON payload with a feature vector to your /predict endpoint.
curl -X POST http://127.0.0.1:5000/predict \
-H "Content-Type: application/json" \
-d '{"features": [5.1, 3.5, 1.4, 0.2]}'
If everything is working correctly, the API will respond with a JSON object containing the model's prediction:
{
"prediction": [0]
}
This simple API serves as the fundamental building block for model deployment. By containerizing this Flask application, as we saw in the previous section, we create a portable, isolated, and scalable unit of deployment. This unit can then be deployed in various production environments, ready to serve predictions to any application that can speak HTTP.