A trained machine learning model, typically saved as a file, exists in isolation. To make it useful, other applications need a way to communicate with it, sending it new data and receiving predictions. This communication is accomplished by wrapping the model in an Application Programming Interface, or API. An API defines a standard contract for how software components should interact. The most common way to expose a model over a network is to build a web API, which lets applications access its prediction capabilities using standard web protocols like HTTP.
This approach decouples the model from the application that uses it. The application, whether it's a mobile app, a web dashboard, or another backend service, doesn't need to know anything about Python or scikit-learn. It only needs to know how to send an HTTP request to a specific URL (an endpoint) and how to parse the response.
To build this web API, we will use Flask, a popular "micro" web framework for Python. The term "micro" doesn't mean it's lacking in features; it means Flask aims to keep its core simple and extensible. It provides the essential tools for building web applications and APIs without imposing a lot of structure or dependencies. This makes it an excellent choice for creating a lightweight service whose main job is to serve model predictions.
Before you begin, you will need to install Flask. You can do this using pip:
pip install Flask scikit-learn joblib
We include scikit-learn and joblib because the example below loads a model that was trained with scikit-learn and persisted with joblib.
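If you don't already have a saved model, the following sketch produces one. The dataset (Iris) and the choice of classifier are purely illustrative; any fitted scikit-learn estimator saved with joblib.dump will work with the API built below.

```python
# train_model.py
# Illustrative only: trains a small classifier and saves it as model.joblib.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
import joblib

# Load a toy dataset with four numeric features per sample
X, y = load_iris(return_X_y=True)

# Fit a simple classifier
model = LogisticRegression(max_iter=200)
model.fit(X, y)

# Persist the fitted model to disk; the Flask app loads this file at startup
joblib.dump(model, "model.joblib")
print("Saved model.joblib")
```

Running this script once creates the model.joblib file that the API expects to find in its working directory.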
Essentially, a Flask API consists of a few main parts:
- An application instance, the central object representing your web application.
- Routes, which map URL paths, such as /predict or /health, to Python functions.
- View functions, which run when their route receives a request and return the response.

Let's start by building the simplest possible Flask application to see these parts in action.
# app.py
from flask import Flask

# 1. Create the Flask application instance
app = Flask(__name__)

# 2. Define a route and its corresponding view function
@app.route("/")
def index():
    # 3. The function returns the response
    return "Model API is running."
In this code, @app.route("/") is a Python decorator that tells Flask that the index() function should be triggered whenever a web request is made to the root URL (/).
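The same decorator pattern extends to any number of routes. For example, a hypothetical /health endpoint (not part of the main example) is a common convention that lets load balancers and orchestrators check whether the service is alive:

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/")
def index():
    return "Model API is running."

# A hypothetical health-check route; monitoring tools can poll it
# to verify the service is up without triggering a prediction.
@app.route("/health")
def health():
    return jsonify({"status": "ok"})
```

Each decorated function is independent: Flask dispatches an incoming request to whichever function's route matches the URL.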
Now, let's evolve this basic structure into a functional prediction service. A critical performance consideration is to load the model into memory only once when the application starts, not every time a prediction request comes in. Loading a model from disk can be a slow operation, and doing it repeatedly would create a significant bottleneck.
We will create a new endpoint, /predict, that accepts data via an HTTP POST request. The input data will be in JSON format, a standard for sending structured data over the web.
Here is the complete code for a simple prediction API. Assume you have a trained scikit-learn model saved as model.joblib.
# app.py
from flask import Flask, request, jsonify
import joblib
import numpy as np

# Create the Flask application instance
app = Flask(__name__)

# Load the trained machine learning model
# This is done once when the application starts
model = joblib.load("model.joblib")

@app.route("/")
def index():
    return "Model API is running."

@app.route("/predict", methods=['POST'])
def predict():
    # Get the JSON data from the request
    data = request.get_json()

    # Basic validation
    if not data or 'features' not in data:
        return jsonify({"error": "Invalid input: 'features' is missing."}), 400

    try:
        # Extract features and convert to a NumPy array for the model
        features = np.array(data['features']).reshape(1, -1)

        # Get a prediction from the model
        prediction = model.predict(features)

        # Convert the prediction to a standard Python type
        output = prediction.tolist()

        # Return the prediction as a JSON response
        return jsonify({"prediction": output})
    except Exception as e:
        # Handle potential errors during prediction
        return jsonify({"error": str(e)}), 500

if __name__ == '__main__':
    # Run the app on host 0.0.0.0 to make it accessible from outside the container
    app.run(host='0.0.0.0', port=5000)
Let's break down the /predict function:
- methods=['POST']: This specifies that the endpoint only responds to POST requests, which is the standard method for sending data to a server to create or update a resource.
- request.get_json(): Flask's request object gives us access to the incoming HTTP request. The get_json() method parses the request body as JSON and returns it as a Python dictionary.
- np.array(data['features']).reshape(1, -1): We extract the list of features from the input JSON and convert it into the 2D NumPy array format that scikit-learn models expect.
- model.predict(features): This is where we use our pre-loaded model to make the actual prediction.
- jsonify({"prediction": output}): We wrap our prediction result in a dictionary and use Flask's jsonify utility to properly format it as a JSON response with the correct HTTP headers.

The diagram below illustrates the flow of a request through our API.
A client application sends features in a JSON payload to the Flask API. The API uses the loaded model to generate a prediction and returns it to the client as JSON.
After saving the code as app.py, you can run it from your terminal:
python app.py
You should see output indicating that the server is running, usually on port 5000. Now your API is listening for requests. You can test it from another terminal using a command-line tool like curl. The following command sends a JSON payload with a feature vector to your /predict endpoint.
curl -X POST http://127.0.0.1:5000/predict \
-H "Content-Type: application/json" \
-d '{"features": [5.1, 3.5, 1.4, 0.2]}'
If everything is working correctly, the API will respond with a JSON object containing the model's prediction:
{
"prediction": [0]
}
This simple API serves as the fundamental building block for model deployment. By containerizing this Flask application, as we saw in the previous section, we create a portable, isolated, and scalable unit of deployment. This unit can then be deployed in various production environments, ready to serve predictions to any application that can speak HTTP.