After training and saving a machine learning model, the next challenge is often how an external application—such as a web app or another service—can actually use that model to obtain predictions. Direct access to the file system where a model.pkl or model.joblib file is stored is generally not a viable approach. Instead, a defined method for different software components to communicate is required. This is precisely the problem that Application Programming Interfaces, or APIs, are designed to solve.
Think of an API as a contract or a set of rules that allows one piece of software to request services or data from another. It defines the kinds of requests that can be made, how to make them, the data formats that should be used, and what responses to expect.
Imagine you're at a restaurant. You (the client application) want food (a prediction). You don't go directly into the kitchen (the model logic) and start cooking. Instead, you interact with a waiter (the API).
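In this analogy, the waiter carries your order to the kitchen and brings the finished dish back to your table, just as an API carries a client's request to the model and returns the prediction. You never need to know how the kitchen works; you only need to know how to place an order.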
In the context of web services, APIs often use the HyperText Transfer Protocol (HTTP), the same protocol your web browser uses to fetch web pages. When we build a prediction service, we typically create a web API. This means a server wrapping our model listens for incoming requests over the network at specific URLs (often called endpoints).
A client application sends an HTTP request to a specific endpoint. This request usually includes:
- An HTTP method: POST (commonly used for sending data to create or update something, suitable for sending input features for prediction) or GET (typically for retrieving data).
- A URL: the address of the endpoint (e.g., http://yourserver.com/predict).
- The data: the input features for the prediction, often formatted as JSON.

The server hosting the API receives this request, processes the input data, loads the saved model if it is not already in memory, feeds the data to the model to get a prediction, and then sends an HTTP response back to the client. This response usually contains the prediction result, again often formatted as JSON.
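To make the parts of a request concrete, here is a minimal sketch that assembles one with Python's standard library without sending it. The `features` key and its values are assumptions made for illustration; a real service defines its own input format.

```python
import json
from urllib.request import Request

# Example endpoint from the text; the payload shape is a hypothetical choice.
url = "http://yourserver.com/predict"
payload = {"features": [5.1, 3.5, 1.4, 0.2]}

# Build (but do not send) a POST request carrying the payload as JSON.
request = Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)

print(request.get_method())          # the HTTP method
print(request.full_url)              # the endpoint URL
print(request.data.decode("utf-8"))  # the JSON body
```

Passing this object to `urllib.request.urlopen` would actually send it, which only makes sense once a server is listening at that address.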
A simple interaction flow: A client sends input data via an HTTP request to the API server, which uses the saved model to generate a prediction and sends it back in an HTTP response.
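The flow can be tried end to end on one machine. The sketch below uses only the standard library rather than a web framework, and its `predict` function is a stand-in that averages the inputs, not a trained model. The `/predict` path, the `features` key, and the `round_trip` helper are all illustrative assumptions.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener


def predict(features):
    """Stand-in for a trained model: returns the mean of the inputs."""
    return sum(features) / len(features)


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        # Read and decode the JSON body sent by the client.
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length))
        # Run the "model" and send the prediction back as JSON.
        encoded = json.dumps({"prediction": predict(body["features"])}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(encoded)))
        self.end_headers()
        self.wfile.write(encoded)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def round_trip(features):
    """Start the server on a free local port, send one request, return the reply."""
    server = HTTPServer(("127.0.0.1", 0), PredictHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        request = Request(
            f"http://127.0.0.1:{server.server_port}/predict",
            data=json.dumps({"features": features}).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        opener = build_opener(ProxyHandler({}))  # ignore any system proxy for this local call
        with opener.open(request, timeout=5) as response:
            return json.loads(response.read())
    finally:
        server.shutdown()
        server.server_close()
```

Calling `round_trip([1.0, 2.0, 3.0])` plays both roles in turn: the client posts the features, and the handler returns the stand-in prediction as JSON.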
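Why is this useful for machine learning deployment? Because the model sits behind an API, any application that can send an HTTP request can obtain predictions without direct access to the model file or to the code that runs it.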
In this chapter, we'll focus on building such a web API using Flask, a popular and lightweight Python web framework. You'll learn how to create endpoints, handle incoming data, load your previously saved model, generate predictions, and send those predictions back as responses. This API will serve as the bridge between your trained model and the applications that need its intelligence.
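As a preview, a prediction endpoint in Flask might look roughly like the sketch below. It is not the chapter's final implementation: the `/predict` route, the `features` key, and the `predict` stand-in (which averages its inputs instead of calling a saved model) are assumptions chosen to keep the example self-contained.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict(features):
    """Hypothetical stand-in for a loaded model: returns the mean of the inputs."""
    return sum(features) / len(features)


@app.route("/predict", methods=["POST"])
def predict_endpoint():
    # Parse the JSON body; silent=True yields None instead of raising on bad input.
    payload = request.get_json(silent=True)
    features = payload.get("features") if isinstance(payload, dict) else None
    if not isinstance(features, list) or not features:
        return jsonify({"error": "expected JSON with a non-empty 'features' list"}), 400
    return jsonify({"prediction": predict(features)})
```

In a real service, `predict` would call a model loaded once at startup (for example with `joblib.load`), and `app.run()` would start Flask's development server for local testing.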
© 2026 ApX Machine Learning