Alright, let's put theory into practice! In the previous sections, you learned about APIs, the Flask web framework, and how to load your saved machine learning model within a Python application. Now, we'll combine these elements to build a working prediction service.
This hands-on exercise will guide you step-by-step through creating a simple web API using Flask. This API will load a pre-trained model (like the one you saved in Chapter 2) and expose an endpoint that accepts input data via HTTP, uses the model to make a prediction, and returns the result.
Before we start coding, make sure the required libraries are installed:

pip install Flask joblib scikit-learn pandas

(We include joblib, scikit-learn, and pandas because they are commonly used for saving/loading models and handling data; adjust this list if your model uses different libraries, such as pickle.)

You will also need a trained model, saved with joblib or pickle. For this example, let's assume you have a model file named model.joblib saved in the same directory where you'll create your Flask application. This model should be trained to predict something based on specific input features, and we will assume it expects its input as a Pandas DataFrame.

If your model relies on preprocessing (for example, a saved scaler.joblib or a full pipeline.joblib), have that file ready too. For simplicity in this first example, we'll assume the model file contains the necessary steps or that preprocessing is simple enough to be done directly in the Flask app.

Let's keep things organized. Create a new directory for your project, for instance simple_ml_api. Inside this directory, place your saved model file (model.joblib). You will create your Flask application script, named app.py, in this same directory.
simple_ml_api/
├── app.py # Your Flask application code
└── model.joblib # Your saved model file
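If you don't yet have a model.joblib on hand, a sketch like the following can produce one that is compatible with the examples later in this exercise. It is illustrative only, not part of the API itself: it assumes scikit-learn's iris dataset, the hypothetical filename create_model.py, and the feature names used in the test requests below, and it bundles scaling and the classifier into a single Pipeline so no separate preprocessor file is needed.

# create_model.py -- illustrative only; run once to produce model.joblib
import joblib
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Load the iris data into a DataFrame with the column names used in this exercise
iris = load_iris()
feature_names = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
X = pd.DataFrame(iris.data, columns=feature_names)
y = iris.target

# A pipeline keeps preprocessing and the model together in one saved file
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression(max_iter=200)),
])
pipeline.fit(X, y)

# Save the fitted pipeline; the Flask app will load this file at startup
joblib.dump(pipeline, "model.joblib")
print("Saved model.joblib")

Because the pipeline carries its own preprocessing, the Flask code that follows can call predict() on it directly, exactly as it would on a bare model.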
Creating the Flask Application (app.py)

Open a new file named app.py in your project directory and start by importing the necessary libraries and initializing Flask:
import joblib
import pandas as pd
from flask import Flask, request, jsonify
# Initialize the Flask application
app = Flask(__name__)
# Load the trained model (and preprocessor if you have one)
# We load it ONCE when the app starts, not inside the route function
# This is more efficient as it avoids reloading on every request.
try:
    model = joblib.load("model.joblib")
    print("Model loaded successfully.")
    # If you have a separate preprocessor, load it here:
    # preprocessor = joblib.load("preprocessor.joblib")
except FileNotFoundError:
    print("Error: model.joblib not found. Make sure the model file is in the correct directory.")
    model = None
except Exception as e:
    print(f"Error loading model: {e}")
    model = None
Here, we import Flask for the web server, request to handle incoming data, jsonify to create JSON responses, joblib to load our model, and pandas because many models expect data in DataFrame format. We initialize the Flask app with app = Flask(__name__).
Crucially, we load the model.joblib file right after initializing the app. This means the model is loaded into memory only once, when the application starts. Loading it inside the prediction function would be very inefficient, as it would reload the model from disk for every single prediction request. We also add basic error handling in case the model file is missing or cannot be loaded.
Now, let's create the URL endpoint that will handle prediction requests. We'll use the route /predict and specify that it should accept HTTP POST requests, since clients will be sending data to it.
# Add this below the model loading code in app.py
@app.route('/predict', methods=['POST'])
def predict():
    # Check if the model loaded successfully
    if model is None:
        return jsonify({"error": "Model not loaded or failed to load."}), 500

    # 1. Get data from the POST request
    try:
        data = request.get_json(force=True)
        print(f"Received data: {data}")  # Log received data

        # Ensure data is in the expected format (e.g., a dictionary)
        if not isinstance(data, dict):
            raise ValueError("Input data must be a JSON object (dictionary).")

        # 2. Prepare the data for the model
        # Assuming the model expects a Pandas DataFrame with specific column names
        # Adjust column names based on your model's training data
        # Example: {'feature1': value1, 'feature2': value2, ...}
        feature_values = list(data.values())
        feature_names = list(data.keys())  # Or define expected columns explicitly
        input_df = pd.DataFrame([feature_values], columns=feature_names)
        print(f"Prepared DataFrame:\n{input_df}")  # Log the DataFrame

        # If you had a preprocessor, you would apply it here:
        # input_processed = preprocessor.transform(input_df)
        # prediction = model.predict(input_processed)

        # 3. Make prediction
        prediction = model.predict(input_df)

        # 4. Format the output: convert the prediction to a standard Python type
        # if necessary (e.g., from a numpy array) so it is JSON serializable
        output = prediction[0]
        if hasattr(output, 'item'):  # Handles numpy types
            output = output.item()
        print(f"Prediction result: {output}")  # Log the prediction

        # 5. Return the prediction as a JSON response
        return jsonify({"prediction": output})

    except ValueError as ve:
        print(f"Value Error: {ve}")
        return jsonify({"error": f"Invalid input data format: {ve}"}), 400
    except KeyError as ke:
        print(f"Key Error: {ke}")
        return jsonify({"error": f"Missing expected feature in input data: {ke}"}), 400
    except Exception as e:
        # Catch other potential errors during processing or prediction
        print(f"An error occurred: {e}")
        return jsonify({"error": "An error occurred during prediction."}), 500
Let's break down the predict function:

1. Model check: the function first verifies that model was loaded successfully during startup. If not, it returns an error with status 500.
2. Get the data: request.get_json(force=True) attempts to parse the incoming request body as JSON. force=True helps if the content type isn't explicitly set to application/json, but it's good practice for clients to set it. We add basic validation to check that the received data is a dictionary.
3. Prepare the data: we expect JSON of the form {"feature1": 1.0, "feature2": 2.5, ...} and convert it into a single-row Pandas DataFrame. You must adapt the column names and data preparation to match your specific model's requirements. If you saved a preprocessor or pipeline, you would apply its transform method here.
4. Make the prediction: we call the model's predict() method, passing the prepared data (input_df).
5. Format the output: we take the first element of the result (prediction[0]) and convert it to a standard Python type using .item() if necessary, ensuring it can be easily converted to JSON.
6. Return the response: jsonify({"prediction": output}) creates a JSON response containing the prediction result.

The try...except block catches potential issues like missing JSON data, an incorrect data format (e.g., missing features), or errors during the prediction step itself, returning informative JSON error messages with appropriate HTTP status codes (400 for client errors, 500 for server errors).

Finally, add the standard Python construct to make the script runnable and start the Flask development server:
# Add this at the very end of app.py
if __name__ == '__main__':
    # Set host='0.0.0.0' to make the server accessible from other devices on your network
    # Use a port different from the default 5000 if needed
    app.run(host='0.0.0.0', port=5000, debug=True)
This code block checks if the script is being executed directly (not imported). app.run() starts the Flask development server.

- host='0.0.0.0' makes the server listen on all available network interfaces, not just localhost. This is useful if you want to test from another device on your network or eventually run the app in a container.
- port=5000 specifies the port number (5000 is the default for Flask).
- debug=True enables debug mode, which provides more detailed error messages in the browser and automatically restarts the server when you save changes to the code. Important: do not use debug=True in a production environment, due to security risks and performance overhead.

Now you're ready to run your prediction service!
Open a terminal, navigate to your project directory (simple_ml_api), and make sure the model.joblib file is present. Then start the application:

python app.py
You should see output indicating that the model loaded successfully and the Flask server is running, typically something like:
Model loaded successfully.
* Serving Flask app 'app'
* Debug mode: on
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on all addresses (0.0.0.0)
* Running on http://127.0.0.1:5000
* Running on http://[your-local-ip]:5000
Press CTRL+C to quit
* Restarting with stat
Model loaded successfully.
* Debugger is active!
* Debugger PIN: ...
Your API is now running and listening for requests on port 5000!
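Note the warning in that output: the built-in server is for development only. When you eventually deploy, you would run the same app behind a production WSGI server. As one common choice (shown here only for orientation, assuming you install it separately), gunicorn can serve the app like this:

pip install gunicorn
gunicorn --bind 0.0.0.0:5000 app:app

Here app:app means "the Flask instance named app inside the module app.py". For the rest of this exercise, the development server is fine.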
You need a way to send a POST request with JSON data to your running API. You can use a tool like curl (a command-line tool) or write a simple Python script using the requests library.
Example Input Data:
Let's assume your model.joblib expects four features named sepal_length, sepal_width, petal_length, and petal_width. Your input JSON should look like this:
{
  "sepal_length": 5.1,
  "sepal_width": 3.5,
  "petal_length": 1.4,
  "petal_width": 0.2
}
Testing with curl:
Open another terminal window (leave the first one running the server) and execute the following command. Replace the feature values if needed.
curl -X POST -H "Content-Type: application/json" \
-d '{"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2}' \
http://127.0.0.1:5000/predict
- -X POST: specifies the HTTP method as POST.
- -H "Content-Type: application/json": tells the server the body contains JSON data.
- -d '{...}': provides the JSON data in the request body.
- http://127.0.0.1:5000/predict: the URL of your API endpoint.

Testing with Python requests:
Alternatively, create a small Python script (e.g., test_api.py) or use an interactive Python session:
import requests
import json

# The URL of your Flask API endpoint
url = 'http://127.0.0.1:5000/predict'

# The input data as a Python dictionary
# Adjust feature names and values based on your model
input_data = {
    "sepal_length": 5.1,
    "sepal_width": 3.5,
    "petal_length": 1.4,
    "petal_width": 0.2
}

# Send the POST request with JSON data
response = requests.post(url, json=input_data)

# Check if the request was successful (status code 200)
if response.status_code == 200:
    # Print the JSON response (prediction) from the API
    result = response.json()
    print(f"API Response: {result}")
    # Example output might be: API Response: {'prediction': 0} or {'prediction': 'setosa'}
else:
    # Print error information if the request failed
    print(f"Error: {response.status_code}")
    try:
        print(f"Error details: {response.json()}")
    except json.JSONDecodeError:
        print(f"Error details: {response.text}")
Run this script (python test_api.py).
Expected Output:
If everything works correctly, both curl and the Python script should receive a JSON response from your API, similar to this (the actual prediction value depends on your model):
{
  "prediction": 0
}
or perhaps (if your model predicts class names):
{
  "prediction": "setosa"
}
You should also see log messages in the terminal where app.py is running, showing the received data, the prepared DataFrame, and the prediction result.
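The print calls are fine for a hands-on exercise, but for anything longer-lived you would typically switch to Python's standard logging module, which adds timestamps and levels and can be redirected without code changes. A minimal sketch of what that swap might look like in app.py:

# Hypothetical logging setup near the top of app.py
import logging

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger(__name__)

# Then, inside predict(), replace the print calls, for example:
# logger.info("Received data: %s", data)
# logger.info("Prediction result: %s", output)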
Flow of a prediction request: the client sends a POST request with JSON data to the /predict endpoint of the running Flask application; the application processes the data, uses the loaded model to make a prediction, and returns the result to the client as a JSON response.
app.py Example Code

Here is the complete code for app.py for easy reference:
import joblib
import pandas as pd
from flask import Flask, request, jsonify

# Initialize the Flask application
app = Flask(__name__)

# Load the trained model
try:
    model = joblib.load("model.joblib")
    print("Model loaded successfully.")
except FileNotFoundError:
    print("Error: model.joblib not found.")
    model = None
except Exception as e:
    print(f"Error loading model: {e}")
    model = None

@app.route('/predict', methods=['POST'])
def predict():
    # Check if the model loaded successfully
    if model is None:
        return jsonify({"error": "Model not loaded or failed to load."}), 500

    # 1. Get data from the POST request
    try:
        data = request.get_json(force=True)
        print(f"Received data: {data}")
        if not isinstance(data, dict):
            raise ValueError("Input data must be a JSON object (dictionary).")

        # 2. Prepare the data for the model
        # IMPORTANT: Adjust feature names to match your model's training data
        feature_values = list(data.values())
        feature_names = list(data.keys())  # Or use a predefined list: ['sepal_length', 'sepal_width', ...]
        input_df = pd.DataFrame([feature_values], columns=feature_names)
        print(f"Prepared DataFrame:\n{input_df}")

        # 3. Make prediction
        prediction = model.predict(input_df)

        # 4. Format output
        output = prediction[0]
        if hasattr(output, 'item'):  # Handle numpy types
            output = output.item()
        print(f"Prediction result: {output}")

        # 5. Return the prediction as a JSON response
        return jsonify({"prediction": output})

    except ValueError as ve:
        print(f"Value Error: {ve}")
        return jsonify({"error": f"Invalid input data format: {ve}"}), 400
    except KeyError as ke:
        print(f"Key Error: {ke}")
        return jsonify({"error": f"Missing expected feature in input data: {ke}"}), 400
    except Exception as e:
        print(f"An error occurred: {e}")
        return jsonify({"error": "An error occurred during prediction."}), 500

if __name__ == '__main__':
    # Run the app, accessible on the network, with debug mode on
    # Remember to set debug=False for production
    app.run(host='0.0.0.0', port=5000, debug=True)
Congratulations! You've successfully built a basic machine learning prediction API using Flask. This service takes your saved model, wraps it in a web server, accepts input data over HTTP, and returns predictions. This is a fundamental pattern for making your models usable by other applications or users. In the next chapter, we'll look at how to package this application using Docker to make it even more portable and easier to deploy.