Building a working prediction service brings together APIs, the Flask web framework, and loading a saved machine learning model inside a Python application. This hands-on exercise will guide you step-by-step through creating a simple web API using Flask. The API will load a pre-trained model (like the one you saved in Chapter 2) and expose an endpoint that accepts input data via HTTP, uses the model to make a prediction, and returns the result.

## Prerequisites

Before we start coding, make sure you have the following ready:

- **Python:** Ensure Python 3 is installed on your system.
- **Flask:** You need to install the Flask library. If you haven't already, open your terminal or command prompt and run:

  ```bash
  pip install Flask joblib scikit-learn pandas
  ```

  (We include joblib, scikit-learn, and pandas because they are commonly used for saving/loading models and handling data; adjust if your model uses different libraries, such as pickle.)
- **Saved model:** You need a trained machine learning model saved to a file using joblib or pickle. For this example, let's assume you have a model file named `model.joblib` saved in the same directory where you'll create your Flask application. This model should be trained to predict something based on specific input features. We will also assume this model expects its input as a Pandas DataFrame. If you don't have a saved model yet, the sketch after this section shows one way to create one.
- **(Optional) Saved preprocessor:** If your model requires specific preprocessing steps (like scaling or encoding) that were saved separately (e.g., `scaler.joblib` or a full `pipeline.joblib`), have that file ready too. For simplicity in this first example, we'll assume the model file contains the necessary steps or that preprocessing is simple enough to be done directly in the Flask app.

## Project Structure

Let's keep things organized. Create a new directory for your project, for instance, `simple_ml_api`. Inside this directory, place your saved model file (`model.joblib`). You will create your Flask application script, named `app.py`, in this same directory.

```text
simple_ml_api/
├── app.py          # Your Flask application code
└── model.joblib    # Your saved model file
```
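If you don't already have a saved model, the following sketch shows one way to produce a compatible `model.joblib`. It is only an assumption for this walkthrough: it trains a small classifier on the Iris dataset, using the four feature names that the testing examples later in this section rely on.

```python
# create_model.py -- a minimal sketch for producing model.joblib.
# Assumes the Iris dataset and the feature names used in the examples below.
import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Load the Iris data as a DataFrame and rename the columns
# to the simple feature names used in this chapter's examples
iris = load_iris(as_frame=True)
X = iris.data
X.columns = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
y = iris.target

# Train a simple classifier on a DataFrame so it records the feature names
model = RandomForestClassifier(random_state=42)
model.fit(X, y)

# Save the trained model next to app.py
joblib.dump(model, "model.joblib")
print("Saved model.joblib")
```

Running `python create_model.py` once in the project directory gives you the `model.joblib` file the rest of the exercise assumes.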
## Step 1: Create the Basic Flask Application (app.py)

Open a new file named `app.py` in your project directory and start by importing the necessary libraries and initializing Flask:

```python
import joblib
import pandas as pd
from flask import Flask, request, jsonify

# Initialize the Flask application
app = Flask(__name__)

# Load the trained model (and preprocessor if you have one).
# We load it ONCE when the app starts, not inside the route function.
# This is more efficient as it avoids reloading on every request.
try:
    model = joblib.load("model.joblib")
    print("Model loaded successfully.")
    # If you have a separate preprocessor, load it here:
    # preprocessor = joblib.load("preprocessor.joblib")
except FileNotFoundError:
    print("Error: model.joblib not found. Make sure the model file is in the correct directory.")
    model = None
except Exception as e:
    print(f"Error loading model: {e}")
    model = None
```

Here, we import `Flask` for the web server, `request` to handle incoming data, `jsonify` to create JSON responses, `joblib` to load our model, and `pandas` because many models expect data in DataFrame format. We initialize the Flask app with `app = Flask(__name__)`.

Crucially, we load the `model.joblib` file right after initializing the app, so the model is loaded into memory only once when the application starts. Loading it inside the prediction function would be very inefficient, as it would reload the model from disk for every single prediction request. We also added basic error handling in case the model file is missing or cannot be loaded.

## Step 2: Define the Prediction Endpoint

Now, let's create the specific URL endpoint that will handle prediction requests. We'll use the route `/predict` and specify that it should accept HTTP POST requests, since clients will be sending data to it.

```python
# Add this below the model loading code in app.py
@app.route('/predict', methods=['POST'])
def predict():
    # Check if the model loaded successfully
    if model is None:
        return jsonify({"error": "Model not loaded or failed to load."}), 500

    # 1. Get data from the POST request
    try:
        data = request.get_json(force=True)
        print(f"Received data: {data}")  # Log received data

        # Ensure data is in the expected format (e.g., a dictionary)
        if not isinstance(data, dict):
            raise ValueError("Input data must be a JSON object (dictionary).")

        # 2. Prepare the data for the model
        # Assuming the model expects a Pandas DataFrame with specific column names
        # Adjust column names based on your model's training data
        # Example: {'feature1': value1, 'feature2': value2, ...}
        feature_values = list(data.values())
        feature_names = list(data.keys())  # Or define expected columns explicitly
        input_df = pd.DataFrame([feature_values], columns=feature_names)
        print(f"Prepared DataFrame:\n{input_df}")  # Log the DataFrame

        # If you had a preprocessor, you would apply it here:
        # input_processed = preprocessor.transform(input_df)
        # prediction = model.predict(input_processed)

        # 3. Make prediction
        prediction = model.predict(input_df)

        # Convert prediction to a standard Python type if necessary (e.g., from numpy array)
        # Ensure the output is JSON serializable
        output = prediction[0]
        if hasattr(output, 'item'):  # Handles numpy types
            output = output.item()
        print(f"Prediction result: {output}")  # Log the prediction

        # 4. Return the prediction as a JSON response
        return jsonify({"prediction": output})

    except ValueError as ve:
        print(f"Value Error: {ve}")
        return jsonify({"error": f"Invalid input data format: {ve}"}), 400
    except KeyError as ke:
        print(f"Error: {ke}")
        return jsonify({"error": f"Missing expected feature in input data: {ke}"}), 400
    except Exception as e:
        # Catch other potential errors during processing or prediction
        print(f"An error occurred: {e}")
        return jsonify({"error": "An error occurred during prediction."}), 500
```

Let's break down the `predict` function:

- **Check Model:** First, it verifies that the model was loaded successfully during startup. If not, it returns an error.
- **Get Data:** `request.get_json(force=True)` attempts to parse the incoming request body as JSON. `force=True` helps if the content type isn't explicitly set to `application/json`, but it's good practice for clients to set it. We add basic validation to check that the received data is a dictionary.
- **Prepare Data:** This step is critical and depends entirely on how your model was trained. Many scikit-learn models expect input as a 2D array-like structure (like a Pandas DataFrame or NumPy array) with specific feature columns in a specific order. Here, we assume the input JSON is a dictionary like `{"feature1": 1.0, "feature2": 2.5, ...}`. We convert this into a single-row Pandas DataFrame. You must adapt the column names and data preparation to match your specific model's requirements; one way to make this more robust is sketched after this list. If you saved a preprocessor or pipeline, you would apply its `transform` method here.
- **Predict:** We call the `model.predict()` method, passing the prepared data (`input_df`).
- **Format Output:** The prediction often comes back as a NumPy array (even for a single prediction). We extract the first element (`prediction[0]`) and convert it to a standard Python type using `.item()` if necessary, ensuring it can be easily converted to JSON.
- **Return Response:** `jsonify({"prediction": output})` creates a JSON response containing the prediction result.
- **Error Handling:** The `try...except` block catches potential issues like missing JSON data, incorrect data format (e.g., missing features), or errors during the prediction step itself, returning informative JSON error messages with appropriate HTTP status codes (400 for client errors, 500 for server errors).
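As a sketch of the "define expected columns explicitly" approach mentioned in the code comments, you could validate the incoming payload against a fixed feature list before building the DataFrame. This is only an illustration: the `EXPECTED_FEATURES` list and the `build_input_frame` helper are assumptions based on the Iris-style features used in the examples below, and you would replace the list with your model's actual training columns.

```python
# A minimal sketch: validate input against a fixed feature list before predicting.
# EXPECTED_FEATURES is a hypothetical list -- replace it with your model's columns.
EXPECTED_FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

def build_input_frame(data: dict) -> pd.DataFrame:
    # Report any features the client forgot to send
    missing = [name for name in EXPECTED_FEATURES if name not in data]
    if missing:
        raise KeyError(f"{missing}")
    # Select only the expected features, in training order,
    # silently ignoring any extra keys in the request
    row = {name: data[name] for name in EXPECTED_FEATURES}
    return pd.DataFrame([row], columns=EXPECTED_FEATURES)
```

Calling `build_input_frame(data)` in place of the manual DataFrame construction guarantees the column order matches training, and a missing feature surfaces as a `KeyError` that the endpoint already maps to an HTTP 400 response.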
## Step 3: Add Code to Run the Server

Finally, add the standard Python construct to make the script runnable and start the Flask development server:

```python
# Add this at the very end of app.py
if __name__ == '__main__':
    # Set host='0.0.0.0' to make the server accessible from other devices on your network
    # Use a port different from the default 5000 if needed
    app.run(host='0.0.0.0', port=5000, debug=True)
```

This code block checks if the script is being executed directly (not imported). `app.run()` starts the Flask development server.

- `host='0.0.0.0'` makes the server listen on all available network interfaces, not just localhost. This is useful if you want to test from another device on your network or eventually run it in a container.
- `port=5000` specifies the port number (5000 is the default for Flask).
- `debug=True` enables debug mode. This provides more detailed error messages in the browser and automatically restarts the server when you save changes to the code. **Important:** Do not use `debug=True` in a production environment due to security risks and performance overhead; one common production alternative is sketched below.
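Flask's own startup warning (which you'll see in the next step) says to use a production WSGI server instead of the development server. As a minimal sketch, assuming a Unix-like environment and that you're willing to add one dependency, gunicorn can serve the same `app` object from `app.py`:

```bash
pip install gunicorn

# Serve the Flask app object named "app" in module app.py with 4 worker processes
gunicorn --workers 4 --bind 0.0.0.0:5000 app:app
```

Here `app:app` means the `app` variable inside the `app` module; `app.run()` is never called because the module is imported rather than executed directly, so debug mode stays off. For the rest of this exercise, though, we'll stick with the development server.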
## Step 4: Run Your Flask API

Now you're ready to run your prediction service!

1. Open your terminal or command prompt.
2. Navigate to your project directory (`simple_ml_api`).
3. Make sure your `model.joblib` file is present.
4. Run the application:

```bash
python app.py
```

You should see output indicating that the model loaded successfully and the Flask server is running, typically something like:

```text
Model loaded successfully.
 * Serving Flask app 'app'
 * Debug mode: on
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:5000
 * Running on http://[your-local-ip]:5000
Press CTRL+C to quit
 * Restarting with stat
Model loaded successfully.
 * Debugger is active!
 * Debugger PIN: ...
```

Your API is now running and listening for requests on port 5000!

## Step 5: Test Your API

You need a way to send a POST request with JSON data to your running API. You can use tools like `curl` (a command-line tool) or write a simple Python script using the `requests` library.

**Example Input Data:** Let's assume your `model.joblib` expects four features named `sepal_length`, `sepal_width`, `petal_length`, and `petal_width`. Your input JSON should look like this:

```json
{
  "sepal_length": 5.1,
  "sepal_width": 3.5,
  "petal_length": 1.4,
  "petal_width": 0.2
}
```

**Testing with curl:** Open another terminal window (leave the first one running the server) and execute the following command, replacing the feature values if needed:

```bash
curl -X POST -H "Content-Type: application/json" \
     -d '{"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2}' \
     http://127.0.0.1:5000/predict
```

- `-X POST`: Specifies the HTTP method as POST.
- `-H "Content-Type: application/json"`: Tells the server the body contains JSON data.
- `-d '{...}'`: Provides the JSON data in the request body.
- `http://127.0.0.1:5000/predict`: The URL of your API endpoint.

**Testing with Python requests:** Alternatively, create a small Python script (e.g., `test_api.py`) or use an interactive Python session:

```python
import requests
import json

# The URL of your Flask API endpoint
url = 'http://127.0.0.1:5000/predict'

# The input data as a Python dictionary
# Adjust feature names and values based on your model
input_data = {
    "sepal_length": 5.1,
    "sepal_width": 3.5,
    "petal_length": 1.4,
    "petal_width": 0.2
}

# Send the POST request with JSON data
response = requests.post(url, json=input_data)

# Check if the request was successful (status code 200)
if response.status_code == 200:
    # Print the JSON response (prediction) from the API
    result = response.json()
    print(f"API Response: {result}")
    # Example output might be: API Response: {'prediction': 0} or {'prediction': 'setosa'}
else:
    # Print error information if the request failed
    print(f"Error: {response.status_code}")
    try:
        print(f"Error details: {response.json()}")
    except json.JSONDecodeError:
        print(f"Error details: {response.text}")
```

Run this script with `python test_api.py`.

**Expected Output:** If everything works correctly, both curl and the Python script should receive a JSON response from your API, similar to this (the actual prediction value depends on your model):

```json
{
  "prediction": 0
}
```

or perhaps (if your model predicts class names):

```json
{
  "prediction": "setosa"
}
```

You should also see log messages in the terminal where `app.py` is running, showing the received data, the prepared DataFrame, and the prediction result.
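It's also worth exercising the error path. The endpoint's own validation rejects any JSON body that isn't an object, so sending a JSON array, for example, should come back with an HTTP 400 and the error message from the `ValueError` branch:

```bash
# Send a JSON array instead of an object; the endpoint's isinstance check rejects it
curl -i -X POST -H "Content-Type: application/json" \
     -d '[5.1, 3.5, 1.4, 0.2]' \
     http://127.0.0.1:5000/predict
```

The `-i` flag prints the response headers, so you can confirm the `400 BAD REQUEST` status line alongside the JSON error body.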
```dot
digraph G {
    rankdir=LR;
    node [shape=box, style=filled, fillcolor="#e9ecef", fontname="sans-serif"];
    edge [color="#495057", fontname="sans-serif"];

    Client [label="Client\n(curl or Python requests)", shape=oval, fillcolor="#a5d8ff"];
    Flask [label="Flask Server (app.py)\nRunning on port 5000", fillcolor="#bac8ff"];
    Model [label="Loaded Model\n(model.joblib)", shape=cylinder, fillcolor="#96f2d7"];

    subgraph cluster_flask {
        label = "Inside Flask App";
        style=filled;
        color="#dee2e6";
        Flask;
        Model;
    }

    Client -> Flask [label=" POST /predict \n Content-Type: application/json \n Body: {'feature': value, ...} "];
    Flask -> Model [label=" Calls model.predict(input_df) "];
    Model -> Flask [label=" Returns prediction result "];
    Flask -> Client [label=" HTTP 200 OK \n Body: {'prediction': result} "];
}
```

Flow diagram showing a client sending a POST request with JSON data to the `/predict` endpoint of the running Flask application. The application processes the data, uses the loaded model to make a prediction, and returns the result as a JSON response back to the client.

## Full app.py Example Code

Here is the complete code for `app.py` for easy reference:

```python
import joblib
import pandas as pd
from flask import Flask, request, jsonify

# Initialize the Flask application
app = Flask(__name__)

# Load the trained model
try:
    model = joblib.load("model.joblib")
    print("Model loaded successfully.")
except FileNotFoundError:
    print("Error: model.joblib not found.")
    model = None
except Exception as e:
    print(f"Error loading model: {e}")
    model = None

@app.route('/predict', methods=['POST'])
def predict():
    # Check if the model loaded successfully
    if model is None:
        return jsonify({"error": "Model not loaded or failed to load."}), 500

    # 1. Get data from the POST request
    try:
        data = request.get_json(force=True)
        print(f"Received data: {data}")

        if not isinstance(data, dict):
            raise ValueError("Input data must be a JSON object (dictionary).")

        # 2. Prepare the data for the model
        # IMPORTANT: Adjust feature names to match your model's training data
        feature_values = list(data.values())
        feature_names = list(data.keys())  # Or use a predefined list: ['sepal_length', 'sepal_width', ...]
        input_df = pd.DataFrame([feature_values], columns=feature_names)
        print(f"Prepared DataFrame:\n{input_df}")

        # 3. Make prediction
        prediction = model.predict(input_df)

        # 4. Format output
        output = prediction[0]
        if hasattr(output, 'item'):  # Handle numpy types
            output = output.item()
        print(f"Prediction result: {output}")

        # 5. Return the prediction as a JSON response
        return jsonify({"prediction": output})

    except ValueError as ve:
        print(f"Value Error: {ve}")
        return jsonify({"error": f"Invalid input data format: {ve}"}), 400
    except KeyError as ke:
        print(f"Error: {ke}")
        return jsonify({"error": f"Missing expected feature in input data: {ke}"}), 400
    except Exception as e:
        print(f"An error occurred: {e}")
        return jsonify({"error": "An error occurred during prediction."}), 500

if __name__ == '__main__':
    # Run the app, accessible on the network, with debug mode on
    # Remember to turn debug=False for production
    app.run(host='0.0.0.0', port=5000, debug=True)
```

Congratulations! You've successfully built a basic machine learning prediction API using Flask. This service takes your saved model, wraps it in a web server, accepts input data over HTTP, and returns predictions. This is a fundamental pattern for making your models usable by other applications or users. In the next chapter, we'll look at how to package this application using Docker to make it even more portable and easier to deploy.