Having trained a model, the next logical step is making it available to serve predictions. TensorFlow Serving is a high-performance system specifically designed for this purpose. In this hands-on exercise, we'll walk through the process of saving a trained TensorFlow model and deploying it locally using TensorFlow Serving running inside a Docker container. We will then interact with the deployed model using its REST API.
This practical assumes you have Docker installed and running on your system, along with a working Python environment with TensorFlow installed.
First, let's create a very simple Keras model. For this example, we don't need a complex architecture; the focus is on the deployment mechanics.
import tensorflow as tf
import numpy as np
import os
import shutil # For directory cleanup
# Define a simple model
def create_simple_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(10, activation='relu', input_shape=(4,)),
        tf.keras.layers.Dense(10, activation='relu'),
        tf.keras.layers.Dense(3, activation='softmax')  # Example output shape
    ])
    # Compile is needed for saving signatures, but we won't train here
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
    return model
model = create_simple_model()
# Let's generate some dummy data just to show input/output shapes
print("Model Summary:")
model.summary()
dummy_input = np.random.rand(1, 4)
print(f"\nDummy input shape: {dummy_input.shape}")
dummy_output = model.predict(dummy_input)
print(f"Dummy output shape: {dummy_output.shape}")
print(f"Dummy output: {dummy_output}")
# --- Saving the Model ---
# Define the path where the model will be saved.
# TF Serving expects models to be in versioned directories.
model_dir = 'simple_model'
version = 1
export_path = os.path.join(model_dir, str(version))
# Clean up if the model directory already exists
if os.path.isdir(model_dir):
    print(f"Removing existing directory: {model_dir}")
    shutil.rmtree(model_dir)
print(f"\nSaving model to: {export_path}")
# Save the model in TensorFlow's SavedModel format
# This format includes the model architecture, weights, and serving signatures.
tf.keras.models.save_model(
    model,
    export_path,
    overwrite=True,
    include_optimizer=True,  # Optional, but good practice
    signatures=None,  # Keras automatically generates a default 'serving_default' signature
    options=None
)
print(f"\nModel saved successfully!")
print(f"Directory structure under {model_dir}:")
for root, dirs, files in os.walk(model_dir):
    indent = ' ' * 4 * (root.count(os.sep) - model_dir.count(os.sep))
    print(f"{indent}{os.path.basename(root)}/")
    file_indent = ' ' * 4 * (root.count(os.sep) - model_dir.count(os.sep) + 1)
    for f in files:
        print(f"{file_indent}{f}")
Executing this code creates a directory named simple_model containing a subdirectory 1 (the version). Inside 1, you'll find the saved_model.pb file defining the computation graph, along with subdirectories like variables (containing the model weights) and potentially assets. This SavedModel format is precisely what TensorFlow Serving needs.
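Before deploying, it can be worth confirming that the export actually contains a serving signature and checking the input and output names it expects. The snippet below is a minimal sketch, assuming the model was exported to simple_model/1 exactly as above:
import tensorflow as tf
# Load the SavedModel back and list its signatures.
loaded = tf.saved_model.load('simple_model/1')
print("Available signatures:", list(loaded.signatures.keys()))
# Inspect the default serving signature's input and output specs.
infer = loaded.signatures['serving_default']
print("Inputs:", infer.structured_input_signature)
print("Outputs:", infer.structured_outputs)
The same information is also available from the command line with saved_model_cli show --dir simple_model/1 --all, a tool that ships with TensorFlow.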
Now, we'll use Docker to run the official TensorFlow Serving image and point it to our saved model. Open your terminal or command prompt.
First, ensure you have the latest serving image:
docker pull tensorflow/serving
Next, run the container. You need to replace /path/to/your/simple_model with the absolute path to the simple_model directory you just created on your host machine.
# Make sure you are in the directory *containing* the 'simple_model' folder
# or provide the full absolute path to 'simple_model'
# Example using absolute path (replace with your actual path):
# On Linux/macOS:
# docker run -p 8501:8501 --mount type=bind,source=/home/user/my_projects/advanced_tf/simple_model,target=/models/my_simple_classifier -e MODEL_NAME=my_simple_classifier -t tensorflow/serving
# On Windows (using PowerShell):
# docker run -p 8501:8501 --mount type=bind,source=C:\Users\YourUser\MyProjects\advanced_tf\simple_model,target=/models/my_simple_classifier -e MODEL_NAME=my_simple_classifier -t tensorflow/serving
# Example using current directory (run from the parent of 'simple_model'):
# On Linux/macOS:
docker run -p 8501:8501 --mount type=bind,source=$(pwd)/simple_model,target=/models/my_simple_classifier -e MODEL_NAME=my_simple_classifier -t tensorflow/serving &
# On Windows (using PowerShell):
# docker run -p 8501:8501 --mount type=bind,source=${PWD}/simple_model,target=/models/my_simple_classifier -e MODEL_NAME=my_simple_classifier -t tensorflow/serving
Let's break down this command:

- docker run: Starts a new container.
- -p 8501:8501: Maps port 8501 on your host machine to port 8501 inside the container. This is the default port for TF Serving's REST API.
- --mount type=bind,source=<host_path>,target=/models/my_simple_classifier: This is the critical part. It makes the simple_model directory from your host machine (the source) available inside the container at the path /models/my_simple_classifier (the target). TensorFlow Serving is configured by default to look for models inside the /models directory within the container. We are naming our model my_simple_classifier within the serving environment.
- -e MODEL_NAME=my_simple_classifier: This environment variable explicitly tells TensorFlow Serving which model to load from the /models directory. The name must match the subdirectory name used in the target part of the --mount option.
- -t tensorflow/serving: Specifies the Docker image to use.
- & (Linux/macOS): Runs the container in the background (optional).

After running the command, Docker will pull the image if you don't have it locally and then start the container. You should see log output from TensorFlow Serving indicating it's looking for models and hopefully loading my_simple_classifier successfully. Look for lines similar to:
... Successfully loaded servable version {name: my_simple_classifier version: 1} ...
... Running gRPC ModelServer at 0.0.0.0:8500 ...
... Exporting HTTP/REST API at 0.0.0.0:8501 ...
If you see errors, double-check the source path in the --mount option; it must be the correct absolute path to the directory containing the version subdirectory (1).
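Another quick sanity check is TF Serving's model status endpoint, which reports whether each model version loaded successfully. A minimal sketch using Python's requests library, assuming the container is running with the port mapping shown above:
import requests
# The status endpoint lives next to the predict endpoint.
status_url = 'http://localhost:8501/v1/models/my_simple_classifier'
response = requests.get(status_url)
print(response.status_code)
print(response.json())  # Look for a "model_version_status" entry with state "AVAILABLE"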
With TF Serving running and the model loaded, we can now send prediction requests. We'll use Python's requests library to interact with the REST endpoint.
Create a new Python script or use a Jupyter notebook:
import requests
import json
import numpy as np
# Prepare sample input data compatible with the model's input shape (1, 4)
# Note: Needs to be a list of lists for JSON serialization
input_data = np.random.rand(2, 4).tolist() # Create 2 samples
# The REST API endpoint format is:
# http://<host>:<port>/v1/models/<model_name>[:predict]
# Or http://<host>:<port>/v1/models/<model_name>/versions/<version>[:predict]
url = 'http://localhost:8501/v1/models/my_simple_classifier:predict'
# url = 'http://localhost:8501/v1/models/my_simple_classifier/versions/1:predict' # Also works
# The request payload must be a JSON object.
# For the default 'serving_default' signature (and many common cases),
# the key is "instances", and the value is a list of input examples.
data = json.dumps({"instances": input_data})
# Set the content type header
headers = {"content-type": "application/json"}
# Send the POST request
try:
    response = requests.post(url, data=data, headers=headers)
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)
    # Parse the JSON response
    predictions = response.json()['predictions']
    print("Request URL:", url)
    print("Input Data Sent (first sample):", input_data[0])
    print("\nResponse Status Code:", response.status_code)
    print("Predictions Received (first sample):", predictions[0])
    print(f"\nReceived {len(predictions)} predictions.")
except requests.exceptions.RequestException as e:
    print(f"Error making request: {e}")
    # If running in Docker, check container logs: docker logs <container_id>
except KeyError:
    print("Error: 'predictions' key not found in response.")
    print("Response content:", response.text)  # Print raw response for debugging
When you run this script, it constructs a JSON payload containing your input data under the key "instances". It sends this payload via an HTTP POST request to the TensorFlow Serving endpoint. If successful, TF Serving processes the input using the loaded model and returns the predictions, which are then printed.
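The REST API also accepts a columnar "inputs" format in addition to the row-oriented "instances" format used above; for a single-input model like this one the two are interchangeable. A sketch of the same request in that format, assuming the same URL and dummy data as before:
import requests
import json
import numpy as np
input_data = np.random.rand(2, 4).tolist()
url = 'http://localhost:8501/v1/models/my_simple_classifier:predict'
# "inputs" carries the tensor(s) in columnar form; with a single input,
# a plain list of samples is accepted without naming the input.
payload = json.dumps({"inputs": input_data})
response = requests.post(url, data=payload, headers={"content-type": "application/json"})
# The columnar format returns results under "outputs" rather than "predictions".
print(response.json().get('outputs'))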
The output should look something like this (exact prediction values will vary):
Request URL: http://localhost:8501/v1/models/my_simple_classifier:predict
Input Data Sent (first sample): [0.123, 0.456, 0.789, 0.987]
Response Status Code: 200
Predictions Received (first sample): [0.25, 0.45, 0.3] # Example probabilities from softmax
Received 2 predictions.
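Because the model ends in a softmax layer, each prediction is a probability distribution over the three output classes. If you need hard class labels, a small post-processing step on the client side is enough; the values below are illustrative stand-ins for what the server returns:
import numpy as np
# Example probabilities for two samples (actual values will vary).
predictions = [[0.25, 0.45, 0.30], [0.10, 0.20, 0.70]]
# argmax along the class axis picks the most probable class per sample.
predicted_classes = np.argmax(predictions, axis=1)
print("Predicted class indices:", predicted_classes)  # e.g. [1 2]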
Once you are finished experimenting, you can stop the TensorFlow Serving container. Find its ID using docker ps and then stop it:
# Find the container ID
docker ps
# Stop the container (replace <container_id> with the actual ID)
docker stop <container_id>
# Optional: Remove the container
docker rm <container_id>
You can also remove the simple_model directory you created earlier if you no longer need it.
This practical demonstrates the fundamental workflow of deploying a TensorFlow model using TF Serving. You saved a model in the SavedModel format, used Docker to launch the serving container while mounting the model directory, and successfully queried the model's REST endpoint for predictions. This forms the basis for deploying more complex models into production environments. You can extend this by exploring TF Serving's configuration options, batching requests for higher throughput, or using the gRPC interface for potentially lower latency communication.
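As a pointer toward that gRPC route, the sketch below shows roughly what a client looks like. It assumes the container also maps the gRPC port (add -p 8500:8500 to the docker run command), that the tensorflow-serving-api package is installed, and that the input key matches the name reported by the signature inspection earlier (dense_input here is only a guess); treat it as a starting point rather than a drop-in script.
import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc
# Connect to the gRPC port (8500 by default inside the container).
channel = grpc.insecure_channel('localhost:8500')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
# Build a PredictRequest for the model and signature we deployed.
request = predict_pb2.PredictRequest()
request.model_spec.name = 'my_simple_classifier'
request.model_spec.signature_name = 'serving_default'
# The input key must match the signature's input name ('dense_input' is an
# assumption; verify it with the signature inspection shown earlier).
data = np.random.rand(2, 4).astype(np.float32)
request.inputs['dense_input'].CopyFrom(tf.make_tensor_proto(data))
# Send the request; the response holds a map of output tensors.
response = stub.Predict(request, 10.0)  # 10-second timeout
print(response.outputs)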