Having trained a model, the next logical step is making it available to serve predictions. TensorFlow Serving is a high-performance system designed specifically for this purpose. In this hands-on exercise, we'll walk through saving a trained TensorFlow model and deploying it locally with TensorFlow Serving running inside a Docker container. We will then interact with the deployed model through its REST API.

This practical assumes you have Docker installed and running on your system, along with a working Python environment with TensorFlow installed.

1. Prepare and Save a Model

First, let's create a very simple Keras model. For this example, we don't need a complex architecture; the focus is on the deployment mechanics.

```python
import tensorflow as tf
import numpy as np
import os
import shutil  # For directory cleanup

# Define a simple model
def create_simple_model():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(10, activation='relu', input_shape=(4,)),
        tf.keras.layers.Dense(10, activation='relu'),
        tf.keras.layers.Dense(3, activation='softmax')  # Example output shape
    ])
    # Compiling is needed for saving signatures, but we won't train here
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
    return model

model = create_simple_model()

# Generate some dummy data just to show the input/output shapes
print("Model Summary:")
model.summary()
dummy_input = np.random.rand(1, 4)
print(f"\nDummy input shape: {dummy_input.shape}")
dummy_output = model.predict(dummy_input)
print(f"Dummy output shape: {dummy_output.shape}")
print(f"Dummy output: {dummy_output}")

# --- Saving the Model ---
# Define the path where the model will be saved.
# TF Serving expects models to live in versioned directories.
model_dir = 'simple_model'
version = 1
export_path = os.path.join(model_dir, str(version))

# Clean up if the directory already exists
if os.path.isdir(model_dir):
    print(f"Removing existing directory: {model_dir}")
    shutil.rmtree(model_dir)

print(f"\nSaving model to: {export_path}")

# Save the model in TensorFlow's SavedModel format.
# This format includes the model architecture, weights, and serving signatures.
# Note: under Keras 3 (bundled with TF 2.16+), save_model targets the native
# .keras format; there, use model.export(export_path) to write a SavedModel.
tf.keras.models.save_model(
    model,
    export_path,
    overwrite=True,
    include_optimizer=True,  # Not required for serving, but harmless
    signatures=None,  # Keras automatically generates a default 'serving_default' signature
    options=None
)

print("\nModel saved successfully!")
print(f"Directory structure under {model_dir}:")
for root, dirs, files in os.walk(model_dir):
    indent = ' ' * 4 * (root.count(os.sep) - model_dir.count(os.sep))
    print(f"{indent}{os.path.basename(root)}/")
    file_indent = ' ' * 4 * (root.count(os.sep) - model_dir.count(os.sep) + 1)
    for f in files:
        print(f"{file_indent}{f}")
```

Executing this code creates a directory named simple_model containing a subdirectory 1 (the version). Inside 1, you'll find the saved_model.pb file defining the computation graph, along with subdirectories like variables (containing the model weights) and potentially assets. This SavedModel format is precisely what TensorFlow Serving needs.
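Before moving on to serving, it can be reassuring to confirm that the export actually contains a usable serving signature. The snippet below is an optional sketch that assumes the export_path variable from the previous code; it simply reloads the SavedModel and prints the available signatures and their input/output structure.

```python
import tensorflow as tf

# Optional sanity check: reload the SavedModel exactly as TF Serving will see it.
loaded = tf.saved_model.load(export_path)  # assumes export_path from the snippet above

# The default Keras export should expose a 'serving_default' signature
print("Available signatures:", list(loaded.signatures.keys()))

infer = loaded.signatures['serving_default']
print("Inputs :", infer.structured_input_signature)
print("Outputs:", infer.structured_outputs)
```

You should see serving_default listed, with something like a (None, 4) float input and a (None, 3) output matching the model above. The same information is available from the command line via TensorFlow's saved_model_cli tool if you prefer not to reload the model in Python.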
2. Launch TensorFlow Serving via Docker

Now we'll use Docker to run the official TensorFlow Serving image and point it at our saved model. Open your terminal or command prompt.

First, ensure you have the latest serving image:

```bash
docker pull tensorflow/serving
```

Next, run the container. You need to replace /path/to/your/simple_model with the absolute path to the simple_model directory you just created on your host machine.

```bash
# Make sure you are in the directory *containing* the 'simple_model' folder,
# or provide the full absolute path to 'simple_model'.

# Example using an absolute path (replace with your actual path):
# On Linux/macOS:
# docker run -p 8501:8501 --mount type=bind,source=/home/user/my_projects/advanced_tf/simple_model,target=/models/my_simple_classifier -e MODEL_NAME=my_simple_classifier -t tensorflow/serving
# On Windows (PowerShell):
# docker run -p 8501:8501 --mount type=bind,source=C:\Users\YourUser\MyProjects\advanced_tf\simple_model,target=/models/my_simple_classifier -e MODEL_NAME=my_simple_classifier -t tensorflow/serving

# Example using the current directory (run from the parent of 'simple_model'):
# On Linux/macOS:
docker run -p 8501:8501 --mount type=bind,source=$(pwd)/simple_model,target=/models/my_simple_classifier -e MODEL_NAME=my_simple_classifier -t tensorflow/serving &
# On Windows (PowerShell):
# docker run -p 8501:8501 --mount type=bind,source=${PWD}/simple_model,target=/models/my_simple_classifier -e MODEL_NAME=my_simple_classifier -t tensorflow/serving
```

Let's break down this command:

- docker run: Starts a new container.
- -p 8501:8501: Maps port 8501 on your host machine to port 8501 inside the container. This is the default port for TF Serving's REST API.
- --mount type=bind,source=<host_path>,target=/models/my_simple_classifier: This is the critical part. It makes the simple_model directory on your host machine (the source) available inside the container at the path /models/my_simple_classifier (the target). TensorFlow Serving is configured by default to look for models inside the /models directory within the container, and we are naming our model my_simple_classifier within the serving environment.
- -e MODEL_NAME=my_simple_classifier: This environment variable tells TensorFlow Serving which model to load from the /models directory. The name must match the last path component of the target in the --mount option.
- -t tensorflow/serving: tensorflow/serving is the Docker image to run; the -t flag allocates a pseudo-terminal so you can see the server's log output.
- & (Linux/macOS): Runs the command in the background (optional; you can also use Docker's -d flag instead).

After running the command, Docker will pull the image if you don't have it locally and then start the container. You should see log output from TensorFlow Serving indicating that it is looking for models and, hopefully, loading my_simple_classifier successfully. Look for lines similar to:

```
... Successfully loaded servable version {name: my_simple_classifier version: 1} ...
... Running gRPC ModelServer at 0.0.0.0:8500 ...
... Exporting HTTP/REST API at 0.0.0.0:8501 ...
```

If you see errors, double-check the source path in the --mount option; it must be the correct absolute path to the directory containing the version subdirectory (1).

3. Send Inference Requests via REST API

With TF Serving running and the model loaded, we can now send prediction requests.
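Before issuing a predict call, it can be useful to confirm that the server reports the model as available. TF Serving exposes a model status endpoint at /v1/models/<model_name>; the short sketch below (using the same requests library we'll use for predictions) is an optional check against the container started above.

```python
import requests

# Optional: ask TF Serving whether the model is loaded and which versions it knows about.
status_url = 'http://localhost:8501/v1/models/my_simple_classifier'

resp = requests.get(status_url)
print(resp.status_code)
print(resp.json())
# A healthy server reports each loaded version with state "AVAILABLE", roughly:
# {'model_version_status': [{'version': '1', 'state': 'AVAILABLE', 'status': {...}}]}
```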
We'll use Python's requests library to interact with the REST endpoint. Create a new Python script or use a Jupyter notebook:

```python
import requests
import json
import numpy as np

# Prepare sample input data compatible with the model's input shape (None, 4).
# Note: it needs to be a list of lists for JSON serialization.
input_data = np.random.rand(2, 4).tolist()  # Create 2 samples

# The REST API endpoint format is:
# http://<host>:<port>/v1/models/<model_name>:predict
# or http://<host>:<port>/v1/models/<model_name>/versions/<version>:predict
url = 'http://localhost:8501/v1/models/my_simple_classifier:predict'
# url = 'http://localhost:8501/v1/models/my_simple_classifier/versions/1:predict'  # Also works

# The request payload must be a JSON object.
# For the default 'serving_default' signature (and many common cases),
# the key is "instances", and the value is a list of input examples.
data = json.dumps({"instances": input_data})

# Set the content type header
headers = {"content-type": "application/json"}

# Send the POST request
try:
    response = requests.post(url, data=data, headers=headers)
    response.raise_for_status()  # Raise an exception for bad status codes (4xx or 5xx)

    # Parse the JSON response
    predictions = response.json()['predictions']

    print("Request URL:", url)
    print("Input Data Sent (first sample):", input_data[0])
    print("\nResponse Status Code:", response.status_code)
    print("Predictions Received (first sample):", predictions[0])
    print(f"\nReceived {len(predictions)} predictions.")

except requests.exceptions.RequestException as e:
    print(f"Error making request: {e}")
    # If running in Docker, check container logs: docker logs <container_id>
except KeyError:
    print("Error: 'predictions' not found in response.")
    print("Response content:", response.text)  # Print raw response for debugging
```

When you run this script, it constructs a JSON payload containing your input data under the "instances" key and sends it via an HTTP POST request to the TensorFlow Serving endpoint. If successful, TF Serving processes the input using the loaded model and returns the predictions, which are then printed.

The output should look something like this (exact prediction values will vary):

```
Request URL: http://localhost:8501/v1/models/my_simple_classifier:predict
Input Data Sent (first sample): [0.123, 0.456, 0.789, 0.987]

Response Status Code: 200
Predictions Received (first sample): [0.25, 0.45, 0.3]   # Example probabilities from softmax

Received 2 predictions.
```

4. Cleanup

Once you are finished experimenting, you can stop the TensorFlow Serving container. Find its ID using docker ps and then stop it:

```bash
# Find the container ID
docker ps

# Stop the container (replace <container_id> with the actual ID)
docker stop <container_id>

# Optional: Remove the container
docker rm <container_id>
```

You can also remove the simple_model directory you created earlier if you no longer need it.

This practical demonstrates the fundamental workflow of deploying a TensorFlow model with TF Serving. You saved a model in the SavedModel format, used Docker to launch the serving container while mounting the model directory, and successfully queried the model's REST endpoint for predictions. This forms the basis for deploying more complex models into production environments. You can extend this by exploring TF Serving's configuration options, batching requests for higher throughput, or using the gRPC interface for potentially lower-latency communication.
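As a pointer toward that last extension, the sketch below shows what a gRPC prediction call could look like. It is illustrative rather than definitive: it assumes you restart the container with the gRPC port also published (adding -p 8500:8500 to the docker run command), that the grpcio and tensorflow-serving-api packages are installed, and that the input key of the default signature is the one reported by the earlier signature check ('dense_input' is assumed here; inspect your own model's signature and substitute the actual name).

```python
import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# Open an insecure channel to the gRPC port (assumes -p 8500:8500 was added to docker run).
channel = grpc.insecure_channel('localhost:8500')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# Build the PredictRequest for the model and signature deployed above.
request = predict_pb2.PredictRequest()
request.model_spec.name = 'my_simple_classifier'
request.model_spec.signature_name = 'serving_default'

# 'dense_input' is an assumed input key -- check your signature and adjust if needed.
batch = np.random.rand(2, 4).astype(np.float32)
request.inputs['dense_input'].CopyFrom(tf.make_tensor_proto(batch))

# Issue the call with a 10-second deadline and convert the result back to NumPy.
response = stub.Predict(request, timeout=10.0)
output_key = list(response.outputs.keys())[0]  # e.g. the softmax layer's output name
predictions = tf.make_ndarray(response.outputs[output_key])
print(predictions.shape)
print(predictions)
```

Whether gRPC actually lowers latency for your workload depends on payload size and client environment, so it is worth benchmarking both interfaces before committing to one.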