Alright, let's put the concepts from this chapter into practice. You've learned about the different ways to save your model's progress, which is fundamental for any realistic machine learning workflow. Whether you need to recover from an interruption, deploy a finished model, or simply save the best version during a long training run, knowing how to save and load effectively is essential.
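In this practice section, we'll walk through common scenarios: using the ModelCheckpoint callback to automatically save weights during training, loading saved weights into a fresh model, saving and loading an entire model, and resuming training from a saved model.
We'll use a simple model and synthetic data so we can focus purely on the mechanics of saving and loading.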
First, let's import TensorFlow and other necessary libraries, and generate some simple data for a binary classification problem.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
import os
import shutil # For cleaning up saved files
print(f"Using TensorFlow version: {tf.__version__}")
# Generate synthetic data
def generate_data(num_samples=1000, seed=42):
    # Simple 2D features, linearly separable for simplicity
    rng = np.random.default_rng(seed)
    X = rng.random((num_samples, 2)) * 10 - 5
    # Simple linear boundary: y > 0.5*x - 1
    y = (X[:, 1] > 0.5 * X[:, 0] - 1).astype(int)
    return X, y

# Use different seeds; with the same seed, the validation set would just be
# the first 200 rows of the training set
X_train, y_train = generate_data(1000, seed=42)
X_val, y_val = generate_data(200, seed=43)
# Define a simple Sequential model
def build_model():
    model = keras.Sequential(
        [
            layers.Dense(16, activation="relu", input_shape=(2,)),
            layers.Dense(8, activation="relu"),
            layers.Dense(1, activation="sigmoid"),  # Binary classification
        ]
    )
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# Create directories for saving models/weights
checkpoint_dir = "./training_checkpoints"
saved_model_dir = "./saved_model"
# Clean up previous runs if they exist
if os.path.exists(checkpoint_dir):
    shutil.rmtree(checkpoint_dir)
if os.path.exists(saved_model_dir):
    shutil.rmtree(saved_model_dir)
os.makedirs(checkpoint_dir)
# saved_model_dir will be created by model.save()
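When generating synthetic data, it is worth checking that the training and validation sets are genuinely different draws. The sketch below uses a hypothetical helper, make_features (not part of the tutorial code), to illustrate the point under the assumption that features come from a seeded NumPy generator: reusing a seed reproduces the same leading rows, while a different seed does not.

```python
import numpy as np


def make_features(num_samples, seed):
    """Hypothetical helper: draw 2D features in [-5, 5) from a seeded generator."""
    rng = np.random.default_rng(seed)
    return rng.random((num_samples, 2)) * 10 - 5


train_features = make_features(1000, seed=42)
val_same_seed = make_features(200, seed=42)   # same seed as the training set
val_other_seed = make_features(200, seed=43)  # independent seed

# With the same seed, the 200 validation rows repeat the first 200 training rows
overlap_same_seed = bool(np.array_equal(train_features[:200], val_same_seed))
overlap_other_seed = bool(np.array_equal(train_features[:200], val_other_seed))

print("Same seed overlaps with training data:", overlap_same_seed)
print("Different seed overlaps with training data:", overlap_other_seed)
```

If the validation rows also appear in the training set, validation metrics will look better than they should, which matters when a callback relies on them to decide what to save.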
The ModelCheckpoint callback automatically saves your model during training. You can configure it to save only the weights or the entire model, and decide whether to save at every epoch or only when performance improves. Here, we'll save only the weights whenever the validation loss improves.
model = build_model()
# Configure the ModelCheckpoint callback
# We'll save weights only, based on validation loss
# The filename includes the epoch number and validation loss
checkpoint_path = os.path.join(checkpoint_dir, "ckpt_epoch_{epoch:02d}_val_loss_{val_loss:.2f}.weights.h5")
checkpoint_callback = keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path,
    save_weights_only=True,  # Save only the model's weights
    monitor='val_loss',      # Monitor validation loss
    mode='min',              # Save when validation loss decreases
    save_best_only=True,     # Only save the 'best' model seen so far
    verbose=1                # Print messages when saving
)
print("Starting training with ModelCheckpoint callback...")
history = model.fit(
    X_train,
    y_train,
    epochs=10,
    batch_size=32,
    validation_data=(X_val, y_val),
    callbacks=[checkpoint_callback],
    verbose=0  # Set to 0 to avoid cluttering output; verbose=1 in the callback shows saving
)
print("\nTraining finished.")
print(f"Checkpoints saved in: {checkpoint_dir}")
print("Files:", os.listdir(checkpoint_dir))
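# Find the most recently written checkpoint (the best one, due to save_best_only=True).
# tf.train.latest_checkpoint() reads the "checkpoint" index file written by
# TensorFlow-format checkpoints, so it would return None for .weights.h5 files.
checkpoint_files = [
    os.path.join(checkpoint_dir, name)
    for name in os.listdir(checkpoint_dir)
    if name.endswith(".weights.h5")
]
latest_checkpoint = max(checkpoint_files, key=os.path.getmtime) if checkpoint_files else None
print(f"\nLatest (best) checkpoint found: {latest_checkpoint}")
You should see output indicating that checkpoints were saved when the validation loss improved. Because the weights are stored as .weights.h5 files, we pick the most recently modified file in the directory, which corresponds to the best-performing model in our case because we set save_best_only=True. The tf.train.latest_checkpoint utility serves the same purpose for TensorFlow-format checkpoints, but it does not recognize HDF5 weight files.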
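The filename template passed to ModelCheckpoint is filled in with Python's str.format, using the 1-based epoch number and the logged metrics. As a rough sketch of how that naming scheme can be used afterwards, the hypothetical helpers parse_checkpoint_name and best_checkpoint below recover the epoch and validation loss from such filenames and pick the file with the lowest loss. The file names in the list are made up for illustration.

```python
import re

# Same template as the checkpoint filename used in this section
template = "ckpt_epoch_{epoch:02d}_val_loss_{val_loss:.2f}.weights.h5"

# How the template expands for epoch 3 with a validation loss of 0.12345
example_name = template.format(epoch=3, val_loss=0.12345)
print(example_name)

NAME_PATTERN = re.compile(r"ckpt_epoch_(\d+)_val_loss_(\d+\.\d+)\.weights\.h5")


def parse_checkpoint_name(name):
    """Hypothetical helper: return (epoch, val_loss) or None if the name does not match."""
    match = NAME_PATTERN.fullmatch(name)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))


def best_checkpoint(names):
    """Hypothetical helper: pick the name with the lowest val_loss (later epoch wins ties)."""
    candidates = []
    for name in names:
        parsed = parse_checkpoint_name(name)
        if parsed is not None:
            epoch, val_loss = parsed
            candidates.append((val_loss, -epoch, name))
    if not candidates:
        return None
    return min(candidates)[2]


# Illustrative file listing (not real output)
listing = [
    "ckpt_epoch_01_val_loss_0.52.weights.h5",
    "ckpt_epoch_04_val_loss_0.21.weights.h5",
    "ckpt_epoch_09_val_loss_0.08.weights.h5",
    "notes.txt",
]
best_name = best_checkpoint(listing)
print(best_name)
```

Ties are broken in favor of the later epoch because the loss in the filename is rounded to two decimals: with save_best_only=True, a later file was only written because its unrounded loss was lower.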
Now, imagine your training was interrupted, or you simply want to use the best weights you saved. You need to build a new model instance with the same architecture, then load the saved weights into it with model.load_weights().
# Build a new, untrained model instance with the same architecture
new_model = build_model()
# Evaluate the untrained model (should have poor performance)
print("\nEvaluating the new, untrained model:")
loss_untrained, acc_untrained = new_model.evaluate(X_val, y_val, verbose=0)
print(f"Untrained model - Loss: {loss_untrained:.4f}, Accuracy: {acc_untrained:.4f}")
# Load the weights from the best checkpoint saved earlier
if latest_checkpoint:
    print(f"\nLoading weights from: {latest_checkpoint}")
    new_model.load_weights(latest_checkpoint)
    # Evaluate the model with loaded weights (should have good performance)
    print("Evaluating the model with loaded weights:")
    loss_loaded, acc_loaded = new_model.evaluate(X_val, y_val, verbose=0)
    print(f"Model with loaded weights - Loss: {loss_loaded:.4f}, Accuracy: {acc_loaded:.4f}")
else:
    print("\nNo checkpoint found to load.")
Notice the significant improvement in accuracy after loading the weights compared to the freshly initialized new_model. This confirms that the learned parameters were successfully restored. Remember, load_weights only restores the parameters; it doesn't restore the optimizer's state.
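Comparing accuracy is a coarse check. A stricter one is to compare the weight arrays themselves after a save and load round trip. The following is a minimal sketch under a few assumptions: build_small_model is a hypothetical stand-in for the tutorial's model, the weights file goes to a temporary directory, and the installed TensorFlow accepts the .weights.h5 suffix.

```python
import os
import tempfile

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def build_small_model():
    """Hypothetical stand-in for the tutorial's build_model()."""
    model = keras.Sequential(
        [
            keras.Input(shape=(2,)),
            layers.Dense(4, activation="relu"),
            layers.Dense(1, activation="sigmoid"),
        ]
    )
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model


source_model = build_small_model()
target_model = build_small_model()  # same architecture, different random initialization

with tempfile.TemporaryDirectory() as tmp_dir:
    weights_path = os.path.join(tmp_dir, "demo.weights.h5")
    source_model.save_weights(weights_path)
    target_model.load_weights(weights_path)

# Every weight array in the target should now equal the corresponding source array
weights_match = all(
    np.array_equal(source, target)
    for source, target in zip(source_model.get_weights(), target_model.get_weights())
)
print("Weights identical after round trip:", weights_match)
```

The same comparison can be applied to the tutorial's model and new_model once the checkpoint has been loaded.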
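Saving only weights is useful, but sometimes you need the whole package: architecture, weights, and the optimizer's state (e.g., to resume training exactly where you left off). The model.save() method handles this, saving everything into a directory using the TensorFlow SavedModel format. Note that saving to a plain directory path relies on the Keras 2 behavior bundled with TensorFlow 2.15 and earlier; with Keras 3 (TensorFlow 2.16 and later), model.save() expects a filename ending in .keras or .h5, and SavedModel export is done with model.export() instead.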
# Let's assume 'model' is the trained model from step 1
# Or we could use 'new_model' which has the loaded weights
print(f"\nSaving the entire model to: {saved_model_dir}")
model.save(saved_model_dir) # Use the originally trained model instance
print("Model saved successfully.")
print("Contents of the saved model directory:")
# List the contents to show the SavedModel structure
for item in os.listdir(saved_model_dir):
    print(f"- {item}")
Executing model.save() creates a directory containing files like saved_model.pb (the graph definition and metadata), a variables directory (containing the weights), and possibly an assets directory. This format is language-neutral and suitable for serving models via TensorFlow Serving or using them in other TensorFlow environments.
Loading a SavedModel is straightforward using tf.keras.models.load_model(). This restores the architecture, weights, and optimizer state, making the model ready for inference or continued training.
print(f"\nLoading the entire model from: {saved_model_dir}")
loaded_full_model = tf.keras.models.load_model(saved_model_dir)
# Verify the loaded model's architecture
print("\nLoaded model summary:")
loaded_full_model.summary()
# Evaluate the loaded model to confirm it performs as expected
print("\nEvaluating the loaded full model:")
loss_full, acc_full = loaded_full_model.evaluate(X_val, y_val, verbose=0)
print(f"Loaded full model - Loss: {loss_full:.4f}, Accuracy: {acc_full:.4f}")
# You can also make predictions directly
print("\nMaking a prediction with the loaded model:")
sample_prediction = loaded_full_model.predict(X_val[:5]) # Predict on first 5 validation samples
print("Predictions:", sample_prediction.flatten())
print("Actual labels:", y_val[:5])
The loaded model performs identically to the one we saved, and we didn't need to rebuild the architecture or compile it again (though recompiling might be desired if you want to change the optimizer or metrics for further training).
Because model.save() also saves the optimizer's state, you can seamlessly resume training. TensorFlow will pick up where it left off, including the learning rate schedule and other optimizer parameters like momentum.
# Resume training the loaded model for a few more epochs
print("\nResuming training on the loaded model...")
initial_epoch = history.epoch[-1] + 1  # Continue epoch numbering after the first run
history_resumed = loaded_full_model.fit(
    X_train,
    y_train,
    # 'epochs' is the index of the final epoch, not a count of extra epochs,
    # so train for 5 more epochs by adding 5 to initial_epoch
    epochs=initial_epoch + 5,
    initial_epoch=initial_epoch,
    batch_size=32,
    validation_data=(X_val, y_val),
    verbose=1
)
print("\nResumed training finished.")
This demonstrates how loading a full SavedModel allows you to continue the training process precisely, which is invaluable for long-running experiments.
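One detail that is easy to get wrong when resuming: in fit(), the epochs argument is the index at which training stops, not the number of additional epochs, so it has to be larger than initial_epoch for any training to happen. The sketch below models that arithmetic in plain Python; epochs_that_will_run is a hypothetical helper for illustration, not a Keras function.

```python
def epochs_that_will_run(initial_epoch, epochs):
    """Hypothetical helper: zero-based epoch indices covered by
    fit(initial_epoch=initial_epoch, epochs=epochs)."""
    return list(range(initial_epoch, epochs))


completed_epochs = 10  # e.g. a first run with epochs=10
extra_epochs = 5

# Passing only the extra count as 'epochs' trains for zero epochs
too_small = epochs_that_will_run(completed_epochs, extra_epochs)

# Adding the extra count to initial_epoch gives the intended five epochs
intended = epochs_that_will_run(completed_epochs, completed_epochs + extra_epochs)

print("epochs=5, initial_epoch=10 ->", too_small)
print("epochs=15, initial_epoch=10 ->", intended)
```

Keras displays epochs 1-based in its progress output, so the intended run above would be shown as epochs 11 through 15.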
In this practice session, you've worked through the essential workflows for saving and loading models in TensorFlow/Keras:
ModelCheckpoint callback: Ideal for automatically saving the best weights (or full models) during training runs, providing fault tolerance and capturing optimal states.
model.load_weights(): Used to restore learned parameters into a model instance that has the same architecture. Useful when you only need the weights, like for transfer learning or inference when you rebuild the model structure yourself.
model.save(): Saves the entire model (architecture, weights, optimizer state) in the SavedModel format. This is the standard way to save a model for deployment or for resuming training later.
tf.keras.models.load_model(): Loads a model previously saved using model.save(), restoring its complete state.
Mastering these techniques ensures that your training efforts are preserved and your models are ready for evaluation, deployment, or further development.
© 2025 ApX Machine Learning