Practice applying machine learning principles by compiling a model, training it with `model.fit()`, evaluating its performance, making predictions, and using callbacks for better training management and monitoring. A simple classifier is built and trained on a standard dataset, Fashion-MNIST, integrating ModelCheckpoint, EarlyStopping, and TensorBoard.

This practice assumes you have TensorFlow installed and are familiar with basic NumPy operations. We'll use the Fashion-MNIST dataset, which is conveniently available through `tf.keras.datasets`.

## 1. Load and Prepare Data

First, load the Fashion-MNIST dataset and preprocess it. Preprocessing typically involves normalization (scaling pixel values) and reshaping the data if necessary for the model's input layer. We also need to one-hot encode the labels for the categorical crossentropy loss.

```python
import tensorflow as tf
import numpy as np
import os
import datetime

# Load Fashion MNIST dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()

# Normalize pixel values to be between 0 and 1
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# Reshape input data if using a Dense network first (add channel dimension for Conv later)
# For Dense layers, we flatten the images
x_train_flat = x_train.reshape((x_train.shape[0], 28 * 28))
x_test_flat = x_test.reshape((x_test.shape[0], 28 * 28))

# One-hot encode the labels
num_classes = 10
y_train_cat = tf.keras.utils.to_categorical(y_train, num_classes)
y_test_cat = tf.keras.utils.to_categorical(y_test, num_classes)

print(f"x_train shape: {x_train_flat.shape}")  # Shape: (60000, 784)
print(f"y_train shape: {y_train_cat.shape}")   # Shape: (60000, 10)
print(f"x_test shape: {x_test_flat.shape}")    # Shape: (10000, 784)
print(f"y_test shape: {y_test_cat.shape}")     # Shape: (10000, 10)
```

## 2. Build a Simple Keras Model

We'll define a straightforward sequential model with a few dense layers.

```python
def build_simple_model(input_shape, num_classes):
    model = tf.keras.models.Sequential([
        tf.keras.layers.Input(shape=input_shape),  # Use Input layer for clarity
        tf.keras.layers.Dense(128, activation='relu', name='dense_1'),
        tf.keras.layers.Dropout(0.3, name='dropout_1'),  # Add dropout for regularization
        tf.keras.layers.Dense(64, activation='relu', name='dense_2'),
        tf.keras.layers.Dense(num_classes, activation='softmax', name='output')
    ])
    return model

input_shape = (28 * 28,)  # Flattened image shape
model = build_simple_model(input_shape, num_classes)
model.summary()
```

## 3. Compile the Model

Next, compile the model, specifying the optimizer, loss function, and metrics to track during training.

```python
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss=tf.keras.losses.CategoricalCrossentropy(),
              metrics=['accuracy'])

print("Model compiled.")
```
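As an aside, the one-hot encoding in step 1 is only required because we chose `CategoricalCrossentropy`. If you would rather train on the integer labels directly, `tf.keras.losses.SparseCategoricalCrossentropy` accepts them as-is. A minimal sketch of that variant (not used in the rest of this walkthrough):

```python
# Alternative compilation: SparseCategoricalCrossentropy works with integer labels,
# so the to_categorical() step from section 1 could be skipped entirely.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

# Training would then use the raw integer labels:
# model.fit(x_train_flat, y_train, ...)
```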
## 4. Configure Callbacks

Now, let's set up the callbacks we discussed:

- ModelCheckpoint: to save the best model weights observed during training, based on validation accuracy.
- EarlyStopping: to halt training if the validation loss stops improving, preventing overfitting.
- TensorBoard: to log metrics and graph structure for visualization.

```python
# Define log directory for TensorBoard (unique for each run)
log_dir = os.path.join("logs", "fit", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
print(f"TensorBoard log directory: {log_dir}")

# Define checkpoint path and filename
checkpoint_filepath = 'models/best_fashion_mnist_model.weights.h5'
os.makedirs(os.path.dirname(checkpoint_filepath), exist_ok=True)
print(f"Model checkpoints will be saved to: {checkpoint_filepath}")

# Create callbacks
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)

model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_filepath,
    save_weights_only=True,     # Save only weights
    monitor='val_accuracy',     # Monitor validation accuracy
    mode='max',                 # Save the model with max validation accuracy
    save_best_only=True)        # Only save if it's the 'best' so far

early_stopping_callback = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',         # Monitor validation loss
    patience=10,                # Number of epochs with no improvement after which training will be stopped
    verbose=1,                  # Print messages when stopping
    restore_best_weights=True)  # Restore model weights from the epoch with the best value of the monitored quantity

callbacks_list = [tensorboard_callback, model_checkpoint_callback, early_stopping_callback]
```

Note: `restore_best_weights=True` in EarlyStopping is convenient, as the model object will automatically hold the best weights when training stops. If set to False, you'd typically load the weights saved by ModelCheckpoint manually after training.

## 5. Train the Model with Callbacks

We are now ready to train the model using `model.fit()`. We'll pass the training data, specify the number of epochs, batch size, validation data (using a split of the training set), and our list of callbacks.

```python
print("Starting model training...")

batch_size = 64
epochs = 50  # Set a higher number; EarlyStopping will likely stop it sooner

history = model.fit(x_train_flat, y_train_cat,
                    epochs=epochs,
                    batch_size=batch_size,
                    validation_split=0.2,      # Use 20% of training data for validation
                    callbacks=callbacks_list,
                    verbose=1)                 # Set verbose=1 or 2 to see progress per epoch

print("Model training finished.")
```

During training, you'll see output for each epoch, including loss and accuracy for both the training and validation sets. Pay attention to messages from EarlyStopping if it halts the training run. ModelCheckpoint will silently save the best weights whenever validation accuracy improves.

## 6. Monitor Training with TensorBoard

While the model is training (or after it finishes), you can launch TensorBoard to visualize the metrics. Open your terminal or command prompt, navigate to the directory containing the `logs` folder (or provide the full path), and run:

```bash
tensorboard --logdir logs/fit
```

TensorBoard will typically start a web server on http://localhost:6006. Open this URL in your browser. You should see:

- Scalars: plots of loss and accuracy over epochs for both training and validation. This helps identify overfitting (validation loss increasing while training loss decreases) or assess convergence.
- Graphs: a visualization of your Keras model architecture.
- Histograms/Distributions: visualizations of weights and biases over time (if `histogram_freq` was set).
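If you just want a quick look at the curves without launching TensorBoard, the History object returned by `model.fit()` in step 5 holds the same per-epoch metrics in its `history` dictionary. A minimal sketch, assuming matplotlib is installed:

```python
import matplotlib.pyplot as plt

# history.history maps metric names to per-epoch lists, e.g. 'loss' and 'val_loss'
plt.plot(history.history['loss'], label='Training loss')
plt.plot(history.history['val_loss'], label='Validation loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()
```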
[Figure: line chart titled "Training & Validation Loss", showing loss (y-axis) against epoch (x-axis) for the training and validation sets.]

Training and validation loss curves as might be seen in TensorBoard. Note how validation loss starts to plateau or increase while training loss continues to decrease, indicating potential overfitting. Early stopping would halt training around epoch 10-12.

## 7. Evaluate the Model

After training, evaluate the model's performance on the unseen test set using `model.evaluate()`. Since we used `restore_best_weights=True` in EarlyStopping, the model object already contains the weights from the epoch with the best validation loss. If you hadn't used that option, or wanted to load specifically from the ModelCheckpoint file, you would first load the weights:

```python
# Optional: Load the best weights saved by ModelCheckpoint
# model.load_weights(checkpoint_filepath)
# print("Loaded best weights from checkpoint.")

print("Evaluating model on test data...")
test_loss, test_acc = model.evaluate(x_test_flat, y_test_cat, verbose=0)

print(f"\nTest Loss: {test_loss:.4f}")
print(f"Test Accuracy: {test_acc:.4f}")
```

This gives you the final performance metrics on data the model never saw during training or validation.

## 8. Generate Predictions

Finally, use `model.predict()` to get predictions on new data (we'll use a few examples from the test set here). The output of a softmax layer gives probabilities for each class.

```python
# Get predictions for the first 5 test images
predictions = model.predict(x_test_flat[:5])

# Print the predicted class probabilities for each image
print("\nPredictions (Probabilities) for first 5 test images:")
print(predictions)

# Get the class with the highest probability for each image
predicted_classes = np.argmax(predictions, axis=1)
print("\nPredicted Classes:", predicted_classes)

# Get the actual classes for comparison
actual_classes = y_test[:5]
print("Actual Classes:   ", actual_classes)
```

The output shows the probability distribution across the 10 classes for each of the 5 input images, followed by the index (class label) with the highest probability.

This walkthrough demonstrates the standard workflow for training a Keras model in TensorFlow. You compiled the model with the necessary components, used `model.fit()` along with callbacks to manage the training process efficiently (saving the best model, stopping early, logging for visualization), evaluated the final performance, and made predictions. These steps form the core process you'll adapt for various machine learning tasks.
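One step not shown above, but usually the next thing you'll want, is persisting the trained model itself rather than just its weights. A minimal sketch, assuming the native Keras `.keras` format and a filename chosen for illustration:

```python
# Save the full model (architecture, weights, and optimizer state) after training
model.save('models/fashion_mnist_model.keras')

# Later, in a fresh session, reload the model and use it directly
restored_model = tf.keras.models.load_model('models/fashion_mnist_model.keras')
restored_predictions = restored_model.predict(x_test_flat[:5])
```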