After selecting an appropriate loss function and optimizer, the final component needed to configure your model's learning process via model.compile()
is the specification of metrics. While the loss function guides the optimization process (it's what the optimizer tries to minimize), metrics are used to monitor and evaluate the performance of your model during training and testing. They don't directly influence the weight updates, but they provide valuable insights into how well the model is actually performing on the task it's designed for. Think of the loss as the guide for the climber (the optimizer), and metrics as the altimeter and GPS readings telling you how high you've climbed and if you're heading towards the summit.
You specify metrics using the metrics
argument in model.compile()
, typically passing a list of strings corresponding to built-in Keras metrics or instances of metric classes.
import tensorflow as tf

# Assume 'model' is a defined Keras model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy', tf.keras.metrics.Precision(name='prec'), 'mae'])
In this example, during training and evaluation, Keras will compute and report not only the sparse categorical crossentropy loss but also the accuracy, precision (specifically named 'prec'), and mean absolute error.
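To make the reported values concrete, here is a minimal, self-contained sketch. It uses a binary setup rather than the sparse categorical one above so that accuracy, precision, and MAE are all well defined together; the tiny model and random data are purely illustrative:

```python
import numpy as np
import tensorflow as tf

# Minimal sketch: a tiny binary classifier on random data (illustrative only)
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy', tf.keras.metrics.Precision(name='prec'), 'mae'])

x = np.random.rand(32, 4).astype('float32')
y = np.random.randint(0, 2, size=(32, 1))

# evaluate() reports the loss plus one entry per compiled metric
results = model.evaluate(x, y, verbose=0, return_dict=True)
print(sorted(results))  # ['accuracy', 'loss', 'mae', 'prec']
```

Note that the custom name 'prec' passed to the metric instance is exactly the key that appears in the results, which is useful when you log or plot these values.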
The choice of metrics depends heavily on your specific machine learning task (e.g., classification, regression) and what aspects of performance are most important.
For classification problems, where the goal is to assign inputs to predefined categories, common metrics include:
Accuracy ('accuracy' or tf.keras.metrics.Accuracy): This is often the most intuitive metric. It measures the proportion of predictions that the model got right.

$$\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Predictions}}$$

While simple, accuracy can be misleading, especially for datasets with imbalanced classes. For example, if 95% of your data belongs to class A and 5% to class B, a model that always predicts class A achieves 95% accuracy but is useless for identifying class B.

Precision ('precision' or tf.keras.metrics.Precision): Measures the accuracy of positive predictions. Out of all instances the model predicted as positive, what fraction actually were positive?

$$\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$$

High precision is important when the cost of a false positive is high (e.g., classifying a non-spam email as spam).

Recall ('recall' or tf.keras.metrics.Recall): Also known as sensitivity or true positive rate. Out of all actual positive instances, what fraction did the model correctly identify?

$$\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$$

High recall is important when the cost of a false negative is high (e.g., failing to detect a fraudulent transaction).

AUC ('auc' or tf.keras.metrics.AUC): The area under the ROC curve. The ROC curve plots the True Positive Rate (Recall) against the False Positive Rate at various classification thresholds. AUC measures separability: how well the model distinguishes between the classes. An AUC of 1.0 indicates a perfect classifier, while an AUC of 0.5 suggests performance no better than random guessing. It is particularly useful for evaluating binary classifiers on imbalanced datasets.

Categorical Accuracy (tf.keras.metrics.CategoricalAccuracy) vs. Sparse Categorical Accuracy (tf.keras.metrics.SparseCategoricalAccuracy): Similar to their loss function counterparts, you choose these based on the format of your true labels. Use CategoricalAccuracy if your labels are one-hot encoded (e.g., [0, 1, 0]) and SparseCategoricalAccuracy if they are integer indices (e.g., 1). Using the simple string 'accuracy' often automatically selects the appropriate one based on the loss function used, but explicit specification can be clearer.

For regression problems, where the goal is to predict a continuous value, common metrics include:
MAE ('mae' or tf.keras.metrics.MeanAbsoluteError): Calculates the average of the absolute differences between the true values ($y_i$) and the predicted values ($\hat{y}_i$).

$$\text{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|$$

MAE is easily interpretable as it represents the average prediction error in the original units of the target variable. It is less sensitive to outliers than MSE.

MSE ('mse' or tf.keras.metrics.MeanSquaredError): Calculates the average of the squared differences between the true and predicted values.

$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

MSE penalizes larger errors more heavily than smaller ones due to the squaring. Its units are the square of the target variable's units, which can make direct interpretation harder. It is often used as a loss function but can also serve as a monitoring metric.

RMSE (tf.keras.metrics.RootMeanSquaredError): This is simply the square root of the MSE.

$$\text{RMSE} = \sqrt{\text{MSE}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$

RMSE has the advantage of being in the same units as the target variable, making it more interpretable than MSE, while still penalizing large errors significantly.

Instead of using string identifiers like 'accuracy'
, you can instantiate metric classes directly from tf.keras.metrics
. This provides more flexibility, such as setting custom names that appear in logs and TensorBoard, or configuring specific metric parameters (like thresholds for Precision or Recall).
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=[
                  tf.keras.metrics.BinaryAccuracy(name='acc'),
                  tf.keras.metrics.Precision(name='precision', thresholds=0.6),
                  tf.keras.metrics.Recall(name='recall', thresholds=0.6)
              ])
Here, we use BinaryAccuracy
(appropriate for binary classification) and set a specific classification threshold of 0.6 for calculating precision and recall.
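Metric objects can also be used standalone, outside of compile(), through their update_state()/result() API. A small sketch with made-up values shows the effect of the 0.6 threshold:

```python
import tensorflow as tf

# Standalone use of a stateful metric object (the values are made up)
precision = tf.keras.metrics.Precision(thresholds=0.6)

y_true = [0, 1, 1, 1]
y_pred = [0.7, 0.5, 0.9, 0.65]  # probabilities from a hypothetical model

# With a 0.6 threshold, 0.7, 0.9, and 0.65 count as positive predictions;
# two of those three are true positives, so precision is 2/3.
precision.update_state(y_true, y_pred)
print(float(precision.result()))  # ≈ 0.6667
```

This is the same machinery Keras invokes for you batch by batch during fit() and evaluate(), which is why these objects accumulate state rather than computing a single-batch value.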
If the built-in metrics don't cover your specific evaluation needs, TensorFlow allows you to define your own custom metrics. A custom metric is typically a Python function that accepts y_true
(true labels) and y_pred
(model predictions) as arguments and returns a scalar tensor representing the metric value.
def my_custom_metric(y_true, y_pred):
    # Example: Calculate squared difference, but only for positive true values
    y_true = tf.cast(y_true, tf.float32)
    y_pred = tf.cast(y_pred, tf.float32)
    mask = tf.cast(y_true > 0, tf.float32)
    squared_diff = tf.square(y_true - y_pred) * mask
    # Avoid division by zero if no positive true values exist
    return tf.reduce_sum(squared_diff) / (tf.reduce_sum(mask) + 1e-7)

# ... inside model definition ...
model.compile(optimizer='adam',
              loss='mse',
              metrics=['mae', my_custom_metric])  # Pass the function directly
When compiling the model, you can simply pass your custom metric function (or an instance of a custom tf.keras.metrics.Metric
subclass for stateful metrics) in the metrics
list.
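For metrics that must accumulate statistics across batches, the stateful route is to subclass tf.keras.metrics.Metric. The following is a hedged sketch under assumptions of my own: the metric, its name, and its logic are invented for illustration. It tracks the fraction of predictions that land within a fixed tolerance of the true value:

```python
import tensorflow as tf

# Illustrative stateful custom metric: fraction of predictions within
# a fixed tolerance of the true value (name and logic are hypothetical).
class WithinTolerance(tf.keras.metrics.Metric):
    def __init__(self, tolerance=0.5, name='within_tolerance', **kwargs):
        super().__init__(name=name, **kwargs)
        self.tolerance = tolerance
        # State variables persist across batches within an epoch
        self.hits = self.add_weight(name='hits', initializer='zeros')
        self.total = self.add_weight(name='total', initializer='zeros')

    def update_state(self, y_true, y_pred, sample_weight=None):
        y_true = tf.cast(y_true, tf.float32)
        y_pred = tf.cast(y_pred, tf.float32)
        close = tf.cast(tf.abs(y_true - y_pred) <= self.tolerance, tf.float32)
        self.hits.assign_add(tf.reduce_sum(close))
        self.total.assign_add(tf.cast(tf.size(close), tf.float32))

    def result(self):
        # Small epsilon guards against division by zero before any update
        return self.hits / (self.total + 1e-7)

    def reset_state(self):
        self.hits.assign(0.0)
        self.total.assign(0.0)
```

Keras calls update_state() on each batch, result() when a value is reported, and reset_state() at the start of each epoch, so the counters always reflect the current epoch only.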
Selecting and monitoring appropriate metrics is fundamental for understanding your model's behavior and performance. The metrics reported during model.fit()
and model.evaluate()
provide the quantitative feedback needed to iterate on your model architecture, hyperparameters, and data processing strategies.
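Those per-epoch metric values remain available after training through the History object returned by model.fit(); a quick sketch (tiny model and random data, purely illustrative):

```python
import numpy as np
import tensorflow as tf

# Sketch: each compiled metric yields one entry per epoch in history.history
model = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer='adam', loss='mse', metrics=['mae'])

x = np.random.rand(16, 3).astype('float32')
y = np.random.rand(16, 1).astype('float32')

history = model.fit(x, y, epochs=3, verbose=0)
print(sorted(history.history))      # ['loss', 'mae']
print(len(history.history['mae']))  # 3 (one value per epoch)
```

These lists are what you would typically plot to inspect learning curves or compare runs.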
© 2025 ApX Machine Learning