Alright, let's put the theory of Bayesian Neural Networks into practice. In this section, we'll build, train, and evaluate a simple BNN using Variational Inference (VI). We'll focus on a regression task, which allows for intuitive visualization of the model's predictions and its associated uncertainty. We will use TensorFlow Probability (TFP), a library that integrates probabilistic reasoning and statistical analysis with TensorFlow.
You should have TensorFlow and TensorFlow Probability installed. If not, you can typically install them using pip:
pip install tensorflow tensorflow-probability numpy matplotlib plotly
First, let's import the necessary libraries and generate some synthetic data for our regression problem. We'll create data where the relationship between the input x and output y is non-linear, with some added noise. This noise represents the aleatoric uncertainty.
import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp
import matplotlib.pyplot as plt
import plotly.graph_objects as go
# For reproducibility
np.random.seed(42)
tf.random.set_seed(42)
tfd = tfp.distributions
tfk = tf.keras
tfkl = tf.keras.layers
tfpl = tfp.layers
# Generate synthetic data
def generate_data(n_samples=100, noise_std=0.1):
X = np.linspace(-3, 3, n_samples).astype(np.float32).reshape(-1, 1)
# Non-linear function with noise
y = X * np.sin(X * 2) + np.random.normal(0, noise_std, size=(n_samples, 1)).astype(np.float32)
return X, y
X_train, y_train = generate_data(n_samples=150, noise_std=0.2)
X_test = np.linspace(-4, 4, 200).astype(np.float32).reshape(-1, 1)
# Visualize the training data
fig = go.Figure()
fig.add_trace(go.Scatter(x=X_train.flatten(), y=y_train.flatten(), mode='markers', name='Training Data', marker=dict(color='#1f77b4', size=6)))
fig.update_layout(
title='Synthetic Regression Data',
xaxis_title='Input (x)',
yaxis_title='Output (y)',
template='plotly_white',
legend_title_text='Data'
)
# fig.show() # Use this in a Python environment to display
The training data follows the pattern $y \approx x \sin(2x)$ with added Gaussian noise.
Now, we'll define our BNN using the Keras functional API and TFP layers. Specifically, we use tfp.layers.DenseVariational. This layer represents a densely-connected neural network layer where the weights and biases are distributions (our approximate posterior q(w)) rather than point estimates.
During training, this layer adds a KL divergence term to the model's loss. This term measures the difference between the learned approximate posterior q(w) and the prior p(w). The layer automatically handles the sampling needed for the forward pass and the calculation of this KL term as part of the VI objective (ELBO maximization, or equivalently, negative ELBO minimization).
We need to specify two ingredients for each DenseVariational layer: a function that builds the prior distribution p(w) over that layer's weights and biases, and a function that builds the form of the approximate posterior q(w). Both are defined below.
# Define the prior distribution for weights and biases:
# a fixed (non-trainable) standard Normal, N(0, 1), over every parameter
def prior_fn(kernel_size, bias_size, dtype=None):
    n = kernel_size + bias_size
    prior_model = tfk.Sequential([
        # DistributionLambda ignores its input and always returns the same prior
        tfpl.DistributionLambda(lambda t: tfd.Independent(
            tfd.Normal(loc=tf.zeros(n, dtype=dtype), scale=1.0),
            reinterpreted_batch_ndims=1))
    ])
    return prior_model
# Define the posterior approximation q(w): a trainable mean-field Gaussian,
# i.e. an independent Normal (one mean and one scale parameter) per weight
def posterior_fn(kernel_size, bias_size, dtype=None):
    n = kernel_size + bias_size
    posterior_model = tfk.Sequential([
        # params_size(n) = 2n trainable values: n locations and n raw scales
        tfpl.VariableLayer(tfpl.IndependentNormal.params_size(n), dtype=dtype),
        # Sample a concrete weight vector on every forward pass
        tfpl.IndependentNormal(n, convert_to_tensor_fn=tfd.Distribution.sample)
    ])
    return posterior_model
# Build the BNN model
def create_bnn_model(train_size):
inputs = tfkl.Input(shape=(1,))
hidden = tfpl.DenseVariational(
units=32,
make_prior_fn=prior_fn,
make_posterior_fn=posterior_fn,
kl_weight=1/train_size, # Scale KL divergence by dataset size
activation='relu'
)(inputs)
hidden = tfpl.DenseVariational(
units=16,
make_prior_fn=prior_fn,
make_posterior_fn=posterior_fn,
kl_weight=1/train_size,
activation='relu'
)(hidden)
# Output layer: Predicting mean of a Normal distribution
# We model the output y as y ~ Normal(loc=f(x), scale=sigma)
# Here, f(x) is the output of the DenseVariational layer
# We'll use a fixed standard deviation (sigma) for simplicity,
# effectively using Mean Squared Error as the negative log-likelihood.
# Alternatively, another output head could predict sigma (aleatoric uncertainty).
output_mean = tfpl.DenseVariational(
units=1, # Predicting the mean parameter
make_prior_fn=prior_fn,
make_posterior_fn=posterior_fn,
kl_weight=1/train_size
# No activation for regression output mean
)(hidden)
# For simplicity, we use MSE loss, corresponding to a fixed Gaussian likelihood std dev.
# A more complete BNN might also predict the std dev (scale).
# Example: output_scale = tfpl.DenseVariational(...) -> tf.exp(output_scale_raw)
# Then use tfp.layers.IndependentNormal(1) as the final layer.
model = tfk.Model(inputs=inputs, outputs=output_mean)
return model
bnn_model = create_bnn_model(train_size=len(X_train))
bnn_model.summary()
We scale the KL divergence term by 1 / train_size. This is common practice in VI for BNNs: Keras averages the likelihood term over the examples in each batch, so dividing the total KL divergence by the dataset size keeps the data-fit term and the regularization term on the same per-example scale in the objective function.
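If you want to see how this term enters the objective, Keras exposes it: DenseVariational registers its (already kl_weight-scaled) KL penalty through the layer's add_loss mechanism, so after a forward pass it appears in bnn_model.losses. A quick check, assuming the model defined above:

# One forward pass so each DenseVariational layer records its KL penalty,
# then inspect the regularization terms Keras will add to the loss.
_ = bnn_model(X_train[:8])
for i, kl_term in enumerate(bnn_model.losses):
    # Each entry is one layer's KL[q(w) || p(w)], already scaled by kl_weight
    print(f"Layer {i}: scaled KL term = {float(kl_term):.4f}")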
For VI, the objective is to maximize the Evidence Lower Bound (ELBO), which is equivalent to minimizing the negative ELBO. The negative ELBO can be written as:
$$-\mathrm{ELBO} = -\mathbb{E}_{q(w)}\big[\log p(\mathcal{D} \mid w)\big] + \mathrm{KL}\big[q(w) \,\|\, p(w)\big]$$

The first term is the expected negative log-likelihood of the data given parameters sampled from the approximate posterior. The second term is the KL divergence between the approximate posterior and the prior.
When using Keras with DenseVariational, the KL divergence term is automatically added to the model's loss. We only need to specify the negative log-likelihood term as our main loss function. For regression with assumed Gaussian noise of constant variance, the negative log-likelihood is proportional to the Mean Squared Error (MSE).
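To see why MSE is a valid substitute, write out the Gaussian negative log-likelihood for a single observation, where $f_w(x)$ is the network's predicted mean and $\sigma$ is the fixed noise scale:

$$-\log p(y \mid x, w) = \frac{\big(y - f_w(x)\big)^2}{2\sigma^2} + \log \sigma + \tfrac{1}{2}\log 2\pi$$

With $\sigma$ held constant, the last two terms do not depend on the weights, so minimizing this expression over $w$ is equivalent to minimizing the squared error.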
# Negative log-likelihood loss. With a fixed Gaussian noise scale, the NLL is
# proportional to the MSE (see the derivation above), so we use MSE directly.
def nll_loss(y_true, y_pred_mean):
    # Here the model outputs only the predicted mean, so y_pred_mean is a plain tensor.
    # If the output layer were a distribution (e.g. tfp.layers.IndependentNormal),
    # we would instead return -y_pred_distribution.log_prob(y_true).
    return tf.reduce_mean(tf.square(y_true - y_pred_mean))
# Compile the model
optimizer = tfk.optimizers.Adam(learning_rate=0.01)
bnn_model.compile(optimizer=optimizer, loss=nll_loss) # Keras adds KL divergence automatically
# Train the model
print("Starting training...")
history = bnn_model.fit(X_train, y_train, epochs=500, batch_size=32, verbose=0)
print("Training finished.")
# You can plot the loss curve (total loss = NLL + KL divergence)
# plt.plot(history.history['loss'])
# plt.title('Model Loss During Training')
# plt.xlabel('Epoch')
# plt.ylabel('Total Loss (-ELBO)')
# plt.show()
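The comments in the loss function point to a richer alternative: let the final layer output a full Normal distribution, so that its scale, which captures aleatoric uncertainty, is learned from data rather than fixed. The sketch below shows one possible wiring, reusing prior_fn and posterior_fn; the create_bnn_with_aleatoric name and the reduced architecture are illustrative choices, not part of the original example.

# Sketch: the final layer outputs a Normal distribution whose scale (aleatoric
# uncertainty) is learned instead of being treated as a fixed constant.
def create_bnn_with_aleatoric(train_size):
    inputs = tfkl.Input(shape=(1,))
    hidden = tfpl.DenseVariational(
        units=32,
        make_prior_fn=prior_fn,
        make_posterior_fn=posterior_fn,
        kl_weight=1/train_size,
        activation='relu'
    )(inputs)
    # Produce the parameters (loc and raw scale) of a 1D Normal distribution
    params = tfpl.DenseVariational(
        units=tfpl.IndependentNormal.params_size(1),
        make_prior_fn=prior_fn,
        make_posterior_fn=posterior_fn,
        kl_weight=1/train_size
    )(hidden)
    # Wrap the parameters into a distribution object as the model output
    outputs = tfpl.IndependentNormal(1)(params)
    return tfk.Model(inputs=inputs, outputs=outputs)

# With a distribution output, the loss is the exact negative log-likelihood
negloglik = lambda y_true, y_pred_dist: -y_pred_dist.log_prob(y_true)
# bnn_aleatoric = create_bnn_with_aleatoric(len(X_train))
# bnn_aleatoric.compile(optimizer=tfk.optimizers.Adam(learning_rate=0.01), loss=negloglik)
# bnn_aleatoric.fit(X_train, y_train, epochs=500, batch_size=32, verbose=0)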
A key advantage of BNNs is their ability to quantify uncertainty. With VI, we approximate the posterior p(w∣D) with q(w). To get predictive uncertainty, we perform multiple forward passes through the network, each time sampling a different set of weights wi∼q(w). The variation in the outputs reflects the model's epistemic uncertainty (uncertainty about the model parameters).
# Make predictions by sampling multiple times
n_samples = 100
predictions_mc = np.stack([bnn_model(X_test).numpy() for _ in range(n_samples)], axis=0)
# Squeeze unnecessary dimensions
predictions_mc = np.squeeze(predictions_mc) # Shape: (n_samples, n_test_points)
# Calculate predictive mean and standard deviation
pred_mean = np.mean(predictions_mc, axis=0)
pred_std = np.std(predictions_mc, axis=0)
# Visualize the results: mean prediction and uncertainty bounds
fig = go.Figure()
# Uncertainty bounds (e.g., +/- 2 standard deviations)
fig.add_trace(go.Scatter(
x=np.concatenate([X_test.flatten(), X_test.flatten()[::-1]]),
y=np.concatenate([pred_mean - 2 * pred_std, (pred_mean + 2 * pred_std)[::-1]]),
fill='toself',
fillcolor='rgba(250, 82, 82, 0.2)', # Faint red color #fa5252
line=dict(color='rgba(255,255,255,0)'),
hoverinfo="skip",
showlegend=False,
name='Epistemic Uncertainty (±2 std)'
))
# Mean prediction
fig.add_trace(go.Scatter(
x=X_test.flatten(), y=pred_mean,
mode='lines', name='Predictive Mean',
line=dict(color='#f03e3e') # Red color #f03e3e
))
# Original training data
fig.add_trace(go.Scatter(
x=X_train.flatten(), y=y_train.flatten(),
mode='markers', name='Training Data',
marker=dict(color='#1c7ed6', size=6) # Blue color #1c7ed6
))
fig.update_layout(
title='BNN Regression with Uncertainty',
xaxis_title='Input (x)',
yaxis_title='Output (y)',
template='plotly_white',
legend_title_text='Components'
)
# fig.show() # Use this in a Python environment to display
The BNN's predictive mean (red line) captures the underlying trend, while the shaded area (±2 standard deviations from the mean) represents epistemic uncertainty. Notice that the uncertainty increases in regions with no training data (e.g., $x < -3$ or $x > 3$) and also where the function changes rapidly.
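The shaded band reflects epistemic uncertainty only. Since we know the noise level used to generate the data (noise_std=0.2), we can sketch a total predictive band by adding the aleatoric variance to the epistemic variance; with a learned noise model (as sketched earlier) you would use the predicted scale instead.

# Combine epistemic (weight) uncertainty with the known aleatoric noise level.
# Variances of independent uncertainty sources add.
aleatoric_std = 0.2  # the noise_std used when generating the training data
total_std = np.sqrt(pred_std**2 + aleatoric_std**2)
lower, upper = pred_mean - 2 * total_std, pred_mean + 2 * total_std
# These bounds could be plotted in place of the epistemic-only band above.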
As discussed previously, Monte Carlo (MC) Dropout offers a simpler way to approximate Bayesian inference in existing standard NNs. It involves:
- Training a standard network that contains dropout layers, as usual.
- Keeping dropout active at prediction time instead of disabling it.
- Running multiple stochastic forward passes and treating the spread of the outputs as an uncertainty estimate.
While computationally cheaper and easier to implement in standard frameworks, MC Dropout is an approximation to a specific type of BNN (related to Gaussian Processes). The VI approach we implemented is often considered a more principled way to construct BNNs with explicit priors and posteriors.
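For comparison, here is a minimal sketch of MC Dropout in the same Keras setup; the architecture, dropout rate, and the create_mc_dropout_model name are illustrative. The key detail is passing training=True to the dropout layers so they remain stochastic at prediction time.

# MC Dropout sketch: a standard network with dropout kept active at test time.
def create_mc_dropout_model(dropout_rate=0.1):
    inputs = tfkl.Input(shape=(1,))
    x = tfkl.Dense(32, activation='relu')(inputs)
    # training=True keeps dropout stochastic even during inference
    x = tfkl.Dropout(dropout_rate)(x, training=True)
    x = tfkl.Dense(16, activation='relu')(x)
    x = tfkl.Dropout(dropout_rate)(x, training=True)
    outputs = tfkl.Dense(1)(x)
    return tfk.Model(inputs=inputs, outputs=outputs)

mc_model = create_mc_dropout_model()
mc_model.compile(optimizer=tfk.optimizers.Adam(learning_rate=0.01), loss='mse')
# mc_model.fit(X_train, y_train, epochs=500, batch_size=32, verbose=0)
# Multiple stochastic passes then give an approximate predictive distribution:
# mc_preds = np.stack([mc_model(X_test).numpy() for _ in range(100)], axis=0)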
In this practical section, we constructed a Bayesian Neural Network using TensorFlow Probability's DenseVariational layers. We trained it using Variational Inference, where the objective balanced fitting the data (via the negative log-likelihood/MSE) and staying close to prior beliefs (via the KL divergence). By sampling from the learned approximate posterior distribution over weights, we generated predictions along with quantifiable epistemic uncertainty estimates.
This example provides a foundation for applying BNNs. You could extend this by:
- Predicting the scale of the output distribution in addition to its mean, so the model also captures aleatoric uncertainty (as sketched earlier).
- Experimenting with different priors, layer widths, or KL weighting.
- Comparing the results and computational cost against the MC Dropout approximation.
Building BNNs provides a powerful framework for creating deep learning models that not only predict but also understand their own confidence.