Now that you're familiar with the building blocks of neural networks in Flux.jl, including layers, activation functions, loss functions, and optimizers, it's time to put this knowledge into practice. This section guides you through building and training your first simple neural network for a binary classification task. We'll generate a synthetic dataset, define a model, train it, and evaluate its performance.
Before we begin, ensure you have Flux.jl installed. If you plan to visualize the data or results, Plots.jl is a useful addition. You can add them using Julia's package manager:
using Pkg
Pkg.add("Flux")
Pkg.add("Plots") # Optional, for visualization
# Random is part of Julia's standard library (used here for data generation), so it does not need to be installed.
For this exercise, we will use these packages:
using Flux
using Random
using Plots # Optional, if you want to run plotting code locally
println("Flux and supporting packages loaded.")
We'll tackle a binary classification problem: distinguishing between two classes of data points in a 2D space that are not linearly separable, meaning no single straight line can cleanly divide them. This kind of problem is a good showcase for neural networks. We'll generate a "moons"-style dataset, which consists of two interleaving crescent shapes.
Let's create a function to generate this data.
Random.seed!(42) # for reproducibility
function generate_moons(n_samples::Int=200, noise::Float64=0.15)
n_samples_per_moon = n_samples ÷ 2
# Outer moon
t_outer = range(0, stop=pi, length=n_samples_per_moon)
x_outer = 1.0 .* cos.(t_outer) .+ randn(n_samples_per_moon) .* noise
y_outer = 1.0 .* sin.(t_outer) .+ randn(n_samples_per_moon) .* noise
points_outer = [x_outer'; y_outer']
# Inner moon (shifted)
t_inner = range(0, stop=pi, length=n_samples_per_moon)
x_inner = 1.0 .* cos.(t_inner) .- 0.5 .+ randn(n_samples_per_moon) .* noise
y_inner = -1.0 .* sin.(t_inner) .+ 0.3 .+ randn(n_samples_per_moon) .* noise # Adjusted shift
points_inner = [x_inner'; y_inner']
# Combine features and labels
X = hcat(points_outer, points_inner) # Features: 2xN matrix
Y_labels = vcat(zeros(Int, n_samples_per_moon), ones(Int, n_samples_per_moon)) # Labels: N vector
# Shuffle the data
perm = randperm(n_samples)
X = X[:, perm]
Y_labels = Y_labels[perm]
# Reshape Y to be 1xN for Flux compatibility with loss functions
Y = reshape(Y_labels, 1, n_samples)
return Float32.(X), Float32.(Y)
end
X_data, Y_data = generate_moons(300) # Generate 300 data points
println("Generated data X: $(size(X_data)), Y: $(size(Y_data))")
# X_data will be a 2x300 matrix, Y_data will be a 1x300 matrix
# Each column in X_data is a data point [feature1; feature2]
# Each corresponding column in Y_data is its label (0.0 or 1.0)
Neural networks in Flux typically expect input data to be Float32, and our generate_moons function handles this conversion. The features X_data form a matrix in which each column is a sample and each row is a feature; the labels Y_data form a matching 1xN row of 0s and 1s.
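As a quick sanity check on this layout, you can inspect a single column and its label (the index 1 below is just an example):
first_point = X_data[:, 1]   # 2-element Float32 vector: [feature1, feature2]
first_label = Y_data[1, 1]   # 0.0f0 or 1.0f0
println("First sample: ", first_point, " label: ", first_label)
println("Element types: ", eltype(X_data), ", ", eltype(Y_data))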
Here's a visual representation of a similar synthetic dataset:
A scatter plot showing two classes of data points forming crescent shapes. Class 0 points are red circles, and Class 1 points are blue crosses, illustrating a non-linearly separable pattern.
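If you have Plots.jl loaded, a sketch along these lines reproduces a similar scatter plot from the generated data (the colors and marker shapes are arbitrary choices):
# Assumes `using Plots` has been run
class0 = Y_data[1, :] .== 0
class1 = Y_data[1, :] .== 1
scatter(X_data[1, class0], X_data[2, class0], label="Class 0", color=:red, markershape=:circle)
scatter!(X_data[1, class1], X_data[2, class1], label="Class 1", color=:blue, markershape=:xcross)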
We'll construct a feedforward neural network using Chain, which stacks layers sequentially. Our network will have two hidden layers with relu activation functions and an output layer with a sigmoid activation function, suitable for binary classification.
# Define the model
model = Chain(
Dense(2 => 16, relu), # Input layer: 2 features, Output: 16 features, ReLU activation
Dense(16 => 8, relu), # Hidden layer: 16 features, Output: 8 features, ReLU activation
Dense(8 => 1, sigmoid) # Output layer: 8 features, Output: 1 feature (probability), Sigmoid activation
)
# You can inspect the model's parameters
# params(model)
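Even before training, a quick forward pass confirms the shapes line up; the weights are still random, so the probabilities themselves are meaningless at this point (the slice of 5 columns is just an example):
sample_batch = X_data[:, 1:5]        # 2x5 matrix: five points, one per column
initial_output = model(sample_batch) # 1x5 matrix of probabilities from the sigmoid output
println(size(initial_output))        # (1, 5)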
The first Dense layer takes the 2 input features (our x and y coordinates) and maps them to 16 features. The second Dense layer takes these 16 features and maps them to 8 features. The final Dense layer takes those 8 features and outputs a single value, which the sigmoid function squashes to a probability between 0 and 1, representing the likelihood that the input belongs to Class 1.
For binary classification problems where the output layer uses a sigmoid activation, Flux.binarycrossentropy is an appropriate loss function. It measures the difference between the predicted probabilities and the actual binary labels (0 or 1). For a single prediction the loss is $L(y, \hat{y}) = -\bigl(y \log(\hat{y}) + (1 - y)\log(1 - \hat{y})\bigr)$, where $y$ is the true label and $\hat{y}$ is the predicted probability.
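To connect this formula to the code, here is a small hand check (the probability 0.9 is an arbitrary example): for a true label of 1 and a predicted probability of 0.9, the loss reduces to -log(0.9) ≈ 0.105, which matches Flux.binarycrossentropy up to the small epsilon Flux adds for numerical stability.
ŷ_example = [0.9f0]   # predicted probability
y_example = [1.0f0]   # true label
manual_bce = -(y_example[1] * log(ŷ_example[1]) + (1 - y_example[1]) * log(1 - ŷ_example[1]))
println(manual_bce)                                     # ≈ 0.10536
println(Flux.binarycrossentropy(ŷ_example, y_example))  # essentially the same value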
We will use the ADAM optimizer, a popular and effective adaptive learning-rate optimization algorithm.
# Define the loss function
loss(x, y) = Flux.binarycrossentropy(model(x), y)
# Define the optimizer
optimizer = ADAM(0.01) # ADAM optimizer with a learning rate of 0.01
# Get the parameters of the model for training
ps = Flux.params(model)
Training the network involves iterating over the dataset multiple times (epochs). In each epoch, we compute the gradients of the loss with respect to the model parameters using Flux.gradient, apply the optimizer's update to those parameters with Flux.update!, and record the current loss so we can monitor progress.
# Training parameters
epochs = 200
losses = [] # To store loss values for plotting
println("Starting training...")
for epoch in 1:epochs
# Calculate gradient
grads = Flux.gradient(() -> loss(X_data, Y_data), ps)
# Update model parameters
Flux.update!(optimizer, ps, grads)
# Calculate and store current loss
current_loss = loss(X_data, Y_data)
push!(losses, current_loss)
if epoch % 20 == 0 || epoch == 1
println("Epoch: $epoch, Loss: $current_loss")
end
end
println("Training finished.")
The training loop prints the loss every 20 epochs. You should observe the loss generally decreasing over time, indicating that the model is learning.
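As an aside, Flux versions that support the implicit Flux.params style used here also provide Flux.train!, which wraps this gradient-and-update pattern; a rough equivalent of a single epoch of our loop (the whole dataset passed as one batch) would be:
# Flux.train!(loss, ps, [(X_data, Y_data)], optimizer)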
After training, we should evaluate our model's performance.
A common way to monitor training is to plot the loss function over epochs.
The training loss decreasing over epochs, indicating the model is learning to fit the data. The y-axis shows Binary Cross-Entropy Loss, and the x-axis shows Epochs.
The actual values in your loss curve will depend on the random initialization and the exact data. You can generate a similar plot using Plots.jl with the losses array we collected:
# using Plots
# plot(1:epochs, losses, xlabel="Epoch", ylabel="Loss", label="Training Loss", legend=:topright, title="Training Loss Curve")
Accuracy is another important metric. It tells us the proportion of data points that are correctly classified. Since our model outputs probabilities, we'll consider a prediction as Class 1 if the probability is > 0.5, and Class 0 otherwise.
# Make predictions on the training data
predictions_prob = model(X_data)
predictions_class = ifelse.(predictions_prob .> 0.5, 1.0, 0.0) # Convert probabilities to class labels (0 or 1)
# Calculate accuracy
# Y_data is 1xN, predictions_class is also 1xN
accuracy = sum(predictions_class .== Y_data) / size(Y_data, 2)
println("Training Accuracy: $(round(accuracy * 100, digits=2))%")
# Expected accuracy should be high, e.g., > 90% for this synthetic problem
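The trained model can be called on new points in exactly the same way; the coordinates below are arbitrary examples chosen only to show the expected input shape (a 2xN Float32 matrix, one column per point):
new_points = Float32[0.0 -1.0;
                     0.8  0.2]     # two hypothetical points, one per column
new_probs = model(new_points)      # 1x2 matrix of Class 1 probabilities
new_classes = new_probs .> 0.5     # true => Class 1, false => Class 0
println("Probabilities: ", new_probs)
println("Predicted classes: ", Int.(new_classes))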
For 2D classification problems, visualizing the decision boundary learned by the model can be very insightful. This involves creating a grid of points spanning the data space, predicting the class for each grid point, and then plotting these predictions as a contour map.
# # Optional: Code to plot decision boundary using Plots.jl
# # This can be computationally intensive for large grids or many points
#
# # Create a grid of points (each column is one point [x; y])
# x_range = range(minimum(X_data[1,:]) - 0.2, maximum(X_data[1,:]) + 0.2, length=100)
# y_range = range(minimum(X_data[2,:]) - 0.2, maximum(X_data[2,:]) + 0.2, length=100)
# grid = Float32.(reduce(hcat, [[x, y] for x in x_range for y in y_range]))
#
# # Get model predictions on the grid and reshape so rows index y and columns index x,
# # which is the orientation contourf expects
# Z = model(grid)
# Z_reshaped = reshape(vec(Z), length(y_range), length(x_range))
#
# # Plot decision boundary
# contourf(x_range, y_range, Z_reshaped, levels=1, color=cgrad([:salmon, :lightblue]), aspect_ratio=:equal, title="Decision Boundary and Data")
#
# # Overlay original data points
# class0_indices = Y_data[1,:] .== 0
# class1_indices = Y_data[1,:] .== 1
# scatter!(X_data[1, class0_indices], X_data[2, class0_indices], label="Class 0", color=:red, markershape=:circle, markerstrokewidth=0, alpha=0.7)
# scatter!(X_data[1, class1_indices], X_data[2, class1_indices], label="Class 1", color=:blue, markershape=:xcross, markerstrokewidth=1, alpha=0.7)
# xlims!(minimum(x_range), maximum(x_range))
# ylims!(minimum(y_range), maximum(y_range))
#
# # savefig("decision_boundary.png") # To save the plot
Running the code above (uncommented) in a Julia environment with Plots.jl would produce an image showing the two classes of data points and the regions the model assigns to each class. This helps to visually confirm that the model has learned a reasonable separation.
Here is the complete script combining all the steps:
using Flux
using Random
using Plots # For plotting, optional for core logic
# 1. Data Generation and Preparation
Random.seed!(42)
function generate_moons(n_samples::Int=300, noise::Float64=0.15)
n_samples_per_moon = n_samples ÷ 2
t_outer = range(0, stop=pi, length=n_samples_per_moon)
x_outer = 1.0 .* cos.(t_outer) .+ randn(n_samples_per_moon) .* noise
y_outer = 1.0 .* sin.(t_outer) .+ randn(n_samples_per_moon) .* noise
points_outer = [x_outer'; y_outer']
t_inner = range(0, stop=pi, length=n_samples_per_moon)
x_inner = 1.0 .* cos.(t_inner) .- 0.5 .+ randn(n_samples_per_moon) .* noise
y_inner = -1.0 .* sin.(t_inner) .+ 0.3 .+ randn(n_samples_per_moon) .* noise
points_inner = [x_inner'; y_inner']
X = hcat(points_outer, points_inner)
Y_labels = vcat(zeros(Int, n_samples_per_moon), ones(Int, n_samples_per_moon))
perm = randperm(n_samples)
X = X[:, perm]
Y_labels = Y_labels[perm]
Y = reshape(Y_labels, 1, n_samples)
return Float32.(X), Float32.(Y)
end
X_data, Y_data = generate_moons()
# 2. Model Definition
model = Chain(
Dense(2 => 16, relu),
Dense(16 => 8, relu),
Dense(8 => 1, sigmoid)
)
# 3. Loss Function and Optimizer
loss(x, y) = Flux.binarycrossentropy(model(x), y)
optimizer = ADAM(0.01)
ps = Flux.params(model)
# 4. Training Loop
epochs = 200
losses = []
println("Starting training...")
for epoch in 1:epochs
grads = Flux.gradient(() -> loss(X_data, Y_data), ps)
Flux.update!(optimizer, ps, grads)
current_loss = loss(X_data, Y_data)
push!(losses, current_loss)
if epoch % 20 == 0 || epoch == 1
println("Epoch: $epoch, Loss: $current_loss")
end
end
println("Training finished.")
# 5. Evaluation
# Plot loss curve
# plot(1:epochs, losses, xlabel="Epoch", ylabel="Loss", label="Training Loss", legend=:topright, title="Training Loss Curve")
# savefig("loss_curve.png") # Example of saving the plot
# Calculate accuracy
predictions_prob = model(X_data)
predictions_class = ifelse.(predictions_prob .> 0.5, 1.0, 0.0)
accuracy = sum(predictions_class .== Y_data) / size(Y_data, 2)
println("Training Accuracy: $(round(accuracy * 100, digits=2))%")
# Optional: Plot decision boundary (code provided earlier)
# Ensure Plots.jl is used for this visualization.
In this hands-on practical, you have successfully built, trained, and evaluated a simple neural network using Flux.jl for a binary classification task. You saw how to define the model architecture, choose appropriate loss functions and optimizers, and implement the training loop. The decreasing loss and high training accuracy on the synthetic "moons" dataset demonstrate that even a small neural network can learn non-linear decision boundaries.
From here, you can explore more complex architectures, different datasets, experiment with hyperparameters like learning rates and number of neurons, or apply these techniques to regression problems. This example serves as a solid foundation for your further explorations into deep learning with Julia.
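As one concrete starting point for such experiments, here is a sketch of a wider network with a smaller learning rate; the layer sizes and rate below are arbitrary choices to try, not tuned recommendations:
alt_model = Chain(
    Dense(2 => 32, relu),
    Dense(32 => 32, relu),
    Dense(32 => 1, sigmoid)
)
alt_optimizer = ADAM(0.005)  # smaller learning rate than before
# Re-run the same training loop with alt_model and alt_optimizer to compare results.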