FizzBuzz is a simple programming exercise often used in software engineering interviews to gauge basic coding competency. Solving it with deep learning, though, offers an interesting perspective on neural networks, binary encoding, and classification tasks. Who knows, this solution might even become a staple of future machine learning interviews.
In this guide, we’ll build a PyTorch-based neural network to solve FizzBuzz using binary encoding and a straightforward architecture. We can achieve 98% to 100% accuracy on a test set of unseen numbers.
Initially, I experimented with feeding whole numbers directly to the model, but it didn’t perform well. The network struggled to grasp the FizzBuzz rules when each number was given as a single continuous input value. This led me to try binary encoding, which represents a number as a vector of its binary digits.
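Binary encoding worked noticeably better. It presents each number to the network as a fixed-length vector of 0/1 features instead of a single scalar. After testing different bit lengths, I found that encoding each number into a 12-bit vector struck the best balance between performance and efficiency.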
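To make the encoding concrete, here is a minimal sketch of turning an integer into a 12-bit vector, least significant bit first, which is the same bit order the `binary_encode` function in the full script uses. The `binary_decode` helper is not part of the original code; it is a hypothetical addition used only to show that the encoding is lossless for numbers below 2^12 = 4096.

```python
NUM_BITS = 12  # bit width chosen in this guide


def binary_encode(x, num_bits=NUM_BITS):
    """Return the bits of x as a list, least significant bit first."""
    return [(x >> i) & 1 for i in range(num_bits)]


def binary_decode(bits):
    """Hypothetical helper: rebuild the integer from its LSB-first bits."""
    return sum(bit << i for i, bit in enumerate(bits))


print(binary_encode(6))   # 6 = 0b110   -> [0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(binary_encode(15))  # 15 = 0b1111 -> [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# The encoding is lossless for every integer that fits in 12 bits.
assert all(binary_decode(binary_encode(n)) == n for n in range(2 ** NUM_BITS))
```

With 12 bits the largest representable number is 4095, which is why the training range in the script stops there.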
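The neural network architecture is simple but effective. As defined in the code below, it consists of an input of 12 values (one per bit), a first hidden layer of 256 units with ReLU activation and 20% dropout, a second hidden layer of 128 units with ReLU activation and 20% dropout, and an output layer of 4 units, one per class (Neither, Fizz, Buzz, FizzBuzz).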
This architecture achieved over 98% accuracy on unseen test numbers, which suggests it learns the pattern rather than memorizing the training data.
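For a sense of scale, the parameter count of a fully connected network with the layer sizes used in the script (12 inputs, hidden layers of 256 and 128 units, 4 outputs) can be worked out by hand. The sketch below is plain Python arithmetic rather than PyTorch code, and `count_parameters` is a hypothetical helper introduced only for this illustration. Dropout layers are omitted because they have no trainable parameters.

```python
def count_parameters(layer_sizes):
    """Count weights and biases of a fully connected network."""
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out  # weight matrix of this layer
        total += fan_out           # bias vector of this layer
    return total


# Layer sizes taken from the script: NUM_BITS, HIDDEN_SIZE1, HIDDEN_SIZE2, OUTPUT_SIZE.
layer_sizes = [12, 256, 128, 4]
print(count_parameters(layer_sizes))  # 36740
```

That is roughly 37 thousand trainable parameters, most of them in the 256-to-128 layer.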
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
def fizzbuzz_encode(x):
    if x % 15 == 0:
        return 3  # FizzBuzz
    elif x % 5 == 0:
        return 2  # Buzz
    elif x % 3 == 0:
        return 1  # Fizz
    else:
        return 0  # Neither

def binary_encode(x, num_bits):
    return [(x >> i) & 1 for i in range(num_bits)]
class FizzBuzzModel(nn.Module):
    def __init__(self, input_size, hidden_size1, hidden_size2, output_size):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size1)
        self.dropout1 = nn.Dropout(0.2)
        self.fc2 = nn.Linear(hidden_size1, hidden_size2)
        self.dropout2 = nn.Dropout(0.2)
        self.fc3 = nn.Linear(hidden_size2, output_size)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.dropout1(x)
        x = torch.relu(self.fc2(x))
        x = self.dropout2(x)
        x = self.fc3(x)
        return x
NUM_BITS = 12
HIDDEN_SIZE1 = 256
HIDDEN_SIZE2 = 128
OUTPUT_SIZE = 4
NUM_EPOCHS = 5000
LEARNING_RATE = 0.005
# Enable CUDA GPU if available. Change it to "mps" for Mac M-Chip GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
TRAIN_START = 101
TRAIN_END = 2 ** NUM_BITS - 1
TEST_START = 1
TEST_END = 100
train_inputs = torch.tensor([binary_encode(i, NUM_BITS) for i in range(TRAIN_START, TRAIN_END + 1)], dtype=torch.float32).to(device)
train_labels = torch.tensor([fizzbuzz_encode(i) for i in range(TRAIN_START, TRAIN_END + 1)], dtype=torch.long).to(device)
test_inputs = torch.tensor([binary_encode(i, NUM_BITS) for i in range(TEST_START, TEST_END + 1)], dtype=torch.float32).to(device)
test_labels = torch.tensor([fizzbuzz_encode(i) for i in range(TEST_START, TEST_END + 1)], dtype=torch.long).to(device)
model = FizzBuzzModel(NUM_BITS, HIDDEN_SIZE1, HIDDEN_SIZE2, OUTPUT_SIZE).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=LEARNING_RATE)
for epoch in range(NUM_EPOCHS):
    model.train()
    outputs = model(train_inputs)
    loss = criterion(outputs, train_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if (epoch + 1) % 500 == 0:
        print(f"Epoch [{epoch + 1}/{NUM_EPOCHS}], Loss: {loss.item():.4f}")
model.eval()
with torch.no_grad():
    predictions = model(test_inputs)
    predicted_labels = torch.argmax(predictions, dim=1).cpu().numpy()
correct = (predicted_labels == test_labels.cpu().numpy()).sum()
accuracy = correct / len(test_labels) * 100
print(f"Test Accuracy: {accuracy:.2f}%")
# Class 0 ("Neither") prints the number itself, as in classic FizzBuzz.
for number, label in enumerate(predicted_labels, start=TEST_START):
    fizzbuzz_output = [str(number), "Fizz", "Buzz", "FizzBuzz"][label]
    print(f"Number: {number}, Prediction: {fizzbuzz_output}")
The training set includes numbers from 101 to 4095, encoded into 12-bit vectors. The model is trained using cross-entropy loss and the Adam optimizer, running for 5000 epochs.
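The composition of that training range can be checked with a few lines of plain Python. This sketch re-declares the labeling function from the script so that it runs on its own; the `CLASS_NAMES` list is an assumption added only for display.

```python
from collections import Counter


def fizzbuzz_encode(x):
    """Same labeling rule as the script: 3=FizzBuzz, 2=Buzz, 1=Fizz, 0=Neither."""
    if x % 15 == 0:
        return 3
    elif x % 5 == 0:
        return 2
    elif x % 3 == 0:
        return 1
    return 0


TRAIN_START, TRAIN_END = 101, 2 ** 12 - 1
CLASS_NAMES = ["Neither", "Fizz", "Buzz", "FizzBuzz"]  # display names only

train_numbers = range(TRAIN_START, TRAIN_END + 1)
counts = Counter(fizzbuzz_encode(n) for n in train_numbers)

print(len(train_numbers))  # 3995
for label, name in enumerate(CLASS_NAMES):
    print(f"{name}: {counts[label]}")
# Neither: 2131, Fizz: 1065, Buzz: 532, FizzBuzz: 267
```

The classes are imbalanced: a little over half of the training numbers are "Neither", and FizzBuzz is the rarest class. The training loop in the script feeds all 3995 examples through the model at once, so each epoch is a single full-batch update.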
The test set comprises numbers 1 to 100, which are excluded from the training set. Because these numbers never appear in training, a high test score reflects learning the actual FizzBuzz pattern rather than memorizing specific numbers.
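A quick sanity check of the split, sketched in plain Python with the same ranges as the script: the two sets share no numbers, and every test number fits in the low 7 bits, so the upper 5 input bits are always zero at test time.

```python
TRAIN_START, TRAIN_END = 101, 2 ** 12 - 1
TEST_START, TEST_END = 1, 100

train_numbers = set(range(TRAIN_START, TRAIN_END + 1))
test_numbers = set(range(TEST_START, TEST_END + 1))

# No test number appears in the training set.
overlap = train_numbers & test_numbers
print(len(overlap))  # 0

# Number of bits needed by the largest test number.
max_test_bits = max(n.bit_length() for n in test_numbers)
print(max_test_bits)  # 7, since 100 < 2**7 = 128
```

Note that the split is by range rather than at random, so the test measures behavior on small numbers specifically.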
Here’s a sample of the training output:
Epoch [500/5000], Loss: 0.1221
Epoch [1000/5000], Loss: 0.0579
Epoch [1500/5000], Loss: 0.0324
Epoch [2000/5000], Loss: 0.0215
Epoch [2500/5000], Loss: 0.0208
Epoch [3000/5000], Loss: 0.0159
Epoch [3500/5000], Loss: 0.0108
Epoch [4000/5000], Loss: 0.0202
Epoch [4500/5000], Loss: 0.0109
Epoch [5000/5000], Loss: 0.0079
Test Accuracy: 100.00%
In this run, the model achieves a perfect 100% accuracy on the test set, showing that it can generalize the FizzBuzz pattern beyond the training data.
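For context on that number, a useful reference point is the accuracy of always predicting the most common class. The sketch below computes this baseline for the test range 1 to 100 in plain Python; `majority_baseline` is a hypothetical helper, not part of the original script.

```python
from collections import Counter


def fizzbuzz_encode(x):
    """Same labeling rule as the script: 3=FizzBuzz, 2=Buzz, 1=Fizz, 0=Neither."""
    if x % 15 == 0:
        return 3
    elif x % 5 == 0:
        return 2
    elif x % 3 == 0:
        return 1
    return 0


def majority_baseline(numbers):
    """Accuracy (in percent) of always predicting the most frequent label."""
    labels = [fizzbuzz_encode(n) for n in numbers]
    most_common_count = Counter(labels).most_common(1)[0][1]
    return 100 * most_common_count / len(labels)


print(majority_baseline(range(1, 101)))  # 53.0
```

A model that ignored its input and always answered "Neither" would score 53% on this test set, so the reported accuracy is well beyond what class frequencies alone would give.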
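Using binary encoding and a simple neural network architecture, we successfully solved FizzBuzz with deep learning. This experiment demonstrates how much the input representation matters, and that a small fully connected network can generalize the pattern to numbers it never saw during training.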
If you’re exploring neural networks for unconventional problems, this project is an excellent example of how even simple challenges can become insightful learning experiences.
© 2025 ApX Machine Learning. All rights reserved.