How to Solve FizzBuzz Using Deep Learning

By Wei Ming T. on Jan 22, 2025

FizzBuzz is a simple programming exercise often used in software engineering interviews to gauge basic coding competency. Solving it with deep learning, however, offers an interesting perspective on neural networks, binary encoding, and classification tasks. Who knows, this solution might even become a staple of future machine learning interviews.

In this guide, we’ll build a PyTorch-based neural network to solve FizzBuzz using binary encoding and a straightforward architecture. The resulting model reaches 98% to 100% accuracy on a held-out test set of numbers it never sees during training.

Input Encoding

Initially, I experimented with feeding whole numbers directly to the model, but it didn’t perform well. The network struggled to learn the FizzBuzz rules from a single continuous input. This led me to try binary encoding, which represents each number as a vector of its bits.

Here’s why binary encoding worked better:

  • Compact Representation: Each number is represented as a fixed-length vector of bits (0s and 1s). Every input feature is then either 0 or 1, which reduces the need for the model to reason about raw magnitude.
  • Pattern Recognition: Binary encoding exposes the structure of each number bit by bit. In my experiments, this made the patterns behind FizzBuzz (multiples of 3, 5, and 15) much easier for the network to learn.

After testing different bit lengths:

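  • 12 bits provided consistently high accuracy. Anything lower, such as 10 bits, degraded performance significantly. Because the training range in the code runs up to 2^NUM_BITS - 1, fewer bits also means fewer training examples, which may account for part of the drop.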
  • 16 bits or more slightly improved accuracy but also increased computational cost, with diminishing returns.

Ultimately, encoding each number into a 12-bit vector struck the best balance between performance and efficiency.
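
To make the encoding concrete, here is a small standalone sketch that reuses the same bit-shifting approach as the binary_encode function in the implementation and prints a few encoded numbers. The sample values are picked only for illustration. Note that the bits come out least significant first, and that 12 bits can only represent 0 through 4095.

```python
def binary_encode(x, num_bits):
    """Return the num_bits lowest bits of x, least significant bit first."""
    return [(x >> i) & 1 for i in range(num_bits)]

# A few illustrative inputs (chosen arbitrarily for this sketch).
for n in (3, 5, 15, 100):
    print(n, binary_encode(n, 12))
# 3 [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# 5 [1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# 15 [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# 100 [0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0]

# Values of 2 ** num_bits or more lose their high bits:
# 4096 encodes to all zeros, exactly like 0.
print(binary_encode(4096, 12) == binary_encode(0, 12))  # True
```

This is why the training range stops at 4095: anything larger would collide with a smaller number under a 12-bit encoding.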

Model Architecture

The neural network architecture is simple but effective, consisting of:

  1. Input Layer: Accepts a 12-bit binary vector.
  2. Two Hidden Layers: 256 and 128 neurons respectively, each followed by a ReLU activation and dropout (20%) for regularization.
  3. Output Layer: Predicts one of four classes: “Neither,” “Fizz,” “Buzz,” or “FizzBuzz.”

This architecture achieved over 98% accuracy on unseen test numbers, which indicates that it generalizes beyond the numbers it was trained on.
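
To get a feel for how small this network is, the sketch below counts its trainable parameters from the layer sizes alone. The helper count_dense_parameters is a hypothetical name introduced for this illustration and is not part of the implementation. It assumes plain fully connected layers with biases, and relies on the fact that ReLU and dropout add no parameters.

```python
def count_dense_parameters(layer_sizes):
    """Count weights and biases in a stack of fully connected layers."""
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out + fan_out  # weight matrix + bias vector
    return total

# 12 input bits -> 256 -> 128 -> 4 output classes
layer_sizes = [12, 256, 128, 4]
print(count_dense_parameters(layer_sizes))  # 36740
```

Most of those parameters (32,896 of them) sit in the connection between the two hidden layers.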

Code Implementation

import torch
import torch.nn as nn
import torch.optim as optim

def fizzbuzz_encode(x):
    if x % 15 == 0:
        return 3  # FizzBuzz
    elif x % 5 == 0:
        return 2  # Buzz
    elif x % 3 == 0:
        return 1  # Fizz
    else:
        return 0  # Neither

def binary_encode(x, num_bits):
    # Least significant bit first; x must be below 2 ** num_bits to fit.
    return [(x >> i) & 1 for i in range(num_bits)]

class FizzBuzzModel(nn.Module):
    def __init__(self, input_size, hidden_size1, hidden_size2, output_size):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size1)
        self.dropout1 = nn.Dropout(0.2)
        self.fc2 = nn.Linear(hidden_size1, hidden_size2)
        self.dropout2 = nn.Dropout(0.2)
        self.fc3 = nn.Linear(hidden_size2, output_size)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = self.dropout1(x)
        x = torch.relu(self.fc2(x))
        x = self.dropout2(x)
        x = self.fc3(x)
        return x

NUM_BITS = 12
HIDDEN_SIZE1 = 256
HIDDEN_SIZE2 = 128
OUTPUT_SIZE = 4
NUM_EPOCHS = 5000
LEARNING_RATE = 0.005

# Use a CUDA GPU if available. On an Apple Silicon Mac, change "cuda" to "mps".
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

TRAIN_START = 101
TRAIN_END = 2 ** NUM_BITS - 1
TEST_START = 1
TEST_END = 100

train_inputs = torch.tensor([binary_encode(i, NUM_BITS) for i in range(TRAIN_START, TRAIN_END + 1)], dtype=torch.float32).to(device)
train_labels = torch.tensor([fizzbuzz_encode(i) for i in range(TRAIN_START, TRAIN_END + 1)], dtype=torch.long).to(device)

test_inputs = torch.tensor([binary_encode(i, NUM_BITS) for i in range(TEST_START, TEST_END + 1)], dtype=torch.float32).to(device)
test_labels = torch.tensor([fizzbuzz_encode(i) for i in range(TEST_START, TEST_END + 1)], dtype=torch.long).to(device)

model = FizzBuzzModel(NUM_BITS, HIDDEN_SIZE1, HIDDEN_SIZE2, OUTPUT_SIZE).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=LEARNING_RATE)

for epoch in range(NUM_EPOCHS):
    model.train()
    outputs = model(train_inputs)
    loss = criterion(outputs, train_labels)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (epoch + 1) % 500 == 0:
        print(f"Epoch [{epoch + 1}/{NUM_EPOCHS}], Loss: {loss.item():.4f}")

model.eval()
with torch.no_grad():
    predictions = model(test_inputs)
    predicted_labels = torch.argmax(predictions, dim=1).cpu().numpy()

    correct = (predicted_labels == test_labels.cpu().numpy()).sum()
    accuracy = correct / len(test_labels) * 100
    print(f"Test Accuracy: {accuracy:.2f}%")

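    for number, label in enumerate(predicted_labels, start=TEST_START):
        # Class 0 ("Neither") prints the number itself, as in standard FizzBuzz.
        fizzbuzz_output = [str(number), "Fizz", "Buzz", "FizzBuzz"][label]
        print(f"Number: {number}, Prediction: {fizzbuzz_output}")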

Training and Testing Setup

Training Process

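The training set includes the numbers from 101 to 4095 (3,995 examples), each encoded into a 12-bit vector. The model is trained using cross-entropy loss and the Adam optimizer with a learning rate of 0.005. It runs for 5000 epochs, and each epoch is a single update on the full training set.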
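
For readers who want to see what the loss values mean, here is a pure-Python sketch of cross-entropy for a single example. It is an illustration of the formula, not PyTorch's actual implementation, and the logit values are made up for the example. nn.CrossEntropyLoss works on raw logits in the same way (which is why the model has no softmax layer) and averages the result over the batch by default.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one example: -log(softmax(logits)[target])."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(
        sum(math.exp(z - max_logit) for z in logits)
    )
    return log_sum_exp - logits[target]

# An untrained model that scores all four classes equally:
uniform = cross_entropy([0.0, 0.0, 0.0, 0.0], target=1)
print(f"{uniform:.4f}")  # 1.3863, which is ln(4)

# A model that is confident and correct (made-up logits):
confident = cross_entropy([0.0, 6.0, 0.0, 0.0], target=1)
print(confident < 0.01)  # True
```

A loss near ln(4) ≈ 1.39 therefore corresponds to guessing uniformly among the four classes, while losses below 0.01 mean the model assigns almost all probability to the correct class on average.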

Test Set

The test set comprises the numbers 1 to 100, which are excluded from the training set. Because the model never sees these numbers during training, test accuracy measures whether it has learned the FizzBuzz pattern rather than memorized individual numbers.
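
As a quick sanity check on this split, the sketch below confirms that the two ranges do not overlap and counts how often each class appears in the test set. The majority-class baseline is an addition of mine for context: it is the accuracy a model would get by always predicting "Neither".

```python
from collections import Counter

def fizzbuzz_encode(x):
    """Map a number to its class: 0 Neither, 1 Fizz, 2 Buzz, 3 FizzBuzz."""
    if x % 15 == 0:
        return 3
    elif x % 5 == 0:
        return 2
    elif x % 3 == 0:
        return 1
    return 0

train_numbers = range(101, 4096)  # 101 to 4095 inclusive
test_numbers = range(1, 101)      # 1 to 100 inclusive

overlap = set(train_numbers) & set(test_numbers)
print(len(train_numbers), len(test_numbers), len(overlap))  # 3995 100 0

counts = Counter(fizzbuzz_encode(n) for n in test_numbers)
print(dict(sorted(counts.items())))  # {0: 53, 1: 27, 2: 14, 3: 6}

baseline = max(counts.values()) / len(test_numbers)
print(f"{baseline:.0%}")  # 53%
```

So a test accuracy of 98% or more is well above the 53% that a constant prediction would achieve on the same 100 numbers.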

Results and Observations

Here’s a sample of the training output:

Epoch [500/5000], Loss: 0.1221
Epoch [1000/5000], Loss: 0.0579
Epoch [1500/5000], Loss: 0.0324
Epoch [2000/5000], Loss: 0.0215
Epoch [2500/5000], Loss: 0.0208
Epoch [3000/5000], Loss: 0.0159
Epoch [3500/5000], Loss: 0.0108
Epoch [4000/5000], Loss: 0.0202
Epoch [4500/5000], Loss: 0.0109
Epoch [5000/5000], Loss: 0.0079
Test Accuracy: 100.00%

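In this run, the model reaches 100% accuracy on the test set, which indicates that it generalizes the FizzBuzz pattern beyond the training data. Because weight initialization and dropout are random and no seed is fixed, other runs can land slightly lower, within the 98% to 100% range mentioned at the start.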

Conclusion

Using binary encoding and a simple neural network architecture, we successfully solved FizzBuzz with deep learning. This experiment demonstrates:

  1. The power of binary encoding for compact and meaningful data representation.
  2. The effectiveness of a straightforward neural network for classification tasks.
  3. The importance of a carefully chosen test set to verify generalization.

If you’re exploring neural networks for unconventional problems, this project is an excellent example of how even simple challenges can become insightful learning experiences.

© 2025 ApX Machine Learning. All rights reserved.
