Now that we've covered the theoretical foundations of Recurrent Neural Networks (RNNs), LSTMs, and GRUs, it's time to put this knowledge into practice. In this hands-on section, we will build a simple sequence model using a common deep learning library, TensorFlow with its Keras API. This exercise will solidify your understanding of how to prepare sequential text data and construct a basic recurrent model for a representative task.
We'll tackle a simplified sentiment analysis problem: classifying short text snippets as either positive or negative. While real-world sentiment analysis often involves more complex datasets and models, this example focuses purely on the mechanics of setting up and training a sequence model.
First, ensure you have TensorFlow installed. If not, you can typically install it using pip:
pip install tensorflow
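We'll use Keras, which is bundled with TensorFlow, to build our model. The text preprocessing utilities used below (Tokenizer and pad_sequences) come from the older tf.keras.preprocessing module, and TensorFlow's documentation marks Tokenizer as deprecated in favor of the TextVectorization layer, so depending on your TensorFlow version you may see warnings or need to adapt the imports. Let's define a small, synthetic dataset for demonstration purposes.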
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense, LSTM, GRU
from tensorflow.keras.optimizers import Adam
# Sample data: (text, label) -> 0 for negative, 1 for positive
texts = [
"this is a great movie",
"i really enjoyed the experience",
"what a fantastic performance",
"loved the acting",
"truly amazing",
"this is terrible",
"i did not like it at all",
"what a boring show",
"hated the plot",
"really awful film"
]
labels = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0]) # 5 positive, 5 negative
print(f"Number of samples: {len(texts)}")
print(f"Sample text: '{texts[0]}', Label: {labels[0]}")
print(f"Sample text: '{texts[5]}', Label: {labels[5]}")
Sequence models don't work directly with raw text. We need to convert our sentences into numerical representations that the model can process. This involves two main steps: tokenization and padding.
# --- Tokenization ---
vocab_size = 100 # Word indices >= vocab_size are mapped to <OOV>; index 0 is reserved for padding
tokenizer = Tokenizer(num_words=vocab_size, oov_token="<OOV>") # <OOV> for out-of-vocabulary words
tokenizer.fit_on_texts(texts)
word_index = tokenizer.word_index
sequences = tokenizer.texts_to_sequences(texts)
print("\nWord Index Sample:", list(word_index.items())[:10])
print("Original Text:", texts[0])
print("Sequence Representation:", sequences[0])
# --- Padding ---
max_length = 10 # Define a maximum sequence length (can be inferred or set)
padded_sequences = pad_sequences(sequences, maxlen=max_length, padding='post', truncating='post')
print("\nPadded Sequence Example (Post-padding):")
print(padded_sequences[0])
print("Shape of padded sequences:", padded_sequences.shape)
Notice how pad_sequences adds zeros at the end (padding='post') so that every sequence has length 10. If a sequence were longer than max_length, tokens would be dropped from its end (truncating='post').
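To make the two steps concrete, here is a minimal pure-Python sketch of what the tokenizer and pad_sequences calls above do. It is an approximation for illustration, not Keras's implementation: the helper names (build_word_index, texts_to_sequences, pad_post) are hypothetical, it assumes text with no punctuation to filter, and it mimics frequency ranking with <OOV> at index 1 and index 0 left free for padding. Note that Keras's pad_sequences defaults to padding='pre' and truncating='pre'; the code above explicitly opts into 'post' for both.

```python
from collections import Counter

def build_word_index(texts, oov_token="<OOV>"):
    """Rank words by frequency (ties keep first-appearance order); OOV gets index 1."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())  # assumes no punctuation to strip
    ranked = sorted(counts, key=counts.get, reverse=True)  # stable sort keeps tie order
    word_index = {oov_token: 1}
    for index, word in enumerate(ranked, start=2):
        word_index[word] = index
    return word_index

def texts_to_sequences(texts, word_index, num_words, oov_index=1):
    """Map words to indices; unknown words and indices >= num_words become oov_index."""
    sequences = []
    for text in texts:
        seq = []
        for word in text.lower().split():
            index = word_index.get(word, oov_index)
            seq.append(index if index < num_words else oov_index)
        sequences.append(seq)
    return sequences

def pad_post(sequences, maxlen, value=0):
    """Like padding='post', truncating='post': cut from the end, then pad at the end."""
    padded = []
    for seq in sequences:
        seq = seq[:maxlen]
        padded.append(seq + [value] * (maxlen - len(seq)))
    return padded

texts = ["this is a great movie", "what a boring show", "this is terrible"]
word_index = build_word_index(texts)
sequences = texts_to_sequences(texts + ["a truly boring movie"], word_index, num_words=100)
print(sequences)                      # [[2, 3, 4, 5, 6], [7, 4, 8, 9], [2, 3, 10], [4, 1, 8, 6]]
print(pad_post(sequences, maxlen=4))  # [[2, 3, 4, 5], [7, 4, 8, 9], [2, 3, 10, 0], [4, 1, 8, 6]]
```

The unseen word "truly" becomes index 1, and the first sequence loses its last token when truncated to length 4.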
Now, let's construct our model. We'll use Keras's Sequential API, which allows us to stack layers linearly. The model consists of three layers:
1. Embedding layer: maps each integer word index to a dense vector. It requires input_dim (the size of the vocabulary) and output_dim (the dimensionality of the embedding vectors). We also specify input_length, which corresponds to the max_length used for padding.
2. Recurrent layer: we start with SimpleRNN. Its primary argument is units, which defines the dimensionality of the hidden state (and of the output space). Other recurrent layers, such as LSTM or GRU, can be swapped in here.
3. Output layer: a Dense layer with one unit and a sigmoid activation function. The sigmoid outputs a value between 0 and 1, representing the probability of the positive class.
embedding_dim = 16 # Dimensionality of the word embeddings
rnn_units = 32 # Number of units in the RNN layer
model = Sequential([
# 1. Embedding Layer
Embedding(input_dim=vocab_size,
output_dim=embedding_dim,
input_length=max_length),
# 2. Recurrent Layer (SimpleRNN)
# Try replacing SimpleRNN with LSTM or GRU later!
SimpleRNN(units=rnn_units),
# If stacking RNN layers, use return_sequences=True on intermediate layers:
# SimpleRNN(units=rnn_units, return_sequences=True),
# SimpleRNN(units=rnn_units), # Last RNN layer doesn't need return_sequences=True
# 3. Output Layer
Dense(units=1, activation='sigmoid')
])
# Display the model's architecture
model.summary()
The summary shows the layers, their output shapes, and the number of trainable parameters. Notice how the SimpleRNN layer outputs a single vector per sample, with shape (None, 32): None is the batch dimension and 32 is rnn_units. If return_sequences=True were set, the layer would instead output its hidden state at every timestep, with shape (None, max_length, rnn_units).
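The numpy sketch below shows what the SimpleRNN layer computes and why return_sequences changes the output shape. It is not Keras code: the weights are random stand-ins with the shapes Keras allocates for this layer (a kernel of shape (embedding_dim, rnn_units), a recurrent kernel of shape (rnn_units, rnn_units), and a bias), and the helper names are hypothetical. The same shapes account for the parameter counts, which should match what model.summary() reports for the model above; the GRU count assumes Keras's default reset_after=True.

```python
import numpy as np

def simple_rnn_forward(x, kernel, recurrent_kernel, bias, return_sequences=False):
    """x has shape (batch, timesteps, input_dim); returns the last or all hidden states."""
    batch, timesteps, _ = x.shape
    h = np.zeros((batch, kernel.shape[1]))  # initial hidden state h_0 = 0
    states = []
    for t in range(timesteps):
        # h_t = tanh(x_t W + h_{t-1} U + b), tanh being SimpleRNN's default activation
        h = np.tanh(x[:, t, :] @ kernel + h @ recurrent_kernel + bias)
        states.append(h)
    return np.stack(states, axis=1) if return_sequences else h

def recurrent_param_count(kind, input_dim, units):
    """Trainable parameters of one recurrent layer (GRU assumes reset_after=True)."""
    per_gate = units * (input_dim + units) + units  # kernel + recurrent kernel + bias
    if kind == "simple_rnn":
        return per_gate
    if kind == "lstm":
        return 4 * per_gate  # input, forget, cell, and output gates
    if kind == "gru":
        return 3 * per_gate + 3 * units  # reset_after=True adds a second bias vector
    raise ValueError(f"unknown layer kind: {kind}")

vocab_size, embedding_dim, rnn_units, max_length = 100, 16, 32, 10
rng = np.random.default_rng(0)
kernel = rng.normal(scale=0.1, size=(embedding_dim, rnn_units))        # W
recurrent_kernel = rng.normal(scale=0.1, size=(rnn_units, rnn_units))  # U
bias = np.zeros(rnn_units)                                             # b
x = rng.normal(size=(2, max_length, embedding_dim))  # stand-in for 2 embedded sequences

last = simple_rnn_forward(x, kernel, recurrent_kernel, bias)
all_states = simple_rnn_forward(x, kernel, recurrent_kernel, bias, return_sequences=True)
print(last.shape, all_states.shape)  # (2, 32) (2, 10, 32)

rnn_params = kernel.size + recurrent_kernel.size + bias.size  # 512 + 1024 + 32
total = vocab_size * embedding_dim + rnn_params + (rnn_units * 1 + 1)  # Embedding + RNN + Dense
print(rnn_params, total)  # 1568 3201
print(recurrent_param_count("lstm", embedding_dim, rnn_units),
      recurrent_param_count("gru", embedding_dim, rnn_units))  # 6272 4800
```

The default output is simply the last slice of the full sequence of states. Also note that the Embedding layer above does not set mask_zero=True, so the recurrent layer steps through the trailing padding zeros as well; with padding='post', the final state is computed after those padding steps. Passing mask_zero=True to Embedding makes Keras skip the padded timesteps.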
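Before training, we need to configure the learning process using model.compile(). This involves specifying:
1. Optimizer: the algorithm that updates the weights; here, Adam with a learning rate of 0.001.
2. Loss function: for binary classification with a sigmoid output, binary_crossentropy is appropriate.
3. Metrics: accuracy is a common metric to monitor during training.
model.compile(optimizer=Adam(learning_rate=0.001),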
loss='binary_crossentropy',
metrics=['accuracy'])
print("\nModel compiled successfully.")
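As a sanity check on what the loss and metric measure, here is a plain-Python sketch of binary cross-entropy and thresholded accuracy for a sigmoid output. Keras computes these internally (and, for a single sigmoid output trained with binary_crossentropy, treats 'accuracy' as binary accuracy at a 0.5 threshold); the helper functions and the clipping epsilon below are illustrative assumptions, not the library's code.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)]; p is clipped away from 0 and 1 (eps assumed)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum(int(p > threshold) == y for y, p in zip(y_true, y_pred))
    return hits / len(y_true)

y_true = [1, 0]
y_pred = [0.9, 0.2]  # hypothetical sigmoid outputs
print(round(binary_crossentropy(y_true, y_pred), 4))  # 0.1643
print(binary_accuracy(y_true, y_pred))                # 1.0
```

A model that predicts 0.5 for every sample scores a loss of ln 2 ≈ 0.693, which is a useful reference point when reading the first epochs of training output.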
Now we train the model using our prepared data. We provide the padded sequences as input (X) and the corresponding labels (y).
num_epochs = 30
batch_size = 2
validation_fraction = 0.2 # Hold out the last 20% of the samples for validation
print(f"\nStarting training for {num_epochs} epochs...")
history = model.fit(padded_sequences,
labels,
epochs=num_epochs,
batch_size=batch_size,
validation_split=validation_fraction,
verbose=1) # Set verbose=0 to hide epoch progress
print("\nTraining finished.")
During training, Keras prints the loss and accuracy for both the training set and the validation set (if provided) after each epoch.
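One detail matters for this dataset: according to the Keras documentation, validation_split selects the validation data from the last samples in x and y, before shuffling. Because our labels list five positives followed by five negatives, the validation set is the final two samples, both negative, so validation accuracy says nothing about the positive class. The numpy sketch below (validation_tail and round_robin_order are hypothetical helpers, and the floor-based split point is an assumption) shows the problem and one simple fix: reorder inputs and labels together so the classes alternate before calling fit. A seeded permutation from np.random.default_rng(seed).permutation or an explicit validation_data=(x_val, y_val) argument would also work.

```python
import numpy as np

def validation_tail(labels, validation_fraction):
    """Labels a validation_split-style hold-out would take: the last samples.
    Assumes the split point is floor(n * (1 - validation_fraction))."""
    split_at = int(len(labels) * (1 - validation_fraction))
    return labels[split_at:]

def round_robin_order(labels):
    """Index order that alternates between classes (leftovers of a larger class go last)."""
    ranks = np.empty(len(labels), dtype=int)
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        ranks[idx] = np.arange(len(idx))  # position of each sample within its class
    return np.argsort(ranks, kind="stable")

labels = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
print(validation_tail(labels, 0.2))         # [0 0]  (only negatives are validated)

order = round_robin_order(labels)
print(labels[order])                        # [1 0 1 0 1 0 1 0 1 0]
print(validation_tail(labels[order], 0.2))  # [1 0]
# Apply the same order to inputs and labels before training, e.g.:
# model.fit(padded_sequences[order], labels[order], validation_split=0.2, ...)
```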
Plotting the training and validation loss and accuracy over epochs is a standard way to assess the model's learning progress and check for overfitting. Overfitting occurs when the model performs well on the training data but poorly on unseen validation data (training loss decreases while validation loss increases).
import plotly.graph_objects as go
from plotly.subplots import make_subplots
# Extract history data
acc = history.history['accuracy']
val_acc = history.history.get('val_accuracy') # Use .get() in case validation_split was 0
loss = history.history['loss']
val_loss = history.history.get('val_loss')
epochs_range = range(1, len(acc) + 1) # Use the recorded length (equals num_epochs unless training stopped early)
# Create figure with subplots
fig = make_subplots(rows=1, cols=2, subplot_titles=('Training and Validation Accuracy', 'Training and Validation Loss'))
# Add Accuracy trace
fig.add_trace(go.Scatter(x=list(epochs_range), y=acc, name='Training Accuracy', mode='lines+markers', marker_color='#1f77b4'), row=1, col=1)
if val_acc:
    fig.add_trace(go.Scatter(x=list(epochs_range), y=val_acc, name='Validation Accuracy', mode='lines+markers', marker_color='#ff7f0e'), row=1, col=1)
# Add Loss trace
fig.add_trace(go.Scatter(x=list(epochs_range), y=loss, name='Training Loss', mode='lines+markers', marker_color='#1f77b4'), row=1, col=2)
if val_loss:
    fig.add_trace(go.Scatter(x=list(epochs_range), y=val_loss, name='Validation Loss', mode='lines+markers', marker_color='#ff7f0e'), row=1, col=2)
# Update layout
fig.update_layout(
height=400,
width=800,
xaxis_title='Epoch',
yaxis_title='Accuracy',
xaxis2_title='Epoch',
yaxis2_title='Loss',
legend_title_text='Metric',
margin=dict(l=20, r=20, t=50, b=20) # Adjust margins
)
# Display the plot (in environments that support Plotly rendering)
# fig.show() # Uncomment to display locally if Plotly is configured
# Or provide the JSON representation for web embedding
plotly_json = fig.to_json()
Training and validation accuracy and loss curves over training epochs.
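In this simple example with easily separable data, training accuracy is likely to reach 1.0 (100%) within a few epochs. The validation curves are much coarser: with only two validation samples, validation accuracy can only take the values 0, 0.5, or 1.0. On more realistic datasets, you'd expect a more gradual increase and potential signs of overfitting (divergence between the training and validation curves).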
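Reading the curves by eye works for a short run, but the same check can be scripted against history.history. The plain-Python helpers below (hypothetical names) report the epoch with the lowest validation loss and the epochs where training loss kept falling while validation loss rose, the divergence pattern described above. The numbers are toy values for illustration, not results from this model. Keras can act on the same signal during training through tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=..., restore_best_weights=True).

```python
def best_epoch(val_loss):
    """1-based epoch with the lowest validation loss (the first one on ties)."""
    return min(range(len(val_loss)), key=val_loss.__getitem__) + 1

def diverging_epochs(loss, val_loss):
    """1-based epochs where training loss fell but validation loss rose vs. the previous epoch."""
    return [e + 1 for e in range(1, len(loss))
            if loss[e] < loss[e - 1] and val_loss[e] > val_loss[e - 1]]

# Toy values for illustration only.
loss = [0.69, 0.60, 0.50, 0.40, 0.30]
val_loss = [0.70, 0.65, 0.62, 0.66, 0.71]
print(best_epoch(val_loss))              # 3
print(diverging_epochs(loss, val_loss))  # [4, 5]
# With a real run: best_epoch(history.history['val_loss'])
```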
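Finally, let's see how to use the trained model to predict the sentiment of new, unseen text. Remember to apply the same preprocessing steps to the new data: reuse the tokenizer already fitted on the training texts (do not call fit_on_texts again) and pad with the same max_length, padding, and truncating settings.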
new_texts = [
"it was truly great",
"a complete waste of time",
"amazing film loved it"
]
# Preprocess the new texts
new_sequences = tokenizer.texts_to_sequences(new_texts)
new_padded = pad_sequences(new_sequences, maxlen=max_length, padding='post', truncating='post')
print("\nNew padded sequences:")
print(new_padded)
# Get predictions (probabilities)
predictions = model.predict(new_padded)
print("\nRaw Predictions (Probabilities):")
print(predictions)
# Interpret predictions (threshold at 0.5)
predicted_labels = (predictions > 0.5).astype(int).flatten() # flatten converts [[0],[1]] to [0,1]
print("\nPredicted Labels (0=Negative, 1=Positive):")
for text, label in zip(new_texts, predicted_labels):
    sentiment = "Positive" if label == 1 else "Negative"
    print(f"'{text}' -> {sentiment}")
The output shows the probability assigned by the model to the positive class (values closer to 1 indicate positive sentiment, closer to 0 indicate negative) and the final predicted label based on a 0.5 threshold.
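It is worth checking how much of each new sentence the tokenizer actually recognizes, since every word not seen during fit_on_texts is mapped to the <OOV> index. The plain-Python sketch below compares the three new sentences against the ten training texts; oov_report is a hypothetical helper that splits on whitespace, which roughly matches the tokenizer on this punctuation-free text.

```python
def oov_report(train_texts, new_texts):
    """For each new text, list the words never seen in the training texts."""
    vocab = {word for text in train_texts for word in text.lower().split()}
    return {text: [w for w in text.lower().split() if w not in vocab] for text in new_texts}

train_texts = [
    "this is a great movie", "i really enjoyed the experience",
    "what a fantastic performance", "loved the acting", "truly amazing",
    "this is terrible", "i did not like it at all", "what a boring show",
    "hated the plot", "really awful film",
]
new_texts = ["it was truly great", "a complete waste of time", "amazing film loved it"]

for text, unknown in oov_report(train_texts, new_texts).items():
    print(f"{text!r}: {len(unknown)}/{len(text.split())} unseen -> {unknown}")
# 'it was truly great': 1/4 unseen -> ['was']
# 'a complete waste of time': 4/5 unseen -> ['complete', 'waste', 'of', 'time']
# 'amazing film loved it': 0/4 unseen -> []
```

The second sentence reaches the model as "a" followed by four <OOV> tokens, so its prediction cannot reflect the meaning of "waste of time"; treat whatever label it receives with caution. The training texts also contain only 30 distinct words, well under vocab_size = 100, so num_words discards nothing here.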
This example provides a basic framework. You are encouraged to experiment:
1. Try other recurrent layers: replace SimpleRNN with LSTM or GRU in the model definition. Observe whether there's any difference in training speed or final performance (though this dataset is too simple to show meaningful differences related to vanishing gradients).
# Example using LSTM
# model = Sequential([
# Embedding(input_dim=vocab_size, output_dim=embedding_dim, input_length=max_length),
# LSTM(units=rnn_units), # Replace SimpleRNN with LSTM
# Dense(units=1, activation='sigmoid')
# ])
2. Tune hyperparameters: change embedding_dim, rnn_units, learning_rate, batch_size, or num_epochs and retrain the model.
3. Stack recurrent layers: add more recurrent layers, setting return_sequences=True on all but the last recurrent layer.
4. Use a larger dataset: try a real-world text classification dataset, for example one available through tensorflow_datasets.
This practical exercise demonstrated the end-to-end process of building and training a simple sequence model for text classification. You now have the foundational code structure to tackle more complex sequence processing tasks using RNNs, LSTMs, or GRUs.
© 2025 ApX Machine Learning