In this section, we build a simple sequence model with TensorFlow and its Keras API. This hands-on exercise applies concepts such as Recurrent Neural Networks (RNNs), LSTMs, and GRUs, and will solidify your understanding of how to prepare sequential text data and construct a basic recurrent model for a representative task.
We'll tackle a simplified sentiment analysis problem: classifying short text snippets as either positive or negative. While sentiment analysis often involves more complex datasets and models, this example focuses purely on the mechanics of setting up and training a sequence model.
First, ensure you have TensorFlow installed. If not, you can typically install it using pip:
pip install tensorflow
We'll use Keras, which is bundled with TensorFlow, for building our model. Let's define a small, synthetic dataset for demonstration purposes.
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense, LSTM, GRU
from tensorflow.keras.optimizers import Adam
# Sample data: (text, label) -> 0 for negative, 1 for positive
texts = [
    "this is a great movie",
    "i really enjoyed the experience",
    "what a fantastic performance",
    "loved the acting",
    "truly amazing",
    "this is terrible",
    "i did not like it at all",
    "what a boring show",
    "hated the plot",
    "really awful film"
]
labels = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0]) # 5 positive, 5 negative
print(f"Number of samples: {len(texts)}")
print(f"Sample text: '{texts[0]}', Label: {labels[0]}")
print(f"Sample text: '{texts[5]}', Label: {labels[5]}")
Sequence models don't work directly with raw text. We need to convert our sentences into numerical representations that the model can process. This involves two main steps: tokenization and padding.
# --- Tokenization ---
vocab_size = 100 # Maximum number of words to keep based on frequency
tokenizer = Tokenizer(num_words=vocab_size, oov_token="<OOV>") # <OOV> for out-of-vocabulary words
tokenizer.fit_on_texts(texts)
word_index = tokenizer.word_index
sequences = tokenizer.texts_to_sequences(texts)
print("\nWord Index Sample:", list(word_index.items())[:10])
print("Original Text:", texts[0])
print("Sequence Representation:", sequences[0])
# --- Padding ---
max_length = 10 # Define a maximum sequence length (can be inferred or set)
padded_sequences = pad_sequences(sequences, maxlen=max_length, padding='post', truncating='post')
print("\nPadded Sequence Example (Post-padding):")
print(padded_sequences[0])
print("Shape of padded sequences:", padded_sequences.shape)
Notice how pad_sequences adds zeros at the end (padding='post') so that every sequence has length 10. The tokenizer never assigns index 0 to a word, so 0 is safe to use as the padding value. If a sequence were longer than max_length, it would be cut off at the end (truncating='post').
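To make the two preprocessing steps concrete, here is a minimal pure-Python sketch of what they do. The helpers build_word_index, to_sequence, and pad_post are hypothetical names written for illustration; they only approximate the Keras behavior (there is no punctuation filtering and no num_words limit), so treat this as a mental model rather than a replacement for Tokenizer and pad_sequences.

```python
from collections import Counter

def build_word_index(texts, oov_token="<OOV>"):
    """Rank words by frequency. Index 0 is left free for padding, 1 is the OOV token."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # most_common() keeps first-seen order among equally frequent words
    ranked = [word for word, _ in counts.most_common()]
    return {word: i for i, word in enumerate([oov_token] + ranked, start=1)}

def to_sequence(text, word_index, oov_token="<OOV>"):
    """Replace each word with its index, falling back to the OOV index."""
    oov_id = word_index[oov_token]
    return [word_index.get(word, oov_id) for word in text.lower().split()]

def pad_post(seq, maxlen, value=0):
    """Truncate at the end, then pad at the end, so the result has length maxlen."""
    seq = seq[:maxlen]
    return seq + [value] * (maxlen - len(seq))

demo_texts = ["this is a great movie", "this is terrible"]
word_index = build_word_index(demo_texts)

seq = to_sequence("this is a great movie", word_index)
print(seq)                # [2, 3, 4, 5, 6]
print(pad_post(seq, 10))  # [2, 3, 4, 5, 6, 0, 0, 0, 0, 0]

# "film" was never seen, so it maps to the OOV index 1
print(to_sequence("this film is terrible", word_index))  # [2, 1, 3, 7]
```

The most frequent words receive the smallest indices, which is why a frequency cutoff such as num_words can be applied by simply discarding large indices.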
Now, let's construct our model. We'll use Keras's Sequential API, which allows us to stack layers linearly.
1. Embedding layer: maps each integer word index to a dense vector. Its main arguments are input_dim (size of the vocabulary) and output_dim (dimensionality of the embedding vectors). We also specify input_length, which corresponds to our max_length from padding; newer Keras releases may warn that this argument is deprecated, in which case it can be omitted.
2. Recurrent layer: we use SimpleRNN. The primary argument is units, which defines the dimensionality of the hidden state (and output space). Other recurrent layers like LSTM or GRU can be swapped in here.
3. Output layer: a Dense layer with one unit and a sigmoid activation function. The sigmoid function outputs a value between 0 and 1, representing the probability of the positive class.
embedding_dim = 16 # Dimensionality of the word embeddings
rnn_units = 32 # Number of units in the RNN layer
model = Sequential([
    # 1. Embedding Layer
    Embedding(input_dim=vocab_size,
              output_dim=embedding_dim,
              input_length=max_length),
    # 2. Recurrent Layer (SimpleRNN)
    # Try replacing SimpleRNN with LSTM or GRU later!
    SimpleRNN(units=rnn_units),
    # If stacking RNN layers, use return_sequences=True on intermediate layers:
    # SimpleRNN(units=rnn_units, return_sequences=True),
    # SimpleRNN(units=rnn_units), # Last RNN layer doesn't need return_sequences=True
    # 3. Output Layer
    Dense(units=1, activation='sigmoid')
])
# Display the model's architecture
model.summary()
The summary shows the layers, their output shapes, and the number of trainable parameters. Notice how the SimpleRNN layer outputs a single vector of shape (None, 32), where 32 is rnn_units. If return_sequences=True were set, the output shape would be (None, max_length, rnn_units).
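The shapes and parameter counts in the summary can be checked by hand. The NumPy sketch below is an illustrative re-implementation of a tanh recurrent layer, not the Keras source: simple_rnn_forward and the random weights are assumptions made for this example, and the sizes (vocabulary 100, embedding 16, 32 units, length 10) are the ones used above.

```python
import numpy as np

def simple_rnn_forward(x, W_x, W_h, b, return_sequences=False):
    """Run a tanh recurrent layer over x with shape (batch, timesteps, features)."""
    batch, timesteps, _ = x.shape
    h = np.zeros((batch, W_h.shape[0]))  # initial hidden state
    states = []
    for t in range(timesteps):
        # The new state depends on the current input and the previous state
        h = np.tanh(x[:, t, :] @ W_x + h @ W_h + b)
        states.append(h)
    # Either every hidden state, or only the last one
    return np.stack(states, axis=1) if return_sequences else h

rng = np.random.default_rng(0)
vocab_size, embedding_dim, rnn_units, max_length = 100, 16, 32, 10
W_x = rng.normal(scale=0.1, size=(embedding_dim, rnn_units))  # input weights
W_h = rng.normal(scale=0.1, size=(rnn_units, rnn_units))      # recurrent weights
b = np.zeros(rnn_units)                                       # bias

x = rng.normal(size=(4, max_length, embedding_dim))  # stand-in for 4 embedded sequences
last_state = simple_rnn_forward(x, W_x, W_h, b)
all_states = simple_rnn_forward(x, W_x, W_h, b, return_sequences=True)
print(last_state.shape)  # (4, 32)
print(all_states.shape)  # (4, 10, 32)

# Parameter counts, layer by layer
embedding_params = vocab_size * embedding_dim  # 1600
rnn_params = W_x.size + W_h.size + b.size      # 512 + 1024 + 32 = 1568
dense_params = rnn_units + 1                   # 32 weights + 1 bias = 33
print(embedding_params + rnn_params + dense_params)  # 3201
```

The last time step of all_states equals last_state, which is why only intermediate layers in a stack need return_sequences=True.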
Before training, we need to configure the learning process using model.compile(). This involves specifying:
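- Optimizer: the algorithm that updates the weights. Adam is a common default choice.
- Loss function: the quantity the optimizer minimizes. For binary classification, binary_crossentropy is appropriate.
- Metrics: values reported to monitor training. accuracy is a common metric.
model.compile(optimizer=Adam(learning_rate=0.001),
              loss='binary_crossentropy',
              metrics=['accuracy'])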
print("\nModel compiled successfully.")
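To see what the loss and the metric measure, here is a small pure-Python sketch of binary cross-entropy and thresholded accuracy. The function names and the clipping constant eps are choices made for this illustration; Keras computes the same quantities on tensors, with its own numerical safeguards.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # keep log() away from 0
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum(int(p > threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)

# An undecided model (p = 0.5 everywhere) has a loss of ln(2), about 0.693
print(round(binary_crossentropy([1, 0], [0.5, 0.5]), 4))  # 0.6931

# Confident, correct predictions give a much lower loss
print(round(binary_crossentropy([1, 0], [0.9, 0.1]), 4))  # 0.1054

# Two of these four predictions fall on the correct side of the threshold
print(accuracy([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6]))  # 0.5
```

A loss near 0.69 in the first epochs therefore means the model is still guessing, regardless of what the accuracy column shows.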
Now we train the model using our prepared data. We provide the padded sequences as input (X) and the corresponding labels (y).
num_epochs = 30
batch_size = 2
validation_fraction = 0.2 # Use 20% of the data for validation
print(f"\nStarting training for {num_epochs} epochs...")
history = model.fit(padded_sequences,
                    labels,
                    epochs=num_epochs,
                    batch_size=batch_size,
                    validation_split=validation_fraction,
                    verbose=1) # Set verbose=0 to hide epoch progress
print("\nTraining finished.")
During training, Keras prints the loss and accuracy for both the training set and the validation set (if provided) after each epoch.
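One detail of validation_split is easy to miss: Keras takes the validation samples from the end of the arrays, before any shuffling. The sketch below imitates that behavior in plain Python. The helpers tail_split and shuffled_split are hypothetical and written for this illustration (they are not Keras functions, and the rounding may differ slightly from Keras's own arithmetic).

```python
import random

labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # same ordering as the dataset above
validation_fraction = 0.2

def tail_split(items, fraction):
    """Hold out the last `fraction` of the items, without shuffling."""
    n_val = int(len(items) * fraction)
    split_at = len(items) - n_val
    return items[:split_at], items[split_at:]

def shuffled_split(items, fraction, seed=0):
    """Shuffle first, then hold out the tail, so both classes can appear."""
    order = list(range(len(items)))
    random.Random(seed).shuffle(order)
    return tail_split([items[i] for i in order], fraction)

train_labels, val_labels = tail_split(labels, validation_fraction)
print(train_labels)  # [1, 1, 1, 1, 1, 0, 0, 0]
print(val_labels)    # [0, 0] -> the validation set contains only negatives

mixed_train, mixed_val = shuffled_split(labels, validation_fraction)
print(mixed_val)     # depends on the seed
```

Because the labels in this dataset are sorted, the validation set ends up with two negative sentences only. In practice you would shuffle the texts and labels with the same permutation before calling fit, or pass an explicit validation_data argument.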
Plotting the training and validation loss and accuracy over epochs is a standard way to assess the model's learning progress and check for overfitting. Overfitting occurs when the model performs well on the training data but poorly on unseen validation data (training loss decreases while validation loss increases).
import plotly.graph_objects as go
from plotly.subplots import make_subplots
# Extract history data
acc = history.history['accuracy']
val_acc = history.history.get('val_accuracy') # Use .get() in case validation_split was 0
loss = history.history['loss']
val_loss = history.history.get('val_loss')
epochs_range = range(1, num_epochs + 1)
# Create figure with subplots
fig = make_subplots(rows=1, cols=2, subplot_titles=('Training and Validation Accuracy', 'Training and Validation Loss'))
# Add Accuracy trace
fig.add_trace(go.Scatter(x=list(epochs_range), y=acc, name='Training Accuracy', mode='lines+markers', marker_color='#1f77b4'), row=1, col=1)
if val_acc:
    fig.add_trace(go.Scatter(x=list(epochs_range), y=val_acc, name='Validation Accuracy', mode='lines+markers', marker_color='#ff7f0e'), row=1, col=1)
# Add Loss trace
fig.add_trace(go.Scatter(x=list(epochs_range), y=loss, name='Training Loss', mode='lines+markers', marker_color='#1f77b4'), row=1, col=2)
if val_loss:
    fig.add_trace(go.Scatter(x=list(epochs_range), y=val_loss, name='Validation Loss', mode='lines+markers', marker_color='#ff7f0e'), row=1, col=2)
# Update layout
fig.update_layout(
    height=400,
    width=800,
    xaxis_title='Epoch',
    yaxis_title='Accuracy',
    xaxis2_title='Epoch',
    yaxis2_title='Loss',
    legend_title_text='Metric',
    margin=dict(l=20, r=20, t=50, b=20) # Adjust margins
)
# Display the plot (in environments that support Plotly rendering)
# fig.show() # Uncomment to display locally if Plotly is configured
# Or provide the JSON representation for web embedding
plotly_json = fig.to_json()
Training and validation accuracy and loss curves over training epochs.
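In this simple example, the training accuracy typically climbs toward 1.0 (100%) within a few epochs, because the model can memorize a handful of short sentences. Exact curves vary from run to run with the random weight initialization, and with only two validation samples the validation accuracy can only take the values 0, 0.5, or 1, so it is too coarse to interpret. On more realistic datasets, you'd expect a more gradual increase and potential signs of overfitting (divergence between training and validation curves).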
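The divergence described above can also be checked numerically instead of by eye. The following sketch uses made-up loss values purely for illustration (they are not results from this model), and best_epoch and epochs_since_improvement are hypothetical helper names.

```python
def best_epoch(val_loss):
    """Return the 1-based epoch with the lowest validation loss."""
    return min(range(len(val_loss)), key=val_loss.__getitem__) + 1

def epochs_since_improvement(val_loss):
    """Count how many epochs have passed since the validation loss was lowest."""
    return len(val_loss) - best_epoch(val_loss)

# Illustrative values only: training loss keeps falling, validation loss turns upward
example_loss = [0.69, 0.60, 0.50, 0.41, 0.33, 0.26]
example_val_loss = [0.70, 0.62, 0.55, 0.53, 0.58, 0.66]

print(best_epoch(example_val_loss))                # 4
print(epochs_since_improvement(example_val_loss))  # 2
```

With a real run you would pass history.history['val_loss'] instead of the example list. A growing count of epochs without improvement, while the training loss still falls, is the usual numeric sign of overfitting.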
Finally, let's see how to use the trained model to predict the sentiment of new, unseen text. Remember to apply the same preprocessing steps (tokenization and padding) to the new data.
new_texts = [
    "it was truly great",
    "a complete waste of time",
    "amazing film loved it"
]
# Preprocess the new texts
new_sequences = tokenizer.texts_to_sequences(new_texts)
new_padded = pad_sequences(new_sequences, maxlen=max_length, padding='post', truncating='post')
print("\nNew padded sequences:")
print(new_padded)
# Get predictions (probabilities)
predictions = model.predict(new_padded)
print("\nRaw Predictions (Probabilities):")
print(predictions)
# Interpret predictions (threshold at 0.5)
predicted_labels = (predictions > 0.5).astype(int).flatten() # flatten converts [[0],[1]] to [0,1]
print("\nPredicted Labels (0=Negative, 1=Positive):")
for text, label in zip(new_texts, predicted_labels):
    sentiment = "Positive" if label == 1 else "Negative"
    print(f"'{text}' -> {sentiment}")
The output shows the probability assigned by the model to the positive class (values closer to 1 indicate positive sentiment, closer to 0 indicate negative) and the final predicted label based on a 0.5 threshold.
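When interpreting these predictions, it helps to know how many words of each new sentence the model has actually seen. The pure-Python check below rebuilds the training vocabulary with a simple whitespace split (an approximation of the tokenizer, assumed here for illustration) and lists the words that would be mapped to the <OOV> index; unknown_words is a hypothetical helper.

```python
train_texts = [
    "this is a great movie", "i really enjoyed the experience",
    "what a fantastic performance", "loved the acting", "truly amazing",
    "this is terrible", "i did not like it at all", "what a boring show",
    "hated the plot", "really awful film",
]
new_texts = ["it was truly great", "a complete waste of time", "amazing film loved it"]

# Every distinct word that appears in the training sentences
vocabulary = {word for text in train_texts for word in text.lower().split()}

def unknown_words(text, vocabulary):
    """Return the words of `text` that never occurred in the training data."""
    return [word for word in text.lower().split() if word not in vocabulary]

for text in new_texts:
    print(f"'{text}' -> unknown: {unknown_words(text, vocabulary)}")
# 'it was truly great' -> unknown: ['was']
# 'a complete waste of time' -> unknown: ['complete', 'waste', 'of', 'time']
# 'amazing film loved it' -> unknown: []
```

For the second sentence, the model sees the word "a" followed by four <OOV> tokens, so its prediction there says little about the sentence's actual meaning. A larger training vocabulary is the main remedy.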
This example provides a basic framework. You are encouraged to experiment:
- Replace SimpleRNN with LSTM or GRU in the model definition. Observe whether training speed or final performance changes (though this dataset is too simple to show significant differences related to vanishing gradients).
# Example using LSTM
# model = Sequential([
#     Embedding(input_dim=vocab_size, output_dim=embedding_dim, input_length=max_length),
#     LSTM(units=rnn_units), # Replace SimpleRNN with LSTM
#     Dense(units=1, activation='sigmoid')
# ])
- Adjust embedding_dim, rnn_units, learning_rate, batch_size, or num_epochs and retrain the model.
- Stack multiple recurrent layers (remember to set return_sequences=True on all but the last recurrent layer).
- Train on a larger, more realistic dataset, for example one loaded through tensorflow_datasets.
This practical exercise demonstrated the end-to-end process of building and training a simple sequence model for text classification. You now have the foundational code structure to tackle more complex sequence processing tasks using RNNs, LSTMs, or GRUs.