In this section, we build a simple sequence model using a common deep learning library: TensorFlow with its Keras API. This practical exercise demonstrates how concepts such as Recurrent Neural Networks (RNNs), LSTMs, and GRUs are applied, and it will solidify your understanding of how to prepare sequential text data and construct a basic recurrent model for a representative task.

We'll tackle a simplified sentiment analysis problem: classifying short text snippets as either positive or negative. While sentiment analysis often involves more complex datasets and models, this example focuses purely on the mechanics of setting up and training a sequence model.

## Setting the Stage: Libraries and Data

First, ensure you have TensorFlow installed. If not, you can typically install it using pip:

```bash
pip install tensorflow
```

We'll use Keras, which is bundled with TensorFlow, to build our model. Let's define a small, synthetic dataset for demonstration purposes.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense, LSTM, GRU
from tensorflow.keras.optimizers import Adam

# Sample data: (text, label) -> 0 for negative, 1 for positive
texts = [
    "this is a great movie",
    "i really enjoyed the experience",
    "what a fantastic performance",
    "loved the acting",
    "truly amazing",
    "this is terrible",
    "i did not like it at all",
    "what a boring show",
    "hated the plot",
    "really awful film"
]
labels = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])  # 5 positive, 5 negative

print(f"Number of samples: {len(texts)}")
print(f"Sample text: '{texts[0]}', Label: {labels[0]}")
print(f"Sample text: '{texts[5]}', Label: {labels[5]}")
```

## Preparing the Text Data

Sequence models don't work directly with raw text. We need to convert our sentences into numerical representations that the model can process. This involves two main steps: tokenization and padding.

- Tokenization: We assign a unique integer index to each distinct word in our dataset (our vocabulary).
- Padding: Since RNNs process sequences step by step, they typically require input sequences of uniform length. We achieve this by adding special "padding" tokens (usually represented by 0) to shorter sequences until they match the length of the longest sequence (or a predefined maximum length).

```python
# --- Tokenization ---
vocab_size = 100  # Maximum number of words to keep, based on frequency
tokenizer = Tokenizer(num_words=vocab_size, oov_token="<OOV>")  # <OOV> for out-of-vocabulary words
tokenizer.fit_on_texts(texts)
word_index = tokenizer.word_index
sequences = tokenizer.texts_to_sequences(texts)

print("\nWord Index Sample:", list(word_index.items())[:10])
print("Original Text:", texts[0])
print("Sequence Representation:", sequences[0])

# --- Padding ---
max_length = 10  # Define a maximum sequence length (can be inferred or set)
padded_sequences = pad_sequences(sequences, maxlen=max_length, padding='post', truncating='post')

print("\nPadded Sequence Example (Post-padding):")
print(padded_sequences[0])
print("Shape of padded sequences:", padded_sequences.shape)
```

Notice how `pad_sequences` adds zeros at the end (`padding='post'`) so that all sequences have length 10. If a sequence were longer than `max_length`, it would be shortened (`truncating='post'`).
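One detail worth checking is what happens to words the tokenizer has never seen. Below is a small sketch (the example sentence and the word "incredible" are made up for illustration): words outside the fitted vocabulary are mapped to the `<OOV>` index, and the same `pad_sequences` call brings the result to a uniform length.

```python
# A quick check of how the tokenizer handles unseen words
# (illustrative sentence; "was" and "incredible" do not appear in our training texts).
unseen = ["this movie was incredible"]
unseen_seq = tokenizer.texts_to_sequences(unseen)
unseen_padded = pad_sequences(unseen_seq, maxlen=max_length, padding='post', truncating='post')

print("OOV index:", tokenizer.word_index[tokenizer.oov_token])  # index reserved for <OOV>
print("Sequence:", unseen_seq[0])     # unknown words appear as the <OOV> index
print("Padded:  ", unseen_padded[0])  # zeros fill the remaining positions
```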
## Building the Sequence Model

Now, let's construct our model. We'll use Keras's `Sequential` API, which allows us to stack layers linearly.

1. Embedding Layer: This is the first layer. It takes the integer-encoded vocabulary and looks up the corresponding embedding vector for each word. These embeddings are learned during training. It requires `input_dim` (the size of the vocabulary) and `output_dim` (the dimensionality of the embedding vectors). We also specify `input_length`, which corresponds to our `max_length` from padding.
2. Recurrent Layer: This is the core of our sequence model. We'll start with `SimpleRNN`. The primary argument is `units`, which defines the dimensionality of the hidden state (and output space). Other recurrent layers such as `LSTM` or `GRU` can be swapped in here.
3. Dense Output Layer: Since this is a binary classification problem (positive/negative), we need a final `Dense` layer with one unit and a sigmoid activation function. The sigmoid function outputs a value between 0 and 1, representing the probability of the positive class.

```python
embedding_dim = 16  # Dimensionality of the word embeddings
rnn_units = 32      # Number of units in the RNN layer

model = Sequential([
    # 1. Embedding Layer
    Embedding(input_dim=vocab_size, output_dim=embedding_dim, input_length=max_length),

    # 2. Recurrent Layer (SimpleRNN)
    # Try replacing SimpleRNN with LSTM or GRU later!
    SimpleRNN(units=rnn_units),
    # If stacking RNN layers, use return_sequences=True on intermediate layers:
    # SimpleRNN(units=rnn_units, return_sequences=True),
    # SimpleRNN(units=rnn_units),  # Last RNN layer doesn't need return_sequences=True

    # 3. Output Layer
    Dense(units=1, activation='sigmoid')
])

# Display the model's architecture
model.summary()
```

The summary shows the layers, their output shapes, and the number of trainable parameters. Notice how the `SimpleRNN` layer outputs a single vector of shape `(None, 32)`, where 32 is `rnn_units`. If `return_sequences=True` were set, the output shape would be `(None, max_length, rnn_units)`.
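To see where the parameter counts in the summary come from, here is a quick back-of-the-envelope check (a sketch based on the hyperparameters defined above; the exact layout of the `model.summary()` output varies slightly across TensorFlow versions):

```python
# Sanity check on the trainable parameter counts reported by model.summary().
embedding_params = vocab_size * embedding_dim      # 100 * 16 = 1,600
rnn_params = (embedding_dim * rnn_units            # input-to-hidden weights: 16 * 32 = 512
              + rnn_units * rnn_units              # hidden-to-hidden weights: 32 * 32 = 1,024
              + rnn_units)                         # biases: 32  -> 1,568 in total
dense_params = rnn_units * 1 + 1                   # weights + bias = 33

print("Embedding:", embedding_params)
print("SimpleRNN:", rnn_params)
print("Dense:    ", dense_params)
print("Total:    ", embedding_params + rnn_params + dense_params)  # should match the summary
```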
## Compiling the Model

Before training, we need to configure the learning process using `model.compile()`. This involves specifying:

- Optimizer: The algorithm used to update the model weights (e.g., Adam, RMSprop, SGD). Adam is often a good default choice.
- Loss Function: Measures how well the model performs on the training data. For binary classification with a sigmoid output, `binary_crossentropy` is appropriate.
- Metrics: Used to monitor the training and testing steps. For classification, accuracy is a common metric.

```python
model.compile(optimizer=Adam(learning_rate=0.001),
              loss='binary_crossentropy',
              metrics=['accuracy'])

print("\nModel compiled successfully.")
```

## Training the Model

Now we train the model using our prepared data. We provide the padded sequences as input (X) and the corresponding labels (y).

- `epochs`: The number of times the model will iterate over the entire training dataset.
- `batch_size`: The number of samples processed before the model's weights are updated.
- `validation_split`: Optionally, set aside a fraction of the training data to evaluate the loss and metrics at the end of each epoch. This helps monitor overfitting.

```python
num_epochs = 30
batch_size = 2
validation_fraction = 0.2  # Use 20% of the data for validation

print(f"\nStarting training for {num_epochs} epochs...")

history = model.fit(padded_sequences, labels,
                    epochs=num_epochs,
                    batch_size=batch_size,
                    validation_split=validation_fraction,
                    verbose=1)  # Set verbose=0 to hide epoch progress

print("\nTraining finished.")
```

During training, Keras prints the loss and accuracy for both the training set and the validation set (if provided) after each epoch.
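Keras can also watch the validation metrics for you during training. The sketch below uses the `EarlyStopping` callback to halt training once the validation loss stops improving (the patience value is an arbitrary illustrative choice, and this optional variant is not used in the rest of the section):

```python
from tensorflow.keras.callbacks import EarlyStopping

# Stop when validation loss has not improved for 5 consecutive epochs,
# and restore the weights from the best epoch seen so far.
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

history_es = model.fit(padded_sequences, labels,
                       epochs=num_epochs,
                       batch_size=batch_size,
                       validation_split=validation_fraction,
                       callbacks=[early_stop],
                       verbose=1)
```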
## Visualizing Training History

Plotting the training and validation loss and accuracy over epochs is a standard way to assess the model's learning progress and check for overfitting. Overfitting occurs when the model performs well on the training data but poorly on unseen validation data (training loss keeps decreasing while validation loss increases).

```python
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Extract history data
acc = history.history['accuracy']
val_acc = history.history.get('val_accuracy')  # Use .get() in case validation_split was 0
loss = history.history['loss']
val_loss = history.history.get('val_loss')
epochs_range = range(1, num_epochs + 1)

# Create figure with subplots
fig = make_subplots(rows=1, cols=2,
                    subplot_titles=('Training and Validation Accuracy',
                                    'Training and Validation Loss'))

# Add Accuracy traces
fig.add_trace(go.Scatter(x=list(epochs_range), y=acc, name='Training Accuracy',
                         mode='lines+markers', marker_color='#1f77b4'), row=1, col=1)
if val_acc:
    fig.add_trace(go.Scatter(x=list(epochs_range), y=val_acc, name='Validation Accuracy',
                             mode='lines+markers', marker_color='#ff7f0e'), row=1, col=1)

# Add Loss traces
fig.add_trace(go.Scatter(x=list(epochs_range), y=loss, name='Training Loss',
                         mode='lines+markers', marker_color='#1f77b4'), row=1, col=2)
if val_loss:
    fig.add_trace(go.Scatter(x=list(epochs_range), y=val_loss, name='Validation Loss',
                             mode='lines+markers', marker_color='#ff7f0e'), row=1, col=2)

# Update layout
fig.update_layout(
    height=400, width=800,
    xaxis_title='Epoch', yaxis_title='Accuracy',
    xaxis2_title='Epoch', yaxis2_title='Loss',
    legend_title_text='Metric',
    margin=dict(l=20, r=20, t=50, b=20)  # Adjust margins
)

# Display the plot (in environments that support Plotly rendering)
# fig.show()  # Uncomment to display locally if Plotly is configured

# Or provide the JSON representation for web embedding
plotly_json = fig.to_json()
```

*Figure: Training and validation accuracy and loss curves over training epochs.*

In this simple example, with easily separable data, the accuracy quickly reaches 1.0 (100%). On more realistic datasets you would expect a more gradual increase and potential signs of overfitting (divergence between the training and validation curves).
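If you do observe overfitting on a larger dataset, one common mitigation is dropout. Keras recurrent layers accept `dropout` (applied to the layer inputs) and `recurrent_dropout` (applied to the recurrent state); the rates and the `regularized_model` name below are illustrative choices for a sketch, not tuned settings:

```python
# Variant of the model with dropout regularization (illustrative rates).
regularized_model = Sequential([
    Embedding(input_dim=vocab_size, output_dim=embedding_dim, input_length=max_length),
    SimpleRNN(units=rnn_units, dropout=0.2, recurrent_dropout=0.2),
    Dense(units=1, activation='sigmoid')
])
regularized_model.compile(optimizer=Adam(learning_rate=0.001),
                          loss='binary_crossentropy',
                          metrics=['accuracy'])
```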
## Making Predictions

Finally, let's see how to use the trained model to predict the sentiment of new, unseen text. Remember to apply the same preprocessing steps (tokenization and padding) to the new data.

```python
new_texts = [
    "it was truly great",
    "a complete waste of time",
    "amazing film loved it"
]

# Preprocess the new texts
new_sequences = tokenizer.texts_to_sequences(new_texts)
new_padded = pad_sequences(new_sequences, maxlen=max_length, padding='post', truncating='post')

print("\nNew padded sequences:")
print(new_padded)

# Get predictions (probabilities)
predictions = model.predict(new_padded)
print("\nRaw Predictions (Probabilities):")
print(predictions)

# Interpret predictions (threshold at 0.5)
predicted_labels = (predictions > 0.5).astype(int).flatten()  # flatten converts [[0], [1]] to [0, 1]

print("\nPredicted Labels (0=Negative, 1=Positive):")
for text, label in zip(new_texts, predicted_labels):
    sentiment = "Positive" if label == 1 else "Negative"
    print(f"'{text}' -> {sentiment}")
```

The output shows the probability the model assigns to the positive class (values closer to 1 indicate positive sentiment, values closer to 0 indicate negative sentiment) and the final predicted label based on a 0.5 threshold.

## Experimentation and Next Steps

This example provides a basic framework. You are encouraged to experiment:

- Swap Recurrent Layers: Replace `SimpleRNN` with `LSTM` or `GRU` in the model definition. Observe whether there is any difference in training speed or final performance (though this dataset is too simple to show significant differences related to vanishing gradients).

```python
# Example using LSTM
# model = Sequential([
#     Embedding(input_dim=vocab_size, output_dim=embedding_dim, input_length=max_length),
#     LSTM(units=rnn_units),  # Replace SimpleRNN with LSTM
#     Dense(units=1, activation='sigmoid')
# ])
```

- Adjust Hyperparameters: Change `embedding_dim`, `rnn_units`, `learning_rate`, `batch_size`, or `num_epochs` and retrain the model.
- Stack Layers: Try stacking multiple recurrent layers (remembering to set `return_sequences=True` on all but the last recurrent layer).
- Try a Different Task: Adapt the structure for a different sequence task, perhaps multi-class classification or even a simple character-level generation model (though that requires more substantial changes).
- Use Real Data: Apply this process to a larger, more realistic dataset such as the IMDB movie reviews dataset available in `tensorflow_datasets` (a short loading sketch follows at the end of this section).

This practical exercise demonstrated the end-to-end process of building and training a simple sequence model for text classification. You now have the foundational code structure to tackle more complex sequence processing tasks using RNNs, LSTMs, or GRUs.
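For the "Use Real Data" suggestion above, here is a minimal loading sketch using `tensorflow_datasets` (this assumes the package is installed, e.g. via `pip install tensorflow-datasets`); the raw review text still needs the same tokenization and padding steps shown earlier before it can be fed to the model.

```python
import tensorflow_datasets as tfds

# Load the IMDB movie review dataset as (text, label) pairs.
train_ds = tfds.load('imdb_reviews', split='train', as_supervised=True)

# Collect raw strings and labels so they can go through the same
# Tokenizer / pad_sequences pipeline used earlier (illustrative; for large
# datasets you would normally keep this as a tf.data pipeline instead).
train_texts = []
train_labels = []
for text, label in tfds.as_numpy(train_ds.take(1000)):  # small subset for a quick experiment
    train_texts.append(text.decode('utf-8'))
    train_labels.append(int(label))
```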