Sentiment analysis is a common sequence modeling task: we build and train a model that classifies text reviews as positive or negative using either an LSTM or a GRU layer. Working through it demonstrates how to apply framework APIs, handle sequence data, and construct a complete model.

We assume you have a basic understanding of text preprocessing steps like tokenization and padding, which are covered in detail in Chapter 8. Here, we'll focus on integrating these steps with the LSTM/GRU model implementation.

### Setting the Stage: The IMDB Dataset

We'll use the popular IMDB dataset, which contains 50,000 movie reviews labeled as either positive (1) or negative (0). This dataset is often included directly within deep learning frameworks, making it convenient to access.

```python
# Example using TensorFlow/Keras
import tensorflow as tf
from tensorflow import keras

# Load the dataset, keeping only the top N most frequent words
VOCAB_SIZE = 10000
(train_data, train_labels), (test_data, test_labels) = keras.datasets.imdb.load_data(num_words=VOCAB_SIZE)

print(f"Training entries: {len(train_data)}, labels: {len(train_labels)}")
print(f"Sample review (integer encoded): {train_data[0][:20]}...")
```

The data is already integer-encoded: each integer represents a specific word in the dataset's vocabulary.

### Preparing the Data

Recurrent networks trained in batches require inputs of uniform length. Since movie reviews vary in length, we need to pad or truncate them to a fixed size. We'll use post-padding, meaning we add zeros at the end of shorter sequences. Masking (often handled automatically by framework layers) will ensure these padded values are ignored during computation.

```python
# Pad sequences to a maximum length
MAX_SEQUENCE_LENGTH = 256

train_data_padded = keras.preprocessing.sequence.pad_sequences(
    train_data,
    value=0,           # Pad value
    padding='post',    # Pad at the end
    maxlen=MAX_SEQUENCE_LENGTH
)

test_data_padded = keras.preprocessing.sequence.pad_sequences(
    test_data,
    value=0,
    padding='post',
    maxlen=MAX_SEQUENCE_LENGTH
)

print(f"Sample padded review length: {len(train_data_padded[0])}")
print(f"Sample padded review: {train_data_padded[0][:30]}...")
```

### Building the Sentiment Analysis Model

Now, let's define our model architecture using the Keras Sequential API:

1. **Embedding layer**: Takes the integer-encoded vocabulary indices and looks up a corresponding dense vector representation (embedding) for each word, learning these embeddings during training. It expects input shaped `(batch_size, sequence_length)` and outputs `(batch_size, sequence_length, embedding_dim)`.
2. **LSTM or GRU layer**: The core recurrent layer, which processes the sequence of embeddings. We'll start with an LSTM layer.
3. **Dense output layer**: A single neuron with a sigmoid activation function outputs a value between 0 and 1, representing the probability of the review being positive.

```python
EMBEDDING_DIM = 16
RNN_UNITS = 32  # Number of units in the LSTM/GRU layer

model = keras.Sequential([
    keras.layers.Embedding(input_dim=VOCAB_SIZE,
                           output_dim=EMBEDDING_DIM,
                           mask_zero=True,  # Important: enables masking for padded values
                           input_length=MAX_SEQUENCE_LENGTH),
    keras.layers.LSTM(RNN_UNITS),                # You could replace LSTM with GRU here
    keras.layers.Dense(1, activation='sigmoid')  # Output layer for binary classification
])

model.summary()
```

The `mask_zero=True` argument in the `Embedding` layer is significant: it tells downstream layers (like the LSTM) to ignore time steps where the input was 0, our padding value.
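If you want to see this mask directly, the following optional sanity check works in TensorFlow 2.x with eager execution; it simply reuses `model` and `train_data_padded` from above and is not part of the model itself:

```python
# Optional sanity check: inspect the boolean mask produced by the Embedding layer.
embedding_layer = model.layers[0]
mask = embedding_layer.compute_mask(train_data_padded[:2])

print(mask.shape)             # (2, 256) -- one boolean per time step
print(mask.numpy()[0][-10:])  # False wherever the review was padded with zeros
```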
### Visualizing the Model Architecture

```dot
digraph G {
    rankdir=TB;
    node [shape=box, style="filled", fillcolor="#a5d8ff"];
    edge [color="#495057"];

    "Input (Batch, 256)" -> "Embedding (10k vocab, 16 dim)\nmask_zero=True" [label="Integer Sequences"];
    "Embedding (10k vocab, 16 dim)\nmask_zero=True" -> "LSTM (32 units)" [label="(Batch, 256, 16)"];
    "LSTM (32 units)" -> "Dense (1 unit, sigmoid)" [label="(Batch, 32)"];
    "Dense (1 unit, sigmoid)" -> "Output (Batch, 1)" [label="Probability (Positive Sentiment)"];
}
```

*Basic model structure: Input -> Embedding -> LSTM -> Dense Output.*

### Compiling the Model

Before training, we need to configure the learning process using `compile`. We specify the optimizer, the loss function, and metrics to monitor.

- **Optimizer**: `adam` is generally a good starting choice.
- **Loss function**: `binary_crossentropy` is appropriate for binary (0/1) classification problems with a sigmoid output.
- **Metrics**: `accuracy` allows us to track the percentage of correctly classified reviews during training and evaluation.

```python
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
```

### Training the Model

We can now train the model using the `fit` method, providing the padded training data and labels. We also set aside a portion of the training data for validation, so we can monitor performance on unseen data during training and check for overfitting.

```python
EPOCHS = 10
BATCH_SIZE = 512

# Create a validation set from the training data
validation_split = 0.2
num_validation_samples = int(validation_split * len(train_data_padded))

x_val = train_data_padded[:num_validation_samples]
partial_x_train = train_data_padded[num_validation_samples:]
y_val = train_labels[:num_validation_samples]
partial_y_train = train_labels[num_validation_samples:]

print("Training the model...")
history = model.fit(partial_x_train,
                    partial_y_train,
                    epochs=EPOCHS,
                    batch_size=BATCH_SIZE,
                    validation_data=(x_val, y_val),
                    verbose=1)  # Set verbose=1 or 2 to see progress per epoch
print("Training complete.")
```
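If you would rather not hand-pick the number of epochs, Keras callbacks can stop training automatically. Below is a minimal sketch using `EarlyStopping` as an alternative to the fixed `EPOCHS` run above; the `patience` value is an illustrative choice, not a tuned one:

```python
# Optional alternative: halt training once validation loss stops improving,
# and roll back to the best weights seen so far.
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss',
                                           patience=2,
                                           restore_best_weights=True)

history = model.fit(partial_x_train,
                    partial_y_train,
                    epochs=EPOCHS,
                    batch_size=BATCH_SIZE,
                    validation_data=(x_val, y_val),
                    callbacks=[early_stop],
                    verbose=1)
```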
### Evaluating Performance

After training, we evaluate the model's performance on the held-out test set. We can also visualize the training and validation accuracy and loss over epochs to understand the learning dynamics.

```python
print("\nEvaluating on test data...")
results = model.evaluate(test_data_padded, test_labels, verbose=0)
print(f"Test Loss: {results[0]:.4f}")
print(f"Test Accuracy: {results[1]:.4f}")

# Plotting training history (requires Plotly)
import plotly.graph_objects as go
from plotly.subplots import make_subplots

history_dict = history.history
acc = history_dict['accuracy']
val_acc = history_dict['val_accuracy']
loss = history_dict['loss']
val_loss = history_dict['val_loss']
epochs_range = range(1, len(loss) + 1)  # Number of epochs actually trained

fig = make_subplots(rows=1, cols=2,
                    subplot_titles=("Training and Validation Loss",
                                    "Training and Validation Accuracy"))

fig.add_trace(go.Scatter(x=list(epochs_range), y=loss, name='Training Loss',
                         mode='lines+markers', line=dict(color='#4263eb')), row=1, col=1)
fig.add_trace(go.Scatter(x=list(epochs_range), y=val_loss, name='Validation Loss',
                         mode='lines+markers', line=dict(color='#f76707')), row=1, col=1)
fig.add_trace(go.Scatter(x=list(epochs_range), y=acc, name='Training Accuracy',
                         mode='lines+markers', line=dict(color='#12b886')), row=1, col=2)
fig.add_trace(go.Scatter(x=list(epochs_range), y=val_acc, name='Validation Accuracy',
                         mode='lines+markers', line=dict(color='#ae3ec9')), row=1, col=2)

fig.update_layout(height=400, width=800, title_text="Model Training History")
fig.update_xaxes(title_text="Epochs", row=1, col=1)
fig.update_xaxes(title_text="Epochs", row=1, col=2)
fig.update_yaxes(title_text="Loss", row=1, col=1)
fig.update_yaxes(title_text="Accuracy", row=1, col=2)

fig.show()  # Display the training history figure
```

*Example training history showing loss and accuracy curves for the training and validation sets over epochs. In this illustrative run, training loss keeps falling while validation loss bottoms out around epoch 5-6 and validation accuracy plateaus near 0.87. (Note: actual curve values depend on the specific training run.)*
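Beyond aggregate metrics, it can be reassuring to inspect a few individual predictions. The sigmoid output is a probability, so thresholding at 0.5 gives a hard label; the short check below is an optional addition that reuses `test_data_padded` and `test_labels` from above:

```python
# Optional: look at a handful of individual predictions.
# The model outputs P(positive); thresholding at 0.5 yields a class label.
sample_probs = model.predict(test_data_padded[:5], verbose=0).flatten()
sample_preds = (sample_probs >= 0.5).astype(int)

for prob, pred, label in zip(sample_probs, sample_preds, test_labels[:5]):
    print(f"P(positive)={prob:.3f}  predicted={pred}  actual={label}")
```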
The plot helps identify potential overfitting (where training accuracy keeps improving but validation accuracy plateaus or decreases) and determine whether more training epochs are needed.

### Variations and Further Steps

**GRU**: Try replacing `keras.layers.LSTM(RNN_UNITS)` with `keras.layers.GRU(RNN_UNITS)` and retrain. Compare the performance and training time.

**Stacked RNNs**: To stack layers, ensure intermediate recurrent layers return the full sequence of outputs, not just the final output.

```python
model_stacked = keras.Sequential([
    keras.layers.Embedding(VOCAB_SIZE, EMBEDDING_DIM, mask_zero=True,
                           input_length=MAX_SEQUENCE_LENGTH),
    keras.layers.LSTM(RNN_UNITS, return_sequences=True),  # Returns the hidden state for each time step
    keras.layers.LSTM(RNN_UNITS),                          # This layer receives the full sequence
    keras.layers.Dense(1, activation='sigmoid')
])
```

**Bidirectional RNNs**: Wrap a recurrent layer with `keras.layers.Bidirectional` to process the input sequence in both forward and backward directions, potentially capturing context more effectively.

```python
model_bidirectional = keras.Sequential([
    keras.layers.Embedding(VOCAB_SIZE, EMBEDDING_DIM, mask_zero=True,
                           input_length=MAX_SEQUENCE_LENGTH),
    keras.layers.Bidirectional(keras.layers.LSTM(RNN_UNITS)),  # Wrap the LSTM layer
    keras.layers.Dense(1, activation='sigmoid')
])
```

Note that a `Bidirectional` layer typically doubles the output feature dimension (one set of features for the forward pass, one for the backward pass), unless configured otherwise.

**Hyperparameter tuning**: Experiment with `EMBEDDING_DIM`, `RNN_UNITS`, optimizer choice, learning rate, and `BATCH_SIZE`. Add Dropout for regularization (covered later); a minimal sketch appears after the closing paragraph below.

This practical example provides a concrete foundation for implementing LSTM and GRU models for sequence classification. You can adapt this structure for various other sequence-based tasks by modifying the input data preparation and the final output layer(s) of the model.
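As a follow-up to the Dropout suggestion above, here is a minimal sketch of one way to regularize this architecture. The rates used are common starting points, not tuned values:

```python
# Illustrative sketch: adding dropout for regularization.
model_regularized = keras.Sequential([
    keras.layers.Embedding(VOCAB_SIZE, EMBEDDING_DIM, mask_zero=True,
                           input_length=MAX_SEQUENCE_LENGTH),
    keras.layers.LSTM(RNN_UNITS,
                      dropout=0.2,             # Drops a fraction of the LSTM's inputs
                      recurrent_dropout=0.2),  # Drops a fraction of the recurrent connections
    keras.layers.Dropout(0.5),                 # Drops a fraction of the LSTM's output features
    keras.layers.Dense(1, activation='sigmoid')
])
```

Be aware that setting `recurrent_dropout` typically prevents TensorFlow from using its fast fused (cuDNN) LSTM kernel, so training may be noticeably slower on a GPU.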