An LSTM model is built and trained using Keras for text classification, a common task in Natural Language Processing (NLP). The popular IMDB movie review dataset is used, with the goal of classifying reviews as either positive or negative based on their text content. This implementation demonstrates sequence data preparation, embedding layers, and the use of recurrent layers.

## Setting the Stage: The IMDB Dataset

The IMDB dataset consists of 50,000 movie reviews, pre-split into 25,000 for training and 25,000 for testing. Each review is labeled as positive (1) or negative (0). Keras provides convenient access to this dataset, already preprocessed into sequences of word indices (integers). Each integer represents a specific word in a dictionary.

Let's start by loading the data. We'll limit our vocabulary to the top 10,000 most frequent words to keep the input manageable.

```python
import keras
from keras.datasets import imdb
from keras import utils

# Load the dataset, keeping only the top 10,000 most frequent words
vocabulary_size = 10000
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=vocabulary_size)

print(f"Number of training samples: {len(train_data)}")
print(f"Number of test samples: {len(test_data)}")

# Example: Look at the first review (sequence of word indices)
print(f"First training review (indices): {train_data[0][:15]}...")
print(f"Label for first review: {train_labels[0]}")
```

You'll notice that each `train_data` and `test_data` sample is a list of integers, and that the reviews have varying lengths.

## Preparing the Sequence Data

Recurrent neural networks require inputs to have a consistent sequence length. We need to pad (or truncate) the reviews so that every sequence has the same number of elements. We'll use Keras's `pad_sequences` utility. Let's choose a maximum sequence length, say 250 words. Reviews shorter than this will be padded with zeros at the beginning (`padding='pre'`), and reviews longer than this will be truncated.

```python
# Set the maximum length for each review sequence
max_sequence_length = 250

# Pad sequences
padded_train_data = utils.pad_sequences(train_data, maxlen=max_sequence_length, padding='pre')
padded_test_data = utils.pad_sequences(test_data, maxlen=max_sequence_length, padding='pre')

print(f"Shape of padded training data: {padded_train_data.shape}")
print(f"Shape of padded test data: {padded_test_data.shape}")

# Example: Look at the first padded review
print(f"First padded training review: {padded_train_data[0]}")
```

Now, `padded_train_data` and `padded_test_data` are 2D tensors of shape `(num_samples, max_sequence_length)`.

## Building the LSTM Model

We'll construct a simple sequential model:

- **Embedding Layer:** This layer takes the integer-encoded vocabulary and learns an embedding vector for each word. It maps each word index to a dense vector of a fixed size (`embedding_dim`). This layer requires the `input_dim` (vocabulary size) and `output_dim` (embedding dimension). A small standalone sketch after this list illustrates the shape transformation it performs.
- **LSTM Layer:** This is the core recurrent layer that processes the sequence of embedding vectors. We specify the number of LSTM units (neurons) in this layer. These units capture contextual information from the sequence.
- **Dense Layer:** A final fully connected layer with a single unit and a sigmoid activation function. This outputs a probability between 0 and 1, representing the likelihood of the review being positive.
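To make the shape transformation performed by the Embedding layer concrete, here is a small standalone sketch, separate from the classifier we are building and using illustrative toy values: a padded batch of integer sequences goes in, and a 3D tensor of embedding vectors comes out.

```python
import numpy as np
from keras import layers

# A toy batch of 2 "reviews", each already padded to length 5
toy_batch = np.array([
    [0, 0, 12, 57, 3],
    [4, 21, 9, 101, 8],
])

# An Embedding layer with a small toy vocabulary and 8-dimensional vectors
# (randomly initialized here; in the real model these weights are learned)
toy_embedding = layers.Embedding(input_dim=200, output_dim=8)

embedded = toy_embedding(toy_batch)
print(embedded.shape)  # (2, 5, 8): batch_size x sequence_length x embedding_dim
```

The LSTM layer then reads each `(sequence_length, embedding_dim)` matrix one time step at a time; since we leave `return_sequences` at its default of `False`, it returns only its final hidden state, a single vector of `lstm_units` values per review, which the Dense layer maps to a probability.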
Let's define this model using the Keras Sequential API.

```python
from keras import layers
from keras import models

embedding_dim = 32  # Dimension of the word embedding vectors
lstm_units = 32     # Number of units in the LSTM layer

model = models.Sequential(name="imdb_lstm_classifier")
model.add(layers.Embedding(input_dim=vocabulary_size,
                           output_dim=embedding_dim,
                           input_length=max_sequence_length,
                           name="word_embedding"))
model.add(layers.LSTM(units=lstm_units, name="lstm_layer"))
model.add(layers.Dense(units=1, activation='sigmoid', name="output_classifier"))

model.summary()
```

The `model.summary()` output shows the layers, their output shapes, and the number of parameters. Notice the large number of parameters in the Embedding layer (`vocabulary_size * embedding_dim`) and the LSTM layer.

## Compiling the Model

Before training, we need to configure the learning process using the `compile()` method. We specify:

- **Optimizer:** `adam` is a widely used and effective optimization algorithm.
- **Loss Function:** `binary_crossentropy` is suitable for binary classification problems where the output is a probability.
- **Metrics:** We'll monitor accuracy during training and evaluation.

```python
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

print("Model compiled successfully.")
```

## Training the Model

Now we train the model using the `fit()` method. We provide the padded training data and labels. We also set:

- **`epochs`:** The number of times to iterate over the entire training dataset.
- **`batch_size`:** The number of samples per gradient update.
- **`validation_split`:** A fraction of the training data to be used as validation data. The model's performance on this set is monitored at the end of each epoch, helping us detect overfitting.

```python
num_epochs = 10
batch_size = 128
validation_fraction = 0.2

print("Starting training...")
history = model.fit(padded_train_data,
                    train_labels,
                    epochs=num_epochs,
                    batch_size=batch_size,
                    validation_split=validation_fraction,
                    verbose=1)  # Set verbose=1 or 2 to see progress per epoch
print("Training finished.")
```

The `fit()` method returns a `History` object containing training and validation loss and metrics for each epoch. We can use this to visualize the training process.
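The curves shown in the figures below can be reproduced from the `history.history` dictionary. As a minimal sketch, assuming matplotlib is available (it is not used elsewhere in this section), the accuracy curves could be plotted like this:

```python
import matplotlib.pyplot as plt

# history.history is a dict of per-epoch lists; with metrics=['accuracy'] in compile(),
# the keys include 'loss', 'val_loss', 'accuracy', and 'val_accuracy'
epochs_range = range(1, len(history.history['accuracy']) + 1)

plt.figure(figsize=(7, 4))
plt.plot(epochs_range, history.history['accuracy'], marker='o', label='Training Accuracy')
plt.plot(epochs_range, history.history['val_accuracy'], marker='o', label='Validation Accuracy')
plt.title('Model Training History')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
```

Replacing `'accuracy'` and `'val_accuracy'` with `'loss'` and `'val_loss'` produces the loss curves in the second figure.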
*Figure: Training and validation accuracy over epochs. Note how validation accuracy often plateaus or even decreases while training accuracy continues to rise, indicating potential overfitting.*

*Figure: Training and validation loss over epochs. An increasing validation loss alongside decreasing training loss is a clear sign of overfitting.*

## Evaluating the Model

Finally, let's evaluate the performance of our trained model on the unseen test data using the `evaluate()` method.

```python
loss, accuracy = model.evaluate(padded_test_data, test_labels, verbose=0)

print(f"\nTest Loss: {loss:.4f}")
print(f"Test Accuracy: {accuracy:.4f}")
```

This gives us the final performance metrics on data the model has never encountered during training. You should typically see an accuracy significantly better than random guessing (50%) for this task.

## Summary and Further Steps

In this practice section, you successfully:

- Loaded and preprocessed text data (IMDB reviews) for sequence modeling.
- Applied padding to ensure uniform sequence lengths.
- Built a text classification model using Keras with an Embedding layer and an LSTM layer.
- Compiled, trained, and monitored the model's performance using validation data.
- Evaluated the final model on the test set.

This provides a solid foundation for applying RNNs and LSTMs to sequence-based problems. From here, you could experiment with:

- Using GRU layers instead of LSTM.
- Stacking multiple recurrent layers.
- Using Bidirectional LSTMs (`keras.layers.Bidirectional`) to process sequences in both forward and backward directions (a short sketch follows below).
- Adjusting hyperparameters like embedding dimension, LSTM units, batch size, and optimizer.
- Applying techniques discussed in the next chapter, such as dropout regularization or early stopping, to combat overfitting and potentially improve test accuracy.
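As a starting point for the bidirectional variant suggested above, here is a minimal sketch. It reuses the `vocabulary_size`, `embedding_dim`, and `lstm_units` values defined earlier; the model and layer names are arbitrary, and the rest of the architecture is unchanged.

```python
from keras import layers, models

# Same architecture as before, but the LSTM is wrapped in a Bidirectional layer,
# which runs one LSTM forward and one backward over the sequence and
# concatenates their final states (so the recurrent output has 2 * lstm_units features).
bi_model = models.Sequential(name="imdb_bilstm_classifier")
bi_model.add(layers.Embedding(input_dim=vocabulary_size,
                              output_dim=embedding_dim,
                              name="word_embedding"))
bi_model.add(layers.Bidirectional(layers.LSTM(units=lstm_units), name="bilstm_layer"))
bi_model.add(layers.Dense(units=1, activation='sigmoid', name="output_classifier"))

bi_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
bi_model.summary()
```

Training and evaluation then proceed exactly as before, e.g. `bi_model.fit(padded_train_data, train_labels, epochs=num_epochs, batch_size=batch_size, validation_split=validation_fraction)`.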