Recurrent Neural Networks, LSTMs, and GRUs expect input data formatted as numerical tensors, often with specific shapes like (batch, time_steps, features). However, real-world sequence data, such as raw text or time series measurements, rarely comes in this ready-to-use format. This chapter focuses on the necessary steps to bridge this gap.
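As a point of reference, here is a minimal NumPy sketch of that layout; the dimensions (32, 10, 8) are arbitrary placeholders, not values tied to any particular dataset:

```python
import numpy as np

# A hypothetical batch: 32 sequences, each 10 time steps long,
# with 8 features per step. Real data must be transformed into
# this layout before it can be fed to a recurrent layer.
batch = np.zeros((32, 10, 8), dtype=np.float32)
print(batch.shape)  # (32, 10, 8)
```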
You will learn standard techniques for transforming sequence data into a structure suitable for recurrent models. For text data, this involves steps like tokenizing raw strings, building a vocabulary, encoding tokens as integers, and padding variable-length sequences to a common length (a short sketch of these steps follows this paragraph).
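The sketch below runs those steps on a tiny made-up corpus; the two example sentences and the choice of a `<pad>` token at index 0 are illustrative assumptions, not part of any particular library's API:

```python
import numpy as np

# Hypothetical corpus of two short sentences.
corpus = ["the cat sat", "the dog sat down"]

# 1. Tokenize: split each sentence into word tokens.
tokenized = [sentence.split() for sentence in corpus]

# 2. Build a vocabulary mapping each unique token to an integer.
#    Index 0 is reserved for padding.
vocab = {"<pad>": 0}
for tokens in tokenized:
    for token in tokens:
        vocab.setdefault(token, len(vocab))

# 3. Integer-encode each sequence.
encoded = [[vocab[t] for t in tokens] for tokens in tokenized]

# 4. Pad all sequences to the length of the longest one.
max_len = max(len(seq) for seq in encoded)
padded = np.array(
    [seq + [vocab["<pad>"]] * (max_len - len(seq)) for seq in encoded],
    dtype=np.int64,
)
print(padded)
# [[1 2 3 0]
#  [1 4 3 5]]
```

Framework utilities typically automate these steps, but the underlying transformations are the same, and the sections below examine each one in turn.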
For time series data, we will cover techniques such as normalization and creating sliding windows of observations. Finally, we will discuss how to structure the prepared data into batches for efficient model training. By the end of this chapter, you will be able to build pipelines to preprocess typical sequence datasets for input into RNNs.
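As a preview of the time series material, here is a minimal sketch of normalization and sliding-window construction on a synthetic series; the sine-wave data and the window length of 10 are arbitrary choices for illustration:

```python
import numpy as np

# Hypothetical univariate series of 100 measurements.
series = np.sin(np.linspace(0, 10, 100)).astype(np.float32)

# Normalize using statistics from the series (in practice, compute
# these from the training split only, to avoid leaking information).
series = (series - series.mean()) / series.std()

# Build sliding windows: each input is `window` consecutive values,
# and the target is the value that follows the window.
window = 10
X = np.stack([series[i : i + window] for i in range(len(series) - window)])
y = series[window:]

# Add a trailing feature dimension so each sample has shape
# (time_steps, features), ready for batching into (batch, time_steps, features).
X = X[..., np.newaxis]
print(X.shape, y.shape)  # (90, 10, 1) (90,)
```

Stacking windows this way leaves each sample already in the (time_steps, features) layout, so batching reduces to slicing along the first axis.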
8.1 Text Data Preprocessing Overview
8.2 Tokenization and Vocabulary Building
8.3 Integer Encoding Sequences
8.4 Introduction to Embedding Layers
8.5 Handling Variable Length Sequences
8.6 Padding Sequences
8.7 Masking Padded Values
8.8 Batching Sequential Data
8.9 Preprocessing Time Series Data
8.10 Practice: Data Preparation Pipeline