While text data requires tokenization and embedding, time series data presents its own set of preprocessing needs before it can be used effectively by Recurrent Neural Networks. Time series often consist of measurements taken at successive points in time, potentially with multiple variables recorded at each step. As with other data fed to neural networks, numerical stability and consistent feature representation matter. Furthermore, we need to structure the continuous flow of time series data into discrete sequences suitable for RNN inputs.
This section covers two primary preprocessing steps for time series data: normalization and windowing. We'll also touch upon handling features and potential missing values.
Neural networks, including RNNs, generally train more effectively when input features are on a similar scale. Time series data can easily have features with vastly different ranges (e.g., temperature in Celsius vs. atmospheric pressure in Pascals). Large input values can lead to large gradients, potentially causing unstable training (exploding gradients), while very small values might slow down learning.
Two common scaling techniques are Min-Max Scaling and Standardization:
Min-Max Scaling scales the data to a fixed range, usually [0, 1] or [-1, 1]. The formula for scaling to [0, 1] is:
$$X_{scaled} = \frac{X - X_{min}}{X_{max} - X_{min}}$$

where $X_{min}$ and $X_{max}$ are the minimum and maximum values of the feature in the training dataset.
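As a minimal NumPy sketch of this formula (train_data is an illustrative 2D array of shape (time, features); the statistics must come from the training split only):
# Conceptual min-max scaling sketch
import numpy as np
# Per-feature minimum and maximum, computed on the training data only
x_min = train_data.min(axis=0)
x_max = train_data.max(axis=0)
# Apply these same statistics to training (and later validation/test) data
train_scaled = (train_data - x_min) / (x_max - x_min)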
Standardization scales data to have a mean of 0 and a standard deviation of 1. The formula is:
$$X_{scaled} = \frac{X - \mu}{\sigma}$$

where $\mu$ is the mean and $\sigma$ is the standard deviation of the feature in the training dataset.
A significant point in preprocessing time series (and most other machine learning data) is that you must fit the scaler (calculate μ, σ, Xmin, Xmax) only on the training data. You then use this fitted scaler to transform the training, validation, and test sets. Fitting the scaler on the entire dataset, including validation or test data, introduces information leakage from the future or unseen data into the training process, leading to overly optimistic performance estimates.
# Conceptual Python example using scikit-learn
from sklearn.preprocessing import StandardScaler
import numpy as np
# Assume train_data, validation_data, test_data are NumPy arrays
# Shape might be (num_samples, num_features) before windowing
scaler = StandardScaler()
# Fit ONLY on training data
scaler.fit(train_data)
# Apply the *same* fitted scaler to all datasets
train_scaled = scaler.transform(train_data)
validation_scaled = scaler.transform(validation_data)
test_scaled = scaler.transform(test_data)
# Later, when making predictions on new data, use the same scaler:
# new_data_scaled = scaler.transform(new_data)
# To revert predictions back to original scale (if needed):
# predictions_original_scale = scaler.inverse_transform(predictions_scaled)
RNNs process data sequentially. For a typical time series forecasting task, we don't feed the entire history as one sequence. Instead, we use a "sliding window" approach to generate multiple, smaller input sequences and their corresponding target values.
Imagine you have a univariate time series (one measurement per time step): [10, 20, 30, 40, 50, 60, 70, 80].
We need to decide on two things: the input window size (how many past time steps form each input sequence) and the output horizon (how many future steps we want to predict).
Let's choose an input window size $N_{in} = 3$ and an output horizon $N_{out} = 1$. We slide this window across the data:
Input [10, 20, 30] -> Target [40]
Input [20, 30, 40] -> Target [50]
Input [30, 40, 50] -> Target [60]
Input [40, 50, 60] -> Target [70]
Input [50, 60, 70] -> Target [80]
Each "Input -> Target" pair becomes one sample for training or evaluation. The input part will form the time_steps
dimension of our RNN input tensor.
How many future steps you predict from each input window defines the input/output configuration:

Many-to-One: multiple input steps ($N_{in}$) are used to predict a single future step ($N_{out} = 1$). This is common for simple forecasting. The RNN typically processes the input sequence, and a Dense layer is applied to the final hidden state to produce the output prediction (see the sketch after this list).

Many-to-Many: multiple input steps ($N_{in}$) predict multiple future steps ($N_{out} > 1$). For example, use 3 past steps to predict the next 2 steps: Input [10, 20, 30] -> Target [40, 50]. This usually requires the model to emit outputs for several steps (for example, via return_sequences=True in Keras/TensorFlow for intermediate layers, or using specific decoder structures).

The choice depends entirely on the problem you are trying to solve. For forecasting, Many-to-One (predicting the next step) and Many-to-Many (predicting multiple future steps) are frequent patterns.
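As an illustration of the Many-to-One case, here is a minimal Keras sketch (the layer size, window length, and feature count are assumptions, not values from this section): an LSTM reads the whole input window, and a Dense layer maps its final hidden state to a one-step-ahead prediction.
# Conceptual Many-to-One model sketch
import tensorflow as tf
WINDOW_STEPS = 24    # assumed input window length (time_steps)
NUM_FEATURES = 1     # assumed number of features per time step
model = tf.keras.Sequential([
    tf.keras.Input(shape=(WINDOW_STEPS, NUM_FEATURES)),
    tf.keras.layers.LSTM(32),              # returns only the final hidden state
    tf.keras.layers.Dense(NUM_FEATURES),   # one value per feature for the next step
])
model.compile(optimizer="adam", loss="mse")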
Creating these windows is often done with basic array manipulation, for instance NumPy slicing, or with dedicated library functions (like tf.keras.utils.timeseries_dataset_from_array in TensorFlow).
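For reference, a minimal sketch using that TensorFlow utility (the window length and batch size are illustrative, and scaled_data is assumed to be a 2D array of shape (time, features)):
# Conceptual windowing with TensorFlow's built-in utility
import tensorflow as tf
SEQUENCE_LENGTH = 24   # assumed input window length
# Pair each window of SEQUENCE_LENGTH steps with the value that immediately follows it
dataset = tf.keras.utils.timeseries_dataset_from_array(
    data=scaled_data[:-SEQUENCE_LENGTH],
    targets=scaled_data[SEQUENCE_LENGTH:],
    sequence_length=SEQUENCE_LENGTH,
    batch_size=32,
)
The same windowing can also be written by hand with NumPy, as the conceptual function below shows.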
# Conceptual Python function for windowing (Many-to-One)
import numpy as np

def create_windows(data, input_width, label_width, shift):
    """
    Creates windowed data for time series forecasting.

    Args:
        data: The time series data (NumPy array, typically 2D: time x features).
        input_width: The number of time steps in the input window.
        label_width: The number of time steps in the output label window
            (usually 1 for simple next-step prediction).
        shift: The offset between the end of the input window and the start
            of the label window. For predicting the immediate next step(s),
            shift=label_width.

    Returns:
        inputs: An array of input windows.
        labels: An array of corresponding labels.
    """
    inputs = []
    labels = []
    total_window_size = input_width + shift
    # Number of complete (input, label) pairs that fit in the data
    end_index = len(data) - total_window_size + 1
    for i in range(end_index):
        input_slice = data[i : i + input_width]
        # The label window ends `shift` steps after the input window ends
        label_start_index = i + input_width + shift - label_width
        label_slice = data[label_start_index : label_start_index + label_width]
        inputs.append(input_slice)
        labels.append(label_slice)
    return np.array(inputs), np.array(labels)
# Example Usage:
# Assuming 'scaled_data' is our preprocessed time series (e.g., shape [1000, 1] or [1000, num_features])
INPUT_WIDTH = 24 # Use 24 hours of past data
LABEL_WIDTH = 1 # Predict 1 hour ahead
SHIFT = 1 # Predict the step immediately following the input window
inputs, labels = create_windows(scaled_data, INPUT_WIDTH, LABEL_WIDTH, SHIFT)
# Resulting shapes (approximate, depends on total data length):
# inputs.shape -> (num_samples, INPUT_WIDTH, num_features) e.g., (976, 24, 1)
# labels.shape -> (num_samples, LABEL_WIDTH, num_features) e.g., (976, 1, 1)
# If label_width is 1, often labels are squeezed: (num_samples, num_features)
This inputs array now has the shape (num_samples, time_steps, num_features), which is exactly what RNN layers expect (after grouping into batches).
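As a quick sanity check of create_windows, the small series from earlier gives results that are easy to verify by hand:
# Sanity check with the toy series from the windowing example above
toy_series = np.array([10, 20, 30, 40, 50, 60, 70, 80], dtype=float).reshape(-1, 1)
toy_inputs, toy_labels = create_windows(toy_series, input_width=3, label_width=1, shift=1)
print(toy_inputs.shape, toy_labels.shape)            # (5, 3, 1) (5, 1, 1)
print(toy_inputs[0].ravel(), toy_labels[0].ravel())  # [10. 20. 30.] [40.]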
For a multivariate series, each variable occupies a slot in the features dimension of your input tensor. Ensure all features are appropriately scaled (usually with the same method, such as Standardization, fitted independently per feature on the training data). The windowing process remains the same, but each time step in the window will contain multiple feature values.

Real-world time series often contain missing values. Common strategies include forward filling (carrying the last observed value forward), interpolation between neighboring observations, and imputation with a statistic such as the training-set mean or median.
It's generally best to handle missing values before scaling and windowing, although the specific choice depends on the nature of the data and the extent of missingness. If using imputation, remember that this also introduces assumptions into your data. Masking (covered earlier for text padding) can sometimes be adapted for missing steps, but standard RNN layers might not directly support it without custom implementations or specific framework features.
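As one illustration of imputation before scaling and windowing, here is a small pandas sketch (the column names and values are hypothetical; any fallback statistic should come from the training split only):
# Conceptual missing-value handling sketch
import numpy as np
import pandas as pd
# Hypothetical multivariate series with gaps
df = pd.DataFrame({
    "temperature": [21.0, np.nan, 22.5, 23.0],
    "pressure":    [1012.0, 1011.5, np.nan, 1010.0],
})
# Forward-fill each gap with the last observed value, then fall back to a mean
# (computed here on this frame; in practice, on the training split) for any
# values that are still missing, such as leading NaNs.
fallback_means = df.mean()
df_filled = df.ffill().fillna(fallback_means)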
By applying normalization and windowing, you transform raw time series measurements into structured, scaled sequences ready for training powerful recurrent models like LSTMs and GRUs. Remember to apply these steps consistently across your training, validation, and test datasets, always deriving scaling parameters and imputation statistics solely from the training portion.