Neural networks, at their core, are mathematical machines. They process numbers, specifically arrays or tensors of numbers. Therefore, a fundamental step in applying neural networks to any problem is transforming your raw input data into a suitable numerical format. This process depends heavily on the type of data you're working with.
Think about the different kinds of information you might want a network to learn from: customer age, product category, pixel values in an image, words in a sentence. Each requires a specific representation strategy.
Numerical features are often the most straightforward to handle, as they are already numbers. These can be:

- **Continuous values**, such as income or temperature, which can take any value within a range.
- **Discrete values**, such as age in years or a count of items, which take distinct, countable values.
While already numerical, these features often require further processing, specifically scaling, which we'll cover in the next section ("Feature Scaling: Normalization and Standardization"). For now, the important point is that their inherent numerical nature makes them directly compatible with the network's input requirements, usually represented as floating-point numbers in an input vector or matrix.
For instance, if you have features like `age` and `income`, a single data point might be represented as a vector like `[35, 55000.0]`. A collection of data points would form a matrix where each row corresponds to a sample and each column to a feature.
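A minimal NumPy sketch of this layout, using hypothetical values for the two features, looks like this:

```python
import numpy as np

# Each row is one sample; the columns are the features [age, income].
# The specific values here are illustrative placeholders.
data = np.array([
    [35, 55000.0],
    [42, 61000.0],
    [29, 48000.0],
], dtype=np.float32)

print(data.shape)  # (3, 2): 3 samples, 2 features
```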
Categorical features represent qualities or characteristics that fall into distinct groups or categories. Examples include product types ('Electronics', 'Clothing', 'Groceries'), user satisfaction ('High', 'Medium', 'Low'), or colors ('Red', 'Green', 'Blue'). These cannot be fed directly into most neural network models because the network doesn't understand string labels like 'Clothing'. We need to convert them into numbers.
There are two main types:

- **Nominal features** have no inherent order among their categories, such as colors ('Red', 'Green', 'Blue') or product types.
- **Ordinal features** have a meaningful order, such as satisfaction levels ('Low', 'Medium', 'High').
Several techniques exist for encoding categorical features. The two you will encounter most often are label encoding and one-hot encoding.

Label encoding (also called integer encoding) assigns a unique integer to each category. For example, the colors {'Red', 'Green', 'Blue'} might be mapped as 'Red' -> 0, 'Green' -> 1, 'Blue' -> 2.
For ordinal features such as satisfaction ('Low', 'Medium', 'High'), this can seem appropriate ('Low' -> 0, 'Medium' -> 1, 'High' -> 2), as it preserves the order. However, applying label encoding to nominal features is problematic: the network might incorrectly interpret the assigned integers as having an order or magnitude relationship (e.g., that Blue=2 is "twice" Green=1) that doesn't exist, introducing unintended bias into the learning process.
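To make this concrete, here is a minimal sketch of label encoding using a plain Python mapping. The satisfaction example and its ordering come from above; the variable names are our own:

```python
# Label (integer) encoding with an explicit mapping.
# The category order is supplied by us; for nominal features no such
# order exists, which is exactly why this encoding can mislead a model.
satisfaction_order = ['Low', 'Medium', 'High']
level_to_int = {level: i for i, level in enumerate(satisfaction_order)}

samples = ['Medium', 'Low', 'High', 'Medium']
encoded = [level_to_int[s] for s in samples]
print(encoded)  # [1, 0, 2, 1]
```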
One-hot encoding is the most common and generally preferred method for nominal categorical features, and often for ordinal ones too, unless the ordinal relationship is specifically something you want the model to learn numerically (which can be tricky).
One-Hot Encoding transforms each categorical feature with k distinct categories into k separate binary (0 or 1) features. For each data point, exactly one of these new features will be 1 (indicating the presence of that category), and all others will be 0.
Consider a feature `Color` with categories {'Red', 'Green', 'Blue'}. One-hot encoding would create three new features: `Is_Red`, `Is_Green`, `Is_Blue`.
| Original Color | Is_Red | Is_Green | Is_Blue |
|---|---|---|---|
| Red | 1 | 0 | 0 |
| Green | 0 | 1 | 0 |
| Blue | 0 | 0 | 1 |
| Green | 0 | 1 | 0 |
Transformation of a single categorical feature into multiple binary features using One-Hot Encoding.
This method avoids introducing artificial order. However, it can significantly increase the dimensionality (number of features) if the original feature has many unique categories, leading to sparsity (many zeros) in the input data.
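The transformation shown in the table can be reproduced in a few lines. This is a minimal sketch using NumPy; the category order ['Red', 'Green', 'Blue'] is fixed explicitly so the columns line up with the table:

```python
import numpy as np

# Fixed category order so the columns are [Is_Red, Is_Green, Is_Blue].
categories = ['Red', 'Green', 'Blue']
col_index = {c: i for i, c in enumerate(categories)}

samples = ['Red', 'Green', 'Blue', 'Green']

# Start from all zeros, then set a single 1 per row for the
# category that sample belongs to.
one_hot = np.zeros((len(samples), len(categories)), dtype=np.float32)
for row, color in enumerate(samples):
    one_hot[row, col_index[color]] = 1.0

print(one_hot)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]
```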
While numerical and categorical features are common in tabular datasets, neural networks are also applied to other data structures:

- **Images**, where pixel intensity values are arranged into 2D or 3D tensors (height x width x color channels).
- **Text**, where words or characters in a sentence are first tokenized and then mapped to numerical IDs or embedding vectors.
Regardless of the original data type, the ultimate goal of input data representation is to convert your raw data into numerical tensors (multi-dimensional arrays) that the neural network can process. For typical feedforward networks working with tabular data, this often means converting each data sample (row) into a flat vector of numbers, and the entire dataset into a 2D matrix (samples x features).
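As a sketch of that final assembly step (reusing the hypothetical `age`, `income`, and `Color` features from earlier), numerical columns and one-hot columns can be concatenated along the feature axis to form the 2D input matrix:

```python
import numpy as np

# Numerical features: one row per sample, columns [age, income].
numeric = np.array([
    [35, 55000.0],
    [42, 61000.0],
], dtype=np.float32)

# One-hot encoded Color for the same two samples: Red, then Green.
color_one_hot = np.array([
    [1, 0, 0],
    [0, 1, 0],
], dtype=np.float32)

# Concatenate along the feature axis: each row is now one sample's
# full input vector [age, income, Is_Red, Is_Green, Is_Blue].
X = np.hstack([numeric, color_one_hot])
print(X.shape)  # (2, 5): 2 samples, 5 features
```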
Understanding how to correctly represent your inputs is the first step towards building effective neural network models. Subsequent sections in this chapter will build upon this foundation, discussing how to scale these numerical representations and structure them for efficient training.