After data is extracted from various sources and transformed to meet quality standards and business requirements, it needs to be loaded into a target system. This target system, such as a data warehouse, a database, or a data lake, enables the data to be used for analysis, reporting, or powering applications.
However, you can't just dump the data anywhere. Target systems have specific structures and rules for how data must be organized. This structure is defined by the target schema. Think of a schema as the blueprint for your data's destination. It dictates the layout, the types of data allowed, and the constraints that must be followed. Understanding this schema is absolutely fundamental before you attempt the 'Load' step.
In the context of relational databases or data warehouses (common targets for ETL), a schema typically defines elements like:
customers, products, and orders.customers table, columns might include customer_id, first_name, last_name, email, and registration_date.INTEGER (whole numbers), VARCHAR (variable-length text), CHAR (fixed-length text), DATE, TIMESTAMP, BOOLEAN (true/false), and DECIMAL or NUMERIC (for numbers with fractional parts, like currency). The data type often includes size limits, like VARCHAR(100) meaning text up to 100 characters.PRIMARY KEY: Uniquely identifies each row in a table (e.g., customer_id). Cannot be null.FOREIGN KEY: Links a column in one table to a PRIMARY KEY in another table, enforcing relationships (e.g., linking orders.customer_id to customers.customer_id).NOT NULL: Ensures a column must always have a value; it cannot be empty.UNIQUE: Ensures all values in a column are distinct across different rows.CHECK: Enforces a specific condition (e.g., order_quantity > 0).Let's look at a simple example schema for an orders table in a database:
A diagram representing the schema for an
orderstable, showing column names, their data types, and applied constraints like primary and foreign keys.
In this example:
orders.order_id is an integer and serves as the primary key, meaning each order must have a unique ID.customer_id is an integer that links to the customers table.order_date must be a valid date and cannot be left empty (NOT NULL).total_amount is a decimal number allowing up to 10 total digits, with 2 digits after the decimal point (suitable for currency). It can be left empty (NULL).The target schema acts as the gatekeeper during the loading process. Your transformed data must precisely conform to this schema:
total_amount (DECIMAL) column will cause an error. Similarly, loading "2023-Feb-28" into the order_date (DATE) column might fail if the target system expects a 'YYYY-MM-DD' format.order_date because of the NOT NULL constraint. You cannot load an order with an order_id that already exists due to the PRIMARY KEY constraint. If customer_id is a foreign key, you usually cannot load an order with a customer_id that doesn't exist in the customers table.VARCHAR(100)). Loading text longer than the limit might result in an error or data truncation (loss of characters), depending on the system configuration.Ignoring the target schema is like trying to fit square pegs into round holes. It inevitably leads to problems during the load process. Errors might include load job failures, data being rejected, data truncation, or subtle data corruption if type conversions happen implicitly but incorrectly.
Therefore, before initiating any data load, you must thoroughly understand the structure, data types, and constraints of your target system's schema. This knowledge guides the final steps of your transformation stage (ensuring data types and formats match the target) and is essential for configuring the load process correctly. The next section, "Schema Mapping," will build directly on this understanding, showing how to explicitly connect your transformed data fields to the target schema columns.
Was this section helpful?
© 2026 ApX Machine LearningEngineered with