Strategies for managing data updates often fall short when history matters. While overwriting data (Type 1) ensures your dimension tables reflect the current state, it destroys the context required for accurate historical reporting. If a sales representative transfers from the North region to the South region, a Type 1 update retroactively moves all their past sales to the South. This distorts regional performance metrics. To maintain an accurate historical record, we employ the Slowly Changing Dimension Type 2 (SCD Type 2) strategy.
SCD Type 2 is the most common technique for tracking historical data in a data warehouse. Instead of overwriting the existing record when an attribute changes, this method retains the old record and inserts a new record to represent the new state.
This approach relies on row versioning. A single entity from the source system, such as a specific customer or product, will span multiple rows in your dimension table. Each row represents a specific time slice in that entity's history. To manage these slices effectively, the schema requires three specific metadata columns:
NULL or a high value future date (e.g., 9999-12-31).The introduction of versioning makes the surrogate key mandatory. The natural key (e.g., employee_id) is no longer unique in the dimension table because the same employee appears multiple times. The surrogate key acts as the primary key for the table and creates a unique identifier for every version of that employee.
Consider a scenario where a customer moves locations. In an operational system, the address is simply updated. In a dimensional model using SCD Type 2, the process involves expiring the old record and creating a new active record.
The transition of a dimension row during an SCD Type 2 update. The original record (SK 101) is closed by updating its end date, and a new record (SK 102) is inserted with the new region.
Implementing SCD Type 2 requires precise logic during the ETL (Extract, Transform, Load) process. When new data arrives, the system compares the incoming attributes against the current version in the dimension table.
If a change is detected, two distinct operations occur within the same transaction:
is_current flag is true. It updates the effective_end_date to the current timestamp and sets is_current to false.effective_start_date is set to the current timestamp, effective_end_date is set to high-value logic (e.g., 9999-12-31), and is_current is set to true.Mathematically, for any query querying time t, a dimension row is valid if:
tstart≤t<tend
Where tend for the current active row is considered infinite. This logic ensures that at any specific microsecond in time, exactly one version of the dimension is valid.
The strength of the SCD Type 2 pattern lies in how it interacts with Fact tables. When a transaction occurs, such as a sale or a login event, the ETL process looks up the dimension table to find the surrogate key active at that specific moment.
For example, if a sales transaction occurs on June 1st, it links to Surrogate Key 101 (Region: East). If a transaction occurs on July 1st, it links to Surrogate Key 102 (Region: West).
Once this link is established in the Fact table, the complexity of dates disappears for the data analyst. The analyst simply joins the Fact table to the Dimension table on the Surrogate Key.
SELECT
d.region,
SUM(f.sales_amount) as total_revenue
FROM fact_sales f
JOIN dim_customer d ON f.customer_sk = d.customer_sk
GROUP BY d.region;
In this query, the database automatically attributes the revenue to the correct historical region. Sales made while the customer lived in the East fall under "East," and sales made after the move fall under "West." No complex WHERE clauses or date arithmetic are required in the reporting layer.
While SCD Type 2 provides the highest fidelity for historical analysis, it comes with storage and processing costs.
last_login_time or age using Type 2, you will generate a new row every day for every user. These attributes are better suited for a Type 1 update (overwrite) or moving them to a separate fact table.A strong SCD Type 2 table definition typically looks like the following SQL structure. Note the use of indices to optimize the lookup of the current active record, which is the most frequent operation during ETL processing.
CREATE TABLE dim_product (
product_sk INT PRIMARY KEY, -- Surrogate Key
product_id VARCHAR(50), -- Natural Key
product_name VARCHAR(255),
category VARCHAR(100),
price DECIMAL(10, 2),
-- SCD Type 2 Metadata
effective_from TIMESTAMP NOT NULL,
effective_to TIMESTAMP DEFAULT '9999-12-31',
is_current BOOLEAN DEFAULT TRUE
);
-- Index for ETL lookup performance
CREATE INDEX idx_product_lookup
ON dim_product (product_id)
WHERE is_current = TRUE;
This design creates a system where history is immutable. Once a record is closed, it serves as a permanent snapshot of the past, allowing for "as-was" reporting, analyzing data exactly as it appeared at a previous point in time.
Was this section helpful?
© 2026 ApX Machine LearningEngineered with