Chapter 3: Advanced Dimension Handling

In the previous chapter, we defined the structural components of Star and Snowflake schemas. While those models work well for static data, production environments deal with attributes that shift over time. Customers change addresses, product categories get reorganized, and sales territories shift. A rigid schema that assumes permanence will eventually yield incorrect historical reports.

This chapter focuses on the mechanisms required to manage changing dimension data and specialized schema requirements. We begin by separating natural keys derived from operational systems from surrogate keys. Implementing a system-generated surrogate key, represented as an integer $k$ , decouples your analytical model from source system dependencies and improves join performance.

We then examine Slowly Changing Dimensions (SCD). You will learn to distinguish between scenarios requiring a Type 1 update, where data is simply overwritten, and Type 2 updates, where history is preserved by inserting new rows. This distinction ensures that a query regarding revenue at time $t$ accurately reflects the dimension attributes as they existed at that moment, rather than their current state.

The section concludes with specialized dimension patterns. We will build a Date dimension to support non-standard calendar logic, such as fiscal quarters, and implement role-playing dimensions. This allows a single physical table to serve multiple logical purposes in a query, such as using one Date table to filter facts by both order_date and ship_date.

Sections

3.1 Surrogate Keys vs Natural Keys
3.2 Slowly Changing Dimensions Type 1
3.3 Slowly Changing Dimensions Type 2
3.4 Date and Time Dimensions
3.5 Role-Playing Dimensions
3.6 Hands-on Practical: Implementing SCD Type 2