The fact table functions as the primary repository for measurements within the architecture of an analytical database. While dimension tables provide the descriptive context, the "who, what, where, and when", the fact table stores the quantitative data generated by business events. It sits at the center of the Star Schema, linking various dimensions together through foreign keys.A properly designed fact table is deep and narrow. It may contain billions of rows representing individual transactions or events, but it typically possesses fewer columns than the surrounding dimension tables. These columns generally fall into two categories: foreign keys that reference dimension tables and numeric measures that serve as the subject of analysis.Anatomy of a Fact RowEvery row in a fact table corresponds to a specific event defined by the grain. If the grain is defined as "one row per line item on a customer receipt," then the fact table must capture the specific metrics associated with that single line item, such as quantity sold and extended price.The structural integrity of the fact table relies on the relationship between its foreign keys and the primary keys of the dimension tables. The combination of these foreign keys defines the composite primary key of the fact table itself. This ensures that each record is unique within the defined grain.digraph G { rankdir=TB; node [shape=plaintext fontname="Arial" fontsize=12]; edge [color="#adb5bd" arrowsize=0.7]; fact_table [label=< <table border="0" cellborder="1" cellspacing="0" cellpadding="8" bgcolor="#ffffff"> <tr><td bgcolor="#1c7ed6"><font color="white"><b>Fact_Sales</b></font></td></tr> <tr><td align="left" port="d1">Date_Key (FK)</td></tr> <tr><td align="left" port="d2">Product_Key (FK)</td></tr> <tr><td align="left" port="d3">Store_Key (FK)</td></tr> <tr><td align="left" port="d4">Customer_Key (FK)</td></tr> <tr><td align="left" bgcolor="#f1f3f5"><i>Quantity_Sold</i></td></tr> <tr><td align="left" bgcolor="#f1f3f5"><i>Extended_Amount</i></td></tr> <tr><td align="left" bgcolor="#f1f3f5"><i>Discount_Value</i></td></tr> </table> >]; dim_date [label=< <table border="0" cellborder="1" cellspacing="0" cellpadding="6" bgcolor="#e7f5ff"> <tr><td bgcolor="#4dabf7"><font color="white"><b>Dim_Date</b></font></td></tr> <tr><td align="left">Date_Key (PK)</td></tr> </table> >]; dim_product [label=< <table border="0" cellborder="1" cellspacing="0" cellpadding="6" bgcolor="#e7f5ff"> <tr><td bgcolor="#4dabf7"><font color="white"><b>Dim_Product</b></font></td></tr> <tr><td align="left">Product_Key (PK)</td></tr> </table> >]; dim_store [label=< <table border="0" cellborder="1" cellspacing="0" cellpadding="6" bgcolor="#e7f5ff"> <tr><td bgcolor="#4dabf7"><font color="white"><b>Dim_Store</b></font></td></tr> <tr><td align="left">Store_Key (PK)</td></tr> </table> >]; dim_customer [label=< <table border="0" cellborder="1" cellspacing="0" cellpadding="6" bgcolor="#e7f5ff"> <tr><td bgcolor="#4dabf7"><font color="white"><b>Dim_Customer</b></font></td></tr> <tr><td align="left">Customer_Key (PK)</td></tr> </table> >]; dim_date -> fact_table:d1; dim_product -> fact_table:d2; dim_store -> fact_table:d3; dim_customer -> fact_table:d4; }The fact table structure acts as the convergence point for dimensions. It contains foreign keys (FK) pointing to dimensions and numerical measures (bottom three rows) derived from the transaction.Additivity of MeasuresA critical aspect of fact table design is understanding how measures behave when aggregated. Analysts frequently sum data across different dimensions, grouping sales by region, by month, or by product category. The behavior of a measure during this summarization defines its additivity.Fully Additive MeasuresThese measures can be summed across any and all dimensions associated with the fact table. This is the most flexible and desirable type of measure. For example, Quantity_Sold in a retail transaction table is fully additive. You can sum the quantity sold across stores, across days, or across products, and the result remains logically valid.$$Total_Quantity = \sum_{i=1}^{n} q_i$$Semi-Additive MeasuresSemi-additive measures can be summed across some dimensions but not others. The most common example is inventory snapshots or account balances.Consider a warehouse inventory fact table. You can sum the quantity of Product A across all warehouses to get the total stock for a specific day. However, you cannot sum the inventory of Product A across time. Adding the inventory count from January 1st to the inventory count from January 2nd results in a nonsensical figure that implies double the actual stock.To handle semi-additive facts correctly in SQL, you often use aggregation functions like SUM() for spatial dimensions (Warehouse, Region) but strictly use LAST_VALUE() or filtered queries for the temporal dimension (Time).{ "data": [ { "x": ["Warehouse A", "Warehouse B", "Total (Valid)"], "y": [100, 150, 250], "name": "Jan 1st Snapshot", "type": "bar", "marker": {"color": ["#339af0", "#339af0", "#51cf66"]} }, { "x": ["Jan 1st", "Jan 2nd", "Sum (Invalid)"], "y": [100, 110, 210], "name": "Warehouse A History", "type": "bar", "xaxis": "x2", "yaxis": "y2", "marker": {"color": ["#339af0", "#339af0", "#ff6b6b"]} } ], "layout": { "grid": {"rows": 1, "columns": 2, "pattern": "independent"}, "title": "Semi-Additive Aggregation Logic", "annotations": [ { "text": "Spatial Aggregation (OK)", "x": 0.2, "y": 260, "showarrow": false, "xref": "paper", "yref": "y" }, { "text": "Temporal Aggregation (Error)", "x": 0.8, "y": 220, "showarrow": false, "xref": "paper", "yref": "y2" } ], "showlegend": false, "height": 350, "margin": {"t": 50, "b": 40, "l": 40, "r": 40} } }Aggregating inventory balances across locations (left) yields a valid total. Aggregating balances across time (right) creates incorrect data, indicated by the red bar.Non-Additive MeasuresNon-additive measures cannot be summed across any dimension. These are typically ratios, percentages, or unit prices. For instance, Gross_Margin_Percentage cannot be aggregated by summing the percentages of individual transactions.To analyze non-additive measures at a higher grain, you must store the underlying components that make up the ratio (e.g., Total_Revenue and Total_Cost) and calculate the ratio after aggregation.$$\text{Correct Margin} = \frac{\sum (\text{Revenue} - \text{Cost})}{\sum \text{Revenue}}$$$$\text{Incorrect Margin} \neq \sum \left( \frac{\text{Revenue} - \text{Cost}}{\text{Revenue}} \right)$$Handling Null Foreign KeysIn a production environment, source data is rarely perfect. A transaction might arrive with a missing customer ID or an undefined store code. In a normalized operational database, this might be represented as a NULL value.In a dimensional model, however, fact tables should strictly avoid NULL values in foreign key columns. If a NULL exists in a foreign key, many reporting tools and standard SQL INNER JOIN operations will drop that row from the result set, causing the total metrics (like Revenue) to underreport.The standard pattern to address this is the assignment of a default "Unknown" or "N/A" record in the corresponding dimension table. This record is typically assigned a surrogate key of -1 or 0. During the ETL (Extract, Transform, Load) process, any missing or invalid source keys are mapped to this surrogate key. This technique ensures referential integrity is maintained and that every dollar of revenue is accounted for, even if the context is temporarily unknown.Fact Table Types OverviewWhile the structure described above covers the most common use cases, different business processes require specialized fact table architectures. These patterns dictate how the table captures data relative to time and events.Transaction Fact Tables: The most granular level. One row corresponds to one event (e.g., a single click, a single scan at a register). These are dense with data and typically fully additive.Periodic Snapshot Fact Tables: Data is aggregated over a standard time period, such as daily or monthly. This is common for performance monitoring or account balances (semi-additive).Accumulating Snapshot Fact Tables: Used for workflows with a definite start and end, such as order processing. A single row represents the entire lifecycle of an order, with multiple date foreign keys updated as the process moves from "Placed" to "Shipped" to "Delivered."We will examine the specific implementation details and use cases for Periodic and Accumulating snapshots in Chapter 4. For the initial design phase, identifying the base transaction grain and ensuring the correct additivity of measures allows you to build a solid foundation for the broader analytical schema.