Timeliness is often the most visible attribute of data quality to downstream stakeholders. Even if a dataset is accurate, complete, and consistent, it loses utility if it arrives too late to influence decision-making. Data reliability engineering distinguishes between two related but distinct temporal metrics: freshness and latency.

Freshness describes the age of the data relative to the current moment. It answers the question: "How recently was this data generated?" This is primarily a consumer-facing metric. A dashboard user cares that the sales data reflects transactions from five minutes ago, regardless of how complex the pipeline is.

Latency measures the time it takes for data to move through the system. It answers the question: "How long did the pipeline take to process this batch?" This is an engineering-facing metric used to identify bottlenecks in transformation logic or resource contention.

## The Timestamp Triad

To monitor these metrics effectively, you must instrument your data with timestamps at different stages of the lifecycle. A robust strategy relies on three specific timestamp types:

- **Event Time:** The timestamp generated at the source (e.g., when a user clicked a button or a sensor recorded a temperature). This represents the "truth" of when the phenomenon occurred.
- **Ingestion Time:** The timestamp recorded when the data entered your controlled environment (e.g., arrived in the Kafka topic or S3 landing bucket).
- **Processing Time:** The timestamp applied when the data was successfully written to the destination warehouse or lakehouse.

By comparing these timestamps, you can isolate where delays occur. If the gap between Event Time and Ingestion Time is large, the issue lies with the source system or network. If the gap between Ingestion Time and Processing Time is large, the bottleneck is within your transformation pipeline.

```dot
digraph G {
    rankdir=TB;
    node [shape=box, style=filled, fontname="Helvetica"];
    edge [fontname="Helvetica", fontsize=10];

    source    [label="Source System\n(Event Time: T1)",      fillcolor="#e9ecef", color="#adb5bd"];
    ingestion [label="Ingestion Layer\n(Ingestion Time: T2)", fillcolor="#a5d8ff", color="#1c7ed6"];
    warehouse [label="Data Warehouse\n(Processing Time: T3)", fillcolor="#96f2d7", color="#0ca678"];

    source -> ingestion    [label="Network/Extraction Delay"];
    ingestion -> warehouse [label="Transformation/Load Delay"];
}
```

*The progression of data through a pipeline requires distinct timestamps at each stage to isolate the root cause of latency.*

## Calculating Freshness

Freshness is calculated by comparing the current system clock to the maximum event timestamp found in a target table. The formula for freshness $F$ at time $t$ is:

$$F_t = t - \max(T_{event})$$

In a SQL-based environment, you can implement a monitor by querying the most recent record. If you are monitoring a table `orders`, a basic freshness check looks like this:

```sql
SELECT
    MAX(event_timestamp) AS latest_data_point,
    CURRENT_TIMESTAMP() AS check_time,
    TIMESTAMPDIFF(MINUTE, MAX(event_timestamp), CURRENT_TIMESTAMP()) AS minutes_since_last_event
FROM prod.orders;
```

This query returns the "lag" in minutes. If this lag exceeds your defined Service Level Agreement (SLA), the monitor should fire an alert. For example, if your SLA states that data must be no older than 60 minutes, a result of 65 minutes would trigger a PagerDuty incident or a Slack notification.
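Many teams fold the SLA comparison into the monitoring query itself, so the result can drive alert routing without extra logic. The following is a minimal sketch of that pattern, reusing the `prod.orders` table and the 60-minute SLA above; the `sla_breached` column name, the hard-coded threshold, and the `TIMESTAMPDIFF`/`CURRENT_TIMESTAMP()` dialect (chosen to match the previous query) are illustrative choices, not a fixed convention.

```sql
-- Freshness check that also evaluates the SLA (60-minute threshold from the example above)
SELECT
    MAX(event_timestamp) AS latest_data_point,
    TIMESTAMPDIFF(MINUTE, MAX(event_timestamp), CURRENT_TIMESTAMP()) AS minutes_since_last_event,
    CASE
        WHEN TIMESTAMPDIFF(MINUTE, MAX(event_timestamp), CURRENT_TIMESTAMP()) > 60 THEN TRUE
        ELSE FALSE
    END AS sla_breached  -- the monitor pages (PagerDuty, Slack) when this is TRUE
FROM prod.orders;
```

A scheduled monitor then only has to read the `sla_breached` flag, which keeps the alerting rule itself trivial.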
## The Sawtooth Pattern in Batch Processing

Freshness behaves differently in streaming pipelines versus batch pipelines. In a streaming system, freshness should remain relatively constant and near-zero. In batch systems, freshness follows a "sawtooth" pattern.

Consider a pipeline that runs every hour. Immediately after the job finishes, freshness is low (perhaps 5 minutes). As time passes, the data ages. Just before the next run, the data is 59 minutes old. Once the job completes, freshness drops back down.

When setting alerts for batch systems, you cannot simply look for $F_t > 0$. You must define a threshold based on the expected batch cadence plus a buffer for processing time.

The following visualization demonstrates the difference between healthy batch behavior and a stuck pipeline.

```json
{
  "layout": {
    "title": "Batch Pipeline Freshness: Healthy vs. Stuck",
    "xaxis": {"title": "Time (Hours)"},
    "yaxis": {"title": "Data Age (Minutes)"},
    "shapes": [{"type": "line", "x0": 0, "y0": 60, "x1": 5, "y1": 60, "line": {"color": "#fa5252", "width": 2, "dash": "dash"}}],
    "showlegend": true,
    "height": 400,
    "margin": {"l": 50, "r": 20, "t": 40, "b": 40}
  },
  "data": [
    {"x": [0, 0.9, 1, 1.9, 2, 2.9, 3, 3.9, 4, 4.9], "y": [5, 59, 5, 59, 5, 59, 65, 119, 125, 179], "type": "scatter", "mode": "lines", "name": "Observed Freshness", "line": {"color": "#1c7ed6"}},
    {"x": [0, 5], "y": [60, 60], "type": "scatter", "mode": "lines", "name": "SLA Threshold", "line": {"color": "#fa5252", "dash": "dash"}}
  ]
}
```

*Ideally, batch data age grows linearly until a new load resets it. The chart depicts a pipeline that fails to update after hour 2, causing the data age to cross the SLA threshold.*

## Monitoring Latency

While freshness measures the result, latency measures the efficiency of the process. High latency in a pipeline often precedes freshness violations. By monitoring latency, you can detect degrading performance before it breaches an SLA.

Latency is calculated as the duration between two timestamps on the same record. For a batch process, we often look at the average or median latency of the records processed in the last run:

$$L_{avg} = \frac{1}{n} \sum_{i=1}^{n} (T_{processing, i} - T_{event, i})$$

If your pipeline typically processes a batch in 10 minutes, but suddenly takes 20 minutes, your data might still be "fresh" according to the SLA, but the underlying infrastructure is struggling. This could indicate:

- **Volume anomalies:** The source system sent significantly more data than usual.
- **Resource contention:** The warehouse cluster is under heavy load from other queries.
- **Inefficient code:** A recent deployment introduced a slow join or a Cartesian product.

## Implementation: The Metadata Table Approach

Running `SELECT MAX(timestamp)` on a multi-terabyte table every five minutes is expensive and inefficient. It places unnecessary load on the warehouse.

A more scalable engineering pattern involves a dedicated pipeline metadata table. When your ETL job completes a batch, it should write a summary record to this table. The summary includes the batch ID, rows processed, start time, end time, and the maximum event timestamp observed in that batch.

Your observability monitor then queries this lightweight metadata table instead of scanning the full raw data.

```sql
-- Efficient check against metadata table
SELECT
    pipeline_name,
    last_successful_run,
    target_table_max_timestamp,
    DATEDIFF('minute', target_table_max_timestamp, CURRENT_TIMESTAMP) AS freshness_minutes
FROM pipeline_audit_log
WHERE pipeline_name = 'orders_daily_batch'
ORDER BY run_id DESC
LIMIT 1;
```
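To make the pattern concrete, here is a sketch of the write side that produces the rows the check above reads. The column names mirror that query, but the table definition, the staging table name `prod.orders_staging`, and the literal `run_id` and start time are assumptions for illustration; in practice the orchestrator supplies the run identifier and batch start time.

```sql
-- Illustrative schema for the audit table queried above
CREATE TABLE IF NOT EXISTS pipeline_audit_log (
    run_id                     BIGINT,
    pipeline_name              VARCHAR(255),
    rows_processed             BIGINT,
    batch_start_time           TIMESTAMP,
    last_successful_run        TIMESTAMP,   -- when the batch finished loading
    target_table_max_timestamp TIMESTAMP    -- max event_timestamp observed in the batch
);

-- Final step of the ETL job: append one summary row for the batch just loaded
INSERT INTO pipeline_audit_log
SELECT
    1042                            AS run_id,            -- supplied by the orchestrator in practice
    'orders_daily_batch'            AS pipeline_name,
    COUNT(*)                        AS rows_processed,
    TIMESTAMP '2024-05-01 02:00:00' AS batch_start_time,  -- captured when the job started (placeholder)
    CURRENT_TIMESTAMP               AS last_successful_run,
    MAX(event_timestamp)            AS target_table_max_timestamp
FROM prod.orders_staging;           -- staging table holding only the batch just loaded (assumed name)
```

Because each run appends a single row, the audit table stays tiny no matter how large the underlying data grows.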
This approach decouples monitoring from data processing. It allows you to run freshness checks frequently (e.g., every minute) with negligible cost. This metadata layer becomes the foundation for your observability dashboard, enabling you to track trends in both freshness and latency over weeks or months.
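As a sketch of that kind of trend tracking, the query below rolls the (assumed) `pipeline_audit_log` table up by day for a single pipeline. The derived metrics are proxies computed from batch-level metadata rather than the per-record latency formula above: batch duration comes from the start and end times, and the event-to-load lag measures how far behind the newest event each load finished. Column names and the `DATEDIFF` dialect follow the earlier examples.

```sql
-- Daily trend roll-up from the audit table for one pipeline
SELECT
    pipeline_name,
    DATE(last_successful_run)                                                 AS run_date,
    AVG(DATEDIFF('minute', batch_start_time, last_successful_run))           AS avg_batch_duration_minutes,
    AVG(DATEDIFF('minute', target_table_max_timestamp, last_successful_run)) AS avg_event_to_load_lag_minutes,
    SUM(rows_processed)                                                       AS total_rows
FROM pipeline_audit_log
WHERE pipeline_name = 'orders_daily_batch'
GROUP BY pipeline_name, DATE(last_successful_run)
ORDER BY run_date;
```

A sustained upward drift in either duration or lag is the early-warning signal described above: latency creeping up before the freshness SLA is actually breached.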