Timeliness is often the most visible attribute of data quality to downstream stakeholders. Even if a dataset is accurate, complete, and consistent, it loses utility if it arrives too late to influence decision-making. Data reliability engineering distinguishes between two related but distinct temporal metrics: freshness and latency.
Freshness describes the age of the data relative to the current moment. It answers the question: "How recently was this data generated?" This is primarily a consumer-facing metric. A dashboard user cares that the sales data reflects transactions from five minutes ago, regardless of how complex the pipeline is.
Latency measures the time it takes for data to move through the system. It answers the question: "How long did the pipeline take to process this batch?" This is an engineering-facing metric used to identify bottlenecks in transformation logic or resource contention.
To monitor these metrics effectively, you must instrument your data with specific timestamps at different stages of its lifecycle. A sound strategy relies on three timestamp types:
Event Time: when the event actually occurred in the source system.
Ingestion Time: when the record arrived in your data platform.
Processing Time: when the transformation pipeline finished writing the record to its target table.
By comparing these timestamps, you can isolate where delays occur. If the gap between Event Time and Ingestion Time is large, the issue lies with the source system or network. If the gap between Ingestion Time and Processing Time is large, the bottleneck is within your transformation pipeline.
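As a minimal sketch, assuming the raw table carries all three timestamps as columns (the names event_timestamp, ingestion_timestamp, processing_timestamp, and order_id are hypothetical and should be adjusted to your schema), you can compute both gaps side by side:
-- Hypothetical column names; adjust to match your schema
SELECT
order_id,
TIMESTAMPDIFF(MINUTE, event_timestamp, ingestion_timestamp) as source_to_ingest_minutes,
TIMESTAMPDIFF(MINUTE, ingestion_timestamp, processing_timestamp) as ingest_to_process_minutes
FROM prod.orders
ORDER BY event_timestamp DESC
LIMIT 100;
A large source_to_ingest_minutes value points upstream, while a large ingest_to_process_minutes value points at your own transformation logic.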
Tracking the progression of data through a pipeline requires distinct timestamps at each stage so that the root cause of a delay can be isolated.
Freshness is calculated by comparing the current system clock to the maximum event timestamp found in a target table. The formula for freshness at time $t$ is:
$$\text{Freshness}(t) = t - \max(\text{event\_timestamp})$$
In a SQL-based environment, you can implement a monitor by querying the most recent record. If you are monitoring a table named orders, a basic freshness check looks like this:
SELECT
MAX(event_timestamp) as latest_data_point,
CURRENT_TIMESTAMP() as check_time,
TIMESTAMPDIFF(MINUTE, MAX(event_timestamp), CURRENT_TIMESTAMP()) as minutes_since_last_event
FROM prod.orders;
This query returns the "lag" in minutes. If this lag exceeds your defined Service Level Agreement (SLA), the monitor should fire an alert. For example, if your SLA states that data must be no older than 60 minutes, a result of 65 would trigger a PagerDuty incident or a Slack notification.
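One way to encode that threshold directly in the monitor is to return rows only when the lag violates the SLA, so an empty result means the check passed. A sketch, assuming the 60-minute SLA above:
-- Returns a row only when freshness exceeds the 60-minute SLA (assumed value)
SELECT
MAX(event_timestamp) as latest_data_point,
TIMESTAMPDIFF(MINUTE, MAX(event_timestamp), CURRENT_TIMESTAMP()) as minutes_since_last_event
FROM prod.orders
HAVING TIMESTAMPDIFF(MINUTE, MAX(event_timestamp), CURRENT_TIMESTAMP()) > 60;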
Freshness behaves differently in streaming pipelines versus batch pipelines. In a streaming system, freshness should remain relatively constant and near-zero. In batch systems, freshness follows a "sawtooth" pattern.
Consider a pipeline that runs every hour. Immediately after the job finishes, freshness is low (perhaps 5 minutes). As time passes, the data ages. Just before the next run, the data is 59 minutes old. Once the job completes, freshness drops back down.
When setting alerts for batch systems, you cannot simply look for near-zero freshness as you would in a streaming system. You must define a threshold based on the expected batch cadence plus a buffer for processing time.
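For the hourly pipeline described above, a reasonable threshold is the 60-minute cadence plus a processing buffer. Assuming a 15-minute buffer (a value you would tune to your own pipeline):
$$\text{Threshold} = \text{cadence} + \text{buffer} = 60 + 15 = 75 \text{ minutes}$$
An alert that fires only when freshness exceeds 75 minutes tolerates the normal sawtooth but still catches a stuck pipeline within one missed run.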
The following visualization demonstrates the difference between healthy batch behavior and a stuck pipeline.
Ideally, batch data age grows linearly until a new load resets it. The chart depicts a pipeline that fails to update after hour 2, causing the data age to cross the SLA threshold.
While freshness measures the result, latency measures the efficiency of the process. High latency in a pipeline often precedes freshness violations. By monitoring latency, you can detect degrading performance before it breaches an SLA.
Latency is calculated as the duration between two timestamps on the same record. For a batch process, we often look at the average or median latency of the records processed in the last run:
$$\text{Latency} = \operatorname{median}\bigl(\text{processing\_timestamp} - \text{ingestion\_timestamp}\bigr)$$
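A sketch of this check in SQL, assuming the processed table carries ingestion_timestamp, processing_timestamp, and batch_id columns (hypothetical names) to scope the calculation to the most recent batch:
-- Average and worst-case latency for the most recent batch (assumed column names)
SELECT
batch_id,
AVG(TIMESTAMPDIFF(MINUTE, ingestion_timestamp, processing_timestamp)) as avg_latency_minutes,
MAX(TIMESTAMPDIFF(MINUTE, ingestion_timestamp, processing_timestamp)) as max_latency_minutes
FROM prod.orders
WHERE batch_id = (SELECT MAX(batch_id) FROM prod.orders)
GROUP BY batch_id;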
If your pipeline typically processes a batch in 10 minutes but suddenly takes 20 minutes, your data might still be "fresh" according to the SLA, but the underlying infrastructure is struggling. This could indicate resource contention on the warehouse, a bottleneck introduced into the transformation logic, or data volumes growing faster than the allocated compute.
Running SELECT MAX(timestamp) on a multi-terabyte table every five minutes is expensive and inefficient. It places unnecessary load on the warehouse.
A more scalable engineering pattern involves a dedicated pipeline metadata table. When your ETL job completes a batch, it should write a summary record to this table. The summary includes the batch ID, rows processed, start time, end time, and the maximum event timestamp observed in that batch.
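A minimal sketch of the write side, assuming an audit table named pipeline_audit_log with the columns described above (the exact schema and the literal values here are illustrative):
-- Written by the ETL job at the end of each batch (assumed schema, illustrative values)
INSERT INTO pipeline_audit_log (
pipeline_name,
run_id,
rows_processed,
batch_start_time,
batch_end_time,
target_table_max_timestamp,
last_successful_run
)
VALUES (
'orders_daily_batch',
1042,
250000,
'2026-01-15 02:00:00',
'2026-01-15 02:12:00',
'2026-01-15 01:58:42',
CURRENT_TIMESTAMP()
);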
Your observability monitor then queries this lightweight metadata table instead of scanning the full raw data.
-- Efficient check against metadata table
SELECT
pipeline_name,
last_successful_run,
target_table_max_timestamp,
datediff('minute', target_table_max_timestamp, current_timestamp) as freshness_minutes
FROM pipeline_audit_log
WHERE pipeline_name = 'orders_daily_batch'
ORDER BY run_id DESC
LIMIT 1;
This approach decouples monitoring from data processing. It allows you to run freshness checks frequently (e.g., every minute) with negligible cost. This metadata layer becomes the foundation for your observability dashboard, enabling you to track trends in both freshness and latency over weeks or months.
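Because every batch leaves a row in the audit table, the same table can feed that trend view directly. A sketch, assuming the batch_start_time and batch_end_time columns used above, that aggregates daily latency over the past 30 days:
-- Daily latency trend for the past 30 days, computed from the audit table
SELECT
DATE(batch_end_time) as run_date,
AVG(TIMESTAMPDIFF(MINUTE, batch_start_time, batch_end_time)) as avg_batch_latency_minutes,
COUNT(*) as runs
FROM pipeline_audit_log
WHERE pipeline_name = 'orders_daily_batch'
AND batch_end_time >= CURRENT_TIMESTAMP() - INTERVAL 30 DAY
GROUP BY DATE(batch_end_time)
ORDER BY run_date;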