Governance establishes the rules, but engineering must enforce them. While software engineering relies on unit tests to verify logic, data engineering uses assertions to verify state. Without automated testing, quality issues often remain invisible until they break a downstream report or application.
This chapter focuses on the technical implementation of data quality checks. We move from abstract definitions of data standards to concrete, programmatic assertions. You will learn how to quantify quality using standard dimensions: accuracy, completeness, consistency, and validity.
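As a preview, some of these dimensions reduce to simple column-level ratios. The sketch below, which assumes a hypothetical pandas DataFrame of order records (the column names are illustrative, not from a specific dataset), measures completeness as the fraction of non-null values and validity as the fraction of values drawn from an allowed set.

```python
import pandas as pd

# Hypothetical order records; column names are illustrative only.
df = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "email": ["a@x.com", None, "c@x.com", "d@x.com"],
    "status": ["shipped", "pending", "unknown", "shipped"],
})

# Completeness: share of non-null values in a column.
completeness = df["email"].notna().mean()

# Validity: share of values drawn from an allowed set.
valid_statuses = {"pending", "shipped", "delivered"}
validity = df["status"].isin(valid_statuses).mean()

print(f"email completeness: {completeness:.2%}")  # 75.00%
print(f"status validity:    {validity:.2%}")      # 75.00%
```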
We will examine the structure of data assertions, writing logic to validate schemas and strictly enforce data types. The chapter also covers statistical profiling, a method for identifying anomalies in data distributions that simple rule-based checks miss. For instance, verifying that the mean of a column falls within an expected range enables dynamic quality control.
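To make both kinds of check concrete before the detailed treatment in sections 2.2 through 2.4, here is a minimal sketch assuming pandas; the helper names and the expected dtype strings (which assume default 64-bit dtypes) are illustrative. The first assertion enforces column presence and types; the second is a distribution check on the column mean.

```python
import pandas as pd

def assert_schema(df: pd.DataFrame, expected: dict[str, str]) -> None:
    """Fail fast if a column is missing or holds the wrong dtype."""
    for column, dtype in expected.items():
        assert column in df.columns, f"missing column: {column}"
        actual = str(df[column].dtype)
        assert actual == dtype, f"{column}: expected {dtype}, got {actual}"

def assert_mean_in_range(df: pd.DataFrame, column: str,
                         low: float, high: float) -> None:
    """Distribution check: catches drift that row-level rules miss."""
    mean = df[column].mean()
    assert low <= mean <= high, (
        f"{column} mean {mean:.2f} outside [{low}, {high}]"
    )

df = pd.DataFrame({"order_id": [1, 2, 3], "amount": [19.9, 24.5, 21.0]})
assert_schema(df, {"order_id": "int64", "amount": "float64"})
assert_mean_in_range(df, "amount", low=10.0, high=50.0)
```

A failed assertion raises immediately with a message naming the offending column, which is what lets these checks halt a pipeline rather than silently pass bad data along.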
By the end of this module, you will understand how to build validation suites that function as gateways, preventing non-compliant data from entering production environments.
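The gateway pattern can be sketched as a small runner that executes a list of checks and raises before any load step. The structure below is illustrative, not a specific library's API; `require_no_nulls` is a hypothetical check factory.

```python
from typing import Callable
import pandas as pd

Check = Callable[[pd.DataFrame], None]

def run_suite(df: pd.DataFrame, checks: list[Check]) -> None:
    """Run every check, collect failures, and block bad data at the gate."""
    failures = []
    for check in checks:
        try:
            check(df)
        except AssertionError as exc:
            failures.append(str(exc))
    if failures:
        # Raising here stops the pipeline before any write to production.
        raise RuntimeError("validation gate failed:\n" + "\n".join(failures))

def require_no_nulls(column: str) -> Check:
    """Build a completeness check bound to one column."""
    def check(df: pd.DataFrame) -> None:
        assert df[column].notna().all(), f"nulls found in {column}"
    return check

df = pd.DataFrame({"amount": [19.9, 24.5, 21.0]})
run_suite(df, [require_no_nulls("amount")])
print("gate passed; safe to load")
```

Collecting all failures before raising, rather than stopping at the first one, gives operators a complete picture of what is wrong with a batch in a single run.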
2.1 Core Dimensions of Data Quality
2.2 Anatomy of a Data Assertion
2.3 Validating Schemas and Types
2.4 Statistical Profiling and Distribution Checks
2.5 Practice: Writing Data Validation Suites