Managing access controls through manual database commands or user interface clicks inevitably leads to configuration drift. Reliable data engineering environments define access policies as code objects and apply them programmatically. This ensures that permissions are version-controlled, testable, and consistent across environments.
In this practice section, we will build a Python-based policy engine. This engine will take a user's role and a dataset's metadata as input, then dynamically generate a secure SQL query. This approach, often called "Dynamic Views" or "Query Rewriting," is a fundamental pattern for implementing scalable governance.
The first step is moving policy definitions out of the database and into a configuration structure. We use a dictionary structure here, but in production, this might be a YAML file managed in git. We define roles and map them to permissible data tags.
This model relies on the intersection of user attributes (roles) and data attributes (tags). If a user's role grants access to a specific tag, they can query the associated columns.
from typing import Dict, List, Optional
from dataclasses import dataclass
# Define the allowed access levels for specific tags
# A user with 'analyst' role can only see 'public' and 'internal' data
ACCESS_POLICIES = {
"analyst": {"allowed_tags": {"public", "internal"}},
"engineer": {"allowed_tags": {"public", "internal", "confidential"}},
"admin": {"allowed_tags": {"public", "internal", "confidential", "pii"}}
}
@dataclass
class ColumnMetadata:
name: str
tag: str
dtype: str
# Define our table schema with governance tags
users_table_schema = [
ColumnMetadata(name="user_id", tag="public", dtype="int"),
ColumnMetadata(name="email", tag="pii", dtype="string"),
ColumnMetadata(name="country", tag="public", dtype="string"),
ColumnMetadata(name="salary_band", tag="confidential", dtype="string"),
ColumnMetadata(name="created_at", tag="internal", dtype="timestamp"),
]
Once the definitions are established, we need a mechanism to enforce them. We will implement a function that acts as a transpiler: it reads the schema and the requestor's role, then constructs a SQL statement.
This logic follows a simple set operation. Let be the set of tags allowed for a role, and let be the tag associated with column . The column is selected as-is if:
If the condition is not met, we have two architectural choices: exclude the column entirely or mask the data (return NULL or a hash). Masking is often preferred in production analytics to prevent code breakage when a column suddenly "disappears" from a view schema.
Flow of the policy engine combining role definitions and schema metadata to produce compliant SQL.
The following code implements this logic. It iterates through the schema and determines whether to expose the raw column or a masked literal.
def generate_secure_query(table_name: str, schema: List[ColumnMetadata], role: str) -> str:
"""
Generates a SELECT statement with dynamic masking based on RBAC.
"""
if role not in ACCESS_POLICIES:
raise ValueError(f"Role '{role}' is not defined in policies.")
allowed_tags = ACCESS_POLICIES[role]["allowed_tags"]
select_clauses = []
for col in schema:
if col.tag in allowed_tags:
# User has access, select the raw column
select_clauses.append(col.name)
else:
# User lacks access, mask the data
# Casting NULL ensures the view schema remains consistent
select_clauses.append(f"CAST(NULL AS {col.dtype}) AS {col.name}")
query_cols = ", ".join(select_clauses)
return f"SELECT {query_cols} FROM {table_name};"
# Example Usage:
print("--- Analyst View ---")
print(generate_secure_query("users", users_table_schema, "analyst"))
print("\n--- Admin View ---")
print(generate_secure_query("users", users_table_schema, "admin"))
When you run the script above, you will observe how the governance logic alters the resulting SQL.
For the Analyst, who only has public and internal access, the output masks the PII and Confidential columns:
SELECT user_id,
CAST(NULL AS string) AS email,
country,
CAST(NULL AS string) AS salary_band,
created_at
FROM users;
For the Admin, who has all permissions, the query retrieves all data transparently:
SELECT user_id, email, country, salary_band, created_at FROM users;
While generating strings is useful for understanding the logic, in a real environment, you apply this using infrastructure-as-code tools.
users_analyst or users_admin.CAST(NULL...), your Python policy engine can generate the specific CREATE ROW POLICY statements required by your vendor.pii before the code is ever deployed to production.This programmatic approach decouples the definition of security (the dictionary) from the implementation of security (the SQL generation). This separation allows you to update policies centrally without manually rewriting hundreds of SQL queries.
Was this section helpful?
© 2026 ApX Machine LearningEngineered with