Mapping the flow of data through a complex system is rarely a linear process. To construct an accurate dependency graph, engineers rely on two primary methods of extraction: static analysis and dynamic analysis. These approaches differ fundamentally in how they observe the system, much like the difference between reading a blueprint and watching a construction crew at work. Understanding the trade-offs between them is necessary for implementing a lineage solution that is both accurate and comprehensive.

## Static Lineage Analysis

Static analysis derives lineage by examining source code without executing it. This method parses SQL scripts, Python files, and configuration manifests (such as YAML files in dbt) to identify input and output datasets. The core mechanism behind static analysis is the Abstract Syntax Tree (AST).

When a lineage tool processes a SQL query, it does not run the query against the database. Instead, it breaks the code down into its grammatical components. Consider the following simple SQL statement:

```sql
SELECT id, email FROM raw_users WHERE active = true
```

A static parser identifies `raw_users` as the source table and infers a transformation that produces a result set containing `id` and `email`. This approach is highly effective for planning and impact analysis because it provides complete coverage of the codebase. It can map dependencies in code branches that rarely run, ensuring that even edge cases are visible in the dependency graph.

The primary limitation of static analysis is its inability to interpret logic determined at runtime. If a Python script constructs a table name dynamically based on today's date (e.g., `sales_2023_10_27`), a static parser reading the code will only see the variable name, not the resolved table name.

*Figure: The progression from raw code to metadata through an Abstract Syntax Tree (AST), independent of execution (SQL/Python source → lexical parser → AST → lineage metadata).*

## Dynamic Lineage Analysis

Dynamic lineage captures metadata by observing the system during execution. It relies on instrumentation embedded within the data platform, such as listeners in Apache Spark, callback hooks in Airflow, or query logs in a data warehouse.

When a pipeline runs, the instrumentation records exactly what data was read and written. This approach resolves the ambiguity that plagues static analysis. If a job uses dynamic SQL to select from a date-stamped partition, dynamic analysis captures the precise table name that was accessed. It provides a historical record of what actually happened, rather than a theoretical map of what might happen.

The downside is that dynamic lineage is limited to executed paths. If a pipeline contains a conditional statement like `if error_count > 0: write_to_error_table()`, and the error count is zero during the run, the dependency on the error table will not appear in the lineage graph for that execution. This can create gaps in the graph where dependencies exist in the code but have not been triggered recently.
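The gap between the two views is easy to demonstrate. The sketch below, assuming the open-source sqlglot parser (the tables and variables are illustrative), shows static extraction succeeding on a literal query and losing the thread once the table name is resolved only at runtime:

```python
from datetime import date

import sqlglot
from sqlglot import exp

# Static view: parse the query text without executing anything.
sql = "SELECT id, email FROM raw_users WHERE active = true"
tree = sqlglot.parse_one(sql)
print([t.name for t in tree.find_all(exp.Table)])  # ['raw_users']

# Runtime-resolved name: in the source file, a static parser sees only
# the f-string template, never the concrete partition read today.
partition = date.today().strftime("%Y_%m_%d")
dynamic_sql = f"SELECT * FROM sales_{partition}"
# Only an execution-time observer (a query log or Spark listener) records
# the resolved name, e.g. sales_2023_10_27, because it sees the query as run.
```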
## Comparative Visibility

Choosing the right approach often depends on the specific blind spots of your architecture. Static analysis excels at "design-time" visibility, while dynamic analysis excels at "run-time" precision.

| Capability (relative score, 0–100) | Static Analysis | Dynamic Analysis |
| --- | --- | --- |
| Code Coverage | 95 | 50 |
| Dynamic SQL Support | 20 | 95 |
| Setup Complexity | 80 | 40 |
| Runtime Overhead | 5 | 25 |

*A comparison of capabilities. Static analysis offers high code coverage with low overhead, whereas dynamic analysis handles runtime resolution accurately but covers only executed paths.*

## Mathematical Representation of Dependencies

To build a reliable governance system, we treat the data platform as a directed graph $G = (V, E)$. Here, the set of vertices $V$ represents data assets (tables, streams, dashboards), and the set of edges $E$ represents the processes (jobs, queries) that transform them.

Static and dynamic analyses produce two different versions of this graph:

- **Static graph ($G_S$):** the potential state space. An edge $(u, v) \in E_S$ exists if the code allows data to flow from $u$ to $v$.
- **Dynamic graph ($G_D$):** the realized state space over a time window $t$. An edge $(u, v) \in E_D$ exists only if data actually moved from $u$ to $v$ during execution.

For impact analysis (determining what will break if a schema changes), $G_S$ is generally safer because it is conservative: it over-approximates the dependencies. For debugging (determining why a specific report is wrong today), $G_D$ is superior because it filters out theoretical paths that were not involved in the error.

## Integration in Modern Stacks

Effective reliability engineering rarely relies on a single method. An observability strategy typically layers these approaches.

You might use static analysis within your Continuous Integration (CI) pipeline. Tools like SQLFluff or dbt's manifest parser verify code changes before deployment, ensuring that a developer does not accidentally introduce a circular dependency or reference a deprecated table.

Simultaneously, you employ dynamic analysis in the production environment. Using a standard like OpenLineage, you emit events from the orchestrator (e.g., Airflow) as jobs complete. This confirms that the deployed code behaves as expected, and it captures runtime metrics such as row counts and byte sizes associated with the edges of the graph.

By combining $G_S$ and $G_D$, you achieve a comprehensive view: $G_S$ warns you of potential risks during development, while $G_D$ provides the operational context needed to resolve incidents in production. The sketches below illustrate each layer of this strategy.
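On the static side, one lightweight way to extract design-time edges in CI is to walk the manifest that dbt compiles for the project. A minimal sketch, assuming dbt's default `target/manifest.json` output location:

```python
import json

# dbt writes a manifest during compilation; this is its default location.
with open("target/manifest.json") as f:
    manifest = json.load(f)

# Every compiled node records the nodes it depends on: static,
# design-time edges that exist whether or not the model ever runs.
for node_id, node in manifest["nodes"].items():
    for upstream in node.get("depends_on", {}).get("nodes", []):
        print(f"{upstream} -> {node_id}")
```

Feeding these edges into a graph structure and rejecting the merge on a cycle is one way to catch the circular-dependency mistake described above before it reaches production.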
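On the dynamic side, a completed run can be reported as a lineage event. A minimal sketch, assuming the openlineage-python client; the endpoint URL, namespaces, and job and dataset names are placeholders:

```python
import uuid
from datetime import datetime, timezone

from openlineage.client import OpenLineageClient
from openlineage.client.run import Dataset, Job, Run, RunEvent, RunState

# Placeholder endpoint; in practice this points at your lineage backend.
client = OpenLineageClient(url="http://localhost:5000")

# Report one COMPLETE event: the job read raw_users and wrote clean_users.
client.emit(
    RunEvent(
        eventType=RunState.COMPLETE,
        eventTime=datetime.now(timezone.utc).isoformat(),
        run=Run(runId=str(uuid.uuid4())),
        job=Job(namespace="pipelines", name="load_clean_users"),
        producer="https://example.com/lineage-demo",
        inputs=[Dataset(namespace="warehouse", name="raw_users")],
        outputs=[Dataset(namespace="warehouse", name="clean_users")],
    )
)
```

In practice the orchestrator integration (e.g., the Airflow OpenLineage provider) emits these events automatically, so pipeline code rarely constructs them by hand.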
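Finally, once both graphs exist, the division of labor described above reduces to two standard graph queries: walk descendants in $G_S$ for impact analysis, and walk ancestors in $G_D$ for debugging. A sketch using networkx, with hypothetical assets:

```python
import networkx as nx

# Static graph G_S: every edge the code could exercise.
G_S = nx.DiGraph([
    ("raw_users", "clean_users"),
    ("clean_users", "user_report"),
    ("clean_users", "error_table"),  # conditional branch, rarely taken
])

# Dynamic graph G_D: edges actually observed in the current time window.
G_D = nx.DiGraph([
    ("raw_users", "clean_users"),
    ("clean_users", "user_report"),
])

# Impact analysis: everything that *could* break if raw_users changes.
print(nx.descendants(G_S, "raw_users"))  # clean_users, user_report, error_table

# Debugging: only what actually fed user_report during this window.
print(nx.ancestors(G_D, "user_report"))  # raw_users, clean_users
```

The conditional edge to `error_table` shows up only in the static answer, which is precisely the conservatism that makes $G_S$ the safer basis for impact analysis.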