When dealing with observational data where unobserved variables might influence both the treatment assignment and the outcome, standard methods like regression adjustment or matching can yield biased estimates of causal effects. As discussed earlier in this chapter, techniques like Instrumental Variables (IV) and Regression Discontinuity Designs (RDD) offer solutions under specific structural assumptions. Difference-in-Differences (DiD) provides another powerful approach within the quasi-experimental toolkit, particularly well-suited for situations where you have panel data, meaning repeated observations of the same units (individuals, firms, regions, etc.) over time.
The core idea behind DiD is to exploit variations in treatment timing and status across units and time periods to control for certain types of unobserved confounding. Specifically, DiD is adept at handling time-invariant unobserved characteristics of units that might affect both their likelihood of receiving treatment and their outcome levels.
Imagine the simplest scenario: two groups of units, one designated as the "treatment" group and the other as the "control" group, observed over two time periods, one "pre-treatment" and one "post-treatment". The treatment is introduced only to the treatment group in the post-treatment period.
Let $Y_{it}$ be the outcome for unit $i$ at time $t$. Let $T_i$ be an indicator variable that is 1 if unit $i$ belongs to the treatment group and 0 if it belongs to the control group. Let $P_t$ be an indicator variable that is 1 for the post-treatment period and 0 for the pre-treatment period.
The DiD estimator calculates the difference in the average change in the outcome over time between the treatment and control groups.
$$\hat{\delta}_{DiD} = \underbrace{(\bar{Y}_{T=1,P=1} - \bar{Y}_{T=1,P=0})}_{\text{Change for Treated}} - \underbrace{(\bar{Y}_{T=0,P=1} - \bar{Y}_{T=0,P=0})}_{\text{Change for Control}}$$

Intuitively, we observe how the outcome changed for the treated group after the treatment was introduced. To figure out how much of this change was due to the treatment versus other factors happening over time, we look at how the outcome changed for the control group during the same period. The assumption is that the control group's change reflects the trends or common shocks that would have also affected the treatment group if they hadn't received the treatment. Subtracting the control group's change isolates the estimated effect of the treatment.
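To make the arithmetic concrete, here is a minimal sketch of the 2x2 calculation using pandas. The group means are made-up numbers and the column names are illustrative, not taken from any real study.

```python
import pandas as pd

# Hypothetical group-period means (values are made up for illustration).
df = pd.DataFrame({
    "group":   ["treated", "treated", "control", "control"],
    "period":  ["pre", "post", "pre", "post"],
    "outcome": [10.0, 14.0, 9.0, 11.0],
})

means = df.set_index(["group", "period"])["outcome"]

# Change over time within each group.
change_treated = means.loc[("treated", "post")] - means.loc[("treated", "pre")]  # 4.0
change_control = means.loc[("control", "post")] - means.loc[("control", "pre")]  # 2.0

# Difference of the two changes: the DiD estimate.
did_estimate = change_treated - change_control  # 2.0
print(f"DiD estimate: {did_estimate}")
```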
The validity of the DiD estimator hinges critically on the parallel trends assumption. This assumption states that, in the absence of the treatment, the average outcome for the treatment group would have followed the same trend as the average outcome for the control group.
Formally, using potential outcomes notation where $Y_i(0)$ is the outcome unit $i$ would have if not treated:

$$E[Y_i(0) \mid T_i=1, P_t=1] - E[Y_i(0) \mid T_i=1, P_t=0] = E[Y_i(0) \mid T_i=0, P_t=1] - E[Y_i(0) \mid T_i=0, P_t=0]$$

This is an assumption about the counterfactual trend of the treated group. It does not require the groups to have the same outcome levels in the pre-treatment period, only that their trends would have been parallel without the intervention. This assumption allows us to use the observed change in the control group as a proxy for the counterfactual change in the treated group.
Visualization of the parallel trends assumption. The solid blue line shows the control group's outcome over time. The solid red line shows the treated group's observed outcome. The dashed red line illustrates the counterfactual trend the treated group would have followed if the parallel trends assumption holds and they hadn't received treatment. The DiD estimate is the vertical difference between the observed outcome and the counterfactual trend in the post-treatment period.
Since this assumption involves counterfactuals, it cannot be directly tested. However, we can assess its plausibility:

- Inspect pre-treatment trends: with multiple pre-treatment periods, plot the average outcomes of both groups over time. Roughly parallel paths before the intervention lend the assumption credibility.
- Run placebo tests: estimate the DiD model on pre-treatment data only, using a fake treatment date. A significant "effect" where none should exist is a warning sign.
- Use event-study specifications: estimate leads of the treatment (discussed later in this section) and check that the pre-treatment coefficients are close to zero.
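The sketch below illustrates the first two checks on a small simulated panel. The column names (unit, time, treated_group, outcome), the treatment start period, and the data-generating process are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate a small panel with parallel pre-trends (illustrative data only).
rng = np.random.default_rng(0)
units, periods, start = 100, 8, 5          # treatment begins at time 5
rows = []
for i in range(units):
    treated_group = int(i < units // 2)
    for t in range(periods):
        y = (2.0 * treated_group                      # level difference between groups
             + 0.5 * t                                # common time trend
             + 1.5 * treated_group * (t >= start)     # true treatment effect
             + rng.normal(scale=0.5))
        rows.append((i, t, treated_group, y))
panel = pd.DataFrame(rows, columns=["unit", "time", "treated_group", "outcome"])

# Visual/numeric check: average outcomes by group in the pre-treatment periods.
pre = panel[panel["time"] < start]
print(pre.groupby(["time", "treated_group"])["outcome"].mean().unstack())

# Placebo test: pretend treatment started at time 3, using pre-period data only.
pre = pre.assign(fake_post=(pre["time"] >= 3).astype(int))
placebo = smf.ols("outcome ~ treated_group * fake_post", data=pre).fit()
print(placebo.params["treated_group:fake_post"])  # should be close to zero
```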
While the 2x2 calculation is intuitive, DiD is more commonly implemented using a regression framework, which readily extends to multiple groups, multiple time periods, and the inclusion of covariates.
For the 2x2 case, the regression model is:
$$Y_{it} = \beta_0 + \beta_1 T_i + \beta_2 P_t + \delta (T_i \times P_t) + \epsilon_{it}$$

Here:

- $\beta_0$ is the average outcome for the control group in the pre-treatment period.
- $\beta_1$ captures the pre-treatment difference in levels between the treatment and control groups.
- $\beta_2$ captures the common change over time experienced by the control group.
- $\delta$, the coefficient on the interaction term, is the DiD estimate of the treatment effect.
- $\epsilon_{it}$ is the error term.
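A minimal sketch of this regression with statsmodels follows. The simulated data and the variable names treated_group and post are illustrative assumptions, chosen so the true interaction effect is known; the coefficient on the interaction term reproduces the 2x2 difference of group-mean changes.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate simple 2x2-style data (illustrative only; true DiD effect = 2.0).
rng = np.random.default_rng(1)
n = 500
treated_group = rng.integers(0, 2, size=n)
post = rng.integers(0, 2, size=n)
outcome = (1.0 + 0.8 * treated_group + 0.5 * post
           + 2.0 * treated_group * post
           + rng.normal(scale=1.0, size=n))
df = pd.DataFrame({"treated_group": treated_group, "post": post, "outcome": outcome})

# The interaction coefficient is the DiD estimate of delta.
model = smf.ols("outcome ~ treated_group * post", data=df).fit()
print(model.params["treated_group:post"])  # close to 2.0
```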
For panel data with many units ($i = 1, \dots, N$) and time periods ($t = 1, \dots, K$), a more general and robust approach uses two-way fixed effects:
$$Y_{it} = \alpha_i + \lambda_t + \delta T_{it} + X_{it}'\beta + \epsilon_{it}$$

Here, $\alpha_i$ are unit fixed effects (absorbing all time-invariant characteristics of each unit), $\lambda_t$ are time fixed effects (absorbing shocks common to all units in period $t$), $T_{it}$ is an indicator equal to 1 when unit $i$ is treated at time $t$, and $X_{it}$ is an optional vector of time-varying covariates. The parallel trends assumption in this context means that, conditional on $X_{it}$ and after accounting for unit and time fixed effects, the trends would be parallel between treated and control units in the absence of treatment.
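A sketch of estimating this two-way fixed effects model with linearmodels (one of the Python packages mentioned later in this section). The simulated panel, its column names, and the choice to cluster standard errors by unit are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from linearmodels.panel import PanelOLS

# Simulate a panel where half the units become treated at time 5 (illustrative).
rng = np.random.default_rng(2)
units, periods, start = 100, 8, 5
idx = pd.MultiIndex.from_product([range(units), range(periods)], names=["unit", "time"])
df = pd.DataFrame(index=idx).reset_index()
df["treated_group"] = (df["unit"] < units // 2).astype(int)
df["treat"] = ((df["treated_group"] == 1) & (df["time"] >= start)).astype(int)
df["x"] = rng.normal(size=len(df))
df["y"] = (0.05 * df["unit"]            # unit heterogeneity correlated with treatment
           + 0.4 * df["time"]           # common time trend
           + 1.5 * df["treat"]          # true effect of the treatment
           + 0.3 * df["x"]
           + rng.normal(scale=0.5, size=len(df)))

# PanelOLS expects a (entity, time) MultiIndex.
df = df.set_index(["unit", "time"])
model = PanelOLS.from_formula("y ~ treat + x + EntityEffects + TimeEffects", data=df)
result = model.fit(cov_type="clustered", cluster_entity=True)
print(result.params["treat"])   # estimate of delta, close to 1.5
```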
While the two-way fixed effects model is powerful, recent research has highlighted potential issues, especially when treatment timing varies across units (staggered adoption).
Staggered Adoption Bias: When units adopt treatment at different times, the standard two-way fixed effects estimator of $\delta$ can be biased. This bias arises because the estimator implicitly uses already-treated units as controls for later-treated units, and vice-versa, potentially leading to misleading estimates (sometimes even negative weights on certain comparisons). Work by Goodman-Bacon (2021), Callaway & Sant'Anna (2021), and Sun & Abraham (2021) details these issues.
Robust Estimators: Newer estimators address these problems, such as those implemented in the did package in R/Stata (implementing Callaway & Sant'Anna) or the sunab function within the fixest R package (implementing Sun & Abraham). These methods typically define comparisons more carefully, often comparing newly treated groups only to never-treated or not-yet-treated groups.

Dynamic Treatment Effects (Event Studies): The treatment effect might not be constant; it could evolve over time after treatment initiation. To estimate these dynamics, we replace the single $\delta T_{it}$ term with a series of indicators relative to the timing of treatment:
$$Y_{it} = \alpha_i + \lambda_t + \sum_{k=k_{\min}}^{k_{\max}} \delta_k D_{it}^k + X_{it}'\beta + \epsilon_{it}$$

Here, $D_{it}^k$ is an indicator that equals 1 if unit $i$ at time $t$ has been treated for $k$ periods (where $k$ can be negative for pre-treatment periods, 0 for the treatment initiation period, and positive for post-treatment periods). One relative-time period, typically $k = -1$, is omitted as the reference category, and the estimated pre-treatment coefficients $\delta_k$ for $k < 0$ provide a direct check on pre-trends.
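A minimal event-study sketch using statsmodels with hand-built relative-time dummies. The staggered adoption dates, column names, and dummy window are illustrative assumptions, and in practice the robust estimators mentioned above are often preferable when adoption is staggered.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate a staggered-adoption panel (illustrative data only).
rng = np.random.default_rng(3)
units, periods = 120, 10
adopt = rng.choice([4, 6, np.inf], size=units)            # np.inf = never treated
rows = []
for i in range(units):
    for t in range(periods):
        k = t - adopt[i] if np.isfinite(adopt[i]) else np.nan   # relative time
        effect = 1.0 + 0.3 * k if (np.isfinite(adopt[i]) and k >= 0) else 0.0
        y = 0.1 * i + 0.4 * t + effect + rng.normal(scale=0.5)
        rows.append((i, t, k, y))
df = pd.DataFrame(rows, columns=["unit", "time", "rel_time", "y"])

# Build event-time dummies, omitting k = -1 as the reference category.
# Never-treated units (rel_time = NaN) get 0 for every dummy.
for k in range(-6, 6):
    if k == -1:
        continue
    name = f"lead{-k}" if k < 0 else f"lag{k}"
    df[name] = (df["rel_time"] == k).astype(float)

dummies = [c for c in df.columns if c.startswith(("lead", "lag"))]
formula = "y ~ " + " + ".join(dummies) + " + C(unit) + C(time)"
res = smf.ols(formula, data=df).fit()
print(res.params[dummies])   # estimated delta_k path; leads should be near zero
```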
Packages such as statsmodels and linearmodels in Python, or fixest, lfe, and did in R, are commonly used for estimating fixed effects models and robust DiD variations.

Difference-in-Differences is a widely used and effective technique for estimating causal effects using panel data, particularly when you are concerned about time-invariant unobserved confounders. Its strength lies in the intuitive parallel trends assumption.
However, remember its key aspects and limitations:

- The parallel trends assumption cannot be tested directly; its plausibility must be argued and probed with pre-trend plots, placebo tests, and event-study leads.
- Standard two-way fixed effects estimates can be misleading when treatment adoption is staggered and effects are heterogeneous; prefer the robust estimators discussed above in those settings.
- Treatment effects may evolve over time, so event-study specifications that trace out dynamic effects are often more informative than a single average effect.
- DiD controls for time-invariant unobserved confounders, but not for time-varying shocks that affect treated and control groups differently.
By understanding these principles and potential complications, you can effectively apply DiD and its modern extensions to draw more credible causal conclusions from panel data in complex machine learning systems and beyond.