II. Difference-in-Differences: What it is and what it isn’t.
DiD is a powerful econometric method used to assess the causal impact of a treatment or intervention by exploiting variation in both time and policy changes (often referred to as “treatment”) across different groups. This approach has gained widespread popularity in empirical research, particularly in economics, to evaluate the effectiveness of policies, programs, or other interventions. The essence of DiD lies in comparing the changes in outcome variables of interest (e.g., price) over time between a group that is exposed to the change in policy and a control group that is not (e.g., comparing different groups of consumers, different firms, different geographic regions). It gets its name “difference-in-differences” because it combines two sources of variation: a before-and-after comparison within each group, and a comparison between the affected and unaffected groups that supplies the counterfactual.
The key advantage of DiD is its ability to control for time-invariant unobservable factors that may influence the outcome of interest. By differencing out the common time trends between the groups that are and are not affected by a policy change (i.e., “treatment” and “control” groups), DiD isolates the treatment effect by focusing on the differential changes in outcomes that occur after the introduction of the treatment.
To implement DiD, researchers and practitioners typically observe outcomes (variables of interest) for both the treatment and control groups before and after the change in policy is introduced. The difference in average outcomes between the treatment and control groups before the treatment serves as a baseline. This baseline is then subtracted from the difference in average outcomes between the two groups after the treatment. The resulting estimate represents the causal effect of the treatment and is often referred to as the treatment effect.
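As a concrete illustration, the two-by-two version of this calculation can be written in a few lines. The sketch below uses simulated data and hypothetical column names (price, treated, post); it is meant only to show the mechanics, with the regression form yielding the same estimate plus standard errors.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: one row per unit-period, with a price outcome,
# a treated-group indicator, and a post-treatment indicator.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "treated": rng.integers(0, 2, n),
    "post": rng.integers(0, 2, n),
})
df["price"] = (100 + 5 * df["treated"] + 2 * df["post"]
               + 3 * df["treated"] * df["post"] + rng.normal(0, 1, n))

# "Difference in differences" computed directly from group means:
# (post - pre change for treated) minus (post - pre change for controls).
means = df.groupby(["treated", "post"])["price"].mean()
did = (means.loc[(1, 1)] - means.loc[(1, 0)]) - (means.loc[(0, 1)] - means.loc[(0, 0)])
print(f"DiD estimate from group means: {did:.2f}")

# The same estimate from a regression with an interaction term,
# which also provides standard errors for inference.
model = smf.ols("price ~ treated * post", data=df).fit()
print(model.params["treated:post"])
```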
DiD is versatile and applicable in various settings. For instance, in evaluating the impact of a policy change, researchers might compare employment rates, income levels, or other relevant outcomes before and after the policy implementation between regions or groups affected by the policy and those not affected. The DiD methodology has also been used in antitrust litigation in various settings. In merger analysis, for example, DiD has often been implemented to estimate retroactively the impact of past consolidations to inform future policy. The current Merger Guidelines also contemplate the use of historical data on similar transactions to inform the analysis of current filings, as shown in the excerpt below:
The Agencies may look for historical events to assess the presence and substantiality of direct competition between the merging firms. For example, the Agencies may examine the competitive impact of recent relevant mergers, entry, expansion, or exit events.
Despite its strengths, DiD is not immune to potential biases. Assumptions about parallel trends, meaning that the treatment and control groups would have followed similar trends in the absence of the treatment, need to be carefully considered. Violations of this assumption can lead to biased estimates, which will be discussed in more detail in the following section.
III. The DiD Renaissance
Choosing the right quantitative tool in an antitrust setting – including whether DiD is the right tool and its proper configuration – involves careful consideration of various factors to ensure the validity of the causal inference. This is particularly important given the Daubert standard, under which the admissibility of regression analyses and other scientific methods can be challenged. Courts have in the past rejected regression analyses that lacked an appropriate research design and/or were not an adequate tool for the issue being presented. Since biases in the canonical two-way fixed effects (TWFE) specification may arise from violations of distinct conditions, there is no single recipe in this DiD Renaissance. There are some excellent papers that summarize the recent advances in the literature. Examples of situations where an alternative DiD specification may be needed, and which formulations in the literature can be applied, are discussed below.
1. Multiple periods and/or variation in treatment timing
Unlike in the canonical DiD setting, there may be situations where a simple pre- and post-treatment formulation is not enough to capture the dynamics, especially when different units are exposed to treatment at different times. A company’s pricing policy may go into effect in distinct regions at different times, for example, as opposed to being launched simultaneously. There might be a need to study the effect of successive acquisitions by the same company in different markets. A firm may choose to roll out a new policy to distinct groups of stakeholders at different times. The bias of the estimates obtained by applying the standard TWFE will be particularly problematic when there is heterogeneity in the treatment effect over time, as the TWFE coefficient may not properly represent a weighted average of unit-level treatment effects. A few solutions have been proposed in the literature. In principle, they all share the idea of estimating several different effects, allowing flexibility of effects within cohorts and time periods, and then re-combining them to obtain an average effect, although the approach varies. One could, for example, use matching in each period to pick the best control group (only units not yet treated at that period are candidates), and once the control groups are picked one can essentially proceed as usual.
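The following is a stylized sketch of this estimate-then-aggregate idea for staggered adoption: for each adoption cohort and post period, the change in the outcome is compared against units not yet treated at that period, and the resulting cell-level effects are averaged. The data, cohorts, and column names are hypothetical, and actual applications would use one of the dedicated estimators from the recent literature.

```python
import numpy as np
import pandas as pd

# Hypothetical balanced panel: each market adopts the new pricing policy in a
# different period ("cohort"); cohort == 0 means never treated.
rng = np.random.default_rng(1)
markets, periods = 60, 8
df = pd.DataFrame(
    [(m, t) for m in range(markets) for t in range(1, periods + 1)],
    columns=["market", "period"],
)
df["cohort"] = df["market"].map({m: rng.choice([0, 4, 6]) for m in range(markets)})
treated_now = (df["cohort"] > 0) & (df["period"] >= df["cohort"])
df["price"] = 100 + 0.5 * df["period"] + 2.0 * treated_now + rng.normal(0, 1, len(df))

# Estimate-then-aggregate: for each cohort g and post period t, compare the change
# in mean prices from the pre-period g-1 to t for cohort g against the same change
# for units not yet treated at t, then average over the (g, t) cells.
att_cells = []
for g in sorted(df.loc[df["cohort"] > 0, "cohort"].unique()):
    for t in range(g, periods + 1):
        base = g - 1
        treat = df["cohort"] == g
        control = (df["cohort"] == 0) | (df["cohort"] > t)  # not yet treated at t
        d_treat = (df.loc[treat & (df["period"] == t), "price"].mean()
                   - df.loc[treat & (df["period"] == base), "price"].mean())
        d_ctrl = (df.loc[control & (df["period"] == t), "price"].mean()
                  - df.loc[control & (df["period"] == base), "price"].mean())
        att_cells.append({"cohort": g, "period": t, "att": d_treat - d_ctrl,
                          "n_treated": int(treat.sum() / periods)})

cells = pd.DataFrame(att_cells)
# Simple aggregation: weight each (cohort, period) effect by cohort size.
overall = np.average(cells["att"], weights=cells["n_treated"])
print(cells.head())
print(f"Aggregated ATT: {overall:.2f}")
```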
2. Parallel trends: the elephant in the room
As with any model, DiD relies on a number of assumptions, one of them being what is commonly referred to as parallel trends. This assumption requires that the treatment and control groups would have followed similar trends over time in the absence of treatment. In practice, this means that, for example, absent a merger and everything else held constant, prices in markets where both merging parties are present (treatment group) and markets where at least one of them is not (control group) would have trended in a similar fashion.
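Stated in generic potential outcomes notation (a standard formulation, not tied to any particular case), where D indicates membership in the treatment group and Y(0) denotes the untreated potential outcome, the assumption is:

```latex
% Parallel trends: absent treatment, average untreated outcomes would have moved
% in parallel for the treatment group (D = 1) and the control group (D = 0).
E\big[\,Y_{\text{post}}(0) - Y_{\text{pre}}(0) \mid D = 1\,\big]
  = E\big[\,Y_{\text{post}}(0) - Y_{\text{pre}}(0) \mid D = 0\,\big]
```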
a. Testing
The current standard for testing the parallel trends assumption simply verifies whether the average outcome of interest followed a similar trend prior to the change in policy/treatment. In our example above, this means that practitioners would simply verify whether pre-merger prices in these different groups of markets followed a similar trend. However, as validly pointed out in the literature, there are four main issues with this approach: i) parallel pre-trends do not guarantee that post-trends would be parallel absent treatment; ii) the tests being used to assess whether pre-trends are parallel often suffer from low statistical power; iii) since the data used for pre-trend tests are selected rather than truly random (an inherent feature of data in antitrust cases), there may be bias; iv) some practitioners will still move forward with the DiD analysis even if parallel trends is violated, believing the analysis will still be informative. At the very least, practitioners should be implementing improved diagnostic tools to assess whether one or more of the issues presented above are likely to be present. Additional strategies can also help, such as partial identification, in which two control groups are assumed to bracket the trend of the treatment group. A practitioner could also implement bounds using pre-trends, which essentially quantify the idea that trends should not be “too far apart.”
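As an illustration of the standard pre-trend check the caveats above refer to, one can run an event-study regression that interacts the treated indicator with each period dummy and jointly test whether the pre-treatment (lead) coefficients are zero. The panel, event date, and column names below are hypothetical, and the test inherits the low-power concern noted above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical panel: markets observed over several pre- and post-merger periods.
rng = np.random.default_rng(2)
markets, periods, event_period = 40, 10, 6
df = pd.DataFrame(
    [(m, t) for m in range(markets) for t in range(1, periods + 1)],
    columns=["market", "period"],
)
df["treated"] = (df["market"] < markets // 2).astype(int)
df["post"] = (df["period"] >= event_period).astype(int)
df["price"] = (100 + 0.4 * df["period"] + 2 * df["treated"]
               + 3 * df["treated"] * df["post"] + rng.normal(0, 1, len(df)))

# Event-study regression: treated indicator interacted with period dummies,
# omitting the last pre-treatment period as the reference category.
ref = event_period - 1
model = smf.ols(
    f"price ~ treated * C(period, Treatment(reference={ref}))", data=df
).fit(cov_type="cluster", cov_kwds={"groups": df["market"]})

# Joint test that all pre-treatment lead coefficients are zero
# (the usual, and low-powered, check that pre-trends look parallel).
lead_terms = [name for name in model.params.index
              if name.startswith("treated:")
              and int(name.split("T.")[1].rstrip("]")) < ref]
R = np.zeros((len(lead_terms), len(model.params)))
for i, term in enumerate(lead_terms):
    R[i, model.params.index.get_loc(term)] = 1.0
print(model.f_test(R))
```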
b. What if parallel trends are violated?
A violation of the parallel trends assumption need not be the end of a DiD analysis, but it does require adjusting one’s specification, as the TWFE linear regression will no longer produce consistent estimates merely by incorporating unit and time fixed effects. If the violation of parallel trends arises from an observable factor, it is possible to extend the assumption to conditional parallel trends: parallel trends holds after conditioning on variables that are observable pre-treatment. The literature has proposed several ways to operationalize conditional parallel trends, such as: i) regression adjustment, which entails including additional observable and measurable characteristics of each unit (covariates) in the regression model to control for potential confounding factors and allows for a more nuanced analysis of the variable of interest; ii) inverse probability weighting, which explicitly models the probability that each unit belongs to the treated or control group given some covariates; iii) doubly robust estimators, which combine both of the previous methods.
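The second approach can be sketched in a few lines. The example below is a minimal inverse probability weighting illustration on simulated data: the outcome is expressed as each unit's pre-to-post change, a hypothetical covariate (size) drives both treatment take-up and trends, and untreated units are reweighted to match the covariate mix of the treated group. The covariate, column names, and data are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical data: delta_price is each unit's pre-to-post change in price;
# "size" affects both the probability of treatment and the underlying trend,
# so unconditional parallel trends fails but conditional parallel trends holds.
rng = np.random.default_rng(3)
n = 1000
size = rng.normal(0, 1, n)
treated = (rng.random(n) < 1 / (1 + np.exp(-size))).astype(int)
delta_price = 1.0 * size + 2.0 * treated + rng.normal(0, 1, n)  # true effect = 2
df = pd.DataFrame({"size": size, "treated": treated, "delta_price": delta_price})

# Step 1: model the probability of treatment given the covariate (propensity score).
ps_model = LogisticRegression().fit(df[["size"]], df["treated"])
df["pscore"] = ps_model.predict_proba(df[["size"]])[:, 1]

# Step 2: reweight untreated units so their covariate mix matches the treated group,
# then compare average outcome changes (an IPW version of conditional parallel trends).
w = np.where(df["treated"] == 1, 1.0, df["pscore"] / (1.0 - df["pscore"]))
att = (df.loc[df["treated"] == 1, "delta_price"].mean()
       - np.average(df.loc[df["treated"] == 0, "delta_price"],
                    weights=w[df["treated"] == 0]))
print(f"IPW DiD estimate of the treatment effect: {att:.2f}")
```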
3. Other recent developments in DiD
Other areas, such as sampling assumptions, treatment timing, spillover effects, conditional treatment, and distributional treatment have also received special attention lately in the literature and will be briefly mentioned below.
Inference without large samples: canonical TWFE DiD inference relies on researchers and practitioners having access to large numbers of both treated and untreated clusters, as confidence intervals are based on the central limit theorem. In many settings, however, especially in litigation, the number of independent clusters may be small – e.g., the number of markets or the number of firms – and confidence intervals may provide a poor approximation. Most proposed solutions to this issue attempt to model the dependence within clusters, that is, to understand how the errors of units in the same group are related, and to model that dependence accordingly.
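A minimal sketch of the first step, clustering standard errors at the market level with statsmodels, is shown below on hypothetical data. Note that even this correction presumes a reasonably large number of clusters; with only a handful of markets or firms, practitioners typically turn to small-sample refinements such as a wild cluster bootstrap (available in dedicated packages), which is not shown here.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical panel with relatively few markets: errors are correlated within
# each market, so observations are not independent across rows.
rng = np.random.default_rng(4)
markets, periods = 12, 20
df = pd.DataFrame(
    [(m, t) for m in range(markets) for t in range(1, periods + 1)],
    columns=["market", "period"],
)
df["treated"] = (df["market"] < markets // 2).astype(int)
df["post"] = (df["period"] > periods // 2).astype(int)
market_shock = df["market"].map({m: rng.normal(0, 2) for m in range(markets)})
df["price"] = (100 + 2 * df["treated"] + df["post"]
               + 1.5 * df["treated"] * df["post"]
               + market_shock + rng.normal(0, 1, len(df)))

# Naive OLS standard errors ignore the within-market correlation; clustering by
# market is the usual first remedy, but it relies on having many clusters.
ols = smf.ols("price ~ treated * post", data=df).fit()
clustered = smf.ols("price ~ treated * post", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["market"]}
)
print("Naive SE:     ", float(ols.bse["treated:post"]))
print("Clustered SE: ", float(clustered.bse["treated:post"]))
# With only ~12 clusters the clustered SEs themselves may be unreliable;
# small-sample corrections (e.g., a wild cluster bootstrap) are the common next step.
```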
Quasi-random treatment timing: in settings where treatment is staggered (refer to Section III.1. above), parallel trends is often assumed, with the justification that the timing of treatment is random or quasi-random. Randomness in treatment timing plays an important role in DiD, as it helps isolate the causal effect of the treatment from other potential factors – without random timing, there is a risk that factors other than the treatment itself may influence the outcomes being measured. Returning to our previous example, one could argue that the timing at which a merger affected different markets was random or quasi-random, or that the timing at which a change in policy was rolled out to different groups of customers was random or quasi-random. This assumption has been scrutinized and addressed in the literature. Proposed solutions so far have shown under which conditions one can obtain more efficient estimates than staggered DiD methods, and have proposed settings where treatment timing is conditional on fixed observable characteristics (e.g., there are some observable variables about these markets, such as consumer preferences or available technology, that, once controlled for, make timing random). This is still an area where further research and development is likely.
Spillover effects: most of the DiD literature imposes what is known as the stable unit treatment value assumption (SUTVA), which posits that the potential outcomes of a unit are unaffected by the treatment assignment of other units – in other words, the variable of interest for a unit depends only on whether that unit itself has been exposed to the policy change, which guarantees independence and essentially rules out any spillover effects. In our earlier example, customers can only be affected if the change in policy has been rolled out in their market, but ought to be unaffected if their policy remains the same, all else held constant. However, if individuals are connected by a network, for example, there might be some spillover effects. A growing literature has already accounted for some extensions of the general framework, but there will likely be many more developments in this area, which may particularly affect how antitrust litigation views competition when platforms are involved. One might consider, for example, how a change in Gen AI policy applicable only to European markets starts affecting the way companies conduct business in the United States, despite there being no change in policy in the U.S. market.
Developments have also been made in conditional treatment effects, which seek to investigate how average treatment effects vary across subpopulations given some observable characteristics (one could investigate, for example, how the magnitude of price effects varies across distinct regions or distinct groups of customers once they are subject to a policy change); a stylized sketch of this idea follows below. Other research has focused on distributional treatment effects, which, instead of focusing solely on the average treatment effect, are interested in the entire distribution of an outcome. This may be particularly important if one is interested, for example, in how the price effect evolved over time.
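One simple way to see how a treatment effect varies across subgroups is to interact the DiD term with a subgroup indicator, as in the sketch below. The customer segments, column names, and data are hypothetical; richer conditional effect estimators exist in the literature, and this is only the most basic version of the idea.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: the price effect of the policy differs between two
# customer segments ("retail" vs. "enterprise").
rng = np.random.default_rng(5)
n = 2000
df = pd.DataFrame({
    "treated": rng.integers(0, 2, n),
    "post": rng.integers(0, 2, n),
    "segment": rng.choice(["retail", "enterprise"], n),
})
effect = np.where(df["segment"] == "enterprise", 4.0, 1.0)  # heterogeneous effect
df["price"] = (100 + 2 * df["treated"] + df["post"]
               + effect * df["treated"] * df["post"] + rng.normal(0, 1, n))

# Interacting the DiD term with the segment indicator yields segment-specific
# (conditional) treatment effects rather than a single pooled number:
# the base interaction is the effect for the omitted segment, and the three-way
# interaction is the additional effect for the other segment.
model = smf.ols("price ~ treated * post * segment", data=df).fit()
print(model.params.filter(like="treated:post"))
```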
IV. Conclusion
In conclusion, Difference-in-Differences remains a valuable tool for estimating causal effects, offering a quasi-experimental approach to understanding and estimating the economic implications of alleged anticompetitive practices. Recent econometric developments have significantly enhanced the method's applicability, addressing concerns related to control group selection, unobserved heterogeneity, and parallel trends. By incorporating appropriate adjustments to their DiD specification, antitrust experts can improve the robustness of their estimates, ensuring that antitrust enforcement remains grounded in sound economic principles and evidence-based reasoning. As econometrics continues to evolve, it is paramount that practitioners stay up to date with state-of-the-art quantitative techniques, allowing DiD analysis to contribute to more accurate and reliable causal inference in antitrust cases.