In this article we group econometric techniques used by antitrust economists into three broad categories that generally follow a continuum of their specific connection to economic theory:
- Summary statistics and simple data depictions allow data to “speak” on their own without being tied to economic theory. These analyses can be easier for a judge or jury to understand and often serve as a useful starting point in demonstrating consistency between data and relevant economic and legal theories.
- Regression analysis is a common statistical tool that looks at the relationships within historical data and, if appropriately integrated with economic theory, can be used to make but-for world projections––that is, predictions about worlds that did not exist (the world following an unconsummated merger, for example). Regression results, however, relate to the functioning of the market only to the extent that the econometrician designs them to do so. Often, such designs are not explicit and, as such, the interpretation of regressions must be done with an understanding of the underlying assumptions. Further, their ability to make predictions about but-for worlds may be limited.
- Structural models typically begin with a mathematical model of the market (supply and demand) that can encompass both the actual and but-for worlds, using statistical tools to understand the effects of changing actions of the different players or other forces within the market. The result is an “estimated” model that can predict outcomes in relevant counterfactual worlds, such as those following a change in conduct or policy. A structural model requires specific assumptions that are subject to critique (similar to the critique of assumptions in regression analyses) and is more difficult to implement than a regression.
By understanding the benefits and limitations of these categories of econometric analyses, a practitioner is well on the way to comprehending the possibilities of antitrust econometrics—and more useful and enjoyable meetings with economists.
Summary Statistics and Simple Data Depictions
It is important that economists can describe data in a way that accords with the basic facts of a case before applying the filter of an economic model. Being transparent about the complexity and quality of the data can enhance understanding of the underlying economics of the market and bolster confidence in the predictions of economic models that employ the data as an input.
Descriptive empirical work is useful for at least two purposes: (1) evaluating the integrity of the data and revealing any concerns that may undermine economic inferences based on them; and (2) showing that patterns in the data are consistent with inferences from other economic evidence, including more sophisticated economic modeling and econometrics. As described further below, economic modeling and econometric estimation can usefully illuminate the likely economic implications of firm behavior.
However, the accuracy of empirical predictions depends on the accuracy of the data upon which they rely. An economist should begin by assessing the utility of the data themselves, then present descriptive work to build confidence in the analysis among reviewers. For example, an economic model may be more convincing if the prices and product characteristics upon which the model’s predictions depend have sensible values (e.g., average values are consistent with other information sources and the data vary in ways that are easily understood). Although this step may seem mundane, it is a fundamental building block in developing confidence in the process of making predictions.
The value of descriptive work is not limited to showing data integrity; simple empirical analyses also can support economic hypotheses and test the content of company documents and statements. For example, a simple comparison of before-and-after prices can be combined with other evidence to demonstrate that a historical merger is associated with price increases. Such a depiction can then be explored with a regression analysis that considers alternative explanations for the pattern observed in the raw data. These mutually reinforcing analyses often are more powerful than either approach on its own. But, as we discuss further below, descriptive work cannot make definitive statements about causal economic hypotheses––e.g., previous mergers caused price increases––and thus simple descriptive work should be combined with additional evidence before reaching conclusions about the impacts of market conduct.
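As a minimal sketch of the before-and-after comparison described above, the following Python snippet summarizes prices on either side of a merger. The price figures and the helper name `summarize` are hypothetical and chosen only for illustration; a real analysis would start from the transaction data in the case.

```python
from statistics import mean, stdev

# Hypothetical monthly average prices (dollars) around a merger; illustrative only.
prices_before = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2]
prices_after = [10.9, 11.2, 10.8, 11.1, 11.0, 11.3]

def summarize(prices):
    """Return basic descriptive statistics for a list of prices."""
    return {
        "n": len(prices),
        "mean": mean(prices),
        "sd": stdev(prices),
        "min": min(prices),
        "max": max(prices),
    }

before = summarize(prices_before)
after = summarize(prices_after)

# Percent change in the average price from the pre-merger to the post-merger period.
pct_change = 100 * (after["mean"] - before["mean"]) / before["mean"]

print("before:", before)
print("after: ", after)
print(f"change in mean price: {pct_change:.1f}%")
```

A table like this shows an association only. It says nothing about why prices moved, which is why such a comparison is a starting point rather than a conclusion.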
Regression Analysis and Reduced Form Models
Regression models are workhorse statistical tools in social sciences, hard sciences, and engineering. In each of these settings, a regression can be considered as an examination of the correlation (or relationship) between an outcome of interest––e.g., price (the dependent variable)—and factors that are theoretically related to the outcome’s variation (the explanatory variables).
But a regression in a litigation setting often is considered to be more than just a correlation analysis—typically, we are interested in knowing the impact a change in an explanatory variable has on the outcome of interest. For instance, in a price-fixing matter, the economist is interested in knowing whether prices (the outcome) were systematically different during a time period when there was evidence of collusion (an explanatory variable), accounting for other factors that may have affected prices, such as changes to the cost of inputs to the relevant products (other explanatory variables). The analysis could first present average prices by time period, then show that certain input costs also increased during the period of the alleged misconduct. We would expect prices to rise independently of the alleged misconduct because of increases to the firms’ costs, and thus a regression is necessary to hold constant other factors (such as input costs) in order to test the validity of the hypothesis that the observed price increases were caused by alleged price fixing.
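The role of the cost control can be sketched with simulated data. In the Python example below, every number is an assumption made for illustration: the data-generating process, the size of the conduct effect (0.8), and the pass-through of costs (1.5) are invented, and the regression is plain ordinary least squares computed with NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 120                                          # hypothetical months of data
collusion = (np.arange(n) >= 60).astype(float)   # 1 during the alleged conduct period

# Assumption: input costs also happen to be higher during the conduct period.
cost = 5.0 + 1.0 * collusion + rng.normal(0, 0.5, n)

# Assumed "true" process: each $1 of cost adds $1.50 to price; the conduct adds $0.80.
price = 2.0 + 1.5 * cost + 0.8 * collusion + rng.normal(0, 0.3, n)

# Naive comparison of average prices: mixes the conduct effect with the cost increase.
naive_gap = price[collusion == 1].mean() - price[collusion == 0].mean()

# OLS with an intercept, the conduct indicator, and the cost control.
X = np.column_stack([np.ones(n), collusion, cost])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)

print(f"naive difference in average prices: {naive_gap:.2f}")
print(f"regression estimate of conduct effect: {coef[1]:.2f}")
print(f"regression estimate of cost pass-through: {coef[2]:.2f}")
```

In this simulation the naive difference in averages overstates the conduct effect because it absorbs the coincident cost increase, while the regression coefficient on the conduct indicator should land near the assumed value.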
Regression models vary in the degree to which they integrate economic theory. Some regressions are descriptive, in which case economic theory motivates the factors that are included in the regression equation and the expected direction that those factors may push the outcome of interest. (In the example above we posited that higher costs cause higher prices.) Other reduced form regression models are derived from an economic model. We first discuss descriptive regressions and then turn to reduced form regressions.
Descriptive Regressions
Fundamentally, regressions are a succinct way of describing relationships among variables—for example, a regression can isolate that, conditional on other factors, on average an additional bathroom is associated with a $5,000 higher sale price of a home. A regression always provides correlations between variables but it does not always measure a causal relationship, which is what typically is of interest to competition agencies and courts. Misunderstanding the difference between correlation and causation is common. Underlying theory can provide a causal foundation for a regression model, but for this explanation to be reasonable, other potential causal theories must be considered by explicitly accounting for them in the regression model or by excluding them as irrelevant for sound economic reasons.
A classic antitrust example of this common misunderstanding is the relationship between prices and market concentration. Many empirical studies have shown prices to be positively correlated with market concentration—that is, higher prices are associated with higher levels of concentration. This is consistent, for example, with economic theories that horizontal mergers among competing firms can cause price increases. However, positive correlation between prices and market concentration does not prove that higher concentration caused higher prices. Statistically, positive correlation between prices and concentration is equally consistent with the opposite causation—i.e., that high prices caused high concentration.
Consider the following example: A firm is considering entry into two markets, one with high operating costs (such as labor or rent) and one with low operating costs. All else being equal, the firm will choose to enter the market with low operating costs and thus, in equilibrium, more firms will enter the market with lower operating costs. Hence, prices generally are lower in the market with more competition. However, it was the higher costs of operation (and thus higher prices) that caused fewer firms to enter high-cost markets, leading to greater concentration. Therefore, although a regression may show positive correlation between prices and market concentration, it does not necessarily mean that higher market concentration caused higher prices.
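The entry example can be illustrated with a small simulation. In the sketch below, concentration has no direct effect on price by construction; operating cost drives both. All parameter values and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_markets = 500
cost = rng.uniform(1.0, 3.0, n_markets)   # operating cost in each market

# Assumption: higher costs deter entry, so concentration rises with cost.
concentration = 0.1 + 0.25 * cost + rng.normal(0, 0.1, n_markets)

# Assumption: price depends on cost only; concentration does NOT enter.
price = 1.0 + 2.0 * cost + rng.normal(0, 0.2, n_markets)

# Raw correlation between price and concentration.
raw_corr = np.corrcoef(price, concentration)[0, 1]

# Regression of price on concentration, controlling for cost.
X = np.column_stack([np.ones(n_markets), concentration, cost])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)

print(f"raw correlation of price and concentration: {raw_corr:.2f}")
print(f"concentration coefficient after controlling for cost: {coef[1]:.2f}")
```

The raw correlation is strongly positive even though, in this constructed world, a change in concentration would not move prices at all. Once cost is included, the estimated concentration coefficient should be close to zero.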
With this caveat in mind, we briefly describe several types of descriptive regressions that are useful for antitrust practitioners to recognize and understand. Cross-section regressions compare outcomes across different markets within a single time period; time series regressions consider many observations across time periods from a single market; and panel data regressions examine many markets over time, combining the two other approaches.
Cross-Section Regressions
Cross-section regressions estimate relationships between variables using data from a single point in time. However, econometric analyses based solely on a cross-section of data should be evaluated with caution. In many antitrust applications, it is dangerous to make inferences about changes in firm conduct by comparing outcomes in two markets in the same time period because, typically, there are many reasons why outcomes (e.g., prices) may differ across markets that are unrelated to the conduct at issue. For example, it may be tempting to evaluate whether a merger that reduces the number of independent firms in a market from four to three will cause prices to increase by comparing prices in markets with four independent firms to prices in markets with three. As explained above, however, inferences in this example may be misleading if markets with three independent competitors differ for other, unobserved reasons, such as having higher operating costs than markets with four independent competitors.
Economists often do not carefully analyze the data that underlie cross-section regressions to understand their basic variation (for instance, they may omit simple descriptive statistics and other exploratory steps) or review qualitative information about the differing structure of the markets. This is an invitation for an experienced lawyer to undertake a very uncomfortable cross-examination.
Time-Series Regressions and Event Studies
Time-series regressions examine relationships in one market over time. Generally, time-series methods use historical values of variables to predict present values of the outcome of interest without regard for the underlying economic causes of the observed relationships. For example, one might estimate current gasoline prices using past prices of gasoline and crude oil and economic growth.
The most common time series approach used in antitrust contexts (and in litigation more generally) is an event study. An event study has the features of a standard time series regression model, but the focus is on how the outcome of interest changes after some event occurs. For example, a time-series regression model might consider how the price level of a product changes after an alleged collusive agreement comes into force.
Two important considerations in conducting event studies are: (1) precisely identifying the start (or end) of an event; and (2) identifying the periods of time before, during, and after the event that are relevant for the analysis. In many cases, it may seem trivial to determine the timing of an event; perhaps a revealing email from a CEO details an anticompetitive plan or a company publicly announces it had been manipulating the market but is no longer doing so. Despite these clear demarcations of time, it may be that the nefarious plan was implemented slowly and prices did not move immediately or early warnings of manipulation enabled market participants to anticipate the eventual announcement by the company. To the extent that the impacts of the event were either slow to materialize or predate the selected time of the event, the results of an event study can understate those impacts.
The second consideration is how many periods to include in the model on either side of the event, the “window” of the model. When the event has an immediate impact, then the ideal window would be a comparison of the variable of interest the instant before and the instant after the event. But, typically, the impacts of antitrust actions play out over time, even in acts of financial market manipulation, and so a longer time window after the alleged conduct began may be necessary to fully capture its impact. A window that is too wide, on the other hand, may include periods where the nature of the market has changed so much that those additional periods are largely uninformative and their inclusion waters down the estimated impact of the event. It is common in event studies to present results from several windows of varying sizes to show that the results are robust to the chosen event windows. Ideally, each window size offers a similar estimate of the impact.
Time-series analysis poses well-known econometric challenges, some quite technical and others more intuitive. A common and intuitive challenge arises when inferring a causal relationship between two or more variables that are trending together because of a set of unmeasured factors (for example, when each variable is growing over time for reasons that observed variables cannot explain). In such situations, the legal team should be wary of economists’ use of the technical language of time-series econometrics because problems associated with common time trends often can be demonstrated with simple graphical analyses.
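The sensitivity of an event study to its window can be sketched as follows. The example assumes a simulated price series in which the effect phases in gradually over 20 periods, and it uses a simple difference in means rather than a full time-series model. The function name `window_estimate` and all parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
T, event = 200, 100                 # hypothetical daily series; event at t = 100
t = np.arange(T)

# Assumption: the effect phases in linearly over 20 periods to a full effect of 1.0.
effect = np.clip((t - event) / 20.0, 0.0, 1.0)
price = 10.0 + effect + rng.normal(0, 0.1, T)

def window_estimate(series, event_idx, half_width):
    """Mean after the event minus mean before it, using half_width periods per side."""
    pre = series[event_idx - half_width:event_idx]
    post = series[event_idx:event_idx + half_width]
    return post.mean() - pre.mean()

# Report the estimate for several window sizes, as a robustness check would.
estimates = {w: window_estimate(price, event, w) for w in (5, 20, 50, 100)}
for w, est in estimates.items():
    print(f"window of +/- {w:3d} periods: estimated impact {est:.2f}")
```

Because the simulated effect materializes slowly, the narrowest window understates it and wider windows move toward the assumed full effect. This sketch does not model the opposite risk, in which a wide window picks up unrelated changes in the market.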
Panel Data Regressions
Panel data regressions estimate relationships between variables using data that vary cross-sectionally (e.g., across individuals, firms, or store locations) and over time. In general, panel data regressions are better able to isolate relationships between variables than cross-section or time-series regressions. In essence, panel data are able to capitalize on both time-series and cross-sectional variation in data, while mitigating some of the weaknesses of cross-section and time-series regression analyses.
First, by observing each market participant or product multiple times, a panel data approach is able to account for unobservable differences among them. More specifically, problems that arise in cross-section regressions due to such unobserved differences across observations can be mitigated in a panel data regression by using “fixed effects.” Fixed effects are variables that can “control for” unobserved factors that vary across a cross-section of data (but do not vary over time).
For example, consider a study of pulpwood monopsony in forest markets using county-level panel data on stumpage fees. An analyst might be able to control for the blend of trees cut in each county without having data on tree composition by including variables that indicate which county the fee is associated with (that is, a set of variables equal to one for the county associated with the observed stumpage fee and zero otherwise). These indicator variables are the county “fixed effects.” Economists often present fixed effects as a fix-all without a clear underlying theory for how the fixed effect addresses a specific (or general) shortcoming, but it is important to understand that fixed effects are “fixed” in one dimension. In the above example, county fixed effects control only for things that vary across counties and that do not differentially change over time across counties (i.e., they control for the blend of trees across counties so long as a county’s blend is constant over time). Hence, fixed effects may be less useful in highly dynamic settings (like those occurring in many tech markets).
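A minimal sketch of county fixed effects follows, using simulated data. Subtracting each county's own average from its observations (the "within" transformation) gives the same slope as including a full set of county indicator variables. The regressor `mills` (a hypothetical measure of buyer competition), the unobserved `trait`, and all parameter values are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n_counties, n_years = 30, 8
county = np.repeat(np.arange(n_counties), n_years)   # county id for each observation

# Unobserved, time-invariant county trait (e.g., tree blend) that is also
# correlated with the regressor of interest.
trait = rng.normal(0, 1.0, n_counties)
mills = 3.0 + 1.0 * trait[county] + rng.normal(0, 1.0, county.size)

# Assumed "true" effect of mills on the stumpage fee is 0.5.
fee = 20.0 + 0.5 * mills + 2.0 * trait[county] + rng.normal(0, 0.5, county.size)

def slope(x, y):
    """OLS slope of y on x with an intercept."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / (xc @ xc)

def demean_by_group(v, g):
    """Subtract each group's mean from its observations (within transformation)."""
    means = np.bincount(g, weights=v) / np.bincount(g)
    return v - means[g]

pooled = slope(mills, fee)   # ignores unobserved county differences
within = slope(demean_by_group(mills, county), demean_by_group(fee, county))

print(f"pooled estimate (no fixed effects): {pooled:.2f}")
print(f"fixed effects estimate:             {within:.2f}")
```

The pooled estimate is contaminated by the county trait, while the fixed effects estimate should sit near the assumed value. If the trait changed over time within a county, the demeaning would no longer remove it, which is the limitation noted above.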
Difference-in-differences (DID) regressions are the panel data version of event studies. With panel data, an event is most informative when it occurs differentially across cross-sectional observations (e.g., markets). For example, consider the impact of a merger of gasoline stations on gasoline prices. If the merger affects all locations the same way at the same time, then it may be difficult to untangle the impact of the merger from any other changes to the gasoline market at the same time (such as holidays or general economic conditions). However, if different locations experienced the event differentially, either because the merger was consummated in some regions at different times (or not at all) or because of varying pre-merger levels of competition (a merger is expected to impact prices less in less concentrated markets), then this variation can be used to determine the impact of the merger.
DID is designed to control for unrelated but coincident influences on variables of interest when studying the impact of an event. This approach is useful to the extent that the change in the variable of interest among the unaffected (“control”) units closely approximates the change that would have occurred at affected (“treatment”) units in the absence of the event. Thus, while the DID approach requires consideration of the same two issues as an event study (event timing and window size), it adds a third: selection of appropriate control units.
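In its simplest form, the DID estimate is the change at treated units minus the change at control units. The sketch below computes it from hypothetical gasoline prices; the figures are invented for illustration, and a real study would typically run a regression with controls and standard errors instead.

```python
from statistics import mean

# Hypothetical gasoline prices ($/gallon) by station group and period.
prices = {
    ("treated", "pre"): [3.00, 3.05, 2.95, 3.02],
    ("treated", "post"): [3.30, 3.35, 3.28, 3.31],
    ("control", "pre"): [2.90, 2.88, 2.93, 2.91],
    ("control", "post"): [3.05, 3.02, 3.08, 3.05],
}
avg = {key: mean(values) for key, values in prices.items()}

# Change over time within each group.
change_treated = avg[("treated", "post")] - avg[("treated", "pre")]
change_control = avg[("control", "post")] - avg[("control", "pre")]

# The control group's change stands in for what would have happened to the
# treated group absent the merger; the remainder is attributed to the merger.
did = change_treated - change_control

print(f"change at treated stations: {change_treated:.3f}")
print(f"change at control stations: {change_control:.3f}")
print(f"difference-in-differences:  {did:.3f}")
```

The estimate is only as good as the control group: if control stations would not have tracked treated stations absent the merger, the subtraction removes the wrong amount.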
Enforcement agency economists and other antitrust practitioners often use DID techniques to estimate the effect of past mergers on economic outcomes in several industries. For example, in their study, Hosken et al. show that retail mergers are associated with anticompetitive price increases in areas where market concentration is high and price decreases in areas where market concentration is low. Similarly, several hospital merger retrospectives have shown hospital mergers to be associated with price increases in areas where market concentration is high.
Reduced Form Regression Models
As discussed above, economic theory can inform descriptive regressions in that it can provide guidance on the factors that may be relevant for a particular outcome and whether those factors are expected to affect that outcome. Unlike descriptive models, reduced form models have a specific relationship to economic theory. However, in contrast to the structural models discussed in the next section, reduced form models typically are not detailed models of individual behavior; instead, they tend to focus on one group (e.g., consumers), while making simplifying assumptions about the others (e.g., producers).
Economic theory can identify the conditions under which a reduced form regression can provide scientifically valid measures of the causal relationship. And theory will identify when simple regression is inadequate to describe the deeper—and mathematical—relationship among the variables at play while suggesting modifications to the regression framework that will do so. For instance, changing the regression framework to examine the percent change in the variable of interest may be enough to yield a theoretically valid reduced form measure of the relationship.
A first step in the reduced form approach is to posit, under simplified assumptions from economic theory, the specific mathematical relationships that exist between the outcome of interest and various relevant factors. Although reduced form regression models are explicitly or implicitly built up from models of individual consumer behavior, they typically employ simplifying assumptions to yield the regression equation applied to the data.
A well-known example of a reduced form model of consumer demand is the almost ideal demand system (AIDS). When written out, the AIDS model appears very similar to a descriptive regression model, but the variables included in the regression model and the formula itself have been derived from simplified economic theory. Further, the relationships among those variables are often restricted to follow the precepts of the underlying economic theory. In such a setting, the cross-examiner can then focus on whether the assumptions for an AIDS model are adequately met in the market at hand, and whether their violation is of economic and statistical importance.
As a practical matter, reduced form regressions are sometimes reverse-engineered from descriptive regressions. For example, an approach known as instrumental variables (or IV) starts with an otherwise descriptive regression model and adjusts variables to account for economic forces from outside the regression model. As an example, consider a regression model that describes new car sales—specifically, the number of cars that consumers demand. Car demand depends on car prices and attributes (such as horsepower and size). But there are potentially unmeasured factors that could influence both price and sales, which may cause a simple regression of quantity of car purchases on prices and car characteristics to generate misleading results—e.g., “quality” makes a car more expensive to manufacture, necessitating a higher price, but this also leads to higher demand for the car. Because we have an important omitted variable (i.e., quality), our model will seem to indicate that people want to buy more cars when the price is higher, a result that violates basic economic theory.
We can overcome this bias in the estimated relationship between car sales and prices by incorporating assumptions about the supply of new cars—that is, the number of cars a manufacturer is willing to produce for a given sales price. Because the automaker needs to recover the costs of manufacturing, high steel costs lead to higher car prices, but do not influence how many cars people want to buy. Put differently, steel prices influence how much a manufacturer wants to sell each car for (i.e., the supply curve), but not how much a consumer is willing to pay for a car (i.e., the demand curve). The price of steel can be used as an “instrument” for the price of a car. The price of steel econometrically severs the connection between the price of the car and unobservable factors that influence demand, thereby allowing for accurate estimates of the relationship between the price of a car and the number of consumers willing to pay that price (the elasticity of demand).
The focal point of contention for this reduced form regression is whether the steel price instrument is sufficiently independent of car demand but sufficiently related to the price of the car. In our example, this could be tested both logically and statistically. To illustrate when the instrument might be invalid, suppose we were examining steelmaking equipment instead of cars. In this instance, we would expect steel prices to be related to both the supply of and the demand for steelmaking equipment. Notice that the examination of the validity of this reduced form model relies on economic logic (i.e., a common theoretical understanding) of how the market operates, offering a clear opportunity for questioners to probe the econometrician.
Reduced form models also can be used to identify the relationships that policymakers should consider in evaluating market conduct. A prime example is that of diversion ratios. A diversion ratio is the proportion of the sales lost by one product that is captured by another following a change in their relative prices. Diversion ratios can be calculated using elasticities of demand arising from an econometric model. This is most often applied in the case of a merger to determine how similar the products of two parties are. All else being equal, if the diversion ratio between the products of two companies is high, then a merger will tend to increase prices and if the diversion ratio is low, the price impact will tend to be small. This potential price impact can be calculated via the gross upward pricing pressure index (GUPPI), which is the product of diversion ratios and price-cost margins. Diversion ratios and the GUPPI are common metrics that guide the agencies in evaluating proposed mergers.
In sum, reduced form models serve as a bridge between the world of statistics and the world of economics. They can be quite successful at uncovering relationships within existing market structures, but become less reliable when extrapolating to market conditions not previously seen. Such extrapolation is limited when the economic theory underlying the reduced form model is too simplified and too general. To place extrapolations on firmer footing, more effort may be necessary to develop economic theory that is more closely tied to the market at hand. This is the realm of structural modeling.
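The car and steel example can be sketched with simulated data. The snippet below assumes a linear demand curve with a true price coefficient of -1.0, an unobserved `quality` variable that raises both price and demand, and a steel cost that shifts price only. All of these are assumptions for illustration, and the instrumental variables estimate uses the simplest single-instrument formula (the covariance of the instrument with quantity divided by its covariance with price).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000
quality = rng.normal(0, 1, n)   # unobserved by the analyst
steel = rng.normal(0, 1, n)     # instrument: shifts cost, assumed unrelated to demand

# Assumed pricing rule: price rises with steel cost and with quality.
price = 20.0 + 1.0 * steel + 2.0 * quality + rng.normal(0, 0.5, n)

# Assumed demand: true price coefficient is -1.0; quality raises demand.
quantity = 100.0 - 1.0 * price + 4.0 * quality + rng.normal(0, 1, n)

def cov(a, b):
    """Sample covariance of two arrays."""
    return np.mean((a - a.mean()) * (b - b.mean()))

ols_slope = cov(price, quantity) / cov(price, price)   # biased by omitted quality
iv_slope = cov(steel, quantity) / cov(steel, price)    # single-instrument IV estimate
first_stage = cov(steel, price) / cov(steel, steel)    # relevance of the instrument

print(f"OLS price coefficient: {ols_slope:.2f}")
print(f"IV price coefficient:  {iv_slope:.2f}")
print(f"first-stage slope of price on steel: {first_stage:.2f}")
```

In this constructed world, the simple regression suggests that demand rises with price, while the IV estimate should recover a negative slope near the assumed value. The first-stage slope speaks to whether the instrument is related to price. Whether it is unrelated to demand cannot be read off the data in the same way and rests on economic logic.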
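The arithmetic behind these two metrics can be sketched in a few lines. The sketch uses one common formulation: the diversion ratio from product 1 to product 2 is the cross-price elasticity of product 2 with respect to product 1's price, divided by the absolute own-price elasticity of product 1 and scaled by relative quantities, and the GUPPI for product 1 is that diversion ratio times product 2's percentage margin times the ratio of the two prices. The function names and the input values are hypothetical.

```python
def diversion_ratio(own_elast_1, cross_elast_21, q1, q2):
    """Share of product 1's lost unit sales captured by product 2 when P1 rises.

    own_elast_1: own-price elasticity of product 1 (negative).
    cross_elast_21: elasticity of product 2's quantity with respect to P1.
    q1, q2: unit sales of the two products.
    """
    return (cross_elast_21 * q2) / (abs(own_elast_1) * q1)

def guppi(diversion_12, margin_2, p1, p2):
    """GUPPI for product 1: diversion to product 2 times product 2's
    percentage margin, scaled by the ratio of product 2's price to product 1's."""
    return diversion_12 * margin_2 * (p2 / p1)

# Hypothetical inputs, for illustration only.
d12 = diversion_ratio(own_elast_1=-2.0, cross_elast_21=0.5, q1=100.0, q2=80.0)
g1 = guppi(d12, margin_2=0.40, p1=10.0, p2=10.0)

print(f"diversion ratio from product 1 to product 2: {d12:.2f}")
print(f"GUPPI for product 1: {g1:.1%}")
```

With these inputs, one fifth of the sales product 1 loses after a price increase go to product 2, and the resulting index is 8 percent. The elasticities would in practice come from an estimated demand model, so the index inherits any weaknesses of that model.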
Structural Models
In the setting of competition economics, structural models seek to replicate the current world in a very detailed sense by modeling and measuring the consumer and firm conduct that determine supply and demand. With estimates of this underlying structure in hand, the model can be used to calculate counterfactual prices, quantities, and other economic outcomes following changes to at-issue conduct or policy. Hence, structural models begin with models of economic actors, as reduced form models do, but apply weaker restrictions on the precise nature of those actors’ behavior than reduced form approaches and incorporate a more complete model of how the parties respond to one another. Further, because these models are tied to specific characteristics of the market at hand, there typically is not a single “standard” or “common” structural model that is employed widely, but rather a common or standard approach to designing a model.
Unlike standard regressions and correlations, a structural model often aims to predict what will occur in a but-for world that has not yet been observed. Critically, the validity of the structural model is determined in the first instance by its ability to reflect accurately what happened in the actual world.
Many structural models of markets begin with that formulated by Berry, Levinsohn, and Pakes (BLP). BLP combines a logit model of consumer demand and instrumental variables. In doing so, the methodology generates models of both consumer demand and producer supply as a function of product attributes and supply inputs, which in turn generates a complete picture of how the market operates. With these relationships in hand, either supply can be changed (by removing a product from the market following a merger, for example) or demand can be (by addressing a product mislabeling, perhaps) and the new market outcomes can be “simulated.” That is, the prices that firms would choose and the sales that would result under these new market conditions can be determined. Because there are specific underlying models describing the behavior of both firms and consumers, extrapolations to market structures beyond those actually observed historically often can be more reasonably estimated using a structural model than by relying on a simpler reduced form approach that does not include the same level of detail on consumer and firm behavior.
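The idea of changing the market and simulating the new outcome can be illustrated with a far simpler model than BLP. The sketch below uses a plain logit demand model with an outside option, holds prices fixed, and asks how shares shift when one product is withdrawn. It omits the supply side, the instruments, and the consumer heterogeneity that a full structural model would include. The product attributes, the price sensitivity `alpha`, and the function name are hypothetical.

```python
import math

def logit_shares(utilities):
    """Plain logit market shares with an outside option normalized to utility 0."""
    expu = {name: math.exp(u) for name, u in utilities.items()}
    denom = 1.0 + sum(expu.values())
    return {name: e / denom for name, e in expu.items()}

# Hypothetical products: (quality index, price). Mean utility = quality - alpha * price.
alpha = 0.5
products = {"A": (3.0, 4.0), "B": (2.5, 3.0), "C": (2.0, 3.0)}
utility = {name: q - alpha * p for name, (q, p) in products.items()}

base = logit_shares(utility)

# Counterfactual: product B is withdrawn and prices are held fixed.
without_b = logit_shares({name: u for name, u in utility.items() if name != "B"})

# Fraction of B's former share that moves to A.
diverted_to_a = (without_b["A"] - base["A"]) / base["B"]

print("baseline shares:      ", {k: round(v, 3) for k, v in base.items()})
print("shares without B:     ", {k: round(v, 3) for k, v in without_b.items()})
print(f"share of B's sales diverted to A: {diverted_to_a:.3f}")
```

In a plain logit model, a withdrawn product's customers are reallocated in proportion to the remaining products' shares, regardless of how similar the products are. That restrictive pattern is one reason practitioners turn to richer specifications that allow consumer tastes to vary.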
While the BLP approach has been applied in a variety of markets, it cannot be applied in all markets. For instance, the relationship between health care providers and health care consumers cannot be captured by BLP alone because prices are set through bargaining between health insurers and providers. More recently, several economists have developed structural models of hospital and insurer bargaining that account for these negotiations: Hospital systems with more desirable hospitals are able to negotiate higher prices with insurers, and insurers with more members are able to bargain better deals with hospitals, lowering their costs. Similarly, in media markets, Crawford and Yurukoglu developed a structural model to study content unbundling. This model was unsuccessfully applied to the National Hockey League and Major League Baseball because the model did not accurately reflect the real world.
Structural models can provide an accurate and thorough assessment of how a market changes when the behavior of its participants changes. In contrast to reduced form models, they can perform well even when extrapolating outside market conditions previously observed. However, these models require many assumptions and can be difficult to develop and implement, requiring substantial time and expertise to ensure the real world is accurately reflected by the model.
Conclusion
It is rare in the world of antitrust that an econometric analysis can be both complete and accurate yet also easy for an antitrust agency or court to understand. For this reason, it is common to present evidence from a variety of methodologies to convince the reviewer that the analysis is reasonable.
Useful analysis may begin by outlining basic economic theory to give the reviewer a framework for thinking about how the conduct at issue may impact consumers or producers. Next, it could show simple relationships using descriptive statistics, allowing the data to shine. Usually, however, basic statistics will not be enough, as there will be other changes occurring over time or across markets that need to be addressed. Regression models can then be used to provide a clearer assessment of the impact of the conduct.
If the market without the conduct at issue is too different from any observed in the past or in other geographic locations, then a structural model may be useful in reaching a final determination on the magnitude of the impact. In this context, we estimate mathematical models of consumer demand, production characteristics such as marginal cost, and how firms interact to set prices and quantities. With these relationships in hand, we can estimate how a shift in the way firms interact can yield new prices and quantities in the but-for world.
In all instances, the key remains to make all of the points and supporting work as accessible as possible for those who have no economics background—often the decision maker in the matter or case.