August 14, 2018

Statistics in Class Actions: Is the Computer Algorithm Era upon Us? (Part II)

A discussion of the practical implications of using statistical methods in litigation.

By Paul G. Karlsgodt, Patrick T. Lewis, and Bonnie McNee

This is the second in a two-part series discussing the use of statistics in class actions. In Part I, we examined the history of statistics as a tool in supporting or defending against class actions and the recent case law discussing permissible and impermissible uses of statistical methods. In Part II, we examine some of the practical implications in light of the current legal landscape.

Types of Statistical Approaches
Understanding the strategic considerations in presenting or defending against statistics in class actions requires some background in the types of statistical approaches. A variety of statistical methods might be employed, depending on the facts and legal questions at issue in the case. See, e.g., David H. Kaye & David A. Freedman, Reference Guide on Statistics, in Reference Manual on Scientific Evidence 211 (3d ed. 2011). In general, however, statistics can be broken down into two main categories: descriptive statistics and inferential statistics.

The two categories are distinct. Descriptive statistics are used to describe or summarize a given set of data. See, e.g., Kaye & Freedman at 211. Examples of descriptive statistics include a baseball player’s batting average and a line plotted on a graph. In contrast, inferential statistics are used to draw conclusions that reach beyond a known set of data, such as estimating a characteristic of an entire population from a sample. See id. at 241, 268 (statistical inferences depend on the validity of statistical models for the data; discussing statistical inference and probability models). Statistics used to justify class certification or establish facts on a class-wide basis tend to be inferential rather than descriptive. This is because, at least theoretically, they provide a way to support factual inferences based on only a subset of available data. In other words, they provide a potential shortcut around having to analyze facts on an individual basis.
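The distinction can be made concrete with a short sketch. The Python below is illustrative only: the figures, the population of 10,000 claims, and the function names are hypothetical and do not come from any case discussed here. The first function summarizes a complete record (descriptive). The second estimates a proportion for an entire population from a random sample and reports an approximate 95 percent margin of error using the standard normal approximation (inferential).

```python
import math
import random


def batting_average(hits, at_bats):
    """Descriptive: summarizes a complete, known record."""
    return hits / at_bats


def estimate_proportion(sample, z=1.96):
    """Inferential: estimate a population proportion from a sample.

    Returns the sample proportion and an approximate 95% margin of error
    (normal approximation; assumes the sample was drawn at random).
    """
    n = len(sample)
    p = sum(sample) / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p, margin


# Hypothetical population: 10,000 claims, 30% of which were underpaid (1 = underpaid).
population = [1] * 3000 + [0] * 7000

# Review only 400 randomly selected claims instead of all 10,000.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = rng.sample(population, 400)

p_hat, margin = estimate_proportion(sample)
print(f"Batting average (descriptive): {batting_average(150, 500):.3f}")
print(f"Estimated underpayment rate (inferential): {p_hat:.3f} plus or minus {margin:.3f}")
```

The batting average is exact because it uses every at-bat. The underpayment rate is only an estimate, and its reliability depends on the sample having been drawn at random from the full set of claims.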

Inferential statistics generally involve the use of sampling from a set of data and then drawing conclusions, or inferences, from an analysis of the sample. As a result, there are multiple potential sources of error in any inferential statistical method, including (1) the reliability of the data from which the sample is taken; (2) the extent to which the sample is representative of the data as a whole, i.e., whether it was drawn randomly and is large enough to support reliable estimates; (3) the mathematical validity of the methodology used to draw the inferences from the sample; and (4) the relevance to one or more issues in the case of the conclusions drawn using the methodology.

A particular type of inferential statistics often used in class actions is regression analysis, a method by which a statistician attempts to identify relationships between a particular result and one or more variables that may have contributed to the result. In Wal-Mart Stores, Inc. v. Dukes, for example, a regression analysis was presented in order to attempt to show that lower pay and fewer promotions for women were the result of a discriminatory pay policy. 131 S. Ct. 2541, 2555 (2011). As the result in Wal-Mart illustrates, regression analysis is problematic when used to demonstrate that a particular class impact has a single, common cause for all members of a proposed class, especially when there are numerous variables that also could have contributed to the same impact. See also McLaughlin v. American Tobacco Co., 522 F.3d 215, 228–29 (2d Cir. 2008) (a regression analysis suggesting that a disclosure about light cigarettes being unhealthy would have caused lower market prices did not isolate causation).

Probability theory is another concept related to inferential statistics that may be employed in expert testimony in a class action. Probability theory may be used to predict the likelihood that an event will occur, that a result will flow from a given cause, or that a fact is true. Like regression analysis, probability theory involves drawing inferences from an analysis of variables. However, it differs from regression analysis in that instead of merely identifying a correlation between a particular variable and a result, probability theory attempts to predict the odds of a result based on an evaluation of the interaction between known data and one or more variables.

Identifying Logical or Analytical Flaws in a Statistical Approach
The decisions summarized in this section reflect some of the flaws that courts have identified as being potential barriers to the successful use of statistical evidence in class actions.

One flaw is neglecting to account for all variables. A common analytical flaw identified by courts rejecting statistical evidence in class actions is that the plaintiff has proposed a regression analysis without taking into account and ruling out variables other than the allegedly illegal policy or practice that could have led to the claimed class-wide injury.

For example, in several of the cases involving alleged fraudulent pharmaceutical sales practices, the courts found that it was not possible to support the conclusion on a class-wide basis that fraud caused the drugs to be prescribed to class members because the methodologies employed did not rule out the many reasons why the same drugs may have been prescribed absent any reliance on allegedly fraudulent statements. See, e.g., UFCW Local 1776 v. Eli Lilly & Co., 620 F.3d 121, 135 (2d Cir. 2010) (plaintiffs did not rule out other variables causing prescriptions of Zyprexa to increase); In re Neurontin Mktg. & Sales Practices Litig., 244 F.R.D. 89 (D. Mass. 2007) (failing to account for all other variables contributing to prescription decisions led to denial of class certification).

Similarly, in Wal-Mart, the Court observed that the plaintiffs’ statistical expert, Dr. Drogin, had concluded that the disparities in pay and promotion between women and men were caused by a discriminatory policy without ruling out the variety of other nondiscriminatory factors that could have contributed to a pay disparity in any particular instance. And in McLaughlin, the court found that plaintiffs did not eliminate other variables that may have caused market prices of light cigarettes to be higher.

One way of evaluating this question is to create a fishbone, or Ishikawa, diagram to map the various causes that can contribute to a particular effect. In order for a regression analysis to be used effectively, the expert will need to identify all of the plausible alternative causes that could have contributed to the alleged common impact and rule them out. On the flip side, an analysis that purports to show a cause-and-effect relationship between an alleged illegal act, policy, or practice and an allegedly common impact can be effectively discredited by illustrating the alternative causes that have not been ruled out.
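The effect of an omitted variable can be shown with a toy calculation. The sketch below is a minimal illustration under stated assumptions, not a reconstruction of any expert's analysis in the cases above: the eight-person workforce, the pay formula, and the function names are all hypothetical. Pay is constructed to depend only on years of experience, yet one group happens to have less experience on average. A regression of pay on group membership alone reports a gap; adding experience as a second variable makes the gap disappear.

```python
def mean(values):
    return sum(values) / len(values)


def simple_slope(x, y):
    """Slope from regressing y on a single variable x (ordinary least squares)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def two_variable_slopes(x1, x2, y):
    """Slopes from regressing y on x1 and x2 together (ordinary least squares).

    Solves the 2x2 normal equations on mean-centered data.
    """
    m1, m2, my = mean(x1), mean(x2), mean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


# Hypothetical workforce: group is 0 or 1; experience is in years.
group = [0, 0, 0, 0, 1, 1, 1, 1]
experience = [4, 6, 8, 10, 1, 3, 5, 7]

# By construction, pay depends ONLY on experience, never on group membership.
pay = [20 + 2 * years for years in experience]

# Regression that omits experience: the group variable absorbs its effect.
naive_gap = simple_slope(group, pay)

# Regression that controls for experience.
controlled_gap, experience_effect = two_variable_slopes(group, experience, pay)

print(f"Gap attributed to group, experience omitted: {naive_gap:.2f}")
print(f"Gap attributed to group, experience included: {controlled_gap:.2f}")
print(f"Pay per year of experience: {experience_effect:.2f}")
```

In this constructed data set, the naive regression attributes a six-unit pay gap to group membership, while the regression that includes experience attributes none of it to group membership. Real data are never this clean, and the same mechanism can just as easily mask a genuine disparity as manufacture a false one.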

A second flaw is mistaking mere correlation for causation. As noted above, regression analysis is useful in evaluating a correlation between a result and one or more variables, but it does not necessarily show a cause-and-effect relationship between the variable and the result. For example, in Wal-Mart, while the plaintiffs could establish that Wal-Mart gave discretion to its managers in making employment decisions and also that women had a disproportionately lower rate of pay and promotion than men, the Court found that the statistical evidence presented failed to bridge the conceptual gap from finding a mere correlation between discretion and the lower rates of pay and promotion to finding a cause-and-effect relationship, i.e., that the discretion caused the alleged discrimination.

A third flaw is mistaking likelihood for commonality. A key logical flaw in attempts to establish common questions by statistical evidence is to conflate an analysis that is intended to establish the likelihood that a particular fact is true, i.e., an analysis based on probability theory, with the proposition that the fact is true for all of the members of a proposed class.

Take, for example, an analysis that shows that 60 percent of individuals who have been exposed to a particular pollutant have developed cancer. While this evidence might logically demonstrate that 60 out of every 100 individuals who have been exposed to the pollutant will develop cancer, it tells us nothing about which of the 100 individuals will develop cancer. It also does not prove that if a particular person who was exposed to the pollutant has developed cancer, this person’s cancer is necessarily caused by the pollutant, unless other factors can be ruled out. As such, while the statistical analysis might be relevant to the question of whether a particular person’s cancer was caused by the pollutant, it cannot prove causation on a class-wide basis. In other words, it is common evidence, but it does not by itself answer any relevant question on a common basis.
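A rough calculation shows why the 60 percent figure alone cannot establish individual causation. The sketch below uses a standard epidemiological measure, the attributable fraction among the exposed, and assumes a hypothetical background cancer rate of 40 percent among people who were never exposed. That background rate is not part of the example above; it is supplied purely for illustration, as is the function name.

```python
def attributable_fraction(rate_exposed, rate_unexposed):
    """Share of cases among the exposed that exceed the background rate.

    Computed as (rate_exposed - rate_unexposed) / rate_exposed, floored at zero.
    Both rates are proportions between 0 and 1.
    """
    if not 0 < rate_exposed <= 1:
        raise ValueError("rate_exposed must be greater than 0 and at most 1")
    if not 0 <= rate_unexposed <= 1:
        raise ValueError("rate_unexposed must be between 0 and 1")
    return max(0.0, (rate_exposed - rate_unexposed) / rate_exposed)


# 60% of exposed individuals develop cancer (from the example in the text).
# ASSUMPTION: 40% of unexposed individuals also develop it (hypothetical).
fraction = attributable_fraction(0.60, 0.40)
print(f"Share of cancers among the exposed that exceed the background rate: {fraction:.0%}")
```

Under these assumed numbers, only about one-third of the cancers among exposed individuals are excess cases associated with the exposure, and the calculation still does not identify which individuals those are. That question remains an individual one unless other causes can be ruled out.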

On the other hand, there are situations where statistical evidence may be common evidence that does have the potential to answer a question on a class-wide basis, a situation best illustrated by Tyson Foods, Inc. v. Bouaphakeo, 136 S. Ct. 1036 (2016). There, the statistical evidence showing the average time to don and doff the protective gear was sufficient to support class certification because the defendant had not maintained records of the actual time it took any given employee to don and doff the gear, and, thus, the evidence could have been used by each class member to prove his or her own claim in an individual action.

Courts have identified a variety of other analytical flaws, including (1) failure to take into account the likelihood that a statistical method will result in false positives or lead to other erroneous results when applied to a particular set of facts, (2) use of an inherently unreliable or untested methodology to draw statistical conclusions, and (3) use of a methodology that has an unacceptable error rate or for which an error rate cannot be quantified. In Duran v. U.S. Bank National Ass’n, for example, the California Supreme Court concluded that the trial court had erred in part because it had essentially created its own nonscientific sampling methodology rather than relying on a statistically valid random sample. 325 P.3d 916, 938–44 (Cal. 2014) (finding that sample was too small, sample was not random, and sample had “intolerably large margin of error”).
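The relationship between sample size and margin of error can be sketched in a few lines. The example below is illustrative only and does not use the data from Duran: the weekly overtime figures are hypothetical, and the calculation uses the normal approximation (a multiplier of 1.96 for roughly 95 percent confidence), which understates the uncertainty for very small samples, where a statistician would ordinarily use a t-distribution instead.

```python
import math
import statistics


def mean_with_margin(values, z=1.96):
    """Return the sample mean and an approximate 95% margin of error.

    Uses the normal approximation: z * (sample standard deviation / sqrt(n)).
    Assumes the values are a random sample from the population of interest.
    """
    n = len(values)
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return mean, z * standard_error


# Hypothetical weekly overtime hours reported by five sampled employees.
small_sample = [4, 8, 12, 16, 20]

# The same pattern of responses from 100 employees. Repeating the values is
# only a device to hold the spread constant while the sample size grows.
large_sample = small_sample * 20

small_mean, small_margin = mean_with_margin(small_sample)
large_mean, large_margin = mean_with_margin(large_sample)

print(f"n={len(small_sample)}: {small_mean:.1f} hours, plus or minus {small_margin:.1f}")
print(f"n={len(large_sample)}: {large_mean:.1f} hours, plus or minus {large_margin:.1f}")
```

Both samples produce the same average, but the margin of error for the five-person sample is nearly half the size of the estimate itself. A larger sample narrows the margin, although it does nothing to cure a sample that was not randomly selected in the first place.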

Preparing and Cross-Examining Experts
Because statistical evidence usually, if not always, requires expert testimony, a key practical consideration is how to prepare an expert to give opinions on statistical evidence or how to attack an expert’s opinion.

Is the expert qualified? A threshold question to consider is whether the expert is qualified to give opinion testimony about statistical evidence. An expert does not have to be a statistician to be qualified to opine about statistical methods, but the expert should be qualified in a field in which statistics is an accepted methodology. For example, economists may use econometrics, the application of statistical methods to economic data, to make predictions and observations about a particular theory of economic damages. An engineer or physicist may use probability theory to make predictions about the results of physical processes.

Are the opinions based on a generally accepted methodology? The next question is whether the particular statistical methods being employed are commonly accepted within the expert’s field. Counsel should evaluate whether other experts in the field have used similar methodologies in published works or whether experts in the field have previously been qualified to testify about those methodologies.

Is the opinion based on reliable data? Another question to consider is whether the data on which the expert is relying is reliable. Even if the methodology is accepted within the expert’s field and has been followed correctly, the expert’s opinion is still subject to attack or even disqualification if the opinion is based on unreliable data. A nonexhaustive list of factors to consider in evaluating the reliability of the data includes the following:

  • whether the data actually applies to the facts of the case (e.g., an expert applying generic industry data to draw conclusions about a particular defendant’s practices);
  • whether the data is being used in the expert’s analysis in a way that is consistent with the purposes for which it was collected (e.g., sample data compiled by a marketing department for market research purposes should not be used to prove something about the defendant’s actual sales);
  • whether the data set is complete or a mere subset or sample;
  • whether there are reasons to question the reliability of the source from which the data was compiled (e.g., the data is compiled using subjective or anecdotal reports or the automated process by which the data was collected is itself prone to error);
  • whether the data was compiled from an original source or whether it was transcribed or translated from some other source in a way that may have been prone to error; and
  • if the data is a sample, whether that sample was randomly drawn and is large enough to support statistically reliable conclusions.

Has the expert made any unsupported or speculative assumptions? Even if the methodology is sound and the underlying data reliable, an expert’s opinion on statistics can still be subject to attack if there are speculative assumptions underlying the opinion. See, e.g., McLaughlin, 522 F.3d at 228–29 (an expert’s survey asking plaintiffs to compare what they would pay for a truly healthy light cigarette and one that was misrepresented “conceptualize[d] the impossible”). In preparing or cross-examining the expert, counsel should consider whether the assumptions underlying the expert’s opinion rest on facts that can be established by direct or circumstantial evidence. Consider also whether the expert’s opinion successfully takes into account or rules out all of the possible variables.

Is the opinion relevant to the issues presented in the case? Even if the expert opinion is based on an accepted methodology, is supported by valid data, and rests on assumptions that are supported by facts in the record, the opinion may still be invalidated if it does not address an issue that is actually presented in the case. The result in Comcast Corp. v. Behrend, 133 S. Ct. 1426 (2013), is an example of a situation in which the expert’s opinion was insufficient to support class certification because it did not address the right question. The question was whether any of the allegedly anticompetitive conduct caused a price impact, whereas the expert’s opinion was that there was a price impact created by a combination of factors that included both the anticompetitive conduct and conduct that the court had previously found would not support an antitrust violation.

With the ever-increasing role of big data in modern society and the exponential expansion of computing power, issues relating to the use of statistics in class actions are only likely to increase as time goes by. Look for the courts to continue to grapple with this concept in the coming years. In this environment, attorneys who are well versed in statistical methods and their limitations and experts who have an understanding of the legal landscape will have an advantage over their opponents in class action litigation.


Paul G. Karlsgodt and Patrick T. Lewis are partners at BakerHostetler. Bonnie McNee is an associate at BakerHostetler.

Copyright © 2018, American Bar Association. All rights reserved. This information or any portion thereof may not be copied or disseminated in any form or by any means or downloaded or stored in an electronic database or retrieval system without the express written consent of the American Bar Association. The views expressed in this article are those of the author(s) and do not necessarily reflect the positions or policies of the American Bar Association, the Section of Litigation, this committee, or the employer(s) of the author(s).