In the “Abacus” case, on which the Oscar-winning film The Big Short was based, Goldman’s expert testified in opposition to class certification that there was “no evidence” that the disclosure of management’s misrepresentations caused Goldman’s stock price to fall. Declaration of Paul Gompers at *6, In re Goldman Sachs Group Inc. Sec. Litig., 1:10-cv-03461-PAC (S.D.N.Y. filed Apr. 6, 2015). The finder of fact may have been surprised to realize that the expert was not offering an opinion that the stock price did not fall due to the disclosure, or even that it was more likely to have not fallen than to have fallen. Instead, he was merely offering an opinion that he could not rule out the possibility that the stock price did not fall due to the disclosure.
Hypothesis tests, like those at the core of the Goldman testimony, are widely used, powerful tools with the potential to provide reliable, relevant, and meaningful evidence to triers of fact. Yet this potential too often goes unmet because the testimony becomes so confusing that it distorts the statistical truths beyond recognition.
A hypothesis test is a standard statistical method used to evaluate events of interest that are unknown either because they have not yet occurred or because they are not directly observable. Suitable events are those that can be described with two possible realities; thus, the hypothesis being analyzed in a hypothesis test is simply a statement that may be true or false. Examples include “tomorrow it will rain,” “tomorrow it will not rain,” “the DNA belongs to the defendant,” “the DNA does not belong to the defendant,” “the stock price fell due to management fraud,” and “the stock price did not fall due to management fraud.”
A hypothesis test results in a statistical conclusion. The test assesses the hypothesis statement, and only two statistical conclusions are possible: to reject the hypothesis or not to reject the hypothesis. Statistical conclusions are not predictions or estimates; they reflect the application of a chosen benchmark to an analysis of the hypothesis.
To understand the difference, consider whether it will rain tomorrow. If a statistician is making a prediction, she will reject the statement “tomorrow it will rain” whenever the chance of rain is less than 50 percent. In contrast, if the statistician is evaluating her client’s planned outdoor wedding and is conducting a hypothesis test with the hypothesis “tomorrow it will rain,” she may not reject the hypothesis even when the chance of rain is much less than 50 percent. The tension here is between focusing on what is likely, which is when predictions are useful, and (roughly speaking) focusing on what is important to the decision maker, which is when hypothesis tests are useful.
There are two types of errors in hypothesis tests. There are two possible states of the world reflected in any hypothesis test: the world in which the hypothesis is true and the world in which the hypothesis is not true. If the statistician’s conclusion is incorrect in the first world, where the hypothesis is true, then it is called a “Type 1 error.” If the statistician’s conclusion is incorrect in the second world, where the hypothesis is not true, then it is called a “Type 2 error.” Thus, Type 1 errors always reflect incorrectly rejecting the hypothesis, and Type 2 errors always reflect incorrectly not rejecting the hypothesis. The statistician’s total error rate is essentially a weighted average of the Type 1 and Type 2 error rates, where the weight relates to the underlying probability that the hypothesis is true.
A simple way to see the relationship between Type 1, Type 2, and total error rates is to consider an example with a bag full of red and blue balls from which one will be removed and the hypothesis “the ball removed will be red.” If Dr. Red always predicts that the ball being removed will be red, then she will never reject the hypothesis, and she will always be correct when the ball is actually red. Thus, her Type 1 error rate equals 0 percent. In contrast, her Type 2 error rate equals 100 percent, as she will always make the mistake of choosing red when the ball is actually blue. Now consider Dr. Red’s total error rate. If there is only 1 red ball and 99 blue balls in the bag, then Dr. Red will almost always be incorrect. In this example, it is easy to see that the total error rate is (0.01 × 0 percent) + (0.99 × 100 percent) = 99 percent.
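The weighted-average arithmetic in the Dr. Red example can be sketched in a few lines of Python. This is a minimal illustration only; the function name `total_error_rate` and its parameter names are assumptions introduced here, not terms from the statistical literature.

```python
def total_error_rate(p_hypothesis_true: float, type1_rate: float, type2_rate: float) -> float:
    """Weighted average of the Type 1 and Type 2 error rates.

    The Type 1 rate is weighted by the probability that the hypothesis is true;
    the Type 2 rate is weighted by the probability that it is not true.
    """
    if not 0.0 <= p_hypothesis_true <= 1.0:
        raise ValueError("p_hypothesis_true must be between 0 and 1")
    return p_hypothesis_true * type1_rate + (1.0 - p_hypothesis_true) * type2_rate


# Dr. Red: 1 red ball and 99 blue balls; she never rejects "the ball removed will be red".
dr_red = total_error_rate(p_hypothesis_true=0.01, type1_rate=0.0, type2_rate=1.0)
print(f"Dr. Red's total error rate: {dr_red:.0%}")
```

Changing `p_hypothesis_true` shows how the same pair of Type 1 and Type 2 error rates can produce very different total error rates depending on how often the hypothesis is actually true.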
Hypothesis test result reporting can be misleading. The reporting of hypothesis test results often focuses on Type 1 errors, minimizing or even excluding information about the Type 2 error rate and the total error rate. This can be so misleading as to distort the essential evidence learned through the hypothesis test analysis. In the illustrative example of Dr. Red, it is equivalent to highlighting Dr. Red’s supposedly impressive capacity to identify that the ball is red when the ball is red without acknowledging that Dr. Red has no capacity to correctly identify that the ball is blue when it is blue or that her overall error rate is 99 percent. This out-of-context focus on Type 1 error is a leading factor in the misinterpretation and misuse of statistical evidence in litigation.
Designing Hypothesis Tests: Choosing Between Scylla and Charybdis
Two key factors that are part of the design of any hypothesis test are the choice of the hypothesis in the affirmative or the negative and the selection of a benchmark (technically reflected in a “level of statistical significance” that is usually termed “alpha”).
The benchmark acts as a threshold for a statistician’s conclusion. As an essentially mechanical matter, when conducting a hypothesis test, the statistician compares a computed number from her analysis to her chosen benchmark and must reject or not reject the hypothesis statement on the basis of whether the computed number falls above or below the benchmark. Thus, the benchmark is effectively a threshold that triggers her conclusion. Further (and this is the “aha” moment of hypothesis testing!), the benchmark is itself the Type 1 error rate. Thus, the statistician’s benchmark choice reflects the quantity of Type 1 error that she deems acceptable given the purpose of the study.
Unfortunately, Type 1 and Type 2 error rates are tightly linked: for a given test and data set, every reduction in the Type 1 error rate comes at the cost of an increase in the Type 2 error rate, and vice versa. However, the relationship between Type 1 and Type 2 error rates is not “one for one” but rather depends on the data. For example, a small reduction in the Type 1 error rate (say, 5 percentage points) can lead to a much larger increase in the Type 2 error rate (say, 50 percentage points). Through her choice of benchmark, the statistician must carefully sail the waters between two evils, like Odysseus placed between Scylla and Charybdis.
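The trade-off can be made concrete with a simplified sketch. The code below assumes a one-sided z-test in which the test statistic follows a standard normal distribution when the hypothesis is true and is shifted upward by a fixed number of standard errors when it is not. That model, the function name `type2_error_rate`, and the effect sizes are illustrative assumptions, not details drawn from any case.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def type2_error_rate(alpha: float, effect_in_std_errors: float) -> float:
    """Type 2 error rate of a one-sided z-test run at Type 1 error rate `alpha`.

    Assumes the test statistic is standard normal when the hypothesis is true and
    is shifted upward by `effect_in_std_errors` when the hypothesis is not true.
    """
    critical_value = STANDARD_NORMAL.inv_cdf(1.0 - alpha)
    # The test fails to reject whenever the statistic lands below the critical value.
    return STANDARD_NORMAL.cdf(critical_value - effect_in_std_errors)


for effect in (1.0, 2.0, 3.0):  # hypothetical effect sizes, in standard errors
    for alpha in (0.10, 0.05, 0.01):
        beta = type2_error_rate(alpha, effect)
        print(f"effect={effect:.0f} SE  alpha={alpha:.2f}  Type 2 error rate={beta:.3f}")
```

Under these assumptions, lowering alpha always raises the Type 2 error rate, but the size of the increase differs with the effect size, which is one way of seeing why the relationship depends on the data.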
Statisticians choose the wording of the hypothesis statement. The expert must choose whether to state the hypothesis in the affirmative (the proposition is true) or in the negative (the proposition is not true).
Consider a doctor evaluating blood test results for patients who may or may not have a disease. The statistician must choose either “the patient has the disease” or “the patient does not have the disease” as her hypothesis statement.
Suppose the disease is severe and the treatment benign. In this case, the doctor will want a hypothesis test that is designed like a fishing net with a tight weave—a net that will “catch” (identify as having the disease) virtually everyone with the disease. The statistician can accomplish the doctor’s goal by choosing the hypothesis statement “the patient has the disease” and a Type 1 error of 1 percent. With this design, if the patient does have the disease, there is a 99 percent chance that the statistician will not reject the hypothesis and the patient will receive needed treatment.
Now suppose the disease is mild and the treatment potentially dangerous. In this case, the statistician will keep the Type 1 error at 1 percent but flip the hypothesis, choosing “the patient does not have the disease.” This way, if the patient does not have the disease, there is a 99 percent chance that the statistician will not reject the hypothesis and the patient will not receive unnecessary and risky treatment.
The appropriate hypothesis test when a party has the burden of proof uses a hypothesis statement that negates the proposition to be proved, along with a relatively low Type 1 error rate. Once the hypothesis test has been properly designed, the statistician conducts the test to see if the results support the proposition to be proved. If she rejects the hypothesis, then the result will support the party’s burden; and if she does not reject the hypothesis, then the result will not support it.
Expert Testimony in the Abacus Case
Now we return to the Abacus example. Here, as is typical in securities fraud litigations, there are allegations that Goldman made material misrepresentations that influenced trading behavior and impacted investors’ economic outcomes. Statistical experts are retained in these matters to analyze security price reactions to announcements revealing these alleged misrepresentations (disclosures). The impact of disclosures on securities prices is important to arguments regarding class certification, loss causation, and economic damages.
Defendants try to show that the security price did not decline upon disclosure. The relevance of the price impact of disclosure for class certification has been fiercely litigated in recent years. Since 1988, plaintiffs have invoked the fraud-on-the-market presumption established by the Supreme Court in Basic Inc. v. Levinson, 485 U.S. 224 (1988), allowing plaintiffs to prove reliance through an analysis of market efficiency rather than requiring evidence at the individual plaintiff level. More recently, the Supreme Court upheld the fraud-on-the-market presumption in Halliburton Co. v. Erica P. John Fund, Inc., 134 S. Ct. 2398 (2014), but ruled that the presumption could be rebutted by defendants at the class certification stage by a showing that the alleged misrepresentations did not impact the price.
Plaintiffs try to show that the security price did decline upon disclosure. Plaintiffs usually argue that the company’s misrepresentation led the price of the security to be artificially inflated and, further, that the amount of this inflation relates to the quantum of the price decline upon disclosure. Plaintiffs must provide evidence linking the decline to the alleged fraud and apportion the decline appropriately among causal factors such as unrelated company information or industry-wide news. Dura Pharm., Inc. v. Broudo, 544 U.S. 336 (2005). Plaintiffs contend that the price decline upon disclosure that is attributable to the fraud represents damages to investors who purchased the security at the inflated predisclosure price and sold it at the noninflated postdisclosure price. Thus, at the loss causation and damages stage, statistical experts are asked to opine whether the disclosure did impact the price of the security.
The hypothesis test in the Abacus case was not properly designed. To defeat class certification, Goldman had the burden of proving that the price of the security did not change due to the disclosure. An appropriate hypothesis test design to achieve this purpose is the hypothesis statement “the price of the security did change due to the disclosure,” with a Type 1 error of 5 percent. With the hypothesis so designed, the statistician’s results will be relevant to the task at hand. If she rejects the hypothesis, then the defendant will have meaningful evidence to support its opposition to class certification.
The Goldman declaration provides little meaningful discussion to explain how the hypothesis test underlying the expert’s conclusion that “there is no evidence that a corrective disclosure . . . had a negative impact on Goldman’s stock price” was designed or why this design was appropriate for an analysis of class certification. Goldman declaration, supra, at 6. Review of exhibit 2 of the Goldman declaration indicates that the expert chose the hypothesis statement “the price of the security was not impacted by the disclosure” and Type 1 error equal to 5 percent. Upon conducting the hypothesis test, the Goldman expert concluded that he could not reject the hypothesis.
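The mechanics of a test built on the hypothesis “the price of the security was not impacted by the disclosure” can be illustrated with a deliberately simplified sketch. It is not the expert’s actual method: event studies in practice typically rely on regression-based market models, and the function name `two_sided_test`, the normal approximation, and every number below are hypothetical assumptions chosen only for illustration.

```python
from statistics import NormalDist


def two_sided_test(abnormal_return: float, std_error: float, alpha: float = 0.05) -> dict:
    """Two-sided z-test of the hypothesis "the disclosure had no price impact"."""
    z = abnormal_return / std_error
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return {"z": z, "p_value": p_value, "reject": p_value < alpha}


# Hypothetical inputs, not taken from the record: a -2.0% abnormal return on the
# disclosure date and a 1.5% standard error for daily abnormal returns.
result = two_sided_test(abnormal_return=-0.020, std_error=0.015)
print(result)
```

With these made-up inputs, the test does not reject the hypothesis at the 5 percent level even though the estimated price change is a decline. The statistical conclusion “not rejected” is therefore not the same thing as a finding that the price did not fall.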
The testimony of Goldman’s expert was offered in opposition to class certification, wherein the burden is on the defendants to prove that the price of the security was not impacted by the disclosure. However, the design of the expert’s hypothesis test was not suited to achieve that objective. Rather, it was suited to evaluating whether the stock price was impacted by the disclosure. Thus, the expert’s results would have been valuable in an effort to defeat the plaintiffs’ argument at the loss causation and damages stage but did not provide meaningful evidence in support of defeating class certification. Fortunately, from the perspective of correctly applying statistical evidence, the court seized upon the distinction and certified the class, indicating that Goldman did not meet its burden by stating that the “burden of proving a lack of price impact falls on the defendant.” Opinion and Order at *6, In re Goldman Sachs Group Inc. Sec. Litig., 1:10-cv-03461-PAC (S.D.N.Y. Sept. 24, 2015).
Hypothesis tests can provide reliable, relevant, and meaningful evidence to the finder of fact. Unfortunately, expert testimony is often rife with misinterpretations of the evidence. Cross-examination of the expert on the topics outlined below can greatly enhance and clarify the statistical evidence:
- The purpose of the hypothesis test
- How the design of the hypothesis test advances that purpose
- The probability that the hypothesis statement is true
- The probability that the test design would result in her coming to the statistical conclusion that she in fact reached
- The probability that the test design would result in her coming to a correct statistical conclusion
- What level of alpha would have minimized the statistician’s overall error rate and how her results would have changed had she chosen that level of alpha for her benchmark
- How the statistician’s results would have changed if she switched the hypothesis from its current statement to the negation of the current statement
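The question about the level of alpha that minimizes the overall error rate can be explored with a rough sketch. It assumes the same simplified one-sided z-test model used for illustration elsewhere (standard normal statistic when the hypothesis is true, shifted by a fixed effect when it is not) and a known probability that the hypothesis is true. The names `total_error` and `best_alpha`, the grid search, and the inputs are all hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def total_error(alpha: float, p_true: float, effect: float) -> float:
    """Overall error rate: Type 1 and Type 2 rates weighted by P(hypothesis true)."""
    beta = STD_NORMAL.cdf(STD_NORMAL.inv_cdf(1.0 - alpha) - effect)
    return p_true * alpha + (1.0 - p_true) * beta


def best_alpha(p_true: float, effect: float, grid_size: int = 999) -> float:
    """Grid-search the alpha (0.001 to 0.999 by default) with the lowest overall error."""
    candidates = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    return min(candidates, key=lambda a: total_error(a, p_true, effect))


# Hypothetical inputs: the hypothesis is true half the time, and the effect
# equals two standard errors when the hypothesis is not true.
alpha_star = best_alpha(p_true=0.5, effect=2.0)
print(f"alpha minimizing overall error: {alpha_star:.3f}")
print(f"overall error at that alpha:    {total_error(alpha_star, 0.5, 2.0):.3f}")
print(f"overall error at alpha = 0.05:  {total_error(0.05, 0.5, 2.0):.3f}")
```

Under these assumptions, the conventional 5 percent benchmark is not the level that minimizes overall error, which is the kind of gap this line of cross-examination is meant to expose.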
Laura Robinson is the managing director at Navigant Consulting, Inc.
Navigant Consulting is the Litigation Advisory Services Sponsor of the ABA Section of Litigation. This article should not be construed as an endorsement by the ABA or ABA Entities.