For generations, appellate lawyers – like lawyers generally – have offered clients predictions of the future based upon a combination of experience, anecdotal information and (occasionally unjustified) optimism: this court or this panel tends to be pro-plaintiff or pro-defendant; if the court granted review, they’re reversing; the court never reviews unanimous decisions; or “it’s a good sign that the court asked so many questions at argument – they were really engaged.”
Today, we’re in the midst of a data analytics revolution, with a new vendor seemingly appearing every week. But this is far from a new phenomenon. Academics have been using analytics to study judicial decision-making for just short of a century, and the field is busier today than ever before.
The academic literature begins with Charles Grove Haines’s 1922 article in the Illinois Law Review. Haines reviewed thousands of public intoxication cases from the New York magistrate courts. He found that one judge discharged only one of 566 defendants, another discharged 18 percent of his cases, and still another 54 percent. He suggested that the data showed that case results depended to some degree on the temperament, personality, education, environment and personal traits of the magistrates. In 1948, C. Herman Pritchett published The Roosevelt Court: A Study in Judicial Politics and Values, 1937-1947. Based on detailed tables of agreement rates between various combinations of Justices, he argued that the sharp increase in dissents at the U.S. Supreme Court weighed against the traditional view that the law is an objective entity merely found and declared.
Another landmark in the literature, the U.S. Supreme Court Database, arises from the work of Professor Harold Spaeth. Today, thanks to the work of Spaeth and several colleagues, the Database encompasses more than two hundred data points for every Supreme Court case since 1791. The Database has been the foundation of most of the academic data-analytics literature for a generation. It was also the model for my own work building databases for the Illinois and California Supreme Courts, which include well over one hundred data points for every case since 1990 and which are the foundation of my two analytics blogs.
The data analytic approach began to attract attention in the appellate bar in 2013 with the publication of The Behavior of Federal Judges: A Theoretical & Empirical Study of Rational Choice. Then-Judge Richard Posner and Professors Lee Epstein and William Landes applied various regression techniques to analyze a host of issues, including panel formation and dissent aversion.
Brief-Writing. Nearly all of the analytics literature on appellate briefs has focused on the role of amicus curiae briefs, particularly at state courts of last resort. In 2001, Professors Paul Brace and Kellie Sims Butler published a study assembling data on amicus filings in all fifty state courts of last resort for all cases decided between 1990 and 2001. They concluded that in a total of nineteen states (including Arkansas, South Dakota, Idaho, North Dakota, Iowa, Nebraska, Texas, Wyoming, Montana, Hawaii, Rhode Island, South Carolina, Maine, Nevada, Indiana, Virginia, West Virginia and North Carolina), fewer than 5% of cases drew amicus briefs. On the other hand, in five states (Oklahoma, Oregon, Michigan, New Jersey and California), 25% or more of all cases drew amicus briefs. My work on the California Supreme Court has demonstrated that this percentage has increased significantly in the years since 2001.
In a 2003 study, Scott A. Comparato analyzed data from seven state Supreme Courts, examining the distinctions between amicus and party briefs and concluding that amicus briefs had become steadily more commonplace between 1965 and 1990.
Campaign contributions and spending in judicial retention campaigns have risen substantially in most major states over the past two decades. A 2013 dissertation by Ryan J. Rebe concluded that both a preponderance of amicus briefs supporting appellants and a higher level of judicial campaign contributions from appellants’ amici increased the likelihood of a vote for appellants to a statistically significant degree.
Oral Arguments. Many studies of U.S. and state Supreme Court oral arguments have been published in recent years. The earliest appears to be Sarah Levien Shullman’s 2004 article in the Journal of Appellate Practice and Process. Shullman analyzed oral arguments in ten cases at the United States Supreme Court, noting each question asked by the Justices and assigning each a score from one to five, depending on how helpful or hostile she considered it to be. Once seven of the ten cases had been decided, she divided her observations according to whether the questioning Justice ultimately voted for or against the party, and she used the resulting data to predict the results in the three remaining cases. Shullman concluded that it was possible to predict the result in most cases by a simple measure: the party asked the most questions generally lost.
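A tally of this kind is easy to reproduce. The sketch below is an illustration only – not Shullman’s actual worksheet – using invented case names, sides and scores: record each question with the side it was directed to and a one-to-five hostility score, then compare totals.

```python
# Toy illustration of a question tally (hypothetical data, not Shullman's).
# Each record: (side the question was directed to, score from 1 = helpful to 5 = hostile).
from collections import defaultdict

questions = {
    "Case A": [("petitioner", 4), ("petitioner", 5), ("respondent", 2)],
    "Case B": [("respondent", 3), ("respondent", 4), ("respondent", 5), ("petitioner", 1)],
}

for case, qs in questions.items():
    counts = defaultdict(int)
    hostility = defaultdict(int)
    for side, score in qs:
        counts[side] += 1
        hostility[side] += score
    # The simple heuristic described above: the side asked more questions is predicted to lose.
    predicted_loser = max(counts, key=counts.get)
    print(case, "questions:", dict(counts), "hostility:", dict(hostility),
          "predicted loser:", predicted_loser)
```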
John Roberts addressed the issue of oral argument the year after Shullman’s study appeared. Then-Judge Roberts (at the time, two years into his tenure on the D.C. Circuit) counted the questions asked in the first and last cases of each of the seven argument sessions in the Supreme Court’s 1980 and 2003 Terms. Like Shullman, Roberts found that the losing side was almost always asked more questions. So apparently “the secret to successful advocacy is simply to get the Court to ask your opponent more questions,” Judge Roberts wrote.
Professor Lawrence S. Wrightsman, a leading scholar in the field of psychology and the law, took an empirical look at U.S. Supreme Court oral arguments in a 2008 book. Professor Wrightsman chose twenty-four cases from the Supreme Court’s 2004 Term, dividing the group according to whether they involved what he called ideological or non-ideological issues. He then analyzed the number and tone of the Justices’ questions to each side, classifying questions as either sympathetic or hostile. Professor Wrightsman concluded that simple question counts were not a highly accurate predictor of ultimate case results unless the analysis also accounted for the tone and content of the questions.
Professor Timothy Johnson and three colleagues published their analysis in 2009. Johnson and his co-authors examined transcripts from every Supreme Court case decided between 1979 and 1995 – more than 2,000 hours of argument in all, and nearly 340,000 questions from the Justices. The researchers isolated data on the number of questions asked by each Justice in each argument, along with the average number of words used in each question. The study concluded that, controlling for other factors that might explain case outcomes, the party asked more questions generally lost the case.
Professors Lee Epstein and William M. Landes and Judge Richard A. Posner published their study in 2010. Epstein, Landes and Posner used Professor Johnson’s database, tracking the number of questions and the average words per question for each Justice. Like Johnson and his colleagues, they concluded that the more questions a Justice asks of a party, all else being equal, the more likely that Justice is to vote against the party; and the greater the difference between the total questions asked of each side, the more likely a lopsided result.
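The basic relationship these studies describe can be tested on any argument dataset with a simple logistic regression of the outcome on the question-count differential. The snippet below is a minimal sketch using scikit-learn and made-up numbers; it is not the authors’ models or data, which control for many additional factors.

```python
# Minimal sketch: does the question-count differential predict who wins?
# Hypothetical data; real studies include many more covariates.
import numpy as np
from sklearn.linear_model import LogisticRegression

# One row per case: questions to petitioner minus questions to respondent.
X = np.array([[12], [-8], [3], [20], [-15], [5], [-2], [9], [-11], [1]])
# 1 = petitioner won, 0 = respondent won.
y = np.array([0, 1, 0, 0, 1, 0, 1, 0, 1, 1])

model = LogisticRegression().fit(X, y)

# A negative coefficient means more questions to the petitioner is associated
# with a petitioner loss, consistent with the pattern the studies report.
print("coefficient:", model.coef_[0][0])
print("P(petitioner wins | +10 question differential):",
      model.predict_proba([[10]])[0][1])
```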
Voting and Opinion Writing. Scholars have made considerable efforts to analyze issues involving appellate justices’ voting – including phenomena such as dissent aversion (a Justice joining a majority opinion notwithstanding his or her disagreement with the result or reasoning) and the impact of panel selection – as well as opinion writing.
In 2005, Professor Jennifer L. Peresie reviewed 556 federal appellate decisions in Title VII gender discrimination and harassment cases decided between 1999 and 2001. Controlling for other factors, a female judge was more likely to vote for the plaintiff: the probability rose from 22% to 41% in harassment cases and from 17% to 28% in gender discrimination cases. Controlling for ideology, sitting on a panel with a female judge increased the likelihood that a male judge would vote for the plaintiff by fourteen points in harassment cases and by nineteen points in discrimination cases. A 2015 study found that, holding other factors constant, male judges with daughters were nine percent more likely to support plaintiffs in gender cases – a result the authors said was largely driven by a statistically significant increase in votes among Republican appointees.
In 2008, Professors Adam B. Cox and Thomas J. Miles published a study of every case decided under Section 2 of the Voting Rights Act since 1982. The study showed that the likelihood of a liability finding rose by thirteen percent as a panel went from including one Democratic appointee to three.
Two years later, Judge Posner and Professors Epstein and Landes published a study called “Why (and When) Judges Dissent: A Theoretical and Empirical Analysis.” The authors showed that smaller courts tend to have lower dissent rates, presumably because questioning one’s colleagues’ reasoning in print carries a higher cost in lost collegiality. Not surprisingly, the likelihood of a dissent is strongly correlated with ideological diversity on the panel.
Professors Jeffrey S. Rosenthal and Albert H. Yoon applied a technique known as stylometry to a database of Supreme Court and Circuit Court of Appeals decisions in “Judicial Ghostwriting: Authorship on the Supreme Court.” Stylometric algorithms focus on structural habits in a piece of writing that are independent of content, such as sentence structure and patterns of word and (especially) verb choice. The underlying theory is that an author’s habits don’t change significantly over time or with subject matter, so a significant shift in the numbers indicates a different author. Stylometric algorithms have been used for a variety of purposes, from flagging writing suspected of plagiarism to identifying portions of unattributed Renaissance plays likely written by William Shakespeare. Rosenthal and Yoon used their algorithm to identify significant shifts in the stylometric data from opinions, attempting to distinguish the Justices and Judges who wrote their own opinions (those whose data remained relatively constant over time) from those who likely relied on clerks to draft their opinions (those whose data shifted significantly from one year to the next).
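A bare-bones version of the stylometric idea can be sketched as follows: represent each opinion by the relative frequencies of common function words (which carry little content) and measure how far a new opinion drifts from an author’s earlier profile. This is only a toy illustration of the general technique, not Rosenthal and Yoon’s algorithm; the word list and sample texts are placeholders.

```python
# Toy stylometry sketch: function-word frequency profiles and drift between them.
import numpy as np

FUNCTION_WORDS = ["the", "of", "and", "to", "that", "which", "but", "however"]

def profile(text: str) -> np.ndarray:
    """Relative frequency of each function word in a text."""
    words = text.lower().split()
    counts = np.array([words.count(w) for w in FUNCTION_WORDS], dtype=float)
    return counts / max(len(words), 1)

def drift(profile_a: np.ndarray, profile_b: np.ndarray) -> float:
    """Distance between two style profiles; a large value suggests a different hand."""
    return float(np.linalg.norm(profile_a - profile_b))

# Hypothetical use: compare a Justice's early opinions with a later term's opinions.
early = profile("The judgment of the court is affirmed, but the reasoning of the panel ...")
later = profile("However, the statute to which the parties refer does not control ...")
print("style drift:", drift(early, later))
```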
Forecasting Case Results. Academics have employed various statistical modeling techniques to attempt to predict appellate case results. Professors Andrew D. Martin, Kevin M. Quinn, Theodore W. Ruger and Pauline T. Kim created a classification tree model for U.S. Supreme Court decisions to predict the outcome of every case in the October 2002 Term. They matched their model, which was based on the Justices’ votes over the previous eight Terms and six observable characteristics of the pending cases, against predictions from a panel of legal academics and professionals. For the 2002 Term, the statistical model performed significantly better than the legal experts, correctly predicting 75% of case outcomes as opposed to the experts’ 59.1%.
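A classification tree of this general kind can be fit in a few lines of code. The features below (circuit of origin, issue area, direction of the lower-court decision, and so on) mirror the sort of observable case characteristics the study relied on, but the numeric encoding, training data and pending case here are invented for illustration only.

```python
# Sketch of a classification-tree forecast from observable case characteristics.
# Feature encoding and training data are hypothetical placeholders.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Columns: [circuit of origin, issue area, lower-court direction (0/1),
#           petitioner type, respondent type, unconstitutionality claimed (0/1)]
X_train = np.array([
    [9, 2, 1, 1, 3, 0],
    [5, 1, 0, 2, 1, 1],
    [2, 4, 1, 3, 2, 0],
    [9, 2, 0, 1, 1, 1],
    [11, 3, 1, 2, 3, 0],
    [4, 1, 0, 3, 2, 1],
])
y_train = np.array([1, 0, 1, 0, 1, 0])   # 1 = reverse, 0 = affirm

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print(export_text(tree, feature_names=[
    "circuit", "issue_area", "lower_ct_direction",
    "petitioner", "respondent", "unconstitutional"]))

# Predict a pending (hypothetical) case.
print("predicted outcome:", tree.predict([[9, 2, 1, 2, 1, 0]])[0])
```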
Professors Roger Guimerà and Marta Sales-Pardo applied statistical techniques developed to analyze complex social and affiliation networks to predict Justices’ votes on the U.S. Supreme Court in a 2011 article.[19] The authors extracted from the Supreme Court Database the first 150 decisions of each year from the first Warren Court in 1953 to the last Rehnquist Court in 2005. Using an algorithm that accounted only for the votes of the other Justices in the case and the Court’s track record – nothing about the legal issues involved – the authors correctly predicted 83% of the Justices’ votes, as compared to 67.9% for legal experts working from the case files and 66.7% for content-based algorithms. Even in cases divided 5-4, the algorithm correctly predicted 77% of the Justices’ votes. The authors found no evidence that Justices nominated by Democrats were consistently more or less predictable than Republican appointees, although curiously, the predictability of the Court trended downward during Democratic presidencies.
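Guimerà and Sales-Pardo’s actual model comes from the complex-networks literature, but the intuition – that a missing vote can be inferred from colleagues’ votes plus the Court’s voting history – can be illustrated with a much cruder sketch: weight each colleague’s vote by how often that Justice has historically agreed with the target Justice. Everything below, including the Justice names and agreement rates, is hypothetical and is not the authors’ algorithm.

```python
# Crude illustration (not the authors' network model): infer one Justice's vote
# from colleagues' votes, weighted by historical pairwise agreement rates.

# Hypothetical historical agreement rates with the target Justice.
agreement_with_target = {
    "Justice A": 0.85, "Justice B": 0.78, "Justice C": 0.60,
    "Justice D": 0.45, "Justice E": 0.40,
}

# Known votes in the pending case: +1 = for petitioner, -1 = for respondent.
known_votes = {"Justice A": 1, "Justice B": 1, "Justice C": -1,
               "Justice D": -1, "Justice E": -1}

# A colleague who agrees 85% of the time pulls the prediction toward that vote;
# one who agrees only 40% of the time pulls it the other way (negative weight after centering).
score = sum((agreement_with_target[j] - 0.5) * v for j, v in known_votes.items())
prediction = "petitioner" if score > 0 else "respondent"
print(f"weighted score = {score:+.2f}; predicted vote: {prediction}")
```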
Most recently, Professors Daniel M. Katz, Michael J. Bommarito and Josh Blackman built a model incorporating dozens of data points taken from the Supreme Court Database for every case of the 1946 through 1953 Terms. Applying a recent extension of classification tree modeling known as “extremely randomized trees,” they then predicted both case results and Justice-by-Justice voting for every case from the first Warren Court in 1953 through the end of 2013. The model correctly predicted 69.7% of all case outcomes across the sixty-year test period and 70.9% of all Justices’ votes.
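Extremely randomized trees are available off the shelf in standard libraries. The sketch below shows only how the model class is used – train on earlier cases, score on later ones – with randomly generated placeholder features standing in for the Supreme Court Database variables; it is not the authors’ feature set, data or results.

```python
# Sketch of an "extremely randomized trees" forecast. Features and labels
# are random placeholders, not Supreme Court Database variables.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 10, size=(500, 12))   # 12 case-level features per case (placeholders)
y = rng.integers(0, 2, size=500)          # 1 = reverse, 0 = affirm

# Mimic the walk-forward setup: train on earlier "terms," test on later ones.
X_train, X_test = X[:400], X[400:]
y_train, y_test = y[:400], y[400:]

model = ExtraTreesClassifier(n_estimators=300, random_state=0).fit(X_train, y_train)
print("out-of-sample accuracy:", model.score(X_test, y_test))
```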
In recent years, predictive modeling has begun moving from the rarefied air of Supreme Court dockets to predicting the results – whether jury verdict, settlement or dismissal – of more everyday cases. Professors Blakeley McShane, Oliver Watson, Tom Baker and Sean Griffith built a model for 1,198 securities fraud lawsuits using only variables known on the day of filing.[20] The model predicts both the likelihood of settlement and the settlement amount. The authors held out a randomly selected twenty-five percent of their dataset to test the model’s validity. For the in-sample data, 74% of actual settlements were within two standard deviations of the predicted amount; for the out-of-sample test data, the figure was 72%.
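The validation step the authors describe – hold out a share of the cases, then check how often actual settlements fall within two standard deviations of the prediction – can be sketched generically. The sketch below uses ordinary linear regression on fabricated filing-day variables; the authors’ actual model is considerably more sophisticated, and nothing here reproduces their data or results.

```python
# Generic sketch of out-of-sample validation with a two-standard-deviation band.
# Variables and settlement amounts are fabricated placeholders.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 1198
X = rng.normal(size=(n, 5))                                   # filing-day variables (placeholders)
y = 2.0 + X @ np.array([1.0, 0.5, -0.3, 0.8, 0.2]) + rng.normal(scale=1.0, size=n)

# Hold out a random 25% of cases, as the authors did.
test = rng.random(n) < 0.25
model = LinearRegression().fit(X[~test], y[~test])

pred = model.predict(X[test])
sd = np.std(y[~test] - model.predict(X[~test]))               # residual SD from the training data
within = np.abs(y[test] - pred) <= 2 * sd
print(f"share of held-out settlements within two SDs: {within.mean():.0%}")
```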
Notwithstanding the advances in statistical modeling techniques, it is crucially important that the bar approach case-result modeling with caution and humility. Although a great many clients have been using advanced analytics and projection techniques for well over a decade, other, less mathematically sophisticated clients might not fully appreciate the limitations of predictive modeling. Furthermore, academics across a variety of disciplines have demonstrated that although it is possible to make useful and valuable predictions about a wide range of human behaviors, it is not possible to specify enough variables to predict outcomes perfectly.
The analytics revolution is changing the business of law for good. In the coming years, the “datafication” of the law will accelerate. More dockets will be partially or completely online. Data-scraping programs (which fuel the commercial analytics vendors’ databases) will improve. More analytics vendors will appear in the marketplace. And lawyers and academics will begin exploring new analytical frontiers with possible application to the law, such as sentiment analysis, affiliation modeling and game theory.
But more than anything else, the bar must develop a comfort level in working with analytics. One of the oldest rules of statistics is “correlation does not necessarily imply causation.” Several years ago, an analytics firm accepted a commission to study mechanical problems in used cars. The firm fitted sensors on a fleet of cars, sent them out on the road and crunched the mountain of resulting data. It determined that the best predictor of few-to-no mechanical problems in a used car was that the car be orange. Analytics isn’t going away, but it is imperative that the bar be able to distinguish genuine insights from orange used cars.