The presidential election of 2020 will remain front-page news for months to come. Less sensational, but no less important, are the many state referendums on issues that directly and immediately affect citizens. Among these was California Proposition 25, a referendum on SB 10, a bail reform bill first introduced in 2016. See SB 10: Pretrial Release and Detention, Cal. Cts. Crim. Just. Servs. Had the proposition passed, it would have upheld SB 10, ending the cash bail system in California and replacing it with a pretrial release system that relies more heavily on risk assessments. The vote to reject SB 10 matters not only as an expression of support for cash bail but also because it highlights the debate over the proper place of risk assessments in criminal justice.
Bail reform has long been a lightning-rod issue in the criminal justice reform movement in the United States. In 2016, it was reported that two-thirds of California’s jail population was being held pretrial because of an inability to make bail; in other words, these individuals had not been convicted of a crime. See Annual Jail Profile Survey, Cal. Bd. of State & Cmty. Corr. The statistic carries weighty implications, because it is also well established that pretrial detention increases the likelihood of guilty pleas, plea bargains, and unfavorable trial outcomes for guilty and innocent defendants alike. With the average bail in California set at $50,000, detention is far from an unlikely outcome for many defendants.
In practice, however, bail reform has been fraught with difficulties. States have made well-publicized strides against the cash bail system in recent years, with mixed results. New York passed a law in 2019 eliminating cash bail for misdemeanors and nonviolent crimes, but amid political pressure in 2020 it rolled back parts of that reform, whereas Colorado and New Jersey have made substantial and less-contested progress in scaling back their own cash bail systems. See Jamiles Lartey, New York Rolled Back Bail Reform. What Will the Rest of the Country Do?, Marshall Project (Apr. 23, 2020). In 2011, Kentucky implemented a bail reform package that requires judges to consider a defendant’s risk score before setting a financial bond. The California reform attempt is notable not just for the sheer size of the state’s prison population and commercial bail industry, but also for placing algorithmic risk assessment at the center of reform. This produced an unlikely marriage between the powerful cash bail industry and civil rights groups opposed to risk assessments in criminal justice, an alliance that helped defeat Proposition 25.
Enter the Machines
The reluctance of California voters to welcome risk assessments into their courts is not due to the vague specter of new technology. Risk assessment programs are already in use in at least 49 of California’s 58 counties and in courtrooms across the United States. Advocates claim that the tools are far more objective than individual judges and far more efficient for an already overtaxed criminal justice system. While studies do support the accuracy of risk assessments, the effects of applying them in real criminal cases are much less straightforward.
The well-known program COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) illustrates the distinction between accuracy and fairness. Proponents of COMPAS rightly point out that it is calibrated: among defendants who receive the same score, Black and white defendants reoffend at similar rates. Yet however impressive its overall accuracy in predicting who will reoffend, a closer look at its incorrect assessments reveals inconsistencies. In particular, breaking the errors down into false positives and false negatives shows that they are unevenly distributed across racial groups. A study by the Center for Court Innovation found that although the risk assessment it examined showed high overall success rates, its false positive rate (wrongly flagging as high risk someone who does not reoffend) was higher for Black defendants, while its false negative rate (rating as low risk someone who goes on to reoffend) was higher for white defendants. See Sarah Picard et al., Beyond the Algorithm: Pretrial Reform, Risk Assessment, and Racial Fairness 4 (2019).
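The gap between accuracy and fairness can be made concrete with a short Python sketch. The counts below are hypothetical, chosen only for illustration and not drawn from COMPAS or the Center for Court Innovation study; the class and method names are likewise our own. The two groups share the same overall accuracy and the same share of correct “high risk” labels, yet their false positive and false negative rates diverge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Outcome counts for one group; 'positive' means classified as high risk."""
    tp: int  # classified high risk, did reoffend
    fp: int  # classified high risk, did not reoffend
    fn: int  # classified low risk, did reoffend
    tn: int  # classified low risk, did not reoffend

    def accuracy(self) -> float:
        # Share of all classifications that were correct.
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    def ppv(self) -> float:
        # Share of those labeled high risk who actually reoffended.
        return self.tp / (self.tp + self.fp)

    def false_positive_rate(self) -> float:
        # Share of non-reoffenders who were wrongly labeled high risk.
        return self.fp / (self.fp + self.tn)

    def false_negative_rate(self) -> float:
        # Share of reoffenders who were labeled low risk.
        return self.fn / (self.fn + self.tp)


# Hypothetical counts for illustration only; not drawn from any real tool or study.
groups = {
    "group_a": ConfusionCounts(tp=40, fp=20, fn=10, tn=30),
    "group_b": ConfusionCounts(tp=20, fp=10, fn=20, tn=50),
}

for name, c in groups.items():
    print(f"{name}: accuracy={c.accuracy():.2f} ppv={c.ppv():.2f} "
          f"fpr={c.false_positive_rate():.2f} fnr={c.false_negative_rate():.2f}")
```

Both groups come out at 0.70 accuracy and 0.67 precision, but group A has the higher false positive rate (0.40 versus 0.17) and group B the higher false negative rate (0.50 versus 0.20). When the groups’ underlying reoffense rates differ (here 50 versus 40 percent), equalizing all of these measures at once is generally not possible for an imperfect predictor.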
The mismatch between measured accuracy and real-world effects on individuals is stark. One explanation lies in the use of police data. If the risk analysis is mathematically sound and the algorithm is working as designed, one may infer that the apparent bias stems from the data it relies on. Most risk assessment programs prioritize prior arrests, missed court appearances, and convictions in gauging the appropriateness of pretrial detention. These data therefore reflect policing and prosecutorial behavior more than an individual’s criminal behavior. For instance, during the era of “stop and frisk” in New York City, when Black and Latino men were the targets of an egregious number of unlawful stops, arrest data from the same period show a higher proportion of arrests for these groups. The distinction between being arrested and committing a crime is important. It follows that pretrial risk assessments that rely on arrest data, such as those used in New York, may affect particular groups differently.
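The distinction between arrest and offense can be illustrated with a toy Python model, assuming a hypothetical score that awards points only for prior arrests. The enforcement rates, point value, and threshold are invented, and no real tool is being reproduced. Two people with identical conduct land in different tiers solely because one lives under heavier policing.

```python
# Toy model: recorded arrests depend on both conduct and enforcement intensity.
# All parameters below are hypothetical and chosen only for illustration.


def expected_arrests(offenses: int, arrest_rate: float) -> float:
    """Expected recorded arrests when each offense leads to arrest with probability arrest_rate."""
    return offenses * arrest_rate


def toy_risk_score(prior_arrests: float, points_per_arrest: float = 2.0) -> float:
    """Hypothetical score that counts only prior arrests, standing in for arrest-heavy factors."""
    return prior_arrests * points_per_arrest


HIGH_RISK_THRESHOLD = 3.0  # hypothetical cut point

# Two people with identical conduct (4 offenses each) under different policing intensity.
people = {
    "heavily_policed": expected_arrests(offenses=4, arrest_rate=0.5),
    "lightly_policed": expected_arrests(offenses=4, arrest_rate=0.125),
}

for name, arrests in people.items():
    score = toy_risk_score(arrests)
    tier = "high" if score >= HIGH_RISK_THRESHOLD else "low"
    print(f"{name}: arrests={arrests:.1f} score={score:.1f} tier={tier}")
```

The heavily policed person accrues 2 expected arrests and a score of 4.0 (high risk); the lightly policed person accrues 0.5 and a score of 1.0 (low risk), even though their conduct is the same.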
Evaluating the accuracy of risk assessments also requires understanding the variety of programs in use. In California, as in other jurisdictions, these programs are not standardized. Many risk assessment programs were developed for other purposes, such as predicting recidivism, but are also used to predict whether someone will appear in court, in other words, to assess flight risk. Although pretrial detention is meant to be a last resort for ensuring both the defendant’s appearance and public safety, these are in fact two distinct determinations. See ABA Standards for Criminal Justice: Pretrial Release (3d ed. 2007). Many programs nonetheless conflate flight risk with post-release violence or rearrest. The conflation is all the more misleading because numerous studies of flight risk show that most individuals who miss hearings are not willfully absent and are likely to attend future hearings. It quickly becomes clear that assessments achieving statistical accuracy do not necessarily produce correct results. See Bernard E. Harcourt, Risk as a Proxy for Race: The Dangers of Risk Assessment, 27 Fed. Sent’g Rep. 237 (Apr. 2015).
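The conflation of flight risk with rearrest can be sketched in Python with a hypothetical outcome record. The field names and the three example cases are invented for illustration. A single combined “failure” label treats all three cases alike, while a narrower label limited to willful nonappearance flags only one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PretrialOutcome:
    """Hypothetical record of what happened after release."""
    missed_court_date: bool
    willful_absence: bool  # True only if the person deliberately absconded
    rearrested: bool


def combined_failure(o: PretrialOutcome) -> bool:
    # A single label that merges nonappearance with rearrest.
    return o.missed_court_date or o.rearrested


def willful_flight(o: PretrialOutcome) -> bool:
    # A narrower label closer to the legal question of flight risk.
    return o.missed_court_date and o.willful_absence


outcomes = {
    "nonwillful_absence": PretrialOutcome(missed_court_date=True, willful_absence=False, rearrested=False),
    "rearrested_but_appeared": PretrialOutcome(missed_court_date=False, willful_absence=False, rearrested=True),
    "absconded": PretrialOutcome(missed_court_date=True, willful_absence=True, rearrested=False),
}

for name, o in outcomes.items():
    print(f"{name}: combined_failure={combined_failure(o)} willful_flight={willful_flight(o)}")
```

A tool trained on the combined label would learn to treat a missed date through circumstance, or a rearrest of someone who always appears, as equivalent to absconding.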
Finally, judicial review adds another layer of variation to the use of risk assessments. Under the federal Bail Reform Act, which the Supreme Court upheld in Salerno, a detention order must be supported by written findings of fact and a written statement of reasons, and it is subject to immediate appellate review. See United States v. Salerno, 481 U.S. 739 (1987). Judges therefore retain considerable discretion. Under the Kentucky reform, for instance, there is a “presumptive default” that low- and moderate-risk individuals will be released without having to post a financial bond. However, one study has found that judges override the default more often for Black defendants than for white defendants. See Alex Albright, If You Give a Judge a Risk Score: Evidence from Kentucky Bail Decisions, Harv. John M. Olin Fellow’s Discussion Paper No. 85 (May 2019). A controlled experiment in California similarly found that judges were more likely to rely on an unfavorable (high-risk) assessment and to discard a favorable (low-risk) one. See Matt Henry, Risk Assessment: Explained, The Lab (Dec. 14, 2020). In these cases, one may infer that judicial bias and statistical inaccuracy can work together to produce incorrect results. Even where a judge follows a risk assessment to the letter, one must also question the labels used to form a risk classification. The boundaries between low, medium, and high risk may be drawn arbitrarily, and the categories may be framed in terms of either the likelihood of success or the likelihood of failure. Such ambiguity is especially pronounced in programs that do not measure flight risk alone. Furthermore, in some instances the risk classification duplicates information separately provided to judges, such as criminal history, so that the same factors may be counted twice in their overall evaluation.
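How arbitrary cut points interact with a presumptive default can be sketched in Python as follows. Both cut-point schemes are hypothetical, and the default rule only loosely mirrors the Kentucky approach described above (lower-risk tiers presumptively released without financial bond); judicial override is not modeled. The same score of 4 is labeled moderate under one scheme and high under the other, which decides whether the presumption applies at all.

```python
from bisect import bisect_right

# Hypothetical cut points on a 1-6 scale; neither scheme comes from a real tool.
SCHEMES = {
    "scheme_a": [3, 5],  # below 3 low, 3-4 moderate, 5+ high
    "scheme_b": [2, 4],  # below 2 low, 2-3 moderate, 4+ high
}
LABELS = ["low", "moderate", "high"]


def risk_label(score: int, cut_points: list[int]) -> str:
    """Map a raw score to a tier; bisect_right counts the cut points <= score."""
    return LABELS[bisect_right(cut_points, score)]


def presumptive_release(label: str) -> bool:
    # Hypothetical default: low and moderate tiers are presumptively released
    # without financial bond (a judge could still override; not modeled here).
    return label in ("low", "moderate")


score = 4
for name, cuts in SCHEMES.items():
    label = risk_label(score, cuts)
    print(f"{name}: score={score} label={label} presumptive_release={presumptive_release(label)}")
```

Nothing about the defendant changes between the two runs; only the placement of the boundary does.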
In the end, a number of open questions remain that neither legitimize nor invalidate the use of risk assessments in pretrial detention decisions. Like any other technology, these tools are the sum of their parts and will vary with implementation. Even a perfectly calibrated tool can offer results only as good as the data it uses, and human interpretation of those results depends on circumstance. We must therefore ask whether, and how best, machines fit within a system such as criminal law that is built on social norms. See Emre Bayamlioglu & Ronald Leenes, The “Rule of Law” Implications of Data-Driven Decision-Making: A Techno-regulatory Perspective, 10 L., Innovation & Tech. 295 (July 2018). Perhaps the answer lies in a more targeted approach to applying information technology to legal constructs, and in a preference for tools calibrated to register the more human aspects of criminal risk.
As courts across the country adopt risk assessments throughout the criminal justice system, the opportunities to improve them grow as well. Advocates of hybrid approaches suggest that these tools would be better used to identify appropriate alternatives to pretrial detention. The Justice Department’s National Institute of Corrections encourages the use of combined assessments in criminal justice procedures, rather than full reliance on the technology. See Julia Angwin et al., Machine Bias: There’s Software Used Across the Country to Predict Future Criminals. And It’s Biased Against Blacks, ProPublica (May 23, 2016). Civil rights advocates assert that the effects of distorted data may be mitigated by adjusting the weight given to various assessment factors. Whatever the solution, risk assessments are clearly not going to be short-lived. We should therefore strive to ensure that they are implemented in a measured and thoughtful way.
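The reweighting idea can be sketched with a hypothetical linear score in Python. The factor names, weights, and values are invented; the sketch shows only the mechanics and is not a claim that reweighting fully corrects distorted data. Two people with the same record of missed appearances differ only in recorded arrests, and lowering the weight on arrests shrinks the score gap between them from 3.00 to 0.75.

```python
# Hypothetical linear risk score: the sum of weight * value over each factor.
def linear_score(factors: dict[str, float], weights: dict[str, float]) -> float:
    return sum(weights[name] * value for name, value in factors.items())


# Two hypothetical people: same missed appearances, different recorded arrests.
person_a = {"prior_arrests": 4, "missed_appearances": 1}
person_b = {"prior_arrests": 1, "missed_appearances": 1}

# Hypothetical weights before and after reducing reliance on arrest counts.
original = {"prior_arrests": 1.0, "missed_appearances": 2.0}
reweighted = {"prior_arrests": 0.25, "missed_appearances": 2.0}

for label, weights in (("original", original), ("reweighted", reweighted)):
    gap = linear_score(person_a, weights) - linear_score(person_b, weights)
    print(f"{label}: gap={gap:.2f}")
```

Reducing a weight narrows the disparity a factor can introduce, but it also discards whatever genuine signal that factor carried, which is why the choice of weights remains a policy judgment rather than a purely technical one.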