May 09, 2024 Feature

AI and the Courts: A Look at the 2022 NIST-Funded AAAS Project Providing AI Guidance to Judges

Alain Norman

Recently, it seems that questions about the use or outputs of artificial intelligence (AI) in the legal field have exploded: When and how should AI be used by lawyers and courts? How trustworthy are AI outputs, and can these be explained? What about possible biases in the data used to develop an AI “tool”? Could AI help with certain types of court proceedings? How might AI relate to jury selection? Could AI someday replace human judges? What is “AI,” anyway?

The American Association for the Advancement of Science (AAAS) undertook a project in 2022 to develop materials for judges on AI. This project, funded by the National Institute of Standards and Technology (NIST), resulted in a number of papers and several podcasts covering a wide variety of topics, ranging from basic concepts and definitions to how outputs from AI might—or might not—serve as trustworthy evidence under the Federal Rules of Evidence. Given that most of the materials are papers, this article seeks to provide, in writing, the highlights of the project’s three podcasts, which centered on how AI-based tools are being used in the legal profession and how AI could affect decision-making by courts.

AI and Risk Scores

Each podcast brought together legal and AI experts, who answered questions and engaged in lively discussions. The first podcast looked at “risk scores”—tools that, on the basis of certain criteria, generate probabilities as to the risk of a given person either suffering some harm or engaging in harmful behavior. Among the key points made by the panelists was that a distinction ought to be made between risk scores that might support the provision of social services and risk scores that might be utilized in the course of legal proceedings, such as pre-trial release or, perhaps, sentencing.

Our experts noted, however, that there is a lack of standards regarding the factors that go into the “secret sauces” of risk scores. At the same time, even if risk scores used in legal proceedings include few factors (as opposed to the formulas used for social service determinations), such factors might inadvertently reflect racial or other biases (e.g., a person’s zip code can serve as a proxy for socioeconomic status; likewise, a person’s arrest record needs to be distinguished from a person’s conviction record, if any).

Panelists underscored that although risk scores—which often are just statistical assessments, not requiring AI—are predictive, a human (e.g., a judge) must still consider the costs and benefits—for the individual and society—of acting upon the risk score in one manner or the other (e.g., pre-trial detention or release). Importantly, work is underway to incorporate “mitigating” factors into risk scores related to post-conviction release; such factors might include whether an incarcerated person completed their education, and/or exhibited good behavior, while in jail. Such factors may be termed “dynamic” to reflect whether or how an incarcerated person changed (for the better) over time.
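To make the idea of static versus dynamic factors concrete, here is a minimal sketch in Python. Every factor name and weight below is hypothetical and chosen purely for illustration; real risk instruments are validated statistical models, not toy sums.

```python
# Illustrative sketch of a weighted risk score combining "static" factors
# (fixed history) with "dynamic," mitigating factors that can change over
# time. All factor names and weights are hypothetical.

STATIC_WEIGHTS = {"prior_convictions": 0.5, "age_at_first_offense": -0.02}
DYNAMIC_WEIGHTS = {"completed_education": -0.3, "good_behavior": -0.2}

def risk_score(person: dict) -> float:
    """Sum weighted factors; dynamic factors can lower the score over time."""
    score = 1.0  # hypothetical baseline
    for factor, weight in {**STATIC_WEIGHTS, **DYNAMIC_WEIGHTS}.items():
        score += weight * person.get(factor, 0)
    return max(score, 0.0)

# A person whose record improves while incarcerated scores lower:
before = risk_score({"prior_convictions": 2, "age_at_first_offense": 19})
after = risk_score({"prior_convictions": 2, "age_at_first_offense": 19,
                    "completed_education": 1, "good_behavior": 1})
assert after < before  # mitigating factors reduce the assessed risk
```

The point of the sketch is the structure, not the numbers: the dynamic entries let the score reflect change for the better, which purely static formulas cannot.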

AI in the Legal Profession

In the second podcast, AAAS brought in practitioners whose work involves AI tools that help law firms handle otherwise traditional forms of work, particularly related to litigation, including discovery, assessing the terms and scope of contracts, and developing legal theories or arguments. Key take-aways from our panelists included that AI is clearly superior to humans at going through vast amounts of information rapidly to identify patterns that can help firms prepare for litigation. Indeed, lawyers need, as the ABA has long advocated, to maintain “technological competency.” This is likely to include the use of some kind of “technology assisted review” (TAR) to avoid professional malpractice.

Importantly, panelists repeatedly noted that “all major search engines” can handle synonyms. That means these tools are capable of searching for, and identifying, concepts—not just performing keyword searches. To take a simplified example, an AI tool assisting with “e-discovery” might not be limited to finding the word “glad” or “happy” in a mass of data; rather, it might be able to find passages that seem relevant to the concept of joy or satisfaction. Indeed, AI tools can even perform “sentiment analysis,” i.e., help to determine, by evaluating an employee’s emails, whether the person’s messages were likely to be “sarcastic,” “serious,” or something else. This, said the panelists, can be important in flagging potential cases of harassment or misconduct in internal investigations.
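The difference between keyword and concept search can be shown with a toy sketch. Real e-discovery tools rely on trained language models rather than the tiny hand-built synonym set assumed here; the example only illustrates why matching a concept catches documents that a literal keyword search misses.

```python
# Toy contrast between keyword search and concept search. The "joy"
# synonym set is a stand-in for what a trained language model would
# learn; it is purely illustrative.

CONCEPTS = {
    "joy": {"glad", "happy", "delighted", "thrilled", "pleased"},
}

def keyword_search(docs: list, word: str) -> list:
    """Match only the literal word."""
    return [d for d in docs if word in d.lower().split()]

def concept_search(docs: list, concept: str) -> list:
    """Match any term associated with the concept."""
    terms = CONCEPTS.get(concept, set())
    return [d for d in docs if terms & set(d.lower().split())]

docs = ["I was thrilled by the verdict", "The contract was signed"]
assert keyword_search(docs, "happy") == []     # literal match misses it
assert concept_search(docs, "joy") == docs[:1] # concept match finds it
```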

Other uses of AI tools—given that they never tire and can rapidly process vast amounts of data to find patterns—include keeping tabs on myriad websites to detect possible violations of intellectual property; keeping abreast of new or changing regulations that might affect a business; reviewing large numbers of contracts to assess the totality of a given company’s contractual obligations; flagging possible plagiarism; and assessing documents and/or legal precedents to help identify new or better lines of argument or bases for litigation.

Yet, our panelists also felt certain that humans remain needed to double-check the results of AI tools. Indeed, humans will remain necessary, particularly for understanding and acting upon subtle, novel, or exceptional issues. As regards how AI tools might affect the future of legal work, companies developing AI “solutions” take the position that AI will help relieve humans (e.g., first-year associates) from “numbing” tasks in order to focus on “higher-value” analysis.

In sum, panelists suggested that some form of “conjoint” or hybrid decision-making is likely to be the best approach. Nevertheless, law firms will have to figure out what constitutes—for them and their clients—the right “mix” of AI and human intelligence to achieve their goals.

Also during the second panel discussion, the question arose of what judges are to make of AI-revealed patterns (from reams of data) that might be offered in support of a given party’s contentions. Put another way, a court might wonder what weight to place on the proffered information or finding, and that, in turn, might depend on whether the judge seeks to inquire as to how well the data were “coded” in the first place. For companies creating AI tools to assist with document review or data analytics, this becomes a question of “quality management control”—where humans and AI are “pitted” against each other in the process of “training” the AI (or machine learning) tool such that it can achieve a good balance between “over-” and “under-” capturing seemingly relevant information. That balance is commonly measured by the “F1” score, the harmonic mean of the tool’s precision and recall.
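In concrete terms, precision measures how much of what the tool flagged was actually relevant (penalizing over-capture), while recall measures how much of the relevant material the tool actually found (penalizing under-capture). The short sketch below, using hypothetical document counts, shows how the F1 score combines the two.

```python
# F1 score: the harmonic mean of precision and recall. A high F1 requires
# the tool to be good on BOTH measures; excelling at one cannot mask
# failing at the other. Counts below are hypothetical.

def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    precision = true_pos / (true_pos + false_pos)  # penalizes over-capture
    recall = true_pos / (true_pos + false_neg)     # penalizes under-capture
    return 2 * precision * recall / (precision + recall)

# A tool that finds 90 of 100 relevant documents but also flags
# 10 irrelevant ones has precision 0.9 and recall 0.9:
score = f1_score(true_pos=90, false_pos=10, false_neg=10)
assert round(score, 2) == 0.9
```

Because the harmonic mean is dragged down by its smaller input, a review tool that flags everything (perfect recall, poor precision) or almost nothing (the reverse) still earns a low F1.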

Leaving aside the possibility that AI tools might become useful to courts in managing their heavy workflows—in ways perhaps analogous to how law firms incorporate AI into their work, as indicated above—one way that AI-backed search engines are already entering courtrooms involves jury selection. That is, services now exist to find and assess the social media history of potential jurors—just as is being done in the context of checking on insurance claims or potential employees. Panelists indicated that such probing of potential jurors’ social media presence is, currently, allowed in every jurisdiction, but courts might, over time, become concerned about manipulation of the voir dire process and/or individuals’ privacy.

AI and Decision-Making in the Justice System

In the third podcast, panelists peered into the future: Will AI replace human judges and/or jurors? There are three possibilities, broadly speaking: (1) AI-powered systems might replace humans at certain stages of legal proceedings or to perform certain tasks; (2) AI-powered systems might be rejected because of concerns about bias and/or insufficient “explainability” or transparency; or (3) a hybrid approach arises, whereby AI augments humans’ abilities. In this regard, our experts opined, if American judges’ reluctance to rely on court-appointed experts were overcome, AI’s ability to analyze vast amounts of data might even help courts to assess the comprehensiveness, if not also accuracy, of testimony from witnesses who are—in our adversarial system—necessarily prone to provide their otherwise truthful testimony in the light most favorable to one side of a case.

Already, AI-backed tools are being used in connection with aspects of courts’ work. For instance, AI’s capabilities power both legal research and “judicial analytics” (i.e., the thorough review of judges’ rulings and perhaps their social media profiles). Indeed, the advent of “natural language processing” is powering the ability of law firms to find, e.g., the words that seem most effective in swaying a given judge. Further, algorithms are used to help perform DNA matching—and such tools are regarded as reliable. At the same time, AI might not be able to salvage the trustworthiness of questionable forensic “sciences” such as bitemarks. Certainly, bias in facial recognition systems is already a matter of public debate.

Yet, these examples raise the basic question of whether rules or laws exist that establish whether or when courts should use AI tools in civil or criminal matters. Our experts said that, so far, only “tentative guidelines” seem to have been issued in some jurisdictions, such as the ethical considerations put forth by the Council of Europe. Indeed, as one panelist put it, “Expecting legal systems to foresee when AI should be used would be ambitious.” Nonetheless, the use of AI seems rampant in the context of administrative adjudications, and the Administrative Conference of the United States has published a report on this.

Studies are thus ongoing as to when AI might helpfully replace somebody in the judicial system, given AI’s strengths as regards “data crunching,” which may facilitate certain types of fact-finding, a traditional function of judges and juries. Yet, challenges exist in trying to assess less numerically based information.

Indeed, a core question that NIST and others seem to wish to have answered is this: Would it be technically possible to build AI tools in such a manner that human values (e.g., justice and equity) are incorporated into those tools? For now, per our panelists, that question is being debated but is not yet answered.

Beyond the possibility that clever developers of algorithms might somehow build desirable values into AI “solutions” for use in legal proceedings, there lies the question of whether people will accept decisions made by machines. The answer to this is also unclear. On the one hand, there may exist a tendency for people to accept “findings” that, because they come from an advanced technology, appear to be more accurate or otherwise “better.” On the other hand, human juries can nullify laws—a power viewed as a notable component of our system of checks and balances—but that is something AI systems would likely prove unable to allow.

Thus, our experts’ discussion indicated that only minor disputes—where the facts are agreed upon and/or the financial consequences are relatively modest—will likely prove most amenable to resolution using AI-backed decision-making. Already, AI-facilitated dispute resolution exists in Great Britain, for cases involving amounts capped at £25,000. Also, online dispute resolution (ODR) may have a place in cases where one party (e.g., a tenant) lacks the resources to obtain the assistance of counsel. If people cannot otherwise obtain redress, a system with little or no human involvement might prove acceptable.

Yet, in “complex” cases—or where the stakes are high or the risk of significant harm in the case of error is high—reliance mainly on AI will probably remain unacceptable. Indeed, even as regards the use of AI tools in “simple” cases, our experts advised that there be mechanisms for appealing AI decisions to a human panel or court. That is because, again, the databases on which such AI systems are built are “noisy”; in computer science terms, they are not perfect.

The Responsibility to Prepare for AI in the Law

Key overall take-aways include the following: AI is very good at analyzing large amounts of data to find patterns, which is something humans do, but which AI tools can achieve much faster; however, AI-revealed “correlations” or predictions do not necessarily constitute proof. Accordingly, humans are, and likely will remain, crucial to making final determinations as to the import or weight of information derived from AI tools. Meanwhile, studies are being conducted at “the nexus of computer and social science” to understand whether, or how, humans and machines might best be combined to achieve optimal “conjoined outcomes.”

Yet, it remains unclear whether or how human “values that are difficult to quantify” (e.g., justice, mercy, or equity) could be incorporated into AI tools. Certainly, NIST has suggested that we are all trying to define and address “socio-technical” challenges arising from the increasing use and sophistication of AI tools.

Indeed, AI’s role in legal proceedings has been, and will remain, complicated by very human limitations and concerns—for instance, certain human values or legal concepts (e.g., “beyond a reasonable doubt”) may be difficult to define; organizations may use an AI tool, designed for one function, in ways for which it was neither designed nor tested (i.e., “mission creep”); and humans often need to be trained on how to use a given AI “solution” and/or need a better grasp of statistical “uncertainty.” As regards that last point, lawyers and judges would do well, our experts noted, to consider not only the likelihood of an error arising from the use of an AI tool, but also the severity, or nature, of any harm arising from a possible (even if not likely) error.

Notwithstanding the many details to be sorted out, it is clear that AI is being rapidly and increasingly incorporated into legal work, administrative adjudications, and aspects of courts’ work. Accordingly, judges and lawyers must enhance their awareness of the presence, utility, and limitations of various AI tools: By better understanding what such tools can, or cannot, do and by better understanding how the data, development, and deployment of AI-backed tools might be questioned, these instruments can be leveraged responsibly to streamline workflow, identify possible issues or solutions, and—perhaps—contribute to improving the administration of justice.

    The material in all ABA publications is copyrighted and may be reprinted by permission only.

    Alain Norman

    Alain Norman is an attorney who headed the Science and the Law Initiative of the Center for Scientific Responsibility and Justice at the American Association for the Advancement of Science for the past three years. He served for 22 years as a Foreign Service Officer with the Department of State and inter alia headed a regional office covering 15 countries in Latin America and the Caribbean. Prior to working as a diplomat, Alain established and ran the liaison office of the ABA’s Coalition for International Justice program in The Hague at the International Criminal Tribunal for the former Yugoslavia for three years.