GPSolo Magazine

GPSolo March/April 2025 (42:2): AI for Lawyers

Bias in AI Large Language Models: Risks and Remedies

David C. Donald

Summary

  • When used to analyze résumés or generate assessments of employees, large language models (LLMs) can potentially perpetuate historical hiring or promotion biases against protected classes.
  • LLMs employed in credit scoring or loan approval systems might encode historical patterns of financial discrimination.
  • Risk assessment tools powered by LLMs could perpetuate the racial biases present in historical criminal justice data, creating unfair bail decisions, sentencing recommendations, or parole determinations.
  • Traditional civil rights laws, including Title VII of the Civil Rights Act, the Fair Housing Act, and the Americans with Disabilities Act, can be applied to remedy discriminatory outcomes from LLM-based systems.

Large language models (LLMs) have emerged as powerful tools across numerous sectors, from legal research to automated decision-making. However, these systems can perpetuate and amplify societal biases, raising significant concerns about fairness and discrimination in areas such as employment, banking, criminal justice, and health care. If unlawful decision-making arises, the legal profession must be ready to step up and take on this new challenge. This article examines the sources of LLM bias, the potential impact of bias across various applications, and the legal frameworks available to address resulting harm.

Origins of Bias in Large Language Models

LLMs can generate biased results through several mechanisms. The primary source is bias in the training data: these models learn from vast collections of text culled from the Internet and other sources, which inevitably contain the historical biases, prejudices, and stereotypes present in the culture at the time the data was generated. For example, if historical employment records or management texts predominantly use male pronouns when discussing managers, or if legal texts consistently refer to judges or attorneys with male pronouns, an LLM may develop a gender bias in its content regarding management and legal professionals.
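
The kind of skew described above can be surfaced with very simple corpus measurements. The following is a minimal, hypothetical sketch in Python; the text snippet stands in for real training data, and the proximity heuristic (checking which gendered pronoun appears shortly after a professional title) is invented for illustration, not a description of how any LLM is actually trained.

```python
import re
from collections import Counter

# Hypothetical snippet standing in for a slice of training text;
# a real audit would run over the actual corpus.
corpus = (
    "The judge entered the courtroom and he reviewed the motion. "
    "The attorney said he would file the brief, and the paralegal "
    "said she would assemble the exhibits. The manager noted that "
    "he expected the report by Friday."
)

TITLES = {"judge", "attorney", "manager", "paralegal"}
PRONOUNS = {"he": "male", "she": "female"}

# Count which gendered pronoun appears soonest after each title mention.
counts = Counter()
tokens = re.findall(r"[a-z]+", corpus.lower())
for i, token in enumerate(tokens):
    if token in TITLES:
        for later in tokens[i + 1 : i + 8]:  # small window after the title
            if later in PRONOUNS:
                counts[(token, PRONOUNS[later])] += 1
                break

for (title, gender), n in sorted(counts.items()):
    print(f"{title:<10} -> {gender}: {n}")
```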

Additionally, bias can emerge from the architectural decisions and training procedures adopted when developers design the model and prepare it for launch. Choices regarding model architecture, training objectives, and data filtering can inadvertently encode certain biases. Even well-intentioned attempts to filter out problematic content can sometimes lead to underrepresentation or overrepresentation of minority perspectives or experiences.

High-Risk Applications for Potential Discriminatory Impact

Several applications of LLMs carry a particular risk of generating discriminatory outcomes. Perhaps the most visible is employment screening. When used to analyze résumés or generate recurring assessments of employees, LLMs can potentially perpetuate historical hiring or promotion biases against protected classes. An LLM trained on historical hiring data could learn to associate certain names or educational institutions with “preferred” candidates, disadvantaging qualified applicants from groups previously underrepresented in the historically “preferred” category. Members of those underrepresented groups could then be passed over in decisions on both hiring and promotion.
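
A familiar way to surface that pattern in the employment context is the Equal Employment Opportunity Commission's “four-fifths rule,” a screening heuristic (not a legal conclusion) under which a protected group's selection rate below 80 percent of the highest group's rate suggests possible adverse impact. The sketch below applies the heuristic to hypothetical counts from an LLM-based résumé filter; the numbers are invented for illustration only.

```python
# Hypothetical screening outcomes from an LLM-based résumé filter;
# the counts are illustrative, not drawn from any real case or dataset.
outcomes = {
    "group_a": {"advanced": 48, "rejected": 52},   # reference group
    "group_b": {"advanced": 30, "rejected": 70},   # protected group
}

def selection_rate(group):
    g = outcomes[group]
    return g["advanced"] / (g["advanced"] + g["rejected"])

rate_a = selection_rate("group_a")
rate_b = selection_rate("group_b")

# EEOC four-fifths heuristic: a protected group's selection rate below
# 80 percent of the highest group's rate is commonly treated as
# preliminary evidence of adverse impact.
impact_ratio = rate_b / rate_a
print(f"selection rate, group A: {rate_a:.2f}")
print(f"selection rate, group B: {rate_b:.2f}")
print(f"impact ratio: {impact_ratio:.2f}")
print("possible adverse impact" if impact_ratio < 0.8 else "within the four-fifths guideline")
```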

A similar problem can affect decisions on lending and the rendering of other financial services. LLMs employed in credit scoring or loan approval systems might encode historical patterns of financial discrimination. The model could learn spurious correlations between suspect characteristics and creditworthiness, leading to unfair denial of services. This digital “redlining” would have serious financial implications for persons categorized as undesirable by the LLM.
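
The proxy mechanism behind such digital redlining is easy to illustrate. In the hypothetical sketch below, a scoring rule never looks at protected status, yet it keys on a ZIP code prefix that historical lending patterns have made correlate with that status; the data and the rule are invented, but the effect mirrors what a biased model could produce at scale.

```python
# Hypothetical sketch of digital redlining through a proxy feature;
# the applicants, the ZIP prefixes, and the rule are all invented.
applicants = [
    # (zip_prefix, protected_group_member, repaid_prior_loan)
    ("902", False, True), ("902", False, True), ("902", False, False),
    ("606", True, True), ("606", True, True), ("606", True, False),
]

# A scoring rule that never looks at group membership but keys on the
# ZIP prefix, which historical lending patterns made correlate with it.
def approve(zip_prefix, _repaid_prior_loan):
    return zip_prefix == "902"

def approval_rate(group_flag):
    pool = [a for a in applicants if a[1] == group_flag]
    return sum(approve(a[0], a[2]) for a in pool) / len(pool)

print(f"approval rate, non-protected group: {approval_rate(False):.0%}")
print(f"approval rate, protected group:     {approval_rate(True):.0%}")
# Both groups have identical repayment histories, yet outcomes diverge:
# the proxy carries historical discrimination into the automated decision.
```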

Schemas and decision trees have been used for decades in criminal justice for matters such as sentencing and bail decisions. Risk assessment tools powered by LLMs could perpetuate the racial biases present both in those existing schemas and in the historical criminal justice data generated by applying them, creating unfair bail decisions, sentencing recommendations, or parole determinations.

The management of health care data has likewise been a concern for many years, and decisions made on the basis of that data can have serious implications. Medical diagnosis or treatment recommendation systems using LLMs might provide a different quality of care based on suspect demographic factors, potentially because of the historical underrepresentation of certain groups in medical research and literature.

Legal Remedies for LLM Bias

Legal remedies long used to challenge discriminatory outcomes are equally available against LLM applications. Traditional civil rights laws, including Title VII of the Civil Rights Act, the Fair Housing Act, and the Americans with Disabilities Act, can be applied to remedy discriminatory outcomes from LLM-based systems. Plaintiffs can argue that the use of biased LLMs creates disparate impact or perpetuates systemic discrimination against protected classes.

Consumer protection tools under the Federal Trade Commission Act are also available. The Act’s prohibition of unfair or deceptive practices could apply to companies that deploy biased LLMs without adequate disclosure or safeguards. State consumer protection laws could, in specific instances, be used to provide additional remedies. Moreover, depending on the people or activity affected by the use of an LLM, other regulatory agencies could potentially challenge the use of biased LLMs under existing authorities, such as an Equal Employment Opportunity Commission challenge of employment screening tools or Consumer Financial Protection Bureau opposition to financial services applications.

Ultimately, LLMs are also products licensed or sold to their end users. Novel theories of product liability may be found to apply to LLM developers who fail to adequately test for or mitigate known biases. This could include claims of design defect or failure to warn.

Because an LLM is a product, procedure, or device and not a person capable of bearing accountability for a legal action, remedies would target one or more players in the LLM development and deployment chain. Model developers, the parties that create and train LLMs, could face liability for negligent design or failure to implement adequate bias testing and mitigation procedures. Developers might also be liable for insufficient documentation of known biases or limitations.

Commercial users that deploy LLMs are another important link in the supply chain, as such firms could make an LLM available for general use even though it creates discriminatory outcomes. This is particularly relevant where the commercial user operates in regulated industries with specific non-discrimination obligations.

Upstream data providers that supply biased training data could also face claims that they failed to deliver representative datasets or included known discriminatory content without appropriate controls.

Proving a bias-based claim involving LLMs presents interesting challenges. First, establishing a causal link between model bias and specific discriminatory outcomes could require sophisticated statistical analysis and expert testimony. Second, counsel may well have to guide courts (and perhaps a jury) through highly technical questions about model architecture, training procedures, and bias measurement methodologies. Third, access to data, model parameters, and testing procedures would depend on discovery, which developers could resist on intellectual property grounds.
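
As an illustration of the first point, the hypothetical sketch below applies a two-proportion z-test, one standard statistical technique an expert might use, to invented screening counts to ask whether the gap in selection rates could plausibly be chance. An actual expert analysis would go further, for example by controlling for legitimate, job-related qualifications.

```python
import math

# Hypothetical outcome counts from an LLM-driven screening tool;
# purely illustrative numbers, not drawn from any real matter.
advanced_a, total_a = 48, 100   # reference group
advanced_b, total_b = 30, 100   # protected group

p_a = advanced_a / total_a
p_b = advanced_b / total_b
p_pool = (advanced_a + advanced_b) / (total_a + total_b)

# Two-proportion z-test: is the gap in selection rates larger than
# chance variation would plausibly produce?
se = math.sqrt(p_pool * (1 - p_pool) * (1 / total_a + 1 / total_b))
z = (p_a - p_b) / se
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))  # two-tailed

print(f"selection rates: {p_a:.2f} vs. {p_b:.2f}")
print(f"z statistic: {z:.2f}; p-value: {p_value:.3f}")
# Courts have often treated disparities of two to three standard
# deviations as probative of discrimination; the statistic alone does
# not establish that the LLM caused the disparity.
```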

Opportunities as Well as Dangers

As LLMs become increasingly integrated into decision-making systems across society, addressing their potential for bias and discrimination becomes crucial. Lawyers in both private practice and regulatory agencies will have to develop new tools, frameworks, and strategies for challenging discriminatory outcomes generated by artificial intelligence. While much of the discussion around AI has focused on how automation could take away jobs from lawyers and law firm staff, these same high-powered tools are also capable of generating an unlawfully discriminatory decision each and every second, which could create a significant new practice area to hold such behavior in check.
