February 26, 2024

Security and Best Practices for Lawyers Using Generative Artificial Intelligence and Large Language Models

Girish Chiruvolu

Generative artificial intelligence (GenAI) offers the potential to automate many routine tasks that are traditionally repetitive and time-consuming for lawyers. Particularly important in this regard are large language models (LLMs), a type of GenAI trained to understand and process human language by learning patterns and associations from large amounts of text data. Key capabilities of LLMs include (1) the ability to generate humanlike textual responses with contextual awareness and (2) strong problem-solving and decision-making abilities on text-based tasks. (While LLMs are a subset of GenAI tools, the terms GenAI and LLMs are used interchangeably in the rest of this article.)

From expedited document review to refined contract generation, talent acquisition, conflict resolution, and legal research, LLMs are revolutionizing how legal professionals approach their work.

GenAI applications in law practice are still maturing, however, and are not without risk.

“Hallucination” and Untrustworthy Results

In one of the best-known examples of this risk, plaintiff’s counsel in Mata v. Avianca, Inc., 22-cv-1461 (PKC) (S.D.N.Y. Jun. 22, 2023), were found to have submitted a brief that contained references to nonexistent cases. The two attorneys were sanctioned after admitting that some of the nonexistent cases and references could be attributed to ChatGPT, the LLM they had used for their research, and that they were unaware that its content could be false.

As the example above demonstrates, LLMs have been shown to “hallucinate,” responding to prompts with information that is entirely fabricated. And even when the results are factual, they might not be current; laws are constantly updated and reinterpreted by the courts. Legal professionals who employ LLMs must adopt strict protocols for cross-checking and verifying the information provided by LLMs.
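One small, mechanical piece of such a protocol can be sketched in code: pull reporter citations out of LLM-generated text and flag every one that a person has not yet confirmed in an authoritative research service. This is a minimal sketch under stated assumptions. The regular expression covers only a few common federal reporter formats (U.S., S. Ct., F., F. Supp.), and the `verified` set is a hypothetical list maintained by hand by whoever checked each citation.

```python
import re

# Assumption: an illustrative pattern for a few federal reporter citations, e.g.
# "347 U.S. 483", "925 F.3d 1339", "550 F. Supp. 2d 100". Real citation formats
# are far more varied; this is not a complete citation parser.
REPORTER_CITE = re.compile(
    r"\b\d{1,4} (?:U\.S\.|S\. Ct\.|F\.(?: Supp\.)?(?: ?(?:[234]d|4th))?) \d{1,5}\b"
)


def unverified_citations(llm_text: str, verified: set[str]) -> list[str]:
    """Return citations found in llm_text that are not in the hand-maintained
    `verified` set, in order of first appearance and without duplicates."""
    flagged: list[str] = []
    seen: set[str] = set()
    for cite in REPORTER_CITE.findall(llm_text):
        if cite not in seen and cite not in verified:
            flagged.append(cite)
        seen.add(cite)
    return flagged


draft = "See 925 F.3d 1339 and 347 U.S. 483; see also 925 F.3d 1339."
print(unverified_citations(draft, {"347 U.S. 483"}))
```

An empty result only means that every citation the pattern recognized has been checked by a person; it says nothing about citations in other formats or about whether a real case actually supports the proposition the LLM attributes to it.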

Confidentiality and Data Security

The importance of client confidentiality in legal work also raises legitimate questions about using publicly available LLM applications such as ChatGPT in the legal industry. Attorneys risk disclosing confidential data when they feed information about cases, clients, and even law firm personnel into an LLM.

LLMs also present several novel security threats. The Open Worldwide Application Security Project (OWASP) recently released its “Top 10 for Large Language Model Applications” list:

1.      Prompt injection. Manipulating LLMs via crafted inputs can lead to unauthorized access, data breaches, and compromised decision-making.
2.      Insecure output handling. Neglecting to validate LLM outputs may lead to downstream security exploits, including code execution that compromises systems and exposes data.
3.      Training data poisoning. Tampered training data can impair LLMs, leading to responses that may compromise security, accuracy, or ethical behavior.
4.      Model denial of service. Overloading LLMs with resource-heavy operations can cause service disruptions and increased costs.
5.      Supply chain vulnerabilities. Relying on compromised components, services, or datasets can undermine system integrity, causing data breaches and system failures.
6.      Sensitive information disclosure. Failure to protect against disclosure of sensitive information in LLM outputs can result in legal consequences or a loss of competitive advantage.
7.      Insecure plugin design. LLM plugins processing untrusted inputs and having insufficient access control risk severe exploits like remote code execution.
8.      Excessive agency. Granting LLMs unchecked autonomy to take action can lead to unintended consequences, jeopardizing reliability, privacy, and trust.
9.      Overreliance. Failing to critically assess LLM outputs can lead to compromised decision-making, security vulnerabilities, and legal liabilities.
10.  Model theft. Unauthorized access to proprietary large language models risks theft of the models, loss of competitive advantage, and dissemination of sensitive information.
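Items 2 (insecure output handling) and 8 (excessive agency) share a common remedy: treat whatever the model returns as untrusted input. The sketch below assumes a hypothetical internal assistant that is supposed to reply with a small JSON object naming one action; the allow-list and action names are invented for illustration. Output that does not parse, or that requests an unlisted action, is rejected rather than executed, and any text shown in a web page is HTML-escaped first.

```python
import html
import json

# Hypothetical allow-list of the only actions this (imaginary) assistant may request.
ALLOWED_ACTIONS = {"summarize", "draft_letter", "search_docket"}


def parse_llm_action(raw_output: str) -> dict:
    """Treat model output as untrusted input: parse it strictly and reject
    anything outside the allow-list instead of acting on it."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise ValueError("model output is not valid JSON") from exc
    action = data.get("action") if isinstance(data, dict) else None
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError(f"action not permitted: {action!r}")
    return data


def render_for_browser(raw_output: str) -> str:
    """Escape model output before inserting it into an HTML page, so any markup
    or script the model produced is displayed as text rather than executed."""
    return html.escape(raw_output)
```

The design choice is deny-by-default: the application, not the model, decides what may happen next.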

The good news is that there’s no need to reinvent the wheel when managing the cybersecurity risks of using LLMs. Most precautionary measures are tried-and-tested security best practices: anonymization/obfuscation, encryption, and access control with authentication and authorization. They just need updating and tweaking for the AI world.

Managing Risk When Using LLMs

If your organization is keen to start tapping the potential of generative AI for competitive advantage, consider the following to mitigate some of these risks:

Choice of Vendor

As you would with any supplier, verify that the company providing the LLM follows industry best practices around data security and privacy. Look for vendors that isolate your data as much as possible from other tenants using the platform. Ask the following questions:

  • To train the LLM, would the vendor mix your firm’s data with that of other tenants who are simultaneously using the platform?
  • Would other tenants on the platform be offered responsive outputs based on the inputs from your firm?

Some law firms are addressing these risks by developing their own LLM applications, trained on their own databases and customized for each case. Unlike ChatGPT, which is trained on publicly available data from the Internet, the AI tool Harvey is trained on legal data. Once engaged by a law firm, it can be further trained on the firm’s own work product to automate document drafting and research tasks and to improve the accuracy of outcome predictions.

Data Anonymization

Consider anonymization techniques to protect the privacy of individuals who could be identified in the datasets used for training the LLM. A simple way to achieve this is to use a mapping table that records both the original data and the anonymized data. Below are a few examples of mapping methods (by no means an exhaustive list) by which the client name “Baker” could be anonymized into a “handle” (transformed text):

  • Hash method: 6723ADEEF
  • Masking method: B***r
  • Tokenization method: 175384648
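The three methods listed above can be sketched as follows. The handles this code produces will differ from the sample values shown, which are illustrative. One assumption goes beyond the text: the hash method uses a keyed hash (HMAC-SHA256 with a secret held by the firm) rather than a plain hash, so that an outsider cannot recompute handles by hashing guessed client names. The function names and the nine-character handle length are hypothetical choices.

```python
import hashlib
import hmac
import secrets

# Assumption: a secret key kept by the firm and never sent to the LLM vendor.
SECRET_KEY = secrets.token_bytes(32)


def hash_handle(name: str, length: int = 9) -> str:
    """Hash method: deterministic, so the same name always yields the same handle.
    Truncating the digest keeps handles short but raises the chance of collisions."""
    digest = hmac.new(SECRET_KEY, name.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length].upper()


def mask_handle(name: str) -> str:
    """Masking method: keep the first and last characters and hide the rest."""
    if len(name) <= 2:
        return "*" * len(name)
    return name[0] + "*" * (len(name) - 2) + name[-1]


def token_handle(digits: int = 9) -> str:
    """Tokenization method: a random number with no mathematical link to the name;
    only the mapping table connects the two."""
    return str(secrets.randbelow(10**digits)).zfill(digits)


print(mask_handle("Baker"))  # B***r
```

Note that masked handles are not unique ("Baker" and "Boxer" both become "B***r"), so masking alone cannot be reversed reliably through a mapping table; hash and tokenization handles are better suited to the round trip described next.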

The handle would be inserted in place of “Baker” wherever it appears in the case document that the LLM will process or be trained on. The mapping table would then be used to reverse the process in LLM-generated text, restoring each instance of the handle to the word “Baker.” Mapping tables can be pre-generated and stored for reference. Such anonymization can also defend against “cross-tenant” attacks, which use inferences based on another tenant’s data to attack your data.
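A minimal sketch of that round trip, assuming tokenization-style handles: the class name `MappingTable` and the `CLIENT_` handle prefix are hypothetical. Sensitive terms are replaced with handles (whole words only) before text leaves the firm, and the stored table restores them in the LLM's response.

```python
import re
import secrets


class MappingTable:
    """Hypothetical mapping table (original term <-> handle), kept on the firm's side."""

    def __init__(self) -> None:
        self.to_handle: dict[str, str] = {}
        self.to_original: dict[str, str] = {}

    def _new_handle(self) -> str:
        # Tokenization-style handle: random digits with no link to the original term.
        while True:
            handle = "CLIENT_" + str(secrets.randbelow(10**9)).zfill(9)
            if handle not in self.to_original:
                return handle

    def handle_for(self, term: str) -> str:
        # Reuse the existing handle so every occurrence of a term maps consistently.
        if term not in self.to_handle:
            handle = self._new_handle()
            self.to_handle[term] = handle
            self.to_original[handle] = term
        return self.to_handle[term]

    def anonymize(self, text: str, terms: list[str]) -> str:
        # Replace whole-word occurrences of each sensitive term with its handle.
        for term in terms:
            handle = self.handle_for(term)
            pattern = re.compile(r"\b" + re.escape(term) + r"\b")
            text = pattern.sub(lambda _match: handle, text)
        return text

    def restore(self, text: str) -> str:
        # Reverse the substitution in LLM-generated text.
        for handle, term in self.to_original.items():
            text = text.replace(handle, term)
        return text


table = MappingTable()
outgoing = table.anonymize("Counsel for Baker objected.", ["Baker"])
print(outgoing)                 # the handle appears where "Baker" was
print(table.restore(outgoing))  # Counsel for Baker objected.
```

Restoration works only if the model reproduces each handle exactly; a response that reformats or paraphrases a handle will not be mapped back automatically.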

Security Hygiene and Training

Use reputable anti-virus and anti-malware protection to help ensure that your devices and systems are not compromised by malware. Data from a compromised device can be injected into LLM platforms, amplifying the attack across multiple tenants.

Enhanced Access Controls

Strong passwords, multi-factor authentication (MFA), and “least privilege” policies will help to ensure that only authorized individuals have access to the generative AI model and back-end systems.
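A “least privilege” policy can be sketched as a deny-by-default check: a request is allowed only if the user has completed MFA and one of the user's roles explicitly grants the permission. The roles and permission names below are hypothetical examples for a firm's internal GenAI tool, not a prescribed scheme.

```python
from dataclasses import dataclass, field

# Hypothetical roles and the permissions each one grants.
ROLE_PERMISSIONS: dict[str, frozenset[str]] = {
    "paralegal": frozenset({"query_model"}),
    "attorney": frozenset({"query_model", "upload_client_documents"}),
    "it_admin": frozenset({"query_model", "manage_model_settings", "view_audit_log"}),
}


@dataclass
class User:
    name: str
    roles: set[str] = field(default_factory=set)
    mfa_verified: bool = False


def is_authorized(user: User, permission: str) -> bool:
    """Deny by default: require completed MFA and an explicit grant from a known role."""
    if not user.mfa_verified:
        return False
    return any(
        permission in ROLE_PERMISSIONS.get(role, frozenset()) for role in user.roles
    )
```

Unknown roles grant nothing, and no role receives more than it needs; a paralegal in this example can query the model but cannot upload client documents.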

Conclusion

While GenAI and LLM technologies offer significant opportunities, adoption of the technologies for law practice is still in its infancy due to substantial risks, including fabricated/unverified responses, data leaks, and malicious changes to the information in responses. The suggestions presented in this article can help minimize such risks for law practitioners.

    The material in all ABA publications is copyrighted and may be reprinted by permission only.

    Girish Chiruvolu, PhD, MBA, CISSP, CISM, is a cybersecurity practitioner with more than a decade of experience in running information security and risk management programs for Fortune 500 companies such as Thomson Reuters, Citibank, Capital One, and Experian. He is a subject matter expert in authentication, application, and risk management within industry frameworks such as NIST and ISO. He holds more than 22 patents in the cybersecurity, IT, and telecom fields. He has taught cybersecurity and computer science courses at the graduate level at Southern Methodist University Dallas, the University of Texas at Dallas, and the University of Dallas and has been a regular speaker at cybersecurity events and conferences.

    Published in GPSolo eReport, Volume 13, Number 7, February 2024. © 2024 by the American Bar Association. Reproduced with permission. All rights reserved. This information or any portion thereof may not be copied or disseminated in any form or by any means or stored in an electronic database or retrieval system without the express written consent of the American Bar Association. The views expressed in this article are those of the author(s) and do not necessarily reflect the positions or policies of the American Bar Association or the Solo, Small Firm and General Practice Division.