Summary
- AI can remove much of the drudgery from the e-discovery process, allowing you to focus on the substantive areas that matter most.
Every client wants a satisfactory resolution of a lawsuit, whether it is a victory at trial, an advantageous settlement, or an early disposition. To reach these outcomes, you first need to develop an explanation of what happened. Next, you need to persuade the decision-maker (judge, jury, arbitrator, opposing counsel, or even your client) to adopt your explanation by delivering the most compelling story you can. You must accomplish all this knowing the opposing side is doing the same. Today, more than ever, the evidence you rely on will include electronically stored information (ESI). Finding and working with ESI is commonly referred to as “e-discovery.” What follows is an overview of how to incorporate artificial intelligence (AI) into e-discovery work and of six technologies currently changing the face of e-discovery.
The first day you work on a matter, you start assembling stories by delving into the available evidence. At first, there will not be much to work with: your client’s version of what happened, an interview or two, a few communications someone thought would be pertinent. As you move through the case, you will amass more information and test that information, and your array of potential stories, against each other. Your objectives will be to reshape and refine the various stories you are working on, compare those potential stories with one another, and, ultimately, present the most compelling story you can, given the facts at your disposal.
Historically, attorneys developed cases by collecting evidence from three key areas. First, we drew information from people’s heads by interviewing witnesses, taking statements, conducting depositions, and examining those witnesses at hearings, arbitrations, and trials. Second, we turned to paper, where people recorded their thoughts, impressions, aspirations, and transactions. Finally, on occasion, we looked to tangible objects, such as the broken tread on an escalator.
Today, we turn to a fourth area that yields more varied evidence than we have ever worked with: ESI. It offers evidentiary possibilities lawyers could only dream of three decades ago, and it presents challenges we never contemplated.
The smart, successful, modern e-discovery practitioner has almost certainly incorporated some level of AI into their ESI workflows. While not quite the “holy grail” of e-discovery, AI has brought us closer to that ideal than any earlier method.
The very nature and volume of ESI require you to catalog and understand vast amounts of diverse data before you can formulate a compelling story, even though much of that ESI will be useless to your cause. Failing to approach this work with due care wastes limited time and other valuable resources, which poses real risks for lawyers. This danger is precisely why culling ESI is so critical to your efforts.
Traditional approaches to culling ESI have relied on methods such as keyword and term searches or metadata filters. While these methods have performed reasonably well for decades, they are often imprecise, and as data volumes have grown, so has the amount of unresponsive material they sweep up. To cope with constrained budgets and compressed timelines, e-discovery experts have long sought the “holy grail” of ESI: new, improved methods and tools that identify only relevant documents and set aside irrelevant ones.
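To make the traditional approach concrete, here is a minimal Python sketch of metadata culling. The `Document` fields, the custodian names, the dates, and the `cull_by_metadata` function are hypothetical; review platforms expose their own metadata fields and filter interfaces.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Document:
    doc_id: str
    custodian: str
    sent: date
    file_type: str

def cull_by_metadata(docs, custodians, start, end, file_types):
    """Keep only documents whose custodian, date, and file type all pass the filters."""
    return [
        d for d in docs
        if d.custodian in custodians
        and start <= d.sent <= end
        and d.file_type in file_types
    ]

# Hypothetical collection of four documents.
docs = [
    Document("D1", "jdoe", date(2019, 3, 4), "email"),
    Document("D2", "asmith", date(2019, 3, 5), "email"),
    Document("D3", "jdoe", date(2021, 1, 9), "email"),
    Document("D4", "jdoe", date(2019, 6, 1), "spreadsheet"),
]
kept = cull_by_metadata(docs, {"jdoe"}, date(2019, 1, 1), date(2019, 12, 31), {"email"})
print([d.doc_id for d in kept])  # ['D1']
```

Notice that a filter like this keeps every email from that custodian and period, responsive or not, which is exactly the imprecision described above.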
AI is everywhere. It is a backbone of modern technology, used in everything from search engine suggestions to self-driving cars. While the technology is exciting, even a little scary, the truth is that AI does not yet provide the generalized intelligence or reasoning capabilities that would render humans obsolete. Instead, AI exploits what computers do best: quickly processing large datasets to build mathematical models. Those models are used to classify data, predict outcomes, and provide highly targeted information that humans (or other processes) can then use.
A successful AI strategy recognizes that most AI is tuned to solve narrow problems and therefore combines multiple AI tools into a formidable, well-rounded approach.
A common AI use case comes from real estate: predicting house prices, an issue that can arise in cases alleging violations of the Fair Housing Act. The data fed into the AI model include the prices of recently sold homes and features of those homes, such as zip code, square footage, and the number of bedrooms and bathrooms. Drawing on these data points across the millions of houses sold each year, AI models assess the inputs, predict the market price, and indicate whether a new listing is priced in line with current market trends or, possibly, in a discriminatory manner.
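The idea can be sketched in a few lines of Python. This is a deliberately simplified illustration, assuming a single feature (square footage) and ordinary least-squares regression in place of the multi-feature models the paragraph describes; the sales figures and the 15% tolerance are made up, and `fit_line` and `flag_listing` are hypothetical names.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of price = slope * sqft + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

def flag_listing(sqft, price, model, tolerance=0.15):
    """True if the asking price is more than `tolerance` away from the predicted price."""
    slope, intercept = model
    predicted = slope * sqft + intercept
    return abs(price - predicted) / predicted > tolerance

# Made-up recent sales in one zip code: (square feet, sale price).
sales = [(1000, 200_000), (1500, 300_000), (2000, 400_000), (2500, 500_000)]
model = fit_line([sqft for sqft, _ in sales], [price for _, price in sales])
print(flag_listing(1800, 360_000, model))  # False: in line with the model
print(flag_listing(1800, 450_000, model))  # True: 25% above the prediction
```

A flag only means the price departs from what the model expects. Whether the departure reflects discrimination is a question for further investigation and expert analysis, not for the model alone.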
While AI can seem otherworldly, it is based upon sound math and statistics that have been around for decades. Even though AI can feel like a black box, do not worry. Data science experts can testify to the soundness of these methods in a defensible manner consistent with the high standards demanded by the courts.
The following six technologies are currently changing the face of e-discovery.
Image recognition AI is built to recognize and label entities commonly depicted in photos. For example, when image recognition AI examined a photo of a skateboarder in the middle of a vacant city street, it applied more than a dozen labels. The AI was able to identify which regions of the image represented each entity, such as cars, vehicles, downtown, sports, and skateboard, to name a few.
Image labeling technology has drastically lowered discovery costs and time by allowing a review team to select the images most likely to be relevant while excluding or deprioritizing those that aren’t. Many image labeling models can even recognize logos, ads, and other “noise” images that often expand the size of a review.
Image labeling is particularly useful in construction and sexual-misconduct cases, where images will often represent a large percentage of the responsive material. Image labeling is also helpful in product-liability matters. For example, in failure-to-warn cases, image labeling can find and help categorize warning labels—or the absence of them. Where a design defect has been asserted, you can use image labeling to search for schematics, component-part diagrams, and logos found on documents such as the American Society for Testing and Materials standards. In manufacturing-defect claims, image labeling can help identify photographs containing the allegedly defective part.
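A minimal sketch of how a review team might act on such labels in a construction matter follows. It assumes the image recognition service has already returned, for each image, a mapping of labels to confidence scores; that data format, the label names, the 0.8 threshold, and the `triage` function are all hypothetical, since each service returns results in its own format.

```python
# Hypothetical label sets for a construction matter.
RELEVANT = {"crane", "scaffolding", "hard hat"}
NOISE = {"logo", "advertisement"}

def triage(image_labels, relevant, noise, threshold=0.8):
    """Order images for review from labels an image recognition model produced.

    image_labels maps an image ID to {label: confidence}. Images with a confident
    relevant label come first (highest confidence first); images whose only
    confident labels are noise are set aside; everything else falls in between.
    """
    prioritized, other, set_aside = [], [], []
    for image_id, labels in image_labels.items():
        confident = {name: score for name, score in labels.items() if score >= threshold}
        hits = [score for name, score in confident.items() if name in relevant]
        if hits:
            prioritized.append((max(hits), image_id))
        elif confident and all(name in noise for name in confident):
            set_aside.append(image_id)
        else:
            other.append(image_id)
    prioritized.sort(reverse=True)  # highest-confidence relevant images first
    return [image_id for _, image_id in prioritized] + other, set_aside

labels = {
    "IMG-001": {"crane": 0.91, "sky": 0.88},
    "IMG-002": {"logo": 0.97},
    "IMG-003": {"scaffolding": 0.95, "person": 0.90},
    "IMG-004": {"dog": 0.85},
}
review_order, set_aside = triage(labels, RELEVANT, NOISE)
print(review_order)  # ['IMG-003', 'IMG-001', 'IMG-004']
print(set_aside)     # ['IMG-002']
```

Set-aside images are deprioritized rather than deleted, so a reviewer can still spot-check them.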
We all work with written language every day, but until the recent popularization of more advanced techniques, AI struggled with the nuance and expressiveness of human language. Natural language processing AI can now parse documents to infer the meaning and the emotion the writer was trying to convey.
AI can judge whether a writer was using a positive or negative tone, trying to rationalize their behavior, or pressuring others to follow their lead. In an email from the Enron case, Skilling v. United States, 561 U.S. 358 (2010), the writer stated, “That may work—I don’t want to end up with an equity position I just worked hard to eliminate.” AI classified the email as highly negative in tone and flagged the author as attempting to rationalize behavior.
This type of emotional intelligence is indispensable when you know little about the case or are conducting an open-ended investigation into a matter where intent is at issue. Intent is a key element across many areas of law: it is often crucial in criminal matters, but it also arises in common-law fraud actions, in statutory actions such as those under the False Claims Act, and in actions seeking punitive damages.
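The models described here are far more sophisticated than anything that fits in a few lines, but the basic idea of scoring tone can be illustrated with a crude lexicon (word-list) approach, a much simpler technique swapped in for illustration. The word lists and the `classify_tone` function are hypothetical.

```python
import re

# Tiny, hypothetical word lists; trained NLP models learn far richer signals.
POSITIVE = {"great", "success", "glad", "excellent"}
NEGATIVE = {"worried", "fail", "loss", "problem"}

def classify_tone(text):
    """Label text positive, negative, or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify_tone("Great news, the launch was a success"))  # positive
print(classify_tone("I am worried this deal will fail"))      # negative
print(classify_tone("Agenda for Tuesday attached"))           # neutral
```

Word lists like these miss sarcasm, negation, and context, which is precisely where trained language models earn their keep.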
You can train AI models on a virtually unlimited variety of concepts (e.g., sexual comments, career advancement, or even privileged documents). Pretrained models, available through AI model libraries, let you cut through the “noise” and identify potentially relevant documents without wasting time devising search terms or sifting through useless documents.
Imagine being assigned to a sexual-harassment case with millions of documents in discovery. At this point, you may not yet know any useful keywords, and you probably are not aware of the coded language key actors used. You may not even know the key actors beyond a core person or two. Knowing where to start and which documents are likely to be most illuminating can be particularly challenging. Here, a pretrained AI model can flag documents containing “sexually explicit comments” or “hate/discrimination.”
While models come pretrained to identify a generic concept, each project can embody that concept in its own way. The good news is that AI models learn more about your data as your team reviews documents. If your team tags a document as privileged, the privilege model incorporates that document into its definition of what a privileged document means in that project. With each document your team reviews, the model generally becomes more accurate.
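How a given platform updates its models is usually proprietary, so this sketch illustrates only the general idea with a deliberately simple stand-in: a multinomial naive Bayes classifier whose word counts are updated each time a reviewer tags a document. The class, its methods, and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter

class IncrementalClassifier:
    """Naive Bayes classifier that updates each time a reviewer tags a document.

    Hypothetical illustration only; not any review platform's actual model.
    """

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    @staticmethod
    def _tokens(text):
        return re.findall(r"[a-z]+", text.lower())

    def learn(self, text, is_privileged):
        """Fold one reviewer decision into the model's word and document counts."""
        self.word_counts[is_privileged].update(self._tokens(text))
        self.doc_counts[is_privileged] += 1

    def probability_privileged(self, text):
        """Estimate the probability that `text` is privileged, given the tags so far."""
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        if not vocab:
            return 0.5  # nothing learned yet
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in (True, False):
            # Add-one (Laplace) smoothing keeps unseen words from zeroing out a class.
            log_p = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            denom = sum(self.word_counts[label].values()) + len(vocab)
            for word in self._tokens(text):
                log_p += math.log((self.word_counts[label][word] + 1) / denom)
            log_scores[label] = log_p
        # Normalize the two log scores into a probability for the privileged class.
        top = max(log_scores.values())
        weights = {label: math.exp(score - top) for label, score in log_scores.items()}
        return weights[True] / (weights[True] + weights[False])

clf = IncrementalClassifier()
clf.learn("Please advise on our legal exposure, counsel", True)
clf.learn("Attorney advice regarding the contract dispute", True)
clf.learn("Lunch order for the team on Friday", False)
clf.learn("Quarterly sales numbers attached", False)
print(round(clf.probability_privileged("Seeking legal advice from counsel"), 2))
```

Each call to `learn` changes the counts the next prediction relies on, which is the sense in which the model improves as review proceeds.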
Legal teams that make heavy use of models often start with an off-the-shelf model. From there, they can build an in-house model library that has been further trained by the firm’s reviewers. The excitement around pretrained models has even led to the creation of secondary marketplaces where models are traded or sold.
Currently available models include those that identify privileged content, such as requests for legal advice, legal advice itself, and the preparation of documents for depositions; those that find comments on appearance (e.g., conversations about a person’s attire or physical attributes), whether those comments have positive or negative connotations; and those that look for advertisements, newsletters, and other promotional materials you might want to filter out as irrelevant noise.
AI detects statistically significant patterns and uses them to give you valuable information, but it also detects deviations from those patterns: the anomalies. Anomaly detection AI examines all aspects of your data and calls out patterns that seem rare, suspicious, or simply out of place. It will often rate these anomalous events by their rarity and uniqueness so the team can home in on the most unusual ones.
A meaningful anomaly in an employment matter might be, “John Doe sent email to himself 56 times between December 1st and December 5th; these emails contained attachments and were all sent outside of business hours; this pattern did not occur otherwise.”
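Here is a minimal sketch of how an anomaly like this might be scored, assuming a simple z-score (how many standard deviations a day's count sits above the sender's average) as the rarity rating. Commercial tools look at many more dimensions; the `Email` record, the 8:00 to 18:00 business-hours window, the 3.0 threshold, and the sample data are all hypothetical.

```python
from collections import Counter, namedtuple
from datetime import datetime, timedelta
from statistics import mean, pstdev

Email = namedtuple("Email", "sender recipient sent has_attachment")

def is_suspicious(email, start_hour=8, end_hour=18):
    """Self-addressed, carries an attachment, and sent outside assumed business hours."""
    after_hours = email.sent.weekday() >= 5 or not (start_hour <= email.sent.hour < end_hour)
    return email.sender == email.recipient and email.has_attachment and after_hours

def anomalous_days(emails, sender, threshold=3.0):
    """Return (day, z-score) pairs whose suspicious-email count is unusually high for this sender."""
    mine = [e for e in emails if e.sender == sender]
    if not mine:
        return []
    first = min(e.sent.date() for e in mine)
    last = max(e.sent.date() for e in mine)
    days = [first + timedelta(days=i) for i in range((last - first).days + 1)]
    # Days with no suspicious email count as zero, so the baseline reflects normal behavior.
    counts = Counter(e.sent.date() for e in mine if is_suspicious(e))
    series = [counts[d] for d in days]
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return []  # no variation at all, so nothing stands out
    scored = [(d, (counts[d] - mu) / sigma) for d in days]
    return sorted((pair for pair in scored if pair[1] >= threshold), key=lambda p: p[1], reverse=True)

# Hypothetical data: 40 days of ordinary daytime email, plus a three-day burst
# of late-night, self-addressed emails with attachments.
start = datetime(2023, 11, 1, 10, 0)
emails = [Email("jdoe", "team", start + timedelta(days=i), False) for i in range(40)]
for offset in (30, 31, 32):
    for minute in range(10):
        sent = start + timedelta(days=offset, hours=12, minutes=minute)
        emails.append(Email("jdoe", "jdoe", sent, True))

for day, z in anomalous_days(emails, "jdoe"):
    print(day, round(z, 2))
```

Scoring each day against the sender's own baseline, rather than against everyone's, is what lets an otherwise unremarkable habit stand out when it suddenly spikes.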
No matter what type of lawsuit or investigation you are working on, finding and using anomalies can bear fruit. Anomaly detection was a critical tool in a multibillion-dollar lawsuit in which counsel needed to understand what really happened. The plaintiff alleged that its product was failing because a component part it bought from the defendants did not work as advertised. By ingesting and analyzing a wide array of data obtained from the plaintiff, defense counsel located anomalies suggesting a very different story: the plaintiff had been so successful and had grown so fast that it had lost control of the quality of its manufacturing and installation processes. Those lapses, not the defendants’ component part, caused the problems the plaintiff was experiencing.
Language simplification is a young and promising area of AI research. This technology creates document summaries to aid in e-discovery review or search and retrieval. An especially intriguing use of this technology is to convert convoluted contract language into plain English. For example, the AI took a 95-word sample of George Washington’s 1796 farewell address and simplified it to read, “I’m not going to run for president.”
The use of language simplification is not limited to contract analysis and can be a useful exploratory and investigative capability anytime documents or communications contain complex language.
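Summaries can be produced in different ways. The language simplification described above generates new, plainer wording; the sketch here uses a simpler and different technique, extractive summarization, which only selects existing sentences according to how often their words recur in the document. It is an illustration under that assumption, and the stopword list, the sample clause, and the `summarize` function are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list: very common words that carry little meaning.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for",
             "by", "be", "is", "shall", "that", "this", "with", "as", "any"}

def content_words(text):
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def summarize(text, max_sentences=1):
    """Extractive summary: keep the sentences whose words recur most across the document."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / (len(words) or 1)

    chosen = set(sorted(sentences, key=score, reverse=True)[:max_sentences])
    # Present the chosen sentences in their original order.
    return " ".join(s for s in sentences if s in chosen)

clause = ("The supplier shall deliver the components by March 1. "
          "Late delivery of the components entitles the buyer to a credit on the components. "
          "Notices must be sent in writing.")
print(summarize(clause))
```

Because it can only reuse the author's own sentences, an extractive summary never turns convoluted language into plain English; that is what the generative simplification tools add.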
AI may be able to remove much of the drudgery of reviewing language in contracts or other structured documents. Many industries (e.g., software, oil and gas, construction, medicine, and real estate) use documents that follow a standard layout and contain common elements, even if the precise language varies across documents and projects.
You can train AI models to recognize language and clauses commonly found in specific types of documents and rate each clause as being favorable or not. These models can give your document an overall rating based on how closely it matches your model definition and alert you when elements of the document are missing or otherwise incongruent with your expectations.
While these systems don’t remove the human reviewer altogether, automated document analysis allows your team to prioritize the documents most useful to your review process.
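The clause-checking idea can be sketched with simple pattern matching standing in for a trained model. Everything here is an assumption for illustration: the clause list, the regular expressions, the "favorable" terms (e.g., Texas governing law), and the `analyze` function are hypothetical, and a real model would learn these signals from reviewed examples rather than hand-written patterns.

```python
import re

# Hypothetical model definition for one contract type:
# clause name -> (pattern showing the clause is present, pattern showing favorable terms).
CLAUSES = {
    "governing law": (r"governed by the laws of", r"laws of the state of texas"),
    "limitation of liability": (r"limitation of liability", r"shall not exceed"),
    "termination": (r"\bterminat(?:e|ion)\b", r"thirty \(30\) days"),
}

def analyze(document):
    """Rate expected clauses, list missing ones, and score overall conformity."""
    text = document.lower()
    ratings, missing = {}, []
    for name, (present, favorable) in CLAUSES.items():
        if re.search(present, text):
            # Simplification: checks the whole document, not just the clause's own text.
            ratings[name] = "favorable" if re.search(favorable, text) else "unfavorable"
        else:
            missing.append(name)
    score = len(ratings) / len(CLAUSES)  # share of expected clauses found
    return score, ratings, missing

contract = ("This Agreement is governed by the laws of the State of Texas. "
            "Either party may terminate this Agreement upon written notice.")
score, ratings, missing = analyze(contract)
print(round(score, 2), ratings, missing)
# 0.67 {'governing law': 'favorable', 'termination': 'unfavorable'} ['limitation of liability']
```

The missing-clause list is often the most useful output for prioritization, since an absent limitation-of-liability clause is exactly the kind of incongruity a reviewer should see first.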
Well-designed, effectively implemented AI offers those who work with ESI in lawsuits and investigations a powerful set of capabilities. AI can remove much of the drudgery from the e-discovery process by allowing attorneys and allied professionals to focus their thoughts and efforts on the substantive areas that matter most to them.
Even though AI has paved the way for a more efficient e-discovery workflow, be sure to keep an open mind when choosing your e-discovery methods. Sometimes the most useful tools won’t be those on the leading edge but could be some of the classic workhorses (e.g., search terms or metadata filters).
For example, you may have two nearly identical emails, one responsive and the other not. If the presence of the word “Japan” is the defining characteristic that makes one of them responsive, then all other documents containing “Japan” would also be responsive. In that case, running a search for “Japan” will produce better results than trying to train an AI to grasp the nuance separating two nearly identical documents.
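A whole-word keyword search of the kind this example calls for might look like the following Python sketch. The document IDs and text are hypothetical, and the `keyword_search` function is illustrative; review platforms provide their own search syntax.

```python
import re

def keyword_search(documents, term):
    """Return IDs of documents containing `term` as a whole word, ignoring case."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    return [doc_id for doc_id, text in documents.items() if pattern.search(text)]

documents = {
    "EMAIL-1": "Shipment schedule for the Japan distributor is attached.",
    "EMAIL-2": "Shipment schedule for the Ohio distributor is attached.",
    "EMAIL-3": "Notes from the JAPAN trip.",
    "EMAIL-4": "Japanese translation of the manual.",
}
hits = keyword_search(documents, "Japan")
print(hits)  # ['EMAIL-1', 'EMAIL-3']
```

Whether “Japanese” should also count is exactly the sort of judgment call a review team makes when drafting search terms; the word boundaries (`\b`) in the pattern exclude it here.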
One key advantage of AI is it can provide insight into data you would otherwise be unable or unwilling to see. Investigative case teams sometimes become myopic. They are so focused on what they expect to find that they can overlook new and interesting evidence right in front of them. Pretrained models have the advantage of inserting new perspectives, information, and insights into your case.
While this sounds exciting, there is a potential pitfall to watch for: AI trained by your team can reflect your team’s biases. When training AI, your team should be aware that every choice may affect the model. In many cases a single choice has little overall effect, but a systemic bias is likely to reduce the AI’s overall efficacy.
A bevy of AI capabilities designed to help users find what matters most (e.g., image labeling, emotional intelligence, pretrained models, anomaly detection, language simplification, and automated document analysis) are available now and will continue to get better with time. Even if you do not use any of those capabilities today, rest assured, you will tomorrow.