March 29, 2018

Finding Needles in the Haystack: E-Discovery Search Technologies and How to Use Them, Part I

The first of two parts about how to use technology to create better e-discovery plans.

By Charles-Theodore Zerner and Andrew R. Lee

Learn about E-Discovery Tools to Create Better E-Discovery Plans
Legal discovery today requires a new skill. Attorneys must know how to effectively and efficiently identify documents and communications that are relevant to their case, within an ever-growing mountain of unstructured electronic data. Today—whether the case is large or very small—it is generally impossible to review all the potentially relevant electronic documents manually. Even a single person’s email box can easily contain over 100,000 items. E-discovery tools are therefore essential—and knowing when and how to use them makes all the difference.

Yet seemingly few have the requisite knowledge or mindset. Indeed, many in the legal community still believe there are only two options on the e-discovery menu: (1) manually reviewing every document returned by a list of Boolean keyword searches (i.e., “keyword culling”), or (2) using “predictive coding.” They assume that, from the attorney’s perspective, e-discovery is a relatively straightforward and repetitive process in which technology does more or less of the work behind the scenes. And so they presume that picking the right software vendor will largely determine the efficacy of the process. They are mistaken.

The reality is that effective e-discovery is a creative process. And it requires understanding the capabilities and limitations of the e-discovery tools available. Inevitably, attorneys must apply their knowledge of e-discovery tools—and their knowledge of the case—to make legal determinations regarding entire groups of documents that will not be reviewed manually.

Attorneys in supervisory roles need to understand the capabilities and limitations of the tools available to them to develop and implement efficient, effective, and defensible e-discovery plans.

Organize Unstructured Data Using E-Discovery Tools to Isolate the Relevant Material
Today, e-discovery tools are usually bundled into a single software product: the review platform. There are many competing offerings, but each typically combines tools drawn from the same limited set:

  1. Search tools (e.g., Boolean keyword search);
  2. Classification tools (e.g., unsupervised and supervised learning); and
  3. Review tools (not addressed here; these include tools to view, tag, redact, and Bates-label documents, etc.).

To use them effectively, attorneys must approach the e-discovery challenge with the right mindset. Searching for a document (locating it within a larger corpus of data) and classifying a document (as relevant or irrelevant, for example) are very different tasks. Search tools are good at locating specific documents based on the user’s prior knowledge about them: they retrieve documents that have the features the user specifies. Classification tools, by contrast, assemble documents into groups or predict the likelihood that a document belongs to a given class. Attorneys should use search and classification tools together to organize unstructured data in a way that isolates relevant or irrelevant material.
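
For readers who find it easier to see the distinction than to read about it, the short Python sketch below contrasts the two tasks. It is an illustration only, not a description of how any review platform works. The three documents, the reviewer-tagged example, and the function names are all hypothetical, and a simple word-overlap score stands in for the statistical models that real classification tools use.

```python
import re

# Hypothetical mini-corpus, invented purely for illustration.
DOCS = {
    "a": "Invoice for the steel shipment is attached",
    "b": "Attached is the revised invoice for the shipment of steel",
    "c": "Team lunch is moved to Friday",
}

# Hypothetical document that a reviewer has already tagged as relevant.
TAGGED_RELEVANT = ["steel shipment invoice attached"]

def words(text):
    """Lowercase the text and return its set of whole words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def search(docs, term):
    """Search: return ids of documents containing a term the user already knows."""
    return [doc_id for doc_id, text in docs.items() if term.lower() in words(text)]

def jaccard(left, right):
    """Share of words two documents have in common (0.0 = none, 1.0 = identical)."""
    union = left | right
    return len(left & right) / len(union) if union else 0.0

def relevance_score(text, examples):
    """Classification (toy version): similarity to the closest tagged example."""
    return max(jaccard(words(text), words(example)) for example in examples)

# Search answers "which documents contain this term?"
found = search(DOCS, "invoice")

# Classification answers "how likely is each document to belong to this class?"
scores = {doc_id: relevance_score(text, TAGGED_RELEVANT) for doc_id, text in DOCS.items()}
ranked = sorted(DOCS, key=lambda doc_id: scores[doc_id], reverse=True)

print(found)   # ['a', 'b']
print(ranked)  # ['a', 'b', 'c']
```

The search step needs the user to supply the term. The scoring step needs only an example of what a relevant document looks like, and it ranks every document in the corpus, including ones the user would not have known how to search for.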

Boolean Search Tools: Don’t Rely on Them Alone
Nearly all review platforms include Boolean search functionality. It is fast, transparent, and highly effective at search tasks, that is, at locating documents with features one already knows about. But unlike Google, a Boolean search finds only what you already know to look for: it returns every document whose text contains the exact terms the user entered, under the exact conditions specified. As a means of separating relevant from irrelevant documents, however, it is less effective and harder to use than most people realize, and it is often misused in e-discovery. Attorneys must understand its limitations to use it properly.

  • Users relying on keyword search must investigate what terms to include by interviewing custodians, reviewing documents, and updating their search.
  • To better capture relevant documents and reduce the number of irrelevant ones, users must draft queries that compensate for the three inherent problems with keyword search: synonymy, polysemy, and contextual meaning. (For more information on these problems and how to address them, see Charles-Theodore Zerner, Relying on keyword search for e-discovery? It may harm your case: important pitfalls and how to escape them, New Orleans Bar Association (December 5, 2017), available at (last verified, February 12, 2018).)
  • Work backwards: Search for irrelevant material you can safely eliminate without manual review.
  • Avoid keyword culling followed by a strict manual review. It is generally ineffective, inefficient, and expensive. Use keyword search in combination with other tools to more quickly and cost-effectively whittle down the corpus of documents to a manageable number. Or hire an e-discovery attorney to do this for you: commission an early data assessment.
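
The points above can be made concrete with a minimal Python sketch. It assumes a hypothetical four-document corpus and a hypothetical query format (a list of OR-groups that are ANDed together). It is not the query syntax of any actual review platform. It shows how an exact-term query misses a document that uses synonyms, how an expanded query recovers it, and how a query can be aimed at irrelevant material instead.

```python
import re

# Hypothetical four-document corpus, invented purely for illustration.
DOCS = {
    "doc1": "Please wire the payment to the vendor by Friday.",
    "doc2": "The funds were transferred to the supplier yesterday.",
    "doc3": "Lunch on Friday? I will pay.",
    "doc4": "Weekly newsletter: click here to unsubscribe.",
}

def tokens(text):
    """Lowercase the text and return its set of whole words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def matches(text, groups):
    """True if the text contains at least one exact term from every group.

    Each group is an OR-list of alternatives; the groups are ANDed together.
    """
    found = tokens(text)
    return all(found & {term.lower() for term in group} for group in groups)

def boolean_search(docs, groups):
    """Return the ids of all documents that satisfy the query, sorted."""
    return sorted(doc_id for doc_id, text in docs.items() if matches(text, groups))

# Narrow query: payment AND vendor.
# doc2 describes the same kind of event in different words and is missed (synonymy).
narrow = boolean_search(DOCS, [["payment"], ["vendor"]])

# Expanded query: (payment OR funds) AND (vendor OR supplier).
expanded = boolean_search(DOCS, [["payment", "funds"], ["vendor", "supplier"]])

# Working backwards: search for material to set aside, not material to keep.
set_aside = boolean_search(DOCS, [["unsubscribe"]])
remaining = sorted(set(DOCS) - set(set_aside))

print(narrow)     # ['doc1']
print(expanded)   # ['doc1', 'doc2']
print(set_aside)  # ['doc4']
print(remaining)  # ['doc1', 'doc2', 'doc3']
```

The sketch handles only exact whole words. Whether a term such as "unsubscribe" is a safe basis for setting documents aside remains a judgment the attorney must make with knowledge of the case.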

Charles-Theodore Zerner is an associate at Flanagan Partners, LLP in New Orleans, Louisiana. Andrew R. Lee is a partner at Jones Walker in New Orleans, Louisiana.

Copyright © 2018, American Bar Association. All rights reserved. This information or any portion thereof may not be copied or disseminated in any form or by any means or downloaded or stored in an electronic database or retrieval system without the express written consent of the American Bar Association. The views expressed in this article are those of the author(s) and do not necessarily reflect the positions or policies of the American Bar Association, the Section of Litigation, this committee, or the employer(s) of the author(s).