Conventional wisdom holds that technology-assisted review (TAR) is useful and efficient only for large cases (perhaps 100,000 documents or more). Conventional wisdom is wrong. Various elements of TAR can be applied quite effectively to smaller cases. And the development of TAR 2.0, a system of continuous active learning (CAL), has dramatically changed the cost-benefit calculus for use of TAR in smaller matters.
To begin, it is important to note that TAR generally consists of a suite of review tools and that practitioners need not use every tool in every case.
Some of the most basic tools, though, such as de-duplication, can apply to nearly every case. De-duplication (and the identification of near-duplicate documents) can ensure that human reviewers need review identical documents only once and, in the case of near-duplicates, can review them seriatim (and thus much more efficiently). The same is true for email threading, the process of organizing related emails in a conversation chain into one thread, which limits review to the messages that contain the fullest form of the communication.
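To make the idea of exact de-duplication concrete, the following is a minimal sketch in Python, not a description of any particular review platform. It assumes that documents are available as plain text keyed by an identifier; the function names (`normalize`, `deduplicate`) and the sample documents are hypothetical. Near-duplicate detection and email threading require more sophisticated comparison and are not shown.

```python
import hashlib


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def deduplicate(documents: dict[str, str]) -> dict[str, list[str]]:
    """Map each first-seen document ID to the IDs of its exact duplicates."""
    first_seen: dict[str, str] = {}   # content hash -> first document ID with that content
    groups: dict[str, list[str]] = {}
    for doc_id, text in documents.items():
        digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
        if digest in first_seen:
            groups[first_seen[digest]].append(doc_id)
        else:
            first_seen[digest] = doc_id
            groups[doc_id] = []
    return groups


# Hypothetical sample collection: "B" differs from "A" only in case and spacing.
docs = {
    "A": "Please review the attached contract.",
    "B": "please  review the attached   contract.",
    "C": "Lunch on Friday?",
}
groups = deduplicate(docs)
print(groups)
```

In this sketch a reviewer would need to look only at documents "A" and "C"; document "B" is carried along as a duplicate of "A".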
More advanced TAR tools include clustering, keyword expansion, and concept searching. Clustering is often used in early case assessment to impose order on unstructured data by allowing a machine to group conceptually similar documents. Clustering may help eliminate large swaths of irrelevant information (such as spam and purely personal communications), and it may identify key custodians, project names, and jargon that can help guide keyword searching. Keyword expansion software can aid searches through automated stem searching, application of a thesaurus, and related methods of ensuring that a literal keyword search does not omit related, but literally different, versions of a keyword. Concept searching can go even deeper into a document collection, using sophisticated statistical algorithms to show relationships between documents.
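Keyword expansion, in particular, lends itself to a short illustration. The sketch below is a toy example in Python under stated assumptions: the suffix-stripping `stem` function is a deliberately crude stand-in for the stemmers that commercial tools use, and the thesaurus entries and sample documents are invented for illustration.

```python
import re

# Crude, illustrative suffix list; real stemmers are far more careful.
SUFFIXES = ("ing", "ed", "es", "e", "s")


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def expand(keyword: str, thesaurus: dict[str, list[str]]) -> set[str]:
    """Return the stems of the keyword and of its thesaurus synonyms."""
    terms = [keyword] + thesaurus.get(keyword, [])
    return {stem(term.lower()) for term in terms}


def matches(text: str, keyword: str, thesaurus: dict[str, list[str]]) -> bool:
    """True if any word in the text shares a stem with the expanded keyword."""
    wanted = expand(keyword, thesaurus)
    words = re.findall(r"[a-z]+", text.lower())
    return any(stem(word) in wanted for word in words)


# Hypothetical thesaurus and documents.
thesaurus = {"fire": ["terminate", "layoff"]}
hits = [
    matches("She was terminated in March", "fire", thesaurus),
    matches("The firing was discussed", "fire", thesaurus),
    matches("Quarterly budget review", "fire", thesaurus),
]
print(hits)
```

A literal search for "fire" would find none of these three documents; the expanded search finds the first two and correctly skips the third.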
The foregoing techniques are useful in document production, even in small-document cases and even when the final form of review is entirely human.
They may be especially valuable when a party faces a large “data dump” from an adversary. The ability to identify “hot documents” quickly and efficiently can aid the discovery process (collecting potential exhibits for depositions, for example), and may aid in developing themes for motions, settlement discussions, or trial.
And that efficiency applies even where a party uses human reviewers. Suppose, for example, that an adversary produced 50,000 documents (approximately 10 gigabytes of data, which is not unheard of in small-value cases), but perhaps 25 percent or more were totally irrelevant (or mere duplicates of other documents) and another 25 percent or more were highly relevant. The ability to prioritize the review (eliminating the irrelevant and duplicative, and focusing first on the highly relevant) and to collect similar categories of documents for a more organized review could greatly improve the efficiency and robustness of the review process. If the cost of human review is $60/hour and a human reviewer can review 60 documents per hour (yielding a $1/document cost to review), then simply by eliminating 25 percent of the document collection, a party could save $12,500. In a small-value case, that savings could be significant. And the cost savings could be even larger, depending on the actual cost to review (the above numbers were given for illustration purposes only) and the percentage of documents deemed irrelevant or not worthy of immediate review.
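The arithmetic in this illustration can be restated as a short calculation. The Python sketch below uses only the illustrative figures given above (50,000 documents, $60 per hour, 60 documents per hour, 25 percent eliminated); the function name `review_savings` is hypothetical, and the second call simply varies the percentage eliminated to show how the savings scale.

```python
def review_savings(total_documents: int, share_eliminated: float,
                   hourly_rate: float, documents_per_hour: float) -> float:
    """Dollars saved by not sending the eliminated share of documents to human review."""
    cost_per_document = hourly_rate / documents_per_hour
    documents_eliminated = total_documents * share_eliminated
    return documents_eliminated * cost_per_document


# Figures from the illustration: 25 percent of 50,000 documents at $1 per document.
savings = review_savings(50_000, 0.25, 60.0, 60)
print(savings)  # 12500.0

# Same rates, but assuming half of the collection can be set aside.
larger_savings = review_savings(50_000, 0.50, 60.0, 60)
print(larger_savings)  # 25000.0
```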
Improvement over TAR 1.0
TAR 2.0 delivers the benefits of highly organized review (for production or for review of an adversary’s production) at far lower cost and far greater speed than the original TAR 1.0 method.
The original TAR, other than embodying some of the tools outlined above, was meant to conquer large document reviews (where human review of all documents was simply not possible, given the volume), with the goal of reproducing (by machine) the same review quality that could be achieved through human review. Toward that end, typically, TAR 1.0 involved a subject matter expert (generally, a senior lawyer on the litigation team) reviewing and tagging a random sample of documents to use as a “control” set (with the idea that when the computer is capable of producing the same results as an expert human reviewer, as applied to the control set, it is capable of performing an effective review of the entire population of documents). The machine algorithm was then refined by application to a series of “seed” or “training” sets, such that the machine’s results were compared to the results (on the same seed sets) obtained by the subject matter expert. The training process continued until the algorithm was “stable,” meaning that it no longer improved at identifying relevant documents in the control set. The algorithm then could be run against the entire document population, often with “second-pass” review (of at least a random sample of selected documents) by human reviewers.
TAR 1.0 presents several problems, which can greatly limit its usefulness in a smaller case. The process is not immediate; the creation of the control set, training with seed sets, refinement of the algorithm and testing—all must take place before any substantial review begins. The process is static—once the algorithm training is complete, there is no opportunity to feed additional judgments (based on reviewed documents) back into the algorithm to improve ranking. The process is inflexible—the control set, based on a random sample of documents from a review population, may become unrepresentative if the review population grows (due, for example, to the discovery of additional document custodians or locations). And the process may be cumbersome, especially in “low-richness” environments (perhaps less than 10 percent relevant documents in the population of documents to be reviewed), as it is difficult to find enough relevant documents to train the system.
By contrast, TAR 2.0 offers several advantages that may make it fit the discovery needs of a smaller matter. The process does not use control sets, and there is no need for training by a subject matter expert. Instead, the process can begin immediately, extracting potentially relevant documents, conducting a human review of a sample from the extracted set, and then feeding information from that sample back into the algorithm. As a jump-start for the process, the machine can be fed (along with the rest of the document population) a set of known hot documents or even a set of dummy documents, created by the reviewers, containing the kind of smoking-gun information that is the focus of reviewer interest. The process is continuous, such that each machine result (compared to human review of the same sample) can be fed back into the algorithm. When sample documents are tagged relevant, they (and similar documents) move up in the algorithmic ranking; when tagged irrelevant, they move down. The process works even as new documents are added to the review population (since the question is not whether some control set from the original population is representative, as in TAR 1.0). Thus, “rolling” production, as new documents are discovered, cannot stymie the process. And TAR 2.0 operates effectively in low-richness environments (again, because there is no need to create a control set). The algorithm does its best with an initial set of parameters and then continuously learns as it proceeds through the review.
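The feedback loop described above can be sketched in a few lines of Python. This is a toy model offered only to show the shape of the process, and it is not the algorithm of any TAR 2.0 product: the word-weight scoring is a simple stand-in for a real ranking model, and the function names, the seed text, the sample documents, and the `reviewer` callback (which plays the part of the human reviewer) are all assumptions made for illustration.

```python
from collections import defaultdict


def tokens(text: str) -> set[str]:
    """Split text into a set of lowercase words."""
    return set(text.lower().split())


def continuous_review(documents, seed_text, reviewer, batch_size=2):
    """Rank, review a batch, feed the tags back, and repeat until nothing is left."""
    weights = defaultdict(float)
    for word in tokens(seed_text):          # jump-start from a known "hot" document
        weights[word] += 1.0

    unreviewed = dict(documents)
    review_order = []
    while unreviewed:
        # Highest-scoring documents first; ties are broken by document ID.
        ranked = sorted(
            unreviewed,
            key=lambda d: (-sum(weights.get(w, 0.0) for w in tokens(unreviewed[d])), d),
        )
        for doc_id in ranked[:batch_size]:
            text = unreviewed.pop(doc_id)
            relevant = reviewer(doc_id, text)          # the human judgment
            review_order.append((doc_id, relevant))
            step = 1.0 if relevant else -1.0
            for word in tokens(text):                  # similar documents move up or down
                weights[word] += step
    return review_order


# Hypothetical collection; the stand-in reviewer treats d1, d3, and d5 as relevant.
documents = {
    "d1": "kickback payment schedule for vendor",
    "d2": "lunch menu friday",
    "d3": "vendor payment approved offline",
    "d4": "fantasy football league update",
    "d5": "delete the kickback emails",
}
order = continuous_review(
    documents,
    seed_text="kickback payment",
    reviewer=lambda doc_id, text: doc_id in {"d1", "d3", "d5"},
)
print(order)
```

In this run the three relevant documents reach the reviewer before the two irrelevant ones. The sketch reviews every document for simplicity; in practice a party would stop once the ranked batches cease to turn up relevant material.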
Will TAR 2.0 finally make sophisticated review technology scalable and cost-effective in smaller cases? The potential exists for such a breakthrough (and not simply because the volume in “small” cases continues to grow).
One significant development that may aid adoption of this new technology is increasing availability of on-demand review software, web-based and priced according to use (rather than as a capital cost that can impact the bottom line of a smaller law firm). On-demand software tools can provide the latest-generation tools without significant up-front costs or investment in personnel.
With wider use (and thus improvement) of these tools, greater competition within the e-discovery industry, and increasing availability of alternative pricing models (such as flat fee or “all-in” pricing), this technology may become ever more cost-effective for use in small cases.
Challenges remain, however. The legal profession is notoriously conservative and often hangs on to procedures and technologies that other industries quickly bypass. That bias against change is evident in the slow adoption of TAR 2.0 and a “show me it’s better” attitude among many lawyers (and quite a few judges).
The single most effective counter to that attitude is a more transparent and cooperative approach to e-discovery. Where parties operating in good faith share information regarding the needs of the case and the capabilities of the personnel and systems available, agreement on use of TAR technologies may arise. The contrary—mistrust and expensive motion practice—could kill the benefits of the technology, even where otherwise apt for the circumstances of the case.
Short of judicial (or rule-maker) recognition that human review is not really optimal for effective document review, some vestiges of the profession’s conservative approach to technology will, no doubt, persist. But greater efforts at education, pilot programs, and other demonstrations of the benefits of this technology will, in time, effect change. Such developments as the Sedona Conference Cooperation Proclamation and the recent amendments to the Federal Rules of Civil Procedure (highlighting the importance of proportionality in discovery) may contribute to this movement.
Steven C. Bennett is a partner at Park Jensen Bennett LLP in New York, New York, and an adjunct professor (e-discovery procedure) at Hofstra Law School. The views expressed are solely those of the author and should not be attributed to the author’s firm or its clients.
Copyright © 2018, American Bar Association. All rights reserved. This information or any portion thereof may not be copied or disseminated in any form or by any means or downloaded or stored in an electronic database or retrieval system without the express written consent of the American Bar Association. The views expressed in this article are those of the author(s) and do not necessarily reflect the positions or policies of the American Bar Association, the Section of Litigation, this committee, or the employer(s) of the author(s).