August 26, 2019

Preparing for a Post-LIBOR Future with Technology-Assisted Review

Michael J. Bommarito II, Robert Sancrainte

The future of LIBOR is fraught with uncertainty, and the FCA’s planned declaration of obsolescence at the end of 2021 is rapidly approaching. Some financial institutions are already preparing for it. The potential consequences are substantial and wide-ranging, but organizations also have many options during this transitional period.

One commonly leveraged option is Technology-Assisted Review (TAR), which pairs human reviewers with software platforms that handle much of the heavy lifting. TAR is widely used today for electronic discovery and M&A document review. When pre-trained to read more complex financing agreements, TAR platforms can also examine an organization’s contracts for LIBOR-related material: categorizing each contract, finding potentially affected clauses, and in some cases identifying and inserting proposed amendments on their own.

Natural Language Processing

A document management system (DMS) can provide invaluable assistance to an organization, offering capabilities that can be combined to handle a wide range of tasks. Even a well-organized file share can make a team’s life easier.

What some DMS and all file shares lack, however, is a functional natural language toolkit for analyzing the text of a contract on a deeper level. Some Natural Language Processing (NLP) toolkits are designed to work with real, unstructured legal text, including contracts, plans, policies, procedures, and other material. Many of these toolkits can search not just for specific words, but for explicit and implicit semantic and syntactic associations between and among concepts.

Stems and Lemmas

Two of the most important concepts for processing legal language (or, really, any type of language) are stems and lemmas. A stem is the segment of a word that contains its root and serves as the basis for variations on that root. An example relevant to LIBOR is the word advance: an NLP toolkit searching for the stem advanc would capture related forms like advance, advances, advancer, and advancing.
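The stemming idea can be sketched in a few lines of Python. This is a deliberately crude suffix-stripping stemmer written only for illustration; the suffix list and the `stem` and `find_by_stem` helpers are our own assumptions, and established algorithms such as Porter or Snowball apply many more rules.

```python
import re

# Toy suffix-stripping stemmer for illustration only (assumption: a short,
# hand-picked suffix list). Real stemmers such as Porter use many more rules.
SUFFIXES = ("ing", "ers", "er", "es", "ed", "s", "e")  # longest first


def stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least `min_stem` letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def find_by_stem(text, target_stem):
    """Return every word in `text` whose stem equals `target_stem`."""
    return [w for w in re.findall(r"[A-Za-z]+", text) if stem(w) == target_stem]


# Hypothetical clause written for this example.
clause = ("The Lender shall make Advances to the Borrower. Each advance "
          "bears interest; an advancing Lender may assign the advanced amounts.")
print(find_by_stem(clause, "advanc"))
# → ['Advances', 'advance', 'advancing', 'advanced']
```

Note that the same stemmer leaves an irregular form like went untouched, because it only removes endings and went shares no ending with go.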

A lemma goes a step further than a stem: it is the dictionary form of a word, identified through a more complete grammatical understanding of that word. A simple example is the verb go. Lemmatization maps going and goes to go, but it also maps irregular forms like went to go, which stemming alone would miss. While some toolkits and DMS support stems, very few leverage lemmas in their organization and analysis of text.
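A lemmatizer needs vocabulary knowledge rather than suffix rules alone. The sketch below assumes a tiny, hand-written lookup table (`LEXICON`, `lemmatize`, and `find_by_lemma` are hypothetical names); real lemmatizers draw on a full morphological dictionary such as WordNet, usually together with part-of-speech information.

```python
# Minimal lookup-table lemmatizer. LEXICON is a hypothetical, hand-written
# sample; a real system would use a full dictionary plus part-of-speech tags.
LEXICON = {
    "go": ["go", "goes", "going", "went", "gone"],
    "advance": ["advance", "advances", "advanced", "advancing"],
    "pay": ["pay", "pays", "paying", "paid"],
}

# Invert the table so each surface form points to its lemma.
FORM_TO_LEMMA = {form: lemma for lemma, forms in LEXICON.items() for form in forms}


def lemmatize(word):
    """Return the dictionary form of `word`, or the word itself if unknown."""
    return FORM_TO_LEMMA.get(word.lower(), word.lower())


def find_by_lemma(words, target_lemma):
    """Return the words whose lemma equals `target_lemma`."""
    return [w for w in words if lemmatize(w) == target_lemma]


print(find_by_lemma(["goes", "went", "paid", "going", "lent"], "go"))
# → ['goes', 'went', 'going']
```

The lookup table is what lets went resolve to go; the trade-off is that any word missing from the dictionary (lent, here) falls through unchanged.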

In the context of LIBOR, then, an NLP toolkit must recognize that advance and advancing express related concepts, so that it does not miss analogous terms when analyzing a document.

Extracting Structured Information from LIBOR Contracts

Some NLP toolkits can do more than simple natural language tasks like identifying stems in unstructured text. Another method of analysis involves extracting structured information or statements. Our brains structure information constantly: when we read text or listen to speech, we process not only each word on its own, but also how the words relate to one another and to the overall communicative goal of the context. Computers, however, require explicit instructions to accomplish the same structuring tasks.

A properly designed TAR platform should use NLP to return monetary amounts, whether written numerically (e.g., “$200.00”) or spelled out longhand (e.g., “zweihundert Euro,” German for “two hundred euros”). Consider the following sentence:

That little dog in the window is one-hundred and fifteen dollars.

The output of a well-designed TAR system should be:

(115, 'USD', 'one-hundred and fifteen dollars')

Many platforms fail to identify this example or fail to capture the complete phrase; some platforms do capture the full phrase but fail to structure the information into its components: the quantity (115) and the currency (dollars, implicitly US).
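To make the target output concrete, here is a minimal sketch that turns both forms into structured tuples. It assumes English number words and only USD and EUR currency names; the vocabularies and the `words_to_number`, `extract_amounts`, and `extract_numeric_usd` helpers are hypothetical, and a production extractor would need far broader coverage (other languages such as the German example above, abbreviations, ranges, and so on).

```python
import re

# Illustrative vocabulary (assumption: English number words and only
# USD/EUR currency names).
SMALL = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
SCALES = {"hundred": 100, "thousand": 1_000, "million": 1_000_000}
CURRENCIES = {"dollar": "USD", "dollars": "USD", "euro": "EUR", "euros": "EUR"}


def words_to_number(words):
    """Convert number words such as ['one', 'hundred', 'fifteen'] to 115."""
    total, current = 0, 0
    for w in words:
        if w in SMALL:
            current += SMALL[w]
        elif w in TENS:
            current += TENS[w]
        elif w == "hundred":
            current = (current or 1) * 100
        else:  # "thousand" or "million" closes off a group of digits
            total += (current or 1) * SCALES[w]
            current = 0
    return total + current


def extract_amounts(text):
    """Return (value, currency_code, original_phrase) for spelled-out amounts."""
    tokens = list(re.finditer(r"[A-Za-z]+", text))  # hyphens split tokens
    results, i = [], 0
    while i < len(tokens):
        j, words = i, []
        # Consume a run of number words, allowing "and" inside the run.
        while j < len(tokens):
            w = tokens[j].group().lower()
            if w in SMALL or w in TENS or w in SCALES:
                words.append(w)
            elif not (w == "and" and words):
                break
            j += 1
        unit = tokens[j].group().lower() if j < len(tokens) else None
        if words and unit in CURRENCIES:
            phrase = text[tokens[i].start():tokens[j].end()]
            results.append((words_to_number(words), CURRENCIES[unit], phrase))
            i = j + 1
        else:
            i = max(j, i + 1)
    return results


def extract_numeric_usd(text):
    """Return (value, 'USD', original_text) for amounts like '$200.00'."""
    return [(float(m.group(1).replace(",", "")), "USD", m.group())
            for m in re.finditer(r"\$(\d[\d,]*(?:\.\d+)?)", text)]


print(extract_amounts("That little dog in the window is one-hundred and fifteen dollars."))
# → [(115, 'USD', 'one-hundred and fifteen dollars')]
```

Keeping the original phrase alongside the parsed value and currency code mirrors the three-part output above, so a reviewer can always check the structured value against the source text.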

More advanced NLP systems can understand grammatically that “dog” is a noun, that “in the window” describes where the dog is, and that the dog has a value in “dollars” (probably US dollars, though not necessarily). The TAR system could then be queried with a question like “How much does a dog in a window cost?” and might respond with “$115.”

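Some TAR platforms can analyze and categorize the defined terms in a contract in a similar way. Consider, for example, a credit agreement that defines “Adjusted Base Rate” and then relies on that term throughout the document.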

Such definitions are central to many large or complex documents, such as the syndicated commercial credit agreements relevant to LIBOR analysis. To truly increase efficiency, TAR systems must identify and understand definitions like Adjusted Base Rate and how they connect to other definitions within a document.
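The sketch below shows one way to connect definitions: extract each defined term, record which other defined terms its definition uses, and then follow those links to find every term that depends on LIBOR directly or indirectly. The sample definitions are invented for this illustration (they are not drawn from any real agreement), and the single-line `"Term" means ...` drafting convention, along with the helper names, is an assumption.

```python
import re

# Hypothetical sample definitions written for this illustration; real credit
# agreements define these terms differently and at far greater length.
SAMPLE = """
"Base Rate" means the prime rate publicly announced by the Agent.
"Eurodollar Rate" means the rate per annum equal to LIBOR for the relevant Interest Period.
"Adjusted Base Rate" means the higher of the Base Rate and the Eurodollar Rate plus 1.00%.
"""

# Assumed drafting convention: "Term" means <definition> on a single line.
DEFINITION = re.compile(r'["“]([^"”\n]+)["”]\s+(?:means|shall mean)\s+(.+)')


def extract_definitions(text):
    """Map each defined term to the text of its definition."""
    return {m.group(1): m.group(2).strip() for m in DEFINITION.finditer(text)}


def definition_graph(definitions):
    """Map each term to the other defined terms its definition uses."""
    # Longest terms first, so "Adjusted Base Rate" wins over "Base Rate".
    terms = sorted(definitions, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b")
    return {term: sorted(set(pattern.findall(body)) - {term})
            for term, body in definitions.items()}


def terms_depending_on(definitions, graph, keyword="LIBOR"):
    """Return terms that mention `keyword` directly or via other definitions."""
    affected = {t for t, body in definitions.items() if keyword in body}
    changed = True
    while changed:  # keep propagating until no new term is added
        changed = False
        for term, deps in graph.items():
            if term not in affected and affected.intersection(deps):
                affected.add(term)
                changed = True
    return sorted(affected)


defs = extract_definitions(SAMPLE)
graph = definition_graph(defs)
print(graph["Adjusted Base Rate"])      # → ['Base Rate', 'Eurodollar Rate']
print(terms_depending_on(defs, graph))  # → ['Adjusted Base Rate', 'Eurodollar Rate']
```

The point of following links is that “Adjusted Base Rate” never mentions LIBOR itself, yet it is affected through “Eurodollar Rate”; a plain keyword search would miss it.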

Word Embedding Models

Word embedding works a bit like a thesaurus, approximating a conceptual understanding of words and phrases from large amounts of data. Importantly, an NLP toolkit with word embedding functionality or pre-trained models needs substantially less human supervision; it just needs a large sample of similar language. Like many of the most effective models in machine learning and natural language processing, the approach is based on a simple idea, as the following example demonstrates:

The parties to this ____________.

Fill in the blank with how you think this sentence should end. In the context of a legal document, you can probably think of at least two likely endings:

The parties to this agreement
The parties to this contract

Word embedding models are trained by reviewing many, many phrases like this. Whenever words appear in similar contexts, as “agreement” and “contract” do here, the model takes note of the similarity. The technique is not limited to nouns or single words, either; it is equally effective at handling verbs, noun phrases, and more complex concepts like “date” or “expression of distance.”
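The underlying intuition, that words sharing contexts are similar, can be shown with simple co-occurrence counts. To be clear, this is a count-based distributional model, a simpler relative of learned embeddings such as word2vec rather than a trained embedding; the six-sentence corpus and the helper names are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Tiny hypothetical corpus; real embedding models learn from millions of sentences.
CORPUS = [
    "the parties to this agreement",
    "the parties to this contract",
    "this agreement shall be governed by",
    "this contract shall be governed by",
    "the borrower shall repay the loan",
    "the borrower shall repay the advance",
]


def context_vectors(sentences, window=2):
    """Count the words appearing within `window` positions of each word."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0


def most_similar(vectors, word, top_n=3):
    """Rank other words by how closely their contexts match `word`'s."""
    target = vectors.get(word, Counter())
    scores = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


vectors = context_vectors(CORPUS)
print(most_similar(vectors, "agreement", top_n=1))  # "contract" ranks first
```

Because "agreement" and "contract" fill the same blanks in this corpus, their context vectors point in the same direction; "loan" and "advance" pair up the same way.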

Two different phrases in a simple word embedding

An example from a LIBOR-related contract might be a search for the concept of “purchase” that would also return similar words like “acquisition,” “transfer,” or “repurchase.”

In the example image above, a word embedding model could also return a word like “lease,” because it plays a semantic role in legal documents similar to that of “agreement” and “contract.” A user of a TAR platform would likely want to find these variations to obtain a more holistic understanding of each individual contract and of all the contracts within a corpus.


Natural language processing is an integral part of TAR. Combining several NLP techniques and refining data models over time can steadily improve the accuracy and efficiency of any TAR platform.

The field of Technology-Assisted Review is entering its adolescence. Systems continue to evolve, but none has yet reached final “maturity,” in the sense of having found the ideal balance of natural language processing and other techniques. It remains to be seen what TAR will accomplish in the years ahead; the only certainty is that the LIBOR transition will be a meaningful test of the techniques explored in this article, as well as of others such as regular expressions.

Lenders and borrowers now know that they need to analyze the language of their loan agreements to determine how those agreements depend on LIBOR. TAR platforms that use NLP techniques will make a crucial difference during the renegotiation and re-papering process that has already begun and will continue for the next few years.

Michael J. Bommarito II

Michael J. Bommarito II founds, builds, operates, consults for, invests in, and advises businesses in legal and financial services, tech, and logistics. His experience spans strategy, technology, business, and operations, ranging from top Am Law firms and $B+ AUM investment firms to idea-stage startups. He was the CEO and Co-Founder of LexPredict, now a part of Elevate.

Robert Sancrainte

Robert Sancrainte is a researcher, data analyst, and writer with over a decade of experience, including three years in the legal technology space for LexPredict, now a part of Elevate.