You are in a highly contentious litigation, taking the deposition of the witness you believe holds the information you most need to win the case. It’s the moment of truth. Handing the witness a list several pages long, you ask, “Please tell me every time in the last five years you have used the words on the list I just handed you.”
Does that scenario sound absurd? It should. But it is a fairly good approximation of the legal profession’s overuse of, and over-reliance on, keyword searches to identify relevant information in large data sets. And, for better or worse, predictive coding, which is the latest, and some would say greatest, approach to reviewing documents, is nothing more than keyword searching on steroids: it matches and ranks the entire set of words in a document against those of the other documents in the set, based on the frequency and proximity of the words each document contains. Fortunately, thanks to recent advances in technology and data science, there are other ways to attack the problem.