June 21, 2011

How to Find Relevant Electronic Evidence: A Nuts-and-Bolts Guide to Search Methods

Law Practice Magazine

October/November 2008 Issue | Volume 34 Number 7| Page 22

Hot Buttons

If you haven’t yet had a case involving electronic evidence, don’t despair: you will. And one of the first things you’ll want to know when that time comes is how to reduce the amount of information that has to be reviewed. Fortunately, skillfully crafted search methodologies can trim the volume without losing relevant data in the process.

The most costly part of any case dealing with electronic evidence is the attorney review time. So naturally, reducing the amount of information reduces the time and money that has to be expended on the case. In large part, it comes down to the search methods that are involved.

There are many ways to search for relevant information in electronic data discovery (EDD), and a qualified, reputable EDD expert will help steer you toward the right ones. If you are very new to the handling of electronic evidence, your first question might be, do you really need an expert? Sorry, but yes. Federal Judge John Facciola’s opinion in United States v. O’Keefe (D.D.C. Feb. 18, 2008), stating that lawyers and judges are not capable of properly searching without using an expert, woke up a lot of attorneys, and it also made them resolve to “get smarter” so they can ask probing questions to make sure they’re getting good advice. So for those who want to get smart, here’s your guide to the nuts and bolts of searching.


Software Constraints

To begin, let’s look at some of the restrictions and requirements for searching. No matter what data you are analyzing, the first limitation is the capabilities of the search software. Some questions to ask: What file types can the search software handle? Can it search the internals of an Outlook PST file? What if your case involves e-mail messages stored in a Notes NSF file? Can the software search for particular terms in a Word document, even if it is an attachment to a message? Can you search the information contained in compound files such as ZIP files? These are the types of questions that must be answered to understand what limitations may exist in the software. Not all search software is created equal—and the use of inferior software may make deep inroads into your client’s wallet.
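To make the compound-file question concrete, here is a minimal Python sketch of what "searching inside a ZIP file" involves. It is only an illustration, not how any particular EDD product works: the helper name search_zip and the sample file names are made up for this example, and it assumes every file in the archive is plain text.

```python
import io
import zipfile


def search_zip(zip_bytes, term):
    """Return the names of ZIP members whose text contains `term` (case-insensitive)."""
    hits = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            # Assumption: members are plain text; undecodable bytes are skipped.
            text = archive.read(name).decode("utf-8", errors="ignore")
            if term.lower() in text.lower():
                hits.append(name)
    return hits


# Build a small in-memory archive to stand in for collected data.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("memo.txt", "The merger closes Friday.")
    archive.writestr("lunch.txt", "Pizza at noon?")

print(search_zip(buffer.getvalue(), "merger"))
```

A tool that cannot open the archive in the first place would never see the word "merger" at all, which is exactly the limitation to ask about.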

And there’s the question of foreign-language capabilities. Is all of the information in English, or do you need searching done in some other language? This may present a particular challenge, especially if your expert has to use a foreign alphabet to search the data. Foreign language searching is just beginning to come into its own, and it can add considerable expense.

Handling of e-mail is also a great concern. Many search applications will not search the attachments as part of the native file, so you may have to extract the attachments from the messages prior to their being searched. But no matter what method you are faced with, you want to know if all the data is being searched.
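The attachment issue can be sketched in a few lines of Python using the standard email library. This is a simplified illustration under stated assumptions: the helper search_message and the sample message are hypothetical, and only text parts are searched (a Word or PDF attachment would need its text extracted first).

```python
from email.message import EmailMessage


def search_message(msg, term):
    """Return labels of the message parts (body or attachment) that contain `term`."""
    hits = []
    for part in msg.walk():
        # Skip multipart containers and non-text parts; they need other handling.
        if part.is_multipart() or part.get_content_maintype() != "text":
            continue
        label = part.get_filename() or "body"
        if term.lower() in part.get_content().lower():
            hits.append(label)
    return hits


# A made-up message with one text attachment.
msg = EmailMessage()
msg["Subject"] = "Quarterly numbers"
msg.set_content("See the attached notes.")
msg.add_attachment("Revenue forecast: confidential.", filename="notes.txt")

print(search_message(msg, "confidential"))
print(search_message(msg, "attached"))
```

A search that looked only at the message body would miss the word "confidential" entirely, because it appears only in the attachment.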


Basic Types of Searches

Now let’s get to the various ways that data can be searched. Certainly the most common method is through the use of keywords. Generally speaking, keywords are words or sections of words that would be contained within the relevant data. For example, searching on “pot” could return the words spot and potential. You may be able to narrow the results, though, by using the search engine’s operators, which instruct the engine how to handle the keyword and any variants. One example is the use of the wildcard symbol (*): Selecting the keyword as “pot*” would return the word potential but not spot, because the wildcard appears at the end of the word and not the beginning. Make sure you know what operators are available for the search engines being used in your cases.
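The difference between a plain keyword and a trailing wildcard can be shown with a short Python sketch. The function names and word list are invented for illustration, and real search engines may treat keywords and wildcards differently, so check the documentation for the engine in your case.

```python
words = ["spot", "potential", "pot", "teapot", "pottery"]


def keyword_hits(term, words):
    """Plain keyword: match the term anywhere inside a word."""
    return [w for w in words if term in w]


def trailing_wildcard_hits(pattern, words):
    """'pot*' style: the word must begin with the text before the asterisk."""
    prefix = pattern.rstrip("*")
    return [w for w in words if w.startswith(prefix)]


print(keyword_hits("pot", words))             # all five words
print(trailing_wildcard_hits("pot*", words))  # potential, pot, pottery
```

The wildcard version drops "spot" and "teapot," which is the kind of narrowing that saves review time.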

Searching with phrases can also help reduce the amount of “noise results,” which is industry jargon for unintended search results. Reducing noise hits means less review time—and less money expended in attorney review time.

Another method is Boolean searching. Boolean operators are terms such as AND, OR, NOT and NEAR. As an example, you can construct a Boolean search as “Everett AND Washington” to ensure the results include both words. The NEAR operator means that the words need not appear in the exact order listed, but must be fairly close to each other. In some search engines, using quotation marks around the words indicates that an exact order is required.
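Here is a rough Python sketch of how AND, OR and NOT behave on a handful of documents. The document set and function names are hypothetical, and the matching is on whole words only; an actual engine's operator syntax and rules will vary.

```python
import re

# Made-up documents for illustration.
docs = {
    "doc1": "Shipment left Everett, Washington on Monday.",
    "doc2": "Everett called about the invoice.",
    "doc3": "Meeting in Washington next week.",
}


def word_set(text):
    """Lowercase the text and split it into a set of whole words."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def search_and(a, b):
    return [n for n, t in docs.items() if a in word_set(t) and b in word_set(t)]


def search_or(a, b):
    return [n for n, t in docs.items() if a in word_set(t) or b in word_set(t)]


def search_not(a, b):
    """Documents that contain `a` but do NOT contain `b`."""
    return [n for n, t in docs.items() if a in word_set(t) and b not in word_set(t)]


print(search_and("everett", "washington"))  # ['doc1']
print(search_or("everett", "washington"))   # ['doc1', 'doc2', 'doc3']
print(search_not("everett", "washington"))  # ['doc2']
```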

Proximity searching is another common method. Many search engines use the operator WITHIN or the plus (+) symbol to denote a proximity search. An example of this would be “apple +5 pie.” This means the word “pie” has to appear within five words of the word “apple.”
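A proximity search like “apple +5 pie” can be approximated in Python as follows. This sketch makes two assumptions that a given engine may not share: distance is counted as the difference in word positions, and the two words may appear in either order. The function name within is invented for the example.

```python
import re


def within(text, first, second, max_distance):
    """True if `second` occurs within `max_distance` words of `first`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first]
    second_positions = [i for i, w in enumerate(words) if w == second]
    return any(
        abs(i - j) <= max_distance
        for i in first_positions
        for j in second_positions
    )


near = "She baked an apple and cinnamon pie."
far = "The apple orchard opened early this year, long before anyone thought about pie."

print(within(near, "apple", "pie", 5))  # True: three words apart
print(within(far, "apple", "pie", 5))   # False: eleven words apart
```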

Another advanced search engine capability is stemming, which finds variations in the endings of a selected search word. Stemming using the word “metal,” for example, would return metalized and metallic. In effect, this is much like putting a wildcard operator at the end of the keyword.
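The idea behind stemming can be sketched with a deliberately naive Python example. The suffix list and the naive_stem function are made up for illustration only; production engines use far more careful linguistic rules, so treat this as a toy.

```python
# Illustrative suffix list only; a real stemmer knows many more endings.
SUFFIXES = ("ized", "ize", "ic", "s")


def naive_stem(word):
    """Strip one known suffix, then collapse a doubled final letter."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Collapse a doubled final letter, e.g. "metall" -> "metal".
    if len(word) > 3 and word[-1] == word[-2]:
        word = word[:-1]
    return word


def stem_hits(term, words):
    """Return the words that share a stem with the search term."""
    target = naive_stem(term)
    return [w for w in words if naive_stem(w) == target]


words = ["metal", "metalized", "metallic", "metals", "meter"]
print(stem_hits("metal", words))  # every word except "meter"
```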


Grep Searching

Grep (which loosely stands for global regular expression print) is a command-line search program originally written for UNIX. Grep uses a compact but daunting pattern syntax, known as regular expressions, to search for specific character-string patterns within data. There are wildcard and placeholder values in Grep along with several other operators. As an example, brackets mean that any one of the characters inside them may appear at that position in the pattern. So, [a-f] matches any single lowercase letter from a through f in that character position. You’ll normally see Grep used in forensic analysis searches and not in typical electronic data processing. After reading this paragraph, you’re probably glad you won’t have to deal with it often.
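To see the bracket idea in action without a UNIX command line, here is a small sketch using Python's built-in regular expression module rather than Grep itself. The bracket notation works the same way in this case; the sample sentence and pattern are invented for illustration.

```python
import re

# [a-f] matches exactly one lowercase letter from a through f;
# \b marks a word boundary, so only whole three-letter words ending in "at" match.
pattern = re.compile(r"\b[a-f]at\b")

text = "The cat sat on a fat bat near the mat."
print(pattern.findall(text))  # ['cat', 'fat', 'bat']
```

The words "sat" and "mat" are skipped because s and m fall outside the a-through-f range.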


Fuzzy Searching

Some search engines also support fuzzy searching, which can be particularly effective for finding results containing words that are often misspelled. Spelling errors are pretty common in cases involving technical terms. The degree of “fuzziness” is normally adjusted via a numeric value, and many search engines use that number to determine how many letters can be wrong in the misspelling. As an example, a fuzzy value of 1 would mean that only one letter can be wrong in the word.

Fuzzy searching is very valuable when dealing with sophisticated terms, where the proper spelling isn’t widely known. However, while fuzzy searching can find data that would normally be missed, it can also generate a lot of noise results. And since some e-discovery vendors charge based on the number of search hits, the addition of noise results can add to the invoice. Fuzzy searching is like casting a wide net to gather much more information than a direct keyword search.
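One common way to measure "how many letters are wrong" is edit distance, the number of single-letter insertions, deletions or substitutions needed to turn one word into another. The Python sketch below assumes that measure; a particular search engine may define its fuzziness value differently. The function names and sample spellings are made up for the example.

```python
def edit_distance(a, b):
    """Count the single-letter insertions, deletions or substitutions between a and b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete a letter from a
                current[j - 1] + 1,      # insert a letter into a
                previous[j - 1] + cost,  # substitute (or keep) a letter
            ))
        previous = current
    return previous[-1]


def fuzzy_hits(term, words, fuzziness):
    """Return words whose edit distance from the term is at most `fuzziness`."""
    return [w for w in words if edit_distance(term.lower(), w.lower()) <= fuzziness]


words = ["acetaminophen", "acetaminophin", "acetominophin", "aspirin"]
print(fuzzy_hits("acetaminophen", words, 1))  # correct spelling plus one misspelling
print(fuzzy_hits("acetaminophen", words, 2))  # adds the two-error misspelling
```

Raising the fuzziness value from 1 to 2 widens the net, which is how a higher setting brings in both more misspellings and more noise.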


Conceptual Searching

Conceptual searching takes the input term and returns results that are related in meaning. For example, if a search uses the term “car,” then results with the terms “automobile” and “vehicle” will also be in the returned hits. This is an extremely powerful (and expensive) capability.

One of the most well-known concept search engines is Attenex. The Attenex engine is very effective, but it is not for the faint of heart and its cost is quite high. It is typically used in large, high-profile cases where there is a large volume of electronic information. The key issue to be aware of with conceptual search engines is that they are only as good as the programming logic. Because there’s an element of artificial intelligence when doing conceptual searches, you are dependent on the programmer’s view of what term has a related meaning. However, certain engines allow you to modify your own thesaurus file so that there’s some element of control in the results.
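The thesaurus idea can be illustrated with a very small Python sketch. It shows only simple thesaurus expansion, which is a stand-in for what commercial concept engines do and not a description of how Attenex or any other product works. The thesaurus contents, documents and function name are all hypothetical.

```python
import re

# Hypothetical, hand-edited thesaurus: each term maps to words treated as related.
THESAURUS = {"car": {"car", "automobile", "vehicle"}}

# Made-up documents for illustration.
docs = {
    "a": "The automobile was parked outside.",
    "b": "He sold the vehicle in May.",
    "c": "The carpet was replaced.",
}


def concept_hits(term, docs, thesaurus):
    """Return documents containing the term or any word the thesaurus relates to it."""
    related = thesaurus.get(term, {term})
    hits = []
    for name, text in docs.items():
        words = set(re.findall(r"[a-z]+", text.lower()))
        if words & related:
            hits.append(name)
    return hits


print(concept_hits("car", docs, THESAURUS))  # ['a', 'b']
```

Neither matching document contains the word "car," and the "carpet" document is left out. Editing the thesaurus entries changes what counts as related, which is the element of control mentioned above.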


Leverage Your Know-How

An exhaustive look at electronic data searching would require a small book, but today’s savvy lawyer should have some knowledge of the subject. If you’re a knowledgeable consumer in this area, you can save your clients money—and grateful clients usually return to give you more work!

About the Authors

Sharon D. Nelson and John W. Simek are President and Vice President, respectively, of Sensei Enterprises, Inc., a computer forensics and legal technology firm based in Fairfax, VA. They are coauthors of The Electronic Evidence and Discovery Handbook: Forms, Checklists, and Guidelines (ABA, 2006).