
###### June 01, 2011

# E-Discovery: Sampling Helps Keep Relevance Relevant

###### William Hamilton

Each era’s dispute resolution process must work in accordance with the predominant information formats of the time. Unfortunately, the modern litigation and discovery processes do not play nice with electronically stored information (ESI).

Gathering and reviewing all information that could lead to the discovery of admissible evidence is a very tall order. The amount of ESI generated nowadays has eclipsed the concept of relevance.

Ironically, salvation has arrived in the form of a Federal Rule of Civil Procedure that preceded the 2006 amendments addressing ESI.

Federal Rule of Civil Procedure 26(b)(2)(C)(iii), or the “rule of proportionality,” as it has come to be known, provides that the court *must* limit the frequency and extent of discovery when the burden or expense of the proposed discovery outweighs its likely benefit, considering the needs of the case, the amount in controversy, the parties’ resources, the importance of the issues at stake in the action, and the importance of the discovery in resolving the issues.

“Proportionality,” as applied to the monstrous volumes of ESI, corrals the amount of discovery and is a theme in almost every case. In *Mancia v. Mayflower Textile Services Co.*, 253 F.R.D. 354 (D. Md. 2008), Federal Magistrate Judge Paul Grimm penned an “always carry in your back pocket” anthem to proportionality. Judge Grimm demands that a realistic case value guide discovery and set the outside parameters for the expenditures for the case.

How much e-discovery will the case afford if the fair value of the dispute is $100,000? Keeping in mind that e-discovery does not include depositions, summary judgment, trial, and other costs, some might say 10 percent of the amount in dispute; others might say 20 percent. But whatever the e-discovery spend percentage, you will have your case parameters. For $10,000, you get less than one gigabyte. Which witnesses (or “custodians”) and date ranges do you want? Anything else is disproportionate to the case and thus subject to a Rule 26(b)(2)(C)(iii) limitation.

How exactly do you go about determining which witnesses likely have the relevant ESI? The first approach is the simple witness interview. Most cases boil down to a few key players. But what about the secondary players (e.g., the office assistants, the sales backup support team)? Their ESI may contain valuable case information.

Statistical sampling is the solution. Sampling starts with a defined ESI collection and attempts to draw reasonable inferences about the general collection from an appropriately sized sample of the collection.

For example, in a large suburban neighborhood, it would not take too many interviews to determine the prevalence of registered Democratic voters. If there were only two per 100 households, the neighborhood would probably not be useful for studying the opinions of Democrats. If the sample showed a high number of Democrats, it would be an ideal candidate. The statistical sampling tells us whether the population is worth further review.

Sampling in e-discovery does the same thing. We locate an ESI volume (population) that is reasonably homogeneous for sampling—for example, the sales staff support manager’s data—to see if the rest of the ESI is worth looking at. The sample has to be random and the population homogeneous. That’s why we picked one custodian. Numerous free online programs will define a random sample for any population size (e.g., Research Randomizer).
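To make the random-selection step concrete, here is a minimal sketch in Python rather than an online tool. The document IDs, the `draw_random_sample` helper, and the seed are hypothetical, chosen only for illustration; the one real requirement is that every file in the custodian's population has an equal chance of being picked.

```python
import random


def draw_random_sample(document_ids, sample_size, seed=None):
    """Return `sample_size` distinct IDs chosen uniformly at random.

    `seed` is optional; fixing it makes the draw reproducible, which can
    help if the selection later has to be explained to the other side.
    """
    rng = random.Random(seed)
    return rng.sample(list(document_ids), sample_size)


# Hypothetical population: 100,000 files from a single custodian.
custodian_docs = [f"DOC-{i:06d}" for i in range(1, 100_001)]

# Draw 383 files for review (hypothetical seed).
sample = draw_random_sample(custodian_docs, 383, seed=2011)
print(len(sample), "files selected, e.g.", sample[:3])
```

Recording the seed and the list of selected IDs is one way to document that the sample was not hand-picked.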

So far so good, but what is the size of the sample? The answer is remarkably few documents. A good statistical sample would use a 95 percent confidence level with a confidence interval of plus or minus 5 percent. (The confidence level tells us how sure we can be; the confidence interval, or “margin of error,” tells us how far the population’s true figure may lie from the sample result.) Keep in mind that sampling is not perfectly precise, but we are living in the time- and cost-constraining era of ESI.

There are many free tools that will tell us the size of the sample (e.g., The Survey System). If the population size is 100,000, an online calculator tells us the sample size is 383. Returning to our support manager’s ESI, if our review of the 383 sampled ESI files discloses that only 1 percent of them are relevant, we can predict, at the chosen confidence level and within the margin of error, that the population as a whole will have roughly the same degree of relevance.
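The arithmetic behind such calculators can be sketched in a few lines. The version below assumes the standard sample-size formula for a proportion with a finite population correction, using the most conservative assumption that half the documents are relevant; the online tools may differ in rounding or detail, and the `sample_size` function name is ours.

```python
import math
from statistics import NormalDist


def sample_size(population, confidence=0.95, margin=0.05, proportion=0.5):
    """Sample size needed to estimate a proportion in a finite population.

    proportion=0.5 is the worst case (largest sample) and is an assumption
    of this sketch, not something fixed by the article.
    """
    # z-score for a two-sided interval, about 1.96 at 95 percent confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # sample size for an effectively unlimited population
    n0 = z ** 2 * proportion * (1 - proportion) / margin ** 2
    # finite population correction
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


print(sample_size(100_000))  # 383
```

Note how little the answer depends on the population: a collection of 1,000 files still calls for 278, while growing the collection far beyond 100,000 barely moves the figure.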

Is it worth studying that entire population? The answer depends on the circumstances of the case. The important point is that we will have gained a sense of the population’s relevance that can be used to decide whether dollars should be spent on a “fishing expedition” into any witness’s ESI.
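That "sense of the population's relevance" can be expressed as a range rather than a single number. The sketch below uses the Wilson score interval, which is one common choice for small proportions and is our assumption, not a method the rule or the case law prescribes. The count of 4 relevant files out of 383 is hypothetical (about 1 percent), and the calculation ignores the finite population correction.

```python
import math
from statistics import NormalDist


def wilson_interval(relevant, sample_size, confidence=0.95):
    """Wilson score interval for the share of relevant documents."""
    if sample_size <= 0 or not 0 <= relevant <= sample_size:
        raise ValueError("need 0 <= relevant <= sample_size and sample_size > 0")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = relevant / sample_size  # observed relevance rate in the sample
    denom = 1 + z ** 2 / sample_size
    center = (p_hat + z ** 2 / (2 * sample_size)) / denom
    half_width = (
        z
        * math.sqrt(
            p_hat * (1 - p_hat) / sample_size + z ** 2 / (4 * sample_size ** 2)
        )
        / denom
    )
    return center - half_width, center + half_width


# Hypothetical result: 4 relevant files found in a sample of 383.
low, high = wilson_interval(4, 383)
population = 100_000
print(f"Estimated relevance rate: {low:.1%} to {high:.1%}")
print(f"Projected relevant files: {low * population:,.0f} to {high * population:,.0f}")
```

On these hypothetical numbers the range runs from roughly 0.4 to 2.7 percent of the collection, which is the kind of figure a proportionality argument can be built on.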

ESI sampling can also be used to determine the precision (whether our search is identifying only responsive documents) and recall (whether our search is identifying all the responsive documents). We can also sample to determine whether backup tapes should be restored (*Zubulake III*) and how well document reviewers are doing their job (*Sedona Quality Review* publication).
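Precision and recall reduce to two simple ratios. The sketch below assumes a reviewed sample in which we know both which documents the search returned and which ones reviewers judged responsive; the document labels and the `precision_recall` helper are hypothetical.

```python
def precision_recall(retrieved, responsive):
    """Return (precision, recall) for a search measured against review results.

    precision: share of retrieved documents that are responsive
    recall:    share of responsive documents that were retrieved
    """
    retrieved, responsive = set(retrieved), set(responsive)
    hits = len(retrieved & responsive)  # responsive documents the search found
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(responsive) if responsive else 0.0
    return precision, recall


# Hypothetical reviewed sample: the search returned A-D, reviewers
# judged B, C, E, F, and G responsive.
retrieved = {"A", "B", "C", "D"}
responsive = {"B", "C", "E", "F", "G"}

precision, recall = precision_recall(retrieved, responsive)
print(f"precision = {precision:.0%}, recall = {recall:.0%}")  # 50%, 40%
```

A search with high precision but low recall is missing responsive material; the reverse means reviewers are wading through noise.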

Sampling is also a highly persuasive tool. Courts will demand evidence that the requested ESI discovery is “disproportionate”; litigators should use the sampling tool to make their case.