December 03, 2018

The Importance of Data Validation in Litigation

Learn the key steps to ensure that your data analytics expert produces an analysis that is useful, sound, and admissible, without costing a fortune.

By Jason Blauvelt and Jeremy Guinta

The growth of data in business has led to companies generating, retaining, and storing vast amounts of data. According to a 2017 survey conducted by the Coalition of Technology Resources for Lawyers, “99% of respondents agreed that analytics will be indispensable to the practice of law over the next 10 years.”

Data has been playing an increasingly important role as evidence in litigation and is used as primary evidence in a wide variety of cases. Any company that becomes engaged in litigation today will have much more data that could be produced, in both volume and variety, compared to even a few years ago.

Because data is becoming such a crucial portion of the overall evidence, it often makes sense to retain a data analytics expert to interpret it. However, attorneys should be wary of taking a hands-off approach and should instead work closely with the expert to ensure that objectives are aligned. There are four key questions, detailed below, that attorneys should ask themselves and their experts as they navigate data production, analysis, and quality control.

Question No. 1: Do we have the right data to answer the question?
Data production is too often treated as a simple exercise of extracting database tables that “seem relevant.” It is crucial to ensure that there is appropriate and sufficient information to answer the key questions in the matter. Practically speaking, this means that attorneys need to work with their experts to ensure that they have all relevant information. Furthermore, experts need to be proactive in their review of the data, understand the contents of the data, and ask probing questions regarding the sources of the data.

Roty v. Battelle Memorial Institute stands as a good example of a case where data insufficiencies caused a judgment to be reversed and remanded. 2017-Ohio-9125 (Ohio Ct. App., 10th Dist. 2017). Employees of Battelle Memorial Institute filed suit against their employer alleging age discrimination after a 2013 reduction in force. The trial court concluded that “company-wide statistics showing the ages of employees retained and terminated during a 2013 reduction-in-force were so irrelevant as to not even be discoverable,” but the appeals court found that this conclusion was improper. Id. The appeals court ruled that data on employee retention and termination was crucial to answering the question of whether there was probable age discrimination. If the right data had been provided from the outset, then significant time, energy, and resources would have been saved.

It takes careful communication between the client, attorney, and expert to identify the proper data sets that should be used to answer the question at hand.

Question No. 2: Has the expert sought to verify the information received?
An expert has an obligation to verify information produced to him because any opinion drawn from poor or incomplete data will likely be invalid. The expert should spend considerable time validating all aspects of the data to the extent possible. This would include, at a minimum, the following:

  • Checking for gaps in time series data
  • Researching the nature of missing values
  • Checking consistency of references
  • Reviewing documentation related to all data fields
  • Comparing data sources to each other by common data fields (e.g., comparing employee IDs found in the payroll data to employee IDs found in timekeeping data)
  • Comparing data sources internally across data fields (e.g., comparing regular hours and overtime hours to total hours)
  • Matching summary control totals provided by the client to the data throughout the analysis process (e.g., checking whether the final analysis accounts for $1.5 million if the total payroll for the year is $1.5 million)
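
Several of the checks above lend themselves to simple automated tests. The following is a minimal Python sketch, not a definitive implementation: the sample records, the field names (employee_id, week_ending, gross_pay, regular_hours, overtime_hours, total_hours), the weekly pay cadence, and the one-cent tolerance are all hypothetical assumptions chosen for illustration, and a real engagement would adapt them to the data actually produced.

```python
from datetime import date, timedelta

# Hypothetical sample records; all field names and values are illustrative assumptions.
payroll = [
    {"employee_id": "E1", "week_ending": date(2018, 1, 5), "gross_pay": 800.0},
    {"employee_id": "E1", "week_ending": date(2018, 1, 12), "gross_pay": 800.0},
    {"employee_id": "E1", "week_ending": date(2018, 1, 26), "gross_pay": 800.0},
    {"employee_id": "E2", "week_ending": date(2018, 1, 5), "gross_pay": 600.0},
]
timekeeping = [
    {"employee_id": "E1", "regular_hours": 40.0, "overtime_hours": 5.0, "total_hours": 45.0},
    {"employee_id": "E3", "regular_hours": 40.0, "overtime_hours": 2.0, "total_hours": 40.0},
]


def find_weekly_gaps(dates):
    """Return weekly dates missing between the earliest and latest date.

    Assumes a weekly cadence anchored on the earliest date and at least one date.
    """
    present = set(dates)
    missing = []
    current, last = min(present), max(present)
    while current <= last:
        if current not in present:
            missing.append(current)
        current += timedelta(days=7)
    return missing


def unmatched_ids(left, right, key="employee_id"):
    """Return (IDs only in left, IDs only in right) for two data sources."""
    left_ids = {record[key] for record in left}
    right_ids = {record[key] for record in right}
    return sorted(left_ids - right_ids), sorted(right_ids - left_ids)


def inconsistent_hours(records, tolerance=0.01):
    """Return records where regular + overtime hours do not equal total hours."""
    return [
        record for record in records
        if abs(record["regular_hours"] + record["overtime_hours"]
               - record["total_hours"]) > tolerance
    ]


def matches_control_total(records, field, control_total, tolerance=0.01):
    """Check whether the sum of a field ties to a client-provided control total."""
    return abs(sum(record[field] for record in records) - control_total) <= tolerance


# Example usage on the hypothetical records above.
e1_weeks = [r["week_ending"] for r in payroll if r["employee_id"] == "E1"]
print("Missing weeks for E1:", find_weekly_gaps(e1_weeks))
print("IDs unmatched across sources:", unmatched_ids(payroll, timekeeping))
print("Records with inconsistent hours:", inconsistent_hours(timekeeping))
print("Ties to control total:", matches_control_total(payroll, "gross_pay", 3000.0))
```

Each function returns the exceptions themselves rather than a simple pass or fail, so that the expert can raise specific records with counsel and the client when asking follow-up questions about the production.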

Attorneys can assist in this process by working with the client to obtain additional data documentation, answer questions, and match to control totals. The process of data validation often leads to re-production of data, which delays cases and increases costs for the client. In complex matters, it can be difficult to get perfect data on the first attempt, so validate early and often to minimize re-productions.

Kreidler v. Pixler offers a pointed lesson in data validation. Case No. C06-0697RSL (W.D. Wash. Apr. 14, 2010). In this case, plaintiffs claimed that defendants failed to properly pay workers under their compensation contract. Defense counsel produced disbursement accounting data solely for the purposes of the litigation, and the court found it to be inaccurate and “not maintained in accordance with good accounting practices.” Id. The expert relied on the data without verifying it, and the court did not find “any evidence that [the expert] sought to verify the information presented to him.” Id. While the expert cannot verify every single data point, he must take steps to test reasonableness and validate key aspects of the data.

Question No. 3: Have we created a data-analysis plan that lines up with the question we need answered?
Even if the expert has the right data, a poor analysis plan can plague a case with cost overruns, wasted time, and even excluded testimony. It is crucial for the expert to create a data-analysis plan early in the engagement. A good plan should contain the following:

  1. Key objectives
  2. Areas of inquiry
  3. Data sources relied on
  4. Assumptions
  5. Analysis steps

It is common for the expert to share this plan with counsel to ensure that the proper questions are being asked and answered and that the correct data is being used for the analysis.

Most of the time, it is impossible to plan out every single assumption and analysis step in the beginning of an engagement. Therefore, as the analysis matures, the data analysis plan needs to be continually updated, and assumptions should be shared with counsel. The expert may be making assumptions that seem reasonable; but in the context of a certain case, seemingly reasonable assumptions might not be reasonable to the trier of fact.
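
One way to keep such a living plan organized is to record it as a small structured document rather than as scattered notes. The sketch below is only an illustration under stated assumptions: the class names AnalysisPlan and Assumption, their fields, and the sample entries are hypothetical, and the structure simply mirrors the five plan components listed above while tracking which assumptions have not yet been shared with counsel.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Assumption:
    """A single analytical assumption (hypothetical structure for illustration)."""
    text: str
    recorded_on: date
    shared_with_counsel: bool = False


@dataclass
class AnalysisPlan:
    """Mirrors the five plan components: objectives, areas of inquiry,
    data sources, assumptions, and analysis steps."""
    key_objectives: List[str]
    areas_of_inquiry: List[str]
    data_sources: List[str]
    analysis_steps: List[str]
    assumptions: List[Assumption] = field(default_factory=list)

    def add_assumption(self, text, recorded_on):
        """Record a new assumption as the analysis matures."""
        assumption = Assumption(text, recorded_on)
        self.assumptions.append(assumption)
        return assumption

    def unshared_assumptions(self):
        """List assumptions that still need to be reviewed with counsel."""
        return [a for a in self.assumptions if not a.shared_with_counsel]


# Example usage with hypothetical entries.
plan = AnalysisPlan(
    key_objectives=["Quantify unpaid overtime, if any"],
    areas_of_inquiry=["Hours recorded versus hours paid"],
    data_sources=["Payroll register", "Timekeeping records"],
    analysis_steps=["Match employees across sources", "Compare hours to pay"],
)
plan.add_assumption("Workweek runs Sunday through Saturday", date(2018, 12, 3))
for pending in plan.unshared_assumptions():
    print("Review with counsel:", pending.text)
```

The point of the design is that every assumption carries a date and a flag, so that nothing reaches the final report without counsel having had the chance to test whether it will seem reasonable to the trier of fact.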

In Rojas v. Marko Zaninovich, Inc., a database expert was excluded due to several flaws in his analysis. 2011 U.S. Dist. LEXIS 106044 (E.D. Cal. 2011). In this case, the plaintiffs alleged various violations of employment laws, including improper wage payment, forced work time off the clock, and failure to provide meal and rest periods. The court found that the expert misinterpreted how the defendant calculated wages, did not use all of the data provided, merely relied on the plaintiff’s counsel’s assertions instead of reviewing the information directly provided by the defendant, and failed to cross-check his analysis against the actual paychecks in the defendant’s payroll register. Had the expert developed an analysis plan in conjunction with counsel, he likely would not have committed these errors that led to the judge ordering his declaration and opinions stricken.

Question No. 4: Can we reproduce the analysis?
If the expert cannot reproduce an analysis, starting with the raw data and ending with consistent summary figures, then the results are in doubt and opinions based on that analysis are not reliable. A failure to reproduce can stem from unknown or unforeseen errors or unsubstantiated assumptions. It also could simply be the result of sloppy documentation of the analysis process.
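
A basic reproducibility check can be sketched in a few lines of Python. The example below rests on assumptions made only for illustration: the raw data is a small hypothetical CSV extract held in memory, and the "analysis" is a simple total of gross pay by employee. It records a SHA-256 fingerprint of the raw input, so that anyone re-running the work can confirm they started from the same data, and then confirms that two runs from that raw input produce identical summary figures.

```python
import csv
import hashlib
import io
from decimal import Decimal

# Hypothetical raw extract; in practice this would be the produced file's contents.
RAW_CSV = "employee_id,gross_pay\nE1,800.00\nE2,600.00\nE1,800.00\n"


def fingerprint(raw_text):
    """Return a SHA-256 digest identifying the exact raw input that was analyzed."""
    return hashlib.sha256(raw_text.encode("utf-8")).hexdigest()


def total_pay_by_employee(raw_text):
    """Summarize gross pay by employee, starting from the raw text every time."""
    totals = {}
    for row in csv.DictReader(io.StringIO(raw_text)):
        # Decimal avoids floating-point rounding differences in currency sums.
        amount = Decimal(row["gross_pay"])
        totals[row["employee_id"]] = totals.get(row["employee_id"], Decimal("0")) + amount
    return dict(sorted(totals.items()))


# Run the analysis twice from the same raw input and compare the summaries.
first_run = total_pay_by_employee(RAW_CSV)
second_run = total_pay_by_employee(RAW_CSV)

print("Raw data fingerprint:", fingerprint(RAW_CSV))
print("Summary figures:", first_run)
print("Runs agree:", first_run == second_run)
```

Because every run begins from the raw input rather than from hand-edited intermediate files, a second analyst with the same data and the same script should arrive at the same figures.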

In Elcock v. Kmart, in which Kmart conceded liability for injuries suffered by the plaintiff when she slipped and fell at a Kmart store, the appellate court dismissed the expert’s testimony because he could not reproduce his analysis. 233 F.3d 734 (3d Cir. 2000). The court stated, “If such testing did not generate consistent results, [the method is] unreliable because it is subjective and unreproducible.” Id. at 747. Unfortunately, the expert had not created a reproducible analysis and therefore failed the Daubert requirement of testability.

Experts should be able to recreate and explain each step of their analysis. With proper documentation, other experts should be able to replicate the results. In statistical analyses and simulations involving random number generators, the expert should specify the seed number that was used in the analysis so that even the random numbers used can be recreated.
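
The point about seed numbers can be illustrated with a short Python sketch using the standard library's random module. The seed value, the employee ID format, and the sample size below are hypothetical assumptions; what matters is that the seed is fixed and documented rather than left to chance.

```python
import random

# Hypothetical seed; the value is arbitrary but should be recorded in the work papers.
SEED = 20181203


def draw_sample(population, sample_size, seed):
    """Draw a random sample using a dedicated generator seeded with a known value."""
    rng = random.Random(seed)  # isolated from the module-level global generator
    return rng.sample(population, sample_size)


# Hypothetical population of 100 employee IDs: E001 through E100.
employee_ids = [f"E{n:03d}" for n in range(1, 101)]

sample_a = draw_sample(employee_ids, 10, SEED)
sample_b = draw_sample(employee_ids, 10, SEED)

print("First draw: ", sample_a)
print("Second draw:", sample_b)
print("Draws are identical:", sample_a == sample_b)
```

A documented seed makes the draw repeatable within the same software environment. Because sampling routines can change between software versions, it is prudent to record the language and library versions alongside the seed.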

There are many pitfalls to working with data, but this article has laid out specific steps that counsel and the expert can undertake to mitigate these risks. The recipe for success is tied to ensuring that the right information is produced, verifying the information received, independently reviewing an analysis plan, and producing replicable analyses.

Jason Blauvelt is a managing consultant with Ankura in Austin, Texas. Jeremy Guinta is a senior director with Ankura in Los Angeles, California.

Ankura is the Litigation Advisory Services Sponsor of the ABA Section of Litigation. This article should not be construed as an endorsement by the ABA or ABA Entities.

Copyright © 2018, American Bar Association. All rights reserved. This information or any portion thereof may not be copied or disseminated in any form or by any means or downloaded or stored in an electronic database or retrieval system without the express written consent of the American Bar Association. The views expressed in this article are those of the author(s) and do not necessarily reflect the positions or policies of the American Bar Association, the Section of Litigation, this committee, or the employer(s) of the author(s).