Fifteen years after the landmark Exxon Valdez natural resource damages (NRD) settlement, we are witnessing a new generation of NRD claims. As shown by both the recent Deepwater Horizon and New Jersey Bayway refinery settlements, large petroleum releases remain a focus of trustees, responsible parties, and the public. However, a wide range of other types of sites (e.g., historic smelters, wood treating facilities, industrial landfills) and chemical releases (e.g., specialty chemicals, heavy metals, solvents, pesticides, polychlorinated biphenyls (PCBs)) have been subject to recent ecological damage claims. In addition, there has been an increase in ecological damage claims as part of citizen suits under several environmental statutes (e.g., Resource Conservation and Recovery Act (RCRA), Clean Air Act (CAA), Clean Water Act (CWA)). Although evaluating ecological damage associated with petroleum releases has always been challenging, the expansion of ecological damage claims to new types of sites and releases has further complicated attempts to produce reliable data that adequately address the objectives of an environmental damages investigation.
NRD liability was first authorized when the Comprehensive Environmental Response, Compensation, and Liability Act (CERCLA) was enacted in 1980. Since then, claims for NRD have been filed under a number of other federal statutes, including the CWA and the Oil Pollution Act (OPA). In addition, statutes in over 40 states contain provisions authorizing NRD recoveries. According to the Ad-Hoc Industry Natural Resource Management Group, since 1980, over 800 NRD claims have been filed by federal and state trustees. See Barbara J. Goldsmith, et al., Beyond the Headlines: Best Practices to Restore Natural Resources Injured by Long-Term Hazardous Waste Releases, Oil Spills and Transport and Other Accidents, BNA Daily Env’t Rpt. (Aug. 18, 2014). Natural resources are defined in CERCLA (42 U.S.C. § 9601(16)) and OPA (33 U.S.C. § 2701(20)) to include “land, fish, wildlife, biota, air, water, ground water, drinking water supplies, and other such resources, held in trust for the public.”
The citizen suit enforcement provisions included in most environmental statutes also are used increasingly by a diverse group of plaintiffs to seek enforcement of those statutes and their associated regulations. For example, RCRA citizen suits are used increasingly to seek cleanup of hazardous wastes alleged to cause an imminent and substantial endangerment to the environment. There is significant overlap in the data and methods used to evaluate ecological damage claims as part of citizen suits or as part of an NRD claim filed under the aforementioned statutes (e.g., CWA, OPA). As a result, concerns with data quality, reliability, and usability are similar.
Data used to evaluate harm to natural resources vary widely, and limited guidance exists on how such data should be evaluated for quality, reliability, and usability. Proper NRD investigations and determination of associated liability require representative data of known quality and integrity. Unfortunately, data quality, reliability, and usability are not always easily measurable. Further, data collection activities themselves may be poorly conceived and the resultant data incorrectly interpreted. For example, environmental groups are increasingly collecting their own data through citizen science efforts. See Miranda R. Yost & Patrick J. Fanning, Citizen “Suit Yourself”: New (and Very Real) Water Compliance Challenges for Coal Power Utilities, Proceedings of the Thirty-Sixth Annual Energy & Mineral Law Institute (2016). However, such data are often collected using novel, non-standard procedures and made publicly available without an adequate data quality evaluation. This article describes the unique technical challenges associated with data collection in the context of ecological damage litigation, and provides recommendations on best practices for obtaining reliable results.
Data Quality Primer
Environmental data quality is a concept often associated with the collection of chemistry data, but it pertains equally to other types of environmental data. Before expanding the concept of data quality to the realm of ecological data, we briefly review the historical development and need for data quality concepts, together with key issues that must be considered, whether focusing on the quality of chemistry or ecological data.
Acceptable and defensible data quality underpins the value and validity of decisions made by environmental managers and litigators. The concepts and importance of quality control and quality assurance in analytical chemistry and environmental measurements have been recognized for decades. It was not until regulatory methods were established, however, that formal quality control procedures became a mandatory element of environmental investigations.
The importance of data quality was highlighted in a two-paragraph appropriations rider attached to H.R. 4577, § 515(a), 106th Congress (2000). The law, known as the Data Quality Act, mandates that the Office of Management and Budget (OMB) issue guidance to federal agencies for ensuring and maximizing the quality, objectivity, utility, and integrity of information (including statistical information) disseminated by federal agencies, including EPA. These guidelines apply not only to federal agencies, but also to outside parties that intend to have federal agencies cite or use their information.
Data quality requirements can vary to some extent depending on the objectives of the study. Data quality objectives derived using procedures developed by EPA include statements about the level of uncertainty that a decision maker is willing to accept. Data quality objectives are often confused with acceptable levels of accuracy and precision. However, analytical uncertainty is only a portion of the uncertainty of an environmental measurement and only one element of an environmental decision. Data quality objectives should also consider the uncertainty in health-based standards, exposure pathways, and sample collection, because they all contribute to the overall uncertainty of a decision. Unfortunately, uncertainty can be difficult to measure.
Measurement variability further confounds the issue of measurement uncertainty, and the failure to consider variability in data collected for regulatory purposes can lead to “false” liability or excessive regulatory burdens. Courts usually view any exceedance of a regulatory standard as an enforceable violation, even when that exceedance falls within the variability of the analytical method. Method variability should be expected, can be significant, and should be planned for when implementing a sampling and analysis program. For example, exceedances of numeric or narrative water quality standards have been used as evidence by plaintiffs to allege imminent and substantial endangerment to the environment without proper regard for the underlying data uncertainties and variability.
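The interplay of exceedances and method variability can be sketched quantitatively. The following is a minimal illustration, with entirely hypothetical numbers and thresholds: a result is treated as a reliable exceedance only if the lower bound of its uncertainty interval still exceeds the standard.

```python
# Illustrative sketch (hypothetical numbers): flag an exceedance of a
# water quality standard only when it survives the method's variability.

def reliable_exceedance(measured, standard, relative_uncertainty):
    """Return True only if the lower bound of the measurement's
    uncertainty interval still exceeds the standard."""
    lower_bound = measured * (1.0 - relative_uncertainty)
    return lower_bound > standard

STANDARD = 10.0          # hypothetical limit, ug/L
REL_UNCERTAINTY = 0.30   # hypothetical +/-30% method variability

# A result of 12 ug/L nominally exceeds a 10 ug/L standard, but its
# lower bound (8.4 ug/L) does not: the "exceedance" falls within
# the variability of the analytical method.
print(reliable_exceedance(12.0, STANDARD, REL_UNCERTAINTY))  # False
print(reliable_exceedance(15.0, STANDARD, REL_UNCERTAINTY))  # True
```

A court applying a bright-line exceedance rule would treat both results above as violations; the sketch shows why only the second is distinguishable from method noise under the assumed variability.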
Although the quality of data is a function of errors in the design and implementation of the measurement process, not all errors announce themselves; errors can occur or persist whenever something amiss goes unnoticed or unaddressed. Guidance from the Federal Judicial Center (FJC) notes that “error, on the other hand, is intrinsic to any measurement, and far from ignoring it or covering it up or even attempting to eliminate it, authors of every paper about a scientific experiment will include a careful analysis of the errors to put limits on the uncertainty in the measured result.” FJC, Reference Manual on Scientific Evidence, Third Edition, 51 (2011).
Evaluation of errors in a measurement process requires consideration of numerous levels of detail. Measurable factors are those whose impact on accuracy, precision, and representativeness of a measurement process can be detected, monitored, and quantified by quality control samples. The proper identification of an analyte being quantitated is also essential, and it should not be assumed. Measurable factors include blanks, which provide information on possible contamination during sampling and analysis activities; replicates, which provide information on precision; and spikes, which indicate bias. Another important measurable factor is sensitivity, which is the limit of reliable detection of an analytical method.
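The measurable factors above lend themselves to simple arithmetic checks. As a hedged sketch, with hypothetical concentrations and no particular method's acceptance limits assumed, replicate precision is commonly expressed as a relative percent difference, and spike bias as a percent recovery:

```python
# Sketch of common QC calculations for "measurable factors" (all
# numbers hypothetical; acceptance limits vary by method and program).

def relative_percent_difference(rep1, rep2):
    """Precision between field or laboratory replicates (RPD, %)."""
    return abs(rep1 - rep2) / ((rep1 + rep2) / 2.0) * 100.0

def percent_recovery(spiked_result, unspiked_result, spike_added):
    """Bias from a matrix spike: the fraction of the added analyte
    actually recovered, expressed as a percentage."""
    return (spiked_result - unspiked_result) / spike_added * 100.0

# Replicates at 10 and 12 ug/L -> RPD of about 18.2%
rpd = relative_percent_difference(10.0, 12.0)

# A 5 ug/L sample spiked with 20 ug/L that measures 22 ug/L -> 85% recovery
rec = percent_recovery(22.0, 5.0, 20.0)

print(round(rpd, 1), round(rec, 1))
```

Whether an 18% RPD or an 85% recovery is acceptable depends on the method and the data quality objectives of the program, which is precisely why those objectives must be defined before sampling begins.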
Nonmeasurable factors are those whose impact cannot be detected by quality control samples, but can be controlled through quality assurance programs, standard operating procedures, comprehensive and transparent documentation, and training. Practices controlling non-measurable factors are inherent in most EPA regulations, such as the Good Laboratory Practices (GLP) promulgated under the Federal Insecticide, Fungicide and Rodenticide Act (FIFRA), 40 C.F.R. Part 160, and the Toxic Substances Control Act (TSCA) 40 C.F.R. Part 792. Unlike measurable factors that can be detected by quality control samples, a nonquantitative and somewhat subjective evaluation by an experienced scientist is necessary to determine if nonmeasurable factors have affected the accuracy and representativeness of a measurement.
When scientists are confronted with a situation for which a standard method is not available, it is often prudent to modify a proven method rather than start anew, resulting in use of a nonstandard method. Nonstandard methods are often necessary in NRD assessments due to the variety of biological matrices (e.g., animal or plant tissues) that have not been studied previously for the analytes of interest. Novel methods are often fraught with unforeseen problems that are not always amenable to problem solving within the schedule and budget of a natural resource damage investigation. Further, the use of nonstandard methods can be more problematic from the standpoint of admissibility in court.
Whatever method is chosen, its performance must then be shown to be adequate for matrices and analytes of interest via method validation and method detection limit studies. Validation of a test method, a necessary prelude to using a method, is the planned and documented procedure to establish the method’s performance characteristics. A scientifically valid method is one that is accurate, precise, and specific for its intended purpose. There are numerous reasons for conducting method validation studies, one of which is to demonstrate that the proposed method, when performed on the matrix of interest, does not produce false positives either by artifact formation or matrix interferences. This evaluation is important in determining whether a detected analyte is indigenous to the original sample. Once the method has been shown capable of meeting all data quality objectives, a quality assurance/quality control system must be instituted to define, demonstrate, and document method performance. Bias, precision, representativeness, and sensitivity are parameters that should be measured in evaluating the suitability of the method.
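A method detection limit study of the kind mentioned above can be sketched briefly. The calculation below follows the spirit of EPA's procedure at 40 C.F.R. Part 136, Appendix B (standard deviation of low-level spiked replicates multiplied by a one-tailed Student's t value at 99% confidence); the replicate results themselves are hypothetical.

```python
# Sketch of a method detection limit (MDL) calculation in the spirit
# of 40 C.F.R. Part 136, Appendix B: analyze at least seven low-level
# spiked replicates and multiply the sample standard deviation by the
# one-tailed Student's t value at 99% confidence.
import statistics

replicates = [1.9, 2.1, 2.0, 2.2, 1.8, 2.1, 2.0]  # hypothetical results, ug/L

s = statistics.stdev(replicates)  # sample standard deviation
T_99_6DF = 3.143                  # one-tailed t, 99% confidence, 6 degrees of freedom
mdl = T_99_6DF * s

print(round(mdl, 2))  # ~0.42 ug/L
```

Results reported near or below such a limit carry substantially more uncertainty than results well above it, which is why sensitivity belongs among the measurable factors evaluated for any method.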
Finally, as one would expect, data must be what they purport to be. Although data quality requirements can vary depending on the objectives of the study, their integrity cannot. Investigators and litigators often rely on data that they assume to be truthful and representative of the testing performed, but they should assume nothing. The news is replete with instances where laboratories have altered, biased, spoliated, or simply fabricated results. All aspects of sample collection, sample analysis, and data generation should be recreated and verified prior to litigation. For instance, because of calculation errors resulting from poor data review practices, Oregon State University researchers recently retracted polycyclic aromatic hydrocarbon (PAH) results that alleged airborne pollutants near Ohio hydraulic fracturing (fracking) operations posed elevated cancer risk. The researchers redid the calculations and found that the estimated human health risk for maximum exposures to fracking-related PAH pollution is well below EPA’s threshold for unacceptable cancer risk. L. Blair Paulik, et al., Retraction of “Impact of Natural Gas Extraction on PAH Levels in Ambient Air”, Envtl. Sci. & Tech., June 29, 2016.
Non-Chemistry (Ecotoxicity) Data Quality
NRD assessment relies not just on analytical chemistry data, but also on non-chemistry data obtained from field observation, toxicity testing, and modeling (e.g., fate-and-transport modeling, exposure modeling, and food web modeling). These non-chemistry datasets each carry various types of uncertainties and data quality considerations. Despite their complexity and diversity, these datasets are also often distilled down to a single numeric estimate of ecological harm (e.g., a hazard quotient), giving the impression that ecological risk or injury assessment is a process that is robust, relevant, and easily reproduced. Anyone who has been involved with NRD assessment is likely to agree that this impression is overly optimistic, at best.
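The "single numeric estimate" mentioned above can be made concrete. A hazard quotient is simply an exposure estimate divided by a toxicity reference value; the sketch below uses hypothetical numbers to show how much information that one ratio conceals.

```python
# Minimal sketch of how diverse data get distilled into a hazard
# quotient (HQ): HQ = exposure estimate / toxicity reference value.
# All values below are hypothetical.

def hazard_quotient(exposure_conc, toxicity_reference_value):
    """HQ > 1 is conventionally read as 'potential for adverse
    effects' -- a screening flag, not a measurement of injury."""
    return exposure_conc / toxicity_reference_value

sediment_conc = 4.0   # hypothetical exposure concentration, mg/kg
trv = 10.0            # hypothetical toxicity reference value, mg/kg

hq = hazard_quotient(sediment_conc, trv)
print(hq)  # 0.4 -> below the conventional screening threshold of 1
```

Every uncertainty in the exposure estimate and in the toxicity reference value propagates, invisibly, into that single number, which is why the impression of robustness is often misleading.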
A wide variety of data are used to evaluate potential harm to natural resources, ranging from observational data and analytical chemistry data to field and laboratory animal toxicity test data and modeled data. Guidance on data verification and validation has been developed by a variety of government agencies, including U.S. federal (e.g., EPA) and state agencies, as well as organizations such as the Organization for Economic Co-operation and Development (OECD) and the International Organization for Standardization (ISO). However, existing guidance is primarily focused on analytical chemistry data and is more limited for non-chemistry data.
We will focus here on ecotoxicity data quality and reliability, although many of the issues described here also apply to field observational and modeled data. Verification and validation of ecotoxicity test data can be challenging because it involves a wide variety of test methods and includes chemical, physical, biological, and toxicological data. Further, results obtained from ecotoxicity testing are often a critical line of evidence in ecological risk-based remedial decisions, NRD assessments, and ecological damage claims. A clear understanding of the key data quality and reliability issues associated with ecotoxicity data is therefore essential in the context of environmental damage litigation.
The ecotoxicity of hazardous substances or site media (e.g., sediment and soil) can be evaluated using a wide variety of methods that range from short-term acute tests with simple endpoints, such as mortality, to long-term chronic, full life-cycle, or multigenerational tests with complex endpoints, such as growth, reproduction, biochemistry, and pathology. For example, methods have been developed by EPA to support federal programs, such as the National Pollutant Discharge Elimination System (NPDES) under the CWA (i.e., Whole Effluent Toxicity (WET) methods specified at 40 C.F.R. § 136.3) and the National Oil and Hazardous Substances Pollution Contingency Plan (NCP; methods specified at 40 C.F.R. Part 300, Appendix C). There are also EPA methods that have been developed to support product registration and safety evaluation under FIFRA and TSCA. See www2.epa.gov/test-guidelines-pesticides-and-toxic-substances. The member countries of the OECD, including the United States, have developed similar toxicity test guidelines that provide a common basis for international organizations and researchers to evaluate ecological toxicity. See www.oecd-ilibrary.org/content/package/chem_guide_pkg-en.
There also exists data quality guidance relevant to ecotoxicity testing. As indicated earlier, EPA’s GLP regulations provide detailed guidance for performing toxicity studies. See 40 C.F.R. Parts 160 & 792. Other entities, such as the OECD and the Food and Drug Administration, have similarly produced GLP regulations. See www.oecd.org/chemicalsafety/testing/goodlaboratorypracticeglp.htm; FDA, 21 C.F.R. Part 58. Commercial and, to various extents, research laboratories are expected to be familiar with current data quality standards and requirements. The GLP rules specify the use of a written protocol for each study; the methods for collecting, recording, reporting, and storing data; and the inclusion of a quality assurance review of reports. Despite the existence of the GLP rules, much of the published ecotoxicity literature is not produced under GLP, and ecotoxicologists have expressed growing concern about the quality of published research.
Some research groups have tried to address these concerns by developing more specific guidance for evaluating the quality of data from toxicity tests and the validity of the test method. For example, “Klimisch scores” are used to evaluate the reliability of ecotoxicity data in many regulatory programs. Data are assigned to one of four reliability categories, using a method published by H.J. Klimisch and colleagues at the chemical company BASF in 1997. H.J. Klimisch, A systematic approach for evaluating the quality of experimental toxicological and ecotoxicological data, Regulatory Toxicology & Pharmacology (Feb. 1997). The categories are reliable without restriction (Category 1), reliable with restriction (Category 2), not reliable (Category 3), and not assignable (Category 4). Typically, Category 1 studies or data are those conducted following a standard guideline under GLP conditions, although any study may be placed into this category if it is described sufficiently and carried out according to a scientifically acceptable standard. If some criteria of a standard test method are not met (Category 2), a qualified reviewer may still find the data to be valid for their intended use. Often, Category 1 and 2 studies or data are found to be of sufficient quality to support decision making, whereas Category 3 and 4 studies are generally considered supporting evidence or not relevant. The Klimisch method has been criticized by some for a lack of detail and guidance, resulting in inconsistent application of the method. For example, a new method called “Criteria for Reporting and Evaluating ecotoxicity Data (CRED)” was developed by researchers at Stockholm University and the German Federal Environment Agency to provide further detail and guidance and ensure greater consistency in the evaluation of ecotoxicity data. M. 
Ågerstrand, et al., Reporting and evaluation criteria as means towards a transparent use of ecotoxicity data for environmental risk assessment of pharmaceuticals, Envtl. Pollution (Oct. 2011). The CRED method was subsequently evaluated in a two-phased ring test in which 75 risk assessors from 12 countries participated. The results of the ring test evaluation were published in 2016, and the study authors concluded that the CRED method may provide a suitable replacement for the Klimisch method. Robert Kase, et al., Criteria for Reporting and Evaluating ecotoxicity Data (CRED): Comparison and perception of the Klimisch and CRED methods for evaluating reliability and relevance of ecotoxicity studies, Envtl. Sci. Europe (Feb. 2016). The CRED method is currently being piloted and tested in several regulatory programs in the EU.
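The Klimisch categorization described above can be caricatured as a decision rule. The sketch below is only a rough approximation for illustration; the real method involves expert judgment applied to the study documentation, not a mechanical test.

```python
# Rough sketch of the Klimisch screening logic: a mechanical stand-in
# for what is, in practice, an expert-judgment exercise.

def klimisch_category(guideline_followed, glp, well_documented,
                      sufficient_detail_to_assess=True):
    """Assign a hypothetical Klimisch reliability category (1-4)."""
    if not sufficient_detail_to_assess:
        return 4  # not assignable
    if guideline_followed and glp and well_documented:
        return 1  # reliable without restriction
    if well_documented:
        return 2  # reliable with restriction
    return 3      # not reliable

print(klimisch_category(True, True, True))     # 1
print(klimisch_category(False, False, True))   # 2
print(klimisch_category(False, False, False))  # 3
print(klimisch_category(True, True, True,
                        sufficient_detail_to_assess=False))  # 4
```

The brittleness of such a rule, where one reviewer's "well documented" is another's "insufficient detail," is exactly the inconsistency that motivated the CRED method.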
Although it might seem from the previous discussion that clear guidance exists for conducting ecotoxicity testing and verifying and validating the data produced by such testing, this simply is not the case. Three key issues related to ecotoxicity data quality and reliability are explored further below. It is important to understand these issues when reviewing ecotoxicity data or relying on such data in the context of ecological damage claims.
First, the same ecotoxicity test method typically can be conducted in many different ways: using a variety of different animal or plant species; using nominal or analytically verified test concentrations; using different variations of artificial or natural test media (e.g., artificially created sediment versus sediment collected from a “clean” site); using different test conditions (e.g., temperature, light); and using sediment, water, or food-based exposure. For example, aquatic toxicity data produced in support of the Deepwater Horizon NRD assessment generally followed existing regulatory ecotoxicity test methods. However, those methods provide little detail on how to prepare a solution of oil in water for the purpose of ecotoxicity testing. As a result, ecotoxicity data were produced using a wide variety of preparation methods, such as various oil loading rates, mixing times, settling times, labware types, and with or without the addition of the chemical dispersant used during response actions. See www.dwhdiver.orr.noaa.gov; data.gulfresearchinitiative.org. So, while all data could be construed as comparable since they relied on the same test method, in fact, comparisons between results were often difficult, if not impossible. The issue of methodological differences in oil toxicity data generation and the resulting implications for the use of such data in decision making are further discussed in a recent technical review paper. See Adriana C. Bejarano, et al., Issues and challenges with oil toxicity data and implications for their use in decision making: A quantitative review, Envtl. Toxicology & Chemistry (Feb. 2014). The authors of this review paper describe how procedural differences (e.g., differences in species selection, differences in source oils, and differences in exposure medium preparation) make it challenging to compare toxicity data across studies.
The authors conclude that while standard testing can generate useful information, the best available toxicity data need to account for procedural differences and reflect typical field conditions of the spill.
Second, the use of “flags” (e.g., estimated value, questionable value, and rejected value) to qualify chemistry data is common practice and provides a good initial indication of data quality. However, ecotoxicity data typically do not contain data qualifiers or flags. As described earlier, even if ecotoxicity data or studies are assigned a Klimisch score (or other data qualifier), such assignment may not apply if the data are used for a different purpose. As such, the reliability or relevance of ecotoxicity data may be challenged in court based on the independent review of a qualified expert (or team of experts, depending on the complexity of the analysis). For example, ecotoxicity test data used in support of submissions under the Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH) regulation in the European Union (EU) are reported in a public database hosted by the European Chemicals Agency. See European Chemicals Agency, Registered substances, available at https://echa.europa.eu/information-on-chemicals/registered-substances. In many instances, the database will report details such as the Klimisch score, the guideline method that was followed, whether GLP was followed, and whether there were test method deviations. However, the full test report, including raw data, often is not provided. Without an independent review of the underlying data and the criteria used to assign the Klimisch score to a specific study, the data or its reliability flag may be irrelevant in a different context. This issue is increasingly recognized by ecotoxicologists, and there is an ongoing debate in the scientific community regarding the value of GLP rules and Klimisch scores for evaluating toxicological evidence for regulatory decision making. See, e.g., Christopher J. Borgert, et al., Does GLP enhance the quality of toxicological evidence for regulatory decisions, Toxicological Sci. (May 2016).
Third, translating chemistry and non-chemistry data into measures of ecological risk or injury is not an exact science and often involves policy judgments. This subjectivity invites a number of questions, particularly during an ecological damage assessment. What constitutes an acceptable ecological risk or natural resource injury? (E.g., is an estimated loss in abundance of 20 percent of one invertebrate species acceptable?) What is a baseline ecological condition, and what is the natural variability in such a condition? (E.g., are the conditions preindustrial, consistent with those present prior to the start of operations at an industrial site, or consistent with those at relatively unimpacted sites nearby?) How do individual measurements translate into ecosystem service losses? (E.g., how do toxicity test results with a single species translate to fish production in a large river?) Many of these questions do not translate into testable hypotheses and their answers often rely on significant simplifying assumptions and value judgments.
As an illustration, several soil ecological screening levels for metals were developed by EPA using compounding worst-case assumptions. The resulting values are, in some instances, well below naturally occurring background levels (e.g., vanadium, lead). See EPA, Ecological Soil Screening Level (Eco-SSL) Guidance and Documents, available at https://www.epa.gov/risk/ecological-soil-screening-level-eco-ssl-guidance-and-documents. Clearly, an exceedance of these benchmarks is not indicative of an actual ecological risk, and indeed, the prospect of an unacceptable risk determination based on these agency-endorsed screening values undermines the scientific credibility of the risk assessment process as a whole.
Another illustration of how policy decisions, rather than sound science, have driven environmental regulation can be found in the development of water quality standards. Water quality standards have been developed on the basis of aquatic ecotoxicity tests that have changed little in their conceptual design over recent decades. The standards rely on several simplifying assumptions: (1) the chemical concentration measured in the water (external dose) is an adequate surrogate for the unmeasured tissue concentration (internal dose); (2) the unmeasured tissue concentration is an adequate surrogate for the unmeasured concentration at the site of toxic action; and (3) a single mode of action is responsible for the endpoint of interest (e.g., mortality, growth). Research has shown that these simplifying assumptions can account for variability in aquatic toxicity test results of several orders of magnitude. The response has not been to develop better test designs reflecting the current state-of-the-science. Rather, the policy response has been to use large default uncertainty factors and apply them to the lowest ecotoxicity value (i.e., obtained with the most sensitive species). This has resulted in water quality standards that are not based on the best available science and offer a wide, largely unknown range of ecological protection. See Daniel W. Smith, et al., ERAs and NRDAs: When Policy Masquerades as Science, American Bar Ass’n Envtl. Litig. E-Newsletter (March 2012) (exploring the policy and science issues implicated).
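The uncertainty-factor approach described above reduces to a single division. The sketch below uses hypothetical species values and a hypothetical default factor to show how the resulting benchmark is driven entirely by the most sensitive species tested.

```python
# Sketch of the policy approach described in the text: take the lowest
# ecotoxicity value (most sensitive species) and divide by a large
# default uncertainty factor. All values are hypothetical.

species_lc50 = {          # hypothetical acute toxicity values, ug/L
    "fathead minnow": 120.0,
    "daphnid":        35.0,
    "midge":          480.0,
}
UNCERTAINTY_FACTOR = 100.0  # hypothetical default factor

benchmark = min(species_lc50.values()) / UNCERTAINTY_FACTOR
print(benchmark)  # 0.35 ug/L, set entirely by the most sensitive species
```

Note that the two less sensitive species contribute nothing to the result, and the factor of 100 is a policy default rather than a measured quantity, which is the sense in which the level of protection the benchmark affords is largely unknown.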
The three issues described above are by no means a comprehensive listing of ecotoxicity data quality and reliability issues. They illustrate, however, the significant data quality challenges that are unique to non-chemistry data, that inform ecological risk or injury assessment, and that need to be carefully considered in the context of ecological damage claims.
When ecological damage investigations rely on such data, the data must be representative and of known quality and integrity to ensure that the resulting decisions are scientifically sound, defensible, and ultimately admissible in court. Here, we provide a number of recommendations for the evaluation of ecological damage claims.
First, you are defenseless without defensible data. Data quality and reliability should always be a key consideration when evaluating data in the context of ecological risk or injury assessment. Existing data cannot support or disprove an ecological damage claim without an adequate evaluation of their quality, reliability, and relevance to the claim.
Second, having more data is not always better. Ecological risk assessors have unique opportunities for data collection and real-world verification that are not available to the human health risk assessor. Field evidence or experimental proof can be very powerful in the context of environmental damage litigation. However, such data are only useful if generated on the basis of clearly defined data quality objectives and all aspects of data generation are documented properly.
Finally, uncertainty should be recognized. Although ecological risk and injury assessment can seem like a black box or a value-driven process, there are significant advantages in clearly describing the quality and reliability of the data used in the assessment. Whenever possible, uncertainty associated with the underlying data or data analyses and interpretation steps should be evaluated, ideally quantitatively, and articulated. While the uncertainty section of an ecological risk or injury assessment is typically overlooked or relegated to an attachment, properly recognizing uncertainty on the basis of an in-depth data quality evaluation is very powerful. Failing to do so is like reporting the results of a political poll without its margin of error: the results are meaningless without their associated uncertainty.
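The poll analogy can be made concrete with a hypothetical example. A simple 95% margin of error for a proportion is approximately 1.96 times the standard error:

```python
# The poll analogy, made concrete (hypothetical poll): an approximate
# 95% margin of error for a sample proportion.
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error: z * sqrt(p * (1 - p) / n)."""
    return z * math.sqrt(p * (1.0 - p) / n)

# 52% support among 1,000 respondents -> roughly +/- 3.1 points,
# so a 52-48 result is statistically indistinguishable from a tie.
moe = margin_of_error(0.52, 1000)
print(round(100 * moe, 1))  # ~3.1
```

A hazard quotient or injury estimate reported without an analogous uncertainty statement invites exactly the same misreading as a headline poll number reported without its margin of error.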
Ultimately, evidence will more likely be admitted when a reliable, relevant, and defensible method is implemented correctly by a trained operator and all relevant data are maintained. Data produced by all parties should be closely scrutinized. The adage “assume nothing” is certainly apropos in natural resource damage investigations.