
Jurimetrics Journal

Jurimetrics: Spring 2024

The Impact of Defense Experts on Juror Perceptions of Firearms Examination Testimony

Brandon Louis Garrett, Richard Gutierrez, and Nicholas Scurich

Summary

  • Few studies have examined how laypeople evaluate firearms examination testimony.
  • A study randomly assigned participants to one of four conditions, each involving a prosecution firearms examiner testifying about an identification, to empirically test the impact of defense experts, including methods experts, on jurors.
  • Findings show, in part, that the rate of guilty verdicts fell substantially when defense expert testimony of any kind was present.
  • Defense attorneys and courts should account for the reality that different varieties of expert witness conclusions appear to affect jurors unevenly.

Abstract: Firearms examiners, who seek to link fired ammunition to a particular gun, have testified in criminal trials for over a century. Research suggests that such evidence is highly persuasive to jurors. However, no studies have examined the effect of divergent conclusions offered by defense firearms examiners, nor have any explored the impact of testimony by research scientists—sometimes called “methods experts”—regarding the scientific foundation and limitations of the firearm examination discipline. The effect these types of testimony might have on jurors is, therefore, unknown. This Article reports the results of a novel empirical study testing the effects of such defense experts. Over 350 jury-eligible adults read a synopsis of a criminal trial, including transcripts of a firearms examiner’s testimony. All four experimental conditions involved a prosecution firearms examiner testifying about an identification. In one condition, this testimony went unrebutted. In the remaining conditions, the defense relied on an expert of its own: either a firearms examiner who had reached a different conclusion (inconclusive or elimination) or an academic called to explain the limitations of firearms examination methods. While most participants found the unrebutted testimony of a firearms examiner sufficient to convict the defendant, guilty verdicts were significantly reduced when the defense called an expert. Further, defense experts reduced the perceived likelihood that the defendant discharged the firearm, the strength of the prosecution’s case, the case-specific reliability of the firearm examination, and the general reliability of firearm examination. However, critical differences existed between our various conditions involving defense expert testimony.

Citation: Brandon L. Garrett, Richard E. Gutierrez, & Nicholas Scurich, The Impact of Defense Experts on Juror Perceptions of Firearms Examination Testimony, 64 Jurimetrics J. 223–48 (2024).

For over a hundred years, firearms examiners have testified in criminal tri­als linking fired bullets or cartridge cases to a particular firearm. Such testi­mony continues to be in great demand, given the degree of firearm-related violence in the United States, with more than 100,000 requests for bullet or car­tridge case comparisons each year. But more recently, the consequences of the uncritical judicial acceptance of firearms comparison testimony have come into sharper focus. Indeed, we now know that firearms evidence has played a role in numerous high-profile wrongful convictions and that multiple forensic labora­tories have shuttered as a result of errors by practitioners. In the 2014 per cu­riam opinion in Hinton v. Alabama, for example, the U.S. Supreme Court reversed a conviction because of a defense lawyer’s inadequate performance in failing to develop firearms evidence at a capital murder trial in response to a state examiner’s comparison. Hinton was subsequently exonerated, and he commented: “I shouldn’t have [sat] on death row for thirty years . . . . All they had to do was to test the gun.”

As detailed in a recent comprehensive review of all judicial rulings in the United States concerning firearms comparison evidence, judges did not antici­pate or account for these consequences for many decades. Instead, after some initial skepticism, judges uncritically accepted firearms comparison experts and did not quickly change their approaches—even in the years after the United States Supreme Court’s 1993 decision in Daubert v. Merrell Dow Pharmaceu­ticals, which imposed clearer and more rigorous gatekeeping responsibilities to assess the reliability of scientific evidence. However, recent scientific schol­arship has called into question the validity and reliability of firearms examina­tion—contributing to an explosion of judicial rulings, some of them critical of the field. In a 2008 report, the National Academy of Sciences (NAS) found that “[t]he validity of the fundamental assumptions of uniqueness and reproduc­ibility of firearms-related toolmarks has not yet been fully demonstrated.” In its 2009 report, the NAS concluded “[s]ufficient studies have not been done to understand the reliability and repeatability of the methods.”

Solidifying this trend, in 2016, the President’s Council of Advisors on Sci­ence and Technology (PCAST) reviewed in detail all of the firearms examina­tion studies that had been conducted to date. While numerous studies were reviewed, PCAST found that only one study had been appropriately designed to test the performance of firearms examiners, and it had yet to be published in a peer-reviewed scientific journal. Therefore, PCAST concluded that “the current evidence falls short of the scientific criteria for foundational validity.” Addi­tionally, numerous research scientists—including psychologists, statisticians, and other academics with training in conducting science rather than applying a forensic technique—have furthered and expanded NAS and PCAST’s critiques in recent years by discussing how sampling issues, attrition, error rate calcula­tions, external validity, and the like impact the value of foundational research into the accuracy and consistency of firearms examination methods. Indeed, several such scientists have even testified about the research base of firearms examination during pretrial admissibility hearings or contributed to amicus briefs in support of defense motions to exclude such evidence. Generally, these methods experts reach the same conclusions as NAS and PCAST—that is, fire­arms evidence remains scientifically unvalidated. As one judge put it, “[R]arely do the experts fall into such cognizable camps, forensic practitioners on one side and academic researchers on the other.”

These modern critiques have gradually produced concrete results for the admissibility of firearms examination evidence. Judges were slow to react to scientific concerns raised regarding firearms comparison evidence, even after the Daubert ruling. As lawyers have litigated the findings of scientific reports and error rate studies with the addition of methods experts, more judges have imposed increasingly stringent limits. Rather than permit the expert to testify to a “source identification,” to the exclusion of all other firearms in the world, judges have instructed that experts must testify to a “reasonable certainty,” a “more likely than not” conclusion, or, still more limited, that they “cannot exclude” a firearm. In more recent years, following a well-known 2019 ruling in the D.C. Superior Court, judges have limited testimony of firearms experts in increasingly stringent ways. One trial judge even excluded firearms expert testimony outright, finding that the field’s methods lack general acceptance given inadequate evidence of reliability. But with the exception of that one ruling, admissibility remains the norm.

If courts have intended that such limits on the overstatement of findings would counteract what has been described as the “talismanic significance” afforded forensic testimony by jurors, then there are reasons to doubt that their compromise solution focusing on conclusion language (in combination with vigorous cross-examination) will have its desired effect. One prior study that examined how lay jurors react to variations in the testimonial framing of conclusions, as well as confrontation through cross-examination, found that (with the exception of precluding any inclusionary testimony and permitting experts to say only that they could not exclude the firearm in question) neither was effective in moderating participants’ guilty verdicts. But there are also reasons to think judges may examine firearms testimony more carefully in future years and potentially impose greater restrictions or bars on admissibility. Federal Rule of Evidence 702 was amended for the first time since 2000 on December 1, 2023. The Advisory Committee’s notes emphasized that these revisions were “especially pertinent” to forensic evidence. The rule changes squarely addressed two issues that judges have grappled with in the area of firearms evidence: the reliability of the methods and the overstatement of conclusions.

The effectiveness of other approaches (by judges) to moderate or (by litigants) to contest firearms examination evidence is not well understood. Therefore, stakeholders in the criminal legal system will benefit, in navigating the changes to Rule 702, from an expanded understanding of lay reactions to testimony offered by defense rebuttal experts, including concerns about the methods used in the comparison of fired ammunition. Given trends in judicial regulation of firearms expert testimony, advances in the scientific understanding of that testimony, and changes to Rule 702 itself, we sought to examine how laypeople evaluate firearms testimony where the defense seeks to contest it using expert testimony of its own. Next, we turn to an introduction to how firearms experts conduct their work and testify, before describing our methods and study results.

I. An Introduction to Firearms Expert Testimony

A. Firearms Comparison Methods and Testimony

Firearms examination is a subspecies of toolmark examination, the practice of examining marks to opine on whether they were left on a substance by a particular type of tool or by a particular tool. When conducting comparisons, firearms examiners seek to link crime scene evidence—such as spent cartridge casings or bullets—with a firearm. These examiners assume that the manufacturing processes used to cut, drill, and grind a gun leave distinct and identifiable markings on the gun’s barrel, breech face, firing pin, and other components. When the firearm discharges, those components contact the ammunition and leave marks. Examiners have long assumed that firearms leave distinct toolmarks on expended munitions and believe they can definitively link spent ammunition to a particular firearm using these toolmarks.

When firearms examiners testify as experts, they begin by opining on class characteristics. These class characteristics are design features such as the shape of the firing pin or the number and direction of the grooves on the barrel of the gun, which vary by manufacturer and type of firearm. Those design features would be shared by all firearms of that type, however, and do not permit any more specific identification of a particular firearm.

So-called “individual” characteristics permit those more searching conclu­sions. By the late 1990s, firearms examiners premised expert testimony on a “theory of identification” set out by a professional association, the Association of Firearms and Tool Mark Examiners (AFTE). AFTE defines individual char­acteristics as “[m]arks produced by the random imperfections or irregularities of tool surfaces. These random imperfections or irregularities are produced in­cidental to manufacture and/or caused by use, corrosion, or damage. They are unique to that tool to the practical exclusion of all other tools.”

In reviewing such class and individual characteristics, AFTE instructs practitioners to use the phrase “identification” to explain in testimony what they mean when they identify “sufficient agreement” of markings when examining bullets or cartridge cases. There are no quantitative guidelines or numeric thresholds for how many individual characteristics must be observed to reach “an identification.” Rather, the AFTE protocol states that an identification is justified “when the unique surface contours of two toolmarks are in sufficient agreement.” As the PCAST Report observed, this is a circular definition, ultimately relying on the expert’s own subjective decision that sufficient commonalities, nowhere defined, exist. AFTE nevertheless associates statistical certainty with “identifications,” claiming that such a conclusion means “the likelihood another tool could have made the mark is so remote as to be considered a practical impossibility.”

Even in the face of criticism of both the “sufficient agreement” standard (as described above) and their assertions of certainty, firearms examiners have largely refused, in recent years, to temper their conclusions. For example, federal experts, following Department of Justice guidelines regarding expert testimony, use the term “source identification” to express their ultimate conclusions; are not prohibited from expressing their conclusions as a practical certainty; and are encouraged, having reached an “identification,” to describe “the probability that the two toolmarks were made by different sources” as “so small that it is negligible.”

More importantly, though, where firearms examiners have limited their conclusions regarding firearms evidence, it has typically been for a different reason: judges have ordered them to do so. Those court-imposed restrictions have included limiting examiners to opinions of reasonable certainty, “more likely than not,” “consistent,” or “could not exclude.”

B. Jury Research on Firearms and Forensic Experts

As noted, few studies have examined how laypeople evaluate firearms examination testimony. One prior paper, presenting two studies, found as a preliminary matter that laypeople place great weight on such testimony. In the first study, the authors found that variation in conclusion language (reasonable certainty, more likely than not, source identification, and the like) affected neither guilty verdicts nor jurors’ estimates of the likelihood that the defendant’s gun fired the recovered bullet. In contrast, a more limited conclusion that an examiner “cannot exclude the defendant’s gun” did significantly reduce guilty verdicts and likelihood estimates alike. In the second study presented in that paper, the presence of cross-examination largely did not affect these findings.

A small earlier study examined how laypeople evaluate firearms conclusion language, surveying 107 participants and finding “a significant main effect for certainty,” with increased expression of expert certainty generally leading to in­creased participant certainty. A follow-up study with 437 U.S. participants ex­amined the impact of cross-examination on lay evaluations of firearm testimony. The study placed half of the participants in a group who were told the conclusion of a firearms expert and half in another group who were given a statement that the expert acknowledged limitations on cross-examination. Neither group was provided with a transcript. As predicted, the acknowledgment of limitations on cross-examination reduced the weight participants placed on the evidence, where the expert had earlier professed certainty.

Finally, a recent study explored the impact of firearms examination testimony on 492 and 1,002 undergraduate psychology students across two experiments. Experiment 1 varied the firearms examination conclusion offered by the prosecution (identification vs. inconclusive vs. elimination) and found statistically significant differences in guilty verdicts between each pair of conclusions. Experiment 2 added a control condition without firearms examination testimony of any kind, as well as a condition in which the firearms examiner considered the evidence unsuitable for comparison, and found that while identification testimony substantially increased guilty verdicts, lay participants treated inconclusive conclusions as essentially neutral (i.e., less inculpatory than identification testimony, more inculpatory than elimination testimony, and equivalent to the unsuitable and no-forensic-evidence conditions).

This small number of studies examining firearms expert evidence follows a larger body of research examining how jurors evaluate other types of forensic testimony. Generally, laypeople place strong weight on forensic science and view it as highly accurate and persuasive. Other studies have found that lay­people are “sometimes insensitive” to variations in the way in which a forensic “match” is communicated using qualitative terms. For example, jurors place great weight on fingerprint evidence and regard it as accurate and reliable, re­gardless of whether the expert expresses conclusions in more certain or more cautious terms. However, some evidence suggests that the weight mock jurors place on forensic evidence varies depending on the forensic discipline. In ad­dition, multiple research efforts have concluded that cross-examination shows “little or no ability . . . to undo the effects of an expert’s testimony on direct examination.” But specific lines of cross-examination, about error rates and proficiency of experts as well as subjectivity and bias, appear to buck this trend and impact laypeople.

One prior study examined the impact of defense expert testimony during a battle of the experts in a criminal case. That study examined three types of rebuttal testimony in a mock trial involving fingerprint expert testimony: (1) a methodological rebuttal explaining the general risk of error in the fingerprint-comparison process; (2) a new-evidence rebuttal concluding the latent finger­print recovered in this case was not suitable for comparison; and (3) a new-evidence rebuttal excluding the defendant as the source of the latent finger­print. All three rebuttals significantly altered perceptions of the prosecution’s fingerprint evidence, but new-evidence rebuttals proved most effective. No such study has been done in the context of firearms evidence, that is, examining the impact of different types of defense expert witnesses on lay perceptions of the evidence.

Other studies have focused on how laypeople evaluate DNA evidence, which is presented using statistical and not qualitative conclusions. Studies have found that jurors place especially high weight on DNA evidence. However, jurors have been found in a variety of studies to be sensitive to different presen­tation formats of statistical conclusions in the DNA context, including under­valuing the evidence in some instances, as well as falling prey to logical fallacies of different types and undervaluing the risk of error.

Overall, that body of work suggests that laypeople place great weight on firearms expert testimony, and alterations to the language used to communicate the conclusions have little impact. What has yet to be tested is whether, in the specific context of firearms examination testimony, defense experts—including research scientists who explain the weakness in the scientific foundation of the relevant methods, or “methods experts”—have any effect on jurors. Methods experts testify almost exclusively in admissibility hearings to judges. They have rarely been called at trial to explain their analyses to jurors. It may take a fair amount of evidence regarding the lack of reliability of firearms methods to mod­erate jurors’ prior beliefs. It is also unclear if jurors have the wherewithal to understand the scientific foundation of firearms examination. As one judge noted after conducting an extensive pretrial evidentiary hearing:

[A] full exploration of the issues surrounding the reliability of [firearms exam­iner] evidence in the present case required several days of testimony from mul­tiple expert witnesses, close evaluation of numerous applied-science studies, exploration into the studies’ design and methodology and the problems arising therefrom, and advocacy by counsel on each side specially tasked with litigat­ing forensic science issues. It would be fanciful to conclude that the normal adversarial process would enable a lay jury to adequately understand these is­sues . . . .

The present Article reports the results of a study designed to empirically test the impact of defense experts, including methods experts, on jurors. As described below, we report the results of an online study in which 351 jury-eligible adults read transcripts of a criminal trial and rendered a verdict along with several other judgments regarding the guilt of the defendant and the strength and reliability of the prosecution’s evidence.

II. Study Design and Methods

A. Participants

The study participants were recruited through Prolific and completed the study online. Prolific is a crowdsourcing platform that can produce high-quality data suitable for social science research. Individuals had to be jury-eligible (i.e., over age 18, a resident of the United States, and able to speak English) to participate in this study. The study also included several attention check and reading comprehension questions to ensure participants were engaged with the materials. Participants who failed the attention or reading check questions, or whose responses came from suspicious or duplicate geolocations, were terminated from the study and excluded from subsequent analyses.

The final sample comprised 351 participants, with ages ranging from 18 to 75 (median = 34, IQR = 18). The sample was gender balanced, with 50% self-identifying as male and 50% as female. In terms of self-reported racial and ethnic backgrounds, 10% identified as Black, 9% as Asian, 66% as White, 10% as Hispanic, and 0.3% as Native American or Pacific Islander, with the remaining participants selecting other categories. In terms of education, 54% held a two-year college degree or less, 33% possessed at least a four-year college degree, and 13% held a post-graduate degree. The sample included residents from 43 states.

Participants were asked to self-identify their political preferences, with 3.4% identifying as “Very Conservative,” 14.8% as “Somewhat Conservative,” 24.5% as “Middle of the Road,” 30.2% as “Somewhat Liberal,” and 26.8% as “Very Liberal.” Additionally, 9% reported an annual household income of less than $20,000, while 15% reported an income exceeding $100,000. Furthermore, 17% reported having previously served on a jury; the vast majority (75%) of these participants reported having served in a criminal trial. When asked which trial error causes more harm to society, 45% thought that “erroneously convicting an innocent person” causes more harm, 8% thought that “failing to convict a guilty person” causes more harm, and 47% thought “both are equally bad.”

A minority (18%) of participants reported being firearm owners. Approximately one-third of the sample reported being extremely, moderately, or slightly comfortable with firearms, whereas over half of the sample reported being extremely, moderately, or slightly uncomfortable with firearms, and a small minority (12%) were neither comfortable nor uncomfortable. None of these demographic or individual-difference variables appeared to be related to or predictive of the outcome measures in this study (e.g., guilty verdicts).

B. Procedure

After consenting to participate in the study, participants provided demographic information, which was reported above. In addition, participants answered the question shown in Figure 1, adapted from Koehler, about the false positive error rate:

 

Figure 1. Question to Participants About the False Positive Error Rate

The median and modal responses were 1-in-1,000, though there was considerable variability in the responses. Notably, 7% of participants thought that “such an error is impossible,” and 17% of participants thought the error rate was between 1 in 1 million and 1 in 1 billion.

After completing this question, participants were presented with the case materials. The case materials, which were over 1,600 words in length, are de­scribed in detail in the next section. After reading the case materials, participants received judicial instructions regarding the standard of proof (i.e., “Beyond a Reasonable Doubt”) and were asked whether, given the state’s burden to prove guilt beyond a reasonable doubt, they would convict the defendant (“guilty” or “not guilty”). Participants also rated the likelihood the defendant was the indi­vidual who discharged the firearm on a scale from 0 to 100% and rated the strength of the case against the defendant on a 9-point scale from “not at all strong” to “extremely strong.”

Participants subsequently responded to three items probing the validity of the firearm analysis presented in the instant case. These items used a 7-point Likert scale, with higher values indicating greater validity. These items were combined to create a composite score referred to as “Case-specific Reliabil­ity.” Participants also answered two questions (also rated on a 7-point Likert scale) about firearm examination analysis generally. These items were com­bined to create a composite score referred to as “General Reliability.” Finally, participants were asked whether guns leave unique markings on discharged bul­lets or casings and which trial error they believed causes more harm to society: “failing to convict a guilty person,” “erroneously convicting an innocent per­son,” or “both are equally bad.” Participants were then thanked for their effort and compensated.
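To make the composite scoring concrete, the following is a minimal sketch of how such scores can be computed; the column names and example ratings are hypothetical illustrations, not the study's actual items or data.

```python
import pandas as pd

# Hypothetical 7-point Likert responses; higher values indicate greater perceived validity.
responses = pd.DataFrame({
    "case_item_1": [6, 5, 7],     # three items about the analysis in the instant case
    "case_item_2": [5, 4, 6],
    "case_item_3": [6, 5, 7],
    "general_item_1": [5, 4, 6],  # two items about firearm examination generally
    "general_item_2": [6, 3, 5],
})

# Average each set of items to form the two composite scores described above.
responses["case_specific_reliability"] = responses[
    ["case_item_1", "case_item_2", "case_item_3"]].mean(axis=1)
responses["general_reliability"] = responses[
    ["general_item_1", "general_item_2"]].mean(axis=1)

print(responses[["case_specific_reliability", "general_reliability"]])
```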

C. Materials

Each participant read a synopsis of a criminal case involving a defendant charged with discharging a firearm. The case synopsis was adapted from an ac­tual criminal case, United States v. Driscoll (2003). In brief, the defendant, referred to as “Mr. Cole,” was charged with willfully firing a firearm during the commission of a felony and attempted armed robbery. During the attempted robbery of a convenience store, a firearm was discharged into the floor, and the masked culprit fled. A 9mm bullet was recovered from the floor. During a rou­tine traffic stop two days later, the police confiscated a 9mm handgun from the defendant.

At trial, the prosecution called a firearms examiner. Participants were presented with transcripts of his testimony. In short, the examiner testified about his training and experience and gave a detailed explanation of how firearm examination is conducted, including a description of class and individual characteristic evaluation as well as his analysis of the case at hand. The portion of testimony in which he described his conclusion is as follows:

Q. What did you conclude?

A. I observed sufficient agreement between the individual characteristics on the surfaces of the crime scene bullet and the surfaces of the test-fired exemplar bullets. Sufficient agreement means that the level of similarity I saw during my comparison exceeds the best agreement demonstrated between toolmarks known to have been produced by different tools and is consistent with agree­ment demonstrated by toolmarks known to have been produced by the same tool. Therefore, I concluded that the crime scene bullet was fired by the De­fendant’s gun.

Q. Is there a term for that conclusion?

A. Yes, it is called an identification.

Q. Can you explain what that term means?

A. It means that the likelihood that another gun besides the Defendant’s gun could have fired the crime scene bullet is so remote as to be considered a prac­tical impossibility.

Q. Does that mean you are absolutely certain that the Defendant’s gun fired the bullet recovered from the convenience store?

A. My conclusions are not made with 100% certainty. But repeated studies show that errors, especially false identifications are incredibly rare, they hap­pen in less than 1% of cases.

There were four experimental conditions in this study. Participants were randomly assigned to a single condition. In the control condition (hereafter described as the “Identification condition”), the defense neither called a witness nor cross-examined the findings of the prosecution expert. In two conditions meant to capture the impact of firearms examination’s limited reproducibility, the defense called its own examiner, who had reached a different conclusion than the prosecution’s expert. In both conditions, the defense’s examiner described his training and experience (akin to that of the prosecution’s expert), explained that he used “the same basic methods” as the prosecution’s examiner to conduct his own comparisons, and provided context for how a disagreement between examiners could occur as follows:

Q. Are those methods foolproof?

A. Of course not. Every scientific method has some degree of uncertainty when used in practice. The amount of agreement that each examiner considers suffi­cient is also somewhat subjective, there is no numerical standard, it is what each individual examiner has built up in their mind’s eye. That means, although it is not common, firearms examiners can disagree about whether two bullets were or were not fired by the same gun.

In the first of these conditions, the defense’s examiner then testified that the defendant’s gun did not fire the 9mm bullet recovered at the crime scene (here­after described as the “Elimination condition”). In the second of these condi­tions, the defense’s examiner reached an inconclusive decision about whether the defendant’s gun fired the bullet recovered (hereafter described as the “In­conclusive condition”).

In a fourth and final condition, the defense called a statistics professor from a major research university to testify about the scientific foundation of firearm and toolmark examination (hereafter described as the “Methods Expert condi­tion”). The methods expert explained that he was a research scientist, not a prac­titioner, and he had reviewed the scientific studies of firearm examination. He explained that the National Academy of Sciences and other prestigious scientific organizations had reviewed the discipline of firearm examination and found it had not been studied adequately. The methods expert explained that he reviewed twenty studies of firearm examination and reached the same conclusion. He also noted that inconclusive responses in the studies made the results ambiguous. A portion of his testimony appears below, while the complete version appears in Appendix A:

Q. Let’s start with this; what is your general conclusion from these studies?

A. Generally speaking, I reached the same conclusion as the committees of research scientists. The studies of firearms examination report error rates of around 1–2% but these error rates are not trustworthy because of weaknesses in the design of these studies: they do not cover a wide enough range of guns, they use volunteers, and they do not make sufficient efforts to include chal­lenging comparisons. But there is an even bigger problem involving how the authors of these studies calculate error rates.

Q. What is that problem?

A. First of all, note that these studies are like an exam that I give to students. The firearm examiners who participate in the studies know they are being tested, and they know that if they fail the test, so to speak, there could be major problems for their field. So, the biggest problem with the studies, in my opin­ion, is that the examiners call most of the comparisons “inconclusive.” They are basically skipping the questions with this response.

We note the case materials did not include elements such as cross-exami­nation, opening or closing arguments, or additional evidence aside from the fire­arm expert’s testimony. The absence of these features limits the ability to extrapolate the findings without additional testing. However, we also note that many of the issues that would be brought up on cross-examination were pre­sented during the direct examination of the rebuttal witness. Given that this study is the first of its kind, we deemed the sacrifices to the realism of the study appropriate and acknowledged that the findings should be replicated under more realistic conditions before definitive pronouncements are made.

III. Study Results

A. Jury Verdicts

Overall, 35% of participants voted to convict the defendant at the conclu­sion of the trial. The percentage of participants who voted to convict in each experimental condition appears in Figure 2 below.

 

Figure 2. Guilty Verdicts as a Function of Experimental Condition

As is apparent from Figure 2, 66% of participants in the Identification condition voted to convict. Recall that participants in this condition were exposed only to the testimony of the state’s firearms examiner; the defense did not call a witness in this condition. The conviction rates decreased when the expert called by the defense reached an Inconclusive (28%) or Elimination (14%) conclusion. The conviction rate in the Methods Expert condition was 34%. The conviction rate differed statistically across the experimental conditions.
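The Article reports only that the conviction rates differed statistically across conditions; it does not specify the test or the per-condition sample sizes. As a rough illustration, a chi-square test of independence is one standard way to check such a difference in proportions; the counts below are back-calculated from the reported percentages under the assumption of roughly 88 participants per condition and are not the study's actual cell counts.

```python
from scipy.stats import chi2_contingency

# Approximate guilty / not-guilty counts per condition (Identification, Inconclusive,
# Elimination, Methods Expert), reconstructed from the reported 66%, 28%, 14%, and 34%
# conviction rates under an assumed n of 88 per condition. Illustrative only.
guilty = [58, 25, 12, 30]
not_guilty = [30, 63, 76, 58]

chi2, p, dof, _ = chi2_contingency([guilty, not_guilty])
print(f"chi2 = {chi2:.1f}, dof = {dof}, p = {p:.3g}")
```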

B. Perceptions of the Strength of the Case and Evidence

In addition to providing a binary verdict, participants also rated the likeli­hood that the defendant was the individual who discharged the firearm, the strength of the case against the defendant, the case-specific reliability of the firearm examination, and the general reliability of firearm examination. The im­pact of the experimental conditions on each of these measures is reported in Table 1 below.

 

Table 1. Impact of the Experimental Conditions

Measure                                   Condition         Mean    Std. Dev.   95% CI for Mean
How strong is the case against            Identification    6.28    1.947       5.87–6.70
the defendant?                            Inconclusive      4.62    1.82        4.24–4.99
                                          Elimination       4.01    1.746       3.64–4.39
                                          Methods Expert    4.78    2.054       4.33–5.23

What is the numerical probability         Identification    77.86   22.65       73.06–82.66
that the defendant is the man who         Inconclusive      58.19   25.69       52.93–63.45
fired the shot in the convenience         Elimination       45.02   23.20       40.05–50.00
store?                                    Methods Expert    55.92   28.78       49.63–62.20

Case-Specific Reliability                 Identification    6.07    1.23        5.81–6.33
                                          Inconclusive      5.34    1.11        5.12–5.57
                                          Elimination       5.22    1.01        5.01–5.44
                                          Methods Expert    4.87    1.36        4.57–5.17

General Reliability                       Identification    5.90    1.26        5.63–6.16
                                          Inconclusive      5.30    1.23        5.05–5.55
                                          Elimination       4.98    1.26        4.71–5.25
                                          Methods Expert    4.49    1.55        4.15–4.83

For the strength-of-the-case measure, statistically significant differences existed across the experimental conditions. The case was perceived as weaker in all of the other conditions than in the Identification condition. This is not surprising given that the prosecution’s firearms examination evidence went unchallenged in the Identification condition. Compared to the Methods Expert condition, there was no difference in strength-of-case ratings for the Inconclusive condition, but the Elimination condition was deemed significantly less strong. In other words, the case was deemed weaker when the defense expert concluded “elimination” than when a methods expert testified about the discipline, but there was no difference when the defense expert concluded “inconclusive” compared to a methods expert.

The findings are consistent for the probability-the-defendant-fired-the-gun measure: while all experimental conditions yielded lower probability ratings than the Identification condition, the Methods Expert and Inconclusive conditions did not differ from one another, but the Elimination condition was rated significantly lower than both. The findings are also quite similar for the reliability ratings (case-specific and general). For case-specific reliability, the Identification condition yielded the highest rating, while the other conditions reduced it. When a defense expert reached a different conclusion (either inconclusive or elimination) or when a methods expert testified about the weaknesses of the discipline, the identification made by the state’s examiner was deemed less reliable. While the ratings in the Methods Expert condition were not statistically different from those in the Elimination condition, the ratings in the Methods Expert condition were lower than those in the Inconclusive condition. All conditions involving a defense expert also reduced the General Reliability ratings compared to the Identification condition. The Methods Expert condition yielded lower ratings than the Inconclusive condition, though it was not statistically different from the Elimination condition. Setting aside statistical significance, the methods expert reduced perceived reliability the most, both in the specific case and for the discipline generally. In contrast, the Elimination condition yielded the lowest strength-of-case ratings and the lowest probability-that-the-defendant-fired-the-gun ratings.
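As a sanity check on how the confidence intervals in Table 1 relate to the reported means and standard deviations, the sketch below applies the usual normal-approximation formula, mean ± 1.96 × SD/√n. The per-condition n of roughly 88 is an assumption (351 participants split across four conditions), not a figure reported above.

```python
import math

# Reported values for the Identification condition on the strength-of-case measure.
mean, sd = 6.28, 1.947
n = 88  # assumed participants per condition; not reported in the excerpt

half_width = 1.96 * sd / math.sqrt(n)
print(f"95% CI: [{mean - half_width:.2f}, {mean + half_width:.2f}]")
# Prints roughly [5.87, 6.69], close to the 5.87-6.70 interval in Table 1.
```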

IV. Discussion

As the United States Supreme Court made clear in its Daubert decision, both Rule 702 (governing expert witness testimony) and Rule 403 (calling on judges to weigh the probative value and prejudice of proffered evidence) impose gatekeeping responsibilities on judges to assess the reliability of forensic evidence. These responsibilities require judges to carefully “evaluate the evidence in light of the jury’s unique role,” to “assess not just how valid the data is, but how well the jury can understand it after direct and cross-examination, and legal instructions.” Although judges have cautioned that the safeguards of the adversarial process may not suffice to alert jurors to the limitations of methods, as outlined above, judges have not often excluded questionable forensic expert evidence, including firearms experts. This study, therefore, meets the judiciary essentially where it lies and explores whether, in the absence of stringent gatekeeping, defendants might nonetheless succeed in highlighting method limitations and inconsistency using experts of their own.

As a caution to defendants and the courts, our results show that laypeople appear to assume a high degree of validity in firearms comparison and are likely to convict based on such evidence if left unrebutted. Initially, when we explored mock jurors’ intuitive sense of the false positive rate for firearms examination, our participants provided responses largely out of step with the empirical record underlying the method. Even discounting criticisms of the research exploring the accuracy of firearms examination described earlier and using calculations favored by the authors of much-cited studies, false positive rates for the field still fall in the general range of 1–2%, or put another way, a 1 in 100 or even 1 in 50 chance of a misidentification. But a sizable majority of our participants (72%) believed the false positive rate to be orders of magnitude smaller than such estimates. In fact, the percentage who placed the chance of a misidentifi­cation at 1 in a million or less (24%) was nearly equal to that of those who placed the chance of misidentification at levels equal to or greater than rates reported in studies (28%).

Our findings about the persuasiveness of firearms examination evidence are not limited to intuitions. A majority of our participants (66%) were also willing to convict on the basis of firearms examination evidence alone when it went unrebutted. Recall that our case synopsis included no evidence whatsoever cor­roborating the firearms examiner’s linking of the convenience store bullet to the firearm possessed by the defendant: no confession, eyewitness identification, or otherwise. These figures are consistent with prior work in which 55–62% of mock jurors voted to convict under most variations of the language used to de­scribe a bullet “match,” despite the complete lack of evidence corroborating the firearms examiner’s conclusion and when exposed to cross-examination. Thus, attorney preparation and strategy, as well as judicial gatekeeping, should begin from the premises that jurors may enter trial with an inflated sense of the reliability of firearms examination methods and may accord such evidence un­due weight if it is left unrebutted and uncontested.

However, when defense expert testimony of any kind was present, the rate of guilty verdicts fell substantially (along with our other measures of juror per­ceptions of reliability and case strength). These findings powerfully support the defense strategy (in the event a judge admits firearms examination evidence) of calling an expert of its own, even if that witness testifies solely regarding methods. Further, they should impel judges and legislatures to ensure that de­fendants have access to adequate funding to retain expert witnesses in cases in­volving bullet and cartridge case comparisons. Because of insufficient public defender budgets or below market expert rates set by states, defendants too often lack such funding. This reality must change if the accused are to have the opportunity to meaningfully contest forensic evidence.

Finally, when considering the impact of calling a rebuttal expert, defense attorneys and courts should account for the reality that different varieties of expert witness conclusions appear to affect jurors unevenly. Guilty verdicts dropped precipitously when the defense expert concluded Elimination (14% voted to convict), compared to when the defense expert concluded Inconclusive (28% voted to convict) or when a methods expert was called (34% voted to convict). The same pattern of results followed for the strength of case ratings and the estimated probability that the defendant fired the gun. However, ratings of the credibility of the firearm evidence—in general and in the specific case—were lowest for the Methods Expert condition. These findings suggest that multiple factors affect juror decision making; the specific type of rebuttal evidence differentially affects at least some of these factors. Future research might explore additional factors, such as the impact of experts presenting images of the examined bullets or cartridge cases or of a second examiner verifying the evidence.

In the meantime, we caution that, however predictable or reasonable, the lesser weight jurors appear to accord inconclusive and methods testimony by defense experts poses significant risks to the innocent. That jurors would privilege elimination over methods testimony is not surprising given prior research into jurors’ particular attentiveness to case-specific information suggesting that an expert could have made an error. And the same is true of elimination versus inconclusive testimony; it is more exculpatory for a defense witness to opine that two bullets display “significant disagreement” than to merely suggest “an absence, insufficiency, or lack of reproducibility” of relevant characteristics. But the elimination conclusions jurors most value are likely to be unavailable even to innocent defendants. Across multiple studies of firearms examination, accuracy rates have been substantially lower on different-source comparisons than on same-source comparisons, with divides between sensitivity and specificity of up to 73.4% (96.2% sensitivity versus 22.8% specificity). Even if the bullets were not fired by the same gun, there might be only a 23% chance that an examiner will report an elimination, compared to a 96% chance that an examiner will call an identification if the bullets were in fact fired by the same gun.
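To make the practical effect of that asymmetry concrete, the short sketch below applies the reported 96.2% sensitivity and 22.8% specificity to two hypothetical batches of 100 comparisons; the batch sizes are illustrative assumptions, not data from the cited studies.

```python
# Reported rates from the cited accuracy studies.
sensitivity = 0.962   # P(identification | bullets truly fired by the same gun)
specificity = 0.228   # P(elimination | bullets truly fired by different guns)

same_source = 100       # hypothetical comparisons where the suspect's gun fired the bullet
different_source = 100  # hypothetical comparisons where some other gun fired the bullet

print(f"~{sensitivity * same_source:.0f} of 100 true matches reported as identifications")
print(f"~{specificity * different_source:.0f} of 100 true non-matches reported as eliminations")
# Most of the remaining different-source comparisons would be reported as inconclusive,
# the very conclusion that jurors in this study weighed less heavily than an elimination.
```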

The scientific community has raised concerns over this massively asymmetric production of inculpatory and exculpatory evidence and the potentially inappropriate use of the inconclusive category that drives it. But our results suggest that such concerns are not merely academic or related to the appropriate calculation of error rates. Instead, because jurors do not perceive Inconclusive conclusions as having the same exculpatory value as an Elimination, the asymmetry of firearms examination is likely to exact a very real cost from defendants seeking to rebut bullet and cartridge case comparison testimony.

At present, laboratory policy and examiner practices cut against elimination conclusions. As other scholars have pointed out, examiners call Inconclusive “despite almost certainly knowing they were looking at actual nonmatches.” And even more problematic, some laboratories have an explicit policy that prohibits examiners from calling Elimination when class characteristics agree. Unless and until this ecosystem of policies and practices changes, even defendants whose firearms display significant, exculpatory differences are likely to be denied the expert testimony with the best chance of acquitting them. Adherents of firearms examination have long argued that issues concerning accuracy and reliability go to weight rather than admissibility because “unlike some DNA analysis, ballistic evidence is never consumed and is, therefore, always available to be reexamined.” But if all that defendants can regularly hope for is inconclusive or methods testimony, then our results show that they are losing critical exculpatory value, and this logic cannot stand.

Conclusion

Our study shows defendants will benefit from calling their own experts in cases involving bullet and cartridge case comparisons. But it simultaneously highlights the great weight jurors are likely to place on firearms examination evidence, and reinforces concerns about the field’s asymmetric production of inculpatory and exculpatory evidence. Some of these concerns might well be remedied by evolving reporting practices or the development of “objective” comparison algorithms untethered from current examiner biases against elimi­nations. But until then, and at a time when the rules of evidence have been amended to emphasize the role of judges as gatekeepers, we should continue to explore a wider range of tools to improve the litigation of forensic expert testi­mony.

 

 

Appendix A. Testimony of the Methods Expert

Q. Good afternoon. Can you please introduce yourself to the jury?

A. Good afternoon. My name is Jeffery Smith.

Q. How are you employed?

A. I am a professor at Northwestern University.

Q. What is your title and how long have you been employed there?

A. I am a professor of statistics, and I have been on the faculty for 15 years.

Q. Can you describe your educational background?

A. I have a bachelor’s degree in psychology from New York University, and a PhD in statistics from Georgetown University.

Q. Dr. Smith, are you a firearms examiner?

A. No.

Q. Have you ever looked under a microscope at bullets or cartridges?

A. No.

Q. Are you aware that a firearm examiner—Mr. Patrick West—looked at bullets in this case and reached an identification?

A. Yes, I am aware of his findings.

Q. Ok, so you are not a firearm examiner and you have never conducted a fire­arm examination—what does your expertise have to do with firearm examina­tion?

A. I am a research scientist. I conduct research as part of my duties at North­western University. A basic principle of science is that hypotheses need to be tested by collecting data. My expertise allows me to conduct tests of firearm examiners to see if the methodology they follow produces valid results.

Q. Can you give us an analogy to help us understand that?

A. Sure. In medicine, there are scientists who work to develop a vaccine or drug and then they test the new drug with a randomized clinical trial. If the test vali­dates the drug, then nurses or doctors will administer the drug to patients. The nurses who administer the drug don’t actually conduct the research to see if the drug works or not. That requires a different skill set. That is where research scientists like me come in.

Q. Have research scientists examined whether firearm examination is a valid science?

A. Yes, a panel of renowned scientific experts reviewed the studies of firearm examiners in 2009. That panel found that the practice of firearm examination had yet to be studied adequately.

Q. What do you mean by “studied adequately”?

A. The panel reviewed the studies that existed at the time and found that they were not properly designed. The studies were created by firearm examiners with no specialized training in conducting research.

Q. That was over 10 years ago; have any other reviews been conducted?

A. Yes, another review was conducted in 2016. Again, that committee of preeminent research scientists found the studies were not properly designed.

Q. And who were these committees?

A. One committee was a group appointed by the National Academies of Sci­ence, which is the most prestigious group of scientists in the United States.

Q. Ok. Dr. Smith, have you conducted your own review of the scientific studies on firearm examination?

A. Yes, I have.

Q. How many studies did you review?

A. I would estimate 20 studies.

Q. Let’s start with this; what is your general conclusion from these studies?

A. Generally speaking, I reached the same conclusion as the committees of re­search scientists. The studies of firearms examination report error rates of around 1–2% but these error rates are not trustworthy because of weaknesses in the design of these studies: they do not cover a wide enough range of guns, they use volunteers, and they do not make sufficient efforts to include challenging comparisons. But there is an even bigger problem involving how the authors of these studies calculate error rates.

Q. What is that problem?

A. First of all, note that these studies are like an exam that I give to students. The firearm examiners who participate in the studies know they are being tested, and they know that if they fail the test, so to speak, there could be major prob­lems for their field. So, the biggest problem with the studies, in my opinion, is that the examiners call most of the comparisons “inconclusive.” They are basi­cally skipping the questions with this response.

Q. Can you explain that in more detail?

A. In fieldwork, fired bullets can be damaged such that they lack any infor­mation that an examiner could use. For example, when a bullet hits a concrete wall and flattens out. Comparing such a bullet to one from the defendant’s gun would be “inconclusive.” We can’t tell one way or another whether the two bul­lets were fired by the defendant’s gun. But that is not what happens in a study. In a study, bullets are excluded from the study if they don’t have good markings. So there should be no inconclusives in the studies.

Q. But are there inconclusives in the studies?

A. Yes, quite a bit of them actually. In one study, over 50% of all the responses were called inconclusive.

Q. And how did the study count those responses?

A. As correct responses.

Q. Do you agree with that approach?

A. No. In a study, there is only one correct answer. The bullets are either a match or not a match. When they say “inconclusive” that is not a correct response. So the error rate could be as high as 50%. That’s basically a coin flip.

Q. Can you give an analogy to help us understand why this is a problem?

A. Imagine I give my students an exam to test whether they understand statistics; if they respond “I don’t know” to half of the exam questions and then get the other half correct, I don’t conclude that my students are ready to be statisticians at the FDA. Obviously, they selected the easy questions and dodged the difficult questions. The test results do not tell me whether the students can actually do statistics problems. It is unknown.

Q. And is that what you are saying with respect to the studies of firearm exam­iners—we just don’t know if they can do what they claim or not?

A. Yes, precisely. The tests do not provide evidence that examiners can do what they claim to be able to do. Maybe they can do what they claim, but I have not seen a study that supports this conclusion. More studies are needed.

Q. Just to summarize, you do not believe the practice of firearm examination is scientifically valid?

A. Correct. It has not been proven valid.

Q. And that is the same conclusion reached by many other committees of research scientists like yourself?

A. Yes.

Q. Thank you for your testimony today.
