
Jurimetrics Journal

Jurimetrics: Winter 2023

Empirically Assessing 510(k) Device Safety

George Horvath

Summary

  • A pilot study demonstrates the feasibility and utility of combining regulatory ancestry study with network visualization and quantitative empirical analysis and begins to develop a nuanced understanding of how the 510(k) pathway serves the goal of ensuring device safety.
  • Clinicians and academic physicians, legal scholars, courts, public safety advocates, the medical device industry, and even the FDA itself have developed a robust set of criticisms of the FDA’s record of and ability to ensure the safety of 510(k) devices.
  • Striking an optimal balance between ensuring the safety of and permitting or even facilitating the innovation of medical devices is critically important.


Abstract: Most medical devices that reach the U.S. market are cleared by the FDA through the 510(k) pathway. This pathway has repeatedly drawn one general criticism: that it fails to ensure device safety. Critics have also identified specific statutory and regulatory provisions and specific FDA implementation practices that they claim compromise device safety. Based on these criticisms, many have proposed reforms that would dramatically alter or even eliminate the 510(k) pathway and upend the entire medical device regulatory regime. However, the empirical evidence that supports these criticisms and reform proposals is woefully limited. Indeed, the Institute of Medicine (IoM) concluded in 2011 that empirical study of the 510(k) pathway would be cost-prohibitive and unlikely to yield substantial benefits.

This Article challenges the IoM’s conclusions by demonstrating that empirical analysis can be used to assess many of the criticisms of the 510(k) pathway and to inform future reform proposals. The Article presents a pilot study of 510(k) clearances in a limited cohort of medical devices, combining the “regulatory ancestry” methodology that has been developed by medical scholars with quantitative analysis and network science visualization to analyze a set of medical devices in one technology space. The study demonstrates that these methodologies can provide reliable and useful information about how the 510(k) pathway functions to ensure medical device safety. The study supports the general criticism of the pathway regarding safety, finding that more than 10% of devices in the cohort were unsafe. The study also suggests that criticisms of some specific aspects of the pathway may have a solid basis, while failing to provide support for many other criticisms. Given the disruptive potential of many of the reforms that have been proposed, the study presented here provides a justification and a roadmap for a large-scale empirical study of the 510(k) pathway.

Citation: George Horvath, Empirically Assessing 510(k) Device Safety, 63 Jurimetrics J. 113–68 (2023).

The Federal Food and Drug Administration (FDA, or Agency) has the un­enviable task of regulating medical devices in the United States. The Agency must balance several often-conflicting statutory requirements and policy goals: to ensure that medical devices are safe, that they are effective, that the FDA’s regulatory activities do not stifle (or, even that they positively facilitate) the in­novation of new technologies, and that regulation does not unduly delay access to new technologies once they have been developed. At some point, many of our lives will depend—often quite literally—on a regulatory system that strikes an optimal balance between these policy goals.

In 1976, Congress amended the Federal Food, Drug, and Cosmetic Act through the Medical Device Amendments (MDA, or the Act), creating the modern federal regulatory regime for medical devices. One aspect of this regime that has come under particularly harsh and sustained criticism is the Premarket Notification, or 510(k), pathway. Manufacturers may obtain FDA clearance to bring devices that present an intermediate level of risk (Class II devices) to the U.S. market through the 510(k) pathway, which by design is less burdensome than the Premarket Approval (PMA) pathway for high-risk (Class III) devices. One of the key features of the 510(k) pathway is that, rather than requiring clinical trial evidence that a new device is safe and effective, it typically requires a manufacturer to demonstrate only that the new device is “substantially equivalent” to a device that has already been cleared for the U.S. market; in other words, that it has the same intended uses and similar technological features as an earlier device.

Clinicians, academic physicians, legal scholars, patient advocates, the de­vice industry, and even the FDA itself have assailed various features of the 510(k) pathway, finding fault with aspects ranging from statutory and regulatory provisions that structure the pathway through the Agency’s implementation of the statute, and with the pathway’s effects on safety. In an example from a re­cent JAMA Viewpoint article, Doctors Eli Adashi and Katina Robison and Pro­fessor I. Glenn Cohen characterized the 510(k) pathway as leaving a “deadly legacy.” Many critics have urged reforms to the pathway, from minor tinkering with the FDA’s implementation to more fundamental changes to the statutes and regulations that structure the pathway to the complete elimination of the path­way itself. Some of these proposals would radically alter the medical device regulatory regime and upend decades of settled expectations by the device in­dustry, patients, physicians, and others.

Such serious criticisms and disruptive reform proposals should be evaluated and informed by the most robust empirical evidence that can reasonably be ob­tained. Unfortunately, this has not been the case: most criticisms and proposals have drawn on little to no empirical evidence. In fact, the most thorough assess­ment of the 510(k) pathway to date, the Institute of Medicine’s 2011 report, Medical Devices and the Public's Health: The FDA 510(k) Clearance Process at 35 Years, concluded that a broad empirical study of the 510(k) pathway would be too labor intensive and costly in comparison to the limited value it would afford to justify the effort, claiming that “[t]he cost of the exercise would be staggering; the benefit would be, it is hoped, small in terms of identifying de­vices that should not have gotten to the market by a 510(k) clearance.” Al­though these and other criticisms of and calls for reform to the 510(k) pathway have continued unabated since the Institute’s report, the pathway remains the avenue through which the vast majority of FDA-reviewed devices reach the U.S. market and thus patients.

This Article focuses on how well the 510(k) pathway serves the goal of ensuring that medical devices marketed in the United States are safe, advancing the claim that the Institute of Medicine’s pessimistic conclusions about the fea­sibility and value of empirical study of the 510(k) pathway are no longer justi­fied. I propose that many of the safety criticisms that have been levelled against the pathway can be evaluated empirically using publicly available information that can be obtained with reasonable effort and expense. Such an empirical eval­uation is made possible by the use of the “regulatory ancestry” methodology, which scholars in the medical literature have used to support qualitative criti­cisms of the safety assurance provided by the 510(k) pathway. In a pilot study presented here, this Article demonstrates the feasibility and utility of combining regulatory ancestry study with network visualization and quantitative empirical analysis, and begins to develop a nuanced understanding of how the 510(k) path­way serves the goal of ensuring device safety.

The paper begins in Part I with a detailed description of the statutory and regulatory structure of the 510(k) pathway. Part II provides an account of the criticisms that have been made of the pathway, drawn from a reading of the legal and medical literatures. Part II.A discusses a set of criticisms that are described as general in nature, consisting of arguments that the 510(k) pathway fails to ensure the safety of too many medical devices reaching the U.S. market. Part II.B assembles a set of specific criticisms, referring to claims that specific fea­tures of the statutes, regulations, and FDA’s implementation of the pathway compromise device safety. In addition to presenting both sets of criticisms, these sections show that the empirical evidence supporting the criticisms is limited and that the evidence that does exist suffers from serious methodological flaws. Further, Part II.B reformulates several of the specific criticisms into empirically testable hypotheses. Part II.C presents some of the reforms that have been pro­posed in response to these criticisms.

Part III presents an empirical study of the function of the 510(k) pathway in ensuring device safety in one limited biotechnology field: devices that are intended for use in removing blood clots (“thrombi”) from the blood vessels in the brains of patients experiencing acute strokes. This study introduces the regulatory ancestry methodology, which was developed in the medical literature to support qualitative criticisms of the safety of specific 510(k) devices. Ex­tending this methodology through the use of quantitative statistical analysis and network visualization, the study answers the questions framed in Part II. The study demonstrates that empirical evaluation of the 510(k) pathway is possible and that these methodologies can be used to test the hypotheses generated in Section II.B.

Part IV interprets the study, beginning with an analysis that confirms the reliability and utility of the information yielded by the study. The discussion then turns to the study findings, which are consistent with the general criticism of the 510(k) pathway regarding safety: more than 10% of devices in the cohort exhibited the study endpoint for being unsafe, a number that far exceeds esti­mates of the percentage of high-risk devices that are unsafe. The findings also suggest that only some of the specific criticisms have a solid empirical basis, while providing no support for many other specific criticisms. Because of the small sample size, the findings are treated as hypothesis generating. But in view of the serious nature of the criticisms and the disruptive potential of many pro­posed reforms of the 510(k) pathway, this study demonstrates that the Institute of Medicine’s pessimistic conclusions about the role of empirical studies can no longer be sustained. The study provides a justification and a roadmap for a large-scale empirical study.

Before proceeding, two important caveats are in order. First, the scope of the empirical study is small, and thus the findings presented here are not generalizable to the broader universe of 510(k) devices. Drawing conclusions about how well the 510(k) pathway functions to ensure device safety and about which specific statutory, regulatory, or implementation features of the pathway function to compromise device safety must await a large-scale study. The primary goal here is to demonstrate that such a study is feasible. Second, the question of whether the 510(k) pathway should be reformed or even abandoned cannot be answered based on safety data alone. The answer to that question also requires data on how the 510(k) pathway functions to permit or even to facilitate innovation, as well as a normative framework for weighing safety, effectiveness, innovation, and other policy goals. Each of these is a sufficiently broad topic that it deserves its own full-length treatment. Thus, providing an empirically driven set of recommendations concerning the 510(k) pathway is premature at this point. This Article and other works in progress are designed to provide the groundwork for such an undertaking.

I. The Statutory and Regulatory Structure of the 510(k) Pathway

The Medical Device Amendments created the modern federal regime for regulating medical devices, vesting this authority in the Food and Drug Admin­istration. Ensuring that devices are safe and effective were two important pur­poses of the Act. In the brief preamble, Congress stated that the MDA’s purpose was “to provide for the safety and effectiveness of medical devices intended for human use.” One of the Act’s sponsors likewise described Congress’s purpose as giving the FDA “the necessary authority to require that medical devices be proven safe and effective before they reach the American consumer.” And the Senate Committee on Labor and Public Welfare, reporting the MDA out of com­mittee, justified the need for federal regulation on the need to ensure safety in the wake of several widely publicized public health fiascos arising from medical devices, including the Dalkon Shield IUD, artificial heart valves, and permanent pacemakers.

But ensuring safety and effectiveness were not the only purposes for which Congress enacted the MDA. Congress also intended that federal regulation, by establishing uniform nationwide standards, would protect and even facilitate the innovation of and timely access to new technologies. Congress’s concern with balancing safety and innovation is evident in the context and structure of the MDA. In the years leading up to the Act, a quarter of the states had implemented their own premarket approval systems for medical devices, threatening to cre­ate a hodgepodge of regulatory requirements that would stifle innovation. By creating a single, nationwide regulatory framework, Congress sought to avoid this harm and to strike a balance between the objectives of ensuring safety, ef­fectiveness, innovation, and timely access. As the Institute of Medicine noted in 2011,

[t]he regulatory process can facilitate innovation that improves public health by making safe and effective . . . medical devices available to consumers in a timely manner. The FDA’s role in facilitating innovation . . . should be to create a regulatory framework that . . . ensur[es] that marketed medical devices will be safe and effective . . . [and] permit[s] timely entry of new devices that may offer improvements over already marketed devices.

Congress’s concern with balancing safety and innovation is also evident in the Act’s three-tiered regulatory scheme for devices, which is based on an ex-ante determination of risk. The lowest-risk (Class I) devices, such as elastic bandages and crutches, are those for which compliance with general controls applicable to all devices is sufficient to provide a reasonable assurance of safety and effectiveness. Intermediate-risk (Class II) devices, such as x-ray machines and some hearing aids, are those for which compliance with both the general controls and certain “special controls” was deemed necessary to provide an assurance of safety. The highest-risk (Class III) devices are those for which intense regulatory scrutiny would be necessary to provide an assurance of safety.

Devices are subjected to one of three levels of regulatory scrutiny, which are now largely congruent with the three risk tiers. Manufacturers of so-called “exempted” devices (most Class I and some Class II devices) encounter minimal regulatory burdens: no premarket approval or clearance is necessary. Class III devices are subjected to a relatively high regulatory burden: these devices may only be marketed after the successful submission of a lengthy and detailed PMA application, which requires extensive information about the device and in many cases at least one large pivotal clinical trial. Intermediate-risk (Class II) devices and a small number of nonexempted Class I devices are subject to an intermediate level of regulatory scrutiny. For these devices, manufacturers must submit a 510(k) notification at least 90 days before the devices are marketed and must demonstrate that their new device is substantially equivalent to, that is, has the same intended uses and similar technological features as, another device of the same risk classification that is already legally marketed in the United States. This tiered framework is intended to balance the goals of assuring device safety and effectiveness with the goals of not disincentivizing innovation or unduly delaying patient access to innovative new devices.

When the MDA took effect on May 28, 1976, about 40,000 medical devices were already being sold on the U.S. market. For these “pre-amendment devices,” the Act created a mechanism through which the FDA was to assign each to one of the three risk classes. The Agency classified the majority of pre-amendment device types as Class I or Class II devices. In this process, individualized risk-benefit analyses were not performed; rather, the evidence gathered for a generic category as a whole was applied to every device in that category. These devices were “grandfathered” onto the market without a formal assessment of their safety, and their manufacturers were subjected only to the FDA’s post-market authorities.

Devices that were not on the market as of May 28, 1976 (“post-amendment” devices) are presumptively classified as Class III devices. Thus, without more, all post-amendment devices would have been subjected to the heavy regulatory burden of the PMA process. To mitigate this burden, the 510(k) provision provided that if the manufacturer of a post-amendment device could demonstrate that the new (“subject”) device is “substantially equivalent” to an already-marketed (“predicate”) device, the subject device would be assigned to the same risk class as the predicate. If the manufacturer of a post-amendment device can demonstrate that its new device is substantially equivalent to a Class II predicate device, the new device will be classified and regulated as a Class II device. Further, the FDA frequently issues findings of substantial equivalence where the manufacturer has cited more than one device as predicates for a subject device that combines specific features of its predicates in a new and untested way. Demonstrating substantial equivalence to an already-marketed Class II device quickly became, and remains, the predominant means through which the manufacturers of intermediate-risk devices obtain clearance to market their devices in the United States.

A manufacturer intending to market a nonexempted device in the United States is required to inform the FDA “at least ninety days before making such introduction or delivery . . . [of] the class in which the device is classified . . . and action taken by such person to comply with requirements” of meeting per­formance standards for Class II devices or having a PMA in effect for Class III devices. The manufacturers of all medical devices, regardless of their risk clas­sification, must comply with general controls such as registering annually with the FDA, ensuring the device labeling comports with FDA regulations, and fol­lowing published good manufacturing practices. Manufacturers of Class II de­vices are also required to comply with any special controls such as performance standards that the FDA has promulgated for each type of device. However, the MDA does not explicitly require the FDA to establish performance standards for all Class II devices.

Under the original MDA, only devices that had been marketed before the MDA’s effective date (pre-amendment devices) could be cited as predicates in 510(k) submissions. The Safe Medical Devices Act of 1990 (SMDA) ex­panded the definition of legally acceptable predicate devices to include post-amendment devices, which had reached the market after the MDA took effect. As a result, a manufacturer of a Class II device can obtain clearance by citing to a very old, pre-amendment device (which is now a rare occurrence) or by demonstrating substantial equivalence to a more recent post-amendment device that was cleared based on substantial equivalence to an earlier device. This latter practice, which is by far the most common and which results in long chains of devices in subject-predicate relationships, is frequently referred to as “piggy­backing.”

The devices in these chains are not identical. Although a subject device must be substantially equivalent to its predicate, the original MDA did not de­fine the term substantial equivalence. Congress added a definition in the SMDA, establishing that “compared to a predicate device . . . the device has the same intended use” and that the subject device

  • (i) has the same technological characteristics as the predicate device, or
  • (ii)(I) has different technological characteristics and the information submitted that the device is substantially equivalent to the predicate device . . . demon­strates that the device is as safe and effective as a legally marketed device, and (II) does not raise different questions of safety and effectiveness than the pred­icate device.

The statute defines “different technological characteristics” as “a signifi­cant change in the materials, design, energy source, or other features of the de­vice.”
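The statutory test quoted above is a short decision rule, and it can help to see it written out. The following Python sketch is one reading of the quoted language, not an FDA tool. The names `Comparison`, `is_substantially_equivalent`, and the four field names are hypothetical labels introduced here for illustration, and each field stands in for a judgment the Agency makes on the submission before it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comparison:
    """Hypothetical summary of a subject device compared with its predicate."""
    same_intended_use: bool
    same_technological_characteristics: bool
    # The next two fields matter only when the technology differs.
    shown_as_safe_and_effective: bool = False
    raises_different_questions: bool = False


def is_substantially_equivalent(c: Comparison) -> bool:
    # Threshold requirement: the same intended use as the predicate.
    if not c.same_intended_use:
        return False
    # Prong (i): same technological characteristics.
    if c.same_technological_characteristics:
        return True
    # Prong (ii): different characteristics, but (I) shown to be as safe and
    # effective and (II) raising no different safety or effectiveness questions.
    return c.shown_as_safe_and_effective and not c.raises_different_questions


# A device with new materials that is shown to be as safe and effective
# and raises no different questions satisfies prong (ii).
print(is_substantially_equivalent(Comparison(True, False, True, False)))  # True
```

Written this way, the rule makes plain that a showing of safety and effectiveness is demanded only under prong (ii); a device that shares its predicate's intended use and technology passes without one.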

The 510(k) pathway applies to modifications of existing devices as well as new devices. A manufacturer making a significant change to the design, com­ponents, method of manufacture, or intended use of a device already marketed under a 510(k) clearance must submit a new 510(k) for the modified device. The regulatory definitions of a significant change are

  • (i) A change or modification in the device that could significantly affect the safety or effectiveness of the device . . . .
  • (ii) A major change or modification in the intended use of the device.

The FDA has only limited authority to require clinical trial data for 510(k) submissions. The Agency may only require clinical trials to determine if the intended use the manufacturer claims is the same as the predicate device, where the subject device has different indications for use, or where the subject device has different technological characteristics from the predicate device. The data must be necessary to establish that the subject device is as safe and effective as the predicate, and the Agency’s request must satisfy the requirements imposed by the “least burdensome principle.” Given these limitations, the FDA and Government Accountability Office reported that in the first decade of the twenty-first century only 10–15% of 510(k) submissions included clinical trial data. For devices other than in vitro diagnostic tests, only 8% of 510(k) sub­missions included clinical trial data.

These statutory provisions created one of the most important features of the 510(k) pathway: the entry of devices onto the U.S. market with technological features that have never been evaluated for safety and effectiveness. These pro­visions permit a new device (“D1”) to cite an older device (“D0”) as its predicate, even though the older device may never have been evaluated for safety. Later, a newer device (“D2”) may cite D1 as its predicate, and later still D3 may cite D2, and so on, in an endless process of iterative change that has become known as “predicate creep.”
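The chain of D0 through D3 described above can be treated as a small directed graph in which each device points to the predicates it cites. The Python sketch below walks such a graph backward to list every ancestor of a device and how many predicate steps separate them. It illustrates the general idea only and is not the method used in the study; the function name `ancestors` and the dictionary layout are assumptions made for this example, and the device labels are the placeholders from the text.

```python
from collections import deque
from typing import Dict, List


def ancestors(device: str, predicates: Dict[str, List[str]]) -> Dict[str, int]:
    """Map each ancestor of `device` to its minimum number of predicate steps."""
    depth: Dict[str, int] = {}
    queue = deque([(device, 0)])
    while queue:
        current, steps = queue.popleft()
        # A device may cite more than one predicate, so each entry is a list.
        for predicate in predicates.get(current, []):
            # Skip devices already recorded, and guard against cyclic data.
            if predicate not in depth and predicate != device:
                depth[predicate] = steps + 1
                queue.append((predicate, steps + 1))
    return depth


# The chain from the text: D3 cites D2, D2 cites D1, D1 cites D0.
predicates = {"D3": ["D2"], "D2": ["D1"], "D1": ["D0"]}
print(ancestors("D3", predicates))  # {'D2': 1, 'D1': 2, 'D0': 3}
```

The step count gives a rough measure of how far a cleared device sits from the earliest device in its lineage, which is the distance over which incremental changes can accumulate.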

The FDA has also established two alternative, less burdensome 510(k) pathways through which certain devices may reach the market. Under the Spe­cial 510(k) Program, manufacturers introducing certain “well-defined” modifi­cations to their own devices may satisfy the substantial equivalence requirement by complying with established design control procedures. As originally im­plemented, only changes that did not alter a device’s intended use or its funda­mental technology were eligible for the Special pathway. But in a 2019 guidance the FDA announced it would instead determine eligibility by examining whether the changes were “well-established” and whether the results of the modification could be sufficiently evaluated on the basis of a summary or risk analysis sub­mitted to the Agency. The FDA has a stated goal of processing Special 510(k) submissions within 30 days, compared with the goal of 90 days for Traditional 510(k) submissions.

Under the Abbreviated 510(k) Program, devices may be cleared based on compliance with FDA guidance documents, with device type-specific special controls, or with voluntary industry consensus standards. The FDA stated its belief that the Abbreviated 510(k) pathway would increase the predictability of premarket review and regulators’ expectations, and would facilitate entry of devices onto the market. The Agency’s target for review time is the same as with Traditional 510(k) clearances, 90 days.

In a Final Guidance released in 2019, the FDA expanded the Abbreviated Program through the creation of the “Safety and Performance Based Path­way.” Citing its obligations under the least burdensome principle, the guid­ance noted that “in some cases, it may be more burdensome for a submitter to conduct testing against an appropriate predicate device to demonstrate equiva­lence . . . than to demonstrate their device meets appropriate performance crite­ria established by FDA.” To date, the Agency has issued guidance documents covering the performance criteria and testing methodologies for nine device types.

Some new devices will deviate from already-marketed devices to a degree sufficient to preclude the use of those already-marketed devices as predicates. Under the original regime created by the MDA, such devices would have been legally assigned to Class III risk status and the manufacturers of such new devices would have been required to submit a full PMA application. Clearly, though, some truly new devices might be low or intermediate risk. For such devices, compliance with general controls, or with general and special controls, would be sufficient to provide a reasonable assurance of safety. In such cases, manufacturers of new post-amendment devices may avoid the need for an extensive PMA submission by using the “De Novo” pathways. Under the mechanism established in the original MDA, the manufacturer of a new device that received a “not substantially equivalent” determination by the FDA was permitted to petition the Agency to reclassify the device to Class I or Class II. In 2012, the FDA Safety and Innovation Act added a mechanism through which a manufacturer that determines before a 510(k) submission that no substantially equivalent predicate exists may petition the Agency to classify the device as a Class I or Class II device. These devices, once assigned a Class I or II risk classification, can serve as the predicate for later 510(k) submissions.

FDA regulations for De Novo requests require manufacturers to submit detailed information about the device and its regulatory history, proposed special controls, and nonclinical and, in most cases, clinical trial data. In contrast to 510(k) submissions, which are most often cleared without clinical evidence of safety and effectiveness, De Novo submissions are typically supported by one or more pivotal clinical studies. The De Novo pathways were used infrequently in the decade after they were first established in the FDA Modernization Act of 1997, with the Agency granting an average of 4.5 approvals per year. Over the past decade manufacturers used these pathways more frequently, with an average of 25.6 De Novo approvals per year. The De Novo pathways, however, continue to account for a small proportion of devices reaching the U.S. market relative to 510(k) devices: according to the FDA’s 510(k) database, the Agency has cleared an average of 3026 devices per year through the 510(k) pathway over the past decade. The relatively infrequent use of the De Novo pathways may be due in part to their relatively burdensome requirements, which some claim approximate the burdens imposed by the PMA process.
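To put the two annual averages above side by side, the short Python calculation below computes the De Novo share of the two pathways combined and the number of 510(k) clearances per De Novo approval. It uses only the figures reported in the preceding paragraph; the variable names are illustrative, and the share deliberately leaves out PMA and exempt devices, for which this paragraph gives no counts.

```python
# Annual averages over the past decade, as reported in the text above.
de_novo_per_year = 25.6
k510_per_year = 3026

# Share of De Novo approvals among De Novo plus 510(k) devices only.
share = de_novo_per_year / (de_novo_per_year + k510_per_year)

# Number of 510(k) clearances for each De Novo approval.
ratio = k510_per_year / de_novo_per_year

print(f"De Novo share of the two pathways combined: {share:.2%}")  # 0.84%
print(f"510(k) clearances per De Novo approval: {ratio:.0f}")      # 118
```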

The FDA does not have express statutory authority to simply rescind 510(k) clearances, even for devices that have been shown to be dangerous. In the past, the Agency has claimed it possesses the inherent authority to engage in a timely reconsideration of its decisions. However, in Ivy Sports Med., LLC v. Burwell, the D.C. Circuit rejected this claim, holding instead that a provision in the MDA that authorized the Agency to use a rulemaking process to reclassify devices in response to newly acquired information provided the only mechanism available to the FDA. In recent years, the Agency has used this mechanism to reclassify 1,477 devices such as pelvic mesh from Class II to Class III, and to require manufacturers to obtain PMA approval or to remove their devices from the mar­ket.

 

Critics have argued that this statutory structure and the associated regula­tions and FDA implementation practices result in compromised device safety. And critics have argued that many of the specific features of the 510(k) pathway reviewed here are root causes of this compromised safety. Part II explores these criticisms and the reform proposals they have spawned in detail.

II. Criticisms of the 510(k) Pathway: Literature Review and Hypothesis Formulation

Judging how well the 510(k) pathway strikes the balance between safety and innovation requires a robust understanding of how well the pathway serves each of these goals. Clinicians and academic physicians, legal scholars, courts, public safety advocates, the medical device industry, and even the FDA itself have developed a robust set of criticisms of the FDA’s record of and ability to ensure the safety of 510(k) devices. The objective of this Part is to catalogue these criticisms of the pathway’s function of ensuring safety and the suggestions for reform, drawing mainly from the medical and legal litera­tures, and to highlight the limitations of the empirical evidence that supports those criticisms and proposals.

The criticisms presented here can be sorted into two categories. One category encompasses a set of closely related general criticisms, that the 510(k) pathway fails to ensure that devices entering the U.S. market are safe. These criticisms are solely outcome focused, in that they do not attempt to discern the specific features of the 510(k) pathway that allow unsafe devices to reach the market. The other category encompasses a diverse set of criticisms that do identify specific aspects of the statutes, regulations, and implementation of the 510(k) pathway that critics claim are at least partially responsible for the failure of the pathway to ensure device safety. These categories are presented in Parts II.A and II.B, respectively. The discussion in these sections highlights the limited body of empirical evidence on which the critics have drawn. And setting the stage for the study presented in Part III, Section II.B also seeks to reformulate as many of the specific criticisms as possible into empirically testable hypotheses. Section II.C describes some of the proposed reforms to the 510(k) pathway.

A. General Criticisms of the 510(k) Pathway

At the highest level of generality, safety-oriented critics claim that the 510(k) pathway allows an undesirably large number of unsafe medical devices onto the U.S. market, resulting in widespread and severe harm to large numbers of people. These criticisms draw on a variety of methodologic approaches. Some are based on deductions drawn from the MDA’s statutory structure. For example, the Institute of Medicine’s 2011 report examined the statutory language of the MDA, finding that the Act contained no provision for the evaluation of the safety and effectiveness of Class I and Class II pre-amendment devices and yet permitted these devices to serve as predicates for new, post-amendment devices to reach the market. Regarding post-amendment devices, the report concluded that “[t]he 510(k) clearance process was not designed in 1976 to evaluate the safety and effectiveness of new medical devices.” Thus, neither the pre-amendment nor post-amendment devices would ever be subjected to an individualized assessment of safety unless a post-market problem arose, and all of these devices could serve as predicates for future devices. Combined with the fact that in most years the FDA clears more than 95% of 510(k) devices submitted for review, the pathway, according to many, is simply not designed to ensure safety.

Although this line of criticism is troubling, the recognition that the 510(k) process was not designed to ensure safety is distinct from finding that the 510(k) process fails in practice to ensure safety. The Institute of Medicine recognized the limitations in this deductive reasoning, noting that “[a]lthough the safety and effectiveness of individual preamendment Class II devices have not been sys­tematically reviewed, their continued use in clinical practice provides at least a level of confidence in their safety and effectiveness.”

Other criticisms have been based on compelling but isolated anecdotes. In an example from the medical literature, one study argued that the 510(k) path­way failed to ensure device safety based on an analysis of the DePuy ASR XL Acetabular Cup System hip prosthesis, which was recalled worldwide in 2010 because of an extremely high revision rate. In an example from the legal liter­ature, the dangers associated with a family of twenty-five power morcellators were used to support the claim that the 510(k) program failed to ensure device safety. Other critics have based their claims that the 510(k) pathway fails to ensure safety on the failures of and injuries caused by a single device type or by devices in a single specialty area, including otolaryngology devices, surgical mesh devices, knee joint replacements, and orthopedic foot and ankle de­vices. However, none of these criticisms have attempted to quantify the pro­portion of all 510(k) devices that are unsafe.

Other critics have attempted to ground their claims that the 510(k) pathway fails to adequately ensure device safety on a broader base of quantitative evidence. Dr. Diana M. Zuckerman and coauthors examined all 113 Class I recalls the FDA issued between January 2005 and December 2009. Based on a finding that 71 percent of the recalls were for devices marketed through the 510(k) pathway, these authors concluded that the 510(k) pathway was failing to ensure device safety. Professor Frank Griffin described the 510(k) pathway’s function as “checkered at best,” drawing on a study of medical devices used by several surgical specialties that found 510(k) devices to be 11.5 times more likely to be recalled than PMA devices. Dr. William Maisel reported on the set of medical devices cleared through the 510(k) pathway between January 1, 1996, and December 31, 2009, using a data set provided by the FDA. Maisel found that over each of the three years following a 510(k) clearance, 1.6% to 1.9% of devices were subjected to an FDA recall. By six years post-clearance, 8.5% of cleared devices had been subjected to a recall. Although all of these empirical studies have significant methodologic limitations, the combined weight of the anecdotal and empirical criticisms is sufficient to raise a general concern over the safety function of the 510(k) pathway.

The most thorough empirical evaluation of the 510(k) pathway’s safety function to date was reported by Dr. Jonathan Dubin and colleagues in a 2021 JAMA article. These authors reported on a cohort of over 28,000 devices that were 510(k) cleared between 2008 and 2017. A total of 10.7% of these devices were recalled by the FDA. However, when the authors focused only on Class I recalls the rate was only 0.8%. The authors also examined high-risk devices approved through the PMA pathway, finding that PMA devices were 7.3 times more likely to be subjected to a Class I recall. In spite of the lower frequency of recall for 510(k) devices, the authors concluded that these devices are “a sig­nificant source of safety concern” because there are so many more 510(k) de­vices than PMA devices. Other investigators, who have focused on single technology spaces such as knee arthroplasty devices, have found that a higher percentage of devices cleared through the 510(k) pathway are recalled compared with devices approved through the PMA pathway.

Unfortunately, most of these studies used flawed methodologies. The study by Zuckerman and colleagues, while showing that 71 percent of all Class I recalls were for devices marketed through the 510(k) pathway, did not include the denominator of the number of 510(k) devices that were at risk of failure during the study period. As a result, Zuckerman’s findings do not provide an estimate of the proportion of 510(k) devices that contain flaws that endanger patients. Maisel’s study partially overcame this limitation by calculating a proportional risk of device failure. But this study was limited in two important ways. First, Maisel’s study used the occurrence of any FDA recall reported between January 1, 2003, and December 31, 2009, as the marker for the failure of the 510(k) pathway to ensure device safety. This methodology is overinclusive, in that most Class III and some Class II recalls are for trivial issues or relatively minor problems that are confined to a small number of devices. As a result, the numerator of Maisel’s risk calculations is inflated, biasing the findings toward an overestimation of the risks of device failures.

Second, Maisel calculated the proportional risk of device failure by using the total number of 510(k) clearances as the denominator in the analysis. Treating each 510(k)-cleared device as a unique device ignores the limited amount of technological change that typically occurs with each successive modification. Indeed, a new subject device could be identical to its predicate, as where a manufacturer submits a new 510(k) simply because it plans to market an already-cleared device under a new name, or where one manufacturer seeks to market a device identical to an already-cleared device by another manufacturer. Further, many 510(k) devices may be versions of already-cleared devices with modifications that are slight enough to question whether it makes sense to consider the subject and predicate to be different devices. Many critics have maintained that the 510(k) pathway incentivizes manufacturers to make trivial changes solely for the sake of differentiating their devices from those of their competitors.

The problem with counting each 510(k) as a unique device can be made clear using the following hypothetical:

A manufacturer obtains 510(k) clearance for a multicomponent device, X0. Subsequently the manufacturer makes a significant change to the composition of one of the components and obtains 510(k) clearance for the modified device, X1, citing X0 as the predicate. The manufacturer subsequently makes three very minor modifications (which could even be minor changes to the labeling or packaging) for which it obtains 510(k) clearances for each (X2, X3, X4, each citing the previous cleared device as its predicate). Finally, the manufacturer makes another significant change, obtaining a 510(k) clearance for device X5.

Adopting Maisel’s approach and counting each 510(k) clearance as a unique device would yield a denominator of six. If a marker of failure (an FDA recall) occurs for device X5, the failure rate in this set of devices is calculated as 1/6, or 16.7% of all cleared devices. But from another perspective, devices X1, X2, X3, and X4 are not four separate devices; rather, they are so technologically similar that they should be considered one device. Thus, there were only three devices relevant to the analysis: the original device X0, the nearly identical devices X1, X2, X3, and X4, and the final device in the series, X5. The risk of failure in this analysis is 1/3, or 33.3%. As a result, counting all 510(k)-cleared devices as unique devices artificially inflates the denominator, biasing the calculated risk of device failure toward lower values. The combined effect of these two opposing biases is impossible to determine, limiting the reliability of Maisel’s calculations.
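The arithmetic of the hypothetical can be reproduced in a few lines of Python. This is only an illustrative sketch: the grouping of X1 through X4 into one device is the assumption stated in the hypothetical, and the variable and function names are invented for the example.

```python
from fractions import Fraction

# The six clearances in the hypothetical chain X0 -> X1 -> ... -> X5.
clearances = ["X0", "X1", "X2", "X3", "X4", "X5"]
recalled = {"X5"}  # the single marker of failure

# Assumption from the hypothetical: X1-X4 are similar enough to count as one device.
device_groups = [{"X0"}, {"X1", "X2", "X3", "X4"}, {"X5"}]

def rate_per_clearance(clearances, recalled):
    """Failure rate when every 510(k) clearance counts as a unique device."""
    return Fraction(len(recalled & set(clearances)), len(clearances))

def rate_per_group(groups, recalled):
    """Failure rate when technologically similar clearances are pooled."""
    failed = sum(1 for group in groups if group & recalled)
    return Fraction(failed, len(groups))

per_clearance = rate_per_clearance(clearances, recalled)
per_group = rate_per_group(device_groups, recalled)
print(f"{float(per_clearance):.1%}")  # 16.7%
print(f"{float(per_group):.1%}")      # 33.3%
```

The numerator is the same in both calculations; only the denominator changes, which is why the per-clearance count yields the lower rate.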

Even the criticisms of the 510(k) pathway in the Institute of Medicine’s influential 2011 report rest on limited empirical support. The committee gath­ered information through one public workshop on the legislative history of the 510(k)-clearance process and its then current structure, the structure of the med­ical-device industry and how it had been affected by domestic regulation, the regulation of medical devices globally, and consumer concerns. A second public workshop addressed post-marketing surveillance, adverse event report­ing, and several other topics of interest to the committee, such as risks associated with software in medical devices. The committee conducted extensive searches of the medical, scientific, and legal literature, reviewed FDA dockets containing Agency reviews of the 510(k) process, and reviewed other govern­ment reports, such as reports from the Government Accountability Office and the Department of Health and Human Services. The committee also contacted experts in the medical-device field. However, none of these sources appear to have provided a systematic, quantitative analysis of 510(k) devices, leaving Maisel’s study to supply the main body of empirical data.

This body of literature is difficult to synthesize in a coherent fashion. Clearly, some isolated devices and device types that were cleared through the 510(k) pathway have been unsafe. And some broader empirical studies, includ­ing Maisel’s estimate that 8.5% of 510(k) devices are unsafe, would suggest to many that a relatively high percentage of these devices are unsafe. But other studies have found that only 0.5% to 0.8% of 510(k) devices are unsafe, sug­gesting the opposite.

B. Criticisms of Specific Features of the 510(k) Pathway

Many authors claim to have identified specific attributes of the 510(k) path­way that permit unsafe devices to reach the market. The Institute of Medicine, as noted above, put forward one of the most fundamental of these criticisms, that the pathway is not legally structured to ensure safety. Others have criti­cized the MDA’s standard for devices in general and for 510(k) devices in par­ticular. Jonas Hines and colleagues at Public Citizen focused on the different statutory standards for drug and device evaluations: “Before a new drug can be marketed, the sponsor must show ‘substantial evidence [of effectiveness],’ whereas the sponsor of a new device need only demonstrate a ‘reasonable as­surance of . . . safety and effectiveness.’” Zachary Shapiro and coauthors like­wise focused their criticism specifically on the “weak standards” that govern the 510(k) process. And in a line that has frequently been repeated by courts and commentators, the U.S. Supreme Court characterized the 510(k) process as fo­cused not on safety, but rather on the equivalence of subject and predicate de­vices.

Commentators have also criticized the FDA’s implementation of the 510(k) framework, with some claiming that the Agency frequently adopts a “lenient interpretation” of the term “same intended use.” For the FDA to find substantial equivalence, the MDA requires that the subject device must have the same intended uses as its predicate. The FDA permits manufacturers to change the indications for use so long as the intended uses remain the same. Unfortunately, the line dividing changes to the indications for use that remain within the original intended uses of the predicate device from those that represent a change to the intended use is often difficult to draw. Hines and coauthors cited the example of the ReGen Menaflex Collagen Scaffold, which was cleared based on substantial equivalence with legally marketed surgical meshes, which are used in a wide range of abdominal and pelvic procedures. The ReGen device, though, was indicated for replacement of weight-bearing cartilage in the knee. The authors concluded that the FDA used the indistinct boundary to clear what they described as a “novel device,” criticizing what they describe as the Agency’s permissive stance toward expanding the indications of already-cleared devices to new uses.

Critics have also argued that the FDA permits unreasonably large techno­logical changes in single 510(k) clearances. Professor Jordan Paradise pointed out that in 2012 most medical devices that used nanotechnology had reached the market through 510(k) clearance, even though those devices “exhibit[ed] new features, properties, and characteristics . . . raising questions about whether the FDA has appropriately allowed them clearance under the 510(k) process.”

Hines and coauthors cited the example of a transcranial magnetic stimula­tion device, for which the manufacturer sought 510(k) clearance based on a claim of substantial equivalence to electroconvulsive therapy devices. They concluded that the 510(k) pathway’s inclusion of devices incorporating such large technological differences leads to “devices acting as predicates for mark­edly dissimilar devices,” in essence allowing changes in technology that are too large for the predicate to provide an assurance of safety for the subject de­vice.

These are important criticisms, to which this Article will return later. How­ever, they are also criticisms that are not amenable to empirical testing. Based on a systematic reading of the medical and legal literatures, this section presents many of the specific criticisms of the 510(k) pathway that may be empirically testable as well as a discussion of the empirical evidence (if any) on which those criticisms rely. Each criticism is also formulated as an empirically testable null hypothesis, in anticipation of the study presented in Part III.

1. Limited Clinical Trial Evidence Demonstrating Device Safety

A key tenet of the modern drug and device regulatory regimes is that pre­market clinical trials—in particular, statistically robust, double-blinded, ran­domized clinical trials with preset endpoints—are of central importance in establishing the safety and effectiveness of new medical products. But as dis­cussed above, the FDA has only limited authority to require clinical trial data as a condition for clearing 510(k) submissions. And as has been frequently ob­served, the FDA has been hesitant to exercise the authority it does possess. As a result, fewer than 15% of 510(k) submissions contain clinical trial data. Excluding in vitro diagnostic tests, only 8% of submissions for 510(k) clearance contain clinical trial data. According to the Institute of Medicine, “There is no consistent approach for how the FDA determines the need for clinical data, the type of such data, and the manner in which such data, if available, are inte­grated into the decision-making process.”

Authors in the medical literature have frequently claimed that this lack of clinical trial data in 510(k) clearances compromises device safety. In a broad critique of all device approval pathways, Hines and coauthors identified eight general weaknesses, one of which was the infrequent requirement of clinical trial data. Brent Ardaugh and colleagues, in a 2013 Perspective published in the New England Journal of Medicine, pointed out the lack of any clinical studies demonstrating the safety of a failed metal-on-metal hip prosthesis and ninety-five of its predicate devices, which had received 510(k) clearances over a span of five decades. They concluded that requiring clinical studies could have prevented thousands of injuries from a technology for which safety had never been proven. And drawing on the results of an empirical study of medical device recalls over the five-year period from 2005 through 2009, Zuckerman and colleagues criticized the lack of clinical trial requirements for most 510(k) clearances: “Clinical trials and other more rigorous premarket data collection required in the PMA process but not the 510(k) process could uncover design flaws or manufacturing flaws before a device is sold.”

Legal scholars have also claimed that the infrequent requirement of clinical trials in the 510(k) context compromises device safety. Examining the danger of the intraoperative spread of deadly cancer cells by power morcellators, Jenya Godina stated that the 510(k) process “was simply not designed to include the kind of rigorous, data-driven study that could have unveiled the risks of morcel­lators at the clearance stage.” And Professor Frank Griffin highlighted the paucity of clinical trial evidence in the 510(k) submissions of several implanta­ble orthopedic devices. Professor I. Glenn Cohen, along with physician coau­thors, traced the problem of insufficient clinical trial requirements to “the shortcomings of the 510(k) pathway and its downstream consequences [which] are attributable to congressional legislative action.”

One notable feature of most of these criticisms is that they dealt with only a single type of medical device. Broad quantitative evidence linking the FDA’s limited authority to require clinical trials for 510(k) clearances and the Agency’s hesitancy to use that limited authority to compromised 510(k) device safety is lacking. Yet despite this lack of broad-based empirical evidence, many have called for expanding the FDA’s authority to require clinical trials and for the Agency to use its authority more frequently. At the extreme, some have urged that all 510(k) clearances require clinical evidence of safety.

Advocating a narrower approach, the Institute of Medicine’s report dis­cussed a more limited reform to clinical trial requirements proposed by the FDA: “CDRH [Center for Devices and Radiological Health] proposed develop­ing guidance defining a subset of Class II devices, called ‘Class IIb,’ devices, for which clinical information, manufacturing information, or potentially addi­tional evaluation in the postmarket setting would typically be necessary.”

Thus, a large volume of existing criticism supports expanding the role of premarket clinical testing of 510(k) devices. But there are considerations that potentially undercut these criticisms and weigh against such proposals. It is pos­sible that clinical trial data would add little to the assurance of safety provided by the existing requirements of substantial equivalence and compliance with the general and relevant specific controls. This might be so because the clinical data submitted to the FDA is insufficient to establish safety. It might also be true if, as some have argued, most 510(k) devices are safe, and clinical data would improve the safety of a very small subset of 510(k) devices.

Requiring clinical trial evidence of safety and effectiveness for all 510(k) submissions would represent a dramatic change from the FDA’s current practice and might slow the rate of innovation. And it might paradoxically decrease de­vice safety if the burden of conducting clinical trials for every modification in­centivized manufacturers to refrain from modifying their cleared devices. Thus, calls for broadening the FDA’s authority to require clinical trials and for the Agency to exercise that authority should be informed by robust quantitative ev­idence. Unfortunately, such data are practically nonexistent.

To begin to develop such robust data, the study presented in Part III will test the following null hypothesis:

H1a: Devices that are cleared with clinical trial data are not safer than devices that are cleared without clinical trial data.

Additionally, it can be postulated that if clinical trial data establish the safety of a device, the same data might exert a protective effect on subsequent generations of devices. Because such devices are substantially equivalent to their predicates, which had clinical trial evidence of safety, and because the modifications cannot make too much of a technological leap, the next generation of devices might also be safe. This effect might even carry on for several gen­erations.

To test whether clinical trial data assure 510(k) device safety downstream, it is useful to establish a simple shorthand in which devices are labelled using the term “Gen Sn Device,” in which n represents the number of generations the device is removed from a presumably safe device based on the fact that its clear­ance was supported by clinical trial data. Thus,

Gen S0 Devices are cleared devices whose 510(k) submissions included clini­cal trial evidence.

Gen S1 Devices are cleared devices whose 510(k) submissions did not include clinical trial evidence but which cited a Gen S0 Device as a predicate.

Gen S2 Devices are cleared devices whose 510(k) submissions, and the submissions of whose predicate(s), did not include clinical trial data but at least one of whose predicate(s) was a Gen S1 Device.

Using this terminology, the possibility that clinical trial data ensure the safety of more than one generation of device can be tested:

H1b: The combined cohort of Gen S0 and Gen S1 devices is not safer than other devices.

H1c: The combined cohort of Gen S0, Gen S1, and Gen S2 devices is not safer than other devices.
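One way to assign these generation labels, offered only as a sketch and not as the study’s actual procedure, is a breadth-first search outward from the Gen S0 devices along predicate citations. The device identifiers and the `predicates` mapping below are hypothetical; the function assumes that a device’s generation is the fewest predicate links separating it from any device cleared with clinical trial data.

```python
from collections import deque

def label_generations(predicates, seeds):
    """Return {device: n}, where n is the fewest predicate links back to a seed.

    predicates: maps each device ID to the list of predicate IDs it cited.
    seeds: the generation-0 devices (here, clearances that included clinical data).
    Devices with no seed anywhere in their ancestry are left out of the result.
    """
    # Invert the citation map: for each device, which later devices cited it?
    cited_by = {}
    for device, preds in predicates.items():
        for pred in preds:
            cited_by.setdefault(pred, []).append(device)

    generation = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        current = queue.popleft()
        for child in cited_by.get(current, []):
            if child not in generation:  # first visit is the shortest path
                generation[child] = generation[current] + 1
                queue.append(child)
    return generation

# Hypothetical cohort: only device A was cleared with clinical trial data.
predicates = {"A": [], "B": ["A"], "C": ["B"], "D": ["C", "A"], "E": [], "F": ["E"]}
labels = label_generations(predicates, seeds=["A"])
print(labels)  # {'A': 0, 'B': 1, 'D': 1, 'C': 2}
```

Device D cites both a Gen S2 device and the Gen S0 device, so it is labelled Gen S1; devices E and F have no clinical trial data anywhere in their ancestry and receive no label.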

2. Potential Downstream Effects of Unsafe Devices—The “Bad Predicate” Effect

Under the statutory structure of the 510(k) pathway, a new device may be cleared for the market based on its substantial equivalence to an already-cleared predicate device. However, the predicate device may not have been—in fact, most likely had not been—evaluated for safety. Likewise, the predicate device’s predicate and that device’s predicate, going back possibly to a pre-amendment device, may never have been evaluated for safety.

This pattern raises a situation converse to that of devices with clinical trial evidence of safety: If the original device was unsafe, might not all the devices having that unsafe device in their predicate ancestries be unsafe? And, if a man­ufacturer introduced a change to an already marketed, safe, device that rendered the new device unsafe, might not all the devices having that unsafe device in their predicate ancestries be unsafe?

Two recent pieces of scholarship focusing on surgical mesh devices mar­keted for pelvic reconstruction surgeries illustrate this criticism. In the medical literature, Jeremey Rosh and coauthors examined a cohort of surgical mesh de­vices marketed for use in pelvic reconstruction surgeries. After noting that pre-amendment devices cited as predicates were likely not evaluated for safety, the authors observed that flawed technology in one cleared device can lead to many unsafe devices being cleared: “Forty years of 510(k) clearances based on substantial equivalence claims has resulted in complex networks of medical de­vice ancestries. These connections reflect the interdependent relationships be­tween marketed devices and indicate how adverse events from one device may cascade to related devices.”

In a recent criticism in the legal scholarship, William Chanes Martinez presented the example of Boston Scientific’s ProtoGen Sling, a surgical mesh device used in female pelvic reconstruction surgeries. Although it was recalled three years after its clearance, the FDA ultimately cleared at least 61 devices that included the ProtoGen in their predicate ancestries. None of the submissions were accompanied by clinical trial evidence. Eventually, in the face of overwhelming evidence of harm created by vaginal mesh devices, the FDA reclassified them as Class III devices and required their manufacturers to submit PMA applications, including clinical trial data.

The Institute of Medicine noted this issue in its 2011 report:

[A]ny unsafe or ineffective devices are embedded in the system and as both a legal and a practical matter may be used as predicates for new devices until the predicates are removed from the market. It may be difficult for the FDA to remove devices from the market because it has no systematic way to identify them.

The FDA lacks explicit statutory authority to simply rescind a 510(k) clearance, and courts have rejected the Agency’s attempts to rely on a theory that the MDA provided “inherent reconsideration authority.” Lacking authority to simply reconsider a 510(k) clearance based on new information, the FDA has relied on its statutory authority under 21 U.S.C. § 360c(e) to reclassify a relatively modest number of devices from Class II to Class III, which effectively removes them from use as predicates. This process, however, is unwieldy, as § 360c(e) requires the Agency to engage in a process that includes publishing any proposed reclassification in the Federal Register, submitting the proposal to the appropriate device classification panel, considering public comments, and publishing a final order explaining the public health benefits and risks of the device and the rationale for why general and special controls fail to provide an adequate assurance of safety. In effect, then, because unsafe devices may serve as the predicates for generations of newer devices and because the FDA possesses only limited authority to remove these devices, a single unsafe device that is cleared may render many later generations of devices that include it in their predicate ancestries unsafe.

However, no systematic evidence of this effect has been reported. This leaves many important questions unanswered. First, does this effect exist? Second, if the effect does exist, how strong is it? After all, given the complex nature of 510(k) device relationships, including devices that cite (and combine the technological features of) many predicate devices, any dangerous technological feature in one device might quickly be mitigated in subsequent generations. And third, assuming that such an effect exists, how long does it persist? If manufacturers can improve their devices through iterative changes, it may be that a dangerous device that is modified a number of times (or even once) may no longer be dangerous. Indeed, manufacturers may be motivated by market forces and the threat of tort liability to make such changes.

A shorthand similar to that developed above is useful to frame the hypoth­eses for testing this criticism:

  • A Gen U0 Device is a cleared device that was unsafe.
  • A Gen U1 Device is a cleared device that cited a Gen U0 device as a predicate.
  • A Gen U2 Device is a cleared device that cited a Gen U1 device as a predicate.

To test for a potential downstream effect of unsafe devices, the following null hypotheses can be tested:

  • H2a: Gen U1 devices are not less safe than other devices.
  • H2b: The cohort of Gen U1 and Gen U2 devices is not less safe than other devices.
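A null hypothesis of this form amounts to comparing the proportion of unsafe devices in one cohort with the proportion among all other devices. The sketch below uses a one-sided Fisher’s exact test purely as an illustration; this section does not say which statistical test the study applies, and the counts shown are made up for the example.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    a: unsafe devices in the cohort      b: safe devices in the cohort
    c: unsafe devices outside the cohort d: safe devices outside the cohort
    Returns the probability, under the null of no association, of seeing
    at least `a` unsafe devices in the cohort given the table margins.
    """
    cohort_size = a + b
    unsafe_total = a + c
    n = a + b + c + d
    upper = min(cohort_size, unsafe_total)
    favorable = sum(
        comb(cohort_size, k) * comb(n - cohort_size, unsafe_total - k)
        for k in range(a, upper + 1)
    )
    return favorable / comb(n, unsafe_total)

# Made-up counts: 3 of 4 Gen U1 devices unsafe versus 1 of 4 other devices.
p_value = fisher_one_sided(3, 1, 1, 3)
print(round(p_value, 4))  # 0.2429
```

With counts this small, even a large difference in proportions is not statistically significant, which illustrates why a pilot cohort may fail to reject the null even if a downstream effect exists.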

3. Short Review Times

The 510(k) pathway was designed in part to provide shorter premarket re­view times than the rigorous PMA pathway. Section 510(k) of the MDA re­quires manufacturers to report to the FDA at least 90 days before introducing a device into commerce. The Act imposed no time limit within which the FDA was required to respond, but if the Agency did not respond to the notification within 90 days the manufacturer was free to market the device. The SMDA al­tered the statutory landscape by requiring the FDA to make a substantial equiv­alence determination before a company may market a device. The FDA maintains the 90-day period as its goal for rendering findings of substantial equivalence or no substantial equivalence, in effect self-imposing what in prac­tice functions as a soft deadline.

Hines and colleagues, drawing on FDA-reported review times for PMA and 510(k) submissions, included shorter review times as evidence that the latter pathway is inadequate to ensure safety. The authors cited an internal Agency memorandum in which the CDRH stated that it “does not attempt to address all of the issues [that] would be answered in a PMA in its review of 510(k)s.” The Institute of Medicine also noted that “FDA staff report that review times did not allow sufficient review of complex issues.”

To test whether short review times compromise device safety, the following null hypothesis can be tested:

H3: Devices with review times in the shortest quartile are not more likely to be unsafe than all other devices.
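Sorting devices into the shortest quartile of review times could be done as follows. This is a minimal sketch: the device identifiers and review times are hypothetical, and the choice of quartile method (Python’s default “exclusive” method here) is an assumption, since different conventions give slightly different cut points.

```python
from statistics import quantiles

# Made-up review times in days, keyed by hypothetical 510(k) numbers.
review_days = {
    "K001": 30, "K002": 45, "K003": 60, "K004": 75,
    "K005": 90, "K006": 120, "K007": 150, "K008": 200,
}

def shortest_quartile(values_by_device):
    """Return the devices whose value is at or below the first quartile."""
    first_quartile = quantiles(values_by_device.values(), n=4)[0]
    return {dev for dev, value in values_by_device.items() if value <= first_quartile}

fast_reviews = shortest_quartile(review_days)
print(sorted(fast_reviews))  # ['K001', 'K002']
```

The unsafe rate among the devices in `fast_reviews` can then be compared with the rate among all remaining devices.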

4. Use of the Special and Abbreviated 510(k) Pathways

As discussed in Part I above, the FDA created two alternative 510(k) path­ways, which are designed to ease the premarket burdens on the manufacturers of certain devices. The Agency has recently expanded the Abbreviated pathway through its Safety and Performance Based Pathway program.

Maisel found that devices that had been cleared through the Special 510(k) pathway were overrepresented in the set of recalled devices: 34.2% of devices recalled between 2003 and 2009 had been cleared through the Special 510(k) pathway, while 22.3% of devices that were not recalled had been cleared through the Special 510(k) pathway, p < .0001. Too few devices had been cleared through the Abbreviated pathway for a reliable analysis. The Institute of Medicine noted that Maisel interpreted these findings as evidence of “a signal that may warrant further investigation as to whether there is something about the special 510(k) process that increases risk.”

To test whether the use of the Special and Abbreviated 510(k) pathways compromises device safety, the following null hypothesis can be tested:

H4: Devices cleared through the Special and Abbreviated pathways are not less safe than devices cleared through the Traditional 510(k) pathway.

5. Predicate Age

Clinicians and the FDA itself have raised concerns about the age of predi­cates that manufacturers have used in 510(k) submissions. In Maisel’s study, devices whose youngest predicate had been cleared within the preceding five years had “a slightly higher recall rate.” Although no information about the magnitude and the statistical significance of this finding was provided, the find­ing is consistent with evidence that recalls of 510(k) devices occur more fre­quently in the first three years after a device is first cleared. That is, because flawed devices tend to manifest their failures early in their life cycles, their use as predicates very early in their life cycles might occur before the failures are recognized.

Others, by contrast, have expressed concerns about the use of older devices as predicates. A recent law review article advocated a ban on the use of predi­cates that are more than ten years old. The FDA has recently expressed con­cern about the use of older devices as predicates. In late 2018, then Commissioner Scott Gottlieb announced that the FDA would seek public com­ment on a proposal to forbid the use of predicates more than 10 years old. Although Gottlieb’s statement claimed that devices cleared in the distant past were safe, one obvious concern is that older technology may be less safe than newer technology. Ultimately, the Agency did not adopt a formal ban. But Gottlieb’s comments highlighted concerns over the use of old technology as a starting point for new 510(k) devices.

To test the effect of the age of device predicates, the following hypotheses can be tested:

H5a: Devices for which the interval between the clearance date of the predicate (or of the youngest predicate, where more than one predicate is cited) and the clearance date of the subject falls within the shortest quartile are not less safe than other devices.

H5b: Devices for which the interval between the clearance date of the predicate (or of the oldest predicate, where more than one predicate is cited) and the clearance date of the subject falls within the longest quartile are not less safe than other devices.
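The two intervals that these hypotheses rely on can be computed directly from clearance dates. The following sketch uses invented dates and a hypothetical helper name; it assumes that each subject device has at least one predicate with a known clearance date.

```python
from datetime import date

def predicate_intervals(subject_cleared, predicate_cleared_dates):
    """Return (shortest, longest) gap in days between a subject and its predicates.

    The shortest gap corresponds to the youngest predicate (used in H5a);
    the longest gap corresponds to the oldest predicate (used in H5b).
    """
    gaps = [(subject_cleared - cleared).days for cleared in predicate_cleared_dates]
    return min(gaps), max(gaps)

# Invented example: a subject cleared in 2015 citing predicates from 2013 and 2001.
youngest_gap, oldest_gap = predicate_intervals(
    date(2015, 6, 1),
    [date(2013, 6, 1), date(2001, 6, 1)],
)
print(youngest_gap, oldest_gap)  # 730 5113
```

Once these gaps are computed for every device in a cohort, the shortest quartile of `youngest_gap` values and the longest quartile of `oldest_gap` values define the groups compared under H5a and H5b.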

6. Repeated Modification Without Evidence of Safety: Predicate Creep

Most of the innovation of FDA-regulated medical devices occurs through a process of small, iterative changes that are made to existing devices. We typi­cally think of this process as salutary—through constant modification, manu­facturers continuously improve their products, providing safer and more effective devices. But if the rate of technological change outstrips the ability of the regulatory system to ensure safety, the accumulation of small iterative changes may lead to a product that is technologically remote from its precursor and that poses unacceptable risks to patients.

The Safe Medical Devices Act of 1990, which expanded the legally ac­ceptable predicate devices to include post-amendment devices, resulted in a 510(k) regime in which a new device can cite an already-cleared device, after which a later, newer device can cite the new device, and so on. By allowing an unlimited sequence of iterative changes to a device while also permitting “significant change[s] in the materials, design, energy source, or other features of the device from those of the predicate device,” some 510(k)-cleared de­vices will diverge substantially from earlier devices. Under the 510(k) frame­work, a new device (“D1”) may be approved despite having technological differences from its predicate device (“D0”). Predicate creep arises because of iterative 510(k) approvals: a newer device (“D2”) can be approved based on substantial equivalence to D1 in spite of technological differences between D1 and D2. Through dozens of iterations, device D25 or D50 may incorporate tech­nology that is radically different from the original predicate, D0. Even if D0 had been subjected to a thorough safety evaluation, which is frequently not the case for 510(k) devices, D25 or D50 have not.

Standard casebooks on FDA law have described the occurrence of predicate creep in 510(k) devices, noting that iterative modification, “especially if carried through several generations, may lead to the marketing of new devices that bear little resemblance to any pre-amendment products.” Many commentators in the legal literature have discussed the possible dangers associated with predicate creep. Focusing on the problems and costs associated with certain artificial hip prostheses, Professor Frank Griffin criticized the statutory structure that per­mits the process of iterative change: “The cumulative design changes associated with predicate creep can lead to devices with little resemblance to the original predicate in a long ‘predicate chain,’ which means the approved device is likely only as safe and effective as the weakest link in the chain.”

Commentators in the medical literature have also focused on the possible dangers of predicate creep. Doctor Joseph Ross and colleagues argued that one causal factor that led to the recall of a thrombectomy catheter was the iterative process of change (which they termed “device creep”). Doctors Eli Adashi and Katina M. Robison, with coauthor I. Glenn Cohen, wrote in a recent JAMA article focusing on power morcellators that while some “reference predicates may have been cleared decades ago. . . . others are quite dissimilar to the device under review by dint of ‘predicate creep.’”

In its 2011 report, the Institute of Medicine elaborated on the safety risk that the iterative process creates: “Prior 510(k) clearances are legally binding on the FDA when making 510(k)-clearance decisions. Thus, any unsafe or ineffec­tive devices are embedded in the system and as both a legal and a practical mat­ter may be used as predicates for new devices until the predicates are removed from the market.”

Despite the amount of criticism over the risks created by predicate creep in the 510(k) context, there is no quantitative evidence linking predicate creep to harm. Rather, these criticisms have been based almost entirely on deductive rea­soning and on examples drawn from single device types.

To test whether predicate creep compromises 510(k) device safety, it is helpful to use the terminology established in Section II.B.1 above. Using this terminology, the following hypothesis can be tested:

H6: There is no difference in the safety of Gen S0, Gen S1, and Gen S2 devices.

7. Multiple/Split Predicates and Reference Devices

Another criticism of the 510(k) pathway’s safety function has focused on the use of “multiple” or “split” predicates. The term multiple predicates refers to a 510(k) submission in which a manufacturer of a complex device cites two or more already-cleared devices as predicates. Each predicate is used to estab­lish substantial equivalence to a different technological feature that was incor­porated into the subject device. However, no predicate is cited (and likely none exists) in which all of the technological features of the subject device have been combined.

As I have argued elsewhere, combining different technologies may create risks that none of the individual technologies create, making these risks difficult if not impossible to foresee. In their 2013 New England Journal of Medicine article, Ardaugh and colleagues examined the 510(k) “ancestry” of an artificial hip prosthesis, the DePuy ASR XL Acetabular Cup System, which was recalled worldwide in 2010 because of the extremely large number of patients harmed by the breakdown of the device’s components. The 510(k) for the ASR hip prosthesis cited no predicate in which all of the different components had been combined into one device. The authors identified the manufacturer’s use of multiple predicates to combine technological features of six different devices into the new device as a root cause of the ASR’s compromised safety.

The term split predicates refers to 510(k) submissions in which a manufac­turer cites one (or more) already-cleared devices to establish substantial equiv­alence of the technological features and one (or more) other devices to establish substantial equivalence of the intended uses. In a 2014 guidance, the FDA re­sponded to criticisms over its allowance of split predicates, recognizing that “the use of a ‘split predicate’ is inconsistent with the 510(k) regulatory standard.” The guidance stated the Agency’s intention to no longer accept 510(k) submis­sions with split predicates.

However, the dangers created by allowing manufacturers to use multiple predicates were not addressed by the FDA’s disavowal of split predicates. The 2014 guidance permits manufacturers to continue to cite multiple predicates in certain circumstances: “when combining features from two or more predicate devices with the same intended use into a single new device, when seeking to market a device with more than one intended use, or when seeking more than one indication for use under the same intended use.”

Further, the 2014 guidance document permits manufacturers to cite additional devices as “reference devices.” The role of reference devices is purportedly limited to “support[ing] scientific methodology or standard reference values.” The guidance states that a reference device can only be cited after the manufacturer has demonstrated that the new device has the same intended uses as the predicate device and either has the same technological characteristics or different characteristics which do not raise new questions of safety and effectiveness. But the line between using a device to establish substantial equivalence and using one to support scientific methodology or reference values is not at all clear, as the Agency’s own guidance demonstrates. The first illustrative example of a reference device that the FDA provided in the 2014 guidance was of a new knee prosthesis that used a coating that had never been used in knee prostheses in the past. The coating in the example had been used in hip prostheses, in which important safety questions such as biocompatibility, strength, abrasion, and so forth had already been assessed. The reference rule would permit the manufacturer to rely on the function of the coating in hip prostheses “to assist with the characterization of the coating on the new device.” This scenario is virtually indistinguishable from the manufacturer’s use of multiple predicates in the ASR hip prosthesis that was the focus of the Ardaugh study. Thus, although the 2014 guidance may bring the Agency’s practices into nominal conformity with the statutory standard of substantial equivalence, whether it sufficiently addressed the safety concerns that were raised remains unproven.

There is a limited body of empirical evidence supporting the claim that the use of multiple predicates compromises device safety. Maisel’s 2010 study found that the use of a large number of multiple or split predicates (6 or more) increased the risk of a device recall (p=0.003). However, given the methodo­logical problems with the Maisel study, these limited findings provide limited support for criticisms of the use of multiple devices as predicates in 510(k) de­vices.

The potential impact of the use of multiple, split, and reference predicates may be tested by the following hypotheses:

H7a: Devices that cite more than one device as a predicate are not more unsafe than devices that cite only a single predicate.

H7b: Devices that cite one (or more) devices as a predicate and one (or more) devices as a reference are not more unsafe than devices that cite only a single predicate.

C. Reform Proposals

These criticisms of the 510(k) pathway regarding safety have spawned nu­merous proposals for reform, all of which would to some extent destabilize many aspects of the overall device regulatory regime. Some of the proposals whose impact on that regime would be the most limited have focused on how the FDA uses its existing statutory authority. These include calls for the FDA to more frequently require clinical trial data before granting 510(k) clearances. The assumption underlying these calls is that clinical trials will detect safety risks. The FDA itself proposed a sub-regulatory approach in which CDRH would develop guidance defining a subset of intermediate-risk devices, called “Class IIb” devices, “for which clinical information, manufacturing infor­mation, or potentially additional evaluation in the post-market setting would typically be necessary.” Others have urged the FDA to channel more pre­market device evaluations through the more rigorous PMA pathway by assign­ing them a Class III risk designation. The FDA has recognized its authority to do so, noting that the indistinct boundary between changes that would trigger the requirement for a PMA and those that may be made through a 510(k) might lead the Agency to require a full PMA application. This suggests that the Agency has significant latitude to channel the premarket evaluation of modifi­cations through the PMA pathway. However, this authority may be limited in contexts where earlier devices of the same generic type had been assigned to a lower risk category and thus regulated under the 510(k) pathway.

Historically, many proposals have focused on the decades-long delay by the FDA in meeting its statutory obligation to classify all devices that had been on the market at the time the MDA took effect. Some pre-amendment devices would ultimately remain Class III devices, while others would be reclassified as Class I or II. Once the classification for a device type was finalized, the FDA was to order the manufacturers of devices remaining in Class III to submit for­mal PMA applications, including clinical trial data. But the Agency was slow to classify many pre-amendment devices and was slow to order submissions of PMA applications for many device types. Over the duration of this process, many critics urged the FDA to complete the device classification effort as a means of subjecting more devices to the rigorous PMA safety evaluation. The Agency finally completed the reclassification process in 2019, rendering con­cerns over these so-called “Class III 510(k) devices” irrelevant moving forward.

Proposals that would have more expansive impacts on the device regulatory regime have urged regulatory or statutory changes to address specific problems that critics claim to have identified with the 510(k) pathway. Commentators have proposed that Congress statutorily expand the FDA’s authority to require clinical trial data before granting 510(k) clearances, adopt a more stringent standard for clearance, and eliminate the allowance of the use of predicate devices that have different technologies than the subject device. Commenta­tors have also called on the FDA to promulgate regulations that would end the use of multiple predicates and limit the age of predicate devices.

A set of proposals with even more far-reaching ramifications urges more fundamental changes to the medical device regulatory scheme. The 2011 Insti­tute of Medicine report proposed eliminating the 510(k) pathway, concluding that “the FDA’s resources would be put to better use in obtaining information needed to develop a new regulatory framework for Class II medical devices and addressing problems with other components of the medical-device regulatory framework.”

This approach has been advocated by others as well. In a recent JAMA article that described the 510(k) pathway as leaving a “deadly legacy,” Doctors Eli Adashi and Katina Robison and Professor I. Glenn Cohen went beyond the Institute of Medicine’s strong reform proposal. After echoing the Institute’s urging that “Congress would do well to enact an altogether new public law that will replace the 510(k) process outright,” these authors proposed that “Congress could emulate the FDA drug approval program replete with an investigational new device application for the conduct of clinical trials to be followed by a new device application, the efficacy and safety of which would be assessed by an expert FDA Public Advisory Committee.” Other critics have proposed eliminating FDA premarket evaluation of medical devices entirely and relying instead on independent third parties to certify new devices, or on a “sharp and efficient post-market surveillance system.”

Each of these proposals, even the most limited, carries the potential to sig­nificantly disrupt some aspects of medical device regulation. And some carry the potential to disrupt the entire medical device regulatory regime. But these proposals rest on a thin layer of quantitative empirical evidence. One reason for the paucity of such evidence arises from the methodological commitments of many of the authors in the legal literature, who tend toward statutory and doc­trinal analysis, deductive reasoning, and so on. Many of the articles in the med­ical literature have focused on the 510(k) devices in the specialty area of their authors.

Another reason for the paucity of empirical support arises from perceived difficulties in obtaining the data needed to perform quantitative analyses. The Institute of Medicine, in its 2011 report, presented a highly pessimistic view of the possibility of empirical study:

About 120,000 510(k) submissions have been cleared over the past 35 years. . . . Today, CDRH cannot reconstruct the “piggy-backing” of devices without a manual review of perhaps thousands of files. Even if a computerized database allowed easy access to the history, the agency would have to review every decision manually to identify questionable ones. The cost of the exercise would be staggering; the benefit would be, it is hoped, small in terms of iden­tifying devices that should not have gotten to the market by a 510(k) clear­ance.

However, since the Institute of Medicine’s report, three significant developments have cast doubt on the pessimistic conclusion about the feasibility of a broad empirical assessment of innovation under the 510(k) pathway. First, medical scholars have demonstrated the utility of a methodology called regulatory ancestry, in which data that are publicly available from the FDA’s websites are used to reconstruct the web of subject-predicate device relationships in limited technology spaces. Using this methodology, scholars have traced the technological development of artificial hips and several types of surgical mesh, in the service of criticizing the pathway’s failure to adequately ensure device safety. Second, the data about medical devices that are available through the FDA’s websites have grown more robust and easily accessible. And third, the ability to automate the acquisition of the necessary documents and to extract the necessary information has become more readily available. Taken together, these developments support the claim that empirical study of the 510(k) pathway, including how well it ensures safety (and how it facilitates or stifles innovation), is feasible and can provide useful information that should inform any serious discussion of reform.

III. A Pilot Empirical Study of Device Safety Under the 510(k) Pathway

This Part presents a pilot study that employs a methodology developed in the medical literature, to challenge the Institute of Medicine’s pessimistic con­clusion on the feasibility of empirical study of the 510(k) pathway. Using the information contained in the 510(k) summaries of a limited set of medical de­vices, the study empirically tests the hypotheses that were generated in Part II and thus facilitates an assessment of whether the adopted methodologies, if scaled for use on a much larger data set, could support or refute many of the common criticisms of the pathway. The pilot study is also intended as a means to estimate the cost of such a study in terms of time, effort, and money.

The study examines all medical devices that the FDA has cleared for the intended use of removing blood clots (“thrombus”) from the arteries that supply blood to the brain (the “neurovasculature”) of patients experiencing an acute ischemic stroke. Interest in physically extracting thrombus from the neuro­vasculature arose from limitations in what was, in the late 1990s and early 2000s, the state-of-the-art treatment for acute ischemic stroke: the use of clot-busting, or so-called “thrombolytic” drugs such as recombinant tissue plasmin­ogen activator (rt-PA) to dissolve the thrombi. Unfortunately, the administra­tion of rt-PA markedly increased the risk of a potentially fatal hemorrhagic stroke. Further, in clinical practice very few patients were candidates for thrombolytic therapy. Moreover, thrombolytic therapy failed to prevent last­ing neurologic damage in half to two-thirds of patients who received rt-PA. These and other shortcomings of thrombolytic therapy led physicians to seek other therapeutic modalities.

One such modality was mechanical. The FDA had already cleared a number of devices for use in retrieving foreign bodies in various parts of the vascular system. These foreign bodies were typically the result of medical misadventures: guidewires and vascular catheters that had fractured, stents that had failed to deploy in a stable position, and so on. Driven by clinical need, physicians began to use these devices off-label in patients experiencing acute ischemic strokes who either failed or were ineligible for rt-PA treatment. In response, the FDA created a regulatory space specifically for devices intended for this use. By channeling new devices through the NRY product code (and the later-added POL code), the FDA provided itself with the ability to tailor its regulations and premarket assessment to the specific risks posed by these uses.

This technology space (the “NRY/POL” space) has features that make it ideal for a pilot study: The space is manageable in size, consisting of 85 devices (84 cleared through the 510(k) pathway and 1 by De Novo classification), and is relatively young, thus avoiding difficulties in obtaining information about older devices. On the other hand, the space is large enough that conducting the pilot study could provide useful information on the feasibility of more compre­hensive studies. These features facilitated testing my claim that the Institute of Medicine’s conclusion on the infeasibility of empirical study of the FDA is no longer sound.

The study had two additional purposes. First, the study was intended to characterize how well the 510(k) pathway functions to ensure the safety of the devices in the NRY/POL technology space. This is essentially a descriptive en­deavor, including an attempt to estimate how often these 510(k) devices present unacceptable risks to patients. The study was also intended to determine whether the methodologies employed here yield an estimate that is consistent with earlier studies using different methodologies.

Second, the study was designed to test whether the specific aspects of the pathway that critics have claimed adversely affect safety are actually correlated with less safe devices. The aim was to begin to construct a nuanced understanding of how the 510(k) pathway functions to ensure device safety through the testing of the hypotheses formulated in Section II.B above. The study admittedly replicates a fault of the earlier studies of the 510(k) pathway that have been used to support reform proposals, in that it focuses on a single technology space. Because of this, and because the main purpose of the study was to validate the utility and reliability of the employed methodologies, the intention was to treat the substantive study findings as hypothesis generating.

Throughout the discussion, the term technology space (or simply “space”) refers to devices that are intended for use in removing thrombi in acute strokes and that are identified by the product codes (NRY and POL) the FDA has assigned based on this intended use. A manufacturer who obtains 510(k) clearance for its first device bearing one of these codes is said to enter the space. Once a manufacturer has obtained a 510(k) for such a device, the manufacturer is said to be “in the space.” Devices, including predicate devices, with different or more general intended uses (and thus with different product codes) are said to be “outside the space.”

A. Methods

I constructed a database of all 510(k)-cleared devices bearing the product codes NRY and POL (n = 85). The primary data source was the FDA’s 510(k) Premarket Notification Database. I manually downloaded the 510(k) clearance letters for all NRY and POL devices, updating the database periodically, most recently on May 31, 2022. Thus, the study period extended from the date of the first 510(k) clearance in this space on August 11, 2004, through May 31, 2022. A second data source was the FDA’s device recall database, which was used to confirm the occurrence of recalls for the NRY/POL devices.

For each device I extracted data about the subject device, its predicate(s), and any reference devices from the 510(k) summary. This information included the dates of the 510(k) submission and the date of the clearance, the manufac­turer, whether the clearance was the manufacturer’s first entrance into the NRY/POL technology space, the type of 510(k) (Summary or Statement: Tra­ditional, Special, or Abbreviated), the reason for the 510(k) submission (demon­strating substantial equivalence to support a design, material, or process change; expanded indication), whether and what type of clinical trial data were submit­ted, the predicates that were cited as well as their clearance date and their man­ufacturer, reference devices that were cited and their manufacturers, FDA recalls (including their date and classification), and the changes that were made from the predicate device(s). For devices for which a 510(k) summary was not available on the FDA’s website (n=2), I submitted FOIA requests to obtain the summaries. The Agency provided the summaries 113 days after the requests were submitted. The data were assembled in an Excel spreadsheet.
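The extracted fields can be pictured as one record per cleared device. The following is a minimal sketch in Python, not the author's spreadsheet layout: the class name, field names, and the sample record (identifiers and dates) are all hypothetical and chosen only to mirror the categories of information listed above.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class DeviceRecord:
    """One database row per cleared device (field names are illustrative)."""
    k_number: str                  # FDA submission identifier
    product_code: str              # "NRY" or "POL"
    manufacturer: str
    submitted: date                # date the 510(k) was submitted
    cleared: date                  # date the FDA cleared the device
    first_entry: bool              # manufacturer's first clearance in the space
    clinical_data: Optional[str] = None      # e.g. "randomized"; None if absent
    predicates: List[str] = field(default_factory=list)
    reference_devices: List[str] = field(default_factory=list)
    class_i_recall: Optional[date] = None    # date of a Class I recall, if any

    def days_to_recall(self) -> Optional[int]:
        """Days from clearance to Class I recall, or None if never recalled."""
        if self.class_i_recall is None:
            return None
        return (self.class_i_recall - self.cleared).days


# Hypothetical record: the identifiers and dates are invented for illustration.
example = DeviceRecord(
    k_number="K000002",
    product_code="NRY",
    manufacturer="ExampleCo",
    submitted=date(2020, 1, 6),
    cleared=date(2020, 3, 2),
    first_entry=False,
    predicates=["K000001"],
    class_i_recall=date(2020, 6, 10),
)
print(example.days_to_recall())
```

Keeping predicates and reference devices as separate lists preserves the distinction that hypotheses H7a and H7b rely on.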

Describing the NRY/POL technology space and testing several of the hy­potheses generated in Section II.B requires knowledge of the subject-predicate relationships between the devices in this space. To facilitate this analysis, I adopted a methodology used by scholars in the medical literature to generate qualitative criticisms of the 510(k) pathway. Since 2013, medical journals have published a growing number of “ancestry studies” (variously referred to as “510(k) ancestry,” “regulatory ancestry,” and “predicate ancestry” studies) of devices cleared through the 510(k) premarket notification pathway. Ancestry studies use data available from publicly accessible FDA databases to construct a network model of devices akin to a genealogical tree, linking each new device to its predicates.

The first use of this methodology I have located was published in 2013. In a New England Journal of Medicine article, Brent Ardaugh and colleagues traced the ancestry of one particular model of hip prosthesis, the DePuy ASR XL Acetabular Cup System, through a total of 95 devices that had been 510(k)-cleared earlier. The DePuy hip had six immediate predicates, each of which had from one to six predicates, and so on, reaching back through six generations. Based on their 510(k) ancestry, the authors concluded that the pathway was seriously flawed, having permitted a complex device onto the U.S. market “that was never shown to be safe and effective,” and argued that the high failure rate of the device could have been identified if the FDA had required a clinical trial of the ASR device.

Nasim Zargar and Andrew Carr created an ancestry map of seventy-seven surgical meshes which the FDA cleared over a three-year period. They de­scribe a dense network of subject-predicate relationships between a total of 477 devices, within which the study population of seventy-seven devices were em­bedded. Zargar and Carr demonstrated that 510(k)-cleared devices that had been recalled for “design and material related flaws” had served as predicates for multiple subject devices and multiple subsequent generations of devices. Carl J. Heneghan and colleagues and Jeremy Rosh and coauthors constructed predicate ancestries of sets of meshes that are used for pelvic reconstruction surgeries. Both groups focused on the safety risks created by predicate creep.

These studies demonstrate that ancestry studies provide a powerful tool for studying the function of the 510(k) pathway. The work done to date has demon­strated the promise of this technique in qualitatively assessing the safety of de­vices cleared through the 510(k) pathway. In the study presented here, I use ancestry study methodology to facilitate quantitative analysis. To assist in the analysis, I used network analysis software (UCINET and NetDraw) to visualize this technology space as a network in which each node represents a device and each connection between nodes represents a subject-predicate relationship. Adobe Illustrator was used to generate a graphic display of this network (see Figure 1).
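The network model described here (devices as nodes, subject-predicate citations as connections) can be sketched with a few lines of standard-library Python. This is an illustration under stated assumptions, not a reproduction of the UCINET/NetDraw workflow used in the study: the function names and the device labels in `EDGES` are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple


def build_ancestry(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each subject device to the set of predicates it cites."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for subject, predicate in edges:
        graph[subject].add(predicate)
    return dict(graph)


def ancestors(graph: Dict[str, Set[str]], device: str) -> Dict[str, int]:
    """Return every ancestor of `device` with its shortest generation distance.

    Breadth-first search guarantees that the first time a predicate is
    reached, it is reached by the shortest chain of citations.
    """
    depth: Dict[str, int] = {}
    queue = deque([(device, 0)])
    while queue:
        node, dist = queue.popleft()
        for predicate in graph.get(node, ()):
            if predicate not in depth:
                depth[predicate] = dist + 1
                queue.append((predicate, dist + 1))
    return depth


# Hypothetical (subject, predicate) pairs; "X1" stands in for a predicate
# from outside the technology space.
EDGES = [("D3", "D2"), ("D3", "X1"), ("D2", "D1"), ("D1", "D0")]

ancestry = build_ancestry(EDGES)
print(ancestors(ancestry, "D3"))
```

In this toy example D3 cites two predicates directly and reaches D0 only after three generations, which is the kind of chain the predicate creep criticism concerns.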

To determine the proportion of devices cleared through the 510(k) pathway that are unsafe, a simple fraction was used:

(Number of cleared devices that are not safe) / (Total number of cleared devices)

Unfortunately, neither the numerator nor the denominator of this fraction is easily (if at all) determinable. In an earlier work, I have pointed out some of the difficulties in establishing the numerator of this fraction in the context of PMA devices, including serious problems with the underreporting of device failures and injuries and the poor quality of information in the MAUDE and other data­bases. The same problems exist in the 510(k) context. In accordance with most other investigators, in that previous work I adopted a surrogate marker of device failure: an FDA-declared Class I recall. Most, but not all, commentators agree that this marker is underinclusive—that many flawed devices will not be subjected to a Class I recall. Thus, using Class I recalls as a marker of the failure of the 510(k) pathway to ensure safety will underestimate the number of device failures and will skew the analysis toward an overly optimistic view of the safety function of the pathway. However, this approach seems to strike the most acceptable balance between under- and over-inclusiveness while avoiding the danger of investigator subjectivity that would attend an approach in which the investigator determined which recalls were markers of dangerous devices. Thus, throughout this study, the occurrence of a Class I recall is used as a sur­rogate marker for an unsafe device.

Determining the denominator of this fraction is also difficult. Not all de­vices that obtain 510(k) clearance are marketed, and it is often not feasible to determine how long devices that were marketed remained at risk of failure. Further, simply using the number of 510(k) clearances as the denominator ig­nores the small incremental changes to several successively modified devices, some of which should be considered a single device. In the PMA context two studies have addressed this problem by treating all successive modifications of an originally approved PMA device as a single device for the purposes of cal­culating a recall rate. However, this approach is not useful in the 510(k) con­text: unlike the single lines of descent that characterize each PMA device, every 510(k) device can serve as the predicate for multiple subsequent devices, creat­ing complex, branching descent patterns that render a simple grouping of de­vices impossible. Given these difficulties, it appears reasonable to continue to use the standard approach in which every 510(k) clearance is counted as a device at risk of recall. This will, of course, skew the calculated fraction toward underestimating the true rate of device failure, compounding the tendency to­ward underestimation that the use of Class I recalls for the numerator creates. Thus, any calculated failure rate will underestimate the true magnitude of the failings of the 510(k) pathway from a safety perspective.

Statistical comparisons were made using Microsoft Excel, Social Sciences Statistics, and Minitab. Continuous variables were compared using unpaired t-tests with unequal variance. Dichotomous variables were compared using Chi-squared tests. A p value of less than .05 was considered statistically significant.

B. Results

1. Descriptive Findings

The NRY/POL technology space was created by the 510(k) clearance of the Merci Retriever, made by Concentric Medical, on August 11, 2004. Since then, eleven additional companies have entered this technology space, market­ing a total of 85 devices (84 through 510(k) clearances and 1 through the De Novo pathway) that use a variety of mechanical means (snares, coils, etc.) to physically withdraw thrombi as well as aspiration techniques, which remove thrombi using suction.

Figure 1 displays the devices in the NRY/POL space as a network, with each device represented as a square or circular node, connected to each of its predicates by a line. Arrows point from the subject to the predicate device. The vertical axis is based on time, with the earliest cleared devices at the bottom of the figure. Manufacturers are displayed along the horizontal axis, grouped as noted in the figure legend by the relationships between the different companies.

 

Figure 1. Network Visualization of 510(k)-Cleared NRY and POL Devices

Figure 1: Network Visualization of NRY and POL 510(k)-cleared devices as of May 31, 2022. Concentric, Stryker, and InNeuroCo are grouped together because these companies formed relationships through acquisition and distribution agreements. Medtronic was grouped with MicroTherapeutics because the former owns and operates the latter. Time is represented on the vertical axis, with nodes representing more recent clearances located toward the top of the figure. Lines represent a subject-predicate relationship, with the arrow pointing from the subject to the predicate device. Black nodes represent devices that have been subjected to a Class I recall. Circular nodes represent devices with clinical trial data in their 510(k) clearances. Diamond shaped nodes represent predicate devices with product codes other than NRY and POL.

Although twelve companies have obtained clearances to market the 85 de­vices in this space, three (Penumbra, Concentric, and Micro Therapeutics) dom­inated the space, having obtained 22, 18, and 18 clearances respectively. These companies also acquired or signed distribution deals with several of the smaller participants in this space. Thus, the three dominant companies were associated with 22, 32, and 19 of the cleared devices in this space, respectively, accounting for 86% of the devices in the space.

Ten of the 510(k)-cleared devices (11.9%) in this space were subjected to a Class I recall. All of the recalls arose from catheters that exhibited an increased risk of tip or shaft fracture, separation, or damage, which could result in embolization of device fragments deeper into the cerebral blood vessels. The recalls involved a total of 41,523 units worldwide.

The FDA had determined the underlying cause for the device problems that led to three of the recalls; for the other seven the cause remained undetermined. One of the recalls was related to a problem with a specific lot of a raw material for a guidewire used to position the catheter; the Agency at­tributed the recall to a “component change control” issue. Assuming that this problem would not have been prevented by increased premarket scrutiny, the recall rate for 510(k)-cleared devices in the NRY/POL space attributable to problems that could have been prevented by more stringent premarket scrutiny was as high as 10.7%.
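The two recall rates reported above follow directly from the fraction defined in the Methods, using the 84 devices cleared through the 510(k) pathway as the denominator. A short worked check, with variable and function names chosen here only for illustration:

```python
def recall_rate(recalled: int, cleared: int) -> float:
    """Percentage of cleared devices subjected to a Class I recall."""
    if cleared <= 0:
        raise ValueError("cleared must be a positive count")
    return 100.0 * recalled / cleared


CLEARED_510K = 84       # 510(k) clearances in the NRY/POL space
CLASS_I_RECALLS = 10    # devices subjected to a Class I recall
RAW_MATERIAL_LOT = 1    # recall attributed to a component change control issue

overall = recall_rate(CLASS_I_RECALLS, CLEARED_510K)
preventable = recall_rate(CLASS_I_RECALLS - RAW_MATERIAL_LOT, CLEARED_510K)

print(f"Overall Class I recall rate: {overall:.1f}%")             # 11.9%
print(f"Excluding the raw-material recall: {preventable:.1f}%")   # 10.7%
```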

Four manufacturers had devices recalled, with two manufacturers account­ing for 80% of all Class I recalls. Of these, one was the largest participant in the space (Penumbra), which had 5 of its 22 cleared devices (22.7%) recalled. The other was a small participant (Imperative), which had 3 of its 4 cleared devices (75%) recalled. No other manufacturer had more than one device recalled. The mean interval between a 510(k) clearance for a device and a Class I recall for that device was 516 ± 415 days. The intervals ranged from 63 to 1,196 days. All but one recall was issued within three years of the device clearance.

To evaluate the possibility that companies with greater experience in the technology space produced safer devices, a post-hoc linear regression analysis was conducted to test whether the number of cleared devices predicted the percent of those devices recalled on a manufacturer-by-manufacturer basis. There was no significant correlation between the number of a manufacturer’s cleared devices and the percentage of those devices that were recalled, R² = .01, p = .74.

Clinical trials were presented in 13 (15.5%) of the 510(k) submissions, including randomized clinical trials in 6 submissions, single arm trials in 3, real world data and observational reports in 2, and retrospective subgroup analyses of earlier trials in 2. In addition, randomized trial data were included in the submission for the one de novo device in the NRY/POL space. Simulation studies were not counted as clinical trial data in the analysis, given their closer relationship to bench testing. Clinical trial data were more common in submissions marking manufacturers’ first entry into this technology space, with 33% of first-time clearances containing clinical data compared with 12.5% of subsequent clearances. This finding suggests that the FDA may more frequently require clinical trial data for manufacturers’ first entry into the NRY/POL technology space. However, given the small sample size, this apparent difference did not reach statistical significance, χ² (1, N=84) = 3.412, p = .065, and should not be extrapolated to a broader cohort of 510(k) devices.

The use of multiple or split predicates was common, occurring in 34 (40.5%) of clearances. The use of reference devices in 510(k) submissions for NRY/POL devices began in 2016; since then, 18 cleared submissions cited at least one reference device in addition to a predicate. In total, 47 of the 510(k) clearances (56.0%) cited more than one device (either multiple predicates or one or more predicate(s) plus one or more reference devices) to support a finding of substantial equivalence.

Of the 510(k)-cleared devices in this space, 66.7% (56 of 84) served as predicates for later 510(k) clearances. The one de novo device also served as a predicate in later 510(k) clearances. In total, seven of the twelve manufacturers obtained clearances for devices that later served as predicates. Devices that served as predicates were cited by subsequent devices a mean of 2.5 ± 1.5 times, ranging from 1 to 7 times. The more 510(k) clearances a manufacturer had in this space, the more times its own devices were cited as predicates: a linear regression of predicate cites versus 510(k) clearances was statistically significant, R² = .939, p < .001. This is consistent with manufacturers’ practice of citing their own devices almost exclusively as predicates.

2. Empirical Testing of Specific Criticisms of the 510(k) Pathway

This Section reports the results of testing the hypotheses generated in Sec­tion II.B. The formulation of the hypotheses was updated to incorporate the use of Class I recalls as the marker of unsafe devices.

H1a: Devices that are cleared with clinical trial data (Gen S0 devices) are not less likely to be subjected to a Class I recall than devices that are cleared without clinical trial data.

In the NRY/POL technology space there was no evidence that 510(k)-cleared devices with clinical trial data were safer than 510(k) devices that were cleared without clinical data. For Gen S0 devices (which were 510(k) devices submitted with clinical trial data, n = 13), two (15.4%) had a Class I recall, compared to 11.3% of devices (8 of 71) cleared without clinical trial data. Given the sample size, the difference between the two groups is not statistically significant, χ² (1, N=84) = 0.178, p = .67, and the null hypothesis cannot be rejected.
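
The comparison above can be checked with a short sketch. It assumes the reported statistic is an uncorrected Pearson chi-square on a 2×2 table (the text does not say whether a continuity correction was applied); the function name `chi_square_2x2` is illustrative and not taken from the study.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With one degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# H1a: rows are (cleared with clinical data, cleared without);
# columns are (recalled, not recalled), using the counts reported in the text.
chi2, p = chi_square_2x2(2, 11, 8, 63)
print(f"chi2 = {chi2:.3f}, p = {p:.2f}")
```

Under that assumption the sketch gives a statistic of about 0.178 and a p-value of about .67, in line with the values reported for H1a. The same function can be applied to the other two-group comparisons in this Section.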

Hypotheses H1b and H1c examined the question of whether clinical trial data might exert a protective effect that would manifest itself over subsequent generations of devices. For H1b, the risk of a Class I recall for the cohort of Gen S0 and Gen S1 devices (which cited a Gen S0 device as a predicate) was compared to the risk of Class I recall for all other devices. Of the Gen S0 and Gen S1 device cohort (n = 38), four (10.5%) were recalled, which was not significantly different from the other devices, for which six out of 46 (13.0%) were recalled, χ² (1, N=84) = 0.126, p = .72. Carrying the analysis one generation further, comparing the recall rates of the cohort of Gen S0, Gen S1, and Gen S2 devices with all other devices led to similar findings. For the 54 devices in the former category, 5 (9.3%) were recalled, whereas for the 30 devices in the latter category, 5 (16.7%) were recalled, χ² (1, N=84) = 1.009, p = .32. Thus, over two to three generations of devices in the NRY/POL space, there was some evidence of a protective effect of clinical trial data. However, the failure to reach statistical significance indicates that this finding cannot be generalized to the entire 510(k) device cohort.

H2a: Gen U1 devices are not more likely to be subjected to a Class I recall than other devices.

H2b: The cohort of Gen U1 and Gen U2 devices is not more likely to be subjected to a Class I recall than other devices.

The inclusion of unsafe devices in the ancestry of 510(k) devices was associated with compromised safety in the NRY/POL space. Devices that were subjected to a Class I recall (Gen U0 devices) were cited as predicates by 18 later Gen U1 devices. The proportion of Gen U1 devices subjected to a Class I recall was 22.2%, compared with 8.8% of devices that cited only non-recalled devices as predicates, χ² (1, N=84) = 2.487, p = .11. The roughly 2.5-fold difference is not statistically significant and thus cannot be generalized to the cohort of all 510(k) devices, although this may be a function of the small sample size.

Network visualization of the subject-predicate relationships in the NRY/POL space provides additional support for rejecting the null hypothesis in that space. As seen in Figure 1, of the ten Class I recalls in this data set, a majority (n=6) occurred in two clusters, each of which involved three devices. In one cluster, Class I recalls occurred in three successive generations of devices (K183043, K202182, and K210996), while in the other, two recalled devices both cited the same recalled predicate (K191946 and K202251 both cited K190010 as their predicate).

Carrying the analysis through to the next generation, there were 28 Gen U2 devices. Of these, 21.4% were subjected to a Class I recall, compared with 7.0% of devices that did not have a recalled device as a predicate or the predicate’s predicate, χ² (1, N=84) = 3.632, p = .057. Again, while failing to reach the adopted threshold for statistical significance, this finding suggests that there may be a correlation between recalled devices and subsequent generations of devices with those recalled devices in their predicate ancestries, and that the hypothesis should be tested in a larger data set.
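
The generation labels used in H2a and H2b can be illustrated with a minimal sketch of a breadth-first walk over subject-predicate citations. It is not the study's actual procedure. The function name `downstream_generations` is hypothetical, the ordering of the three-generation cluster as a simple chain is an assumption, and `DEVICE_A` and `DEVICE_B` are invented non-recalled devices added only to show a second generation. The two citations of K190010 are taken from the text.

```python
from collections import deque

def downstream_generations(predicates_of, seeds, max_gen):
    """Return {device: shortest citation distance (1..max_gen) from any seed device}.

    predicates_of maps each subject device to the list of predicates it cites.
    seeds is the set of Gen U0 devices (here, devices with a Class I recall).
    """
    # Invert subject -> predicates into predicate -> subjects to walk downstream.
    cited_by = {}
    for subject, predicates in predicates_of.items():
        for predicate in predicates:
            cited_by.setdefault(predicate, []).append(subject)

    generation = {}
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        device, gen = queue.popleft()
        if gen == max_gen:
            continue  # do not expand beyond the requested number of generations
        for subject in cited_by.get(device, []):
            if subject not in generation:
                generation[subject] = gen + 1
                queue.append((subject, gen + 1))
    return generation

# ASSUMPTION: the three-generation cluster is a chain ordered by 510(k) number.
# DEVICE_A and DEVICE_B are hypothetical devices, not part of the data set.
predicates_of = {
    "K202182": ["K183043"],
    "K210996": ["K202182"],
    "K191946": ["K190010"],
    "K202251": ["K190010"],
    "DEVICE_A": ["K202251"],
    "DEVICE_B": ["DEVICE_A"],
}
recalled = {"K183043", "K202182", "K210996", "K190010", "K191946", "K202251"}

generations = downstream_generations(predicates_of, recalled, max_gen=2)
gen_u1 = sorted(d for d, g in generations.items() if g == 1)
gen_u2 = sorted(d for d, g in generations.items() if g == 2)
print("Gen U1:", gen_u1)
print("Gen U2:", gen_u2)
```

A recalled device that itself cites a recalled predicate is both a seed and a Gen U1 device in this sketch, which is why recall rates can be computed within each downstream generation.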

H3: Devices with review times in the shortest quartile are not more likely to be subjected to a Class I recall than all other devices.

The mean review time for the 510(k) clearances in this space was 124 ± 91 days, well over the FDA’s goal of 90 days. More than half (46) of all clearances had review times in excess of 90 days, with the longest review times running well over one year (454 days). There was no clear trend toward a decrease in review time over the seventeen-year span studied, despite the FDA’s repeated claims that review times would be shortened.

There was no correlation between shorter review times and compromised device safety in this cohort of devices. Devices in the shortest quartile of review times (review time < 57 days) had a recall rate of 14.3%, compared with a recall rate of 11.1% for all other devices, χ² (1, N=84) = 0.151, p = .70.

H4: Devices cleared through the Special and Abbreviated pathways are not more likely to be subjected to a Class I recall than devices cleared through the Traditional 510(k) pathway.

None of the devices in this technological space were cleared through the Abbreviated 510(k) pathway. Manufacturers made frequent use of the Special 510(k) pathway: 27 (32.1%) of the clearances were through the Special path­way. As would be expected, none of these submissions contained clinical trial data and all cited only devices by the same manufacturer as predicates. Review times were significantly shorter for devices cleared through the Special 510(k) pathway, 41 ± 19 days versus 164 ± 84.5 days for devices using the Traditional pathway, p < .001.

There was no evidence that devices cleared through the Special 510(k) pathway were less safe than devices cleared through the Traditional pathway. For devices cleared through the Special 510(k) pathway, 3 (11.1%) were subjected to a Class I recall. For devices cleared through the Traditional 510(k) pathway, 7 (12.3%) were subjected to a recall, χ² (1, N=84) = 0.024, p = .88. This finding appears to conflict with Maisel’s analysis, in which devices cleared through the Special and Abbreviated pathways were more likely to be recalled. However, because of the small sample size in the current study, the apparent lack of a difference between the risk of recall for devices cleared through the Traditional and the Special 510(k) pathways should not be generalized to the larger cohort of 510(k) devices.

H5a: Devices for which the interval between the clearance date of the predicate (or of the youngest predicate, where more than one predicate is cited) and the clearance of the subject falls within the shortest quartile are not more likely to be subjected to a Class I recall than other devices.

H5b: Devices for which the interval between the clearance date of the predicate (or of the oldest predicate, where more than one predicate is cited) and the clearance of the subject falls within the longest quartile are not more likely to be subjected to a Class I recall than other devices.

There was no evidence that the use of very young or very old predicates was associated with compromised device safety. To evaluate the question of whether younger predicates increased the likelihood of recall (which Maisel found in his 2010 study), a data set was constructed in which devices citing multiple predicates were assigned a single predicate, which was the youngest of all the predicates cited in the 510(k)-clearance letter. For devices citing only a single predicate, the age of that predicate was used. The cleared devices whose youngest predicates fell into the youngest quartile had a 9.5% recall rate, compared with a 14.5% recall rate for all other devices, χ² (1, N=84) = 0.151, p = .70. Unlike in Maisel’s earlier study, in this population of devices younger predicate age was not associated with a higher risk of recall.

To evaluate whether citing older predicates increases the likelihood of recall, a data set was constructed in which devices citing multiple predicates were assigned a single predicate, which was the oldest of all predicates cited in the 510(k). For devices citing only a single predicate, the age of that predicate was used. Devices whose predicates in this data set fell in the oldest quartile had a recall rate of 9.5%, compared with 14.5% for all other devices, χ² (1, N=84) = 0.151, p = .70. Thus, including older technology in a 510(k) device in this technology space did not appear to correlate with a higher risk of recall.

H6: Predicate Creep: There is no difference in the risk of Class I recall for Gen S0, Gen S1, and Gen S2 devices.

If predicate creep adversely impacts device safety, later generations of devices after an initially safe device would be expected to have a higher risk of recall. However, in the set of NRY/POL devices the findings were that 14.3% of Gen S0 devices (which had clinical trial data), 8% of Gen S1 devices, and 5.9% of Gen S2 devices were subjected to a Class I recall, χ² (2, N=56) = 0.715, p = .70. Thus, there is no evidence that predicate creep adversely affected safety over three generations in this small data set. However, this finding cannot exclude the possibility that more than two iterative changes are necessary for evidence of predicate creep to emerge. A larger database might facilitate testing this hypothesis.
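
The three-generation comparison can be sketched as a chi-square test on a 3×2 table. The per-generation counts below (2 of 14, 2 of 25, and 1 of 17) are back-calculated from the reported percentages and N = 56. They are an assumption rather than figures stated in the text, and the function names are illustrative. A comparison of three groups has two degrees of freedom, for which the upper-tail probability has the closed form exp(−χ²/2).

```python
import math

def chi_square_groups(groups):
    """Pearson chi-square for a k x 2 table given (recalled, total) for each group."""
    total_recalled = sum(recalled for recalled, _ in groups)
    total_n = sum(n for _, n in groups)
    rate = total_recalled / total_n  # pooled recall rate under the null hypothesis
    chi2 = 0.0
    for recalled, n in groups:
        expected_recalled = n * rate
        expected_not_recalled = n * (1 - rate)
        chi2 += (recalled - expected_recalled) ** 2 / expected_recalled
        chi2 += ((n - recalled) - expected_not_recalled) ** 2 / expected_not_recalled
    return chi2, len(groups) - 1  # statistic and degrees of freedom

def p_value_df2(chi2):
    # Valid only for exactly two degrees of freedom.
    return math.exp(-chi2 / 2)

# ASSUMED counts for Gen S0, Gen S1, and Gen S2, inferred from 14.3%, 8%, and 5.9%.
groups = [(2, 14), (2, 25), (1, 17)]
chi2, df = chi_square_groups(groups)
print(f"df = {df}, chi2 = {chi2:.3f}, p = {p_value_df2(chi2):.2f}")
```

With these assumed counts the sketch yields a statistic of about 0.715 and a p-value of about .70 on two degrees of freedom, matching the reported values.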

H7a: Devices that cite more than one device as a predicate are not more likely to have a Class I recall than devices that cite only a single predicate.

H7b: Devices that cite one or more devices as a predicate and one or more devices as a reference are not more likely to have a Class I recall than devices that cite only a single predicate.

Devices that cited multiple predicates were not more likely to have a Class I recall than devices that cited only a single predicate. Of the 34 devices that cited multiple predicates (H7a), three had Class I recalls (8.8%), while 7 of the 50 devices (14%) that cited only a single predicate had recalls, χ² (1, N=84) = 0.517, p = .47. Devices that cited either more than one predicate or a predicate plus one or more reference devices (H7b) were nearly twice as likely to have a Class I recall: Of devices citing multiple predicates or predicates plus references, 7 of 47 (14.9%) had a recall, compared with 3 of 37 (8.1%) devices citing only a single predicate. However, this result was not statistically significant, χ² (1, N=84) = 0.909, p = .34, again possibly related to the small sample size.

IV. Study Interpretation

The study presented in Part III was intended to determine whether the reg­ulatory ancestry and other methodologies that were used could generate infor­mation that would support or refute many of the common criticisms of the pathway and to estimate the cost of a much larger study in terms of time, effort, and money. A second purpose was to provide descriptive information about how often 510(k) devices present unacceptable risks to patients, and a third purpose was to begin to develop a nuanced understanding of the safety function of the 510(k) pathway. The study findings are interpreted in relation to these purposes in Sections IV.A, B, and C, respectively.

A. The Feasibility and Utility of Empirical Study of the 510(k) Pathway

The pilot study presented in Part III supports the claim that, contrary to the Institute of Medicine’s pessimistic conclusion in its 2011 report, empirical study of the 510(k) pathway can provide reliable and useful information about 510(k) devices and the 510(k) pathway, with a reasonable investment in time, effort, and money. And the study suggests that the methodologies employed are scala­ble and could provide a means to evaluate the entire cohort of 510(k) devices that have been cleared in the past fifteen to twenty years.

Regarding the reliability of the information provided, the data collected appear to completely represent the devices in the NRY/POL space and the findings are not inconsistent with prior studies of the frequency of 510(k) device recalls. To assess whether the use of the FDA’s website and FOIA requests facilitated the capture of all relevant information, medical, general media, and industry publications were reviewed for mentions of other devices used for the extraction of clots from the neurovasculature of stroke patients. None of the discussions of devices and recalls in these sources mentioned cleared devices that were not included in the data set. Thus, there is no evidence that the study failed to capture any 510(k) devices in the NRY/POL space or any Class I recalls involving those devices.

To assess whether the findings were reliable, the percentage of NRY/POL devices that were recalled was compared to earlier studies of recalls of 510(k) devices. These prior studies, which used a variety of methodologies, reported a range of recalls from 0.5% to 10.7%. At the low end of the range, Professor Ralph Hall reported that less than 0.5% of devices that were cleared between 2005 and 2009 were subjected to a Class I recall over that period of time. However, Hall’s methodology underestimated the true percentage of 510(k)-cleared devices that are unsafe. First, as discussed earlier, using Class I recalls as a marker of unsafe devices will underestimate the true percentage. And second, Hall’s methodology examined only a short time period for many of the devices. Devices that were cleared in later parts of the study period (in 2008 and 2009) were on the market for two years or less, meaning that those devices might well have been recalled after the close of the study period. Thus, Hall’s study provides only an estimate of the lower end of the range of the percent of 510(k) devices that are unsafe, and almost certainly represents a best-case scenario.

Dubin’s study found that 0.8% of devices cleared over a ten-year period were subjected to a Class I recall. This study also found that 10.7% of cleared devices were subjected to any recall. As discussed above, counting all recalls likely treats as unsafe many devices that were not systematically flawed, because Class III and many Class II recalls are for issues that do not reflect a systematic device flaw. This approach provides an estimate of the upper end of the range of probabilities that 510(k) devices are unsafe. Maisel’s study also counted all recalls as evidence that a device was unsafe, and thus provided an estimate of the upper bound; Maisel’s study found that 8.5% of 510(k) devices are unsafe.

The instant study found a Class I recall rate for 510(k) devices of 10.7 to 11.9%, which is higher than the upper bound found in the earlier studies (the overall recall rate in Dubin’s study). The higher rate in the instant study may appear to cast doubt on the reliability of the study findings. However, several factors likely combine to explain this seeming discrepancy. The higher rate in the present study may simply reflect statistical uncertainty arising from the small sample size. The higher rate may also arise from better source data—the FDA’s online databases may more robustly capture the numbers of recalls that have occurred in recent years. And the higher recall rate may also reflect a selection bias, in that the devices in the NRY/POL space may represent a cohort of de­vices that, while legally classified as intermediate risk, in fact present a higher risk owing to the conditions under which they are used. Therefore, it appears reasonable to conclude that the finding that 10.7 to 11.9% of 510(k) devices are unsafe is likely within the anticipated upper and lower boundaries and is thus reliable.

Regarding the production of useful information, the study showed that many of the criticisms of the pathway can be formulated as empirically testable hypotheses. These criticisms include the limitations on the FDA’s authority to require clinical trial data in 510(k) pathway submissions, the potential dangers created by predicate creep, and the use of multiple predicates, among others. Although the small sample size limited the conclusions that can be drawn about larger cohorts of 510(k) devices, the study showed that questions over whether important features of the pathway compromise device safety can be studied em­pirically.

The methodologies employed in the study presented here also have important limitations. First, normative questions, such as whether the current regulatory regime is striking the appropriate balance between ensuring safety and fostering innovation, are not amenable to empirical testing. However, the data provided by studies such as the one presented here can provide important quantitative information about the pathway’s function on both sides of the safety/innovation equation. Second, determining how often FDA-regulated medical devices present unacceptable risks remains an imprecise effort. The use of Class I recalls as a surrogate means that the true proportion of devices that are unsafe will be underestimated and that some correlations between specific aspects of the pathway and compromised device safety may go undetected. Third, some aspects of the pathway are not amenable to empirical testing because no comparator group is available and no useful surrogates can be identified, even if quantitative data could be generated. Thus, some of the specific safety criticisms discussed in Part II above (e.g., that the MDA’s statutory structure was not designed to ensure safety and that the substantial equivalence standard for 510(k) devices is insufficiently stringent) are not promising subjects of empirical study. Finally, the study revealed that the regulatory ancestry methodology may provide only limited insights into the downstream effects of clinical trial data or an unsafe device in the predicate ancestry web. The second generation of devices (Gen U2) that included an unsafe device (Gen U0) in their subject-predicate ancestries comprised over half of all devices in the data set. Carrying the analysis one or two generations further would likely leave too few devices in the cohort without an unsafe predicate in their ancestry to permit statistically meaningful comparisons. Such a finding, though, is important on its own: Reliance on technologies in unsafe predicates would be almost universal.

In addition, using the regulatory ancestry methodology to construct a complete map of the subject-predicate relationships may omit certain devices and relationships. This may occur because the FDA sometimes reclassifies 510(k) devices to Class III because of information that is obtained post-clearance indicating a lack of safety. The manufacturers of these devices must then submit a PMA application. These devices will no longer appear in the FDA’s 510(k) database, which would leave gaps in the predicate ancestry map. Although there were no reclassified devices in the NRY/POL space, any larger study would need to capture these devices and their relationships to permit an accurate assessment of the proportion of devices that were recalled and to facilitate the kind of hypothesis testing performed in the instant study.

Finally, regarding the time, effort, and cost of empirical study, the Institute of Medicine’s pessimistic conclusion is undercut by the pilot study presented here. The data set analyzed in the study was assembled almost entirely from the FDA’s publicly available databases using the Agency’s web-based search engines to obtain the 510(k) summaries for the devices in the NRY/POL space. The accessible summaries contained the necessary information in easily extracted formats for the devices cleared since 2008. The summaries for some devices cleared between 2004 and 2007 were less robust, sometimes referring to predicates only by a device trade name without an associated 510(k) number. This necessitated further research in a few cases. The summaries for two of the 84 devices (2.4%) that were cleared under the 510(k) pathway were not available through the FDA’s online databases, necessitating requests under the Freedom of Information Act. Although the FDA states that FOIA requests for information about 510(k) devices will take eighteen to twenty-four months, the Agency provided the summaries in response in 113 days. Likely this was because a request for summaries is simple and there was no need to redact information such as trade secrets in those summaries.

Thus, in spite of the limitations discussed above, the endless volume of crit­icism and the many calls to reform or even to eliminate the 510(k) pathway highlight the importance of empirically informed policy analysis. The infor­mation provided by the study presented in Part III can be useful for determining whether the reform proposals have a solid grounding. Based on the experience of conducting this pilot study, it seems likely that the methodologies used can be scaled for much larger studies. Although manually downloading and extract­ing the necessary information was moderately time consuming, one recent re­port and an ongoing project suggest that most if not all of this effort may be automated. Based on these considerations, the Institute of Medicine’s con­clusion that empirical study of the 510(k) pathway is not feasible is no longer supportable.

B. Using the Study Findings to Describe the 510(k) Pathway

The Institute of Medicine noted that a complex web of subject-predicate relationships exists between 510(k) devices, and prior work in the medical literature using the regulatory ancestry methodology has reinforced this concept in several technology spaces. The devices comprising the NRY/POL technology space are likewise characterized by a dense and complex set of subject-predicate relationships. Figure 1 (shown above) displays this web of relationships in the NRY/POL space. There were 145 subject-predicate relationships between the 85 devices in the technology space, so the mean number of predicates cited for each subject device was 1.7 ± 1.2. And because one-third of devices were never cited as a predicate, those devices that did serve as predicates were cited a mean of 2.5 ± 1.5 times.

The dense interlinking of devices was, however, limited in one important way. The introduction of new devices took place largely through a process of manufacturers modifying their own devices—and using their existing devices as predicates—without citing other manufacturers’ devices as predicates. To the extent that citing a device as a predicate reflects the incorporation of the predi­cate device’s technology, iterative evolution occurred predominantly through a process in which each manufacturer drew upon and modified its own technolo­gies; there was little evidence of technology “borrowing” from one manufac­turer by another. Thus, manufacturers tended to develop their technologies in parallel, with minimal crossovers between companies. The data used in this study do not facilitate a determination of whether this is due to the risk of patent infringement liability, or to manufacturers’ greater familiarity with their own technologies and products, or to other causes.

Consistent with prior reports, clinical trial data were included in only a small number of 510(k) clearances in the NRY/POL space, amounting to less than 16% of the total. And randomized clinical trials, which are considered the gold standard, were included in just 7% of all clearances. Clinical trial data were more commonly found in manufacturers’ first entry into this technology space, with 33% of first-time clearances containing clinical data compared with 12.5% of subsequent clearances. This difference fell just short of statistical significance and thus cannot be extrapolated to broader cohorts of 510(k) devices. However, this finding suggests that the FDA may more frequently require clinical trial data for manufacturers’ first entry into at least one technology space.

The finding of a 10.7 to 11.9% recall rate in the study presented here adds qualified support to those who have espoused the general criticism of the 510(k) pathway, that it fails to adequately ensure device safety. Studies by Hall and Dubin have reported a far lower rate of Class I recall for 510(k) devices. Hall reported that less than 0.5% of devices were subjected to a Class I recall over the time period from 2005–2009. Based on the findings of that study, Hall concluded that reform of the 510(k) pathway could improve the safety of just 0.22% of devices. Other authors have also argued that the 510(k) pathway adequately ensures device safety and strikes a desirable balance between safety and innovation.

Importantly, though, the use of Class I recalls underestimates the scope of the dangers posed by devices in this technological space. As discussed earlier, the use of Class I recalls as a surrogate marker of dangerously flawed technology is a compromise that knowingly underestimates the number of devices that cause broad and serious harm. Equally important, recognizing the close similarities that 510(k) devices have to their predicate ancestry and to the devices that descend from them, the use of the total number of 510(k) clearances as the denominator in risk calculations underestimates the true risk. Thus, the actual proportion of 510(k) clearances that have allowed devices with unacceptably poor safety assurances may be several multiples of the risk found here. The instant study suggests that, at least for certain technology spaces, the proportion of 510(k) devices that are unsafe is closer to or even higher than the top end of the range of estimates in the prior literature. The results of this study provide support for the frequently stated general criticism that the 510(k) pathway fails to adequately ensure safety.

Two other important observations are supported by analysis of the fre­quency of recalls. First, consistent with earlier work on the 510(k) and PMA pathways, recalls tended to occur relatively early in the device life cycle. In the instant study, Class I recalls occurred at a mean of less than two years (516 ± 415 days) after the FDA granted the 510(k) clearance, and no Class I recalls occurred more than 3.3 years after clearance. These findings are consistent with earlier studies, including Maisel’s study, in which recalls occurred within the first four years after 510(k) clearance. These findings are also consistent with findings on recalls involving PMA devices. Combined with the earlier stud­ies, the instant study suggests that flawed device technologies tend to manifest themselves relatively early in a device’s life cycle. Although this might be reas­suring—flawed technologies might be recognized before too many patients had been exposed—many new devices are rapidly adopted into clinical practice, leading to large numbers of patients being exposed before the flaws can be rec­ognized.

Second, although there were twelve manufacturers in the technology space by the end of the study period, 80% of Class I recalls were for devices made by just two manufacturers. This clustering was not associated with lack of experi­ence of the manufacturers. This suggests that the FDA should focus closely on companies whose devices are recalled, especially when the percentage of that company’s devices recalled exceeds a certain threshold.

C. Developing a Nuanced Understanding of the Safety Function of the 510(k) Pathway

One purpose of this study was to begin to construct a nuanced understand­ing of how the 510(k) pathway functions to ensure device safety, by subjecting commonly stated criticisms of the pathway to empirical testing. In Section II.B, many of these criticisms were formulated as null hypotheses whose rejection is necessary to conclude that those criticisms are supported by the study. Due at least in part to the small sample size, the study did not support the rejection of any of the null hypotheses. However, the study did provide evidence suggesting that one of those criticisms is supported and that one might be supported. The study also suggests that many of the other specific criticisms of the 510(k) path­way are not well founded. For all of the hypotheses, testing in a larger data set is warranted.

The study provided evidence suggesting that the clearance of one unsafe device may lead to the clearance of several generations of unsafe devices that have the original unsafe device in their predicate ancestries. In the taxonomy created in Section II.A, this was referred to as the bad predicate effect. In the NRY/POL space, the majority of the Class I recalls occurred in two clusters, in which recalled devices had direct subject-predicate relationships. And the statistical analysis, while failing to reach statistical significance at the p = .05 level, indicated a strong trend, with a threefold increase in risk for Class I recall in devices whose predicates or whose predicates’ predicates had been recalled. If confirmed in a larger study, this would suggest that the FDA should be provided with express statutory authority that would streamline the revocation of 510(k) clearances of a recalled device and other devices downstream. This would make it easier for the FDA to remove devices in response to evidence of a lack of safety, including devices downstream from recalled devices.

The study also found that the risk of Class I recall is nearly doubled for devices that cite multiple predicates or a predicate plus a reference device (14.9% versus 8.1%). However, this result was not statistically significant, again possibly related to the small sample size. This finding should be evaluated in a larger study. If such a study indicated an increased risk, the FDA on its own or at Congress’s direction should renounce the practice of allowing multiple predicates and reference devices in 510(k) clearances.
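The "nearly doubled" figure follows directly from the two reported percentages. A minimal sketch of the arithmetic, using only the 14.9% and 8.1% rates given above (the helper name `risk_ratio` is introduced here for illustration):

```python
def risk_ratio(risk_exposed, risk_unexposed):
    """Ratio of two risks expressed as proportions."""
    if risk_unexposed <= 0:
        raise ValueError("risk_unexposed must be positive")
    return risk_exposed / risk_unexposed

# Recall rates reported in the text: 14.9% versus 8.1%.
rr = risk_ratio(0.149, 0.081)
difference = 0.149 - 0.081

print(round(rr, 2))          # 1.84
print(round(difference, 3))  # 0.068, i.e. 6.8 percentage points
```

The ratio alone says nothing about significance; that depends on the underlying device counts, which is why the text calls for a larger study.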

By contrast, the study suggests that many specific criticisms that are frequently levelled against the 510(k) pathway are not well founded. The absence of clinical trial data, on which many critics have focused, was not correlated with a higher risk of recall. Further, although a downstream protective effect of clinical trial data was found in the subsequent two generations of NRY/POL devices, this finding did not achieve statistical significance.

These findings have several potential explanations. First, the sample size might be too small to detect an effect. Second, the finding might reflect the FDA’s ability to identify Class II devices that present higher risks but might also reflect poorly on the quality or quantity of clinical trial evidence the FDA receives. This might arise from the FDA’s statutory obligation to apply the least burdensome principle. However, if confirmed in larger studies, these findings would indicate that the FDA’s limited authority to require clinical trial data is not a significant cause of the pathway’s poor safety performance and would challenge many reform proposals aimed at expanding the Agency’s authority.

The study also fails to support the frequently stated assumption that predicate creep compromises the safety of 510(k) devices. If predicate creep adversely impacts device safety, each generation of devices that succeeded a device with clinical trial evidence of safety would be expected to have a higher risk of recall. However, the study found a nonsignificant trend in the opposite direction: devices with clinical trial data had a 14.3% risk of recall, whereas devices that cited those devices as predicates had an 8% recall rate, and the next generation of devices had a 5.9% recall rate. This finding may simply reflect the fact that two generations of small, iterative changes are not enough to lead to the technological divergence that compromises device safety. The finding does, however, raise the possibility that concerns over predicate creep in 510(k) devices may be overstated and that this question should be tested in a larger data set. If confirmed in a larger study and over more generations, this finding would undercut proposals to require clinical trial data for more (or even for all) devices.

The study also failed to support calls to restrict the age of predicates cited in 510(k) submissions. No correlation was observed between younger predicates and the occurrence of a Class I recall. Further, there was no correlation between older predicates and the occurrence of recalls in the current study. The FDA has recently suggested that it may bar the use of predicates more than ten years old through regulatory means. This would have had no effect in the technology space examined here, because the oldest predicate was just over 9.5 years old. These findings suggest that focusing on predicate age is not likely to improve 510(k) device safety.

And in spite of the logical appeal of claims that shorter review times for 510(k) submissions make it impossible for the FDA to address all relevant safety concerns, shorter review times did not correlate with the occurrence of recalls in this study. It is possible that this is actually evidence of a systemic problem: perhaps all review times under the 510(k) program are too short to allow a full assessment of complex safety issues. But within the range of review times observed in this study, no effect was observed for shorter review times. Further, and contrary to the findings in Maisel’s study, the use of the Special 510(k) pathway, itself a more limited form of review, was not correlated with the occurrence of recalls.

Two limitations prevent the drawing of conclusions about larger cohorts of 510(k) devices: the small sample size, and the fact that the devices in the NRY/POL space are likely among the highest-risk devices that the FDA classifies as intermediate-risk, Class II devices. But the findings, if confirmed in a larger study, suggest that the 510(k) pathway does permit large numbers of unsafe devices to reach the U.S. market. If confirmed, the findings would also suggest focused changes to the 510(k) pathway that would improve safety while limiting the innovation-hampering effect of increased regulation. Congress could further expand the FDA’s authority to remove unsafe devices and their progeny from the market and to require safety data for several generations of devices whose subject-predicate chains can be traced back to those unsafe devices. And the FDA or Congress could act to eliminate the use of multiple predicates and reference devices in 510(k) submissions. But the findings, if confirmed, would also indicate that other features of the pathway, including the FDA’s limited authority to require, and limited practice of requiring, clinical trial evidence of safety, predicate creep, and the use of young or old predicates, do not compromise device safety. Such findings would support incorporating (or at least not avoiding) these features in a future regulatory regime.

Conclusion

Striking an optimal balance between ensuring the safety of and permitting or even facilitating the innovation of medical devices is critically important. How well the 510(k) pathway, through which the majority of devices reach the U.S. market, strikes this balance is fiercely contested: some argue that the pathway is too lax, allowing many unsafe devices onto the market, while others argue that the pathway is overly stringent, stifling innovation. Critics who have focused on safety have identified many specific aspects of the statutory provisions, the regulations, and the FDA’s implementation of the 510(k) pathway as root causes of the pathway’s failure to ensure medical device safety. And they have proposed reforms that range from tinkering with the FDA’s implementation to changes in the existing regulations and statutes to far-reaching reconstructions of the medical device regulatory scheme. Unfortunately, the empirical support for most of these criticisms and proposed reforms is woefully limited.

But this need not be the case. The pilot study presented here demonstrates that empirical testing of the validity of many of the safety criticisms that have been levelled at the 510(k) pathway is feasible. The study validates a set of methodologies that can be employed in such an endeavor. The study also provides some support to those who claim that the proportion of 510(k) devices that are unsafe is unacceptably high. Further, the study raises serious questions about some of the most common criticisms of the pathway. As a result, the study calls into question a number of commonly proposed reforms. Considering the stakes, empirical assessment of the 510(k) pathway is not, as the Institute of Medicine concluded a decade ago, an exercise that would offer a small benefit at a staggering cost. Rather, the future of the 510(k) pathway can and should be determined by empirically supported answers to carefully crafted questions.

    Author