The past few years have brought a paradigm shift in privacy and data protection laws. Simultaneously, the drivers for data-driven healthcare research and innovation are more apparent than ever in the midst of the current COVID-19 pandemic.3 To meet public health needs, healthcare companies must find ethical, scalable and practical approaches to data use to advance innovative technologies and products. Without the building blocks of a clear, practical and well-established framework on which healthcare companies can rely when using data, well-intentioned privacy laws will unnecessarily stifle the development of, and patient access to, novel and personalized healthcare products.4 This article sets forth the elements of a framework for navigating the complex web of oft-conflicting privacy and data protection laws and regulations, one that enables healthcare companies to harness the mechanisms already built into relevant laws to drive patient-focused innovation and research while acting as honorable data stewards in furtherance of data protection principles.
Challenges to Data-Driven Healthcare Research and Innovation
Rapidly changing technology, and the pervasiveness of data breaches and data misuse affecting both corporations and individuals, have driven a paradigm shift in privacy and data protection laws. In the wake of newly enacted laws, such as the General Data Protection Regulation (GDPR)5 and the California Consumer Privacy Act (CCPA),6 other similar laws have been proposed, amended or enacted around the world that often mimic or borrow heavily from these trendsetting laws,7 although these new laws are sometimes “restyled” by incorporating significantly different provisions, such as data localization.8 In countries where privacy laws are applied at both a federal (or national) and state (or local) level, each separate jurisdiction may develop its own legal construct, further complicating the complex patchwork of laws applying to personal data. For instance, in 2019, 25 U.S. states and some U.S. territories introduced and/or passed over 150 different privacy laws.9 While ushering in important and necessary data protection improvements, such as increased accountability requirements, strengthened data subject rights and expanded enforcement authority, these new laws have also created (or perhaps reinforced) a complex minefield of requirements that challenges the ability of healthcare companies to conduct data-driven research throughout the world and among diverse patient populations. For example, the definition of deidentification (or its equivalent),10 which frequently differs from one country to another (or from one state to another in the United States),11 makes it difficult to utilize a single deidentification concept. This lack of harmonization impedes development of comprehensive and responsible data strategies, particularly for smaller to midsize companies.12 Beyond the laws themselves, interpretations by regulators, authorities, or - in the context of clinical trials - ethics committees (or institutional review boards) result not just in individual derogations, but also in varying and often inconsistent applications that do not sufficiently uphold the intended balance between data protection and other fundamental rights, such as the rights to health and healthcare that depend directly on innovation.13 These factors impact not only healthcare companies’ ability to conduct research to meet the public need for data-driven and novel healthcare products, but also their ability to provide suitable and timely treatment to individuals.14
An over-reliance on traditional data processing models may also no longer be suitable in light of shifting legal requirements. For example, consent-based data processing models have become increasingly problematic in the healthcare space. This stems in part from the perennial debate about how broad consents can be, particularly when data is used for secondary purposes. It becomes even more problematic when consent as a basis for data processing is confused with consent as a safeguard for human rights, as traditionally occurs in healthcare settings (e.g., informed consent). When used as a legal basis, consent must meet high standards that are not easily attainable for a number of reasons; among them are the potential power imbalances between patient and healthcare provider (or study sponsor),15 and the resulting disqualification from receiving healthcare or participating in a study if the data subject withholds consent. Consent also implies a data subject’s ability to withdraw from the past, present and future use of the subject’s data, which may not be possible for the healthcare company to effect where it must use that same data to comply with its legal obligations.16 For example, a laboratory may be precluded from immediately honoring a data subject’s erasure request, as it may be required to maintain the personal data for a period of time to retain its certification and/or licensure as a laboratory or to meet other legal and/or regulatory obligations.17 Thus, consent as a model under recent laws may be inherently in conflict with the recordkeeping requirements prevalent in healthcare, and particularly in research.18
Recent initiatives that rely on new types of data to establish safety and efficacy in healthcare products have added another challenge. Increased regulatory expectations for healthcare companies to collect and utilize real world data (RWD) in postmarket surveillance activities, combined with ever-increasing openness to accepting RWD in submissions for product clearance and approval,19 are seemingly at odds with the principles of certain data protection regimes. It is well recognized that RWD can provide a more accurate picture of how different medical interventions and products may impact an individual and the public than traditional surveillance methods can.20 While regulators charged with reviewing and considering RWD in relation to such products increasingly rely on and require access to such data, there is no clear, universal path for healthcare companies to appropriately collect, retain and use such data for these purposes, nor to manage data subject requests that might limit processing. Of some benefit to resolving this conflict is the fact that “[u]nlike clinical trials ... the element that spins RWD into ‘gold’ is not per se the individual patient records, but the efforts taken to curate, aggregate, and analyze a large volume of data.”21 Use of RWD in healthcare research and for regulatory purposes requires a number of value-enhancing tasks (such as curating, aggregating/deidentifying, cross-linking, and analyzing the data), some of which are the very tasks necessary to ensure a data subject’s right to privacy is not at issue.22 Only in limited circumstances would RWD need to retain some of its identifiable elements, such as for safety or public health purposes. A robust, curated data set requires access to a significant amount of both structured and unstructured data to be viable. Without a strong privacy framework to support the appropriate collection, use and processing of such data, no healthcare company can reach that point.
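To make the value-enhancing tasks described above concrete, the following is a minimal sketch (in Python) of aggregation with small-cell suppression, a common disclosure-control step. The field names, band width and suppression threshold are illustrative assumptions, not requirements drawn from any law or guidance cited in this article.

```python
from collections import Counter

# Illustrative patient-level RWD records (direct identifiers already removed).
records = [
    {"age": 64, "diagnosis_code": "E11.9"},
    {"age": 67, "diagnosis_code": "E11.9"},
    {"age": 69, "diagnosis_code": "E11.9"},
    {"age": 71, "diagnosis_code": "E11.9"},
    {"age": 34, "diagnosis_code": "J45.4"},
]

def age_band(age: int) -> str:
    """Generalize an exact age into a ten-year band (e.g., 67 -> '60-69')."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"

# Aggregate to (age band, diagnosis) counts; individual records drop out.
counts = Counter((age_band(r["age"]), r["diagnosis_code"]) for r in records)

# Suppress small cells (here, counts below 3) to reduce re-identification risk.
THRESHOLD = 3
published = {cell: n for cell, n in counts.items() if n >= THRESHOLD}
print(published)  # {('60-69', 'E11.9'): 3} -- the two small cells are withheld
```

The particular threshold is not the point; the point is that the published output carries the analytic value of the curated data set without exposing any individual record.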
Building Blocks to “Level” the Need for Healthcare Innovation and Potential Privacy Risks
Healthcare innovation is sustained by and dependent on the continued availability of patient data to meet researchers’ needs. Indeed, “personal data has a profound and often understated impact on many aspects of healthcare.”23 As a corollary, scientific and healthcare research are, and have long been, critical components of advancing society, requiring a commensalism between individual rights and the interests of public health. A false dichotomy is currently posed in which healthcare companies must choose between the privacy rights of individuals and their ability to advance innovation. These two aims, however, need not be in opposition, as both can be accomplished in an ethical and responsible way within the framework of existing privacy laws. Healthcare companies already operate in a heavily regulated environment, where key controls such as data minimization and data subject access are common practices embedded into existing policies and procedures. Furthermore, patients are largely supportive of sharing their data for research purposes, including with healthcare companies.24
Reaching this crucial balance is not something that healthcare companies can do alone; it requires buy-in from and cooperation with other stakeholders, including regulators and patients. It also requires recognition of the role that healthcare companies take on as ethical data stewards with legitimate and necessary needs for data access that differentiate such companies and data uses from others.25 To ensure that the privacy and data protection rights of individuals do not come at the cost of their health and care, efforts should be made by regulators and legislators to harmonize new data protection requirements with existing privacy frameworks or to incorporate exemptions or safe harbors for data uses into any new laws where similarly sufficient protections are already in place.
The following sections will set forth the building blocks to form a framework that can be built upon existing privacy and data protection laws to enable ethical data use in the furtherance of innovation and public health.
Define Healthcare Research to Promote Ethical Data Use
Research is and will continue to be the cornerstone to innovation for healthcare companies. The increasing dependence on and importance of secondary data use raises the question of how much data processing should be permitted in the name of research.26 This can be even more challenging when healthcare companies must depart from more traditional areas of data use (e.g., clinical studies; postmarket surveillance) that are safeguarded by controls and industry standards to protect the rights, dignity and safety of human research participants and patients. Those same mitigating factors are often not present in the ever-increasing avenues of secondary research where institutional safeguards, such as ethics committees or institutional review boards, and standards, like the U.S. Federal Policy for the Protection of Human Subjects (i.e., the Common Rule), Good Clinical Practice (GCP) Guidelines or other human subject protection requirements, may not necessarily operate or apply. The underlying concern that the definition of “research” may be stretched too far has hindered implementation of existing laws and the development of new fit-for-purpose laws, simultaneously preventing widespread adoption of research-enabling provisions.27 These concerns ignore the fact that even for-profit or commercial companies performing research may be doing so in the interest of the public good, and not necessarily to the detriment of individual privacy rights.
One way to integrate more complex data use in healthcare research while balancing the individual right to privacy is to develop an industry-wide, global definition of “healthcare research” that is practical, but still provides reasonable limits on data use, including any secondary or further data use. Such a definition is currently lacking. For example, despite the provisions in GDPR that give deference to research,28 there is no definition of the term at the European Union (EU) level.29 The situation is comparable in the United States, where the definition of “research” under the Common Rule differs from definitions at the state level and even under FDA regulations, and is limited in application.30 Employing a solid, harmonized definition of healthcare research, whether as an industry or in collaboration with regulators, can address some of the concerns of regulators and authorities in implementing research-related components of laws, like GDPR. A definition of healthcare research31 should encompass applied and fundamental research activities and support technological advancement, while incorporating ethical standards. Any attempt to define the term would necessarily require the input of relevant stakeholders, including healthcare companies, to ensure that the ultimate output is fair, actionable and practical, but also capable of distinguishing among the different types of research that may be classified under such a term. By employing a clear and practicable definition of healthcare research, and creating a companion method to remove identifiers from the personal data that are unnecessary to meet stated scientific purposes, healthcare companies may be able to utilize the research exemptions already existing in a number of privacy and data protection laws to their full potential.
Apply Practical but Strong Forms of Anonymization / Deidentification
When data has been rendered such that the individuals associated with it can no longer be identified, that data is typically outside the scope of applicable privacy laws. The specific methods of achieving this vary almost as frequently as the definitions of the terms describing these activities, i.e., “anonymization” / “deidentification.”32 Applicable privacy laws are inconsistent in establishing the point at which the risk of re-identification has sufficiently dissipated. Adopting a clear, practical and harmonized approach to deidentification supports data-driven research and healthcare innovation, while promoting the principle of data minimization, a concept already embedded in the privacy requirements applicable to healthcare companies. These techniques may also serve additional purposes; for instance, as a control mechanism to meet some of the challenges of cross-border data transfers, particularly those between the EU and United States, which were made more cumbersome and opaque by the recent decision of the Court of Justice of the European Union (CJEU) in “Schrems II.”33
Instead of setting forth a specific and prescriptive method, a practical approach to deidentification should look at re-identification risks relative to the entity (or entities) responsible for the data. An approach of this sort, rather than one that accepts only deidentification with no possibility of re-identification, is supported by the CJEU’s Breyer decision,34 and permits a more principled and ethical approach to deidentification in general. The expert method under HIPAA is more akin to this risk-based approach than the safe harbor method,35 but does not go far enough, in that it inherently creates resource, process and access challenges that do not support a sustainable path for all proposed data uses by healthcare companies. Instead, healthcare companies should be able to create and systematize a consistent and repeatable approach that can be performed in-house, with the appropriate data protections in place. A risk analysis should consider the existence of lawful means of re-identification, the implementation of technical and organizational measures (TOMs) to protect and safeguard the data (e.g., contracts and downstream limitations, and, where done properly, internal firewalls), and mitigating controls that sufficiently reduce the risk of re-identification. A risk-based approach to deidentification can be layered with specific requirements to meet local law; for example, the requirements for either method of deidentification under HIPAA. These mitigating factors, even when placed on top of strongly pseudonymized data (i.e., data whose indirect identifiers have been modified/encrypted/hashed), should render data sufficiently anonymous under GDPR and other similar standards, including HIPAA deidentification. This framework is not static, however, and must be subject to frequent re-assessment in light of factors such as new technology, new data (including additional commingling of data), publication of said data, or other external factors - some of which may not be within the control of the healthcare company.
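As a purely illustrative sketch of the strong pseudonymization described above (the key handling, field names and generalization rules are assumptions for demonstration, not a statement of what GDPR or HIPAA require), direct identifiers can be replaced with keyed hashes while indirect identifiers are coarsened:

```python
import hmac
import hashlib

# Secret key held only by the data custodian; without it, the tokens below
# cannot feasibly be reversed or regenerated by a downstream recipient.
PSEUDONYMIZATION_KEY = b"replace-with-a-securely-stored-secret"

def pseudonymize(value: str) -> str:
    """Replace a direct identifier with a keyed (HMAC-SHA256) token."""
    return hmac.new(PSEUDONYMIZATION_KEY, value.encode("utf-8"),
                    hashlib.sha256).hexdigest()

def deidentify(record: dict) -> dict:
    """Tokenize the direct identifier and coarsen indirect identifiers."""
    return {
        "patient_token": pseudonymize(record["patient_id"]),
        "birth_year": record["birth_date"][:4],  # keep year of birth only
        "zip3": record["zip_code"][:3],          # truncate ZIP to 3 digits
        "diagnosis_code": record["diagnosis_code"],
    }

record = {
    "patient_id": "MRN-000123",
    "birth_date": "1962-07-14",
    "zip_code": "94110",
    "diagnosis_code": "E11.9",
}
print(deidentify(record))
```

Under a risk-based analysis, a keyed hash of this kind matters only alongside the organizational controls noted above - key custody, contracts and downstream limitations - since the token is only as protective as the controls around the key.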
Adopt More Scalable Data Use Models Beyond Consent
Consent is not the magical unicorn of data processing and should not be used as a bandage to fix data processing problems.36 The misconception that consent is superior and should be preferred over the other legal bases set out in GDPR (and similar laws)37 has led to an over-reliance on consent for data processing. As discussed above, this is problematic when the technical, black letter requirements for consent are extremely difficult to meet. Additionally, broad consent models create challenges in secondary data use cases where the future purpose(s) of data processing are unknown at the time the individual provides consent, such as for biobanks or downstream/retroactive data analytics.38 The writing is on the wall: while consent may be the more established and preferred model, organizations should rely on it when other legal bases are unavailable, not as a default or “catch-all.”39 Thus, legislators should adopt alternative options to support scalable, responsible data use models for healthcare research, such as increased adoption or acceptance of alternative bases for data processing (like healthcare research exemptions or models similar to the institutional review board waiver model). Alternatively, requirements for the use of consent should be revisited with the specific purpose of refitting consent to accommodate secondary data use, such as by accepting the use of broad consent or providing model language better suited to permit further use.
Utilize Existing Privacy Constructs that Provide a Sufficient Basis for a Risk-based Framework
In countries where some form of comprehensive data protection law is already in place, the discussion can turn from implementation of existing law to whether additional laws are needed to govern specific types of data or data uses. This reconstructive approach fatally ignores the fact that existing legal frameworks may already accommodate some - but not all - applications of even novel data types and uses. Instead, regulating authorities and stakeholders should jointly draft guidance for implementing existing mechanisms in specific settings (like healthcare), focusing such guidance on enabling these data use cases without creating barriers that impede efforts to put the guidance into effect. One example of such a collaboration is the European Medicines Agency (EMA) policy on publication of clinical data (Policy 0070).40 Drafted with healthcare industry input, the policy provides guidance for implementing a risk-based approach to anonymization and was found to be sufficient for providing “adequate privacy protection for patients.”41
Another alternative would be to expand existing laws or regulations (or implementing guidance) governing healthcare companies and/or healthcare-related data to accommodate the novel business types and ways in which healthcare companies interact with data and data subjects. There are a number of healthcare-related research activities that do not squarely fit existing privacy or data protection laws, but where the healthcare companies engaging in these activities nevertheless abide by those requirements to meet internal or industry standards. An example in the United States is a healthcare company that does not meet HIPAA’s classic definition of a “covered entity” or “business associate,” nor handles “protected health information,” but nevertheless meets the various other elements of HIPAA in the conduct of its operations, including research. Even if these “HIPAA-adjacent” entities meet the law’s requirements, they are unable to directly avail themselves of some of the advantages of being in scope of HIPAA,42 and must instead rely on other privacy frameworks.43 Similarly, whereas the Common Rule applies only to federally funded research,44 many healthcare companies in the United States broadly follow Common Rule requirements without the added benefit of the flexibility provided under that framework, such as potential exemption from other applicable laws.45 Finding an appropriate way to formally include these additional and perhaps peripheral activities under the umbrella of existing law or regulation even when such laws and regulations do not directly apply (i.e., when the research activities and operations of the healthcare company meet the requirements in all but name) can also contribute to building this risk-based framework without adding to the already complex patchwork of laws.
The hindrance to adopting this approach comes down to the need for healthcare companies, and researchers in general, to earn the trust of regulators, consumers and the general public over how and why they use personal data. Healthcare companies will also need to demonstrate that the data they use and share are not subject to the types of mass surveillance at issue in Schrems II. With deidentification, the concern - bolstered by a number of sometimes conflicting studies46 - is that the benefitting party or a downstream recipient could in the future attempt to re-identify the data. This concern, however, ignores the other mechanisms in place to prevent such behaviors and the obligations on companies to continue monitoring existing TOMs. Acknowledging and addressing this fundamental trust issue is an imperative foundation that healthcare companies must lay to construct a suitable risk-based framework to support data use in healthcare research.
Trust as the Cornerstone to Building a Framework to Facilitate Healthcare Research and Innovation
Private industry has trust issues when it comes to data use, something that is reflected in major decisions by courts and regulators,47 and evident in the frequency of media reports describing data misuse. This reluctance to trust is perhaps somewhat warranted. If society is expected to entrust healthcare companies with its sensitive health data, these companies must demonstrate that they can act responsibly with that data.
Protect Data and Reduce Re-identification Risk with Strong Technical and Organizational Measures
The implementation of strong TOMs is key to ensuring the protection of personal data and, ultimately, to gaining and maintaining the trust of the public. While adequate technical and security controls are already required by most privacy laws, and help to mitigate risk,48 the failure to have adequate controls - or the use of puffery to describe those controls - can seriously hinder a company’s ability to earn trust. Technical measures to safeguard data from accidental or unlawful destruction, loss, or unauthorized access by third parties, along with other privacy-enhancing techniques, should be risk-based, monitored, and updated as technology changes; they can include deidentification practices, blockchain technology, pseudonymization and encryption.49 These measures can also reduce the risk of, or prevent, data re-identification.
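As one concrete illustration of such a technical measure (a minimal sketch only; the choice of the Python cryptography library and the key handling shown are assumptions for demonstration, not a recommended architecture), records can be encrypted before they are written to storage:

```python
from cryptography.fernet import Fernet

# In practice, the key would live in a managed secrets store or HSM, never in
# source code, and would be rotated on a defined schedule.
key = Fernet.generate_key()
cipher = Fernet(key)

record = b'{"patient_token": "a1b2c3", "diagnosis_code": "E11.9"}'

# Encrypt before persisting; compromise of the storage layer alone then
# reveals nothing about the underlying record.
ciphertext = cipher.encrypt(record)

# Decryption also verifies integrity: tampered ciphertext raises InvalidToken
# rather than returning silently corrupted data.
assert cipher.decrypt(ciphertext) == record
```

The design point is that encryption at rest is a control on unauthorized access, not a substitute for the deidentification and governance measures discussed above; the two operate in tandem.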
Internal data governance practices are also crucial for proper data management, and can enable responsible secondary data use and data sharing. The implementation of data governance boards, for example, can help facilitate such practices and even reduce the risk that deidentified data is recombined in ways that facilitate re-identification. Healthcare companies should adopt or strengthen these practices to foster trust, because doing so puts companies in a position to set and enforce their own standards of data management, ultimately requiring them to be more mindful of why, how, when, and what data they process.
Introduce Industry Level Accountability Efforts and Strengthen Internal Accountability
Many existing privacy laws and standards (such as the GCP Guidelines) associated with healthcare companies already impose accountability requirements that are enforceable against them. Such measures may include requirements to maintain records of processing, conduct privacy impact assessments, and handle personal data in accordance with established principles. This may not be enough, however, to gain buy-in from the general public and legislators for widespread data-driven healthcare research and product development.
Industry-level accountability measures, such as self-certification frameworks, offer a middle ground that limits overregulation while allowing healthcare companies to gain an additional layer of “trustworthiness.” Certification programs and codes of conduct can address this in the longer term, so long as they are properly administered and have the backing of regulators.50 An enforceable self-regulatory framework may also be a viable longer-term solution and can perhaps assuage concerns that some have with so-called HIPAA-adjacent data processing in the absence of new or amended regulations.51 If done properly and in conjunction with the right stakeholders, these tools may also serve to mitigate some of the concerns raised by the CJEU in Schrems II, particularly given that many data processing activities conducted by healthcare companies are less likely to be the target of government surveillance. Industry-based frameworks, like the Payment Card Industry Data Security Standard (PCI-DSS) in the finance industry, have been generally successful.52 There are some general frameworks (e.g., ISO 27001) and industry-specific frameworks (e.g., HITRUST) that could cover some data processing activities by healthcare companies; however, these frameworks do not cover all of the data processing activities in scope for such companies and may not be scalable globally.53 It is also not practical for healthcare companies to wait until suitable measures of this nature exist; in the meantime, they may need to rely on the other measures described here.
Promote Public Awareness and Education as a Means to Bolster Transparency
Transparency is a key data protection principle codified in privacy laws worldwide. It is also an important factor in building and retaining the trust of the public and regulators, as the absence of information about data use can lead to unwarranted speculation. For data-driven healthcare research, this translates to increasing awareness about how and why data will be used and about data subjects’ rights regarding their data. Transparency, therefore, is a tool by which data subjects and regulators can hold companies accountable for data misuse, and thus a building block for any framework that enables healthcare research and innovation.
However, achieving transparency about data use in healthcare research is challenging, particularly with secondary data use cases where it may be difficult to provide adequate notice to data subjects. This is even more true with regard to RWD, where the data is not collected from the data subjects themselves. It may be more prudent to explore novel means of providing adequate information to data subjects, rather than relying on traditional methods. For example, GDPR provides an exception to the notice requirements when notice is impossible or would involve a disproportionate effort.54 When the processing of data is conducted for healthcare research purposes, researchers might be able to rely on this exception to enable secondary use or use of banked tissue samples, provided that appropriate safeguards are implemented. Promoting acceptable non-traditional notice offers healthcare companies a viable avenue to ensure that the public understands how researchers are using such data, as well as the benefits of such research and data use. This can be accomplished, for example, through public education and awareness campaigns, whether organized by a single healthcare company or industry organizations. The COVID-19 pandemic has created a storyline that could provide a more tangible message to help the general public, legislators, and regulators understand the importance of all sorts of data and data use cases in healthcare research, and not just the data traditionally associated with clinical trials. Guidance from regulators on how to properly disseminate information in this manner, as well as how to partner on such efforts, could also add clarity and ensure that information provided to the public is adequate and non-misleading, and is generally in the furtherance of public health.
Conclusion
Data-driven healthcare research and privacy should not be seen as polarizing forces, but rather as complementary and necessary components of healthcare innovation that can be bridged by responsible and ethical data use practices. Building a responsible framework that enables scalable and ethical data processing should not be a Herculean task, given that many of the tools needed to slay the proverbial Hydra can be found in existing laws, standards and practices. Other such tools can be forged from the long history healthcare companies have as trusted data stewards in many of their more traditional data processing activities. Given the criticality of continued innovation in the healthcare space, true collaboration between the impacted stakeholders is required to realize the synergies of a risk-based approach to data-driven healthcare research and innovation. Healthcare companies need to work both independently and as an industry to earn the trust of regulators and the public, while lawmakers and regulators need to support a framework that promotes consistency and scalability while allowing for an acceptable level of risk. At the end of the day, a suitable, risk-based framework can serve the future needs of patients while protecting their current interests and rights with respect to their data.