April 26, 2023

A New Frontier

Human Subject Research Ethics in an Artificial Intelligence World

By Thomas Salazar


Introduction

In 2017, Stanford University researchers Michal Kosinski and Yilun Wang published their research paper titled “Deep Neural Networks Are More Accurate Than Humans at Detecting Sexual Orientation from Facial Images.” Their research involved training a machine-learning algorithm—more specifically, a deep neural network—to categorize human faces according to sexual orientation. To achieve this, the researchers obtained photos from the online dating profiles of over 14,000 individuals who self-identified as pursuing homosexual or heterosexual romantic relationships. Half of the pictures represented self-identifying homosexual individuals, while the other half represented self-identifying heterosexual individuals. Notably, the researchers trained the algorithm with only Caucasian faces. The algorithm training included Kosinski’s face as the prototypical White male, Kosinski’s girlfriend as the prototypical White female, Barack Obama’s photo as the reference for a “Black” face, and a stock photo labeled as “clearly Latino.” While the algorithm learned to recognize patterns in the facial features of the selected photos, Kosinski and Wang asked human research participants to “use the best of [their] intuition” to distinguish the photos of those self-identifying as homosexual from the photos of those self-identifying as heterosexual.

The results of the research showed a clear trend: the algorithm was significantly better than humans at detecting sexual orientation. While humans were only slightly better than a coin flip at detecting sexual orientation, the algorithm achieved an accuracy of 83% for women and 91% for men. Given these results, the researchers went a step further and claimed a possible biological link between facial structure and sexual orientation. Kosinski and Wang’s claims garnered extensive press coverage and drew opposition from LGBT activist groups for perpetuating stereotypes about sexual orientation and creating a “dangerous and flawed” algorithm. In response, Kosinski asserted that the study was ethically sound because it was “approved by [Stanford’s] IRB,” the institutional review board (IRB) tasked with the ethical review of all human subject research conducted at Stanford. But does obtaining IRB approval necessarily mean this artificial intelligence (AI) study was ethically bulletproof?

Modern human subject research ethics developed under the guidance of several 20th-century instruments, including the Nuremberg Code, the World Medical Association’s Declaration of Helsinki, and the Belmont Report. These guidelines focus heavily on the concepts of informed consent, risk assessment, voluntary withdrawal, and data privacy. However, the traditional application of these concepts does not adequately account for the dangers of AI research with human subjects, which include risks falling outside the scope of existing regulations. Further, current regulatory requirements for what constitutes human subject research and which studies require ethical board review do not cover the burgeoning body of AI research studies using human subject data. This regulatory gap results in little to no ethical oversight for most research projects involving AI and human subjects. Given the increasing prevalence of AI human subject research, it is time to re-evaluate longstanding principles in research ethics. Notably, IRBs—supplemented by local, state, and national policies—provide a workable framework for ensuring the ethical integrity of AI human subject research.

Origins and Evolution of the Ethical Review of Human Subject Research

While human subjects participating in research today enjoy the protection of regulatory safeguards, these protections have not always been in place. During the 1946 Nuremberg Trials, the world learned of the highly unethical experiments Nazi and Japanese officials conducted on concentration camp prisoners during World War II. Many prisoners were subjected, without consent, to medical experiments that often resulted in death or severe disability. In response, the Nuremberg tribunal articulated the Nuremberg Code in 1949, the first international document outlining basic ethical principles for conducting human subject research. The Nuremberg Code primarily focused on the principles of voluntary participation in research and informed consent. In 1964, the World Medical Association followed suit by establishing the Declaration of Helsinki, providing ethical recommendations for doctors who conduct biomedical research. The Declaration of Helsinki mandated minimizing risk for research subjects, providing additional protections for vulnerable participants, obtaining informed consent from all research participants, and maintaining the privacy and confidentiality of research subjects. The United States never expressly adopted the provisions of the Nuremberg Code or the Declaration of Helsinki; however, their overarching principles are reflected in the U.S. regulations adopted for the protection of human subjects in research.

Before the United States developed robust guidelines for research ethics, research abuse in the U.S. was rampant. In the Tuskegee Syphilis Study, researchers denied African American participants treatment for syphilis for decades in order to study the progression of the disease. Another notable study, known as the Milgram experiment, involved participants administering seemingly painful and deadly electric shocks to other participants at the researcher’s request. These research abuses became highly publicized and resulted in the enactment of the National Research Act of 1974. The Act established the National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research and led to the creation of IRBs across the country, serving as ethical committees for the review of human subject research. The Commission published the Belmont Report in 1979, providing IRBs a framework for the ethical review of research with human participants. The Belmont Report outlined three basic ethical principles in human subject research: respect for persons, beneficence, and justice. Subsequently, in 1991, the Department of Health and Human Services (DHHS) joined other federal agencies in adopting a set of uniform ethical guidelines enshrining the principles from the Belmont Report: this regulation became known as the Common Rule.

At its core, the Common Rule is a set of federal policies defining the organizational, procedural, and ethical requirements for the review of human subject research. Under the Common Rule, the goal of an IRB is to formally review and monitor human subject research to ensure subject safety, rights, and welfare are adequately protected. An IRB must include members with the scientific expertise to review research studies, as well as non-scientific stakeholders and community members unaffiliated with the institution. The Common Rule further grants IRBs the authority to approve, require modifications to secure approval for, or disapprove research activities prior to research onset. To secure IRB approval, researchers must demonstrate that their study conforms to the Belmont principles—respect for persons, beneficence, and justice. When appropriate, IRBs may require researchers to submit study updates at regular intervals, a process known as continuing review. The latest amendments to the Common Rule occurred in 2018; however, the Food and Drug Administration (FDA), the agency responsible for oversight of research involving drugs and medical devices, has not yet adopted the 2018 version of the Common Rule.

Artificial Intelligence and Human Subject Research

The 21st century has witnessed unprecedented changes in the digital world, artificial intelligence, automation, and globalization. AI algorithms today can serve as a customer service resource for understaffed companies. They can diagnose postpartum depression by analyzing Twitter posts and detect pancreatic cancer based on Google searches. They can detect signs of credit card fraud and determine which individuals are at higher risk of fraud. AI algorithms are ubiquitous in the transportation industry, where they play an integral part in navigation and route planning. They have been used in the criminal justice system for sentencing offenders, and in journalism for writing compelling news stories. But how much of the AI development process undergoes ethical scrutiny? How much of AI research falls under the Common Rule regulations?

AI Human Subject Research Is Fundamentally Different from Traditional Human Subject Research

When IRBs originated, the paradigms of human subject research were clinical trials and behavioral studies. Research like the Tuskegee Study and the Milgram experiment required participants to undergo experimentation in person and directly bear the burden of any potential risks. Once the research study concluded, so did the risks for the participants. IRBs and the Common Rule were developed in response to these paradigms, with a focus on maximizing the benefits of the research while minimizing the potential risks that may arise for participants. Thus, the relationship between research participants and human subject research regulations can be characterized as an ethical contract. While human subjects bear the burden of research risks, society benefits from the generalizable knowledge scientific research contributes. In turn, society protects research participants from undue risk and unreasonable harms by advancing regulatory safeguards. Within the realm of biomedical and behavioral research, the ethical principles delineated in the Common Rule and the Belmont Report have effectively protected the safety, rights, and welfare of human subjects for the last 50 years.

Unlike most biomedical and behavioral research, AI human subject research does not require participants to personally undergo experimentation—only human data is needed. Accordingly, the rise of AI research means that human subjects are turning into data subjects. Unlike the tangible dangers endured by human subjects in traditional research, the dangers to data subjects in AI research are less well defined. Further, the vast supply of accessible, multi-use data means that anyone can be a data subject. This is fundamentally different from traditional human subject research, where human subjects normally opt into participating in research and the parameters of their involvement are more concrete. Moreover, the value of being a data subject in AI research continuously increases over time: whereas generalizing from a human subject pool to the population requires only a sufficiently large, statistically sound sample, the predictive models of AI research demand a continuous influx of information from data subjects.

This transition from human subject to data subject also changes the ethical contract between research participants and society. Because the goal of AI human subject research is to produce models that predict human behavior, society still receives the primary benefits of the research. However, the harms of AI human subject research are now distributed between data subjects and society at large, with relatively few risks borne directly by the data subjects themselves. Further, while the risks of traditional human subject research are directly caused by research methods and occur upstream from the results, the risks of AI human subject research generally occur downstream from the results. These fundamental differences between traditional and AI human subject research set the stage for potential regulatory gaps in the ethical oversight of AI research.

Defining the Scope of AI Human Subject Research

AI comes in many different flavors. For this reason, there is currently no comprehensive definition of AI: as AI continues to evolve, so does its definition. Today, AI encompasses a broad spectrum of software, including machine learning, deep learning, natural language processing, and neural networks. Within this spectrum, AI software can be developed through various techniques. For example, AI can be rule-based or data-based. In rule-based approaches, the AI algorithm produces a pre-defined output based on the programmer’s rules, such as an AI that alerts a physician when a patient’s heart rate falls below a certain threshold. By contrast, data-based AI learns from data inputs and does not require pre-defined rules to reach a conclusion; one example is an AI algorithm that alerts a physician that cardiac arrest in a patient is imminent based on patterns recognized from training datasets. AI algorithms can also be supervised or unsupervised, which refers to whether the algorithm learns from pre-labeled data to reach a particular outcome. Another important feature of AI software is that it ranges between locked and adaptive algorithms. While locked algorithms provide the same output for a particular input each time, adaptive algorithms continuously learn and may yield different outputs over time. Given the diversity of AI, defining the scope of AI human subject research can be a daunting challenge. However, the Common Rule provides a good starting point for developing a concrete definition of AI human subject research.
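To make the rule-based versus data-based distinction more concrete, the brief Python sketch below contrasts a hard-coded alert with one learned from labeled examples. It is a minimal illustration only: the threshold, features, and toy training records are hypothetical and are not drawn from any system discussed in this article.

    # Minimal sketch contrasting a rule-based alert with a data-based (learned) alert.
    # The threshold, features, and training records below are hypothetical illustrations.
    from sklearn.linear_model import LogisticRegression

    # Rule-based: the programmer hard-codes the decision logic.
    def rule_based_alert(heart_rate_bpm: float, threshold: float = 40.0) -> bool:
        """Alert whenever the measured heart rate falls below a fixed threshold."""
        return heart_rate_bpm < threshold

    # Data-based: the decision boundary is learned from labeled examples
    # (toy vital-sign records labeled 1 if the patient later suffered cardiac arrest).
    train_X = [[38, 85], [42, 80], [95, 120], [70, 110], [50, 90], [110, 130]]  # [heart rate, systolic BP]
    train_y = [1, 1, 0, 0, 1, 0]
    model = LogisticRegression().fit(train_X, train_y)

    def learned_alert(heart_rate_bpm: float, systolic_bp: float) -> bool:
        """Alert when the trained model predicts elevated risk for this patient."""
        return bool(model.predict([[heart_rate_bpm, systolic_bp]])[0])

    print(rule_based_alert(36))   # True: below the hard-coded threshold
    print(learned_alert(36, 82))  # Depends entirely on patterns in the training data

In the rule-based version the decision logic is fully transparent; in the data-based version the outcome depends entirely on the training data, which is why dataset composition and bias loom so large in the discussion that follows.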

For an AI project to constitute human subject research, the project must involve human subjects, and it must also be research. The 2018 revision of the Common Rule defines a human subject as a “living individual” about whom a researcher obtains, uses, analyzes, or generates “information or biospecimens through intervention” or “identifiable private information.” Most AI projects will fall under the “identifiable private information” prong of the human subject definition, since this prong covers research use of existing datasets. Identifiable private information is further defined as “private information for which the identity of the subject is or may readily be ascertained by the investigator or associated with the information.” The Common Rule then defines research as “a systematic investigation, including research development, testing, and evaluation, designed to develop or contribute to generalizable knowledge.” Because most AI algorithms create models that predict societal behavior, their application correspondingly “contribute[s] to generalizable knowledge.” Based on these regulatory definitions, AI human subject research thus involves algorithms analyzing identifiable private datasets to understand and model humans or human conditions.
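The regulatory definitions above reduce to a two-prong test. The toy decision helper below encodes that test as this article frames it; it is a hypothetical illustration that tracks only the “identifiable private information” prong (not intervention or interaction) and is not a substitute for an actual regulatory or IRB analysis.

    # Toy helper encoding the article's two-prong reading of the Common Rule:
    # (1) the project involves human subjects (identifiable private information about
    #     living individuals), and (2) it is research (a systematic investigation
    #     contributing to generalizable knowledge about humans or human conditions).
    # Illustrative only; not a legal determination.
    from dataclasses import dataclass

    @dataclass
    class AIProject:
        data_about_living_individuals: bool
        data_is_private: bool          # reasonable expectation of no observation or disclosure
        data_is_identifiable: bool     # identity is or may readily be ascertained
        systematic_investigation: bool
        generalizable_knowledge: bool  # aims to model humans or human conditions

    def is_human_subject_research(p: AIProject) -> bool:
        involves_human_subjects = (p.data_about_living_individuals
                                   and p.data_is_private
                                   and p.data_is_identifiable)
        is_research = p.systematic_investigation and p.generalizable_knowledge
        return involves_human_subjects and is_research

    # An algorithm modeling a human condition from an identifiable private dataset:
    print(is_human_subject_research(AIProject(True, True, True, True, True)))   # True
    # The same data used purely for product development falls outside the definition:
    print(is_human_subject_research(AIProject(True, True, True, True, False)))  # False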

Dangers of AI Human Subject Research

Dangers to Data Subjects

AI human subject research poses two significant dangers: risks to data subjects and risks to society. The risks to data subjects, like the risks to subjects in traditional human subject research, generally relate to obstacles in the consent process, subject privacy, and risk assessment. Informed consent is the foundation of modern research ethics, requiring participants to receive material information about the study from researchers, comprehend this information, and voluntarily agree to participate in the research. In the world of AI research, the informed consent process is muddled. Many AI studies analyze human-focused datasets already “on the shelf” before the research begins or involve data that straddles the line between public and private, giving researchers an opportunity to forgo the informed consent process. For example, a Canadian study analyzing biometric data gathered images of five million individuals attending Canadian shopping malls without obtaining consent. Another example is the research use of social media data. Notably, many social media users do not equate public posting with implicit consent to the data’s use in AI research. While AI researchers often rely on social media privacy agreements as a surrogate for informed consent, these agreements often go unread. Another obstacle in the informed consent process is the transparency and understandability of disclosures. Given that the majority of AI algorithms are “black boxes” that analyze data in ways researchers do not anticipate, research subjects may not receive sufficient initial disclosure to provide effective informed consent. Finally, AI human subject research presents issues with subject withdrawal: once an algorithm trains on human subject data, how can data subjects withdraw from the research?

Data privacy risks present the most tangible threat to data subjects in AI research. While anonymizing data is the minimum requirement for protecting data privacy, data subject re-identification is becoming increasingly common. One notable breach occurred in the “Tastes, Ties, and Time” study on the dynamics of social networks, where researchers publicly released “anonymized” student data that was quickly re-identified despite the researchers’ best efforts to protect subject privacy. The ability of AI algorithms to extract information and patterns from multiple datasets redefines the meaning of “de-identified.” Notably, AI algorithms can be subject to model inversion and model extraction attacks, which use the outputs of an algorithm to recreate the data used to train the model. The expectation that human subject data will be repurposed and combined with third-party datasets in AI research further amplifies the risk of re-identification. Another potential data privacy hurdle in AI research is data ownership. It is common for social media data, employment data, and location data to reveal information about multiple individuals, often without their consent. Consider an AI study analyzing Twitter responses to a celebrity’s tweet: whose data is being analyzed, and who owns this data?
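The re-identification risk described above is easiest to see in a linkage-style attack, where an “anonymized” research dataset is joined to an auxiliary dataset on shared quasi-identifiers. The sketch below is a minimal, hypothetical illustration: the column names, records, and pandas-based join are assumptions for demonstration, not a reconstruction of any study or attack cited in this article.

    # Minimal sketch of a linkage-style re-identification attack: an "anonymized"
    # research dataset is joined to public auxiliary data on shared quasi-identifiers.
    # All column names and records below are hypothetical.
    import pandas as pd

    # "De-identified" study data: direct identifiers removed, quasi-identifiers retained.
    study = pd.DataFrame({
        "zip": ["02138", "02139", "02140"],
        "birth_year": [1984, 1990, 1976],
        "sex": ["F", "M", "F"],
        "diagnosis": ["depression", "none", "diabetes"],
    })

    # Public or purchasable auxiliary data (for example, a voter roll) with names attached.
    public = pd.DataFrame({
        "name": ["A. Smith", "B. Jones"],
        "zip": ["02139", "02138"],
        "birth_year": [1990, 1984],
        "sex": ["M", "F"],
    })

    # Joining on the quasi-identifiers re-attaches identities to "anonymous" records.
    reidentified = study.merge(public, on=["zip", "birth_year", "sex"])
    print(reidentified[["name", "diagnosis"]])

Even this toy join re-attaches a name to a sensitive attribute; model inversion and extraction attacks accomplish something similar by probing an algorithm’s outputs rather than merging raw datasets.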

When it comes to risk assessment in the world of AI research, the Belmont principle of beneficence dictates that researchers must do no harm, maximize the possible benefits, and minimize the possible harms to data subjects. However, we are still in the early stages of grasping the true nature of the dangers AI research poses to data subjects. The risks that traditional human subject research posed to human subjects did not become apparent until decades of research abuses had transpired. Because harms to data subjects are more abstract, it may be difficult to properly minimize the possible harms of AI research during the research design process. Consequently, the opacity of these dangers will likely result in data subjects being harmed in ways they did not perceive when opting into the research. One example involves training an algorithm on survey responses from potential homeowners and subsequently relying on the algorithm to deny a data subject’s mortgage loan. Notably, the world of data science has historically been disconnected from the ethical burdens associated with research on human subjects. Until a conceptual metric is developed for assessing the benefits and risks of research in a digital world, research abuse of data subjects in AI research will run rampant.

Dangers to Society

Perhaps the most publicized harms of AI human subject research are the dangers to society. In particular, the application of AI algorithms often results in downstream loss of opportunity, economic loss, social detriment, and loss of liberty for individuals and groups not associated with the research. Loss of opportunity refers to harms in the public domain, such as housing, education, and healthcare. For example, AI-based systems are commonplace in the buying, selling, and financing of homes. Similarly, hospitals use AI algorithms in telehealth to triage patients according to their health risk. Economic loss refers to harms that “primarily cause financial injury or discrimination in the marketplace for goods and services.” One notable example is AI algorithms that calculate credit risk, which can affect an individual’s opportunity to buy a house or get a loan. Social detriment refers to harms affecting an individual’s standing in the community. Algorithms can create filter bubbles, selectively promoting certain types of information to individuals on social media. Loss of liberty refers to harms that restrain an individual’s freedom. In criminal law applications, AI has been used in sentencing, prison management, and parole determinations. Because these differential opportunity and access harms can be both individual and collective, AI human subject research has the potential to restructure society as a whole.

In addition to presenting societal dangers that depend on the field in which an algorithm operates, AI human subject research poses risks that hinge on how the algorithm is ultimately applied. The rise of AI research coincides with the rise of AI snake oil: AI purporting to predict human behavior or conditions it could not possibly predict. One example comes from employer hiring practices, where companies rely on AI algorithms to assess job candidate suitability from a 30-second video. AI human subject research is also subject to harmful dual use and misuse. For example, the United Arab Emirates, a country where homosexuality is a crime, could apply the algorithm from the sexual orientation study (informally known as the “gaydar study,” discussed in the introduction) to police citizens or tourists during the customs process. As an example of algorithmic misuse, consider a hospital applying an algorithm to detect early-stage lung cancer when the algorithm was trained only on data from individuals with late-stage lung cancer. The tendency to interpret AI models as causative rather than correlative presents another danger of AI application. In the gaydar study, the researchers hypothesized a genetic causal link between facial structure and sexual orientation based on the algorithm’s accuracy at detecting sexual orientation from pictures. However, AI research at its core is pattern recognition and correlation interpretation; causation claims undermine the integrity of the scientific process. AI algorithms can also be inaccurate and flawed. For example, Google’s image recognition algorithm labeled photos of people of color as “gorillas.” If an algorithm produces false negatives or false positives, who is liable for the resulting harms?

On a more holistic level, the rise of AI human subject research implicates certain seemingly inevitable dangers: algorithmic bias, environmental harms, and labor displacement. Despite research design efforts to reduce algorithmic bias, AI algorithms are only as equitable as the data used to train them. For example, studies show that AI mortgage lending systems charge non-White borrowers higher rates than White borrowers for the same loan. In a society plagued by the pervasive effects of systemic, institutional, and structural discrimination, the dangers of algorithmic bias will remain a challenge for AI human subject research. Algorithms also pose environmental harms: AI research contributes to increased carbon dioxide emissions, exacerbating global warming. Further, AI research is an “energy intensive” and time-consuming process, presenting resource allocation and resource utilization challenges. Lastly, AI human subject research will likely result in a major overhaul of the labor market. For example, AI algorithms have been shown to perform as well as or better than trained radiologists at providing quantitative assessments of clinical images. While AI automation signals the dawn of a new technological revolution, perhaps the labor market will adapt to complement advances in AI technology.

Regulatory Limitations to the Ethical Review of AI Human Subject Research

In the United States, IRBs oversee the ethical review of all human subject research under the requirements of the Common Rule and supplementary regulations. For example, the FDA imposes additional requirements for human subject research involving drugs and medical devices. Accordingly, AI human subject research must comport with the requirements of the Common Rule and additional regulatory guidelines, where applicable. The question remains: do current regulatory standards adequately safeguard data subjects and society from the dangers of AI human subject research? As discussed above, under the Common Rule, AI human subject research covers algorithms analyzing identifiable private datasets to understand and model humans or human conditions.

The “Private” Problem

To fall under the purview of the Common Rule, AI research must analyze private datasets; individuals represented in public datasets are not considered human subjects. However, the line between a public dataset and a private dataset is blurred. The Common Rule defines private as “information about behavior that occurs in a context in which an individual can reasonably expect that no observation or recording is taking place, and information that has been provided for specific purposes by an individual and that the individual can reasonably expect will not be made public.” This definition of “private” is not concrete, but rather describes a reasonable individual’s expectations about data privacy. There is no regulatory guidance on whether social media data or purchasable data, to name two contentious examples, are private or public. Because IRBs interpret the definition of private differently across institutions, this ambiguity can lead to forum shopping and, in certain cases, AI research completely forgoing IRB review. Further, some datasets are neither public nor private, but instead fall within the purview of privacy agreements that identify applicable uses and restrictions. Most notably, the private/public distinction in the Common Rule stems from the assumption that research on publicly available datasets does not pose informational harm to data subjects. Yet AI human subject research on “public” datasets can nonetheless be very risky to data subjects and society. One example is the New York City Taxi & Limousine Commission’s public release of cab ride information in 2013, from which researchers could detect which drivers were devout Muslims and predict the home addresses of high-profile celebrities. Accordingly, the Common Rule private/public dichotomy is not only anachronistic in light of recent advances in AI research, but also too ambiguous to provide useful guidance for IRB ethical oversight.

The “Identifiable” Problem

AI human subject research must also involve identifiable datasets to fall under the requirements of the Common Rule, but are datasets ever truly anonymous? The Common Rule defines identifiable as “private information for which the identity of the subject is or may readily be ascertained by the investigator or associated with the information.” Today, data subject identity can be readily ascertained from a “de-identified” dataset by combining multiple data sources. In one example, a research collaboration between the U.S. and China used machine-learning data triangulation to re-identify data subjects from a physical activity survey stripped of conventional identifiers. Notably, the 2018 Common Rule drafters recognized that the meaning of identifiable is constantly shifting, requiring regular examination—at least every four years—of the notion of identifiability. Because AI research on “de-identified” human subject data can still pose significant harms to data subjects and society, compliance with existing regulations is insufficient.

The “Research” Problem

The Common Rule defines research as “a systematic investigation, including research development, testing, and evaluation, designed to develop or contribute to generalizable knowledge.” Within the realm of human subject research, the generalizable knowledge requirement refers to knowledge that can provide insight about humanity or human conditions. Thus, to fall under the purview of the Common Rule, algorithms analyzing human datasets must aim to advance knowledge about humans or human conditions. Conversely, AI research solely focused on product development does not constitute human subject research. The problem is that AI algorithms often serve both purposes: product development and predictive analytics of human behavior. In 2012, the Facebook Contagion Study manipulated Facebook news feeds with positive and negative content to influence users’ moods. The study aimed to improve the Facebook news feed experience for users, while also exploring how “emotional states can be transferred to others.” Studies with dual aims, like Facebook’s, can exploit regulatory loopholes to forgo the requirements of the Common Rule. Further, the Common Rule generalizability requirement invites a dispute about whether AI research, which involves predictability rather than generalizability, fits within this definition. Arguably, because AI algorithms can forecast the likelihood of human and societal behavior, generalizability and predictability are analogous in this context.

Additional Regulatory Limitations

Perhaps the most significant limitation to IRB ethical review of AI human subject research is the Common Rule requirement that “the IRB should not consider possible long-range effects of applying knowledge gained in the research as among those research risks that fall within the purview of its responsibility.” This limitation means most of the dangers of AI human subject research—societal and group harms—are often invisible to IRB ethical review. One exception is algorithmic bias. The Belmont principle of justice requires IRBs to determine whether a study’s subject selection is equitable, which ameliorates the danger of algorithmic bias by requiring diverse datasets. Notwithstanding this exception, the downstream harms limitation allows questionably ethical research, such as the gaydar study, to pass IRB muster.

Another regulatory limitation concerns the Common Rule “Exempt” categories, which designate certain modalities of research as exempt from the regulatory requirements of the Common Rule and IRB approval because they are deemed low risk. Notably, the Common Rule exempts surveys, interviews, data collection from benign behavioral interventions, and secondary research uses of data from regulatory requirements. Since much of AI research involves social media, online survey, and medical record data, the Common Rule “Exempt” categories blur lines of authority and accountability. The Common Rule also expressly provides that the regulations apply only to federally funded or supported research. However, it is customary for academic institutions to apply the Common Rule requirements to all research, regardless of funding source. Finally, adaptive AI algorithms present unique regulatory challenges to the ethical review of AI human subject research. While the Common Rule provides a mechanism for the periodic and continuing review of previously approved research, there is currently no regulatory guidance on how IRBs should approach the ethical review of adaptive AI algorithms.

Short-Term and Long-Term Solutions to Bridge the Gap in Ethical Oversight of AI Human Subject Research

AI human subject research poses significant dangers to data subjects and society; however, current regulatory guidelines do not properly account for these dangers or for the diversity of AI research on human-based datasets. Notably, the Common Rule allows much of AI research to forgo IRB ethical review and approval. To move forward, regulatory action must specifically address the factors that created the current gap in the ethical oversight of AI human subject research.

Short-Term Solutions

While traditional human subject research ethics rely on principles from the Nuremberg Code, the Declaration of Helsinki, and the Belmont Report for guidance, no analogous overarching ethical documents exist in the world of AI research. Instead, national and international entities have independently developed ethical guidelines for AI research in response to the growing ethical concerns associated with the rise of AI technology. Within these diverse guidelines, some common themes emerge: transparency, beneficence, justice, accountability, and data privacy. Moving forward, in order to cultivate public trust in AI research, the DHHS should release a new set of ethical guidelines for AI research unifying these common themes. In doing so, DHHS would remove some of the burdens from the IRB ethical review process and integrate them into the research design process.

Another way to ensure researchers evaluate the hazards of their algorithms during the research design process is to require researcher education on AI ethics. Historically, researchers in the data science domain have not received robust ethical training because data science typically did not involve human subject research. However, the rise of AI research signals a new age for data science. Researcher education should focus on methods for anonymizing data, privacy-by-design features, recognizing the upstream and downstream algorithmic harms of AI human subject research, and safeguarding the integrity of the scientific process. The CITI Program, a comprehensive training system for research entities, offers the leading course on AI human subject research ethics. As AI research continues to boom, researcher education will prove critical in maximizing the benefits of AI human subject research while minimizing any potential harms.

Research institutions can also advance AI ethics in the short term by establishing AI academic centers. One example is Stanford’s Institute for Human-Centered Artificial Intelligence (HAI), which has a mission to “advance AI research, education, policy and practice to improve the human condition.” Another example is Google AI, which aims to apply algorithms in new domains and ensure equitable access to the benefits of AI research. Crucially, AI academic centers could establish binding institutional policies that go beyond the requirements of the Common Rule to bridge the current regulatory gaps in AI human subject research. Alongside the institutional establishment of AI academic centers, agencies should release guidance documents to aid IRBs in the ethical review of AI human subject research. The leading agency in this regard is the FDA, which regulates algorithms designed to “treat, diagnose, cure, mitigate, or prevent disease or other conditions.” The FDA has released several guidance documents for the ethical review of AI research under its purview, the latest of which proposes a comprehensive framework for regulating clinical algorithms. Notably, the FDA’s risk classification of AI devices is based not on the immediate potential risks to data subjects, but rather on the downstream risks posed by the algorithm in clinical settings. Accordingly, FDA guidance recognizes that the critical risks of AI clinical research occur downstream from the study, justifying a departure from the anachronistic limitations in the Common Rule.

Long-Term Solutions

In the current regulatory landscape, IRB review of AI human subject research merely serves a compliance function. The regulatory limitations in the Common Rule mean that the IRB stamp of approval does not necessarily signify that an AI study is ethical. If IRBs serve only as compliance boards, then the regulations do not carve out a prominent role for IRBs in AI human subject research. However, we should consider IRBs as serving both compliance and ethical review functions. When a research topic or methodology transgresses public norms and expectations, even when compliant with the regulations, IRBs should ensure there are adequate ethical justifications before the research moves forward. Further, as ethical review boards, IRBs should develop a framework for the ethical review of AI research outside the narrow scope of the regulations. After all, IRBs represent a response to decades of research abuses and an effort to restore public trust in the research enterprise.

Even when solely considering the compliance function of IRBs, the IRB model provides a useful framework for the ethical review of AI human subject research. IRB review involves an authoritative and deliberative process, including the ability to disapprove projects before research onset. Consequently, IRBs can prevent or limit the adverse impacts of unethical AI algorithms. IRBs also comprise diverse experts and stakeholders, making recruitment of AI expertise for board discussions a feasible endeavor. Notably, the Common Rule provides a mechanism for IRBs to re-evaluate algorithms if changes to the algorithmic model lead to changes in risk profile. While IRBs currently lack a comprehensive set of guidelines for how to navigate the AI research sphere, the Belmont principles should be the floor—not the ceiling—guiding the IRB review process.

However, IRBs cannot be effective in the ethical review of AI research unless core concepts in the Common Rule are redefined. In particular, the definition of human subject, the private/public distinction, the downstream harms limitation, and the “Exempt” categories require further scrutiny in light of advances in AI technology. If the human subject definition includes only identifiable datasets, then the regulations should address safeguards and methodologies to ensure data subject anonymity. Similarly, the regulations should provide a more concrete definition of private datasets, given the ongoing debate about whether purchasable datasets and social media datasets are private or public. Since AI research poses unique downstream harms to society, the Common Rule downstream harms limitation should carve out an exception for AI research or be removed altogether. Lastly, the DHHS should revise the Common Rule “Exempt” categories to allow formal IRB review of AI human subject research involving surveys, interviews, benign behavioral interventions, and secondary datasets. The “Exempt” categories cover research modalities normally thought to pose minimal risk to human subjects; given the well-documented harms of AI research to both data subjects and society, the notion of minimal risk with respect to the Exempt categories needs to be reexamined.

IRBs often work in conjunction with ancillary review committees for “approvals that are in addition to IRB approval of human subject research and that are required by institutional or funding entity policy(ies) or by regulation, statute, or law.” Ancillary review committees originate from agency guidelines and generally evaluate controversial research methods. For example, Institutional Biosafety Committees (IBCs) review research involving recombinant DNA and were created by the National Institutes of Health (NIH) in response to public concerns about genetic engineering. Similarly, the Radiation Safety Committee is another NIH-based ancillary board ensuring research subjects are not exposed to more radiation risk than necessary. The Office of Human Research Protections (OHRP), the DHHS office responsible for minimizing human subject research harms, should release guidelines establishing ancillary review committees for AI research. An AI ancillary review committee would comprise experts in the AI field who could readily examine data sources, data lifecycles, historical bias, and desired model outcomes in AI research. Accordingly, IRBs and AI ancillary review committees could work in tandem to protect the rights and welfare of data subjects and society in AI human subject research.

Supplementary to the institutional ethical review process for AI human subject research, legislatures could enact additional protections for data subjects. Data privacy legislation is especially critical when considering the opacity of risk assessment in the digital world. Recently, the Colorado and Virginia legislatures enacted privacy acts aimed at providing data subjects more control over their personal data. However, both legislative efforts exempt research covered under the Common Rule and FDA regulations from the statutory data privacy requirements. Congress could also enact a nationwide data privacy act, similar to the European Parliament’s General Data Protection Regulation (GDPR). GDPR mandates privacy notices to data subjects for any lawful processing of data subject information; in some instances, GDPR even requires data subject consent. AI developers under the purview of GDPR must also explain in plain language how algorithms will analyze data subject information. These measures ensure data subjects are adequately informed of their fundamental rights and freedoms when it comes to the use of their data. Further, the transnational nature of GDPR guarantees that data protection laws are consistent across EU member states. Thus, Congressional action to enact GDPR-style regulation would advance the integrity and transparency of AI research throughout the United States.

Perhaps the most holistic approach to regulating AI human subject research will require creating an agency for AI algorithms. Agencies have an institutional advantage over legislatures: they are more flexible, can be limited or broadened, can specialize in a specific topic, and can recruit appropriate expertise. Further, agencies can create federally uniform regulation independent of political pressures. Notwithstanding these advantages, agencies suffer from accountability and transparency issues. The evolving nature of AI technology means AI policies should be durable or readily amendable. While amending agency regulations can be a long and arduous process, the agency model allows for periodic policy examination in response to advances in AI technology. Accordingly, an agency for algorithms would strive to strike the right balance between promoting scientific innovation and protecting the rights and welfare of data subjects and society.

Conclusion

The rise of AI research has profound implications for the human subject research ethics landscape. As underregulated AI research continues to boom, so do the potential harms to data subjects and society. Yet progress cannot come at the expense of public welfare. The current regulatory gap in research ethics signals an opportunity for local institutions and state and national governments to collaborate on developing durable AI policies and procedures. In particular, the IRB model—supplemented by local, agency, and congressional action—provides a workable framework for the ethical review of AI human subject research. Moving forward, AI regulation must strike a proper ethical posture to ensure the rights and welfare of data subjects and society remain a priority and not an afterthought.

    Thomas Salazar

    Chief of Research Oversight and Compliance, Travis Air Force Base, Fairfield, CA

    Tom Salazar is the research oversight and compliance officer at Travis Air Force Base’s Clinical Investigations Facility in Fairfield, California. He provides oversight to the human protection and animal welfare research programs at Travis Air Force Base and serves as the in-house advisor to the Clinical Investigations Facility on all collaborative research development and research compliance matters. Mr. Salazar can be reached at [email protected].
