January 10, 2025 Feature

Generative AI and Copyright Law: Current Trends in Litigation and Legislation

Jim Rosenfeld, Bianca Chamusco, and Kathleen Farley

Generative artificial intelligence (AI) is causing courts and legislators to reassess basic tenets of copyright law: What level of human involvement in the creation of an AI-generated work makes it copyrightable? Can the providers of large language models (LLMs) be held liable for infringement based on the input of millions of copyrighted works to train their systems, or on the output of works similar to previously inputted copyrighted works? Do current laws adequately protect against the replication of a performer’s voice to create new musical or other works? Clear answers to these questions have not yet emerged but are in the judicial and legislative works.

This area is evolving quickly. New lawsuits are filed every week. Scores of legislative proposals are pending, and many have been passed into law. California’s governor signed 18 AI-related bills into law on September 29, 2024. While this article does not purport to cover every case or piece of legislation—any effort to do that would be incomplete and quickly become outdated—it provides a high-level snapshot of the current state of affairs, identifying key issues, trends, and developments in courts and legislatures in the United States, particularly those of interest to media and copyright lawyers.

First, we summarize the key trends and issues in current litigation. Courts are grappling with (1) whether and when AI-generated works are copyrightable, (2) whether training generative AI models on copyrighted works infringes the works, (3) whether the models themselves could be infringing derivative works, (4) whether AI-generated output infringes the copyright of works used to train that model, and (5) whether AI generation of works violates legal proscriptions on the removal or alteration of copyright management information.

Second, we look at legislative trends. While federal legislation is still under consideration, state legislatures have passed laws addressing (1) the use of deepfakes and other digital replicas in (a) election-related communications, (b) sexually explicit materials, and (c) virtual performances created from actual artists’ voices and identities, as well as (2) the required disclosure of AI training methods and materials.

Recent Case Law

Pending cases have focused on copyrightability, infringement and fair use, and the removal of copyright management information.

Copyrightability: Does Copyright Law Protect AI-Generated Work?

Although “the law is clear that copyright protection in the United States is limited to works of human authorship,” plaintiffs have tested the boundaries of what exactly fits that description. As part of an ongoing attempt to secure copyright registration for the AI-generated image A Recent Entrance to Paradise, the plaintiff in Thaler v. Perlmutter appealed the district court’s grant of summary judgment in favor of the Copyright Office, which in 2019 had refused to register the work because it was generated by machine. While the D.C. Circuit has not yet ruled on Thaler’s challenge as of the date of this article, it seems unlikely to disturb the decades of precedent recognizing human creativity as central to copyright.

A similar case, Allen v. Perlmutter, presents a closer question. Challenging the Copyright Office’s denial of copyright registration in the award-winning (and partially AI-generated) image Théâtre D’opéra Spatial, Allen compared his use of the AI tool Midjourney to a film director asking a cameraman to shoot multiple takes of a scene. Allen described how he iterated his prompts more than 600 times to generate an image matching his artistic vision.

As generative AI tools like Midjourney and Stability AI’s Stable Diffusion explode in popularity, the line between human- and machine-generated work becomes ever blurrier. However, a 2023 Copyright Office policy guidance document offers insight into how courts are likely to view the human authorship requirement in the age of AI. Emphasizing that the inquiry will necessarily proceed on a “case-by-case” basis, the guidance asks whether the “traditional elements of authorship”—that is, literary, artistic, or musical expression or elements of selection and arrangement—were “actually conceived and executed” by a human being or by machine.

When an AI model receives a prompt from a human user and produces complex literary, visual, or musical works in response, the traditional elements of authorship are controlled and executed by the AI, and the work is not eligible for copyright registration. But where the human user exercises creative control over the expressive elements of the AI model’s output, copyright registration may be available.

Infringement: Issues Arising in Recent Cases

Does Training Generative AI Models Infringe Copyright?

Direct infringement by reproduction

Most plaintiffs in AI-related copyright litigation assert direct infringement claims against AI platforms under the theory that AI models infringe their copyrights because the models are trained on copyrighted works. Plaintiffs in these cases generally allege that a given AI platform accessed copyrighted materials and made unauthorized copies of them before feeding them to the AI for training purposes, thereby implicating the copyright owner’s exclusive reproduction right.

For example, in Tremblay v. OpenAI, Inc., the plaintiffs are book authors who claim “the reason ChatGPT can accurately summarize a certain copyrighted book is because that book was copied by OpenAI and ingested by the underlying OpenAI Language Model . . . as part of its training data.”

Central to that charge is the idea that ChatGPT can produce summaries because it “retains knowledge of particular works” it copied. The allegations in the closely watched New York Times Co. v. Microsoft Corp. litigation are similar: The New York Times claims that defendants’ LLMs “were built by copying and using millions of The Times’s copyrighted news articles,” and that is why their AI models can generate output that recites The New York Times’s content verbatim. This direct copyright infringement theory has already survived a summary judgment motion in an early case involving content “scraped” from the legal research platform Westlaw.

The ultimate success of these direct infringement claims likely will depend on fact-intensive analyses of how generative AI systems work—a subject about which the public knows surprisingly little. Some defendants and scholars have argued that generative AI models do not retain the original materials used for training; instead, the models “digest” them in order to learn how human language functions, much like a human absorbs information when reading a book. They argue, therefore, that any copying of copyrighted material may well be transitory, incidental, and therefore noninfringing.

But the defense that defendants seem most likely to raise is fair use. That inquiry considers four factors: (1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and (4) the effect of the use upon the potential market for or value of the copyrighted work.

The sparse AI caselaw to date yields little in the way of guidance on these factors, leaving AI platforms with a collection of “long-standing and widely accepted precedents” that may or may not generalize to the generative AI context. AI advocates point to Authors Guild v. Google, Inc., a Second Circuit decision holding that the Google Books digitization project was a fair use. There, Google made digital copies of “tens of millions” of copyrighted books and added them to a searchable database. The advocates claim that the case stands for the legality of the wholesale ingestion of copyrighted works as long as the use is sufficiently transformative—as the Google Books search function was held to be.

Others are less sure that this precedent translates to generative AI. The Copyright Alliance maintains that the Second Circuit’s holding was limited to the facts of that case and influenced in part by the steps Google took to secure the books it scanned into its database. Here, unlike in Authors Guild, the Alliance argues, generative AI is not providing “factual information about the copyrighted works,” such as how many times the word “whale” appears in Moby Dick. “Instead, most generative AI reproduce and draw on the expressive elements from the copyrighted works as part of a process that results in works that would often act as market substitutes for the training materials[.]”

Definitive answers about AI fair use may not arrive anytime soon. And the denial of the parties’ cross-motions for summary judgment in Thomson Reuters Enter. Centre GmbH v. Ross Intel. Inc. implies that there may not be a categorical answer. The decision suggests that whether a given AI model’s use is “fair” may depend on contested facts, such as the method by which that particular model is trained, how much (if any) of the original copyrighted work is retained (and for how long), how the AI model functions, and the nature of its AI outputs.

Also relevant will be the economic “realities” of the market for the original copyrighted work versus the AI-generated output. A finding that AI-generated content usurps the market for the original—such as a news aggregator website that merely repackages or republishes content in a new format—will weigh against the use being transformative and against a fair use defense.

Generative AI models as infringing derivative works

Plaintiffs and commentators have posited that the use of copyrighted works to train generative AI models also may give rise to derivative use claims because AI models produce outputs that may be similar to the inputs on which they were trained. But can a generative AI model itself be considered an infringing derivative work? The answer isn’t clear.

The Copyright Act defines a derivative work as “a work based upon one or more preexisting works, such as a translation, musical arrangement, dramatization . . . or any other form in which a work may be recast, transformed, or adapted.” In Kadrey v. Meta Platforms, Inc., the authors Richard Kadrey, Sarah Silverman, and Christopher Golden argued that Meta’s unauthorized copying of their books to train LLaMA language models rendered the LLaMA language models themselves “infringing derivative works” because the “models cannot function without the expressive information extracted” from the plaintiffs’ books. Judge Chhabria of the Northern District of California rejected this theory, declaring it “nonsensical” to “understand the LLaMA models themselves as a recasting or adaptation of any of the plaintiffs’ books.”

But that may not be the end of it. Similar claims are in play in Authors Guild v. OpenAI, Inc. and Andersen v. Stability AI Ltd. In Andersen, Judge Orrick, also of the Northern District of California, denied the defendants’ motion for summary judgment on the plaintiffs’ derivative infringement theory. He ruled that whether “image generators allegedly trained on, relying on, and perhaps able to invoke copyrighted images” can be considered derivative works will depend on “what the evidence shows concerning how these products operate and, presumably, whether and what the products can produce substantially similar outputs as a result of ‘overtraining’ on specific images or by design.” This is yet another area where particularized facts about how individual AI models operate will be central to determining whether infringement has occurred during the training process.

Does AI-Generated Output Infringe Copyright?

Plaintiffs also commonly allege that the output of AI models infringes copyright. This is distinct from the claim that training AI models on copyrighted inputs constitutes infringement.

Compelling examples of seemingly infringing AI outputs, sourced from the Getty Images and New York Times lawsuits, have circulated online. However, issues remain to be litigated in those cases, including how the infringing examples were created and how common near-verbatim AI outputs actually are.

With so much still in flux, courts have so far resisted a categorical rule that AI-generated output necessarily infringes, either directly or as an infringing derivative work. The Kadrey court made clear that plaintiffs must allege substantial similarity between particular AI outputs and copyright-protected inputs to maintain a derivative infringement claim. It is not enough for plaintiffs to allege that because their books were duplicated in full as part of an AI model’s training process, all AI-generated content infringes; they must allege and ultimately prove that the model’s outputs “incorporate in some form a portion of” their protected works. In Andersen, the court similarly dismissed the plaintiffs’ derivative use claim for failure to allege substantial similarity.

As with infringement claims in the training context, expect fair use to play a major role in platforms’ defense of AI output. The recent controversy over the LLM search engine Perplexity demonstrates this point. In a blog post responding to News Corp.’s newly filed lawsuit against it, Perplexity couched its mission in the language of fair use: “We believe that tools like Perplexity provide a fundamentally transformative way for people to learn facts about the world.” Of course, Perplexity’s own marketing may undercut the strength of its defense: By encouraging readers to “skip the links” to the original source material, the startup may risk substantial copyright liability.

To our knowledge, no AI platform defendant has yet asserted an independent creation defense, but that also may be raised. Whether such a defense could prove successful is one more in a sea of open questions.

Does the Removal of Copyright Management Information Violate the Digital Millennium Copyright Act?

Plaintiffs in these cases also frequently allege that AI models violate Section 1202(b) of the Digital Millennium Copyright Act (DMCA) by stripping out copyright management information (CMI)—defined by the statute to include identifying information about a copyrighted work (such as the title, author’s name, and terms and conditions for use) that is “conveyed in connection with” the work. Section 1202(b) prohibits the intentional, unauthorized “removal or alteration” of CMI, in addition to the “knowing” distribution of unlawfully removed CMI and the “knowing” distribution of works from which CMI has been unlawfully removed. The statutory penalties can be staggering, entitling plaintiffs to anywhere from $2,500 to $25,000 for each violation.

Courts have yet to reach a consensus on the showing required to prevail on a Section 1202(b) claim. In the first class action to challenge LLMs, Doe 1 v. GitHub, Inc., the district court dismissed the claim because no plaintiff alleged that the AI-generated outputs were identical to any one plaintiff’s copyrighted software code. The court reasoned that there can be no liability for any removal of CMI occurring during the AI training process because failing to affix CMI to a new work is not “removal” under Section 1202. Several other district courts have followed Doe 1 in reading Section 1202 to include an identicality requirement.

But the law is far from settled. Acknowledging that no court of appeals has yet ruled on the issue and pointing to a growing split among district courts, on September 27, 2024, the Doe 1 court certified the dismissal of the plaintiffs’ DMCA claims for interlocutory appeal to the Ninth Circuit. The outcome of the appeal may have repercussions for AI models’ processing of training inputs for years to come.

Legislation

Amid a plethora of AI-related legislation, those laws addressing various applications of deepfakes and training AI models may be of particular interest to media, entertainment, and IP lawyers.

Deepfakes

AI-generated “deepfakes,” which are images, videos, or audio recordings that depict real people doing or saying something that never happened, are a major legislative priority with a great impact on the practice of media law. Numerous state legislatures have passed measures to regulate various types of deepfakes, with a focus on those that (1) aim to influence political elections, (2) depict sexually explicit conduct, and (3) use entertainers’ voices or likenesses to create a fabricated performance.

Deepfakes and Election-Related Communications

Regulation of political deepfakes trended significantly upward in 2024. Fifteen states passed legislation on the topic that year, joining five states which had previously passed such laws. Nearly all of these legislative schemes require communications aimed at influencing upcoming elections to prominently disclose any use of AI-generated deepfakes rather than prohibiting deepfakes entirely. Louisiana instead flatly prohibits candidates and political committees from distributing material known to “make a false statement” about another candidate in the election via any means.

Most states leave enforcement to civil actions brought by candidates aggrieved by the undisclosed use of AI in political communications. However, seven states also allow the imposition of criminal penalties for election-related deepfakes. Although no majority rule has yet emerged as to whether the AI disclosures are required at all times or only in the immediate lead-up to an election, nearly all states that impose the disclosure requirement on a limited basis apply it for the 90 days prior to Election Day.

An area to watch as political deepfake laws continue to develop is the regulation of AI-generated content made with the consent of the candidate depicted. Three states—Indiana, Minnesota, and Mississippi—define deepfakes so that AI-generated depictions of political candidates made with their consent are exempt from disclosure requirements. Under these regulatory schemes, AI-generated depictions of political candidates that bolster their reputations are permitted to be freely disseminated without disclaimers, while negative AI depictions must be properly labeled.

In other states where a political deepfake must carry a disclaimer even if the candidate consents to the representation, this imbalance in the identification of AI-generated images will not exist. The long-term benefits and drawbacks of these different schemes remain to be seen.

Deepfakes Involving Sexually Explicit Material

Sexually explicit deepfakes are another area of major legislative attention. Thirty-one states have enacted laws addressing obscene AI-generated images. Twenty-two states prohibit the dissemination of sexually explicit AI-generated images of people over the age of 18 without their consent, and 19 states have explicitly clarified in legislation that their existing prohibitions on child sexual abuse materials extend to AI-generated images.

The consensus as to whether nonconsensual sexual deepfake images of adults constitute a civil wrong or a criminal offense is continuing to evolve. For example, New York has created a private right of action for individuals depicted in AI-generated sexual deepfakes and also has defined the nonconsensual distribution of such deepfakes as a criminal offense. The federal government has not enacted any laws specifically addressing sexual AI deepfakes involving either minors or adults.

In September 2023, the National Association of Attorneys General specifically called on Congress to expand the federal definition of child sexual abuse material to explicitly cover AI-generated images. Bipartisan legislation to establish a commission to assess the impact of AI on child exploitation offenses is pending before the House Judiciary Committee.

Deepfakes Involving Entertainers

Comparatively few laws have been enacted addressing deepfakes in the context of artistic performances, but the first serious legislative efforts in this area have recently been passed and likely foreshadow similar measures in other states. Tennessee was the first state to protect musicians from the unauthorized digital replication of their voices with the Ensuring Likeness, Voice, and Image Security Act (ELVIS Act), passed in March 2024. The law expands the state’s right of publicity protections to cover an individual’s voice, a personality right that many other states already have protected.

Notably, however, the ELVIS Act defines “voice” as “sound in a medium that is readily identifiable and attributable to a particular individual, regardless of whether the sound contains the actual voice or a simulation of the voice of the individual[.]” This definition includes AI-generated replicas of a musician’s voice even if an actual recording of their voice is never used. The act also creates a new kind of secondary liability that reaches beyond those who actually use an individual’s protected personality rights.

Specifically, the ELVIS Act renders civilly liable anyone who “distributes, transmits, or otherwise makes available an algorithm, software, tool, or other technology, service, or device, the primary purpose or function” of which “is the production of a particular, identifiable individual’s photograph, voice, or likeness, with knowledge” that the use “was not authorized by the individual.” This provision is squarely aimed at preventing generative AI tools from producing unauthorized replicas of a specific individual.

Several months later, Illinois became the second state to ban the unauthorized use of digital replicas of a person’s likeness created via generative AI. The Illinois law also renders liable anyone who “materially contributes to, induces, or otherwise facilitates” infringement of a person’s likeness after having obtained actual knowledge that a violation is occurring, meaning creators of generative AI tools are liable for any right of publicity violation accomplished after the required notification.

Relatedly, Illinois and California also have enacted laws to protect performers from signing away rights to their digital replicas absent explicit consent and competent representation by legal counsel or a union representative during contract negotiations. California also goes a step further as the only state to specifically prohibit the use of digital replicas of deceased performers without the consent of the estate, bolstering the state’s existing recognition of a post-mortem right of publicity for 70 years after death.

A bill to create a federal intellectual property right protecting individuals’ image, voice, and likeness—known as the NO FAKES Act (Nurture Originals, Foster Art, and Keep Entertainment Safe Act)—has been introduced in both the U.S. Senate and the House of Representatives. This right to an individual’s voice and/or visual likeness could be licensed during an individual’s lifetime, but not transferred entirely, and would persist after death for up to 70 years. The rightsholder would be able to bring a civil action for violation of the NO FAKES Act and would be able to recover actual damages and any profits from unauthorized use of their digital replica.

The NO FAKES Act would create a safe harbor provision for internet service providers, which would not be liable if they removed or disabled access to an unauthorized digital likeness after receiving notice of it. Certain uses covered by the First Amendment, such as bona fide news reports, would be exempt from liability. The current draft of the NO FAKES Act provides that it would not preempt any state or common law in existence as of January 2, 2025, meaning Tennessee’s and Illinois’s existing laws would survive even if the federal right of publicity is created.

Future Area of Regulation: Training Large Language Models

Laws requiring disclosure of AI training methods may be the next emerging area of regulation of interest to media lawyers. So far, California is the first and only state to pass such a law, which will take effect in 2026. The measure will require developers of generative AI models made available to Californians to make certain disclosures about the materials used to train those models.

Specifically, developers will have to disclose the sources or owners of datasets, how those datasets further the intended purpose of the AI model, a description of the types of data used, intellectual property considerations (including whether there are data protected by copyright, trademark, or patent, and whether the datasets were purchased or licensed by the developer), and privacy considerations such as whether the datasets include personal information or aggregate consumer information. This required disclosure of intellectual property considerations, in particular, may push some disputes over possible rights infringements by large language models out of the theoretical realm and firmly into a practical reality.


Jim Rosenfeld, Bianca Chamusco, and Kathleen Farley

Davis Wright Tremaine LLP

Jim Rosenfeld, Kathleen Farley, and Bianca Chamusco are media and intellectual property attorneys at Davis Wright Tremaine LLP. Jim is a partner and Kathleen is an associate in the New York office, and Bianca is an associate in the Seattle office.