The Brief

Spring 2024 | Revolutionizing Access to Justice

Initial Rulings in AI Copyright Litigation Provide Insight but Leave Questions Unanswered

Lisa T. Oratz, Tyler Robbins, and D. Sean West

Summary

  • Generative AI’s rapid development has led to numerous lawsuits by authors, visual artists, programmers, and others who have raised copyright and additional claims.
  • Initial rulings highlight the importance for plaintiffs to include specific and detailed allegations in AI copyright claims.
  • Although many claims have been dismissed at early stages, courts have granted plaintiffs broad leave to amend to address the court’s concerns.
  • These cases provide insight into the evolving legal landscape surrounding generative AI and the types of claims that are likely to survive early procedural junctures but leave crucial questions unresolved.

The rapid development of generative artificial intelligence (AI) not only has presented new technological capabilities but also has introduced legal uncertainties as to how existing legal frameworks will apply to these new applications of AI. Last year saw a flurry of new lawsuits involving generative AI technologies, which primarily focused on copyright claims.

These lawsuits have taken a variety of approaches in the nature of the claims they raise. Some have taken a narrower approach and have limited their claims to copyright infringement based solely on the copies made in connection with training the AI models that power these tools. Other cases have taken a more expansive view and have also raised copyright claims based on the output generated by these tools, as well as a host of non-copyright claims, such as right of publicity, trademark infringement and dilution, unfair competition, unjust enrichment, negligence, privacy, invasion of privacy, and removal or alteration of copyright management information (CMI).

The most expansive cases have argued that the AI models themselves, as well as the output created using these models, are always infringing because they are derivative works of the training data, regardless of whether there is any substantial similarity between those items and the allegedly infringed works. A derivative work is a work that is “based upon one or more preexisting works,” and the right to prepare a derivative work is one of the exclusive rights granted to the holder of a copyright. Some plaintiffs have argued that the models (and their output) are derivative works of the data their models are trained on because they are “derived from” such training data and substantial similarity between them is not required. But this is a novel claim as courts typically require a finding of “substantial similarity” between an original work and the allegedly infringing work to sustain an infringement claim, including claims based on making a derivative work.

Courts have now started to address some of these issues, and these lawsuits have begun to yield some of the first rulings involving generative AI. In this article, we discuss the recent rulings on motions to dismiss in four of these cases: Andersen v. Stability AI Ltd.; Kadrey v. Meta Platforms, Inc.; Tremblay v. OpenAI, Inc.; and Doe 1 v. GitHub, Inc. We also discuss Thomson Reuters Enterprise Centre GmbH v. ROSS Intelligence Inc., an earlier-filed case in which the court recently entered a ruling on a motion for summary judgment. Each of these cases revolves around fundamental questions about the extent to which copyrighted works may be used to develop and improve machine learning algorithms without infringing upon the rights of the original creators.

Despite the similarities, each of these cases has a unique focal point that helps highlight different facets of the copyright issues posed by AI technologies. Together, they illustrate not only the broad scope of content types at stake—from visual art to code—but also the spectrum of creators and copyright holders affected. Andersen, for example, involves a group of independent visual artists who claim that works they had published on the internet, some of which were unregistered, were misappropriated for training image generation tools. In contrast, the plaintiffs in Kadrey and Tremblay are well-known authors who argue that the alleged use of their registered literary works to train large language models is copyright infringement. Meanwhile, the Thomson Reuters case is a dispute between two corporations over the use of materials from legal databases.

This initial set of decisions helps give shape to the evolving legal landscape surrounding generative AI and offers insight as to what types of claims are likely to survive these early procedural junctures. These decisions could provide insight into how the proceedings will unfold for the more recently filed cases involving generative AI technologies, such as New York Times Co. v. Microsoft Corp. and Nazemian v. NVIDIA Corp.

Andersen v. Stability AI

This case arose in January 2023, when a collective of artists filed a class action lawsuit involving three AI-powered image generation tools that produce images in response to text inputs: Stable Diffusion (developed by Stability AI), Midjourney (developed by Midjourney), and DreamUp (developed by DeviantArt). The plaintiffs asserted that the models powering these tools were trained using copyrighted images scraped from the internet (including their copyrighted works) without consent. The defendants filed motions to dismiss, and the U.S. District Court for the Northern District of California recently issued a ruling on these motions. As discussed in more detail below, the court dismissed most of the plaintiffs’ claims, with only one plaintiff’s direct copyright infringement claim against Stability AI surviving. The court granted leave to amend the complaint on most counts, and the plaintiffs have since filed an amended complaint, for which a renewed motion to dismiss is pending.

Note that, at the motion to dismiss stage, the plaintiffs are merely required to allege enough facts to state a plausible claim, and courts are required to accept such allegations as true and draw all reasonable inferences in favor of the plaintiffs. Neither the truth of the allegations nor the availability of defenses such as fair use is resolved at the motion to dismiss stage.

Initial complaint. The plaintiffs brought several copyright-related claims, including direct copyright infringement, vicarious copyright infringement, and removal of CMI. As an initial matter, the court dismissed with prejudice the direct and vicarious copyright claims of two of the named plaintiffs because their allegedly infringed works were not registered with the U.S. Copyright Office (which is a requirement for initiating an infringement action in court). Only the direct and vicarious copyright claims of the remaining named plaintiff, Sarah Andersen, survived this initial inquiry because she registered some of her works and her claims were limited to those registered works.

Although the defendants argued that Andersen did not identify with specificity which of her registered works she alleged were used for training Stable Diffusion, the court allowed Andersen’s claims to proceed, finding that her searches on “haveibeentrained.com,” a tool that allows searching of the LAION-5B and LAION-400M image datasets, plausibly established her allegation that her works were included in those datasets, which are alleged to have been used to train Stable Diffusion. However, as discussed below, the court ended up dismissing all but one of her copyright claims on other grounds (with leave to amend).

Direct infringement claims against Stability AI. The one claim that was not dismissed was the direct copyright infringement claim against Stability AI based on the copies made during training, as the court found that Andersen sufficiently alleged that Stability AI used her copyrighted works to train Stable Diffusion. While the defendants disputed the truth of those allegations, they did not dispute that the allegations were sufficient to survive a motion to dismiss.

Direct infringement claims against DeviantArt and Midjourney. In contrast, the court dismissed the direct copyright infringement claims against DeviantArt and Midjourney, with leave to amend, holding that the plaintiffs failed to allege sufficient facts regarding such defendants’ use of the plaintiffs’ copyrighted works for training purposes. The plaintiffs also alleged direct infringement based on (1) the distribution of Stable Diffusion (which they claimed contained compressed copies of the training data) and (2) the defendants’ AI tools and their output being infringing derivative works of the training data. However, the court concluded that the plaintiffs had not adequately pleaded plausible facts in support of those claims.

The court agreed with the defendants that the claims regarding compressed copies seemed to contradict the plaintiffs’ own explanation of how the diffusion model works and said that Andersen would need to define exactly what these alleged “compressed copies” were and provide plausible facts to support her theory. The order also asked Andersen to clarify how DeviantArt and Midjourney could be liable for direct copyright infringement when the plaintiffs alleged that they merely provided access to Stable Diffusion as a library. The court indicated that it was unclear whether these defendants could be liable for direct infringement if Stable Diffusion contains only algorithms and instructions that can be applied to the creation of images (that include only a few elements of copyrighted materials used to train the model). However, the court did not entirely rule out the possibility that the plaintiffs could sufficiently plead such a claim and noted that there might be stronger inferences about how and how much of Andersen’s protected content remains in Stable Diffusion if the plaintiffs could plausibly plead that the defendants’ AI products allow users to create new works by expressly referencing Andersen’s works by name.

The court did reject as implausible Andersen’s assertion that all images generated through the use of Stable Diffusion constitute infringing derivative works of every image used to train Stable Diffusion. The court’s ruling appears to indicate that the amended complaint would need to show that images generated through Stable Diffusion are substantially similar to Andersen’s works.

Vicarious infringement. The plaintiffs also alleged that the defendants were vicariously liable for infringing derivative works created by third parties’ use of the defendants’ products. The court dismissed the vicarious infringement claim against Stability AI because Andersen failed to sufficiently allege that Stability AI’s model outputs were infringing. Because Andersen’s “compressed copies” theory was not adequately pleaded to support a direct infringement claim, it could not serve as the basis for a vicarious infringement claim. The court also dismissed the vicarious infringement claims against DeviantArt and Midjourney because the plaintiffs did not adequately allege claims of direct infringement against those defendants, noting that vicarious infringement necessarily requires an underlying act of direct infringement.

Removal of copyright management information. The plaintiffs also alleged that the defendants violated § 1202(b) of the Digital Millennium Copyright Act (DMCA) by intentionally removing or altering CMI from the plaintiffs’ works and by distributing materials knowing that the CMI had been removed or altered without authorization of the copyright owner. These claims also require that the defendants did such acts knowing, or having reasonable grounds to know, that they would induce, enable, facilitate, or conceal an infringement of copyright. The court noted that, at the pleading stage, the claimant must plead facts plausibly showing that the alleged infringer had this required mental state, and it dismissed the DMCA claim, with leave to amend, holding that the plaintiffs’ claims were wholly conclusory and did not provide the necessary details to support their allegation (including which specific CMI was allegedly removed and which of the defendants they contend removed it).

The court also dismissed, with leave to amend, the non-copyright claims brought against the defendants.

Right of publicity. The plaintiffs alleged in their complaint that the defendants violated their right of publicity both by using the plaintiffs’ names to promote the products and by allowing users to use the tool to generate work “in the style of” their names (which they claim are uniquely associated with their art and distinctive artistic style). However, the court noted that the plaintiffs retreated from the claims about violating their “artistic identities” and shifted the focus to claims based on the defendants’ alleged use of their names in advertising their products. The court dismissed these claims, with leave to amend, finding that the plaintiffs did not provide specific facts to plausibly allege that the defendants used the plaintiffs’ names to advertise, sell, or solicit purchase of the defendants’ products (or how use of the plaintiffs’ names in text inputs would produce images that would harm the goodwill associated with their names). The court noted again that the plaintiffs’ admission that no generated image was likely a “close match” for the plaintiffs’ works seemed to contradict the plaintiffs’ claims.

Unfair competition. The plaintiffs also alleged a number of unfair competition claims, including that users of the defendants’ products were deceived into believing that one of the named plaintiffs was either the originator of or approved or sponsored images generated by the defendants’ products, causing injury to the plaintiffs’ goodwill. Additionally, the plaintiffs alleged unfair competition in the defendants’ misappropriation and copying of the plaintiffs’ art for commercial gain without permission or attribution in a manner likely to deceive the public. However, the court dismissed these claims, with leave to amend. First, the court noted that to the extent such claims were based on copyright violations, they were preempted. Then, it found that to the extent they were based on Lanham Act violations, the plaintiffs did not adequately allege how such use caused a likelihood of confusion or deception regarding the origin of the generated outputs.

Breach of contract. With respect to defendant DeviantArt, the plaintiffs alleged breach of contract based on DeviantArt’s violation of its own terms of service (TOS) and privacy policy, to which plaintiff Kelly McKernan and unspecified “others” were alleged to have agreed. Among other things, the plaintiffs alleged that unspecified provisions in the TOS prohibited using content from DeviantArt for commercial purposes and that the defendants breached these terms by allowing Stability AI to use content from DeviantArt for commercial purposes. The court dismissed this claim, again due to a lack of specificity in the pleadings, noting that the plaintiffs did not allege that Stability AI is bound by the DeviantArt TOS or that the plaintiffs are third-party beneficiaries of the TOS such that the plaintiffs can sue to enforce such terms between DeviantArt and a third party. The plaintiffs were granted leave to amend but were told that they must “identify the exact provisions in the TOS they contend DeviantArt breached and facts in support of breach of each identified provision.”

Amended complaint. On November 29, 2023, the plaintiffs filed an amended complaint that bears little resemblance to the original. The amended complaint adds seven new individual plaintiffs, all of whom have registered copyrights. It also adds a new defendant, Runway AI (another generative AI tool company), and reframes many of the plaintiffs’ claims. The plaintiffs deleted the vicarious copyright infringement claim and replaced it with an inducement of copyright infringement claim. They also deleted their right of publicity and unfair competition claims, added two Lanham Act claims (false endorsement and vicarious trade dress infringement), and deleted their claim for declaratory relief. The amended complaint provides additional details about the plaintiffs’ claims that Stable Diffusion contains “compressed copies” or “protected expression,” and it quotes academic papers and Stability AI personnel in passages that the plaintiffs allege support their claim that the models contain protected expression.

Kadrey v. Meta Platforms

Shortly after the Andersen motion to dismiss ruling, another court in the Northern District of California ruled on a motion to dismiss in a class action case that was filed in July 2023 involving Meta’s LLaMA large language models (which the plaintiffs allege were trained on their books). In Kadrey, Meta moved to dismiss all claims except the one alleging copyright infringement based on unauthorized copying of the plaintiffs’ books for training purposes, and the court granted the motion, with leave to amend. At the same time, the court issued its decision on a motion to dismiss in another class action case, Chabon v. Meta Platforms, Inc., simply stating that Meta’s joint motion to dismiss was granted for the reasons given in Kadrey. Shortly thereafter, the judge consolidated the Chabon and Kadrey cases, and the plaintiffs have since filed an amended complaint in the consolidated case.

Initial complaint. In this case, the plaintiffs’ theory regarding the models being a derivative work is slightly different from the Andersen case (although it was filed by the same lawyers). Rather than argue that the LLaMA models contain compressed copies of the training data, the plaintiffs claim that these models are infringing derivative works because the models cannot function without the “expressive information” extracted from the plaintiffs’ books. In granting the motion to dismiss, the court rejected this argument as “nonsensical” and stated that “[t]here is no way to understand the LLaMA models themselves as a recasting or adaptation of any of the plaintiffs’ books.”

The court also rejected the plaintiffs’ theory that “every output of the LLaMA language models is an infringing derivative work” and that when third parties use the model, “every output . . . constitutes an act of vicarious infringement,” noting that the complaint offered no allegations that the output is infringing (i.e., that it is recasting, transforming, or adapting the plaintiffs’ books). As in the Andersen case, the court rejected the notion that because the plaintiffs’ books were copied as part of the training process, the plaintiffs did not need to allege any similarity between LLaMA outputs and their books to maintain a derivative infringement claim. The court stated that to prevail on a theory that outputs constitute derivative infringement, the plaintiffs would need to allege (and ultimately prove) that the outputs “incorporate in some form a portion of” the plaintiffs’ books and concluded that the plaintiffs would need to prove substantial similarity. As no such facts were alleged here, the court granted the motion to dismiss these claims. The court, as in Andersen, granted the plaintiffs leave to amend their complaint.

The court also dismissed the plaintiffs’ claims under DMCA § 1202(b) because there were no facts alleged to support claims that the LLaMA models ever distributed the plaintiffs’ books (much less that they did so “without their CMI”) and their claims under DMCA § 1202(a)(1) because the plaintiffs did not plausibly allege that the LLaMA models are infringing derivative works.

The court dismissed the unfair competition claim because it found that to the extent it was based on the surviving claim for direct copyright infringement, it was preempted, and to the extent it was based on allegations of fraud or unfairness separate from the surviving copyright claim, the plaintiffs had not come close to alleging such fraud or unfairness. The court also dismissed the last two claims for unjust enrichment and negligence due to preemption.

Amended complaint. On December 11, 2023, the plaintiffs filed an amended complaint. The amended complaint contains only a single claim for direct copyright infringement based on copies made during the training of the models.

Tremblay v. OpenAI

Continuing the trend from the Andersen and Kadrey cases, the Northern District of California recently dismissed most of the claims in another generative AI class action lawsuit. Tremblay involves a class action by a group of authors against OpenAI for allegedly using their books to train its ChatGPT large language model (and has now been consolidated with two other lawsuits by authors against OpenAI). OpenAI filed a motion to dismiss all claims except the direct copyright infringement claim, and on February 12, 2024, the court granted the motion (with leave to amend) for all of the challenged claims, except for the plaintiffs’ allegation of unfair business conduct.

Vicarious copyright infringement. The plaintiffs alleged that every ChatGPT output is an infringing derivative work of their books and, thus, OpenAI should be held vicariously liable for those outputs because OpenAI has the right and ability to supervise the infringing activities of ChatGPT users and has benefited financially from those infringing outputs. The plaintiffs claimed that they did not need to allege substantial similarity because the defendants directly copied their books to train the language models. The court, however, disagreed and found that because the plaintiffs had not alleged that the outputs contained direct copies of their copyrighted books, they would need to show substantial similarity between such outputs and their books. In particular, the court noted that the plaintiffs’ bare allegation that every output of ChatGPT is an infringing derivative work is insufficient to show direct infringement. Because a claim of vicarious infringement requires showing an underlying direct infringement to proceed, the court dismissed the plaintiffs’ claim.

DMCA § 1202(b) claims. The plaintiffs alleged that, by design, OpenAI removed CMI from the plaintiffs’ books during the process of training ChatGPT in violation of DMCA § 1202(b)(1). The court dismissed this claim, noting that the plaintiffs had not provided any facts to support their assertion. The court noted that even if the plaintiffs had provided facts showing that the defendants knew about the removal of CMI, they did not show how omitting CMI in the copies used in training gave the defendants reasonable grounds to know that the output would induce, enable, facilitate, or conceal infringement (which is required for a DMCA § 1202(b)(1) violation). The plaintiffs also alleged that OpenAI violated DMCA § 1202(b)(3) by distributing derivative works of their books (allegedly, ChatGPT outputs) without the CMI included. The court dismissed this claim as well, noting that DMCA § 1202(b)(3) requires a distribution of the original work (or copies of the work), and the plaintiffs had not alleged that the defendants did that—only that they distributed the outputs (which they claim are infringing derivative works).

Unlawful business practices and fraudulent conduct under unfair competition law. The plaintiffs alleged that OpenAI engaged in unlawful business practices and fraudulent conduct based on its violation of the DMCA. Because the court dismissed the underlying DMCA claims, it also dismissed these claims. The court noted that even if the plaintiffs were able to bring DMCA claims, the unlawful business practices claim would have failed because the plaintiffs had not shown any economic injury caused by the allegedly unfair practice, and the fraudulent conduct claim would have failed because the plaintiffs had not identified any allegedly fraudulent business practices outside of the dismissed DMCA claims.

Negligence. The plaintiffs raised a negligence claim under California common law, arguing that OpenAI owed them a duty of care to maintain the plaintiffs’ works once collected and ingested for training and to not use the plaintiffs’ works in a way that would foreseeably cause the plaintiffs injury. They claimed that OpenAI breached this duty by training AI models on those works. The court, however, rejected the plaintiffs’ argument, finding that the plaintiffs had not shown how OpenAI owed them a duty to safeguard the plaintiffs’ works or that OpenAI otherwise had a special relationship with the plaintiffs that created a duty of care.

Unjust enrichment. The court dismissed the plaintiffs’ claim that OpenAI had been unjustly enriched by using the plaintiffs’ books to train ChatGPT because it found that the plaintiffs did not allege that OpenAI unjustly obtained any benefits from them through fraud, mistake, coercion, or request, which are the typical grounds for an unjust enrichment claim.

Unfair conduct. Unlike the plaintiffs’ other unfair competition claims, the court allowed the plaintiffs’ claim based on OpenAI’s alleged unfair conduct to proceed. The court observed that California courts have interpreted the term “unfair” very broadly under California unfair competition law. Thus, the court concluded that if the plaintiffs’ allegations were true, the defendants’ conduct may constitute unfair conduct under California law. The direct copyright infringement claim remains as well, as the motion to dismiss did not include that claim.

Doe 1 v. GitHub

The Northern District of California also recently ruled on a second motion to dismiss in Doe, a class action lawsuit brought by a group of anonymous programmers against GitHub and OpenAI for allegedly using their code to train the models that power Copilot, GitHub’s AI programming assistant product. Unlike the plaintiffs in the other lawsuits discussed in this article, the plaintiffs in Doe did not bring copyright infringement claims. Rather, the plaintiffs brought, among others, claims for removal of CMI in violation of DMCA § 1202(b), for breach of contract, and under various negligence and unjust enrichment theories. Notably, the court also addressed issues of standing and copyright preemption when ruling on the motions to dismiss in this case.

When ruling on the defendants’ first motion to dismiss, the court denied the motion with respect to the contractual claims and most of the claims under DMCA § 1202(b) but granted the motion for the other claims. The plaintiffs renewed the claims discussed below in an amended complaint, and the defendants subsequently moved to dismiss all but the contractual claims. In its January 3, 2024, ruling on the second motion to dismiss, the court granted the motion except with regard to the defendants’ claim that some of the plaintiffs lacked standing to seek monetary damages.

Standing. The defendants argued that because half of the plaintiffs had not shown any instances in which Copilot output their code, those plaintiffs had not shown an actual injury and, thus, lacked standing to pursue monetary damages. The court agreed and dismissed with prejudice those plaintiffs’ request for monetary damages.

For the other half of the plaintiffs, the defendants argued that while those plaintiffs did allege that Copilot output similar or identical code to what was hosted in the plaintiffs’ GitHub repositories, the plaintiffs had not shown a past harm because the plaintiffs themselves had provided the inputs to Copilot that resulted in Copilot outputting their code. In one case, the prompt provided was the first few lines of the plaintiff’s code. In other cases, the prompts were not disclosed because the amended complaint was partially filed under seal.

The court disagreed, noting that a plaintiff “is not required to suffer an injury only inadvertently,” and the fact that the plaintiffs themselves had provided the input was irrelevant for the question of standing for monetary damages. The court also rejected the defendants’ arguments concerning the plausibility of the plaintiffs’ inputs and how frequently the plaintiffs’ code appears in other repositories, observing that while these arguments may be relevant for a damages amount, they were not relevant to whether the plaintiffs had standing to pursue monetary damages.

Notably, when ruling on the first motion to dismiss, the court found that both sets of plaintiffs had standing to seek injunctive relief (barring further infringement) in the event they prevail in the case. That earlier holding was based on the plaintiffs’ allegation that Copilot reproduces code from its training data “about 1% of the time.”

DMCA § 1202(b) claims. The plaintiffs alleged that Copilot removed or inaccurately reproduced CMI associated with their code in outputs. The court denied the defendants’ motion to dismiss with respect to this claim in its original ruling. However, in the second motion to dismiss, the defendants raised the issue again and asked the court to address their unresolved argument that DMCA § 1202(b) claims require a plaintiff to show that CMI is removed from or altered in an identical copy of the work. The court agreed, quoting sections of the plaintiffs’ amended complaint in which the plaintiffs emphasized that allegedly infringing Copilot outputs were often modified versions of the plaintiffs’ works, not exact copies. The court granted the motion to dismiss on this claim, with leave to amend, but observed that the plaintiffs are “unlikely” to cure this deficiency by alleging additional facts. The plaintiffs have asked the court to reconsider its dismissal of the DMCA § 1202(b) claims, arguing that the court failed to fully consider all material facts and legal arguments in the plaintiffs’ amended complaint.

Copyright preemption of state law claims. In their amended complaint, the plaintiffs brought a variety of claims under California law, including intentional and negligent interference with prospective economic relations, unjust enrichment, negligence, and unfair competition. The defendants argued that each of these claims is preempted by the Copyright Act, which expressly preempts state laws that offer copyright-like protection to works that are covered by the Copyright Act. The court agreed with the defendants and granted the motion to dismiss with respect to the state law claims.

For the intentional and negligent interference with prospective economic relations claim, the court found that the Copyright Act preempted the claim because it was essentially about the plaintiffs’ right to control the reproduction of their works. Similarly, for the unjust enrichment claim, the court found that the plaintiffs’ claim essentially concerned their exclusive right to create derivative works of their copyrighted works. For the negligence claim, the court found that the plaintiffs had merely recharacterized a copyright infringement claim as a negligence claim. The court dismissed each of these claims with prejudice. The court also dismissed the unfair competition claim but only to the extent that it was predicated on the plaintiffs’ other state law claims.

Thomson Reuters v. ROSS Intelligence

While recent copyright cases related to generative AI have attracted a great deal of attention, these are not the first copyright cases involving the unlicensed use of materials to train AI models. Thomson Reuters, which was filed in the U.S. District Court for the District of Delaware in May 2020, centers on allegations that copyrighted headnotes from Thomson’s Westlaw legal research database were used as training data for an AI legal research tool that was developed by ROSS Intelligence.

In September 2023, the court ruled that most issues in the case could not be resolved on summary judgment because “many of the critical facts in [the] case remain genuinely disputed.” The court did, however, grant the plaintiffs’ motion for summary judgment on the question of whether ROSS Intelligence engaged in at least some actual copying of materials from Westlaw.

Among the key facts that the court noted were in dispute were those related to fair use and the aims and outcomes of the training process. Fair use permits certain unauthorized uses of copyrighted works that would otherwise constitute infringement. The Copyright Act does not define fair use but instead sets out four nonexclusive factors for courts to consider in determining whether an unauthorized use of another’s work is a fair use.

The first factor (purpose and character of the use) also looks at whether the use was “transformative”—whether the use adds something new, with a further purpose or different character, altering the original work with new expression, meaning, or message. The court did not find that intermediate copying made in the training process is always transformative for purposes of evaluating the first fair use factor but rather held that whether the intermediate copying is transformative depends on the precise nature of the use. The court said that if, as ROSS Intelligence contended, the AI tool only studied the language patterns in the headnotes to learn how to produce judicial opinion quotes, then it would be transformative intermediate copying (following certain intermediate copying cases cited by ROSS Intelligence). But if, as Thomson Reuters alleged, ROSS Intelligence used the untransformed text of headnotes to get its AI to “replicate and reproduce the creative drafting done by Westlaw’s attorney-editors,” then those intermediate copying cases would not apply. As a result, the court found that the question of whether this use was transformative is a material question of fact that the jury needs to decide. Such disputes will be central, as the court noted that “the first fair use factor comes down to the jury’s finding of transformativeness.”

Factual disputes also remain for trial regarding the remaining fair use factors. For the second factor (the nature of the copyrighted work), the court found that disputes about how original the copied Westlaw material is and “whether it is in fact protected, and how far that protection extends,” must go to a jury. For the third factor (amount and substantiality of the portion used), the court said a jury must resolve disputes related to whether “the scale of copying (if any) [engaged in by ROSS Intelligence] was practically necessary and furthered its transformative goals.” And for the fourth factor (potential market impact), a jury must ultimately decide “hotly debated” questions related to market harm, such as whether it is “in the public benefit to allow AI to be trained with copyrighted material.” The case is expected to go to trial in August 2024.

Takeaways

Because the initial rulings in the generative AI cases are at early stages of the proceedings and the courts granted broad leave to amend for many of the claims, these cases leave many crucial questions unanswered for both the providers and users of generative AI tools. Those unresolved questions should be considered as part of any evaluation of insurance coverage for generative AI-related activities.

However, the cases do provide insight into the types of claims that are likely to survive motions to dismiss and set precedents for future cases involving generative AI. Consequently, these early cases may influence the types of claims that future plaintiffs bring against AI tool providers and how the plaintiffs frame their proceedings. For example, some of the more recent cases appear to be tailored toward trying to address some of the pitfalls that led to the plaintiffs’ claims being dismissed in the cases discussed here.

In particular, these cases suggest that claims that the models themselves and the generated output of such models are infringing works may face an uphill battle unless it can be shown that they either contain portions of the copyrighted content or are substantially similar to the copyrighted content. Further, the Thomson Reuters case suggests that it may not be easy to resolve AI-related cases that involve fair use on summary judgment—at least not if the record indicates disagreement on how the tools at issue work and not until judicial consensus develops on how the fair use factors should be applied to generative AI.

    Authors