The use of copyrighted works to train artificial intelligence (AI) systems raises complex legal and ethical questions, particularly concerning the rights of copyright holders. The law in this area is evolving rapidly, and a clear picture hasn’t emerged yet.
Legal uncertainty and variations across jurisdictions
Currently, there is no consensus on, and there are no universal guidelines for, the use of copyrighted material to train AI. The legality of using copyrighted works without permission for this purpose varies by jurisdiction and is subject to ongoing legal debate and interpretation. In some jurisdictions, such as the United States, the doctrine of fair use may allow copyrighted works to be used without permission under certain conditions. Courts weigh four factors: the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect of the use on the market for or value of the work.
For example, consider a university research team that is developing an AI system to analyze literary works and identify themes and patterns that could help in teaching literature. The team uses excerpts from copyrighted novels, poems, and essays to train the AI, aiming to enhance educational tools and methodologies. Here is how the four fair use factors might apply:
- Purpose and Character of the Use: The AI is being developed for a non-commercial, educational purpose, which is a factor often favoring fair use. Additionally, if the AI's analysis provides new insights or is transformative (meaning it adds new expression or meaning to the original works), this also supports a fair use argument.
- Nature of the Copyrighted Work: Using works that are highly creative (like novels and poems) could weigh against fair use since copyright protection is stronger for creative works than for factual ones. However, the educational context and the potential for transformative use could mitigate this factor.
- Amount and Substantiality of the Portion Used: If the research team uses only a small portion of each work to train the AI, this factor might favor fair use. However, if large portions or the "heart" of the works are used, this could weigh against fair use.
- Effect on the Market or Value of the Work: If the AI's training and the resulting educational tool do not substitute for the original works and do not affect the market for those works, this factor may favor fair use. The argument is stronger if the AI's use could increase interest in the original works.
Microsoft, GitHub, and OpenAI are currently being sued in a class action lawsuit that alleges they violated copyright law by allowing GitHub Copilot, an AI code-generation tool, to train on billions of lines of public code and to regurgitate licensed code snippets without providing credit. The popular AI art generators from Midjourney and Stability AI are the focus of a lawsuit that alleges they infringed on millions of artists' rights by training their tools on web-scraped images. Getty Images has also sued Stability AI, alleging that it used millions of images from Getty's site without permission to train its AI art generator, Stable Diffusion.