AI Input Data and Fair Use: A View from the U.S.

Document Type

Article

Publication Date

10-2024

Abstract

For an AI system to generate text, images, music, or computer code, it must copy vast amounts of literary, artistic or musical works. Arguably, the massive copying of works, to enable AI systems to “learn” how to produce independent outputs in the form of literary, artistic, musical, or audiovisual works or software, could shelter under the fair use defense on the ground that creating training data sufficiently repurposes the copying to count as “transformative” – at least if the outputs enabled by the inputs do not themselves infringe the source content (a highly disputed point). But one should perhaps decouple the inputs from the outputs. As to whether the copying of works into training data is a “transformative” fair use, the Supreme Court’s most recent fair use decision, Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith (AWF), suggests that the analysis may depend on whether there is a market for licensing content for training data. Markets for high-quality, reliable training data do exist or are emerging, notably in news media and scholarly publishing, and other authors and copyright owners are endeavoring to develop those markets as well. Where such markets exist, even if the outputs might not infringe particular inputs, commercial copying (at least) to create training data would be for the same purpose as the licensed uses, and might therefore, absent a “compelling justification” for supplanting authors’ markets, fail a first factor fair use inquiry after AWF.

This article addresses a further issue: because traditional copyright analysis treats artistic style as akin to unprotectable ideas, is the copying of works of authorship in order to generate outputs “in the style of” the copied author or artist a fair use?

Disciplines

Intellectual Property Law | Law

This document is currently not available here.