Introduction

2023 was the breakout year for generative artificial intelligence (AI). This branch of AI and machine learning uses generative models to create new content, such as text, images, music, or video. These models are trained on massive amounts of data to produce a neural network that encodes statistical information such as word frequencies, syntactic patterns, and thematic markers. In response to a user's prompt, the neural network can produce creative output based on the training data.

Generative AI has many applications, such as art, writing, design, healthcare, gaming, and marketing. But it has spurred just as many lawsuits, many of them claiming that the training process for generative models infringes copyrights in written and visual works. For example, on September 12, 2023, a class of author plaintiffs sued Meta Platforms for copyright infringement in the Northern District of California because Meta's “LLaMA” (Large Language Model Meta AI) allegedly “copied and ingested” plaintiffs' copyright-protected works as part of its training. A class action was also brought against Stability AI in the same court on January 13, 2023, alleging that Stability AI infringed plaintiffs' copyrighted works in the process of training its generative AI model, Stable Diffusion.

It is anticipated that the doctrine of fair use will feature prominently in these lawsuits. Indeed, on August 30, 2023, the U.S. Copyright Office published a notice of inquiry and request for comments as part of its “study of the copyright law and policy issues raised by artificial intelligence.” 88 Fed. Reg. 59942 (Aug. 30, 2023). The notice includes over 30 questions related to the topic of artificial intelligence, of which six are specifically on the doctrine of fair use. For example, the Copyright Office asks, “Under what circumstances would the unauthorized use of copyrighted works to train AI models constitute fair use?” 88 Fed. Reg. 59942, 59946.

As we await administrative and judicial guidance from the Copyright Office and the courts on how fair use will be applied to generative AI, this article reviews the historical application of fair use by the courts in other cases of alleged copyright infringement involving novel technologies.

Fair Use Doctrine

The motivating purpose of copyright is “to promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.” U.S. Constitution Article I, §8, cl. 8. The law of copyright aims to find a fair balance between the rights of creators and inventors to profit from and control their works and inventions, and the rights of society to access and use ideas, information, and commerce.

Fair use is an affirmative defense under the Copyright Act rooted in the dual purpose of copyright to motivate the creativity of authors and advance public welfare through access to expressive works. Campbell v. Acuff-Rose Music, Inc., 510 U.S. 569, 590 (1994) (clarifying that “fair use is an affirmative defense”). As stated succinctly by the Supreme Court, a person “who makes a fair use of the work is not an infringer of the copyright with respect to such use.” Sony Corp. of Am. v. Universal City Studios, Inc., 464 U.S. 417, 433 (1984) (emphasis added).

Under the Copyright Act, certain uses of copyrighted works are classified as non-infringing fair use: criticism, commentary, news reporting, teaching, scholarship, and research. 17 U.S.C. § 107 (2019). This list is non-exhaustive, however, and the Supreme Court has held that the doctrine “permits and requires courts to avoid rigid application of the copyright statute when, on occasion, it would stifle the very creativity which that law is designed to foster.” Campbell, 510 U.S. at 577. Fair use therefore is not amenable to simplification with bright-line rules—rather, the doctrine calls for a case-by-case analysis enabled by four statutory factors:

  1. the purpose and character of the use,
  2. the nature of the copyrighted work,
  3. the amount and substantiality of the portion used in relation to the copyrighted work as a whole, and
  4. the effect of the use upon the potential market for or value of the protected work.

17 U.S.C. § 107. No single factor is dispositive in determining fair use. Harper & Row, Publishers, Inc. v. Nation Enterprises, 471 U.S. 539, 549 (1985). The concept of fair use is, therefore, “flexible” and as a result has been courts' go-to doctrine for handling copyright issues arising in “significant changes in technology.” Google LLC v. Oracle America, Inc., 141 S. Ct. 1183, 1197 (2021).

Fair Use's Interaction with Cutting-Edge Technologies

The flexible contours of the fair use doctrine are well illustrated through four instances where courts have applied the doctrine to novel technologies spanning the mid-80s to the present. 

Video Recordings – Sony v. Universal Studios (1984)

Background. In Sony Corporation of America v. Universal City Studios, Inc., Plaintiffs Universal City Studios, Inc. and Walt Disney Productions were producers and owners of registered audiovisual works. Defendant Sony made and sold Betamax video tape recorders, primarily used for “time-shifting,” or the practice of recording a program to view it at a later time. Plaintiffs alleged that Sony violated their copyrights because Betamax customers were recording plaintiffs' copyrighted works that had been broadcast on commercially sponsored television. Sony, 464 U.S. 417.

Holding.  Although the Court invoked fair use to find that Sony was not liable for copyright infringement, the fair use analysis was limited. Indeed, the Court borrowed from patent law's doctrine of “substantial non-infringing use,” which stands for the proposition that there is no infringement if a product is widely used for a legitimate, unobjectionable purpose. Id. The analysis foreshadowed the Court's willingness to flexibly apply the fair use doctrine to fit the technology at issue.

Thumbnails – Perfect 10 v. Amazon (2007)

Background. Twenty-three years after Sony came Perfect 10 v. Amazon. 508 F.3d 1146 (9th Cir. 2007). Perfect 10 sought a preliminary injunction to prevent Google from allegedly copying, reproducing, distributing, publicly displaying, adapting, or otherwise infringing, or contributing to the infringement of Perfect 10's copyrighted images; linking to websites that provide full-size infringing versions of Perfect 10's images; and infringing Perfect 10's username/password combinations.

The technology at issue in Perfect 10 was Google's image search, which provides responses to user queries in the form of images from its index of websites. The images are in the form of “thumbnails,” or small images stored in Google's servers that are reduced, lower-resolution versions of their full-sized counterparts stored on third-party computers. Plaintiff Perfect 10 was a company that marketed and sold copyrighted images of nude models. It offered paid, password-protected accounts for subscribers to view images on its website.

Holding. The Ninth Circuit began by remarking that the Court “must be flexible in applying a fair use analysis.” Id. at 1163. Of note was its analysis of the first factor under the fair use doctrine: the Ninth Circuit stated that the central purpose of the inquiry was to determine whether and to what extent the new work is “transformative.” Citing the Supreme Court's decision in Campbell, the Ninth Circuit defined a transformative work as follows: “the new work does not ‘merely supersede the objects of the original creation' but rather ‘adds something new, with a further purpose or different character, altering the first with new expression, meaning, or message.'” Id. at 1164. Finding that Google's use of thumbnails was “highly transformative,” the Court stated that Google's search engine provided an entirely new use for the original images by transforming the images into an electronic reference tool. The Court also found that Google's incorporation of Perfect 10's images into the search engine did not diminish the transformative nature of Google's search engine, because Google was using Perfect 10's images in a new context to serve a different purpose.

Digital Books – Authors Guild v. Google (2015)

Background. Eight years after Perfect 10, in 2015, the Second Circuit decided another copyright infringement case against Google, this time brought by the Authors Guild and authors of copyrighted books. Plaintiffs brought a class action suit against Google alleging that Google, through its Library Project, acted without permission of rights holders in making digital copies of millions of books and making them publicly available through its search engine. Authors Guild v. Google, 804 F.3d 202 (2d Cir. 2015).

Through its Library Project, which began in 2004, Google had scanned more than 20 million books, including copyrighted books. The digital corpus Google accumulated through the scans fueled Google Books, a public search engine that allowed users to enter search words or terms and receive in response a list of all books in Google's database in which those terms appear. Google Books also allowed users to view the texts in limited fashion. The search function displayed three “snippets” containing the words or term selected by the user. Snippets were horizontal segments of books, and Google had rules surrounding snippet displays.

Holding. Demonstrating a trend in the courts' use of the fair use doctrine for emerging technologies, the Second Circuit similarly focused the first factor analysis on “transformative use,” with the analysis of the remaining factors focused on the particularities of the snippet view. The Court emphasized that the purpose of Google's copying of the books was to make available significant information about those books, permitting a searcher to identify books that contain a word or term of interest, which was the sort of transformative purpose that strongly favors satisfaction of the first factor. Google's snippet view was also seen by the Second Circuit as transformative. Snippet view, according to the Court, provided just enough context to help users evaluate whether a book actually falls within the scope of their interest.

Source Code – Google v. Oracle (2021)

Background. Most recently, in 2021, the Supreme Court decided another copyright infringement case against Google, this time over its source code. Oracle sued Google, alleging that Google's use of the Java API violated Oracle's copyright by copying the declaring code and organizational structure of the API. Oracle, 141 S. Ct. 1183.

In 2005, Google acquired Android to develop a software platform for mobile devices, such as smartphones. The Android platform was built on millions of lines of new code written by Google engineers. However, because Google also wanted the large population of software engineers already familiar with the popular Java platform to be able to use the Android platform, Google also used around 11,500 lines of code from the Java API.

The Java API is essentially a collection of computing tasks that programmers can use for their programs. Each task is called a “method” and belongs to a larger group of related tasks called a “class.” The classes are also grouped into bigger categories called “packages.” For each task or “method,” there is a corresponding source code, called the “implementing code,” that tells the computer how to execute the particular task a programmer asks it to perform. Programmers can tell a computer which implementing code it should choose by entering into a program a command called a “method call” that corresponds to the specific task. The “method call” locates and invokes particular implementing code through another type of code called the “declaring code.”
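The terminology above can be made concrete with a short sketch. The Java below is purely illustrative: the class names `DeclaringCodeSketch` and `SimpleMath` and the `max` task are hypothetical and are not drawn from the Java API, from Android, or from the record in the case. The comments mark which line would be considered declaring code, which lines are implementing code, and where a method call appears.

```java
// Hypothetical illustration only; these names are assumptions, not code from the Java API or Android.
public class DeclaringCodeSketch {

    // A "class" groups related tasks. In a real library, classes are in turn grouped into "packages".
    public static class SimpleMath {

        // Declaring code: the name of the task ("max"), its inputs, and the type of its result.
        public static int max(int a, int b) {
            // Implementing code: the instructions that tell the computer how to carry out the task.
            return (a >= b) ? a : b;
        }
    }

    public static void main(String[] args) {
        // Method call: the command a programmer writes to invoke the task by its declared name.
        int larger = SimpleMath.max(4, 10);
        System.out.println(larger);
    }
}
```

In this sketch, a programmer who already knows the name and grouping of `SimpleMath.max` can call it without ever seeing the body of the method. A second platform could keep the same declaration while supplying an entirely different body, which is roughly the distinction between declaring code and implementing code described above.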

Google did not copy Java's implementing code. Android's implementing code was written by Google engineers. Google also wrote the vast majority of Android's declaring code. However, for a limited number of tasks, Google used Java's declaring code, meaning that Google used the names given to particular tasks and the grouping of those tasks by Java. Google did this so that programmers migrating from Java could use declaring code they were already familiar with for certain tasks.

Holding. The Supreme Court relied solely on fair use to rule in favor of Google, and its opinion is informative because it contains the most thorough analysis of the four factors to date. The Court began by stating that “fair use can play an important role in determining the lawful scope of a computer program copyright, such as the copyright at issue here. It can help to distinguish among technologies.” Id. at 1198.

The Court first turned its attention to the second factor, finding that declaring code (the code at issue in the action) was meaningfully different from other copyrightable computer code because the declaring code was inextricably bound with the non-copyrightable idea of organizing tasks into categories. Furthermore, the Court noted that the creative expression in an API was found in implementing code, and particularly in the use of implementing code with the declaring code in the very different context of smartphones. Finally, the Court viewed declaring code as user-centered and differentiated it from the “innovative” implementing code. Declaring code's main value lay not in its innovative nature but in the time and effort that those who do not hold copyrights (i.e., computer programmers) invest in learning the API's system, and in the declaring code's capacity to encourage programmers to learn and use Java's API system so that they will use Oracle's implementing programs that Google did not copy. Therefore, the second factor pointed to fair use.

The Court then analyzed the first factor, focusing the analysis on the “transformative” nature of the allegedly infringing work. The Court concluded that Google's use was transformative. Google's Android product, in the Court's eyes, offered programmers a highly creative and innovative tool in a new and distinct smartphone environment. As in Perfect 10 and Authors Guild, the commercial nature of Google's use was not dispositive because the use was transformative.

On the third factor, the Court admitted that the quantitative amount that Google copied was “large.” However, the Court viewed the 11,500 lines Google used in the context of the several million lines Google did not use. Furthermore, Google used the 11,500 lines for the practical purpose of attracting programmers to build a different task-related system for a different computing environment. In the Court's view, Google's use was essentially tethered to a valid and transformative purpose, meaning that the third factor also favored fair use.

For the fourth factor, the Court began by pointing to evidence that Android did not harm Java's actual or potential markets, as Oracle/Sun would not have been able to enter the smartphone market anyway. Second, the Court agreed with Google's expert that Android was not a market substitute for Java's software because the two products were on different devices. Java was on simple mobile devices like the Amazon Kindle, whereas Android was used in more advanced smartphone technology. Third, the Court noted that Oracle could benefit from Android because Java and Android operated in two distinct markets, and programmers learning the Java language to work on smartphones can bring their talents to Java's laptop market as well. Fourth, the Court noted that allowing enforcement of Oracle's copyright would risk harm to the public by stifling creativity. Oracle, 141 S. Ct. 1183.

Fair Use and Generative AI

Most striking about the fair use assessments in these cases is the highly individualized analysis of the four fair use factors. This, of course, is inevitable because courts must mold fair use to whatever technology is at issue, and vast differences exist between video recording technology and computer source code.

Courts have not yet determined how the fair use factors will apply in cases involving generative AI. However, the four cases collectively hint at the following three points:

  1. Whether generative AI is “transformative” will play a large role;
  2. The nature of the copyrighted work and the raw amount of the work used in the allegedly infringing technology will likely not be dispositive; and
  3. The analysis on the effect of the use upon the market for the protected work will be fact-specific.

September 25, 2023, marks the first decision in which a district court judge considered whether the use of copyrighted materials in machine learning was fair use, in Thomson Reuters Enterprise Centre GmbH v. Ross Intelligence Inc., 2023 WL 6210901 (D. Del. Sept. 25, 2023). Thomson Reuters sued Ross Intelligence Inc. for using content on Thomson Reuters' legal research platform, Westlaw, to create a “natural language search engine” using artificial intelligence. The plaintiff alleged that the defendant had infringed Westlaw's copyrighted headnotes and corresponding “key numbers” that connect legal issues in a document to corresponding laws or cases. In response, the defendant invoked the fair use defense, arguing that its artificial intelligence “studied the headnotes and opinion quotes only to analyze language patterns, not to replicate Westlaw's expression.” Id. at *8.

Despite considering all four fair use factors, Judge Stephanos Bibas declined to resolve the defense at the summary judgment stage. However, Judge Bibas' opinion affirms the three points above. He wrote that the analysis will depend on “transformativeness,” specifically mentioning the role “transformativeness” played in Oracle. Judge Bibas also followed the approach taken by the cases above and stated, quoting Authors Guild, that the “second factor has rarely played a significant role in the determination of a fair use dispute” and that the amount of copying will depend on whether the use was “tethered to a valid purpose.” Id. at *9-10.

Finally, on the fourth factor, Judge Bibas admitted that the analysis will involve “factual market-impact questions” and listed generative AI-specific questions that must be answered: “How transformative is it? Can the public use it for free? Does it discourage other creators by swallowing up their markets?” Id. at *11.

Conclusion

Although Thomson Reuters did not establish definitive precedent on fair use, Judge Bibas' opinion will be the first of many opinions that address fair use's applicability to artificial intelligence and machine learning as more plaintiffs assert copyright claims against companies utilizing such technology. And although Judge Bibas largely followed the trends and precedent established by the cases explored in this article, it remains an open question whether other courts will do the same.

The content of this article is intended to provide a general guide to the subject matter. Specialist advice should be sought about your specific circumstances.