
IP/Entertainment Case Law Updates

Bartz v. Anthropic PBC

District court holds that Anthropic’s use of books to train its Claude large language models and its use of purchased copies of books to create a permanent digital library constitute fair use, but its use of pirated books to create such a library does not.

Plaintiffs, a group of book authors, initiated a class action against Anthropic PBC, the developer of the AI platform Claude, claiming that Anthropic infringed the authors’ copyrights by making copies of their books for its internal permanent library and reproducing them to train the large language models (LLMs) for Claude. Anthropic moved for summary judgment on fair use, arguing that using copies of the authors’ books and millions of others was justified because those copies were reasonably necessary for training the LLMs.

The record on summary judgment showed that in training Claude, Anthropic used books and other texts selected from a central library it had assembled. To assemble that library, Anthropic initially obtained books from so-called online pirate libraries. This included downloading Books3 (an online library of over 196,000 books that had been assembled from unauthorized copies of copyrighted books) and downloading distributed, reshared copies of other pirate libraries, including at least 5 million copies of books from Library Genesis (LibGen). Anthropic separately bulk-purchased used physical books for its library, stripping the books from their bindings, cutting their pages to size, scanning the books into digital form and discarding the paper originals.

With these pirated and purchased-and-scanned books, Anthropic created a general “research library” or “generalized data area” as a means of storing information that “would be voluminous” and that it “would use for research” or to otherwise train or inform its products. With respect to the books used for training purposes, those files were copied, cleaned and “tokenized” (i.e., broken down into manageable pieces) for training, with each LLM retaining compressed copies of the works on which it had been trained. Once Anthropic decided a book was not to be used for training at all or ever again, it retained the book as a “hard resource” for other or future uses.

The authors did not allege, and there was no evidence, that the LLMs could output infringing copies of the works to Claude users.

As to the question of what “use” or “uses” were at issue in the fair use analysis, Anthropic contended that it copied the books for a single use: to train LLMs. The authors, however, argued that at least two uses were at issue: first, the use of the books to build Anthropic’s central library, and second, the use of the books to train specific LLMs using subsets of that content. The court agreed with the authors’ framework and considered these as separate uses. Additionally, as to the building of the central library, the court considered differences between the use of the pirated copies and the use of the purchased-and-scanned copies.

In its fair use analysis, the court first addressed the purpose and character of the use, including whether Anthropic’s use of the copyrighted works was of a commercial nature. With respect to training, the court stated that “the purpose and character of using copyrighted works to train LLMs to generate new text was quintessentially transformative” and that “[l]ike any reader aspiring to be a writer, Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant [those works]—but to turn a hard corner and create something different.”

As to the creation of a central library, the court distinguished the copies Anthropic purchased from the pirated copies it downloaded. With respect to the former, the court found that Anthropic purchased its print copies “fair and square” and that each purchase gave Anthropic the right to dispose of the copy as it saw fit. Digitizing the books was a “mere format change,” and the court found that converting a print book into a digital file to save space and enable searchability was transformative.

The court drew a line, however, when it came to the pirated books, which were downloaded without payment and kept in Anthropic’s library irrespective of whether they were used to train its LLMs. The use of those books to build a library was not transformative, the court held. According to the court, this case was unlike Perfect 10, Inc. v. Amazon.com, Inc., in which Google visited websites having full-size images, made reduced-size copies, and incorporated them directly into its search engine, because the images in that case were deployed immediately into the transformative use of identifying the full-size images and websites from which they came. The court also held that this was unlike the Google Books cases, where libraries of authorized copies already had been assembled and all copies were made for direct employment in a one-to-one further fair use, principally for the transformative use of pointing to the works themselves. Here, as the court explained: “No authorized copies existed from which Anthropic made its first copies. No full-text copy therefrom was put immediately into use training LLMs. Not every copy was even necessary nor used for training LLMs.”

Similarly, the court held that “intermediate copying” cases such as Sega Enterprises Ltd. v. Accolade, Inc. and Sony Computer Entertainment, Inc. v. Connectix Corp. did not permit the uses at issue here because the defendants in those cases had purchased commercially available copies of game cartridges and made further copies “solely in order to discover the functional requirements for compatibility.”

Accordingly, the first factor weighed against fair use for the central library copies made from pirated sources.

The court next addressed the “nature of the copyrighted work” and held that this second factor weighed against fair use for all copies alike. Although some of the authors’ books were nonfiction, all of them contained expressive elements and were selected for those elements as potentially valuable training material.

The court then assessed “the amount and substantiality” of the copyrighted works used and whether that amount was reasonable in relation to the purpose of the copying. As to the copies used to train specific LLMs, the court held that this third factor favored fair use because Anthropic’s copying was reasonably necessary to achieve a transformative use. As it stated, “all agree Anthropic needed billions of words to train any given LLM.” The court also clarified that what matters is not the amount and substantiality of the portions used in making a copy but rather the amount and substantiality of the portions that were made accessible to the public—and here, there was no allegation of any traceable connection between the LLMs’ outputs and the authors’ works. Thus, the “compelling benefits of training the LLMs on strong examples were not offset by revelations to the public of any portion of the works themselves.”

The court also held that the third factor favored fair use as to the purchased library copies converted from print to digital because the purpose of the copying was to keep the books in its library but with more favorable storage and searchability properties. This purpose required copying, there was no surplus copying and the source copy was destroyed. With respect to the pirated copies, however, the court held that because “Anthropic lacked any entitlement to hold those copies” and retained them “even after deciding it would not make further copies from them for training,” this third factor weighed against Anthropic for that particular use.

The final fair use factor concerns the effect of the use on the potential market for or value of the copyrighted work, which, the court noted, “points against fair use when a copyist makes copies available that displace demand for copies the copyright owner already makes available or readily could.” The court found that the copies used to train specific LLMs did not and would not replace demand for copies of the authors’ works. The court was not persuaded by the authors’ “market dilution” theory—that training LLMs would result in an explosion of new works competing with their books. This concern, the court opined, was no different from complaining that “training schoolchildren to write well would result in an explosion of competing works,” which is “not the kind of competitive or creative displacement that concerns the Copyright Act.”

The authors also argued that Anthropic’s training use has displaced or would displace “an emerging market for licensing their works for the narrow purpose of training LLMs.” But the court held that such a market is not one that the Copyright Act entitled the authors to exploit, given the transformative nature of the use. It therefore concluded that the fourth factor weighed in favor of fair use for the training copies.

As to the copies used to build a central library, the court deemed the fourth factor to be neutral for the purchased library copies that were digitized, and to weigh against fair use for the pirated library copies. For the purchased copies, the authors argued that the format change exposed them to the usurpation of the opportunity to sell rightful copies because Anthropic could transmit unauthorized digital copies more readily than it could have transmitted additional print copies. The court was unpersuaded and held that the record did not establish any intent to redistribute the library copies once acquired. The court viewed the pirated copies differently: Those copies “plainly displaced demand for [a]uthors’ books—copy for copy,” it held.

After assessing the fair use factors in totality, the court ultimately held that use of the copies of the books to train specific LLMs was justified as a fair use, as every factor but the nature of the copyrighted work weighed in Anthropic’s favor. The court emphasized that “[t]he technology at issue was among the most transformative many of us will see in our lifetime.” Use of the copies of the books that were purchased and converted into digital library copies was also justified as fair use, particularly because the purchased print copies were destroyed and their digital replacements not redistributed. The court granted summary judgment in favor of Anthropic on these uses.

With respect to the downloaded pirated copies used to build Anthropic’s central library, however, every factor weighed against fair use. The court thus denied summary judgment on this issue and left for trial the question of the resulting damages.

Summary prepared by Frank D’Angelo and Elena De Santis
