District court denies in large part AI developer NVIDIA’s motion to dismiss, finding authors plausibly stated claims for direct and contributory copyright infringement based on allegations that NVIDIA trained multiple LLMs using unauthorized copies of copyrighted books sourced from illegal “shadow libraries,” but dismisses vicarious infringement claim with leave to amend, finding authors failed to allege that NVIDIA had right and ability to control third-party infringement or derived direct financial benefit from infringing activity.
Plaintiffs are authors who filed a proposed class action against NVIDIA, alleging that NVIDIA trained AI large language models (LLMs) on unauthorized copies of their copyrighted books. According to the complaint, NVIDIA trained many of its AI models using unlawfully copied materials available on pirate websites known as “shadow libraries,” and the copyrighted works were used to develop multiple LLMs in NVIDIA’s Megatron family, including Megatron 345M. Plaintiffs also alleged that NVIDIA provided various scripts and tools, through the NeMo Megatron Framework and BigNLP platforms, enabling its customers to download and preprocess The Pile, a dataset containing their copyrighted books.
In its motion to dismiss, NVIDIA sought to (1) dismiss or strike allegations about the Megatron 345M model and any unidentified models referenced in the complaint; (2) dismiss allegations about shadow libraries Pirate Library Mirror and Bibliotik, as well as those concerning any unspecified shadow library; (3) dismiss allegations that NVIDIA infringed using a distribution tool called the BitTorrent Protocol; and (4) dismiss claims that NVIDIA engaged in contributory and vicarious infringement.
Plaintiffs alleged that NVIDIA used The Pile as training data for Megatron 345M and that The Pile included a dataset containing approximately 196,640 books. NVIDIA argued Megatron 345M was trained on portions of The Pile other than that dataset and asked the court to take judicial notice of information regarding The Pile on its website. The court declined to take judicial notice, rejecting the notion that a document is judicially noticeable simply because it appears on a publicly available website, and found that plaintiffs adequately alleged Megatron 345M was trained on a dataset containing their works.
NVIDIA moved to dismiss or strike allegations referencing unidentified models beyond those in the Megatron family. Because plaintiffs clarified at the hearing on the motion that they were not accusing any unidentified model of infringement other than the checkpoints for the five models they identified, and because NVIDIA agreed that the proper scope of the complaint included those five models and their checkpoints, the court denied NVIDIA’s motion on this issue.
The court also denied NVIDIA’s motion to dismiss allegations concerning Pirate Library Mirror, Bibliotik and unidentified shadow libraries, finding that references to Pirate Library Mirror were historical or related to other AI firms, that one allegation of copying from Bibliotik was plausible, and that plaintiffs did not allege infringement via unidentified libraries.
NVIDIA also sought to dismiss allegations concerning its use of the BitTorrent Protocol. The court noted there was only one reference to BitTorrent in the complaint—that Bibliotik distributes pirated works via the BitTorrent Protocol—and characterized BitTorrent as merely a tool, not a library or dataset, analogizing a request to dismiss allegations concerning BitTorrent to “asking to dismiss allegations concerning paintbrushes in a case about a dolphin painting.” Accordingly, the court denied NVIDIA’s request to dismiss those allegations.
Plaintiffs claimed NVIDIA was liable for contributory infringement, alleging that it impermissibly provided customers with scripts to automatically download and preprocess The Pile. The court found that plaintiffs properly pled the knowledge element of a contributory infringement claim, adequately alleging NVIDIA knew of infringing activity because NVIDIA itself used The Pile to train several LLMs and then provided scripts enabling customers to acquire and preprocess the same dataset—showing more than mere generalized awareness of product capabilities. The court also found that plaintiffs sufficiently pled inducement, alleging that NVIDIA actively encouraged infringement by developing and distributing code to download and extract copyrighted files. The court rejected NVIDIA’s argument that plaintiffs must allege advertising or promotion of infringing uses, finding that advertising is merely an example of an inducing act, not a prerequisite.
As to the “service tailored to infringement” theory, the court agreed that the relevant inquiry focused on the specific scripts NVIDIA provided, which were alleged to have no purpose other than to speed up infringement, distinguishing this case from the systems at issue in prior decisions. Accordingly, the court denied the motion to dismiss contributory infringement under both theories.
The court dismissed plaintiffs’ vicarious liability claim with leave to amend. It found that plaintiffs’ allegations established only control over NVIDIA’s own tools, not the legal right or practical ability to stop users from obtaining or using infringing materials, and distinguished the case from one in which the defendant maintained centralized control and could police or block user access. The court held that plaintiffs also failed to adequately allege the required causal relationship between the alleged infringement and any financial benefit, noting that their allegations did not establish that access to The Pile acted as a “draw” for customers and that conclusory allegations were insufficient without supporting factual content.
Summary prepared by Todd Densen and Chloe Gordils