
IP/Entertainment Case Law Updates

Advanced Local Media LLC v. Cohere Inc.

District court denies AI developer Cohere’s motion to dismiss, finding that news publishers plausibly stated claims for direct copyright infringement, including for “substitutive summaries” generated by Cohere’s AI models, for secondary copyright infringement based on inducement and contributory liability theories, and for trademark infringement and false designation of origin based on the display of plaintiffs’ trademarks along with hallucinated articles.

Plaintiffs, a group of news and magazine publishers, sued Cohere Inc., a technology company that develops, operates and licenses AI models, for direct and secondary copyright infringement, trademark infringement and false designation of origin, based on allegations that Cohere copied plaintiffs’ works to train its AI tools and that the output of those tools infringed plaintiffs’ copyrighted works and trademarks. Cohere’s primary product is a suite of large language models (LLMs) known as the Command Family, which Cohere markets as a “knowledge assistant” for businesses “designed to shortcut research and content analysis,” and which it promotes as a “tool to receive the latest news.” Cohere also provides free trial access to certain of its services in an effort to capture more paying customers.

Plaintiffs alleged that, in training its LLMs, Cohere copied and downloaded text from the publishers’ websites, including those protected by paywalls, and used a dataset provided by the Common Crawl Foundation, an organization that provides large portions of content extracted through internet crawling to the public at no cost. The Common Crawl dataset does not distinguish between text from copyrighted works and material in the public domain.

The publishers alleged that the Command programs unlawfully reproduced, distributed and displayed their copyrighted works in response to users’ queries. The Command programs include a feature called Retrieval Augmented Generation (RAG), which accesses external data sources, including the publishers’ sites, and incorporates those sources into responses to user queries, including verbatim copies, substantial excerpts and “substitutive summaries” of news articles. When RAG is turned off, if a user asks for a copy of a specific article, the program will often “hallucinate” and provide a fabricated article accompanied by the name of the putative publisher.

Cohere moved to dismiss the publishers’ claims for secondary copyright infringement, trademark infringement and false designation of origin, and the claim for direct infringement to the extent it is based on Command’s output of substitutive summaries of the publishers’ articles. The court denied defendant’s motion, finding plaintiffs’ allegations to be sufficient at the pleading stage.

On the direct copyright infringement claim, the court acknowledged that the underlying facts contained in plaintiffs’ news articles are not copyrightable. Compilations of facts, however, “may possess the requisite originality” to be protectable, and in considering whether the publishers plausibly alleged substantial similarity between the works sufficient to support a claim of direct infringement, the court “looks only to the original elements in Publishers’ presentation of the facts” to determine whether “the copying is quantitatively and qualitatively sufficient to support a finding of infringement.”

The court concluded that plaintiffs adequately alleged that Command’s outputs are quantitatively and qualitatively similar to their works. Plaintiffs alleged that Command’s outputs both heavily paraphrased and copied phrases verbatim from source articles and that the outputs “go well beyond a limited recitation of facts,” including “lifting expression directly or parroting the piece’s organization, writing style, and punctuation.” The court relied on 75 examples of allegedly infringing content set out in plaintiffs’ pleading, including 50 in which the output included allegedly verbatim copying and 25 with a mix of verbatim copying and close paraphrasing.

The court rejected Cohere’s argument that the news summaries differed from the copyrighted works in tone, style, length and sentence structure, noting that in some of the examples, the outputs were nearly identical to the underlying work, lifting several paragraphs in their entirety. The court also rejected Cohere’s argument that any copying of expression was minimal and therefore noninfringing, declining to adopt a bright-line test for the amount of protectable content that must be copied to allow a claim of infringement to proceed.

On the claim for secondary copyright liability, the court found plaintiffs sufficiently alleged secondary infringement under three theories: material contribution, inducement and vicarious infringement. Cohere argued that plaintiffs failed to state a claim because (1) plaintiffs did not allege direct infringement and therefore cannot claim secondary infringement under any theory; (2) Cohere did not have actual knowledge of any specific infringement by users and therefore no contributory infringement claim was stated; and (3) plaintiffs offered only conclusory allegations of contributory infringement by inducement. The court rejected all three arguments, noting that plaintiffs provided enough factual detail (including the 75 examples) to suggest actual and constructive knowledge of infringement by users.

Cohere argued that plaintiffs were required to allege actual knowledge of specific acts of infringement in order to state a claim for contributory infringement. Noting that this “heightened knowledge standard” from the Ninth Circuit had not been adopted by the Second Circuit, the court held that “contributory infringement liability is imposed on persons who ‘know or have reason to know’ of the direct infringement.” While knowledge of specific infringing acts is not required, plaintiffs must allege more than “generalized knowledge of the possibility of infringement.” The court noted that plaintiffs alleged that Cohere knew training its LLMs would result in infringing output, because they were specifically designed to do so, and that Cohere was on notice that it was not authorized to use plaintiffs’ works as a result of copyright notices included on those works and in the terms of service on plaintiffs’ websites. Plaintiffs also alleged they sent do-not-crawl instructions to Cohere’s bots via the robots.txt protocol and that Cohere continued to copy their works even after receiving cease-and-desist letters.

Cohere argued that it could not be held liable for secondary infringement based only on its knowledge that Command could be used in infringing ways. The court found this argument unavailing, as the plaintiff publishers alleged that Cohere intentionally programmed its LLMs to retrieve copies of their copyrighted works and deliver them to users.

The court likewise rejected Cohere’s argument that plaintiffs offered only conclusory allegations of contributory infringement by inducement, finding that plaintiffs plausibly pleaded that Cohere took affirmative steps to foster infringement, including by marketing and advertising Command as a tool to access the latest news and keep users “up to date.” The program’s chat interface and free online demo also suggest that users use the interface to access news stories. Plaintiffs’ claims were also based on the theory that defendant intentionally programmed its AI systems to generate and deliver copies of their copyrighted works to users.

On the claims of trademark infringement and false designation of origin, the court found that plaintiffs plausibly alleged Cohere’s use of their trademarks in commerce, including by displaying the publishers’ trademarks to users of the free online trial, the purpose of which was to attract paying customers, and by displaying those trademarks in ways that would divert traffic, sales and subscriptions from the publishers, as well as deprive the publishers of advertising revenue earned through traffic to their websites.

The court also found that plaintiffs plausibly alleged a likelihood of confusion. Noting that the publishers alleged, and defendant did not contest, that Command’s outputs include the publishers’ registered trademarks, the court concluded that plaintiffs were not required to plead either real-world instances of consumer confusion or allegations under the Polaroid factors that would show a likelihood of confusion. The outputs and hallucinated summaries created by Cohere’s AI model that include plaintiffs’ trademarks could be misleading to users who might believe these texts originate from plaintiffs, especially given that the publishers have announced licensing arrangements with other AI companies.

While the court stated that it need not and would not evaluate the merits of Cohere’s nominative fair use defense on a motion to dismiss, it noted that it was doubtful that the defense would apply in this case, and concluded that the complaint adequately alleges facts that could, if proved, cause a trier of fact to reject application of the nominative fair use defense.

Summary prepared by Tal Dickstein and Jessica Manavi
