Anthropic Scores Partial Victory in Copyright Case Over AI Training Data

The AI firm downloaded more than seven million pirated books to assemble its research library, court records revealed.

In brief

  • A U.S. District Judge has ruled that Anthropic’s AI training on copyrighted books is “exceedingly transformative” and qualifies as fair use.
  • However, storing millions of pirated books in a permanent library violated copyright law, the court said.
  • OpenAI and Meta face similar author-led lawsuits over the use of copyrighted works to train AI models.

AI firm Anthropic has won a key legal victory in a copyright battle over how artificial intelligence companies use copyrighted material to train their models, but the fight is far from over.

In a ruling issued late Monday, U.S. District Judge William Alsup found that Anthropic’s use of copyrighted books to train its AI chatbot Claude qualifies as “fair use” under U.S. copyright law.

“Like any reader aspiring to be a writer, Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them — but to turn a hard corner and create something different,” Alsup wrote in his ruling.

But the judge also faulted the Amazon- and Google-backed firm for building and maintaining a massive “central library” of pirated books, calling that part of its operations a clear copyright violation.

“No carveout” from Copyright Act

The case, brought last August by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, accused Anthropic of building Claude using millions of pirated books downloaded from notorious sites like Library Genesis and Pirate Library Mirror.

The lawsuit, which seeks damages and a permanent injunction, alleges Anthropic “built a multibillion-dollar business by stealing hundreds of thousands of copyrighted books” to train Claude, its family of AI models.

Alsup said that AI training can be “exceedingly transformative,” noting how Claude’s outputs do not reproduce or regurgitate authors’ works but generate new text “orthogonal” to the originals.

Court records reveal that Anthropic downloaded at least seven million pirated books, including copies of each author’s works, to assemble its library.

Internal emails revealed that Anthropic co-founders sought to avoid the “legal/practice/business slog” of licensing books, while employees described the goal as creating a digital collection of “all the books in the world” to be kept “forever.”

“There is no carve out, however, from the Copyright Act for AI companies,” Alsup said, noting that maintaining a permanent library of stolen works, even if only some were used for training, would “destroy the academic publishing market” if allowed.

Alsup’s ruling is the first substantive decision by a U.S. federal court to directly analyze and apply the fair use doctrine to the use of copyrighted material in training generative AI models.

The court distinguished between copies used directly for AI training, which were deemed fair use, and the retained pirated copies, which will now be subject to further legal proceedings, including potential damages.

AI copyright cases

Several similar lawsuits have been filed, including high-profile cases against OpenAI, Meta, and others, but those cases remain in their early stages, with motions to dismiss pending or discovery ongoing.

OpenAI and Meta both face lawsuits from groups of authors alleging their copyrighted works were exploited without consent to train large language models such as ChatGPT and LLaMA.
