Big Tech Wins in Copyright Cases Come With Strings Attached

July 8, 2025, 2:38 PM UTC

Two federal judges in the same Northern California district court have issued groundbreaking decisions in separate copyright infringement lawsuits brought by book authors against companies that developed generative AI large language models.

Although the judges agreed that Meta Platforms Inc.'s and Anthropic PBC's use of the authors' books to train LLMs fell within the bounds of fair use, the decisions highlight risks associated with acquiring content without authorization, and the outcomes may be different for AI platforms that can generate output substantially similar to the works on which they were trained. The judges also split on a new market dilution theory that could undermine the tech giants' fair use arguments in future cases.

Determining Fair Use

Judge William Alsup, who presided over the case against Anthropic, and Judge Vince Chhabria, who presided over the case against Meta, both held that training LLMs using copyrighted books, among millions of other works, was fair use and therefore not infringing. Chhabria described Meta’s Llama LLM as “highly transformative,” while Alsup lauded Anthropic’s Claude chatbot as “exceedingly,” “quintessentially” and even “spectacularly” transformative. Both judges pointed to the fact that LLMs are capable of generating human-like responses to queries on a wide variety of topics.

Copying full books was reasonably necessary, the judges held, because an LLM’s quality depends on the size, diversity, and quality of the materials on which it was trained. And because it was reasonably necessary to train the LLMs on a large number of works, including any single book was no less reasonable than including any other.

In finding fair use, both judges rejected the authors’ arguments that the AI companies supplanted the market for licensing their books to train LLMs, even though there was evidence that both Meta and Anthropic had at least considered licensing content to train their LLMs, and in some cases actually took steps to do so.

Those rulings are in tension with at least one other federal court, which found that even the potential market for licensing AI training content weighed against fair use.

Alsup expressed the view that licensing works for the “narrow purpose” of training LLMs “is not one the Copyright Act entitles Authors to exploit,” while Chhabria noted difficulties that Meta had encountered in attempting to license works to train its family of LLMs—including that book publishers sometimes don’t own the necessary rights, or own them only in certain territories, and that some publishers didn’t respond to Meta’s outreach at all.

Those obstacles may become less onerous as the licensing market for AI training data matures, potentially leading to different outcomes in future cases.

Methods Matter

Both judges found LLM training to be fair use even though the AI companies obtained the books by downloading unauthorized digital libraries, and internal emails suggested they knew training on copyrighted material without authorization posed legal risk. Alsup quoted Anthropic’s cofounder and CEO as wanting to avoid the “legal/practice/business slog” associated with purchasing training content.

But the ways the AI companies ultimately obtained the books mattered.

Chhabria found that Meta might be liable for downloading pirate libraries through “torrents,” which can cause the libraries to be redistributed to third parties. He also hypothesized that LLM training might not be fair use if downloading pirate libraries provided a material benefit to the people and entities who operate the libraries.

Alsup held that Anthropic’s downloading of pirate libraries to create a repository for purposes other than LLM training supported a separate infringement claim against the company.

Companies looking to train LLMs and other AI systems should therefore consider how they obtain the content needed to train their systems.

Outputs and Outcomes

Both judges emphasized there was no evidence before them that the LLMs had generated outputs that were substantially similar to the authors’ books.

In the Anthropic case, there wasn’t evidence of any similar outputs whatsoever, and in the Meta case, expert testimony showed that Llama would generate no more than 50 words and punctuation marks from any of the plaintiffs’ books. To the extent the LLMs replicated the authors’ style, that didn’t implicate any rights protected by copyright, the courts explained.

Alsup warned that if Claude had output material that was similar to the books on which it was trained, “this would be a different case,” and that the “authors remain free to bring that case in the future should such facts develop.”

Notably, other pending lawsuits against AI companies—including claims that Anthropic infringed copyrighted song lyrics, and claims that OpenAI infringed The New York Times’ articles—do allege outputs that replicated the plaintiffs’ works, which may tilt the analysis against fair use.

Market Dilution Theory

Although neither judge found the market for licensing content to train AI systems weighed against fair use, Chhabria laid out an alternate theory—market dilution—which, in his view, would undercut fair use.

He expressed concern that, even if AI systems didn’t output portions of any of their source material, their ability to “generate countless competing works with a miniscule fraction of the time and creativity it would otherwise take” would “severely undermine the incentive for human beings to create,” which, according to Chhabria, is a “harm that copyright aims to prevent.”

Certain types of work—such as news articles—might be particularly susceptible to this kind of market dilution, he explained.

Chhabria noted that market dilution had never been applied in the copyright infringement context, yet nevertheless predicted it “will often cause plaintiffs to decisively win the fourth [fair use] factor—and thus win the fair use question overall[.]” But because the book authors suing Meta didn’t pursue this theory, Chhabria had “no choice” but to grant summary judgment for Meta.

Alsup rejected Chhabria’s market dilution theory, reasoning that “training schoolchildren to write well would result in an explosion of competing works” yet “[t]his is not the kind of competitive or creative displacement that concerns the Copyright Act.”

Chhabria, in turn, criticized Alsup for “blowing off” the risk that generative AI—with its ability to rapidly generate new content at minimal cost—will swamp the market for human-created works.

Going forward, copyright owners asserting infringement claims against AI companies will no doubt follow Chhabria’s suggestion and argue that dilution of their works in the market tips the scales against fair use. Only time will tell whether they will be successful.

This article does not necessarily reflect the opinion of Bloomberg Industry Group, Inc., the publisher of Bloomberg Law, Bloomberg Tax, and Bloomberg Government, or its owners.

Author Information

Tal Dickstein is an entertainment and intellectual property litigator at Loeb & Loeb, helping clients in the music, motion picture, and advanced media sectors navigate complex legal and business issues.

To contact the editors responsible for this story: Max Thornberry at jthornberry@bloombergindustry.com; Jessie Kokrda Kamens at jkamens@bloomberglaw.com
