- Authors will point to evidence of Meta’s digital piracy
- Judge’s ruling will give first look into thorny fair use issue
The way courts will view the fair use argument for training generative artificial intelligence models with copyrighted materials will be tested Thursday in a San Francisco courtroom, when the first of dozens of such lawsuits reaches summary judgment.
The proposed class action in the US District Court for the Northern District of California, which also includes journalist Ta-Nehisi Coates and Pulitzer-winning novelist Andrew Sean Greer as plaintiffs, is part of a wave of legal challenges filed nationwide against top generative AI firms including
OpenAI is battling a recently consolidated group of copyright cases from authors and media organizations including the
A ruling in the authors’ case has the power to profoundly influence copyright law and the billion-dollar business model behind AI, which relies on the belief that training with unlicensed copyrighted content doesn’t violate the law. An adverse ruling for Meta could open it up to potentially billions of dollars in damages.
US District Judge Vince Chhabria’s decision will “provide a window into, at least, how this court is thinking about fair use in the context of generative AI” and could “send ripples throughout the other cases,” said Kevin Madigan, senior vice president of Policy and Government Affairs at the Copyright Alliance.
Undisputed and Dramatically Different
During the hearing Thursday, the parties will present “dramatically different framings of what they consider to be the undisputed facts,” said intellectual property law professor Edward Lee at Santa Clara University.
The authors will argue that Meta engaged in widespread direct copyright infringement by knowingly downloading millions of books from notorious online piracy networks known as “shadow libraries,” such as Library Genesis and Z-Library.
Meta will counter that its copying falls under the fair use exception to the law, regardless of how it obtained the books. It argues that it used the books to train an entirely new, useful, and transformative product, which is encouraged by copyright law.
“Llama is nothing like a book; it is not meant to be read,” Meta argued in court briefs.
Chhabria can look to a number of cases involving copyright and new technologies to help guide his decision. The AI industry has cited Google’s fair use win in Authors Guild v. Google Inc., where it scanned millions of copyrighted books without permission to create a searchable database. Rights holders have pointed to a recent decision by a Delaware federal judge who found that legal software company Ross Intelligence Inc.'s use of Thomson Reuters’ copyrighted material for non-generative AI isn’t fair use.
But courts have yet to confront head-on the copyright implications of the most powerful generative AI products.
“All of the cases that they discuss and cite to are readily distinguishable and all really different technologies that are nothing like generative AI,” Madigan said.
Pirating or Training
Central to the authors’ argument is evidence uncovered during discovery purporting to show Meta’s disregard for the law when obtaining books to train its models in 2022 and 2023.
The authors point to emails, messages, and depositions of key employees to argue that in Meta’s effort to catch up with competitors, it abandoned its initial licensing deals and moved to directly acquire millions of books and scientific papers from shadow libraries, many of which have been sanctioned by federal courts.
Meta allegedly downloaded the pirated books using a peer-to-peer file-sharing method known as torrenting, where the downloader also hosts and distributes the material. The technique makes the infringement all the more blatant, the authors say. Meta disputes that it reuploaded any of the books during the torrenting process.
But the authors’ theory largely bypasses the thorny legal issue of whether the use of copyrighted material to train an AI is a fair use. Meta argues that training is a quintessential fair use regardless of how it obtained the training material.
The authors are “trying to make the tail wag the dog,” said Mitch Stoltz, IP litigation director at the Electronic Frontier Foundation, by focusing on the pirating issue instead of the core fair use question. The foundation urged the court in an amicus brief supporting Meta not to be distracted by how it obtained the books.
For others, Meta’s choice to use shadow libraries shows its willingness to undermine the goals of copyright law. “I don’t see how the court could ignore that kind of unprecedented bad faith,” said Terrence Hart, general counsel for the Association of American Publishers, which filed an amicus brief in support of the authors.
Adam Eisgrau, senior director of AI, creativity, and copyright policy at the Chamber of Progress, said Meta’s case is “relatively unique” compared to the dozens of other pending lawsuits because of how central the origins of the training material are to the pleadings. He said “whether a loss for Meta on fair use in this case would doom all other defendants would come down to the actual opinion.”
Boies Schiller Flexner LLP, Joseph Saveri Law Firm LLP, DiCello Levitt LLP, Lieff Cabraser Heimann & Bernstein LLP, and Cafferty Clobes Meriwether & Sprengel LLP represent the plaintiffs.
Cooley LLP, Cleary Gottlieb Steen & Hamilton LLP, and Paul, Weiss, Rifkind, Wharton & Garrison LLP represent Meta.
The case is Kadrey v. Meta Platforms Inc., N.D. Cal., No. 23-cv-03417, hearing scheduled for 5/1/25.