- COURT: N.D. Cal.
- DOCKET: Nos. 4:24-cv-02655 and 3:24-cv-2653
Four prominent authors filed putative class actions against software companies
Novelist Andre Dubus III and journalist and nonfiction writer Susan Orlean alleged Nvidia’s NeMo Megatron models were trained by copying their and others’ work in a complaint filed Thursday in the US District Court for the Northern District of California. Fiction authors Rebecca Makkai and Jason Reynolds filed a nearly identical suit against Databricks Inc. over its MosaicML model in the same court Thursday.
The claims made by the authors in both suits mirror those made within the past week by newspapers against OpenAI and by a putative class of artists and photographers against Google LLC, among other suits. The cases challenge the core premise of training AI large language models on millions to billions of inputs from large datasets, many of them potentially copyrighted works, to learn to mimic human creativity.
The suits are also not the first putative class actions against Nvidia and Databricks for their AI models. Authors Abdi Nazemian, Brian Keene and Stewart O’Nan filed lawsuits against the two companies in March.
“We respect the rights of all content creators and believe we created our models in full compliance with copyright law,” an Nvidia spokesperson said in a statement.
Databricks did not immediately respond to a request for comment.
AI developers have argued that having models merely train on a vast quantity of works, each providing infinitesimal independent influence on outputs, constitutes fair use. OpenAI Inc. has claimed fair use and said it would be impossible to create useful AI tools without copyrighted works, emphasizing that it lets creators opt out.
But creators have balked at the assertion, claiming that the developers copy and use their works for free, relying on creators’ work to then produce expressive content that competes with them.
The answers to the thorny legal questions around AI’s use of copyrighted material will dramatically affect the viability of generative AI models given the sheer volume of works needed to effectively train AI and the fact that any work with minimal creativity and originality would be protected.
While works gain protection upon creation, Dubus, Orlean, Makkai and Reynolds focus on authors with registered copyrights in books in a training dataset that Nvidia and Databricks have allegedly admitted copying to train their models.
One part of the dataset used to train the models comprises 108 gigabytes of data pulled from Bibliotik, a “shadow library” that hosts and distributes unlicensed copyrighted materials, according to the complaint. Shawn Presser, creator of Bibliotik, has confirmed in public statements that it contains nearly 200,000 books.
DiCello Levitt LLP represents the authors against both Nvidia and Databricks. Cafferty Clobes Meriwether & Sprengel LLP also represents Makkai and Reynolds.
Dubus v. Nvidia Corp., N.D. Cal., No. 24-2655, Complaint 5/2/24 and Makkai v. Databricks Inc., N.D. Cal., No. 24-2653, Complaint 5/2/24.