OpenAI Legal Troubles Mount With Suit Over AI Training on Novels

OpenAI Inc. was hit with another class action copyright lawsuit claiming its enormously popular artificial intelligence chatbot ChatGPT is trained on books without permission from the authors.

The complaint filed in San Francisco federal court on Wednesday said ChatGPT’s machine learning training dataset comes from books and other texts that are “copied by OpenAI without consent, without credit, and without compensation.”

OpenAI and other generative AI companies have faced a barrage of intellectual property and privacy lawsuits in recent months as Congress and government regulators look to rein in the burgeoning industry.

This week, OpenAI was sued in another sweeping class action alleging that the machine learning models behind ChatGPT and the text-to-image generator DALL-E illegally scrape personal information across the internet in violation of various state and federal privacy laws. The company was hit with a separate copyright suit last fall claiming its AI coding assistant called Copilot reproduced open source software without proper copyright notices.

Courts have not yet determined whether using copyrighted material to train generative AI models is copyright infringement.

The Wednesday lawsuit, filed in the US District Court for the Northern District of California by the same law firm that brought the Copilot case, was brought by the science fiction and horror author Paul Tremblay and novelist Mona Awad.

They said ChatGPT can provide generally accurate summaries of their books, leading them to believe the works were “copied by OpenAI and ingested by the underlying OpenAI Language Model” without permission.

The complaint cited a 2020 paper from OpenAI introducing GPT-3, which said 15% of the training dataset comes from “two internet-based books corpora.” The authors alleged that one of those book datasets, which contains over 290,000 titles, comes from “shadow libraries” like Library Genesis and Sci-Hub, which use torrent systems to illegally publish thousands of copyrighted works.

“These flagrantly illegal shadow libraries have long been of interest to the AI-training community,” the complaint said.

The lawsuit also said ChatGPT strips the books of their copyright notices in violation of the Digital Millennium Copyright Act.

OpenAI didn’t immediately return a request for comment.

Joseph Saveri Law Firm LLP represents the authors.

The case is Tremblay v. OpenAI Inc., N.D. Cal., No. 3:23-cv-03223, complaint filed 6/28/23.
