OpenAI to Provide Training Dataset to Authors in Suit (Correct)

Jan. 28, 2025, 11:27 PM UTC

A California federal judge is requiring OpenAI Inc. to turn over an entire training dataset used to build its flagship GPT-4 AI model, resolving a discovery dispute in one of nearly a dozen copyright lawsuits filed against the company.

At a virtual hearing Tuesday, the group of authors suing OpenAI persuaded Magistrate Judge Robert M. Illman to compel OpenAI to provide one of the datasets “central” to the case.

The “English Colang Dataset” contains a number of web pages that “very likely” contain copyrighted content owned by the author plaintiffs, the authors said in a filing on Jan. 17. They ...