- Copyright lawsuit experts to sift through OpenAI training data
- OpenAI objects to expert who works at competing AI firm
Dr. Ricardo Baeza-Yates, a “retained consultant” for eight newspapers suing the tech giant, is also the chief scientific officer of Theodora AI, a company that works in the same space as OpenAI, according to a letter filed by OpenAI’s attorneys on Oct. 18 in the US District Court for the Southern District of New York.
Both companies help “fine-tune” large language models, the letter said, and now that plaintiffs in the copyright suits are inspecting OpenAI’s training data as part of discovery, there is a “risk of inadvertent disclosure” of proprietary information to Theodora AI through Baeza-Yates.
“It would be impossible for Dr. Baeza-Yates to segregate what he would learn from OpenAI’s highly confidential technical documentation from the work he is doing for Theodora AI,” OpenAI said.
The company said this was the first time it was trying to curb exposure of information to an expert retained on the Daily News suit, and that it isn’t requesting a protective order against 12 other experts working with the plaintiffs, three of whom work directly in the AI field.
The Daily News, the Chicago Tribune, and other publications sued OpenAI in April, alleging, like dozens of other lawsuits, that the tech company used copyrighted material to train its AI models without permission.
OpenAI in June asked a judge to merge the Daily News lawsuit with another action brought by
Keker Van Nest & Peters LLP, Latham & Watkins LLP, and Morrison & Foerster LLP represent OpenAI. Rothwell Figg Ernst & Manbeck PC represents the news plaintiffs.
The case is Daily News LP v. Microsoft Corporation, S.D.N.Y., 1:24-cv-03285, letter filed 10/18/24.