Authors Escape OpenAI Bid for Entirety of ChatGPT Testing Data

Authors accusing OpenAI Inc. of copyright infringement persuaded a federal judge to partly overturn an order requiring them to share all of the methodology and data they used to test the flagship ChatGPT chatbot in preparation for their lawsuit.

The author-plaintiffs still must disclose the prompts, outputs, and account settings that produced the results they said in their complaint demonstrate infringement, but they won’t have to turn over testing data for queries that didn’t improperly reproduce their works, Judge Araceli Martinez-Olguin wrote in an order issued Thursday in the US District Court for the Northern District of California.

Comedian Sarah Silverman and a dozen other authors sued OpenAI in June 2023, claiming OpenAI trained ChatGPT by copying hundreds of thousands of books without the authors’ permission. Their March 2024 amended complaint included examples of queries asking ChatGPT to summarize in detail various parts of their writing, along with the chatbot’s responses.

As part of the discovery process, OpenAI requested documents about the OpenAI accounts, prompts, and outputs the authors used, and documentation of the authors’ methodology. The authors offered to produce full threads of the prompts and outputs that elicited the complaint’s examples, but balked at providing other materials.

OpenAI has made similar demands in other copyright suits against it, including one brought by New York Times Co. The AI company accused the Times of “prompt hacking” to obtain the results cited in its complaint, which the newspaper disputed in a filing.

The authors “offered up only their preferred, cherry-picked results,” OpenAI argued, asking the court to require the authors to produce the entirety of their test results. In June, Magistrate Judge Robert M. Illman sided with OpenAI, saying the authors “cannot avoid the notion that by placing a large subset of these facts” in the complaint that they “have waived the ability to assert work product protection.”

Martinez-Olguin disagreed, saying the authors’ testing qualified as “virtually undiscoverable” opinion work product because “the ChatGPT prompts were queries crafted by counsel and contain counsel’s mental impressions and opinions about how to interrogate ChatGPT, in an effort to vindicate Plaintiffs’ copyrights against the alleged infringements.”

Because OpenAI failed to show it has a compelling need for the material, it was improper to extend a waiver to work product not disclosed in the complaint, Martinez-Olguin said.

Latham & Watkins LLP; Morrison & Foerster LLP; and Keker, Van Nest & Peters LLP represent OpenAI. Joseph Saveri Law Firm LLP and Matthew Butterick represent the authors. Cafferty Clobes Meriwether & Sprengel LLP also represents author Michael Chabon.

The case is Tremblay v. OpenAI Inc., N.D. Cal., No. 23-cv-03223, order 8/8/24.
