OpenAI’s Privacy Bet in Copyright Suit Puts Chatbots on Alert

Nov. 18, 2025, 10:00 AM UTC

OpenAI Inc. is banking on a privacy argument to block a court’s probe into millions of ChatGPT user conversations.

So far, that argument hasn’t proved a winning legal strategy, nor one that other chatbot makers can adopt as they anticipate similar discovery demands amid exploding chatbot-related litigation.

Instead, it threatens to turn attention to just how much information chatbots like ChatGPT are collecting and retaining about their users.

“If they really cared about privacy, they wouldn’t be collecting all this data to begin with,” said Ryan Kriger, a University of Vermont lecturer. “That’s another important point: one big part of data minimization is data retention. You don’t retain the data for longer than you need it for.”

OpenAI, citing privacy concerns, is pushing back against a discovery order forcing it to turn over 20 million de-identified user conversations in a major copyright lawsuit brought by newspapers including New York Times Co. The AI outputs at issue may contain key evidence of copyright infringement to bolster the plaintiffs’ case.

The judge has previously shot down OpenAI’s privacy pleas, and did so again Nov. 7, saying it failed to explain how its users’ privacy was at risk given the protective order in the case and the removal of personally identifying information.

“It’s not a reasonable legal reaction—it’s a reasonable public relations reaction,” said Matthew Sag, AI law professor at Emory University.

The failed strategy foreshadows what other AI companies caught in the crosshairs of chatbot litigation may soon face. Companies like Anthropic PBC and Midjourney Inc. also face copyright infringement suits. OpenAI itself faces at least half a dozen lawsuits alleging it released psychologically manipulative and harmful models. Character Technologies Inc.'s Character.AI and Google have been accused of psychological abuse of teenagers.

Sag said he “absolutely” expects similar discovery requests in those suits, where the claims are specific to the model’s outputs.

“I would be careful not to over-dramatize it,” he added. “It’s not about making those user conversations public. It’s about giving that data so that they can do some analysis.”

OpenAI declined a request for comment.

The Privacy Argument

Earlier this year, OpenAI unsuccessfully argued privacy concerns when the court forced it to preserve its ChatGPT conversations, even ones users deleted.

The latest court order pushed OpenAI’s chief information security officer Dane Stuckey to publicly lambaste the request, saying the Times’ demand disregards “long-standing privacy protections” and breaks with common-sense security practices. A Q&A accompanying his statement stated the discovery probe ran against OpenAI’s “privacy standards.”

Most companies’ privacy policies, including OpenAI’s, account for that possibility, warning users that personal information may be collected and shared to comply with legal obligations.

Privacy laws—even the EU’s most stringent privacy regulation, the General Data Protection Regulation—also include exemptions allowing companies to share user data for legal demands.

“As far as privacy rights here, the law is so, so weak in this area,” said Kriger, former attorney in the Federal Trade Commission’s division of privacy and identity protection.

The Protective Order

The discovery probe from the Times also isn’t an unusual ask for a copyright lawsuit implicating user conversations with ChatGPT, Sag said.

The newspapers and OpenAI agreed that user data would be scrubbed of identifying information, would remain under a protective order, and would be kept out of the hands of anyone except the Times’ counsel and expert analysts.

While privacy professionals have long debated the efficacy of de-identification, threats of the data being re-identified are almost non-existent in this case, attorneys said.

“In the privacy world, re-identification is a real worry, and it’s something we try to deal with in legislation,” Kriger said. “But they’re giving it to lawyers for this litigation. I can’t imagine that the lawyers are going to try to re-identify the information.”

The newspapers’ experts would analyze the data to figure out how often users query ChatGPT about news content, they said in a Nov. 12 filing pushing back against OpenAI. Under the protective order governing the case, those with access to materials produced in discovery are precluded from disclosing the data to anyone without permission and using it for commercial purposes, among other requirements.

The attorneys and discovery experts, who have been bound by the confidentiality provisions for two years of litigation, have given OpenAI “zero cause for concern they would ever breach those provisions,” the newspapers added.

In a Nov. 14 letter to the court, OpenAI said the protective order still doesn’t warrant such an expansive discovery order.

“No court would require Google to hand over millions of irrelevant Gmail messages, even under a protective order. There is no reason why a different standard should apply to ChatGPT conversation logs,” it said. The motion for reconsideration is pending a ruling from Magistrate Judge Ona T. Wang.

“It’s a very difficult argument to say that the court’s order is not sufficient,” said Kenneth Suh, leader of the AI practice group at McDonald Hopkins and member of its national data privacy and cyber team.

Fair Game

OpenAI’s privacy concerns might have been moot had the company restricted the amount of data it collected and how long it held on to it, attorneys said in interviews.

States are increasingly looking to enforce a concept known as data minimization, which would require companies to collect only the personal information they need to conduct their business.

That balancing exercise is different with chatbots.

With AI, retaining information can be in users’ interest. Chatbots are now a resource for mental health advice, companionship, and even legal counsel. Users—including OpenAI’s—can decide how much information they want the chatbots to retain to remember past interactions.

“All of these things that are so useful for their business can become part of litigation,” Suh said. “So make sure we understand what we’re doing and why.”

The case is In Re: OpenAI Copyright Infringement Litigation, S.D.N.Y., No. 1:25-md-03143.

To contact the reporters on this story: Cassandre Coyer in Washington at ccoyer@bloombergindustry.com; Aruni Soni in Washington at asoni@bloombergindustry.com

To contact the editors responsible for this story: Jeff Harrington at jharrington@bloombergindustry.com; Catalina Camia at ccamia@bloombergindustry.com
