OpenAI’s Tactics Test First Amendment in New York Times Fight

OpenAI’s effort to force The New York Times to hand over reporting notes behind millions of articles allegedly used to train ChatGPT could be a hardball tactic to run up the newspaper’s legal bills or stall its copyright infringement suit, according to several intellectual property and First Amendment attorneys.

OpenAI Inc. has taken an aggressive discovery stance against New York Times Co. over the past several months, and the parties have now reached an impasse.

The AI company asked Judge Sidney H. Stein of the US District Court for the Southern District of New York to step in and compel the Times to produce reporters’ notes, interview memos, and other materials for each of the roughly 10 million contested articles the publication alleges were illegally plugged into the company’s AI models. OpenAI said it needs the material to suss out the copyrightability of the articles. The Times quickly fired back, calling the request absurd.

The unprecedented move levies a hefty ask on the publication and, if successful, would pierce reporters’ protections that typically cloak the newsgathering process and prevent journalists from being forced to reveal sources.

“It’s a First Amendment concern because you have a litigant using the power of the judiciary to rifle through reporters’ notebooks,” said Jeff Kosseff, an attorney and non-resident senior fellow at the Future of Free Speech.

UCLA media law professor Doug Lichtman called the discovery requests “abusive,” saying they detract from the legal issues at the heart of the lawsuit.

“New York Times has a limited budget to spend on a case like this, because, win or lose, these issues are not core to its business,” he said in an email. For OpenAI, “by contrast, this is a bet-the-company fight. That imbalance means that OpenAI might want to run up the bill and thereby force the Times to settle, depriving the court of its opportunity to address these important copyright questions.”

If granted, the motion would obligate the Times to cough up a century’s worth of reporters’ materials, the paper said in its response. The sheer volume of the request makes it unlikely to be granted in full, according to Joshua Rich, an IP lawyer from McDonnell Boehnen Hulbert & Berghoff LLP.

Ultimately, OpenAI’s stance may be a more strategic symbol, “a shot across the bow of the Times to show how onerous this case is going to be,” he said.

Discovery Dispute

The Times sued OpenAI and Microsoft in the Manhattan-based Southern District in December 2023, accusing them of infringing copyrights by loading articles into training datasets for their generative AI models. OpenAI moved to dismiss large parts of the suit in February, arguing the Times used prompt hacking to generate the allegedly infringing outputs.

The tech company in March submitted a discovery request for reporters’ notes to the Times, along with several other requests for documents related to employees’ OpenAI account information and the prompts plugged into ChatGPT to produce the allegedly infringing outputs, according to court records. Meanwhile, the Times demanded information on how OpenAI’s generative AI models were trained.

Amid the ensuing discovery dispute, on May 20, the Times asked Stein to schedule recurring, twice-monthly discovery status conferences, saying OpenAI and Microsoft were moving too slowly given the September 17 discovery deadline. OpenAI responded on May 22 saying the conferences would be “a waste of judicial resources.” The next day, the company filed a motion to compel the Times to produce documents related to the prompts fed to ChatGPT. It filed another motion to compel, this time for reporters’ notebooks, on July 1.

OpenAI clarified to the newspaper in May that it’s seeking “reporters’ notes, interview memos, records of materials cited in the asserted works,” according to court records.

The request for journalists’ reporting information “is necessary to determine whether and to what extent the Times is pursuing claims for infringement of works that are not protected, in part or in full, by copyrights the Times owns,” OpenAI’s counsel said in a July 1 letter to Stein. The relevance of the materials to the question of copyrightability, as well as the fact that the files requested are not accessible from other sources, means reporters’ privilege doesn’t justify withholding the information, the company said. It added it’s not seeking the identities of any confidential sources.

The Times responded two days later, arguing OpenAI’s request “turns copyright law on its head” and cites no caselaw to support it.

The demand for reporters’ notes is “unprecedented, harassing, and absurd,” Susman Godfrey LLP partner Ian Crosby, lead counsel for the Times, said in an email to Bloomberg Law.

“Courts protect press freedom by prohibiting litigants from abusively seeking the disclosure of reporting materials that are wholly irrelevant,” Crosby said. “This request is pure retaliation against a news organization for asserting its well-established intellectual property rights.”

Copyrightability of News

OpenAI’s request for reporting materials is unheard of in the context of copyright disputes, according to media and entertainment lawyer Adam Weissman. While it’s normal for lawsuits to include a challenge to the validity of a copyright, it’s unusual to see a defendant challenge the copyrightability of news, he said.

To bring a copyright infringement suit to court, you need to have a copyright registration—which the Times has for the articles it says were pumped into ChatGPT. That registration should be prima facie evidence of copyrightability, Jason Bloom, an IP attorney with Haynes & Boone LLP, said.

Challenging the validity of that registration is a feeble argument, too, he said, because news articles are considered “obviously” copyrightable. Even though articles convey facts—which on their own are not protectable intellectual property—originality is infused in the organization and sequencing of those facts.

“The standard for copyrightability for authorship is really low, so it’s a tough bar for OpenAI to hit anyway,” William Stroever, IP attorney with Cole Schotz PC, said.

The Times argued the request bears no relevance to the case: the expressive nature of a work doesn’t need outside materials but is determined by reference to the work itself, Crosby wrote in the July 3 response. It also invades reporters’ privilege, which protects the identities of journalists’ sources regardless of whether those identities are confidential, the response said.

OpenAI didn’t “address the chilling effect that such massive discovery requests would have on a news organization’s reporting—and its ability to bring lawsuits to defend its copyrighted works,” the filing said. “Indeed, given the wildly improper scope of this request, one has to wonder if a chilling effect is what OpenAI, who appears to have stolen from millions of content creators, is hoping for.”

OpenAI didn’t respond to repeated requests for comment.

“If there’s a compelled disclosure ordered in a copyright case like this, it does send a message to future litigants that this is a vulnerability for news organizations,” said Jane Kirtley, a media ethics and law professor at the University of Minnesota.

Looking Ahead

OpenAI’s request could be a typical ask-for-the-moon-settle-for-the-stars strategy that is often seen in discovery, Rich said. While the request is odd within the context of this lawsuit, it’s not unlike boilerplate discovery requests about allegedly infringed works in other copyright cases.

“I don’t think there’s any chance that OpenAI gets all of the reporters’ notes from every article,” Rich said, adding that he expected the request would probably be narrowed to a small number of representative works.

The move could also be a delay tactic to push the lawsuit out as far as possible, Weissman said, especially as OpenAI and other artificial intelligence companies await possible government regulation or congressional action on generative AI.

“If they get everything they want, this will extend deadlines on the trial and on the entire process for this case, which is probably something they want to do,” he said. “If a negative judgment comes out before positive things are legislated in their favor, that’s obviously not a good look.”

A complete approval would have a lasting effect on copyright lawsuits—plaintiffs may be forced to prove copyrightability from scratch, a burden they haven’t had to carry, Stroever said. It could also severely wither reporters’ privilege, drawing an easy path for people to lay their hands on private newsgathering processes by deliberately infringing copyrights, waiting for an infringement suit, and then demanding reporters’ notebooks to prove the original work is copyrightable, Stroever added.

“If this goes through with any major level of success, they’re certainly setting things up for copyright infringement cases to be handled differently going forward, not just for AI, but for really any type of work,” Weissman said.

The case is: The New York Times Company v. Microsoft Corporation et al, S.D.N.Y., 1:23-cv-11195.

OpenAI’s effort to force The New York Times to hand over reporting notes behind millions of articles allegedly used to train ChatGPT could be a hardball tactic to stall its copyright infringement suit. @AruniSoni explains: https://t.co/yd7uDZdfDJ pic.twitter.com/73xNbhCJ9E
— Bloomberg Law (@BLaw) July 18, 2024
