Anthropic Settlement Resets Balance of Power for Content Creators

Oct. 29, 2025, 8:30 AM UTC

Anthropic’s $1.5 billion settlement agreement with a class of authors whose books were allegedly pirated for AI training—and its eventual approval—mark a turning point in how companies must approach data acquisition for developing artificial intelligence.  

District Court Judge William Alsup’s Sept. 23 ruling validated AI training as transformative fair use while drawing a sharp distinction between legitimately acquired data and data downloaded from “pirate” sites and “shadow libraries,” a distinction that could fundamentally alter the negotiating landscape for AI training data.

While the precedent is limited and new cases may offer different perspectives, Bartz v. Anthropic provides both a warning and a roadmap for corporate counsel and AI developers.

New Leverage

The Anthropic case provides content owners with leverage they previously lacked. The Northern District of California’s framework creates acquisition-related liability that can arise even when the AI training itself qualifies as fair use.

If training data is obtained from pirate sites or other illegitimate channels, companies can face massive statutory damages. With penalties reaching $150,000 per willfully infringed work, a training dataset that potentially contains millions of pirated works could create liability dwarfing most companies’ market capitalization: at the statutory maximum, even one million infringed works would imply $150 billion in exposure.

Anthropic’s decision to settle, despite prevailing on the fair use question, demonstrates how concerns about acquisition methods can outweigh favorable legal precedent. 

This distinction between acquisition and use gives publishers a powerful negotiating position. Publishers can approach AI companies with a straightforward proposition: Acquire rights to content legitimately or face legal action and the risk that discovery will reveal pirated materials in their training data.

The threat is potent because many AI companies can’t fully trace the provenance of training data pulled from web scraping or obtained from third-party sources. It’s hard to claim a dataset contains no pirated material when a company doesn’t know what is and isn’t in it.

The Anthropic ruling also establishes that courts will scrutinize data sources. Alsup’s rejection of Anthropic’s “research purpose” defense creates persuasive authority that other courts are likely to follow.

AI companies can no longer assume that asserting a research or educational purpose will excuse questionable acquisition methods. This strengthens publishers’ positions in both litigation and data use negotiations.

The settlement also validates a class action approach to AI copyright claims, allowing content creators to aggregate claims and share litigation costs, which dramatically lowers the barriers to suit and raises potential damages.

Mitigating Copyright Risk 

In light of the Anthropic settlement, the following framework offers practical steps to reduce potential copyright exposure:

Perform a data audit. The source of every component of training data should be documented. This means creating and maintaining detailed records showing where training data originated, how it was acquired, and what licenses or other permissions have been obtained. Identifying problematic internal data sources early preserves options that narrow sharply once a lawsuit is filed. A minimal sketch of such a provenance record follows.
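
For teams that manage their data pipelines in code, one way to make an audit operational is to attach a structured provenance record to every data source. The Python sketch below is illustrative only; the field names and legal-basis categories are assumptions chosen for this example, not an industry standard, and any real schema should be designed with counsel.

    from dataclasses import dataclass
    from datetime import date
    from enum import Enum

    class LegalBasis(Enum):
        LICENSED = "licensed"                  # written license from rights holder
        PUBLIC_DOMAIN = "public_domain"
        EXPLICIT_PERMISSION = "explicit_permission"
        FAIR_USE_CLAIM = "fair_use_claim"      # requires counsel review
        UNKNOWN = "unknown"                    # high risk: provenance not established

    @dataclass
    class ProvenanceRecord:
        source_name: str                 # publisher, archive, or dataset name
        origin_url: str                  # where the data was obtained
        acquisition_method: str          # e.g., "direct license", "web crawl"
        legal_basis: LegalBasis
        license_reference: str | None = None   # contract or license ID, if any
        acquired_on: date | None = None
        notes: str = ""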

Identify high-risk data. Any data obtained from illegitimate sites, through scraping that violates applicable terms of service, or without clear legal authority should be flagged immediately. For any such data, companies face potentially difficult decisions about whether to purge the data and retrain models or seek retroactive permission for its use. While retraining can be expensive, it may cost less than defending or settling a class action.

Implement acquisition protocols. Companies should implement protocols that require training data to be reviewed before new data sources are incorporated. These protocols should mandate documenting the legal basis for using each data source: a license, fair use covering acquisition (not just training), public domain status, or explicit permission. A simplified gating check along these lines is sketched below.
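
Building on the hypothetical ProvenanceRecord above, a pre-ingestion gate might look like the following. The policy logic is an assumption for illustration; actual review criteria and escalation paths should come from counsel, not engineering.

    def review_data_source(record: ProvenanceRecord) -> str:
        """Return an ingestion decision for a candidate data source.
        Illustrative policy only; not a substitute for legal review."""
        if record.legal_basis is LegalBasis.UNKNOWN:
            return "reject: provenance not established"
        if record.legal_basis is LegalBasis.LICENSED and not record.license_reference:
            return "hold: license asserted but not documented"
        if record.legal_basis is LegalBasis.FAIR_USE_CLAIM:
            return "hold: route to counsel for fair use review"
        return "approve: documented legal basis on file"

The value of a gate like this is procedural rather than technical: it forces a documented legal basis to exist before a source ever enters the training corpus.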

Prioritize use rights. Companies should pursue licensing agreements with major publishers and content creators, or otherwise secure the right to use their content as training data. Use rights should be broad enough to cover derivative models and future use cases. While this requires up-front costs, it provides a high degree of legal certainty. The leverage publishers now possess means licensing costs may rise, so early movers are likely to secure more cost-effective terms than those who wait.

Engage content owners. Companies with models already in production and uncertain data provenance should consider engaging content owners proactively. Approaching publishers before a dispute arises, with a willingness to negotiate use rights, can head off an adversarial posture and reduce the damages exposure that litigation-driven settlement talks would likely entail. And publishers may prefer licensing revenue to the uncertainty of litigation.

Implement technical safeguards. Companies should implement technical measures to prevent models from reproducing copyrighted content verbatim. While Anthropic suggests training constitutes fair use, models that reproduce substantial portions of copyrighted works on demand may undermine fair use defenses and create separate infringement liability. Safeguards such as output filtering and response monitoring can reduce this risk; one simple filtering approach is sketched below.
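
As one illustration of output filtering, a basic safeguard is to check generated text for long verbatim runs that appear in an index of protected works. The n-gram length, index design, and function names below are assumptions for this sketch; production systems typically use more robust matching such as fuzzy or normalized comparison.

    def build_ngram_index(corpus: list[str], n: int = 12) -> set[str]:
        """Index every n-word sequence from a corpus of protected texts."""
        index: set[str] = set()
        for doc in corpus:
            words = doc.split()
            for i in range(len(words) - n + 1):
                index.add(" ".join(words[i:i + n]))
        return index

    def contains_verbatim_overlap(output: str, index: set[str], n: int = 12) -> bool:
        """Flag model output that reproduces any indexed n-word sequence."""
        words = output.split()
        return any(
            " ".join(words[i:i + n]) in index
            for i in range(len(words) - n + 1)
        )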

Looking Forward 

The Anthropic settlement won’t be the last chapter in AI copyright litigation. Music publishers, news organizations, and other content creators have filed similar lawsuits that will test the principles established in Alsup’s ruling. And there may still be claims brought against Anthropic outside this class and settlement, in addition to claims against other large language model developers that downloaded works for AI training.

Alsup’s ruling may not be the last word for authors of literary works, but its effect will be material even as additional cases add to the framework governing AI training and data acquisition.

Innovation doesn’t excuse illegitimate data acquisition.

AI companies must choose between investing in proper data use rights and facing the substantial risk of class action litigation. Publishers and content creators will increasingly demand their share of the value that AI companies create from human creative output. 

Companies that thrive in this new landscape will be those that recognize early that data provenance isn’t just a technical concern, but a fundamental business and legal imperative. Under the framework established by Anthropic, the legitimacy of your data sources matters as much as the sophistication of your models. 

The case is Bartz v. Anthropic PBC, N.D. Cal., No. 24-cv-05417.

This article does not necessarily reflect the opinion of Bloomberg Industry Group, Inc., the publisher of Bloomberg Law, Bloomberg Tax, and Bloomberg Government, or its owners.

Author Information

Jason Loring is a partner in Jones Walker’s corporate practice group and co-leader of the privacy, data strategy, and artificial intelligence team in the firm’s Atlanta office.

Whit Rayner is a partner in Jones Walker’s litigation practice group and co-leader of the intellectual property team in the firm’s Jackson, Miss., office.

To contact the editors responsible for this story: Max Thornberry at jthornberry@bloombergindustry.com; Rebecca Baker at rbaker@bloombergindustry.com
