Morrison Foerster attorneys say the latest Code of Practice draft for copyright-related obligations in the EU AI Act lessens uncertainty for stakeholders.
The EU AI Office’s March 11 draft of the Code of Practice on general-purpose AI (GPAI) model provider obligations under the EU AI Act is more workable than its predecessors. The CoP must be finalized by May 2, leaving GPAI model providers just three months before their obligations come into force on Aug. 2.
The stakes are high: A provider that fails to comply risks fines of up to 3% of its annual global turnover or 15 million euros ($16.2 million), whichever is higher, and possibly even an EU ban on the model.
More than 1,000 stakeholders have been working with the EU AI Office on the CoP. This article gives an overview of the status of the CoP’s copyright-related obligations.
Practice and Compliance
The EU AI Act introduced the concept of a CoP as a detailed guide for GPAI model providers in meeting their obligations under the act. Although CoP adherence is voluntary, it will demonstrate compliance with the act until “harmonized standards” for the obligations are established. Alternatively, providers may choose other means of compliance, subject to individual assessment by the European Commission.
The third CoP draft eases the copyright-related measures, which are now based on the principle that compliance should be commensurate and proportionate to the size and capacity of the individual provider.
Copyright Policy
GPAI model providers must put in place a policy to comply with European copyright law. CoP specifications require:
- Providers must assign internal compliance responsibilities, and a single document must describe all the provider’s copyright-relevant commitments. Publishing an up-to-date policy summary is encouraged.
- Providers are asked to mitigate the risk that a downstream AI system into which the model is integrated will repeatedly generate copyright-infringing output. They must make “reasonable efforts” to prevent model memorization of training content that could lead to such output. Providers must also prohibit copyright-infringing use of the model in their use policy and terms and conditions, with a carve-out for open-source models. The CoP makes clear that these measures must be taken regardless of whether the model integration is vertical (by the provider) or horizontal (by another entity).
- The first CoP draft stated that, where a provider modifies or fine-tunes a model, its obligations relate only to that modification or fine-tuning. Although this statement was removed from the CoP text, it remains in the AI Office’s GPAI model Q&A. The AI Office has also indicated that it will address this issue in its EU AI Act guidance.
Opt-Outs and Training
Providers must identify and comply with machine-readable rights reservations, or opt-outs, from rightsholders for the use of their content for text and data mining that would otherwise be permitted under the EU TDM copyright exception. A recital calls for this obligation to apply to AI training conducted outside the EU as well as inside the EU.
The CoP specifications say that for text and data mining of lawfully accessible material:
- Consistent with the TDM copyright exception, providers must not circumvent effective technological measures (e.g., paywalls) when crawling the web themselves or having it crawled on their behalf. Otherwise, the content wouldn’t be lawfully accessible as required by the copyright exception.
- Providers must make “reasonable efforts” to exclude “piracy domains” from their crawling.
- When providers use training material that they haven’t web-crawled themselves or through agents on their behalf (i.e., third-party data sets), and for which they haven’t obtained authorization from the rightsholder, providers must make “reasonable efforts” to obtain information about whether the material was obtained by web-crawlers respecting rights reservations in the Robot Exclusion Protocol (robots.txt). The CoP expressly excuses providers from an obligation to perform work-by-work assessments. (The previous CoP draft asked providers to conduct copyright due diligence by obtaining copyright compliance “assurance” from the contractual partner or by assessing samples from the dataset.)
For recognizing machine-readable opt-outs:
- Providers must use web crawlers that read and follow instructions expressed in robots.txt, which is in line with the view of legal scholars and technical experts on machine-readability. (A minimal illustration of such a robots.txt check appears after this list.)
- “Best efforts” must be made to comply with other appropriate machine-readable protocols expressing opt-outs, e.g., asset- or location-based metadata, where these are either the product of a cross-industry standard-setting process or, as a more immediate solution, a widely adopted, cross-sector, state-of-the-art solution approved through a stakeholder discussion facilitated at EU level. This clarifies previous CoP drafts, which required compliance with other, vaguely defined “means” of opting out.
- The CoP underlines that rightsholders remain free to effectuate their opt-out by any other appropriate means. However, such alternatives won’t be expressly covered by the CoP measure to identify and comply with rights reservations.
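For illustration only, here is a minimal sketch of how a crawler could check the kind of robots.txt rights reservation the CoP treats as machine-readable, using Python’s standard library. The user-agent name “ExampleAIBot,” the robots.txt content, and the URL are hypothetical assumptions made for the example, not taken from the CoP or the act.

```python
# Minimal sketch: honoring a machine-readable opt-out expressed in robots.txt.
# "ExampleAIBot" and the robots.txt content below are hypothetical.
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt published by a rightsholder: it reserves the site's
# content against a named AI training crawler while allowing other agents.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler checks the reservation before fetching any URL for TDM.
url = "https://example.com/articles/some-work.html"
if parser.can_fetch("ExampleAIBot", url):
    print("No opt-out expressed for this agent; the URL may be crawled.")
else:
    print("Rights reservation detected; skip this URL for text and data mining.")
```

Any real crawling pipeline would, of course, also need to account for the further opt-out protocols and other appropriate means the CoP contemplates.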
Although not an obligation under the act, the CoP directs providers to designate a point of contact for the benefit of affected rightsholders. Specifications of what constitutes a training data summary and a “placing on the market” of a GPAI model aren’t part of the CoP.
Takeaways
While the CoP doesn’t resolve all legal and technical uncertainties for GPAI model compliance under the act, it does offer a first structured compliance pathway. Considering that the EU AI Act merely states that providers must implement a policy to comply with EU copyright law and the EU TDM exception, the CoP adds a great deal of specification. In particular, the clarification of robots.txt as the current standard for TDM opt-outs significantly reduces uncertainties while also paving the way for stakeholders to collaboratively establish further standards.
A key challenge remains defining what model providers must do to prevent copyright-infringing output from downstream systems. The latest draft has moved away from the term “overfitting,” criticized as a mere technical description, but the current approach of preventing memorization of training content that could lead to repeated infringing output remains ambiguous. Crucially, the revisions made in the drafting process reflect that stakeholder input on what is practical and feasible is being acknowledged. Now is the final opportunity to be heard, as stakeholders are invited back to submit final feedback by March 30 and to participate in the last discussion rounds. Therefore, speak now or forever hold your peace.
This article does not necessarily reflect the opinion of Bloomberg Industry Group, Inc., the publisher of Bloomberg Law and Bloomberg Tax, or its owners.
Author Information
Paul Goldstein is professor of law at Stanford Law School and of counsel at Morrison Foerster.
Christiane Stuetzle is partner at Morrison Foerster and co-chair of the firm’s global film & entertainment practice.
Susan Bischoff is a lawyer, PhD candidate in copyright law, and a research assistant at Morrison Foerster.