How Artists and AI Developers Can Navigate Uncertain IP Terrain

March 20, 2023, 8:00 AM UTC

The artificial intelligence landscape presents both potential and uncertainty. AI companies are making moves that stir fresh legal questions—think DALL-E generating expansive art compositions and Stability AI allegedly processing millions of unauthorized Getty images.

While practitioners and scholars wrangle with legal issues such as whether training AI on publicly available artworks is fair use and whether AI-generated works are copyrightable, both visual artists who display their works online and AI developers who collect content to train their models could benefit from practical considerations in this uncertain terrain.

For Artists

To gather training data for their models, many AI developers use web crawling tools to scrape online materials. Artists who wish to prevent their works from appearing in these databases should first consider technical solutions, such as employing login mechanisms or deploying bot detectors to auto-ban scraping activity.

From a legal perspective, such measures might also facilitate claims against unwanted web scrapers under the Computer Fraud and Abuse Act.
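As one illustration of the bot-detection idea, a minimal sketch for a site served by a Python Flask application might look like the following. The blocked user-agent substrings are hypothetical examples, not an authoritative list of AI-training crawlers:

```python
# A minimal sketch of a user-agent filter for a site served by Flask.
# The blocklist entries are illustrative assumptions, not an
# authoritative list of AI-training crawlers.
from flask import Flask, abort, request

app = Flask(__name__)

# Hypothetical user-agent substrings associated with scraping tools.
BLOCKED_AGENTS = {"ccbot", "img2dataset", "gptbot"}

@app.before_request
def block_known_scrapers():
    agent = (request.headers.get("User-Agent") or "").lower()
    if any(bot in agent for bot in BLOCKED_AGENTS):
        abort(403)  # refuse service to identified scraping bots
```

Real scrapers can spoof user agents, so a filter like this is a deterrent rather than a guarantee; more robust deployments layer in rate limiting or commercial bot-detection services.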

However, artists may not want to restrict access to their site if they seek to drive traffic to their online galleries. In the absence of definitive legal guidance on whether artists could prevail on copyright infringement claims against scrapers, these artists should consider a contractual alternative. Artists should post terms of use for their websites with language specifically prohibiting scraping in connection with training AI models.

Under contract law in many states, such terms of use can be enforceable, particularly in clickwrap implementations, where AI developers deploying scrapers must affirmatively assent by checking a box. Browsewrap implementations, where developers merely receive a link to the terms without a checkbox, are less likely to be enforced.

Artists who publish their artworks on third-party platforms, or who use cloud-based editing or storage platforms for their works, should understand that these artworks are subject to the platform’s terms of use, including the third party’s security and data collection policies. Artists should carefully review such terms, which may include a default opt-in consent allowing the platform to use images or other data for AI training.

Such artists may also take advantage of NoAI metatags or similar tools to alert parties deploying scrapers to their consent terms. Still, given the lower level of control on these platforms, artists may wish to upload their artworks as segmented, distinct images—which may be scraped separately, and are thus challenging for AI models to resolve into a whole image. Artists can also upload only representative samples to third-party platforms while linking to more extensive portfolios on the artists’ own websites with tailored terms of use.
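On their own websites, artists can signal these directives at the server level. A minimal sketch, again assuming a Python Flask application, follows; the “noai”/“noimageai” values are a voluntary convention that only some crawlers honor, not a binding standard:

```python
# A minimal sketch: attach the informal "noai"/"noimageai" directives
# to every response via the X-Robots-Tag header. These directives are
# a voluntary convention that only some crawlers honor.
from flask import Flask

app = Flask(__name__)

@app.after_request
def add_noai_header(response):
    response.headers["X-Robots-Tag"] = "noai, noimageai"
    return response

# Page-level equivalent, placed in each HTML <head>:
# <meta name="robots" content="noai, noimageai">
```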

To track their artworks, artists could add watermarks or metadata to all versions displayed online, particularly because AI-generated outputs have included distinctive portions of watermarks traceable to their original source. If artists recognize their watermarks in such outputs or otherwise become aware of unwanted use, they may send cease-and-desist letters to offenders. From a legal perspective, this approach may also serve as constructive notice sufficient to make even browsewrap terms enforceable.
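As a minimal sketch of the watermark-and-metadata approach, the following uses the open-source Pillow imaging library; the file names and notice text are placeholders:

```python
# A minimal sketch using the Pillow imaging library: overlay a
# semi-transparent watermark and embed copyright metadata in a PNG.
# File names and the notice text are placeholders.
from PIL import Image, ImageDraw
from PIL.PngImagePlugin import PngInfo

img = Image.open("artwork.png").convert("RGBA")

# Draw a translucent copyright notice in the lower-left corner.
overlay = Image.new("RGBA", img.size, (0, 0, 0, 0))
draw = ImageDraw.Draw(overlay)
draw.text((10, img.height - 24), "(c) Artist Name", fill=(255, 255, 255, 128))
marked = Image.alpha_composite(img, overlay)

# Embed copyright management information as PNG text metadata.
meta = PngInfo()
meta.add_text("Copyright", "(c) Artist Name. All rights reserved.")
marked.save("artwork_marked.png", pnginfo=meta)
```

Note that simple metadata of this kind can be stripped in ordinary processing; visible watermarks and third-party fingerprinting services are harder to remove and easier to trace.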

Alternatively, absent contractual restrictions, artists may turn to a Digital Millennium Copyright Act takedown notice to put AI developers on alert. As seen in recent litigation, artists may allege DMCA Section 1202 violations if they believe AI developers are stripping or altering copyright management information, such as watermarks or metadata, from copyrighted works.

Artists who implement these practices may deter AI developers even without filing a legal action in court. In the event of a further dispute, artists can pursue claims before the Copyright Claims Board, which can award monetary damages but not injunctive relief, as an alternative to the significant time and cost of formal litigation in federal court.

Artists contemplating bigger enforcement claims may also publish or register their works outside the US, in, for example, Europe, where artists may better claim attribution for their works or object to unauthorized alterations under their moral rights.

For AI Developers

AI models, for their part, are data-driven. If AI developers cannot rely solely on licensed materials or “freely available” materials, they may consider implementing practices that can weigh favorably in the ambiguous “fair use” analysis and are instrumental to managing infringement risk, such as:

  • Diversifying input sources
  • Removing outputs that are visually similar to images in the training data
  • Tracking artwork inputs and corresponding scope of permitted use
  • Classifying developers’ own uses of scraped artworks

To expand, diversifying inputs can reduce outputs with “substantial similarity” to existing artworks and mitigate a single source’s recurrent undesirable traits, such as watermarks, irrelevant metatags, or low-resolution quality.

Controlling outputs can prevent regurgitation of specific training images, and tracking and classifying inputs can flag particularly litigious sources to avoid (a similarity-screening sketch follows below). These practices can also help ensure compliance with AI developers’ own stated policies on the collection and use of data—an area subject to increasing scrutiny by the Federal Trade Commission and other regulatory authorities.
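One common way to screen outputs for near-duplicates of training images is perceptual hashing. Below is a minimal sketch using the open-source imagehash library; the 8-bit distance threshold is an assumption a real pipeline would tune empirically:

```python
# A minimal sketch using the imagehash library: flag a candidate
# output that is perceptually close to any image in the training set.
# The 8-bit distance threshold is an assumption to be tuned.
from PIL import Image
import imagehash

def too_similar(output_path, training_hashes, max_distance=8):
    """Return True if the output nearly matches a training image."""
    out_hash = imagehash.phash(Image.open(output_path))
    # imagehash overloads subtraction to return the Hamming distance.
    return any(out_hash - h <= max_distance for h in training_hashes)

# Precompute the training-set hashes once, e.g.:
# training_hashes = [imagehash.phash(Image.open(p)) for p in paths]
```

Perceptual hashes catch close copies and simple edits, not stylistic imitation, so they complement rather than replace the legal “substantial similarity” analysis.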

In addition to responding to legal takedown notices, AI developers could provide on their websites an active opt-out mechanism for artists, or even a preemptive opt-in. If technically feasible, developers could build in the ability to re-train AI models at low cost to adapt to dataset modifications on a semi-regular schedule—e.g., re-training a model within a set time period to remove the artworks of artists who opt out.
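A bare-bones sketch of honoring such opt-outs at re-training time might look like the following; the opt-out file format and record fields are assumptions for illustration:

```python
# A bare-bones sketch of honoring artist opt-outs before a re-training
# run. The opt-out file format and record fields are assumptions.
import json

def load_opted_out_urls(path="opt_outs.json"):
    # Assumed format: a JSON list of source URLs artists asked removed.
    with open(path) as f:
        return set(json.load(f))

def filter_training_records(records, opted_out):
    # Each record is assumed to carry the URL it was scraped from.
    return [r for r in records if r["source_url"] not in opted_out]

# Applied on the periodic re-training schedule:
# records = filter_training_records(records, load_opted_out_urls())
```

A filter like this presupposes the provenance tracking described above: opt-outs can only be honored if each training record retains its source.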

Finally, in the spirit of building a collaborative relationship with creators, developers can adopt practices that address common artist concerns, such as the displacement of working artists and the lack of credit to original creators.

For example, developers can contractually limit certain uses of outputs; clearly inform users that their use of outputs may be subject to third-party rights; obligate users to disclose when they are using AI-generated outputs, particularly for commercial purposes; and publicly adopt ethics standards for creating “responsible” AI, such as the Partnership on AI’s Responsible Practices for Synthetic Media.

This article does not necessarily reflect the opinion of Bloomberg Industry Group, Inc., the publisher of Bloomberg Law and Bloomberg Tax, or its owners.

Author Information

Rachel Yu is a partner in Sullivan & Cromwell’s IP & technology transactions group and advises clients on complex technology, IP, cybersecurity, and life sciences matters. She is a lecturer in law at Columbia Law School.

KJ Lim is an associate in Sullivan & Cromwell’s IP & technology transactions group.

Angela Wu, a law clerk at Sullivan & Cromwell, contributed to this article.
