Trade Secrets Risk Exiting a One-Way Door When Data Is Fed to AI

April 28, 2026, 8:30 AM UTC

Trade secret exposure can happen before artificial intelligence systems receive “training data.” Once information enters an AI-enabled system, there is no reliable or practical way to fully withdraw it from most large language models or agent-based workflows.

Focusing only on whether an artificial intelligence provider “trains on user data” misses the more immediate source of trade secret exposure. Most leakage occurs outside formal training pipelines through routine employee use.

Employees and contractors paste source code, design documents, technical specifications, and internal analyses into AI tools to debug errors, summarize materials, or draft work product under time pressure. These disclosures are often informal, undocumented, and repeated across teams. Sensitive information therefore can leave the organization long before any training dataset is involved.

Because trade secret dissemination may be nearly impossible to undo once confidential information leaves controlled channels, the most practical approach rests on three pillars:

  • Strong front-end controls to reduce disclosure risks
  • A structured response plan to contain dissemination and document protective efforts
  • Early engagement with experienced counsel to align governance, contracts, and remediation strategies with evolving AI systems

Rising AI Agents

Trade secrets typically leak in two ways: through incorporation into a model’s training dataset or through routine interaction with AI tools. In practice, the latter is often the faster and more common pathway.

AI agents and workflow-integrated systems amplify this risk. Unlike a one-off prompt, many agent systems retain memory, logs, embeddings, or intermediate summaries across sessions and connected tools. Information disclosed once can be reused in later outputs, propagated to other systems, or incorporated into downstream processes without the user realizing it.
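For illustration, the sketch below shows the structural problem in simplified form. The names (run_agent, a JSON file standing in for an agent's memory store) are hypothetical and do not describe any vendor's product; the point is that a workflow that persists prompts across sessions can carry a single pasted secret into later interactions.

  import json
  from pathlib import Path

  STORE = Path("agent_memory.json")  # persistent memory shared across sessions

  def load_memory() -> list[str]:
      return json.loads(STORE.read_text()) if STORE.exists() else []

  def save_memory(memory: list[str]) -> None:
      STORE.write_text(json.dumps(memory))

  def run_agent(prompt: str) -> str:
      memory = load_memory()
      memory.append(prompt)  # every prompt is retained, including pasted secrets
      save_memory(memory)
      # A real agent would call a model here; this stub only reports how much
      # prior context would flow into the next response.
      return f"Context considered: {len(memory)} prior entries"

  # Session 1: an engineer pastes proprietary code to debug it.
  run_agent("def secret_ranking_algorithm(x): ...  # CONFIDENTIAL")
  # Session 2, later and possibly a different user: the paste is already in the store.
  print(load_memory())

Once the paste sits in shared memory or logs, later sessions and other users can surface it without the original discloser's involvement.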

From a legal standpoint, secrecy can be lost through retention, reuse, or redistribution of information even if the data never enters a formal training dataset.

Workforce surveys suggest the issue is already widespread. Roughly four in 10 employees report entering sensitive workplace information into AI tools without employer authorization, according to Security Management. Disclosure also occurs inadvertently through embedded AI features in software such as grammar assistants, document editors, code completion tools, and collaboration platforms.

Against that backdrop, many organizations ask whether confidential code or documentation can be removed once it enters an AI system. With current technology, the answer is usually no.

Organizations therefore should implement structural safeguards. Sensitive material shouldn’t be entered into consumer-facing AI interfaces or public chatbot links. Where AI functionality is necessary, companies should obtain enterprise licenses that provide contractual controls over data use, retention, and auditability and that restrict training or secondary use of customer inputs.


Technically Difficult Deletion

LLMs blend patterns from their inputs into complex internal representations, including model parameters, embeddings, and memory structures. Once confidential material is absorbed into those systems, isolating and deleting a single company’s information may be impossible without dismantling the system itself.

Researchers are exploring machine unlearning and model editing, but these approaches remain experimental. Recent work shows that supposedly “unlearned” content can sometimes be partially recovered.

In practice, providers can suppress outputs with filters, but suppression isn’t deletion. The closest remedy is retraining or rebuilding the system on clean data. For frontier models, that can cost tens or hundreds of millions of dollars. The Stanford HAI AI Index estimates training costs of roughly $78 million for GPT-4 and $191 million for Gemini Ultra.
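To see why suppression differs from deletion, consider the simplified sketch below. The blocklist and stored string are hypothetical, and real providers' filtering is far more sophisticated, but the structure is the same: a filter changes what the user sees while the underlying data (and, in a real model, the learned parameters) remains intact.

  BLOCKLIST = ["secret_ranking_algorithm"]  # hypothetical flagged term

  # Text already absorbed into the system (stand-in for parameters, logs, or memory).
  stored_data = "def secret_ranking_algorithm(x): ...  # absorbed earlier"

  def filtered_output(text: str) -> str:
      # Suppression: rewrite what the user sees, leaving the source untouched.
      for term in BLOCKLIST:
          text = text.replace(term, "[REDACTED]")
      return text

  print(filtered_output(stored_data))  # the output is suppressed ...
  print(stored_data)                   # ... but the information itself still exists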

Development Pipeline Contamination

Courts have recognized similar problems in traditional technology disputes. Once trade secrets are incorporated into complex systems, assurances of non-use are often insufficient.

In Waymo LLC v. Uber Technologies, Inc., the court focused on whether Waymo’s autonomous-vehicle design information had already been integrated into Uber’s engineering processes. Once incorporated into a development pipeline, the information could influence technical decisions in ways that could not be reliably isolated or reversed. The court treated that contamination as irreparable harm.

Regulators have taken similar positions in AI contexts. In FTC v. Everalbum, Inc., the Federal Trade Commission required deletion not only of improperly obtained biometric data but also of the AI models trained on that data.

Together, these cases reflect a consistent principle: When information is integrated into systems designed to retain and reuse data, the loss of secrecy occurs at incorporation.

NDA, Confidentiality Breakdown

Traditional nondisclosure agreements assume limited disclosure in controlled settings. AI agents and integrated workflows undermine that assumption.

If a system retains prompts, logs interactions, or stores embeddings for reuse, a trade secret may persist beyond the original task and user. A single paste of proprietary code into an AI-enabled workflow can later appear in outputs, be routed into other tools, or be reused across prompts.

What feels like one disclosure can become many disclosures across a toolchain.

An Incomplete Remedy

Takedown requests and formal notices to AI providers still matter and should be handled promptly, even when the disclosure occurred through routine interaction with AI-enabled features embedded in business applications rather than a deliberate upload.

A well-prepared notice can limit further dissemination and preserve contractual and statutory remedies. It also creates a record that the company acted promptly to preserve the information’s secrecy.

Even so, takedowns rarely restore confidentiality once information leaves controlled channels.

Governance as Protection

In this environment, the most effective protection is governance at the front end. Companies should treat public AI systems as untrusted for sensitive material and prohibit entering source code, specifications, or design documents into public chatbots.

Agreements with employees, contractors, licensees, and AI vendors should address AI use explicitly. Policies should cover agent memory, logs, embeddings, and tool integrations, not just “training data.”

Where workflows involve persistent agents, organizations should prohibit feeding confidential material into systems that retain prompts or reuse content across sessions unless the company controls the environment and can enforce deletion and auditability.
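As a simplified example of such a front-end control, a gateway can screen prompts for confidentiality markers before they ever reach an external AI endpoint. The patterns and screening function below are hypothetical and no substitute for a full data-loss-prevention program; they only illustrate where the check belongs, at the point before disclosure becomes irreversible.

  import re

  # Hypothetical confidentiality markers; a real program would maintain these centrally.
  CONFIDENTIAL_PATTERNS = [
      re.compile(r"\bCONFIDENTIAL\b", re.IGNORECASE),
      re.compile(r"\bINTERNAL USE ONLY\b", re.IGNORECASE),
      re.compile(r"-----BEGIN (RSA |EC )?PRIVATE KEY-----"),
  ]

  def screen_prompt(prompt: str) -> str:
      """Check a prompt before it is sent to any external AI endpoint."""
      for pattern in CONFIDENTIAL_PATTERNS:
          if pattern.search(prompt):
              raise ValueError(f"Blocked: prompt matches {pattern.pattern!r}")
      return prompt

  try:
      screen_prompt("Please debug this file marked CONFIDENTIAL: ...")
  except ValueError as exc:
      print(exc)  # the disclosure is stopped before it becomes irreversible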

This article does not necessarily reflect the opinion of Bloomberg Industry Group, Inc., the publisher of Bloomberg Law, Bloomberg Tax, and Bloomberg Government, or its owners.

Author Information

Justin Pierce is co-chair of Venable’s intellectual property division with expertise in artificial intelligence and innovative technologies.

Brandon Phemester is a patent agent with Venable with expertise in artificial intelligence, biotechnology, pharmaceuticals, and advanced manufacturing.


To contact the editors responsible for this story: Rebecca Baker at rbaker@bloombergindustry.com; Jada Chin at jchin@bloombergindustry.com
