Anonymization is a trusted tool in the data privacy toolkit for organizations seeking to use data for innovative purposes while minimizing privacy risks and compliance obligations.
The promise is clear: Properly anonymized datasets, where individuals can’t reasonably be reidentified, fall outside the scope of data protection regimes such as the EU’s General Data Protection Regulation and cross-border data transfer frameworks. This reduces compliance pressure and paves the way for international data-driven initiatives.
However, the rise of artificial intelligence and a fragmenting global regulatory landscape are turning this promise into a minefield. Businesses relying on data must be aware of the critical challenge posed by the shifting test for what is “truly” anonymous, as courts and regulators question traditional methods.
AI Could Be Kryptonite to Anonymization
The core problem is the “mosaic effect”: seemingly innocuous data points can be pieced together to re-identify individuals, and AI supercharges the effect because machine learning models can correlate data points drawn from many different sources.
AI can analyze a dataset of anonymized journey start and end points and cross-reference it with public social media posts, voter rolls, and scraped or stolen datasets, pinpointing an individual with astonishing accuracy.
For example, researchers have shown that it’s possible to re-identify individuals from “anonymized” datasets of their browsing history by combining that data with other public information. Large language models amplify this risk, analyzing vast, unstructured datasets to find connections that typically are invisible to any human analyst.
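To make the mosaic effect concrete, below is a deliberately simplified sketch of a linkage attack. The column names, locations, and records are invented for illustration; real attacks work across far messier data, but the underlying mechanic is this kind of join across quasi-identifiers such as place and time.

```python
# Hypothetical linkage ("mosaic") attack: all names and records are invented.
import pandas as pd

# "Anonymized" journeys: no names, just timestamps and coarse location cells.
trips = pd.DataFrame({
    "trip_id": [101, 102],
    "start_time": pd.to_datetime(["2025-03-01 08:02", "2025-03-01 08:05"]),
    "start_cell": ["grid_4417", "grid_8821"],
})

# Public, identified data: for example, geotagged social media check-ins.
checkins = pd.DataFrame({
    "user": ["alice", "bob"],
    "post_time": pd.to_datetime(["2025-03-01 08:03", "2025-03-01 08:06"]),
    "cell": ["grid_4417", "grid_8821"],
})

# Join on the location cell, then keep pairs whose timestamps are close.
linked = trips.merge(checkins, left_on="start_cell", right_on="cell")
linked = linked[(linked["post_time"] - linked["start_time"]).abs()
                <= pd.Timedelta(minutes=5)]

print(linked[["trip_id", "user"]])  # each "anonymous" trip now carries a name
```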
Fractured Legal Map
The legal landscape for anonymization is more fragmented than ever, making this technological challenge even more difficult. Regulators worldwide are taking different and potentially conflicting approaches to the definition of “anonymous,” leaving multinational corporations in a compliance crossfire.
The EU GDPR sets a notoriously high bar: data is treated as anonymous only when the individuals it relates to can no longer be identified by any means reasonably likely to be used. The European Data Protection Board has reinforced this position, stating that an AI model could contain personal data if the underlying training data can be extracted through queries.
However, recent case law signals a shift towards a more pragmatic approach. The assessment of whether information is considered personal data must be based on the means reasonably available to the holder of the data, rather than to hypothetical bad actors.
This has offered hope to businesses that receive data without identifiers for analytics purposes, particularly where they don’t have the means to decode the data. In those circumstances, they may conclude that the data is anonymous and thus free from privacy law requirements.
The UK also has signaled a desire for a more pragmatic, risk-based approach, but its legal standard is still tethered to the high EU bar for now. The Information Commissioner’s Office has made it clear that the context of the data and the capabilities of potential intruders are critical factors.
Meanwhile, the US remains a sectoral and state patchwork. Some federal laws govern health or financial data, but there is no comprehensive federal privacy law.
Instead, a growing number of states are creating their own rules. The California Consumer Privacy Act and other state consumer privacy laws, for example, have definitions of “personal data” that exempt “de-identified” or “deidentified” data. However, the states vary greatly in whether “aggregated” data is considered personal data or exempt, creating a complex compliance map.
In Asia and the Pacific, legal obligations range from China’s strict Personal Information Protection Law requirement for irreversible anonymization, to the more pragmatic, risk-based assessments used in jurisdictions such as Singapore and Hong Kong. However, a recent push for harmonization among several APAC countries is underway.
A New Playbook
As a result of this mixed bag of legal approaches, organizations must adopt a more sophisticated, technologically aware, and risk-based approach to anonymization.
Anonymity isn’t an on/off switch; it’s a spectrum. Boards and legal teams must understand that what is “anonymous enough” for one purpose may not be for another, and that the risk of re-identification is dynamic. This requires embedding regular, aggressive re-identification testing—essentially, running your own “red team” attacks—into your data protection impact assessment process.
The focus should be on a toolbox of privacy-enhancing technologies:
- Differential privacy offers a more resilient approach by adding mathematical “noise” to a dataset. This can provide a statistical guarantee that the presence or absence of any single individual won’t significantly affect the outcome of a query, protecting against many forms of inference attack. But even this tool depends on how it’s implemented: businesses must weigh the privacy gains of higher noise levels against the risk of distorting analytical results (a minimal illustration follows this list).
- Federated learning is another powerful tool, allowing AI models to be trained on decentralized data without the raw data ever leaving its source. This is invaluable for collaborations, such as in medical research, where data can’t be pooled directly.
- Synthetic data generation involves creating an entirely new, artificial dataset that mirrors the statistical properties of the original without containing any real personal information. This allows data scientists to train models and run analytics with near-zero privacy risk, as there are no actual individuals to re-identify (a second illustration follows this list). It may prove a significant tool in the AI space, where growing demand for training data could be met by appropriately generated synthetic data.
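For readers who want to see the noise-versus-accuracy trade-off in miniature, the sketch below applies the classic Laplace mechanism to a toy counting query. The dataset, the epsilon values, and the function name are illustrative assumptions rather than a production-grade implementation.

```python
# Minimal differential privacy sketch: the Laplace mechanism on a count query.
import numpy as np

rng = np.random.default_rng(42)

def dp_count(values, predicate, epsilon):
    """Return a noisy count satisfying epsilon-differential privacy.

    A counting query has sensitivity 1 (adding or removing one person changes
    the count by at most 1), so Laplace noise with scale 1/epsilon suffices.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

ages = [34, 29, 41, 57, 23, 62, 38, 45]  # toy "dataset"

def over_40(age):
    return age > 40

# Smaller epsilon means more noise: stronger privacy, less accurate answers.
for eps in (0.1, 1.0, 10.0):
    print(f"epsilon={eps}: noisy count over 40 = {dp_count(ages, over_40, eps):.1f}")
```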
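The second sketch shows the simplest possible version of the synthetic data idea, again with invented numbers: fit a distribution to aggregate statistics of the real records and sample artificial rows from it. Real-world generators are far more sophisticated, and poorly built ones can still leak information about the originals, so this is a sketch of the concept, not a recipe.

```python
# Minimal synthetic data sketch: sample from a Gaussian fitted to the
# aggregate statistics of the original records. All numbers are invented.
import numpy as np

rng = np.random.default_rng(7)

# Toy "real" dataset: rows are people, columns are (age, income in $k).
real = np.array([
    [34, 62], [29, 48], [41, 85], [57, 97],
    [23, 39], [62, 110], [38, 71], [45, 88],
], dtype=float)

# Capture only aggregate structure: column means and the covariance matrix.
mean = real.mean(axis=0)
cov = np.cov(real, rowvar=False)

# Draw synthetic rows from the fitted distribution; no row corresponds to a
# real person, but means and correlations are approximately preserved.
synthetic = rng.multivariate_normal(mean, cov, size=100)

print("real mean:     ", mean)
print("synthetic mean:", synthetic.mean(axis=0))
print("real corr:     ", np.corrcoef(real, rowvar=False)[0, 1])
print("synth corr:    ", np.corrcoef(synthetic, rowvar=False)[0, 1])
```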
Finally, legal and compliance strategies need to remain dynamic. A one-size-fits-all global anonymization policy is doomed to fail. Companies need a nuanced strategy that maps data flows against the higher bar of the EU and Brazil, the evolving standards in the UK and China, and the sectoral patchwork of the US. This means moving beyond simple contractual assurances from vendors to demand technical validation of their privacy claims.
For businesses, the path forward isn’t to abandon the immense value of data, but to embrace a new, more technologically robust and legally nuanced approach to privacy. Those who fail to adapt risk finding, perhaps in front of a regulator or a judge, that their anonymous data wasn’t anonymous at all.
This article does not necessarily reflect the opinion of Bloomberg Industry Group, Inc., the publisher of Bloomberg Law, Bloomberg Tax, and Bloomberg Government, or its owners.
Author Information
Giles Pratt is a partner at Freshfields and heads the firm’s global IP, data and technology practice.
Richard Bird is a partner at Freshfields and heads the firm’s IP and commercial practice group in Asia.
Brock Dahl is a partner at Freshfields and spent several years with the National Security Agency.
Michael Schwaab and Christine Chong contributed to this article.