When a federal court decertified a $20 billion class in In re Apple iPhone Antitrust Litigation in late October, it didn’t throw out a legal theory—it threw out a data model.
Millions of iPhone users who bought apps and in-app content through the App Store were once potential class members. Today, they’re not, because the plaintiffs’ team couldn’t answer a seemingly simple question: Which real people, tied to which accounts, were actually harmed?
That failure was about data science, not antitrust doctrine. And it should be a wake-up call for every lawyer handling large-ticket class actions and complex litigation.
Machines vs. Spreadsheets
Consumers see eerily tailored ads after a single search or conversation and assume their phones are “listening.” What’s really happening is that big tech companies are operating enormous data-science engines that continuously analyze digital exhaust—IDs, devices, locations, transactions—to predict what we’ll do next and use that data to their benefit.
Contrast that with the way most plaintiffs staff large cases against those same companies, and against the companies to whom that data is transferred. The legal theory may be sophisticated, but the data work is often treated as an administrative chore, a process that looks more like this:
- Export some transaction data
- Hand it to a traditional claims administrator or expert
- Stitch it together with ad-hoc code and spreadsheets
- Hope the model survives rigorous Rule 23 and Daubert analyses
In Apple, that approach failed. The court initially certified a broad class of App Store purchasers but warned that certification would hinge on plaintiffs producing a “cognizable” damages model. At the heart of that model was a data-science assignment: Accurately match Apple IDs and payment records to real, identifiable consumers to prove class-wide injury.
The court didn’t get a data science analysis. It got data stitched together with ad hoc code and spreadsheets in which the judge noted “several alarming errors.” The result: A once-certified, multibillion-dollar class unraveled for lack of a data model grounded in true data science.
Threshold Question
The central lesson of Apple isn’t limited to class certification. It’s broader and simpler: in today’s age of big data, every stage of complex litigation involves some type of data analysis. From the day a large-scale action is contemplated, counsel must ask at each step of the process: “Is there a data science problem here?”
In modern class actions, the answer is almost always yes. Consider just three phases that recur in nearly every large matter:
Class Certification and Class Definition: Class-wide predominance depends on whether you can reliably identify who’s in the class and who’s not. That is an identity-resolution problem at scale—normalizing names and aliases, reconciling multiple accounts and payment instruments, resolving household versus individual purchasers, mapping digital IDs to human beings. Treat that as “mailing list work,” and you risk another Apple-style collapse.
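To make that identity-resolution problem concrete, here is a minimal Python sketch. The field names, similarity threshold, and greedy clustering are illustrative assumptions only, not the method used in the Apple litigation; production pipelines layer in address history, devices, and payment instruments, and use blocking to handle millions of records.

```python
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    """Lowercase and strip punctuation so 'J. Smith' and 'j smith' compare equal."""
    cleaned = "".join(c for c in name.lower() if c.isalnum() or c.isspace())
    return " ".join(cleaned.split())

def same_person(rec_a: dict, rec_b: dict, threshold: float = 0.85) -> bool:
    """Heuristic match: a shared email is treated as decisive; otherwise fall
    back to fuzzy name similarity. Both rules are illustrative assumptions."""
    if rec_a.get("email") and rec_a["email"].lower() == rec_b.get("email", "").lower():
        return True
    ratio = SequenceMatcher(None, normalize(rec_a["name"]), normalize(rec_b["name"])).ratio()
    return ratio >= threshold

def resolve(records: list[dict]) -> list[list[dict]]:
    """Greedy single-pass clustering: each record joins the first cluster it
    matches, else starts a new one. Real record linkage is transitive and
    must document every in/out decision for Rule 23 scrutiny."""
    clusters: list[list[dict]] = []
    for rec in records:
        for cluster in clusters:
            if same_person(rec, cluster[0]):
                cluster.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters
```

Even this toy version shows why the work is engineering, not clerical: every rule (what counts as the "same" email, where the name threshold sits) changes who is in the class.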
“Best Practicable” Is a Data Question First: Courts assume “best practicable” notice reaches the people who were actually harmed. But in a mobile-first, multi-channel world, that assumption is only defensible if the underlying data work is rigorous.
Effective notice demands industrial-grade identity resolution and channel optimization. Address history, email and phone validation, deduplication, language and demographic analysis. Weak pipelines systematically miss mobile-only households, frequently moving consumers, and language communities. That administrative shortfall has become an access-to-justice problem.
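A small sketch of the notice-hygiene step described above, in Python. The record fields and the syntax-only email check are assumptions for illustration; real notice programs also verify deliverability, validate phone numbers, and track address moves over time.

```python
import re

# Syntax-only check; deliverability verification is a separate step.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[a-zA-Z]{2,}$")

def clean_notice_list(contacts: list[dict]) -> list[dict]:
    """Drop syntactically invalid emails and deduplicate on a normalized
    address, keeping the most recently updated record per person."""
    seen: dict[str, dict] = {}
    for c in sorted(contacts, key=lambda c: c["updated"]):
        email = c["email"].strip().lower()
        if EMAIL_RE.match(email):
            seen[email] = {**c, "email": email}
    return list(seen.values())
```

The point of even this simple pass is auditability: the program can say exactly which records were dropped, merged, or kept, and why.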
Claims Processing, Eligibility, and Fraud: Claims administration is no longer about checking boxes and rubber-stamping documentation. Eligibility itself is a big-data problem: Did the claimant live in a specific city during the exposure period? Were they an account holder or an authorized user? Are multiple claims really the same person?
Traditionally, we push that burden onto claimants: “Attach a bill. Prove you lived here.” Modern data science allows the opposite—quietly validating claims against historical address, transaction, and network data on the back end, while flagging anomalies and suspected fraud through sophisticated pattern recognition. If that work is done poorly, or not done at all, cases invite later objections, re-openers, and collateral attacks.
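The back-end validation described above can be sketched in a few lines of Python. The data shapes here (an address-history table keyed by claimant ID, a payout-account field on each claim) are hypothetical; real systems draw on commercial address, transaction, and network data and far richer fraud signals.

```python
from collections import Counter

def validate_claim(claim: dict, address_history: dict) -> bool:
    """Back-end eligibility check: was the claimant at the claimed city
    during the exposure window? address_history maps claimant IDs to
    lists of (city, start_year, end_year) tuples."""
    for city, start, end in address_history.get(claim["claimant_id"], []):
        if city == claim["city"] and start <= claim["year"] <= end:
            return True
    return False

def flag_duplicates(claims: list[dict]) -> list[dict]:
    """One simple fraud signal: multiple claims routed to a single payout
    account. Production systems combine many such signals via pattern
    recognition across devices, IPs, and claim timing."""
    counts = Counter(c["payout_account"] for c in claims)
    return [c for c in claims if counts[c["payout_account"]] > 1]
```

Doing this quietly on the back end is what lets administrators stop demanding utility bills from claimants while still defending eligibility decisions against later objections.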
Across these phases, the pattern is the same: Big litigation is now big data, and big data problems require real data science, not improvised scripts.
Apple may seem like a technical fight over matching logic. But the underlying claim—that Apple’s app distribution and 30% commission structure overcharged consumers—reflects a larger issue. When a class is decertified because the plaintiffs can’t present a defensible model of who was harmed, millions of consumers may lose any realistic path to relief.
A New Playbook
If consumers are to stand any chance in these cases, their legal counsel must eliminate Big Tech’s data advantage and begin taking data science just as seriously as they take the law and legal theories behind their cases. At minimum:
- Ask the data science question early. At case intake and again when designing discovery, identify which issues turn on large-scale data analysis (class membership, exposure, injury, allocation).
- Design discovery around the data pipeline, not the other way around. What raw data, keys, and reference tables will you need to build a defensible model? Are they available? In what format? With what gaps?
- Build or buy real data-science infrastructure. Treat identity resolution, record linkage, and claims validation as engineering problems. Invest in data engineers and statisticians or specialized providers who can construct and document robust pipelines.
- Interrogate your own model before your opponent. Stress-test your matching rules, run sensitivity analyses, and sample your outputs. If you can’t explain why a given record is in or out of the class, a judge will notice.
- Document for Daubert from day one. Keep an audit trail from ingestion to final opinion. Courts no longer “trust the expert” without visibility into the machinery.
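The stress-testing step in the playbook above can be sketched simply: re-run the class-membership decision across a range of matching thresholds and watch how the class size moves. The score data and thresholds here are hypothetical; the technique is a basic sensitivity analysis, not any party's actual model.

```python
def class_size(scores: list[float], threshold: float) -> int:
    """Count records whose match score clears the membership threshold."""
    return sum(1 for s in scores if s >= threshold)

def sensitivity(scores: list[float], thresholds: list[float]) -> dict:
    """Map each candidate threshold to the resulting class size.
    Large swings between nearby thresholds flag a matching rule a
    court may find arbitrary; stability supports defensibility."""
    return {t: class_size(scores, t) for t in thresholds}
```

If moving the threshold from 0.85 to 0.90 cuts the class in half, opposing counsel will find that before you do; running the analysis yourself, and documenting it, is the cheaper path.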
In the age of big data, every major case is, at bottom, a fight over data. If plaintiffs and courts don’t insist on data science rigor, we’ll keep seeing versions of Apple where legally plausible claims come undone because the numbers behind them can’t be validated and trusted.
The case is In re Apple iPhone Antitrust Litig., N.D. Cal., No. 4:11-cv-06714, 10/27/25.
This article does not necessarily reflect the opinion of Bloomberg Industry Group, Inc., the publisher of Bloomberg Law, Bloomberg Tax, and Bloomberg Government, or its owners.
Author Information
Don Beshada, a former litigator, is CEO of Covalynt, a technology company advising in complex litigation.