One might expect to hear phrases like “Not statistical significance there!” and, “There is no way that anybody would tell you that these ten cases are statistically significant” hurled by a disgruntled professor at an underperforming statistics student. Yet, in January 2011, they came from the Supreme Court bench during the argument in *Matrixx Initiatives Inc. v. Siracusano*.^{1}

^{1} *Matrixx Initiatives Inc. v. Siracusano*, 131 S. Ct. 1309 (2011) (No. 09-1156), 2011 WL 65028, at *12 & *16 (Kagan, J.).

This article discusses the Court’s unanimous opinion and argues for a narrow reading of its dicta about proof of causation. Part I describes the case. Part II explains more precisely than the Court did why plaintiffs alleging that a pharmaceutical company failed to disclose adverse event reports from physicians should not have to plead a statistically significant number of these case reports. It also clarifies the meaning of “statistical significance” as applied to such anecdotal data. Part III presents the *Matrixx* Court’s dicta on causation and statistical significance and shows that the Court’s remarks do not address the limited value of adverse event reports (AERs) in establishing causation in toxic tort litigation.

### I. Speak No Evil: Zicam and Anosmia

An Arizona company, Matrixx Initiatives Inc., developed and sold over-the-counter cold remedies. In 2009, the FDA “concluded that these products may pose a serious risk to consumers who use them”^{2} and ordered the company to take corrective action.^{3}

^{3} *Id*. at 270a.

Long before the case reports accumulated, however, a group of investors who had bought Matrixx stock in late 2003 and early 2004 filed a securities fraud class action. They alleged that in this early period, Matrixx issued reassuring statements that misled them. Specifically, plaintiffs alleged that Matrixx did not disclose reports from physicians about consumers who lost their sense of smell after using its homeopathic remedy, Zicam. The complaint stated that the company knew of at least 12 cases of anosmia following nasal inhalation.^{4}^{5}

^{4} *Matrixx*, 131 S. Ct. at 1315; Brief for Petitioner at 16, *Matrixx Initiatives Inc. v. Siracusano*, 131 S. Ct. 1309 (2011) (No. 09-1156), 2010 WL 3334501 (referring to between 12 and 23 reports).

^{5} *Supra* note 2, at *78a & *82a.

The federal district court dismissed this complaint, but the Court of Appeals for the Ninth Circuit reinstated it. The Supreme Court granted certiorari to consider “[w]hether a plaintiff can state a claim under §10(b) of the Securities Exchange Act and SEC Rule 10b-5 based on a pharmaceutical company’s nondisclosure of adverse event reports even though the reports are not alleged to be statistically significant.”^{6}^{7}

^{6} *Matrixx Initiatives Inc. v. Siracusano*, 131 S. Ct. 1309 (2011) (No. 09-1156), 2010 WL 1063936.

The answer is clear. Justice Sotomayor’s opinion for the Court explained that a reasonable investor might want to know of such reports if they (along with other information) are sufficiently extensive and disturbing that they could prompt the FDA to take some action or might lead to costly lawsuits. That is enough to trigger a duty to disclose in order to prevent other company statements from being misleading.^{8}

^{8} *Id*. at 1321.

### II. Beneath the Simple Answer: Why Allegations of Statistical Significance Are Not a Viable Pleading Rule

But how much potentially disturbing information is enough to necessitate disclosure? Matrixx argued forcefully for a bright-line rule of statistical significance (at the 0.05 level).^{9}

^{9} David H. Kaye et al., *The New Wigmore: A Treatise on Evidence: Expert Evidence* (2d ed. 2011).

For historical reasons, the most common “significance level” in biomedical and social science research is 0.05. Testing at this level ensures that, when randomness is all there is to it, the error of inferring that something more is at work occurs no more often than 1 time in 20 (in the long run). In other words, if Nature is perverse and arranges things so that a scientist never encounters a true association, always demanding significance at the 0.05 level protects the scientist from wrongly declaring a false association more than 5% of the time (on average).
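The 1-in-20 guarantee is easy to check by simulation. The sketch below (the study size, number of trials, and random seed are arbitrary choices for illustration) tests 5,000 simulated “studies” of 200 fair coin flips against a nominal 0.05 level:

```python
import math
import random

random.seed(1)

def two_sided_p(successes, n, p0=0.5):
    """Normal-approximation two-sided p-value for a binomial count."""
    mean = n * p0
    sd = math.sqrt(n * p0 * (1 - p0))
    z = abs(successes - mean) / sd
    return math.erfc(z / math.sqrt(2))  # P(|Z| >= z) for standard normal Z

# Simulate a world in which the null hypothesis is exactly true:
# each "study" observes 200 fair coin flips and tests for bias.
trials = 5_000
false_alarms = sum(
    two_sided_p(sum(random.random() < 0.5 for _ in range(200)), 200) < 0.05
    for _ in range(trials)
)
print(false_alarms / trials)  # hovers near 0.05: about 1 study in 20 "finds" an effect
```

Because every study’s null hypothesis is true by construction, each rejection is a false alarm, and they occur at roughly the advertised 5% rate.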

Matrixx wanted companies manufacturing biomedical products to have the same level of protection against investors’ complaints of fraud. Unless the plaintiffs alleged that the undisclosed case reports were numerous enough to have occurred with a probability (a “*p*-value”) of less than 0.05 under some model of the world that presumes absolutely no association between the adverse events and the company’s products, a court would have to toss out the complaint.

Yet, neither the company nor the Court was clear about how a *p*-value for case reports could be computed. One would have to consider not just the number of case reports, or the proportion of these reports out of all purchasers of Zicam (the figure that Matrixx’s briefs emphasized), but the number expected under a model for the probability of anosmia in a world in which anosmia has no association with the use of Zicam. Matrixx’s lawyers grandly suggested “consider[ing] the background rate in a relevant population of the reported event,”^{10}^{11} but they never explained how such a *p*-value could be computed.^{12}

^{10} *Supra* note 4, at 13.

^{11} *Id*. at 14.

^{12} *See* Joseph L. Gastwirth, Statistical Considerations Support the Supreme Court’s Decision in Matrixx Initiatives v. Siracusano (Aug. 12, 2011) (unpublished manuscript).
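To see what such a computation would involve, one can posit a background rate and a simple probability model. The sketch below uses a Poisson model with wholly hypothetical numbers (ten “expected” cases against twelve reported); they are not the actual Zicam figures:

```python
import math

def poisson_upper_p(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected): the chance of seeing
    at least this many case reports if anosmia is unrelated to the drug."""
    # Complement of P(X <= observed - 1)
    return 1 - sum(math.exp(-expected) * expected**k / math.factorial(k)
                   for k in range(observed))

# Hypothetical numbers for illustration only -- not the actual Zicam figures.
# A background rate of 1 case per 100,000 users applied to 1,000,000 users
# would imply 10 "expected" reports; suppose 12 were reported.
print(round(poisson_upper_p(12, 10.0), 3))  # p is about 0.30 under these assumptions
```

Under these assumptions, twelve reports are entirely unremarkable (*p* ≈ 0.30); posit a much smaller background rate, and the same twelve reports become highly significant. The *p*-value is only as good as the background-rate model behind it.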

The Court was correct to reject a rule requiring pleading *p* < 0.05 in all Rule 10b-5 actions. A rigid 0.05 rule would have been somewhat arbitrary. It is hard to justify the particular threshold of 0.05 as opposed to, say, 0.04 or 0.06. Yet, many legal rules are no less arbitrary than this particular statistical convention. A more fundamental objection is that using any such cutoff at the pleading stage would not have achieved the purpose of a rule for screening out meritless cases based on the pleadings. Demanding statistical significance would have kept busy those statisticians willing to look for models and data to achieve the desired threshold, but it would not have accurately filtered out the cases in which no reasonable investor would care about the “insignificant” number of case reports. Combined with other evidence about a drug’s safety, even a number for which *p* exceeds 0.05 or the like could justify investor concern about the drug’s future in the marketplace.

Interestingly, Matrixx did not question this possibility. It conceded that “a claim can be pled absent statistically significant evidence, but that’s … because doctors and researchers will conclude that there may be causation under … the Bradford-Hill [or similar] criteria. But nothing like that is pled here … .”^{13} A narrower rule, one requiring statistical significance when AERs are the *only* evidence of risk that plaintiffs allege, is not so implausible as the one the Court dispatched.^{14}

^{13} *Supra* note 1, at *5-*6. *See* Austin Bradford Hill, *The Environment and Disease: Association or Causation?*, 58 *Proceedings Royal Soc’y Med*. 295 (1965) (listing “nine different viewpoints from … which we should study association before we cry causation,” rejecting the proposition “that we can usefully lay down some hard-and-fast rules of evidence that must be obeyed before we can accept cause and effect,” and emphasizing that “formal tests of significance [are useful only to] remind us of the effects that the play of chance can create, and [to] instruct us in the likely magnitude of those effects.”).

^{14} *See* Paul Meier et al., *What Happened in* Hazelwood: *Statistics, Employment Discrimination and the 80% Rule*, 1984 *Am. B. Found. Res. J*. 139, 152 (“If a difference does not attain the 5% level of significance, it does not deserve to be given weight as evidence of a disparity. It is a ‘feather.’ ”).

Ultimately, however, the proposed rule is undesirable even in the subset of cases in which AERs are all that plaintiffs can plead. In practice, significance testing is not the purely mechanical, objective process that some courts think it is.^{15}^{16}^{17}

^{15} *See, e.g.*, D.H. Kaye, *Is Proof of Statistical Significance Relevant?*, 61 *Wash. L. Rev*. 1333 (1986).

^{16} *See* Kaye et al., *supra* note 9.

^{17} *See, e.g.*, Sander Greenland & Charles Poole, *Problems in Common Interpretations of Statistics in Scientific Articles, Expert Reports, and Testimony*, 51 *Jurimetrics J*. 113 (2011).

The *Matrixx* opinion does not mention the problem of searching for significance. Guided by the Federal Judicial Center’s *Reference Guide to Epidemiology*, it talks fuzzily about “a study” being “statistically significant.”^{18}^{19}^{20} It is trivial to construct examples contradicting this interpretation. A bag contains 100 coins. One of them is a trick coin with tails on both sides; the other 99 are biased coins that have a 0.3 chance of coming up tails and a 0.7 chance of coming up heads. I pick one of these coins at random and flip it twice to obtain two tails. On the basis of only the sample data of two tails, you must decide which type of coin I picked. The *p*-value with respect to the “null hypothesis” that the coin is a heads-tails one is the probability of seeing two tails in the two tosses: *p* = 0.3 × 0.3 = 0.09. Should you reject the null hypothesis and conclude that I flipped the unique tails-tails coin? Are the odds for this alternative hypothesis 10:1, as the brief of the statistical experts asserts? Despite the *p*-value of 0.09, the alternative hypothesis remains quite improbable. The point is not that a *p*-value of 0.09 always can be safely ignored. It is that the *p*-value, by itself, cannot be converted into a probability that the alternative hypothesis is true (“that the adverse effect is ‘real’”). Knowing that the two tails arise only 9% of the time when a heads-tails coin is the cause does not imply that 9% is the probability that a heads-tails coin is the cause or that 91% is the probability that the tails-tails coin is the “real” cause. Treating these probabilities as interchangeable is a common misinterpretation of the *p*-value.^{21}

^{18} *Id*. at 1319 n.6. Loose statements about “samples” being significant are criticized in David H. Kaye & David A. Freedman, *Reference Guide on Statistics*, *in Reference Manual on Scientific Evidence* (3d ed. 2011).

^{19} *Id*.

^{20} *Matrixx Initiatives, Inc. v. Siracusano*, 131 S. Ct. 1309 (2011) (No. 09-1156), 2010 WL 4657930.

^{21} *See, e.g.*, Kaye et al., *supra* note 9.
Unlike classical statisticians, who must remain silent about the probabilities of hypotheses, Bayesian statisticians regard probabilities of hypotheses as analogous to the “prior” probabilities of picking each type of coin in the example above. They can compute how sample data change a prior probability. In the coin example, the prior odds for the tails-tails coin grow from 1:99 to about 1:9 (more precisely, 100:891). The sample data support the alternative hypothesis of a tails-tails coin, but not by an amount sufficient to yield the posterior odds of 10:1 claimed for it in the amicus brief.
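The arithmetic of the preceding paragraph can be verified directly. A minimal sketch of the odds form of Bayes’ rule applied to the coin example (exact fractions avoid rounding error):

```python
from fractions import Fraction

# The coin example: 1 trick (tails-tails) coin among 100; the other 99
# land tails with probability 0.3. Two flips both come up tails.
prior_odds = Fraction(1, 99)              # trick : biased
p_data_trick = Fraction(1)                # two tails are certain with the trick coin
p_data_biased = Fraction(3, 10) ** 2      # p = 0.3 * 0.3 = 0.09
likelihood_ratio = p_data_trick / p_data_biased

posterior_odds = prior_odds * likelihood_ratio          # Bayes' rule in odds form
posterior_prob = posterior_odds / (1 + posterior_odds)  # convert odds to probability

print(posterior_odds)         # 100/891, i.e. about 1:9 -- far from 10:1
print(float(posterior_prob))  # about 0.10: the trick coin is still improbable
```

The posterior odds of roughly 1:9 correspond to a posterior probability of only about 0.10 that the trick coin was flipped, despite the “significant-looking” *p*-value of 0.09.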

### III. From Association to Causation: Sharpening the Distinction

That reasonable investors might want to know about small numbers of adverse event reports in some contexts does not mean that even large numbers of AERs provide valid statistical proof of causation. Thus, Justice Sotomayor wisely cautioned that “we do not attempt to define here what constitutes reliable evidence of causation.”^{22}

^{22} *Id*. at 1319.

Case reports are anecdotal evidence—a series of little stories of event X (e.g., use of the remedy) followed by event Y (e.g., anosmia). At best, such anecdotes can establish an association between X and Y. That is where statistical significance comes in. Ideally, it justifies an inference that something other than random chance produced the observed association. This “something” might not be Zicam at all, but some other factor associated with Zicam use. That is why clinical trials (or, at a minimum, further analysis of potentially confounding variables in observational studies) are important. They can help eliminate other factors as explanations for an observed difference. Thus, a statistically significant number of adverse events does not establish causation, but it can trigger further study or regulatory action.^{23}^{24}^{25}

^{25} Matrixx itself noted the exclusion under *Daubert* of AERs as evidence of causation. Brief for Petitioner, *supra* note 4, at 24.

One might not know this from the dicta in *Matrixx*. Justice Sotomayor wrote that:

A lack of statistically significant data does not mean that medical experts have no reliable basis for inferring a causal link between a drug and adverse events. As Matrixx itself concedes, medical experts rely on other evidence to establish an inference of causation. … [C]ourts frequently permit expert testimony on causation based on evidence other than statistical significance. … It suffices to note that, as these courts have recognized, medical professionals and researchers do not limit the data they consider to the results of randomized clinical trials or to statistically significant evidence.^{26}

^{26} *Matrixx*, 131 S. Ct. at 1319-20 (citations and internal quotation marks omitted). In this paragraph, the Court tried to soften the impact of its comments, stating that in the lower court cases on causation in toxic tort litigation, “[w]e need not consider whether the expert testimony was properly admitted … .” *Id*. at 1319.

Indeed, the Justice lists “a temporal relationship”^{27} among the types of information that can support an inference of causation.^{28}^{29}^{30}

^{27} *Id*. at 1322.

^{28} *Id*.

^{29} *Id*. at 1319.

^{30} *Id*.

It would be a mistake to read too much into these remarks about the multiple types of information that underlie judgments of causation in different arenas. After all, they occur in the context of deciding whether the facts known to Matrixx “revealed a plausible causal relationship between Zicam Cold Remedy and anosmia” insofar as “[c]onsumers likely would have viewed the risk associated with Zicam (possible loss of smell) as substantially outweighing the benefit of using the product (alleviating cold symptoms), particularly in light of the existence of many alternative products on the market.”^{31}^{32}

^{31} *Id*. at 1323.

^{32} *Id*.

To be sure, other types of studies and knowledge of biological mechanisms are relevant in causal analysis for both regulatory decisions and tort verdicts. For example, there are situations in which it is reasonable to infer causation from observational data. The relationship between smoking and lung cancer is one. But the fact that “medical professionals and researchers do not limit the data they consider to the results of randomized clinical trials or to statistically significant evidence”^{33} does not mean that AERs, standing alone, constitute reliable proof of causation in toxic tort litigation.^{34}

^{33} *Id*. at 1320.

^{34} *See supra* note 25.