Is “Statistically Significant” Really Significant?

A new paper by psychologists E.J. Masicampo and Daniel Lalande finds that an uncanny number of psychology findings just barely qualify as statistically significant. From the abstract:

We examined a large subset of papers from three highly regarded journals. Distributions of p were found to be similar across the different journals. Moreover, p values were much more common immediately below .05 than would be expected based on the number of p values occurring in other ranges. This prevalence of p values just below the arbitrary criterion for significance was observed in all three journals.

The BPS Research Digest explains the likely causes:

The pattern of results could be indicative of dubious research practices, in which researchers nudge their results towards significance, for example by excluding troublesome outliers or adding new participants. Or it could reflect a selective publication bias in the discipline – an obsession with reporting results that have the magic stamp of statistical significance. Most likely it reflects a combination of both these influences. 

“[T]he field may benefit from practices aimed at counteracting the single-minded drive toward achieving statistical significance,” say Masicampo and Lalande.
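
To see how such a check can be made concrete, here is a minimal sketch in Python (this is not the authors' actual procedure, and the counts are made up): tally reported p values in two narrow windows on either side of .05 and test whether the split is more lopsided than chance alone would suggest.

    from scipy.stats import binomtest

    # Hypothetical counts of reported p values in two equal-width windows
    # straddling the .05 threshold (illustrative numbers, not from the paper).
    just_below = 92   # p values in (0.040, 0.050]
    just_above = 54   # p values in (0.050, 0.060]

    # Within such a narrow window, a p value should land on either side of .05
    # with roughly equal probability if the threshold exerts no special pull.
    result = binomtest(just_below, just_below + just_above, p=0.5, alternative="greater")
    print(f"{just_below} just below vs. {just_above} just above .05; "
          f"binomial test p = {result.pvalue:.4f}")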

Comments

  1. Andy W says:

    I was recently pointed to a similar paper on SSRN, http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2089580, examining p-values reported in economics journals.

    Star Wars: The Empirics Strike Back (Brodeur et al., 2012)

  2. Joe says:

    This has been a problem for years. Any decent grad student in the hard sciences will have a stack of papers about Type I and II errors, the file drawer problem, publication bias, and the various other names given to this group of issues.

    In my field there has been an effort to move away from searching for statistical significance and towards statistical estimates of effect size. The effort moves at a glacial pace, since so many scientists don’t want to learn something new to replace what they already don’t understand.

    The core problem is that most scientists don’t understand the statistics they’re using and most statisticians don’t understand the role of statistics in the sciences.

    Submit a paper using an analysis that makes more sense, and at least one reviewer will require you to shoehorn it back into the good ol’ significance test.
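
    As a toy illustration of reporting an estimate rather than just a verdict (made-up data, not from any actual study), one might report a mean difference with a confidence interval instead of a bare p value:

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(1)
        a = rng.normal(10.0, 2.0, 40)   # hypothetical control measurements
        b = rng.normal(11.0, 2.0, 40)   # hypothetical treatment measurements

        diff = b.mean() - a.mean()
        se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
        # Welch-Satterthwaite degrees of freedom for unequal variances
        df = se**4 / ((a.var(ddof=1) / len(a))**2 / (len(a) - 1)
                      + (b.var(ddof=1) / len(b))**2 / (len(b) - 1))
        lo, hi = stats.t.interval(0.95, df, loc=diff, scale=se)
        print(f"difference = {diff:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")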

  3. MeanOnSunday says:

    This is common to all scientific fields. Many of the major medical journals now require that study protocols be registered publicly before the study starts and that the authors submit a detailed analysis plan with the paper. Some even require the authors to provide the data to an independent third party to verify the results.

    An excellent reference for understanding research results and putting them in the proper context is John Ioannidis’ “Why Most Published Research Findings Are False”:

    http://www.plosmedicine.org/article/info:doi/10.1371/journal.pmed.0020124

  4. GKB says:

    Or it could be a reflection of efficiency. When designing an experiment, the second thing you do after defining your null and alternative hypotheses is to calculate the minimum sample size required to detect a significant difference at an alpha of 0.05. Generally you then add a buffer to account for dropouts, missing data, etc.

    If your initial assumptions are accurate and you recruit only the minimum number required, your test will come out significant with a p value just under 0.05. If you instead get a p value of 0.0001, it means you underestimated the effect size or overestimated its variability, and you had far more subjects than required to reject the null.
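
    For concreteness, that sample-size step might look like the following rough sketch (assuming a two-sample t test and the statsmodels power routines; the effect size and power target are made-up illustrative values):

        from statsmodels.stats.power import TTestIndPower

        analysis = TTestIndPower()
        n_per_group = analysis.solve_power(effect_size=0.5,         # assumed Cohen's d
                                           alpha=0.05,              # significance level
                                           power=0.8,               # target power
                                           alternative='two-sided')
        # roughly 64 per group; in practice you would add a buffer for dropouts
        print(f"minimum n per group: {n_per_group:.1f}")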

    Of course, at least in medical research, it is well established that top journals are much more likely to publish a study with significant results as these will lead to a change in practice.

  5. Seminymous Coward says:

    The paper doesn’t seem to be public, so I guess I’ll ask here. Is there a nifty chart like the one from http://www.freakonomics.com/2011/07/07/another-case-of-teacher-cheating-or-is-it-just-altruism/ that made the problem so transparent?

  6. L Lehmann says:

    Also, even if a finding is legitimately statistically significant, that should not be confused with clinical significance. A finding may be highly statistically significant yet clinically meaningless.
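
    A quick simulated example of that gap (the numbers are invented): with tens of thousands of subjects per arm, a difference far too small to matter clinically still produces a tiny p value.

        import numpy as np
        from scipy.stats import ttest_ind

        rng = np.random.default_rng(0)
        control = rng.normal(loc=120.0, scale=15.0, size=50_000)  # e.g. systolic BP
        treated = rng.normal(loc=119.5, scale=15.0, size=50_000)  # 0.5-point "benefit"

        t_stat, p_value = ttest_ind(treated, control)
        # Expect p far below .05 despite a clinically negligible difference
        print(f"mean difference = {treated.mean() - control.mean():.2f}, p = {p_value:.2g}")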

  7. Paul S says:

    The “field” might benefit even more from practices aimed at developing a reasonable level of humility. The trouble is, saying that “in our ‘study’ we really did not find out much of anything that anyone can rely on” is probably not an effective way to angle for the next grant. So it goes.

  8. Anonymous Coward says:

    Obligatory XKCD link:

    http://xkcd.com/882/
