# Medicine and Statistics Don’t Mix

Some friends of mine recently were trying to get pregnant with the help of a fertility treatment. At great financial expense, not to mention pain and inconvenience, six eggs were removed and fertilized. These six embryos were then subjected to Pre-Implantation Genetic Diagnosis (P.G.D.), a process which cost $5,000 all by itself.

The results that came back from the P.G.D. were disastrous.

Four of the embryos were determined to be completely non-viable. The other two embryos were missing critical genes/D.N.A. sequences which suggested that implantation would lead either to spontaneous abortion or to a baby with terrible birth defects.

The only silver lining on this terrible result is that the latter test had a false positive rate of 10 percent, meaning that there was a one-in-ten chance that one of those two embryos might be viable.

So the lab ran the test again. Once again the results came back that the critical D.N.A. sequences were missing. The lab told my friends that failing the test twice left only a 1 in 100 chance that each of the two embryos was viable.

My friends — either because they are optimists, fools, or perhaps know a lot more about statistics than the people running the tests — decided to go ahead and spend a whole lot more money to have these almost certainly worthless embryos implanted nonetheless.

Nine months later, I am happy to report that they have a beautiful, perfectly healthy set of twins.

The odds against this happening, according to the lab, were 10,000 to 1.

So what happened? Was it a miracle? I suspect not. Without knowing anything about the test, my guess is that the test results are positively correlated, certainly when doing the test twice on the same embryo, but probably across embryos from the same batch as well.

But the doctors interpreted the test outcomes as if they were uncorrelated, which led them to be far too pessimistic. The right odds might be as high as 1 in 10, or maybe something like 1 in 30. (Or maybe the whole test is just nonsense and the odds were 90 percent!)

Anyway, this is just the latest example of why I never trust statistics I get from people in the field of medicine, ever.

My favorite story concerns my son Nicholas:

Relatively early on in the pregnancy we had an ultrasound. The technician said that although it was very early, he thought he could predict whether it would be a boy or a girl, if we wanted to know. We said, “Yes, absolutely we want to know.” He told us he thought it would be a boy, although he couldn’t be certain.

“How sure are you?” I asked.

“I’m about 50-50,” he replied.

## browning

The client asked his lawyer: "What are my odds?"

The lawyer replied: "Sixty-forty."

"OK," said the client. "But which way?"

The lawyer answered: "Either way."

## Willie

I'm obviously in the wrong career. I want the job of assuring people that they have a 50/50 chance of having a boy (especially if they are Chinese). Who knows, I might even get a tip for the good news! It probably would have been cheaper to just implant the first egg they harvested rather than spend $5,000 for a test which proved unreliable. That's like paying your mechanic $5,000 to run a test to verify that your Michelins will go 50,000 miles. He takes the tires off, bounces them once on the pavement, and if they don't explode, collects his $5,000. Oh, and if I'm wrong I'll pay you your $5,000 back. I'd take that risk every time on a Lexus, but maybe not so much on a Jeep Wrangler.

## Mark

Thought I would add another dimension to the "statistics" at work in this case. IVF is a very competitive practice. Prospective parents "shop" practices based on success metrics. As a result, practitioners, who are required to track results, have an incentive to persuade those whose case is not optimal to opt out of implantation. Poor statistics, fewer customers. My wife and I are one of those cases who were encouraged to try again, due to "low quality" embryos. Our perfect, healthy twins are now 10.

## Gu

I am certain that low estimates are partially influenced by a litigious culture. The point being that these estimates are not purely statistical but that external considerations probably play some role.

## David M

Since this hasn't been asked yet - what "critical" sequences were missing from the embryo(s)? Have the twins been tested to see whether those "critical" sequences were present or not? This has to be addressed first, in my opinion, before blabbing about the numbers.

## jz-md

I am a clinical physician and I'll summarize what stats concepts I use.

- I focus on NPV (negative predictive value) and PPV (positive predictive value) of tests. For most tests I don't know these, but for those critical to me, I do.
- From time to time I brush up on Bayes' theorem, but I use it so infrequently that it atrophies.
- When I read an outcomes study, I first look at the disclosures and the extent of pharma involvement. More studies than not are corrupted. If still interested, I look at how the study was done, and whether it correlates with my world. Experience trumps meta-analysis.
- I convert relative risk to absolute risk.
- I no longer look at the P value. The % differences either look meaningful or not. (Okay, I do peek at it.)
- I understand primary / secondary outcomes. Many corrupted studies cherry-pick the outcomes needed to corrupt the discussion section.

I agree with shan (#31). My formal training in stats ended with an undergrad intro course, but it serves me well enough.

## Justin

This seems like an example of base rate neglect. If the test for the embryos is 90% accurate, that doesn't mean that if the test comes back with a bad result that there is only a 10% chance of the embryo being viable.

Let's say for instance that 1 out of every 1000 embryos is not viable; that 1/1000 is the base rate. The actual chance that an embryo is not viable after being tested with 90% accuracy is 1/10 multiplied by the base rate of 1/1000.

I'm aware that this result doesn't "feel" right, but I'm sure if you Google base rate neglect you'll find both that it's a common logical fallacy and people who can explain it better than I can.

## Chris Nelson

During childbirth classes for my second child, we learned that US OBs consider full-term to be 40 weeks but European OBs consider it to be 41. So, a lot of US women are induced to labor because they are "late" when in Europe they'd be right on schedule. I became convinced that doctors can't do statistics.

## Jp

That they defied the odds is not evidence that the odds were incorrect. Maybe they really did have a 1 in 10,000 chance of having a healthy baby, and they were that one chance. That something is unlikely is not the same as to say that it is impossible.

This is not to argue the statistics were not wrong. Only that your evidence of their being wrong is not evidence at all. If you had actual evidence (like, say, information about the specific test or how it was performed, the impact of other factors, etc.), you could argue they did their math wrong. But getting lucky doesn't make statistics useless. It actually helps us better understand the nature of probability, and how it differs from possibility.

## Peter

Moral of the story: statistics don't lie, but sometimes life just gets its way

Mazel to all!

## Jesse

It's like the Dilbert cartoon where the pointy-haired boss gets upset when told (probably by Dogbert) that 40% of all days-off are taken on a Monday or Friday.

Besides, the chances of having a boy are 49-51, right? So he upped the odds by 1%. If you had 100 kids, that might make a difference. If you hurry, you can test this theory.

## EmilyAnabel

A colleague of mine opted for an early screening test during her pregnancy. Part of the test involved a visit with a "genetic counselor" whose job it was to explain how to interpret what amounted to the Type I and Type II errors from the tests. Unfortunately, the counselor, though well-meaning, had no grasp of elementary probability. After a few moments my colleague (a researcher in biochem) realized this and spent the rest of the appointment trying gently and patiently to explain Bayes' theorem in hopes that this might help other patients in the future. She made no detectable progress.

## Jerry Tsai

While not discounting the possibility that they merely had good luck, and certainly agreeing with your point that the test results are correlated, perhaps the Pre-Implantation Genetic Diagnosis test is flawed.

Such a test would be difficult to validate; think about how the estimate of its accuracy would have had to be obtained. You would have needed to implant each embryo (positive or negative test) and measure whether each embryo came to term. I'd bet that that sort of research was not done, which probably means they used some sort of proxy for viability, which may mean the test results should not be given as much credence as they purport to have.

Since you touted the Johns Hopkins Department of Biostatistics' videos in a very recent post, you should be aware that you seem to be contradicting the Department's very existence with the title of your blog post. Actually, your invocation of correlation to explain why we should not trust the one-in-10,000 probability figure would be textbook statistics. The lesson here, more likely, is not that statistics and medicine do not mix, but that bench scientists and clinicians need to be better educated in statistical thinking.

## tim

And yet think of all the children in foster homes and orphanages...

## Chintan Mehta

Mr. Levitt,

The technician administering your son's ultrasound is unfairly accused of using poor statistics here: you asked him, "How sure are you that my unborn child is male?" His response of "50/50" implies he's claiming there is a 50% certainty it's a boy and 50% uncertainty as to what the sex of the child is.

So really, he is saying there is a 75% chance the child is male and 25% chance the child is female - which is not a trivial claim.
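The arithmetic behind that 75% figure can be sketched as a simple mixture. This is only an illustration of the reading proposed above: the function name `chance_of_boy` and the assumption that the "uncertain" half is a fair coin flip are added here for illustration and do not come from the technician.

```python
def chance_of_boy(certainty: float, coin: float = 0.5) -> float:
    """Overall chance of a boy if the technician's call of "boy" is
    right with probability `certainty`, and otherwise the sex is a
    coin flip (assumed 50/50 here)."""
    return certainty * 1.0 + (1.0 - certainty) * coin


# "50% sure" plus a coin flip on the remaining half
print(chance_of_boy(0.5))
```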

## Josh

Justin's comment is on the right track, but not quite correct. I would explain it slightly differently. If the frequency of disease is 1/1000 and the test has a 10% false positive rate (you can't just say that a test is 90% accurate, as the balance of false positive and false negative rates varies depending on thresholds used), then the chance of having the disease given that you have a positive test is only ~1/100. In a population of 1000 people, about 100 people will test positive, based on the error rate of the test, but only one of them would actually be expected to have the disease.
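The head count in that example can be checked with a few lines of Python. This is a rough sketch, not a model of any real test: the function name `positive_test_breakdown` is made up for illustration, and it assumes the test never misses a true case (sensitivity of 1.0), which the comment does not state.

```python
def positive_test_breakdown(population, prevalence, false_positive_rate,
                            sensitivity=1.0):
    """Split the expected positive results into true and false positives.

    sensitivity=1.0 (no false negatives) is an assumption made here
    to keep the example simple.
    """
    sick = population * prevalence
    healthy = population - sick
    true_pos = sick * sensitivity
    false_pos = healthy * false_positive_rate
    # Chance that a person with a positive result really has the disease
    ppv = true_pos / (true_pos + false_pos)
    return true_pos, false_pos, ppv


true_pos, false_pos, ppv = positive_test_breakdown(1000, 1 / 1000, 0.10)
print(f"true positives: {true_pos:.1f}, false positives: {false_pos:.1f}, "
      f"chance of disease given a positive: {ppv:.4f}")
```

Under these assumptions roughly 100 of the 1000 people test positive and about one of them is actually sick, which is where the "~1/100" comes from.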

The comments about repeating the test are spot on though. Most medical tests have non-random error causes. Repeating the test will give the same result no matter how many times you do it; if it is wrong once, it will probably be wrong again.

## Cliff

Comment #3 is the winner. 10% false positive rate does NOT mean there's a 10% chance the test results were wrong. Absolutely not. It depends how often a positive will actually occur in that situation (base rate). If only 1/1000 have that problem, assuming no false negatives, out of every 1000 tested 100 will be false positives and 1 will be a true positive. That means if you get a positive result the odds the test was wrong are 100/101, over 99%!

When you retest, the odds (of false results) go down to about 10 in 11, or roughly 91%... unless the results are correlated, as you say, in which case they're higher.
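The retest arithmetic can be worked through by applying Bayes' theorem twice. The sketch below uses the same assumptions as this comment (a 1/1000 base rate, a 10% false positive rate, no false negatives) plus one more that matters a great deal: the second test is treated as fully independent of the first. The function name `update` is invented for the example, and exact fractions are used so the figures differ slightly from the rounded 100/101.

```python
from fractions import Fraction


def update(prior, false_positive_rate, sensitivity=Fraction(1)):
    """P(real problem | one more positive result), by Bayes' theorem."""
    hit = prior * sensitivity                       # real problem, flagged
    false_alarm = (1 - prior) * false_positive_rate  # no problem, flagged
    return hit / (hit + false_alarm)


base_rate = Fraction(1, 1000)
fpr = Fraction(1, 10)

after_one = update(base_rate, fpr)
after_two = update(after_one, fpr)  # assumes the retest is independent

for label, p in [("one positive", after_one), ("two positives", after_two)]:
    print(f"{label}: chance the result is a false alarm = {float(1 - p):.3f}")
```

If the retest shares the first test's source of error, the second update carries less information than this, and the false-alarm chance stays closer to the one-test figure.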

## Craig

The medical profession as a whole is really lousy at statistics. When considering whether to implant more than one embryo as part of an IVF process, most docs and virtually all recipients don't realize that the odds (among the embryos) are highly correlated. So, if one of the multiple embryos "takes", then it is more likely that some of the other embryos will also "take". The same is true in reverse: if one embryo does not "take", then the other embryos are also less likely to "take". Thus, people overestimate the odds of conceiving at least one child, and simultaneously underestimate the odds of getting multiple kids. (Not only is raising multiples harder, but they have a much, much higher chance of being delivered very early and therefore requiring years of significantly more medical care than a singleton.)
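A toy calculation shows the direction of both effects. All numbers here are hypothetical and chosen only for illustration: half of patients are assumed to have a 50% per-embryo success chance and the other half 10%, which averages to 30% per embryo. That is compared with treating every embryo as an independent 30% shot. The function name `two_embryo_outcomes` is likewise made up for this sketch.

```python
from fractions import Fraction


def two_embryo_outcomes(groups):
    """groups: list of (share_of_patients, per_embryo_success_probability).

    Within a group the two embryos are treated as independent; the
    correlation comes from patients differing in their underlying odds.
    Returns (P(at least one takes), P(both take)).
    """
    at_least_one = sum(w * (1 - (1 - p) ** 2) for w, p in groups)
    both = sum(w * p ** 2 for w, p in groups)
    return at_least_one, both


half = Fraction(1, 2)
# Hypothetical patient mix: the average per-embryo chance is 30%
correlated = [(half, Fraction(1, 2)), (half, Fraction(1, 10))]
# Naive view: every embryo is an independent 30% shot
independent = [(Fraction(1), Fraction(3, 10))]

for name, groups in [("independent", independent), ("correlated", correlated)]:
    one, both = two_embryo_outcomes(groups)
    print(f"{name}: at least one = {float(one):.2f}, twins = {float(both):.2f}")
```

With these made-up numbers, the correlated case gives a lower chance of at least one child and a higher chance of twins than the independent calculation, even though the average per-embryo chance is identical.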

## Paul Koenig

The odds of both embryos being viable were, in fact, 1 in 10,000. However, the important issue in deciding to go forward is the odds of at least one being viable. Since the odds of neither being viable are 98.01%, the couple faced a 1.99% chance of success in having at least one child. That assumes each embryo had a 1% chance of being viable.

Now, if the cause of a false positive is purely random, then a retest is independent. If the cause of the false positive had some systemic component, then a retest will have a much higher chance of a false positive. Not knowing about the test, that leaves odds of viability of one embryo between 1% and 10%. In turn, the odds of having at least one viable embryo would be between 2% and 19% and the odds of having twins between 1 in 10,000 and 1 in 100.
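These ranges can be reproduced with a short calculation. It is a sketch under the same assumptions as the comment, plus one that is only implicit there: the two embryos are treated as independent of each other. The function name `embryo_pair` is invented for the example.

```python
from fractions import Fraction


def embryo_pair(p):
    """Outcomes for two embryos, each viable with probability p.

    Assumes the two embryos are independent of each other.
    """
    neither = (1 - p) ** 2
    return {"at_least_one": 1 - neither, "both": p ** 2}


# Retest independent of the first test: 1% per embryo
independent_retest = embryo_pair(Fraction(1, 100))
# Retest fully correlated with the first test: still 10% per embryo
fully_correlated_retest = embryo_pair(Fraction(1, 10))

for name, result in [("independent retest", independent_retest),
                     ("fully correlated retest", fully_correlated_retest)]:
    print(f"{name}: at least one viable = {float(result['at_least_one']):.4f}, "
          f"both viable = {result['both']}")
```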

## Kelly

Please - every statistical figure has an anecdotal story to the contrary. Don't throw the baby out with the bathwater, if you'll pardon the expression.