Are the F.B.I.’s Probabilities About DNA Matches Crazy?

Jason Felch and Maura Dolan of the Los Angeles Times recently wrote a fascinating piece about a controversy that has arisen regarding the use of DNA in identifying criminal suspects. The article starts like this:

State crime lab analyst Kathryn Troyer was running tests on Arizona’s DNA database when she stumbled across two felons with remarkably similar genetic profiles.

The men matched at 9 of the 13 locations on chromosomes, or loci, commonly used to distinguish people.

The [Federal Bureau of Investigation] estimated the odds of unrelated people sharing those genetic markers to be as remote as 1 in 113 billion. But the mug shots of the two felons suggested that they were not related: One was black, the other white.

In the years after her 2001 discovery, Troyer found dozens of similar matches — each seeming to defy impossible odds.

As word spread, these findings by a little-known lab worker raised questions about the accuracy of the F.B.I.’s DNA statistics and ignited a legal fight over whether the nation’s genetic databases ought to be opened to wider scrutiny.

Later, a systematic search of the 65,000 felons in the Arizona database revealed that there were 122 pairs that matched at 9 of 13 loci. Twenty pairs matched at 10 loci.

When I heard about this, I wondered if the F.B.I. is totally off its rocker when it comes to the probabilities it gives about DNA matches. Is it possible that the F.B.I. is right about the statistics it cites, and that there could be 122 nine-out-of-13 matches in Arizona’s database?

Perhaps surprisingly, the answer turns out to be yes. Let’s say that the chance of any two individuals matching at any one locus is 7.5 percent. In reality, the frequency of a match varies from locus to locus, but I think 7.5 percent is pretty reasonable. For instance, with a 7.5 percent chance of matching at each locus, the chance that any 2 random people would match at all 13 loci is about 1 in 400 trillion. If you choose exactly 9 loci for 2 random people, the chance that they will match all 9 is 1 in 13 billion. Those are the sorts of numbers the F.B.I. tosses around, I think.
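
Here is a minimal sketch of that arithmetic, assuming (as above) that every locus matches independently with the same 7.5 percent chance. The names `P_LOCUS` and `all_match_probability` are made up for illustration.

```python
# Assumption from the post: each locus matches independently with chance 0.075.
P_LOCUS = 0.075


def all_match_probability(n_loci: int, p: float = P_LOCUS) -> float:
    """Chance that two random people match at every one of n_loci chosen loci."""
    return p ** n_loci


for n in (13, 9):
    prob = all_match_probability(n)
    # Print the odds as "1 in N" to compare with the figures quoted in the text.
    print(f"{n} loci: about 1 in {1 / prob:,.0f}")
```

The printed odds should land near 1 in 400 trillion for 13 loci and 1 in 13 billion for 9 loci.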

So under these same assumptions, how many pairs would we expect to find matching on at least 9 of 13 loci in the Arizona database? Remarkably, about 100. If you start with 65,000 people and do a pairwise match of all of them, you are actually making over 2 billion separate comparisons (65,000 * 64,999/2). And if you aren’t just looking for a match on 9 specific loci, but rather on any 9 of 13 loci, then for each of those pairs of people there are over 700 different combinations that are being searched.

So all told, you end up doing about 1.5 trillion searches! If 1 in 13 billion searches yields a positive match as noted above, this leads to roughly 100 expected matches on 9 of 13 loci in a database the size of Arizona’s. (The way I did the calculations, I am allowing for 2 individuals to match on different sets of loci; so to get 100 different pairs of people who match, I need a match rate of slightly higher than 7.5 percent per locus.)
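
The counting can be sketched in a few lines, again assuming independent loci with a flat 7.5 percent match chance. The first estimate follows the rough method described here (pairs times locus combinations times the 9-locus match chance). The second is a stricter binomial version that also requires the other 4 loci not to match, so each pair is counted once; that stricter version is an addition for comparison, not something taken from the original calculation.

```python
from math import comb

P_LOCUS = 0.075     # assumed per-locus match chance
N_PEOPLE = 65_000   # size of the Arizona database
N_LOCI = 13
K = 9               # number of loci that must match

pairs = comb(N_PEOPLE, 2)   # distinct pairs of people in the database
subsets = comb(N_LOCI, K)   # ways to choose which 9 of the 13 loci match

# Rough count: every (pair, subset) combination is one "search".
searches = pairs * subsets
expected_rough = searches * P_LOCUS ** K

# Stricter count: exactly K loci match and the remaining ones do not.
p_exactly_k = subsets * P_LOCUS ** K * (1 - P_LOCUS) ** (N_LOCI - K)
expected_exact = pairs * p_exactly_k

print(f"pairs: {pairs:,}  subsets: {subsets}  searches: {searches:,}")
print(f"expected matches, rough count:   {expected_rough:.1f}")
print(f"expected pairs, exactly 9 of 13: {expected_exact:.1f}")
```

The stricter count comes out lower than the rough one, which is why a per-locus rate slightly above 7.5 percent is needed to reach 100 distinct pairs.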

What I find interesting about this article and these calculations is that they show how the same sets of basic statistical relationships can appear much more or less convincing depending on how they are portrayed. When we hear that there are 122 matches out of 65,000 people, it seems like DNA fingerprinting is not nearly as good as we think — but that is largely because we aren’t thinking about the fact that 65,000 people imply 2 billion pairs of people.

Note, however, that if we start with DNA from a crime scene and then go search the Arizona database for matches, we aren’t doing 2 billion searches, we are doing “only” 46 million (65,000 people times 715 different combos of 9 loci), so we will have a false positive rate of “only” 1 in 279.
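
A rough sketch of the crime-scene case, under the same assumption of independent loci at 7.5 percent each. Strictly speaking, the quantity computed is the expected number of chance 9-locus hits when one sample is run against the whole database; the function names are hypothetical.

```python
from math import comb

P_LOCUS = 0.075     # assumed per-locus match chance
DB_SIZE = 65_000    # profiles in the database
LOCI_TESTED = 13
LOCI_MATCHED = 9


def trawl_searches(db_size: int = DB_SIZE) -> int:
    """Number of (profile, 9-locus subset) comparisons for one crime-scene sample."""
    return db_size * comb(LOCI_TESTED, LOCI_MATCHED)


def expected_chance_hits(db_size: int = DB_SIZE, p: float = P_LOCUS) -> float:
    """Expected number of coincidental 9-locus matches in one database trawl."""
    return trawl_searches(db_size) * p ** LOCI_MATCHED


searches = trawl_searches()
hits = expected_chance_hits()
print(f"searches per trawl: {searches:,}")
print(f"expected chance hits: {hits:.5f} (about 1 in {1 / hits:,.0f})")
# Using the rounded 1-in-13-billion figure instead of the unrounded probability:
print(f"with rounded odds: about 1 in {13e9 / searches:,.0f}")
```

The unrounded probability gives a figure a little under 1 in 290; the rounded 1-in-13-billion figure gives about 1 in 280.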

The bottom line is that DNA testing is not perfect, but it is still a million (or maybe a thousand?) times better than anything else we have to catch criminals and (just as importantly, especially in Illinois) exonerate the innocent.

(Thanks to Dimitris Batzilis for cranking out these numbers.)


Ryan

Does your analysis assume that loci are interchangeable? Is that a reasonable assumption?

Jen

At least I'm not the only one whose first thought was the birthday coincidence!

I don't know the first thing about genetics, but matching against more loci would solve this pretty well. If you double the number of locations that can be matched to achieve identification, the chances of a false match would improve.

A lesson here - statistical confidence cannot be gained by calculating the number of unique matches.

Jose

Great piece, I'm using it for my stats class this semester!

Erin

Why would the FBI open themselves up to scrutiny? Our government has made it clear that they do not care if innocent people suffer by getting in the way of "the law". I could cite a million cases in which completely innocent people were run over by the wheels of justice - from police work to the courts. I would not be surprised if a large amount of the DNA evidence used in criminal cases turns out to be bunk. The FBI, I am sure, will never cooperate to find out.

justin

@1. Yes, and the key in the birthday problem is that we're looking for the probability of AT LEAST one match. It seems that Levitt is only talking about the "expected" number of DNA matches. The probability of at least one match is likely very large indeed.
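
One way to connect the two quantities is a Poisson approximation, which is an added assumption here: if coincidental matches are rare and roughly independent across pairs, the chance of at least one match is about 1 - exp(-expected number). A small sketch, with a hypothetical helper name:

```python
from math import exp


def prob_at_least_one(expected_matches: float) -> float:
    """Poisson approximation: P(at least one match) given the expected count."""
    return 1 - exp(-expected_matches)


# A few illustrative expected counts, including the post's figure of about 100.
for expected in (0.01, 1, 100):
    print(f"expected {expected}: P(at least one) = {prob_at_least_one(expected):.6f}")
```

With an expected count near 100, the probability of at least one match is indistinguishable from 1.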

Maria

It leads to the question of how certain we need to be that we are blaming the right person for a crime. We might be happy if there is only a minuscule chance we're wrong, but an innocent person facing a big punishment might not be!

Don Monroe

It's not only that the results appear more or less convincing depending on how they are presented. More importantly, it depends on how they were obtained, which is always a critical aspect of evaluating police evidence. If the match was found by first identifying the suspect and then checking the DNA, it's one in millions. If the match was found by scouring the database for a match, it's one in hundreds. I find the latter odds uncomfortably low.

There is another important issue, not captured in the simple calculations, which is that some ethnic groups may be more likely to share a particular genetic variant with each other than with the population at large.

James

We're talking PCR-produced short-tandem-repeat (or, on occasion, variable-number tandem repeat) sequences. There is no reason that we should be using 9 loci, rather than 13 (or more, given the low cost of doing this at this point in time).

Decreasing the # of loci to 9 increases your sensitivity (that is, you will pick up more matches). However, it decreases your specificity (that is, more of these matches will be to the wrong person). In the US court system, we require that 12 jurors be in agreement to convict in a criminal case. Let's also require that 12 loci agree, too.
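
To put rough numbers on that trade-off, here is a sketch of how many coincidental pairs a 65,000-profile database would be expected to contain as the required number of matching loci rises from 9 to 13. It assumes the post's simplified model (independent loci, 7.5 percent match chance each), not real allele frequencies, and the function names are illustrative.

```python
from math import comb

P_LOCUS = 0.075     # assumed per-locus match chance from the post
N_PEOPLE = 65_000
N_LOCI = 13


def prob_at_least_k(k: int, n: int = N_LOCI, p: float = P_LOCUS) -> float:
    """Binomial tail: chance two random people match at k or more of n loci."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))


def expected_pairs(k: int, n_people: int = N_PEOPLE) -> float:
    """Expected number of pairs in the database matching at k or more loci."""
    return comb(n_people, 2) * prob_at_least_k(k)


for k in range(9, N_LOCI + 1):
    print(f"at least {k:2d} of {N_LOCI} loci: {expected_pairs(k):.6f} expected pairs")
```

Under this model each additional required locus cuts the expected number of chance pairs by well over an order of magnitude.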

David Blackburn

Perhaps DNA should only be used to prove innocence. It seems it can't be used to prove guilt beyond a shadow of a doubt.

Dan

If you compare two random sets of DNA you expect these huge probabilities. But these aren't random sets. They are both from people in Arizona's criminal class. I'm guessing it's not a coincidence; they're probably related. Also the potential number of people they could be is smaller: not the world population, but the population of criminals getting caught in Arizona. So the standard can also be lowered.

Or am I missing something here?

John

"Why not match all 13 loci? Spare no expense and all that..."

A quick explanation for everyone. What Levitt is saying about the 9 of 13 means that 13 loci were "tested" and that many pairs of people matched at 9 of them, while simultaneously not matching at 4.

It's just like saying your fingerprint is 69% similar to someone else's. It's still unique to you, just looks a lot like another guy. Thus there is the possibility for mistaken identity, which is what the last 2 paragraphs are about. However, one should note that the 1 in 279 false positive rate above is only for the 9 loci match, and should be corrected to include all 13 to provide a more honest false positive rate.

I don't know anything about the requirements for "identification" or exoneration in a court based on this method, so I can't speak to those things.

Tony

This is the same concept as the same-birthday problem: How many people does it take in a group until the probability that any 2 people have the same birthday exceeds 50%? 99%? (Answers: 23 and 57.)
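
Those two group sizes can be checked with a short calculation, assuming 365 equally likely birthdays and ignoring leap years. The function names are illustrative.

```python
def prob_shared_birthday(n: int, days: int = 365) -> float:
    """Chance that at least two of n people share a birthday."""
    p_all_distinct = 1.0
    for i in range(n):
        # Person i must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1 - p_all_distinct


def smallest_group(threshold: float, days: int = 365) -> int:
    """Smallest group size whose shared-birthday chance exceeds threshold."""
    n = 1
    while prob_shared_birthday(n, days) <= threshold:
        n += 1
    return n


for threshold in (0.50, 0.99):
    n = smallest_group(threshold)
    print(f"exceeds {threshold:.0%} at n = {n} (p = {prob_shared_birthday(n):.4f})")
```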

Carl

So that means that 3-4 tenths of a percent of people convicted with ONLY DNA evidence that has been searched through the database might be innocent?

Not a one in a million chance as has been portrayed on television?

doug

Why not match all 13 loci? Spare no expense and all that...

Chris Blalock

"Why not match all 13 loci? Spare no expense and all that..."

The same reasons why fingerprints aren't matched "exactly" - too many factors work to corrupt the sample, so the evidence you're working with isn't likely to match up perfectly - even with the same person.

That said, 1 in 300 is a far, far cry from 1 in 13 billion. The FBI needs to stop fighting the analysis of DNA databases and open them to research such as this - which will probably push research into ways to reduce the false positive rate.

charles

ahhh Amos Tversky speaks from beyond the grave.

JJ

Thanks to Steven, this is a valuable exercise that nicely illustrates the power as well as some of the limitations of genetics. The article is particularly powerful in pointing out how sweeping claims by officials and some 'experts' are frequently adopted by journalists and by the public without further questioning.
Still, in the case of DNA matching, for the reasons outlined so nicely here, one must remember that DNA testing can never prove with ABSOLUTE certainty that two samples are identical. However, by simply increasing the number of polymorphic loci tested (there is no reason to test only 13 loci if doubts remain; there are thousands of polymorphic microsatellites known that could be used in addition), the probability of non-identity can be reduced to an infinitesimal level. As a defense attorney, I would demand that additional DNA matching be performed if against all odds the standard 13 loci of my client should indeed match perfectly.
On the other hand, one should also remember that discordance at a SINGLE locus would be enough to establish non-identity. Thus, DNA matching allows EXCLUSION with CERTAINTY, and INCLUSION only with a finite, but astronomically high probability.
Overall, a perfect DNA match (at all currently known polymorphic loci) is far less likely than a fingerprint match.
What I am more concerned about is contamination of the sample during handling and testing through human error, especially if the sample from the crime scene and the test sample from the defendant are handled in the same laboratory. The likelihood of an inadvertent match through cross-contamination (false positive) is likely to be far higher than a perfect match between two unrelated individuals. Here, additional safe-guards and controls are needed.
All said, though, DNA matching, short of catching the perpetrator in flagranti, if applied correctly, is the ultimate forensic tool. The concept and the limitations are not difficult to understand, but few prosecutors, defense attorneys, or judges grasp them fully.

Harvey S. Cohen

The most oversold evidence in criminal cases is still the eyewitness account. Witnesses often identify the wrong person, and memory of events can easily be distorted or even created out of whole cloth.

Jose

Is this loci matching technique the same used for paternity tests?

William Spencer

This phenomenon of random match over 18 alleles (each locus has two alleles, since most people have chromosomes in pairs) was discovered fairly early on (late 1990s) by the Forensic Science Service in the UK. This is precisely why the FBI requires that 25 of the 26 maximum possible alleles match a database profile before they will consider calling it "identity".

Also, a conviction on DNA evidence alone with no other circumstantial evidence tying a suspect to the crime scene would be very rare if not unknown.

What is equally important about DNA evidence is its ability to exclude suspects as well as for post-conviction exonerations.