Jason Felch and Maura Dolan of the Los Angeles Times recently wrote a fascinating piece about a controversy that has arisen regarding the use of DNA in identifying criminal suspects. The article starts like this:
State crime lab analyst Kathryn Troyer was running tests on Arizona’s DNA database when she stumbled across two felons with remarkably similar genetic profiles.
The men matched at 9 of the 13 locations on chromosomes, or loci, commonly used to distinguish people.
The [Federal Bureau of Investigation] estimated the odds of unrelated people sharing those genetic markers to be as remote as 1 in 113 billion. But the mug shots of the two felons suggested that they were not related: One was black, the other white.
In the years after her 2001 discovery, Troyer found dozens of similar matches — each seeming to defy impossible odds.
As word spread, these findings by a little-known lab worker raised questions about the accuracy of the F.B.I.’s DNA statistics and ignited a legal fight over whether the nation’s genetic databases ought to be opened to wider scrutiny.
Later, a systematic search of the 65,000 felons in the Arizona database revealed that there were 122 pairs that matched at 9 of 13 loci. Twenty pairs matched at 10 loci.
When I heard about this, I wondered whether the F.B.I. was totally off its rocker when it came to the probabilities it gives for DNA matches. Is it possible that the F.B.I. is right about the statistics it cites, and yet there could be 122 nine-out-of-13 matches in Arizona’s database?
Perhaps surprisingly, the answer turns out to be yes. Let’s say that the chance of any two individuals matching at any one locus is 7.5 percent, and that matches at different loci are independent of one another. In reality, the frequency of a match varies from locus to locus, but I think 7.5 percent is pretty reasonable. With a 7.5 percent chance of matching at each locus, the chance that 2 random people would match at all 13 loci is about 1 in 400 trillion. If you pick 9 specific loci in advance, the chance that 2 random people match on all 9 is 1 in 13 billion. Those are the sorts of numbers the F.B.I. tosses around, I think.
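For readers who want to check these odds, here is a minimal sketch. It assumes, as above, a single 7.5 percent match rate at every locus and independence across loci, so the chance of matching on k chosen loci is just 0.075 raised to the k-th power. The function name is made up for illustration.

```python
P_LOCUS = 0.075  # assumed per-locus match probability, as in the post


def all_loci_match_odds(p: float, k: int) -> float:
    """Return N such that two random people match on k chosen loci with chance 1 in N.

    Assumes independent loci, each matching with probability p.
    """
    return 1 / p**k


print(f"all 13 loci:     1 in {all_loci_match_odds(P_LOCUS, 13):.3g}")
print(f"9 specific loci: 1 in {all_loci_match_odds(P_LOCUS, 9):.3g}")
```

This prints roughly 4.21e+14 (about 420 trillion) for all 13 loci and 1.33e+10 (about 13 billion) for 9 specific loci.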
So under these same assumptions, how many pairs would we expect to find matching on at least 9 of 13 loci in the Arizona database? Remarkably, about 100. If you start with 65,000 people and compare every one of them with every other, you are actually making over 2 billion separate comparisons (65,000 × 64,999 / 2). And if you aren’t looking for a match on 9 specific loci, but rather on any 9 of the 13, then for each of those pairs of people there are 715 different combinations being searched.
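The counting in that paragraph can be reproduced with Python’s standard `math.comb`. This sketch covers only the counting step; the variable names are illustrative and it says nothing about how any lab actually runs its database searches.

```python
from math import comb

N_PEOPLE = 65_000  # size of the Arizona database, per the article
LOCI = 13          # loci in the standard profile
K = 9              # loci that must match

pairs = comb(N_PEOPLE, 2)  # 65,000 * 64,999 / 2 distinct pairs of people
subsets = comb(LOCI, K)    # ways to choose 9 of the 13 loci
searches = pairs * subsets # every (pair, 9-locus subset) combination checked

print(f"{pairs:,} pairs x {subsets} subsets = {searches:,} searches")
```

It prints `2,112,467,500 pairs x 715 subsets = 1,510,414,262,500 searches`, i.e. about 1.5 trillion.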
So all told, you end up doing about 1.5 trillion searches! If 1 in 13 billion searches yields a positive match, as noted above, this leads to roughly 100 expected matches on 9 of 13 loci in a database the size of Arizona’s. (The way I did the calculations, a single pair of individuals can be counted more than once if it matches on several different sets of 9 loci; so to get 100 different pairs of people who match, I need a match rate slightly higher than 7.5 percent per locus.)
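The parenthetical about counting pairs more than once can be made concrete. The sketch below, again assuming one 7.5 percent match rate at every locus and independent loci (a simplification, and not necessarily how the original numbers were produced), computes two things: the rough count that tallies every matching (pair, 9-locus set), and the expected number of distinct pairs matching on at least 9 of 13, using the binomial distribution. A bisection then finds the per-locus rate that yields 100 distinct pairs. All function names are hypothetical.

```python
from math import comb

N_PEOPLE = 65_000  # Arizona database size, per the article
LOCI = 13
PAIRS = comb(N_PEOPLE, 2)


def expected_subset_matches(p: float, k: int = 9) -> float:
    # Rough count: every matching (pair, k-locus subset). A pair that matches
    # on 10 loci is counted comb(10, 9) = 10 times.
    return PAIRS * comb(LOCI, k) * p**k


def expected_pairs_at_least(p: float, k: int = 9) -> float:
    # Distinct pairs matching on at least k of 13 loci, assuming each locus
    # matches independently with probability p (binomial tail).
    tail = sum(comb(LOCI, j) * p**j * (1 - p) ** (LOCI - j) for j in range(k, LOCI + 1))
    return PAIRS * tail


def p_for_target_pairs(target: float, k: int = 9, iters: int = 60) -> float:
    # Bisection on p; the binomial tail probability increases with p.
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if expected_pairs_at_least(mid, k) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(f"subset-counted matches at 7.5%: {expected_subset_matches(0.075):.0f}")
print(f"distinct pairs at 7.5%:         {expected_pairs_at_least(0.075):.0f}")
print(f"rate for 100 distinct pairs:    {p_for_target_pairs(100):.4f}")
```

At 7.5 percent the subset-counted figure is about 113 while the distinct-pair figure is about 86. Reaching 100 distinct pairs takes a per-locus rate of roughly 7.6 percent, which matches the "slightly higher than 7.5 percent" remark.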
What I find interesting about this article and these calculations is that they show how the same basic statistical relationships can appear much more or less convincing depending on how they are portrayed. When we hear that there are 122 matches among 65,000 people, it seems like DNA fingerprinting is not nearly as good as we think. But that is largely because we aren’t thinking about the fact that 65,000 people imply more than 2 billion pairs of people.
Note, however, that if we start with DNA from a crime scene and then search the Arizona database for matches, we aren’t comparing 2 billion pairs of people. We are doing “only” 46 million searches (65,000 people times 715 different combinations of 9 loci), so the chance of a false positive is “only” about 1 in 280.
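The crime-scene case changes only the number of comparisons: one profile against each person in the database, over every 9-locus subset. Here is a minimal sketch under the same assumed 7.5 percent per-locus rate. The quantity it computes is the expected number of spurious hits. Because that number is small, it is essentially the probability of at least one false match.

```python
from math import comb

N_PEOPLE = 65_000  # Arizona database size, per the article
P_LOCUS = 0.075    # assumed per-locus match probability, as in the post

# One crime-scene profile against each person, over every choice of 9 of 13 loci.
searches = N_PEOPLE * comb(13, 9)  # 65,000 x 715 = 46,475,000

# Expected number of spurious hits; when small, this approximates the
# probability of at least one false match.
rounded = searches / 13e9          # using the rounded "1 in 13 billion"
exact = searches * P_LOCUS**9      # using 0.075**9 directly

print(f"{searches:,} searches")
print(f"rounded odds: about 1 in {1 / rounded:.0f}")
print(f"exact odds:   about 1 in {1 / exact:.0f}")
```

With the rounded 1-in-13-billion figure this gives about 1 in 280. Using the unrounded 0.075⁹ (closer to 1 in 13.3 billion) gives about 1 in 287.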
The bottom line is that DNA testing is not perfect, but it is still a million (or maybe a thousand?) times better than anything else we have to catch criminals and (just as importantly, especially in Illinois) exonerate the innocent.
(Thanks to Dimitris Batzilis for cranking out these numbers.)