**Jason Felch** and **Maura Dolan** of the *Los Angeles Times* recently wrote a fascinating piece about a controversy that has arisen regarding the use of DNA in identifying criminal suspects. The article starts like this:

State crime lab analyst Kathryn Troyer was running tests on Arizona’s DNA database when she stumbled across two felons with remarkably similar genetic profiles. The men matched at 9 of the 13 locations on chromosomes, or loci, commonly used to distinguish people.

The [Federal Bureau of Investigation] estimated the odds of unrelated people sharing those genetic markers to be as remote as 1 in 113 billion. But the mug shots of the two felons suggested that they were not related: One was black, the other white.

In the years after her 2001 discovery, Troyer found dozens of similar matches — each seeming to defy impossible odds.

As word spread, these findings by a little-known lab worker raised questions about the accuracy of the F.B.I.’s DNA statistics and ignited a legal fight over whether the nation’s genetic databases ought to be opened to wider scrutiny.

Later, a systematic search of the 65,000 felons in the Arizona database revealed that there were 122 pairs that matched at 9 of 13 loci. Twenty pairs matched at 10 loci.

When I heard about this, I wondered if the F.B.I. is totally off its rocker when it comes to the probabilities it gives about DNA matches. Is it possible that the F.B.I. is right about the statistics it cites, and that there could be 122 nine-out-of-13 matches in Arizona’s database?

Perhaps surprisingly, the answer turns out to be yes. Let’s say that the chance of any two individuals matching at any one locus is 7.5 percent. In reality, the frequency of a match varies from locus to locus, but I think 7.5 percent is pretty reasonable. For instance, with a 7.5 percent chance of matching at each locus, the chance that any 2 random people would match at all 13 loci is about 1 in 400 trillion. If you choose exactly 9 loci for 2 random people, the chance that they will match all 9 is 1 in 13 billion. Those are the sorts of numbers the F.B.I. tosses around, I think.
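For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It assumes, as above, a flat 7.5 percent match chance at every locus and independence between loci; the variable names are just for illustration.

```python
# Assumption from the post: every locus has the same 7.5% chance of matching,
# and loci are independent of one another.
P_LOCUS = 0.075

p_all_13 = P_LOCUS ** 13      # two random people match at all 13 loci
p_specific_9 = P_LOCUS ** 9   # two random people match at 9 chosen loci

print(f"all 13 loci:     1 in {1 / p_all_13:,.0f}")
print(f"9 specific loci: 1 in {1 / p_specific_9:,.0f}")
```

The first figure comes out a little above 400 trillion and the second a little above 13 billion, in line with the rounded numbers in the text.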

So under these same assumptions, how many pairs would we expect to find matching on at least 9 of 13 loci in the Arizona database? Remarkably, about 100. If you start with 65,000 people and do a pairwise match of all of them, you are actually making over 2 billion separate comparisons (65,000 * 64,999/2). And if you aren’t just looking for a match on 9 specific loci, but rather on *any* 9 of 13 loci, then for each of those pairs of people there are over 700 different combinations that are being searched.

So all told, you end up doing about 1.4 trillion searches! If 1 in 13 billion searches yields a positive match as noted above, this leads to roughly 100 expected matches on 9 of 13 loci in a database the size of Arizona’s. (The way I did the calculations, I am allowing for 2 individuals to match on different sets of loci; so to get 100 different pairs of *people* who match, I need a match rate of slightly higher than 7.5 percent per locus.)
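The same back-of-the-envelope count can be sketched in Python. The first estimate mirrors the calculation above (pairs times combinations times the 9-locus match chance). The second is a plain binomial count of distinct pairs matching on at least 9 of 13 loci, which is my own assumption about how to handle the overlap mentioned in the parenthetical, not necessarily how the original numbers were produced.

```python
from math import comb

N_PROFILES = 65_000   # size of the Arizona database, per the article
P_LOCUS = 0.075       # assumed per-locus match chance from the post
LOCI, NEEDED = 13, 9

pairs = comb(N_PROFILES, 2)    # every pair of people in the database
combos = comb(LOCI, NEEDED)    # ways to choose which 9 of the 13 loci match
searches = pairs * combos

# Estimate used in the post: each search hits with probability P_LOCUS**9.
expected_simple = searches * P_LOCUS ** NEEDED

def p_match_at_least(k):
    """Binomial chance that two people match on k or more of the 13 loci."""
    return sum(
        comb(LOCI, j) * P_LOCUS ** j * (1 - P_LOCUS) ** (LOCI - j)
        for j in range(k, LOCI + 1)
    )

# Counts each pair of people once, however many 9-locus subsets they share.
expected_pairs = pairs * p_match_at_least(NEEDED)

print(f"pairs: {pairs:,}  combos: {combos}  searches: {searches:,}")
print(f"expected matches (simple estimate): {expected_simple:.0f}")
print(f"expected distinct pairs (binomial): {expected_pairs:.0f}")
```

With exact rather than rounded inputs, the simple estimate lands somewhat above 100 and the binomial count somewhat below it, which is the gap the parenthetical is describing.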

What I find interesting about this article and these calculations is that they show how the same sets of basic statistical relationships can appear much more or less convincing depending on how they are portrayed. When we hear that there are 122 matching pairs among 65,000 people, it seems like DNA fingerprinting is not nearly as good as we think, but that is largely because we aren’t thinking about the fact that 65,000 people imply 2 billion pairs of people.

Note, however, that if we start with DNA from a crime scene and then go search the Arizona database for matches, we aren’t doing 2 billion searches, we are doing “only” 46 million (65,000 people times 715 different combos of 9 loci), so we will have a false positive rate of “only” 1 in 279.
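Here is a short Python sketch of that single-sample search, again assuming the 7.5 percent per-locus figure. It shows the result both with the rounded "1 in 13 billion" used in the text and with the unrounded value of 0.075 to the ninth power; the names are illustrative only.

```python
from math import comb

N_PROFILES = 65_000   # database size, per the article
P_LOCUS = 0.075       # assumed per-locus match chance from the post

# One crime-scene sample compared against everyone, on any 9 of 13 loci.
searches = N_PROFILES * comb(13, 9)

# Odds of a chance hit using the rounded "1 in 13 billion" from the text.
one_in_rounded = 13e9 / searches

# Same calculation using the unrounded 0.075 ** 9.
one_in_exact = 1 / (searches * P_LOCUS ** 9)

print(f"searches: {searches:,}")
print(f"chance hit, rounded inputs: about 1 in {one_in_rounded:.0f}")
print(f"chance hit, exact inputs:   about 1 in {one_in_exact:.0f}")
```

Either way the answer is a chance hit roughly once in every few hundred database searches, not once in billions.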

The bottom line is that DNA testing is not perfect, but it is still a million (or maybe a thousand?) times better than anything else we have to catch criminals and (just as importantly, especially in Illinois) exonerate the innocent.

(Thanks to **Dimitris Batzilis** for cranking out these numbers.)

This is the same concept as the same-birthday problem: How many people does it take in a group until the probability that any 2 people have the same birthday exceeds 50%? 99%? (The answers are 23 and 57.)
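The birthday figures are easy to check with a few lines of Python. This sketch assumes 365 equally likely birthdays and ignores leap years; `smallest_group` is just an illustrative helper name.

```python
def smallest_group(threshold, days=365):
    """Smallest group size whose chance of a shared birthday exceeds threshold."""
    p_all_distinct = 1.0   # chance that everyone so far has a different birthday
    n = 0
    while 1 - p_all_distinct <= threshold:
        # The next person must avoid the n birthdays already taken.
        p_all_distinct *= (days - n) / days
        n += 1
    return n

print(smallest_group(0.50))
print(smallest_group(0.99))
```

As with the DNA database, the surprise comes from the number of pairs: a group of 23 people already contains 253 pairs.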

So that means that 3-4 tenths of a percent of people convicted with ONLY DNA evidence that has been searched through the database might be innocent?

Not a one in a million chance as has been portrayed on television?

Why not match all 13 loci? Spare no expense and all that…

“Why not match all 13 loci? Spare no expense and all that…”

The same reasons why fingerprints aren’t matched “exactly” – too many factors work to corrupt the sample, so the evidence you’re working with isn’t likely to match up perfectly – even with the same person.

That said, 1 in 300 is a far, far cry from 1 in 13 billion. The FBI needs to stop fighting the analysis of DNA databases and open them to research such as this – which will probably push research into ways to reduce the false positive rate.

It leads to the question of how certain we need to be that we are blaming the right person for a crime. We might be happy if there is only a minuscule chance we’re wrong, but an innocent person facing a big punishment might not be!

It’s not only that the results appear more or less convincing depending on how they are presented. More importantly, it depends on how they were obtained, which is always a critical aspect of evaluating police evidence. If the match was found by first identifying the suspect and then checking the DNA, it’s one in millions. If the match was found by scouring the database for a match, it’s one in hundreds. I find the latter odds uncomfortably low.

There is another important issue, not captured in the simple calculations, which is that some ethnic groups may be more likely to share a particular genetic variant with each other than with the population at large.

We’re talking PCR-produced short-tandem-repeat (or, on occasion, variable-number tandem repeat) sequences. There is no reason that we should be using 9 loci, rather than 13 (or more, given the low cost of doing this at this point in time).

Decreasing the number of loci to 9 increases your sensitivity (that is, you will pick up more matches). However, it decreases your specificity (that is, more of these matches will be to the wrong person). In the US court system, we require that 12 jurors be in agreement to convict in a criminal case. Let’s also require that 12 loci agree, too.

Perhaps DNA should only be used to prove innocence. It seems it can’t be used to prove guilt beyond a shadow of a doubt.