Statistical Forensics Launches a Polling Donnybrook

The digital revolution, with its ever-growing stores of data, often leaves behind electronic traces of malfeasance. A cell phone thief left behind call records that I used to get my phone back. Justin Wolfers analyzed data sets on college and professional basketball games to uncover residue of point-shaving and racial bias. I once called on the NBA to release more data to “give Freakonomics a chance.” Steve Levitt led the way in this new field of forensic Freakonomics with his famous discoveries of cheating by school teachers and sumo wrestlers.

Now statistical forensics is playing a central role in claims of fraud leveled at the polling firm Research 2000 (“R2K”).

A political consultant, a retired physicist, and a wildlife researcher walked into a bar…no, wait – it only sounds like the beginning of a joke. Actually, in June, they checked the internal consistency of Research 2000’s polling data because “on June 6, 2010, FiveThirtyEight.com rated R2K as among the least accurate pollsters in predicting election results”:

For the past year and a half, Daily Kos has been featuring weekly poll results from the Research 2000 (R2K) organization. These polls were often praised for their “transparency”, since they included detailed cross-tabs on sub-populations and a clear description of the random dialing technique. . . . One of us (MG) wondered if odd patterns he had noticed in R2K’s reports might be connected with R2K’s mediocre track record, prompting our investigation of whether the reports could represent proper random polling. . . .

The three features we will look at are:

  1. A large set of number pairs which should be independent of each other in detail, yet almost always are either both even or both odd.
  2. A set of polls on separate groups which track each other far too closely, given the statistical uncertainties.
  3. The collection of week-to-week changes, in which one particular small change (zero) occurs far too rarely. This test is particularly valuable because the reports exhibit a property known to show up when people try to make up random sequences.

The full post with details of their analysis can be found here.
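For readers who want to see how such checks work mechanically, here is a minimal sketch in Python of the first and third tests. The numbers are invented stand-ins, not R2K’s actual cross-tabs.

```python
# A minimal sketch of two of the three consistency checks described
# above, run on invented numbers rather than R2K's actual data.
from scipy.stats import binomtest

# Test 1: parity of paired cross-tab percentages (e.g. men/women).
# If the two numbers were rounded independently, "both even or both
# odd" should occur about half the time.
pairs = [(43, 59), (46, 62), (38, 54), (41, 57), (44, 60), (39, 55)]
matches = sum(m % 2 == w % 2 for m, w in pairs)
print(f"parity matches: {matches}/{len(pairs)}, "
      f"p = {binomtest(matches, n=len(pairs), p=0.5).pvalue:.3f}")

# Test 3: week-to-week changes in a tracking number. With genuine
# sampling noise, a change of exactly zero should be the single most
# common value, so its near-absence is a red flag.
weekly = [46, 48, 45, 47, 49, 46, 44, 47, 45, 46]
changes = [b - a for a, b in zip(weekly, weekly[1:])]
print(f"zero changes: {changes.count(0)}/{len(changes)}")
```

On the invented data above, every pair shares parity (p ≈ 0.03) and no week-to-week change is zero – exactly the sort of pattern the investigators flagged.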

It appears that Daily Kos has filed a lawsuit against R2K. The president of the polling firm, Del Ali, has responded:

On the data is too clean crap, let me say this and I challenge anyone to then look at comparable data from other firms, not one or two but many others. As I stated, using Gallup one could question the frequency of 46% on Obama’s approval. Regardless though. to you so-called polling experts, each sub grouping, gender, race, party ID, etc must equal the top line number or come pretty darn close. Yes we weight heavily and I will, using the margin of error adjust the top line and when adjusted under my discretion as both a pollster and social scientist, therefore all sub groups must be adjusted as well [sic]. I would have gladly gone over with Kos before his accusation in a vile email on June 9. However, it is clear that no matter what, Kos was going to go the route they have not just to get out of paying their bill but as stated for several other sinister reasons that have come to light. (emphasis added)

I like the fact that Ali is calling for comparable analysis of other polls. But I’m a bit baffled by the bolded section of his response. Other commentators are even less charitable. For example, Mark Blumenthal wrote:

[P]ollsters and social scientists never have the “discretion” to simply “adjust” the substantive results of their surveys, within the margin of error or otherwise. As a pollster friend put it in an email he sent me a few minutes after reading Ali’s statement: “That’s not polling. It’s Jeanne Dixon polling.”

I can imagine some discretion in how one chooses an algorithm to weight top line results. But I don’t understand the need for keeping the algorithm and the data secret.

The blood is in the water, and non-statistical analysts at the Baltimore Daily Record are now digging up details on the firm’s sparse staffing and financial difficulties.

If I were representing the Daily Kos, I would consider adding a promissory fraud count to the complaint. As Greg Klass and I explored in “Insincere Promises,” showing that another person repeatedly promised to do something and then failed to do it is one of the easiest ways to prove promissory fraud. (My favorite real-world example of repetition as proof of no intent to perform is the Tri-State Crematorium in north Georgia, which promised to cremate more than 300 bodies but instead left the bodies to rot in various storage areas on the property; for fans of musical theater, there is the more whimsical example of Professor Harold Hill, who repeatedly promised to create a boys’ band without delivering the goods.)

[hat tip: Jack Hitt]


Kaydiv

I'm not a statistician or economist, but isn't fudging the results kind of... missing the point?

TS Galpin

So cognitive bias influences the weighting of results, and we are back to editorializing based on non-scientific numbers.

If only the media were savvy enough to explain that when they report bogus numbers.

Thanks for the great info!

-Ted
http://strategicscience.org/

Boogie Knight

I dunno, considering that A-list pollsters such as Zogby are willing to go on venues such as the HuffPo to call said 'forensics experts' into question - I'm thinking that the jury is still definitely out in the case of Kos v. R2K.

http://www.huffingtonpost.com/john-zogby/a-note-to-nate_b_636626.html

Owinok

This reminds me of the statement towards the end of Supercrunchers about the dire need for data-audit firms. Polling firms sometimes know all too well that people do not question the results of scientifically designed polls, and that's the problem.

CalD

@ Kaydiv,

If I understand it correctly (and someone please correct me if I'm wrong), it's considered an industry best practice for pollsters to weight subsamples in their polls according to census data for such things as age, gender, ethnicity, income, education, etc. The idea is to try and decrease the impact of under-sampling or over-sampling various groups in cases where there's a marked tendency for people to differ in opinion on some issue according to group identity.

For example, if you took a poll in Arizona right now about their new immigration law, I'll go out on a limb and guess that you might find lower levels of support for the law in the Hispanic community than among non-Hispanic white folks. But let's say that when you randomly dial the 5,000 or 10,000 people it likely takes to find 500 people willing to take your survey, maybe you get 350 non-Hispanic white people, 180 Hispanic and 70 people of other ethnicity. Well, according to the census bureau, only 290 out of 500 Arizonans should be non-Hispanic white, 150 should be Hispanic/Latino, 25 Native American, 20 black, 8 Asian and 7 of other ethnicities if your sample were truly representative in that respect.

So you say no problem, we'll just adjust the weights of the responses that we got from each group a little to compensate before calculating the final results. Now you're all good, right? Well, maybe.
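In code, the adjustment described here might look something like this minimal sketch; the respondent counts follow the hypothetical Arizona figures above (using the 80-Hispanic correction noted at the end of the thread), and the per-group support rates are invented:

```python
# A minimal sketch of post-stratification weighting on the
# hypothetical Arizona numbers above; the support rates are
# invented purely for illustration.

sample = {"non-Hispanic white": 350, "Hispanic": 80, "other": 70}
census = {"non-Hispanic white": 290, "Hispanic": 150, "other": 60}  # per 500

n = sum(sample.values())  # 500 respondents

# Weight = population share / sample share: under-sampled groups
# count more, over-sampled groups count less.
weights = {g: (census[g] / 500) / (sample[g] / n) for g in sample}

# Invented per-group support rates for the poll question.
support = {"non-Hispanic white": 0.45, "Hispanic": 0.20, "other": 0.35}

raw = sum(sample[g] * support[g] for g in sample) / n
weighted = sum(sample[g] * weights[g] * support[g] for g in sample) / n
print(f"raw: {raw:.1%}  weighted: {weighted:.1%}")  # 39.6% vs. 36.3%
```

Note how the weighted figure moves toward the under-sampled group's opinion – which is the intended correction, but also exactly the mechanism that amplifies any error in a small subsample.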

One problem is that as sample size decreases, so do the odds of a random sample being representative of a population. And it's a non-linear relationship. You don't expect 180 people to be anything like as representative of a population of 2 million as 500 should be of 6.6 million -- in fact, once you're talking more than 10,000 people or so you can pretty much ignore the population size in calculating probable margins of error. So the odds of getting an unrepresentative sample are much higher in the sub-samples, and if you happen to get a blooper you're going to be amplifying the error by increasing its weight.
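To put rough numbers on that, here is a quick sketch using the standard simple-random-sample formula (95% confidence, worst-case p = 0.5, population size ignored as noted above):

```python
# Margin of error for a simple random sample at 95% confidence,
# worst case p = 0.5; population size is ignored, as it can be
# for large populations.
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    return z * math.sqrt(p * (1 - p) / n)

for n in (500, 180, 80):
    print(f"n = {n}: +/- {margin_of_error(n):.1%}")
```

A 500-person sample carries roughly a 4.4-point margin of error, but an 80-person subsample carries about 11 points – and weighting up a subsample scales up that error along with it.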

Another problem is that nobody belongs to just one social group, and ethnicity isn't the only thing that pollsters try to compensate for in post-processing. By the time they finish trying to bring their results into line with the correct proportions for ethnicity, gender, education, income, etc., it may take some judgment calls to get the totals to come out right. At that point it's unclear to me to what extent you still expect things like error functions to strictly adhere to the laws of probability.
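In practice, reconciling several demographic margins at once is often done by "raking" (iterative proportional fitting). A toy sketch, with counts and target margins invented for illustration:

```python
# A toy sketch of raking (iterative proportional fitting): alternately
# rescale rows and columns until the table matches both target margins.
# All counts and targets here are invented for illustration.
import numpy as np

# Respondent counts cross-classified by gender (rows) x ethnicity (cols).
counts = np.array([[180.0, 40.0],
                   [170.0, 110.0]])

row_targets = np.array([250.0, 250.0])  # target gender margin
col_targets = np.array([300.0, 200.0])  # target ethnicity margin

for _ in range(100):  # alternate scaling until both margins converge
    counts *= (row_targets / counts.sum(axis=1))[:, None]
    counts *= col_targets / counts.sum(axis=0)

print(np.round(counts, 1))  # weighted counts matching both margins
```

Raking converges to a single answer for a given set of margins, but the choice of which margins to rake on – and what to do when convergence is poor – is exactly where the judgment calls come in.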


CalD

Oops. I meant 80, not 180 Hispanic (in my hypothetical poll results above).