Just over a year ago, Francesca Gino was — there’s really no other way of putting it — she was a superstar. An academic superstar, at least.
Leif NELSON: She was at the center of everything, being a prestigious faculty member at Harvard and all of her public speaking, and her books.
Joe SIMMONS: Her reputation was perfect.
NELSON: She was synonymous with the highest levels of research in organizational behavior.
SIMMONS: She’s just a giant in the field.
The field in which Gino is a giant, where her reputation was “perfect,” is variously called behavioral science, or decision science, or organizational psychology. According to her website at the Harvard Business School, where she has been a professor of business administration, Gino’s research “focuses on why people make the decisions they do at work, and how leaders and employees have more productive, creative and fulfilling lives.” Who wouldn’t want that? Gino became a superstar by publishing a great number of research papers in academic journals as well as a couple of books; her latest is called Rebel Talent: Why It Pays to Break the Rules at Work and in Life. She produced the kind of camera-ready research that plays perfectly into the virtuous circle of academic superstars: a journal article is amplified by the publisher or university into the mainstream media, which feeds a headline to all the firms and institutions who are eager to exploit the next behavioral-science insight, and this in turn generates an even greater appetite for more useful research. The academic who is capable of steadily producing such work is treated almost like an oracle. There are TED Talks to be given, books to be written, consulting jobs to be had — Francesca Gino, for instance, gave talks or consulted for Google, Disney, Walmart; for the U.S. Air Force, Army, and Navy; and many more.
But that’s all over, for now. In July of 2023, Harvard Business School — responding to an analysis by academic whistleblowers — investigated Gino’s work and found that she had, quote, “intentionally, knowingly, or recklessly committed research misconduct.” Gino was suspended without pay. She then sued Harvard and the whistleblowers. Those same whistleblowers have also produced evidence of what they call data fraud by an even more prominent behavioral scientist, Dan Ariely of Duke. Ariely has enjoyed the spotlight for many years, going back to his 2008 book Predictably Irrational: The Hidden Forces That Shape Our Decisions. Duke is said to be finalizing its investigation into Ariely — although that’s been going on for a while now, and when it comes to academic fraud, universities have a habit of downplaying charges against their superstar professors, for the obvious reason that it reflects poorly on them. Meanwhile, Dan Ariely’s book lives on, as the basis for a new NBC crime drama called The Irrational. It stars Jesse L. Martin as a professor who uses behavioral psychology to help solve crimes.
FBI AGENT: Hey, what was it you did back there?
Alec MERCER: Paradoxical persuasion. I overly embraced his idea to force him to think it through enough to realize it was a terrible idea.
FBI AGENT: And how did you know he wasn’t going to pull the trigger?
MERCER: It works about 95 percent of the time.
FBI AGENT: And the other 5 percent?
MERCER: There’s always outliers, Marissa.
Dan Ariely and Francesca Gino have both maintained that they never fabricated data for their research. Neither of them agreed to record an interview for this episode, but one of their co-authors did:
Max BAZERMAN: Certainly I felt a moral obligation to correct the record.
Today on Freakonomics Radio: we’ll hear from him as well as the three data detectives who blew the whistle on Gino and Ariely:
Uri SIMONSOHN: So I would say I have — on the falsity of the findings, I don’t have reasonable doubt.
But this is a much bigger story than two high-profile cases in behavioral science. We will get into the incentives that produce academic fraud.
Simine VAZIRE: If you were just a rational agent acting in the most self-interested way possible, as a researcher in academia, I think you would cheat.
We will hear what’s being done to change that — and, most important, why this matters. Because the research fraud in academia extends well beyond academia, and it has consequences for all of us. The first of a two-part series begins right now.
* * *
I rarely do this, but today I’m going to start by reading a couple sentences from Freakonomics, which Steve Levitt and I published in 2005: “Cheating,” we wrote, “may or may not be human nature, but it is certainly a prominent feature in just about every human endeavor … Cheating is a primordial economic act: getting more for less.” So, when you think about it, why shouldn’t we expect cheating even among scientific researchers? Consider this: today, it is thought that Ptolemy, the second-century Greek astronomer, faked his observations to fit his theories. And a new study in the journal Nature found that last year, more than 10,000 research articles were retracted, easily breaking the old record. While a lot of the recent headlines are about scholars at big-name American universities, the countries with the most retractions were Saudi Arabia, Pakistan, Russia, and China.
Brian NOSEK: Fraud has existed since science has existed, and that’s primarily because humans are doing the science, and people come with ideas, beliefs, motivations, reasons that they’re doing the research that they do, and in some cases, people are so motivated to advance an idea or themselves that they are willing to change the evidence fraudulently to advance that idea or themselves.
That is Brian Nosek, a psychology professor at the University of Virginia. In 2013, he founded the Center for Open Science, a nonprofit that tries to improve the integrity of scientific research. Just to get it out of the way, I asked Nosek where his funding comes from.
NOSEK: Our funders include N.I.H., N.S.F., NASA, and DARPA as federal sources, and then a variety of private sources, such as the John Templeton Foundation, Arnold Foundation, and many others. And that diverse group of funders — and it’s quite diverse — I think share the recognition that the substantive things that they are trying to solve won’t be solved very effectively if the work itself is not done credibly.
In other words, the stakes here are high — higher than just individual academic researchers trying to advance their careers. If your goal is to improve medicine or transportation or immigration policy — any area where decisions are based on academic research — you don’t want that research to be compromised.
NOSEK: There are specific cases where a finding gets translated into public policy or into some type of activity that then ends up actually damaging people, lives, treatments, solutions. One of the most prominent examples is the Wakefield scandal relating to development of autism, and the notion that vaccines might contribute. And that has had an incredibly corrosive impact on public health, on people’s beliefs about the sources of autism, the impacts of vaccines, etc. And that is very costly for the world. There’s also a local cost in academic research, which is just a ton of waste. So even if it doesn’t have public downstream consequences, if a false idea is in the literature and other people are trying to build on it, it’s just waste, waste, waste, waste.
There’s also the idea that, as much as universities worry about their students cheating — like using ChatGPT to write a paper — what kind of example are their professors setting? And there’s one more big reason this story is so frustrating, and that has to do with the standards of academic research. The general view — at least this is the view that I’ve long held — is that academic research exists in a special category: it is a fact-finding coalition that operates under a set of rules built around the accurate gathering and analysis of data, with the entire process subject to fact-checking and peer review. Good journalism operates under similar rules. The New York Times has an old mission statement I’ve always liked; it goes: “to give the news impartially, without fear or favor, regardless of party, sect, or interests involved.” I’ve always thought this mission applies to academic research as well — that it’s meant to be not only accurate, but free of personal or financial interests. These research papers aren’t being written by some political official or management consultant or equity analyst; they’re being written by someone so devoted to their field of research that they went through the hell of getting a Ph.D. in order to spend their days doing that research. But the fact that Brian Nosek has been kept very busy with his Center for Open Science suggests that my faith in academic research has been misplaced. I asked Nosek to walk me through how he went from being a researcher himself to being a new sort of referee.
NOSEK: I have always had an interest in how to do good science, as a principled matter. And in doing that, we in the lab would work on developing tools and resources to be more transparent with our work, to try to be more rigorous with our work, to try to do higher-powered, more sensitive research designs. And so I wrote grant applications to say, “Can we make a repository where people can share their data?” You know, this is like 2007. And they would get polarized reviews, where some reviewers would say, “This would change everything, it’d be so useful to be more transparent with our work,” and others saying, “But researchers don’t like sharing their data. Why would we do that?”
DUBNER: Why would researchers not want to share their data?
NOSEK: Yeah, it’s based on the academic reward system. Publication is the currency of advancement. I need publications to have a career, to advance my career, to get promoted. And so the work that I do that leads to publication — I have a very strong sense of, oh, my gosh, if others now have control of this — my ideas, data, my designs, my solutions — then I will disadvantage my career.
DUBNER: Gosh, I do feel so naive because I’ve spent time in this ecosystem for many years now, and what you’re saying just sounds wrong and it sounds selfish, and, worst of all, it sounds like it goes against the mission of the scientific goal, which is to discover and distribute knowledge to the world. And that kind of offends me, I have to say.
NOSEK: And here’s the irony, is that almost every academic would say, “Of course, science is supposed to be transparent. Of course we’re doing research for the public good. Of course this is all to be shared. But come on, Pollyanna, we live in a world, right?” The reality here is that there is a reward system, and I have to have a career in order to do that research. And so, yes, we can talk all about those ideals of transparency and sharing and rigor, reproducibility. But if they’re not part of the reward system, you’re asking me to either behave by my ideals and not have a career or have a career and sacrifice some of those ideals.
You can see how these incentives create a problem: if a system has a built-in bias against transparency, not only will there be less transparency, but also more opportunity to cheat. Nosek and his colleagues set out to address this, by trying to replicate the results of papers that had already been published in academic journals. They called their idea The Reproducibility Project.
NOSEK: And in the end of that, 2015, we published the findings, which was a 270-coauthor paper of 100 replications of findings from three different journals in psychology. We got a little less than half of the findings successfully replicated.
You did not mishear Brian Nosek. That’s what he said:
NOSEK: A little less than half of the findings successfully replicated.
So he’s been running large-scale replications ever since, and not just in psychology.
NOSEK: A year and a half ago, we published the results of the Reproducibility Project in Cancer Biology, doing the same kind of process, and found very similar results: less than half of the findings in preclinical cancer research replicated successfully when we tried to do so.
It’s important to point out that what Nosek says here may not be as bad as it sounds. Here’s how he thinks about that high number of studies that don’t replicate:
NOSEK: That doesn’t mean that the original finding is necessarily wrong. We could have screwed something up in the replication. Successfully replicating doesn’t mean the interpretation is right. It could be that both the findings have a confound in them, but we just are able to repeat the confound.
There might be other explanations, too — legitimate explanations. So we shouldn’t equate a failure to replicate with fraud, or even misconduct. Still, Nosek’s numbers do suggest that a lot of research happening today may not be producing science of lasting value. And we’re talking here about the research being done at the most elite universities and academic journals. If you are a fan of science — any kind of science — this should concern you.
NOSEK: Fraud is the ultimate corrosive element of the system of science. Because as much as transparency provides some replacement for trust, you can’t be transparent about everything. So the ideal model in scholarship is that you can see how it is they generated their evidence, how they interpreted their evidence, what the evidence actually is, and then independent people can interrogate that. And so to the extent that fraud intrudes and actually the evidence isn’t there, it isn’t actual evidence, then the whole edifice of that scholarly debate and tangling with ideas falls apart, because you’re actually tangling with ideas that aren’t based on anything.
DUBNER: How familiar are you with the Joachim Boldt situation?
NOSEK: I’m not recalling that name, but I may know the case if you describe it.
DUBNER: This was the German anesthesiologist who had almost 200 papers retracted.
NOSEK: Oh, yeah.
DUBNER: And I gather there were people actually dying as a result of this faulty research. So I’m curious to ask you, in which academic fields or disciplines do you think fraud or sloppiness is most prominent?
NOSEK: We can’t say with any confidence where it’s most prominent. We can only say that the incentives for doing it are everywhere. And some of them gain more attention because, for example, Francesca’s findings are interesting. They’re interesting to everyone. So of course they’re going to get some attention to that. Whereas the anesthesiologists’ findings are not interesting. They put people to sleep.
DUBNER: Until they kill you.
NOSEK: Well, yeah, I guess they put you to sleep and then they kill you.
DUBNER: But it does seem like your field of training — psychology, and especially social psychology — is a hotbed of, I wouldn’t say fraud, but certainly controversy and overturned findings over the years.
NOSEK: Yeah, I would say that there is a lot of attention to social psychology for two reasons. One is that it has public engagement interest value.
DUBNER: That’s a way of saying that people care about your findings.
NOSEK: People care about it. At least in the sense that, “Oh, that’s interesting to learn,” right? But the other reason is that social psychology has bothered to look. And I think social psychology became a hotbed for this because the actual challenges in the social system of science that need to be addressed are social-psychological problems.
DUBNER: What do you mean by that?
NOSEK: I mean, like, the reward system. How it is that people might rationalize and use motivated reasoning to come to findings that are less credible. A lot of these problems are ones that social psychologists spend every day thinking about.
I asked Nosek why he thinks that presumably honest people might, over time, come to behave dishonestly.
NOSEK: The case that people make to say that this is a bigger problem now is that the competition for attention, jobs, advancement is very high, perhaps greater than it’s ever been.
DUBNER: Do you think that’s been driven by the shrinking of tenured positions at universities?
NOSEK: Yeah, so there are many more people and many fewer positions, which is an obvious challenge for a competitive marketplace. And there are now pathways for public attention that have even bigger impact. Academics, by and large, didn’t think about ways to get rich. They looked for ways to have time to think about the problems that they want to think about. But now they have pathways to get rich.
Those pathways have benefited many people — myself included, even though I’m not an academic. Thanks to my partnership with Steve Levitt, who’s an economist at the University of Chicago, I’ve had more opportunities than I ever would have thought possible — including this show! I have wondered why there seems to be so much less shady research in economics than in psychology and other fields. When you talk to economists, they’ll give you several reasons. Economic research is very mathy, and it comes with a lot of what they call robustness checks. There’s also a tradition of, let’s say, aggressive debate within academic economics: long before you publish a paper, you typically present it to your peers and elders at seminars, and they are only too happy to point out any possible flaw, and call you an idiot if you disagree. I’m not saying this is the best way to conduct business, but it certainly makes shaky data more costly. Also, economists tend to work with big data sets, much bigger than in the rest of the social sciences, and it’s often publicly available data — so, no cheating opportunity there.
This doesn’t mean there is no controversy within economic research, and no overturned results — there are a lot! And if you hang around with economists — as I tend to do — you will hear whispers about a few researchers who are suspected of fudging their data. But it does seem that most researchers in econ are mostly honest. And for all the honest researchers out there, in whatever field, there’s one more twist. When you are playing a game by the rules but you see that the people winning the game are cheating, you feel like a sucker. And no one enjoys feeling like a sucker. But it’s bigger than that. If the cheaters are winning, that means non-cheaters get smaller rewards, and it means all their hard work may also be viewed with suspicion. So what can be done about that problem? This might require something more invasive than a reproducibility study. This might require interrogating the data or research methodology on suspicious research papers, and making public accusations of fraud. This brings us to the whistleblowers we heard about earlier; they are co-directors of the Credibility Lab at the University of Pennsylvania, and they collectively run a blog called Data Colada.
Leif NELSON: My name is Leif Nelson, and I’m a professor of business administration at University of California, Berkeley.
Uri SIMONSOHN: Uri Simonsohn. I’m a professor of behavioral science at the Esade Business School in Barcelona.
Joe SIMMONS: Joe Simmons. I’m a faculty member at the Wharton School at the University of Pennsylvania.
Nelson, Simonsohn, and Simmons are, let’s call it, mid-career academic researchers: they’ve been at it for a while, and they hold high-status positions within the ecosystem they are concerned about. They’ve all published widely in top journals of psychology and decision science. So they’re coming at this not only from the inside, but with inside knowledge of how academic research works, and doesn’t work. That’s why they focus on examining the methodology used in this kind of research.
SIMONSOHN: What motivated our whole journey into methodology was that we would go to conferences and read papers, and not believe them. And we would find that whenever a finding didn’t align with our intuition, we would trust our intuition over the finding, and that was — it sort of defeated the whole purpose. Like, if you’re only believing things you already believe, then why bother?
SIMMONS: This guy Daryl Bem published a nine-study paper with eight studies’ worth of statistically significant evidence that people have E.S.P. And most people were like, “What is going on? This cannot be a true finding.”
SIMONSOHN: So the idea was, like, how do we show people that you can really very easily produce evidence of anything? So we thought, let’s start with something that’s obviously false. We said, okay, something that’s quite hard to do is to make people younger. We have been trying forever, and we never succeeded. So let’s show that we can do that in a silly way. So we decided to show that we can make people younger by listening to a song by the Beatles.
DUBNER: The song was “When I’m Sixty-Four,” correct?
SIMONSOHN: That’s right. And so the idea is, if we can make anything significant, one way to prove it is to say, “I’m going to show with statistically significant evidence that people got younger after listening to ‘When I’m Sixty-Four.’”
They ran real lab experiments with real research subjects, who had real birthdates, and played them real songs — “When I’m Sixty-Four” and two others.
SIMONSOHN: Our control song was, it’s called I believe “Kalimba” by Mr. Scruff. And then we had another song that was meant to go in the other direction, and it didn’t work, so we just didn’t report it, which was “Hot Potato.”
They essentially manipulated and cherry-picked their data to produce the absurd finding they wanted — that listening to “When I’m Sixty-Four” does lower your age, by a full year-and-a-half, it turns out. They published their article in Psychological Science, one of the top journals in the field. Their piece was called: “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant.” They wrote: “These … studies were conducted with real participants, employed legitimate statistical analyses, and are reported truthfully. Nevertheless, they seem to support hypotheses that are unlikely or necessarily false.”
SIMMONS: So, we were surprised when the paper got accepted. And then the immediate aftermath was shocking. There were so many people for whom this resonated. And then there were lots of people who were really not very happy. Like, why are you giving the field a bad name?
But Simmons, Nelson, and Simonsohn felt their field already had a bad name.
SIMONSOHN: Yeah, we thought it was very bad.
NELSON: We started our blog in late 2013. We decided we wanted to have a blog because we thought it would be fun to write things that were shorter than a journal article, and that we did not have to wait two-and-a-half years for the review process to play out. And so with that in mind, we just needed to name it, and we wanted something that would be related to what we do. That’s maybe the “data” part, but would definitely not be sending signals of self-seriousness. So we tried out a few things, and somewhere in there “Data Colada” was the one that we obviously landed on. It had this nice, entertaining feature that — Uri is Chilean, and so when he had suggested the name, he thought it rhymed, which still tickles me and Joe, because for him it’s “Data Colada.”
The Data Colada team, they weren’t going to look just for cases of outright fraud. They were concerned, as Brian Nosek had been, that the pressure to publish interesting results might produce unreliable findings even if the researcher had mostly followed the rules. Consider, for instance, a practice they came to call “p-hacking,” with the “p” standing for “probability.” This is Leif Nelson:
NELSON: The classic forms of what we had characterized as p-hacking — they’re not quite errors. They’re decisions that are accidentally self-serving. It’s if you measure multiple things, but only report the one you like the most. Or you run a study where there’s three treatments — Condition A, Condition B, and Condition C — but in the end, you drop Condition B and you don’t even talk about it. You just compare A to C.
That, remember, is one of the things the Data Colada guys did in their pranky paper about “When I’m Sixty-Four.” They just left out “Hot Potato.”
NELSON: And then there are things that are mildly statistical, but in a very relaxed way. “Well, we collected this data, but it’s kind of skewed, it has some outliers.” And you say, “We should eliminate those outliers or we should winsorize the outliers,” which is basically truncating them down to a lower high number. Or you could run them through an algorithm where you say, “Oh, let’s transform them with a logarithm or with a square root.” And those are all decisions that are justifiable. They’re not crazy. It’s just, if you have a consideration of reporting one variable or the other, and one variable makes your hypothesis look good, and the other variable makes your hypothesis look less good, you end up reporting the one that looks good, either because you’re being self-serving or, honestly, because you’d say, like, “I’m not sure which one is better, but my hypothesis tells me it should be the one that looks good, and that one looks good, it’s probably the better measure.”
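The inflation Nelson is describing is easy to demonstrate with a quick simulation — a hypothetical sketch, not anything from the episode or from Data Colada’s actual work. If a researcher runs an experiment with no true effect, measures several unrelated outcomes, and reports only the one with the smallest p-value, the chance of finding something “statistically significant” climbs well above the nominal 5 percent:

```python
import math
import random

def two_group_p(n=30, rng=random):
    """One comparison of two groups drawn from the SAME distribution
    (true effect = zero). Returns an approximate two-sided p-value
    using a normal approximation to the t statistic."""
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(0, 1) for _ in range(n)]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / (n - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (n - 1)
    t = (mean_a - mean_b) / math.sqrt(var_a / n + var_b / n)
    # Two-sided p-value via the normal CDF (fine for n around 30)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))

def false_positive_rate(n_measures, trials=2000, seed=1):
    """Share of experiments yielding at least one p < .05 when the
    researcher tests n_measures outcomes and reports only the best."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        best_p = min(two_group_p(rng=rng) for _ in range(n_measures))
        if best_p < 0.05:
            hits += 1
    return hits / trials

print(false_positive_rate(1))  # close to the nominal 5 percent
print(false_positive_rate(5))  # roughly 1 - 0.95**5, about 23 percent
```

No single decision in that process is fabrication; the researcher simply gets five chances to be lucky and mentions only the lucky one.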
And here’s Simonsohn:
SIMONSOHN: There’s a few approaches. Some of them that we’ve done, like just do statistics and say, “This is statistically impossible.” The other is, you see associations in the data or lack of associations in the data, that they’re not mathematical properties, but just anybody looking at data who’s familiar with that would realize this is not right. Imagine you have data on weight and height, and you correlate it, and you find zero correlation. That cannot be right. People who are bigger are heavier. And so if you found zero correlation or a negative correlation, you think maybe these are not real weight measures. Another one is you see rounding or precision that is suspicious. You see rounded values where there shouldn’t be any rounding or absence of rounding where there should be. So, for example, in one case that we worked on, there was data of supposedly when people were asked, “How much would you pay for this T-shirt?” And the very curious thing is, there was no rounding. People were equally likely to say $7, $8, or $10. But if you’ve ever collected data like that, you know that people round. People say 10 or 20, they don’t say 17.
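Screens like the ones Simonsohn describes can be scripted. Here is a hedged sketch — the function names and thresholds are illustrative, not Data Colada’s actual tooling — of his two examples: a correlation that must be positive but isn’t, and free-response dollar amounts that are suspiciously un-rounded:

```python
import math

def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def flag_missing_correlation(height_cm, weight_kg, min_r=0.2):
    """Height and weight should correlate positively; a near-zero or
    negative r suggests the columns are not real measurements."""
    return pearson_r(height_cm, weight_kg) < min_r

def flag_too_little_rounding(amounts, expected_share=0.4):
    """People quoting prices cluster on round numbers ($5, $10, $20).
    If too few responses are multiples of 5, the data look synthetic."""
    round_share = sum(1 for a in amounts if a % 5 == 0) / len(amounts)
    return round_share < expected_share
```

For example, a T-shirt column reading 7, 8, 9, 11, 13, 17, 18, 22 — no round numbers at all — would trip the second flag, exactly the pattern Simonsohn describes. Neither flag is proof of fraud on its own; each is a reason to open the dataset and look harder.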
There’s another category that you might call “convenient errors.” Here’s Nelson again:
NELSON: These will be things that can be as simple as a typo, where someone’s writing up their report and the means are actually 5.1 and 5.12, but instead, someone writes it down as 51.2 and you’re like, “Wow, that’s a huge effect.” Right? And no one corrects it because it’s a huge effect in the direction that they were expecting. And so, literally a typo might end up in print. And that’s before we get to anything like fraud, like the active fabrication of data or manipulation of data.
DUBNER: Do you think that the first set of conditions that you described are quite likely to lead to fraud? I mean, how slippery is the slope? Is the kind of person who’s willing to do those things willing to go ahead and commit fraud, especially if they’ve gotten away with it for a while?
NELSON: Uh, well, Stephen, you’re asking a question that is pretty heavy, and one that I’m not particularly well-equipped to answer. If you’d asked me five years ago, I think I would have been more refined in my answer, and I would have said, “No, that’s not a slippery slope problem.” There’s a slippery slope between, I collect five measures and report one versus I collect 10 measures and I report one. That’s a slippery slope. But making up data feels qualitatively different, and I still largely stand by that view. But there have been enough anecdotes that other people — whistleblower types — have presented to us that sound a lot more like someone says, “Yeah, at first, you do the thing where you drop some measures or drop a condition or you remove the outliers, and then also you take participant 35, and you change their answer from a seven to a nine.” You’re like, “Whoa, that last one doesn’t sound the same.” But maybe there’s some psychology for that, that it feels like it’s an extension.
So these three self-appointed data detectives looked into the work done by their peers, and publicly posted their findings.
NELSON: The very first blog post we posted was about identification of fraudulent data in a paper published 10 years ago. And that one was discovered because Uri had made a chart for a totally different paper where he was mining data from multiple published studies to just make a chart. And I looked at his figure of this other research group’s data and said, “That seems unusual, I want to go read that paper.” And so I read the paper and then looked at that dataset. In that one, it had collected data on a nine-point interval scale, so people can answer one, two, three up through nine. And there were numbers in the dataset that were things like -1.7. And so we say, “Oh, okay, we’re done.” Nothing fancy. Once you open the dataset, you can then close it, and say, “It’s broken.”
The paper Nelson was describing had been published by four Taiwanese researchers in 2012 in the journal Judgment and Decision Making. After the Data Colada investigation, the paper was retracted — although, as far as we can tell, the researchers weren’t sanctioned or punished. So how much fraud is there? I asked Uri Simonsohn.
SIMONSOHN: I would estimate the share of fraud in the order of, say, 5 percent of articles.
DUBNER: What’s the difference between, let’s say, high-profile academic journals versus mid-tier or lower? Is fraud more likely to be prominent in the higher or the lower?
SIMONSOHN: I don’t read really low-tier ones, so I don’t know. Sometimes I will. But if I come across fraud there, I will ignore it, because the cost is so high of pursuing a case of fraud that it’s just not worth it. If it’s a paper that has seven citations after three years, and it’s published in a journal that nobody knows, I just let it be. And I’m sure other people do that, too.
Simonsohn, Nelson, and Simmons kept on with their investigative work as a sideline to their regular research and teaching responsibilities. Within psychology and behavioral-science circles, Data Colada became well-known, and a bit feared. But they didn’t have much reach beyond those circles. That changed a couple years ago. They published a post called “Evidence of Fraud in an Influential Field Experiment About Dishonesty.” The fraud they claimed to identify was in a paper published years earlier in a top journal known as P.N.A.S., or the Proceedings of the National Academy of Sciences. The paper was called “Signing at the Beginning Makes Ethics Salient and Decreases Dishonest Self-Reports in Comparison to Signing at the End.” Let’s put that in English: the paper claimed that if you are asked to sign a form at the top of the form, before you fill in the information, you are more likely to be truthful than if you sign at the bottom. You can see how that might work, right?
There are four things you might want to know about this paper. One: the central finding had proved to be extraordinarily popular — a lot of firms and institutions started putting the signature line at the top of tax statements, insurance forms, things like that. Number two: the article had been edited for P.N.A.S. by Danny Kahneman, perhaps the best-known living psychologist, and one of the most highly regarded. Number three: two of the five co-authors on the paper were among the best-known people in this field: Dan Ariely and Francesca Gino. And, four, there was already evidence that something was up with the original paper, because its authors had published a second paper saying that their original findings didn’t replicate. The failure to replicate, as we heard earlier, does not necessarily mean fraud. But now, the Data Colada investigators claimed to have proof that yes, there was fraud in the original paper. That original paper was written by Dan Ariely and Francesca Gino, along with Nina Mazar, Lisa Shu, and this man:
Max BAZERMAN: I’m Max Bazerman, and I’m a professor at the Harvard Business School.
Coming up: Max Bazerman takes us through the collaboration that triggered a crisis. Also: Freakonomics Radio had a great 2023; you are one of more than two million people who listen every month. Let’s make it three million for 2024, okay? If you like the work we do, please tell people about it — that is the best way to support the podcasts you love. Thanks in advance.
* * *
Max Bazerman, a professor of business administration at the Harvard Business School, is considered an elder statesman in the field of behavioral science; for decades, he’s been publishing well-regarded research papers and books, and he’s known as a wise and caring mentor to younger scholars. That last bit, it would seem, is the best explanation for how Bazerman wound up being a co-author on what would turn out to be a very, very, very problematic research paper. This is the “signing at the top” paper published in P.N.A.S. in 2012 which claimed that you’re more likely to be honest if you sign a form at the top, before you fill in the information, than if you sign at the end. This paper actually started out as two separate research projects. Here’s Bazerman:
BAZERMAN: So 2011, Lisa Shu and Francesca Gino and I had a working paper that got rejected from a couple of journals, that basically claimed to show if you sign a document before you fill it out, you’re more likely to tell the truth. Our studies were done in the laboratory.
Lisa Shu was a doctoral student at the time; Bazerman was her advisor, one of the chairs of her dissertation committee. As for Francesca Gino:
BAZERMAN: Francesca started visiting the Harvard Business School as a doctoral student from Italy. By 2004, she was attending my doctoral seminar, and we started to interact pretty regularly. And eventually I was on her dissertation committee and played a pretty active role in advising her.
Bazerman liked Gino a great deal; in fact, nearly everyone liked Gino, and admired her intellect and work ethic.
BAZERMAN: We once even got to the point of our two families making an offer to a developer on a project to have houses connected to each other.
Their signing-at-the-top paper was based on data from two lab experiments from the University of North Carolina at Chapel Hill, where Gino taught before coming to Harvard. In those experiments, research subjects tried to solve a number of puzzles, with a financial reward for every right answer. And afterward, they filled out a form to tell the experimenter how much money they had earned. Some of the research subjects were asked to sign an honesty pledge at the top of the form; others, at the bottom. In the experiment, the ones who signed at the top were more truthful about their earnings. At least, that’s what these data said.
BAZERMAN: They were brought to our threesome from Francesca. And the exact role of Francesca and the lab manager remains somewhat unclear. But it’s certainly safe to say that Lisa Shu and I had little to do with the collection of the data. So, this goes back to a time when Francesca was developing an excellent career, but she was not the superstar that she came to be. And I was a senior person and probably doing the least amount of work and working more after the document came to fruition.
DUBNER: Do you think that most people, let’s say, the median American who maybe holds a decent opinion of university life and academic research — which, maybe that’s not the median person, maybe the median person doesn’t hold such a decent opinion — but for someone who might read an article that’s based on an academic study and say, “Oh, that’s interesting, I’m going to file that away as a useful, probably true piece of information,” how surprised do you think that person would be to learn that a senior colleague like you, who’s co-author on a lot of papers with junior colleagues, that you personally don’t interact at all with the original data? How surprising do you think most people would find that?
BAZERMAN: So I wouldn’t say I don’t interact with it at all. I certainly would read the results section pretty carefully. But I would read it with the intent of seeing if there was any error along the way.
DUBNER: But how can you tell if there’s error if you’re not in the — you know, it’s like, this whole issue reminds me a little bit of being, let’s say I’m a chef in a restaurant, and I’m given the ingredients to cook, but I’m not allowed to examine them. So I don’t know if they’re rancid or fresh or even fake.
BAZERMAN: So, I like that example. So instead of being a chef over a restaurant, let’s imagine that you’re the owner of 12 different restaurants, and you have a head chef in each. And that head chef is going to be what I think of as the most senior of my more junior colleagues on the project. And over time, I’ve come to trust that they’re going to be doing a really good job of overseeing the ingredients that go into the research process. And by doing so, there’s other things that I can do. I can work on making sure that we have the funds available. I could work on whatever particular problems come up administratively. I can work with more young scholars because my time is more available. So there’s lots of good by this efficiency of trusting the assistant professor on the project, or the head chef at a particular restaurant so that I’m not examining the specific ways in which the sausage is made.
Now, remember, this original paper was rejected by multiple journals. Bazerman says they got feedback suggesting their argument about signing at the top would be more believable if, in addition to the lab results that Francesca Gino provided, they had some results from the real world as well. That’s what researchers would call a field experiment versus a lab experiment. As luck would have it, another researcher — a friend of Gino’s, no less — apparently had some good field results.
BAZERMAN: We collectively heard that Dan Ariely was presenting a very similar result based on a field experiment having to do with an insurance company. And Francesca reached out to Dan and we basically combined efforts to pull the three studies into one paper.
Ariely’s data included the number of miles the customers of this insurance company reported having driven in a year. If you think about how insurance works, a customer might have an incentive to underreport their mileage, in the hopes of lowering their insurance bill. Ariely’s data showed that customers who were asked to sign this mileage statement at the top reported having driven more miles than customers who were asked to sign at the bottom — again, suggesting that signing at the top makes people more honest. Max Bazerman now received the first draft of a new paper that combined the Ariely and Gino studies.
BAZERMAN: I’m reading the insurance field experiment for the first time at this point, and as I’m reading it, I have some questions. I wasn’t remotely thinking about fraud as an issue. I just thought there was something wrong. And what I saw as wrong was that we were reporting that the average driver in the database had driven between 24- and 27,000 miles per year. And I just looked at that and I said, “That seems off.”
It seemed off to Bazerman because the average American only drives around 13,000 miles a year.
BAZERMAN: So I asked some questions about it. And Ariely, who was the point person for that data, the origins of which are not completely clear, sent back a very quick email saying the mileage is correct. And I continued to say, “Well, we need to clarify what’s going on here. It seems off that people have driven so many miles, particularly when you’re talking about tens of thousands of drivers.” And eventually Ariely comes back with, “The drivers are senior citizens in Florida.”
DUBNER: Sounds like they should drive even less than 24,000 miles then.
BAZERMAN: Exactly my thought. So my questions continue, and I don’t get very good answers. And literally, this goes on for months, and I’m seriously considering taking my name off the paper. At the time, Lisa Shu is a doctoral student on the job market, and she’s presenting this work.
DUBNER: And how concerned were you about damaging her prospects?
BAZERMAN: I was very concerned that if I dropped off the paper, there’s something suspicious about Lisa’s presentation. So I keep on asking questions, but I don’t withdraw. And by early 2012, I’m attending a conference, and I arrive at the conference and in the big hallway I run into Lisa, who’s my advisee, my friend, my coauthor, somebody who I like a lot. And she’s with Nina Mazar, who I had never met before. So Lisa introduces us, and I believe I was expressing my unhappiness with the lack of clarity on this mileage issue.
DUBNER: Nina Mazar is Ariely’s collaborator on the insurance study, correct?
BAZERMAN: So that’s a little bit unclear. So, um, when we asked Ariely to join forces, he said fine, but Nina would be part of the project as well. So I always had thought she was part of the insurance study. Later on in life, Nina claimed she had no more connection to the insurance study than I did, that she first connected to it when this five-author paper comes together.
DUBNER: But in any case, at this conference she assuaged you to some degree?
BAZERMAN: Yeah, exactly. So, she basically pleasantly and openly pulled up the database on her computer. And I said, “So what’s going on?” And she said, “I think what’s going on is that we don’t know that the period between time 1 and time 2 for assessing the number of miles driven was one year. We know when time 2 was collected, but it may have been more than one year for when time 1 was collected.” And in my mind, what becomes clear is that that makes our study noisier, but as long as the real experiment was run, this is actually pretty good news. All we need to do is correct the presentation in the paper, which we did. The paper is submitted, it’s published. I develop a belief that this effect is true, and people love this result. And from a theoretical standpoint, it’s a shockingly simple idea. From a practical standpoint, it’s just perfect. It’s so simple that organizations can easily implement it.
DUBNER: And who did implement it?
BAZERMAN: A lot of people implemented it. You know, I think Lemonade Insurance, under Ariely’s advice, implemented it.
Lemonade Insurance, by the way, didn’t just implement Ariely’s advice; they hired him as their Chief Behavioral Officer.
BAZERMAN: And many government agencies implemented, including the U.S. government.
Indeed, the first sentence of the paper that Bazerman and the others published in 2012 reads like this: “The annual tax gap between actual and claimed taxes due in the United States amounts to roughly $345 billion.” Now, imagine working for the I.R.S. — or any tax agency, anywhere in the world — and learning that these brilliant academic researchers from Harvard and Duke had found that if you simply have people sign their tax forms at the top rather than the bottom, that millions, maybe billions of extra dollars will suddenly flow your way.
BAZERMAN: And by 2016, I get an email that’s actually critical to the whole evolution of what happens later. The email is from Stuart Baserman — only he spelled his name with an S instead of a Z — and he basically is a pretty low-key guy who is working on an insurance startup, to do insurance online. And his wife Sue encouraged him to email me because he was working on the problem of how do we get people to tell the truth online? And he had read the 2012 signing-first paper and he said, “This looks like I might even be related to a guy who knows how to get people to tell the truth.” So Stewie emails me, and we develop a very good relationship. It turns out he’s a fifth cousin, according to 23andMe, and I also develop a consulting relationship with Slice, the insurance company that he was developing.
DUBNER: Now, sorry to ask the rude question, but does that seem like a conflict of interest at all, in retrospect?
BAZERMAN: To work for my cousin?
DUBNER: Well, to go be a consultant for an insurance company based on the findings of a paper which turns out to have been fraudulent, that you were co-author on.
BAZERMAN: I didn’t know it was fraudulent in 2016; I still believed it then.
DUBNER: Yeah. Are you still a consultant for Slice?
BAZERMAN: I am not.
DUBNER: Did you leave because the finding was fraudulent?
BAZERMAN: No, no. I have a terrific relationship with Slice.
Okay, so going back to 2016, Max Bazerman wanted to help his newfound cousin learn whether signing at the top would be as effective in an online setting, the way it seemed to work with paper documents. So Bazerman and some junior colleagues set out to test that question. And how was Bazerman feeling at the time about the original sign-at-the-top finding?
BAZERMAN: We know it works. We know the effects are big. We know the world is intrigued by it. Seems perfect.
DUBNER: Were you worried about a placebo effect at all? Which is to say, if enough people have heard through media descriptions of this signing-first phenomenon that if they encounter a form where they’re asked to sign first, they will now know that they are under some kind of scrutiny, perhaps, and therefore they’re more likely to be honest because they know about it?
BAZERMAN: That’s a terrific methodological critique. So what you just said logically makes sense. I think that, as of the time we were doing these online studies, and there are many of them, I don’t think that there was widespread public awareness of the signing-first effect. You know, Ariely, and Gino, and I talked to lots of executive classes, but I wouldn’t say it was a well-known social phenomenon. But I could be wrong. And so your methodological critique could be valuable. But anyhow, signing first — with or without a placebo — doesn’t work online, we get nothing.
DUBNER: How surprised were you?
BAZERMAN: Very. And I kind of said, “Well, let’s take a look at what we did. Let’s see sort of how we might have messed up the design, let’s try again.” So we make some changes, we do it a second time, we make some changes, we do it a third time. Still no effect. No effect. No effect. And recall, the 2012 paper not only has effects, it has effects across three different studies that are all statistically significant, and the effects are large. So the project clearly transforms somewhere between replication failure three to five — it’s transforming from “how do we get people to tell the truth online,” to a massive replication failure of a pretty visible academic effect. And so after we fail six times, we decide, well, let’s go back and do a large-scale replication of one of the original lab studies.
Bazerman and the collaborators he had brought on to do the online replication work now convened with the authors of the original paper, including Dan Ariely and Francesca Gino, and they set out to replicate one of that paper’s lab studies, but using more than 10 times as many research subjects as in the original. Many academic studies, especially in a field like psychology, rely on a small pool of research subjects, often just students from the researchers’ own universities. A small research pool is cheaper and faster — and if the goal is to produce a lot of publishable research, fast is good. But small samples are more likely to return a skewed result. So now, with a bigger sample and much more scrutiny, they get no effect. Signing at the top doesn’t seem to do much of anything.
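That statistical point can be made concrete with a quick simulation. The numbers below are purely illustrative and have nothing to do with the actual studies: two groups are drawn from the same distribution, so the true effect is exactly zero, and yet the small-sample runs routinely show spurious “effects” several times larger than the large-sample runs.

```python
import random

random.seed(0)

def simulated_effect(n):
    """Difference in means between two groups drawn from the SAME
    distribution -- so the true effect is exactly zero."""
    group_a = [random.gauss(0, 1) for _ in range(n)]
    group_b = [random.gauss(0, 1) for _ in range(n)]
    return sum(group_a) / n - sum(group_b) / n

def spread(n, trials=2000):
    """Average size of the spurious effect across many repeated studies."""
    return sum(abs(simulated_effect(n)) for _ in range(trials)) / trials

small = spread(20)    # small-sample studies: large spurious effects
large = spread(200)   # tenfold-larger studies: much tighter estimates
print(f"small-sample spread: {small:.3f}, large-sample spread: {large:.3f}")
```

With 2,000 simulated “studies” per sample size, the n=20 runs show roughly three times the spurious spread of the n=200 runs, which is one reason a replication with ten times as many subjects is so much harder to fool.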
BAZERMAN: So certainly, I felt a moral obligation to correct the record.
DUBNER: And not just moral obligation, but as you’ve told us, there are institutions — government institutions — and firms that are using this research. Did you feel — I mean, I don’t want to put a word in your mouth — but was it a sense of guilt, or panic, or fear, or anything like that?
BAZERMAN: Certainly I would feel some sense of — maybe, maybe guilt is the right word, that my name’s on a paper that people are using when I no longer think that they should be using it. But, you know, I didn’t think that I was doing anything wrong. And quite honestly, I’m still not thinking fraud at this point. I just don’t know what’s going on. I’m thinking of cleaning up the record.
In 2020, Bazerman, along with all the original authors and his two more recent collaborators, published a follow-up paper in P.N.A.S., the same journal where the original piece was published in 2012. This paper was called “Signing at the Beginning Versus at the End Does Not Decrease Dishonesty.” Now, from the outside, you may think — this is not a very brave position to take. You’re simply putting out a new paper saying that the paper you put out eight years ago — the one that got so much attention and advanced so many careers — that that one didn’t actually work. On the other hand, you could say, this is how science is supposed to work. You have a hypothesis, you run experiments to test your hypothesis, you gather and analyze the data, and you present your findings. If new information comes along that overturns your finding — well, that’s what needs to happen to correct the scientific record. But it’s worth noting what the original authors didn’t do. They didn’t retract the original paper, nor did the journal retract it — at least not at this point. So, from the outside, this looked like a story of science that was perhaps conducted sloppily, but not a story about fraud. And at least some of the original authors hadn’t given up on the original finding.
BAZERMAN: Ariely and Nina Mazar both continued to argue, “Well, it worked some of the time, it doesn’t work other times, and we need to do more studies to find out when it works and when it doesn’t.” And my attitude — so, I won’t speak for other co-authors — was to basically say, “We have plenty of evidence to conclude that we should tell the world that we have no faith that this effect works.” So, we don’t retract. And life moves forward. And in June of 2021, I believe, I get an email from one of the members of the Data Colada team saying, “Max, can the three of us meet with you on Zoom to talk about something important?”
Coming up: what it’s like to be on the other end of a Data Colada Zoom call. And: how much does academic fraud contribute to the public’s ever-sinking opinion of universities in general?
* * *
Okay, so Max Bazerman, a well-regarded senior researcher and professor at the Harvard Business School, gets an email asking him to meet with the Data Colada team, and they say it’s important.
BAZERMAN: So, there’s a Zoom. And the first part of the Zoom is the Data Colada team showing me the evidence for fraud in the insurance paper. And it’s kind of overwhelming. These guys are careful and they’re thorough, and they convinced me that there was fraud in this study.
This insurance study was one of three studies in the sign-at-the-top paper that Bazerman had co-authored some years back. The Data Colada researchers had scrutinized the data that Dan Ariely had used, and found several things suspicious. The most obvious one was a data chart, called a histogram, showing the number of miles driven each year by the people in his study. For data like this, a histogram will typically look like a bell curve, with a lot of people clustered around the center, and then some outliers sloping off toward the upper end and lower end. But this histogram showed a nearly uniform distribution of drivers from 0 miles to 50,000 miles. “This is not what real data look like,” the Data Colada team wrote on their blog post, “and we can’t think of a plausible benign explanation for it.” And what’s Max Bazerman thinking now?
BAZERMAN: I’m just kind of overwhelmed with the fact that I’m the author of a fraudulent paper.
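The shape argument is easy to see in a toy simulation; the numbers here are hypothetical, not the insurance data. Self-reported annual mileage should cluster around a typical value, something like the roughly 13,000-mile American average mentioned earlier, while the suspicious data spread evenly across the whole range from 0 to 50,000.

```python
import random

random.seed(1)

def bucket_counts(values, width=10_000, top=50_000):
    """Count how many values fall into each 10,000-mile histogram bucket."""
    counts = [0] * (top // width)
    for v in values:
        counts[min(int(v // width), len(counts) - 1)] += 1
    return counts

n = 10_000
# Plausible real-world shape: clustered near ~13,000 miles, tapering tails.
realistic = [min(max(random.gauss(13_000, 5_000), 0), 49_999) for _ in range(n)]
# Suspicious shape: every mileage from 0 to 50,000 equally likely.
uniform = [random.uniform(0, 50_000) for _ in range(n)]

print("realistic buckets:", bucket_counts(realistic))
print("uniform buckets:  ", bucket_counts(uniform))
```

In the realistic sample, one bucket dominates and the counts taper off at the extremes; in the uniform sample, every 10,000-mile bucket holds about the same number of drivers — the flat pattern that Data Colada flagged as implausible.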
And later, there was new information from the insurance company that had given Dan Ariely the data. They told the Planet Money podcast that the data Ariely published was significantly different from what they’d given him; and in their original data, there was no difference between those who signed the forms at the top and those who signed at the end. Although Ariely declined to do an interview for this episode, he did send a written statement: “As someone who’s spent many years studying dishonesty,” he wrote, “I appreciate the irony of being accused of dishonesty. There’s no question that the data underlying the 2012 study I co-authored with four other researchers about dishonesty was, well, dishonest. I have looked diligently to figure out what went wrong, but given that this took place more than 15 years ago, I can’t tell for sure what happened.” He added, “All five co-authors of the study in question participated in review sessions with people from the insurance company and asked questions about the data. Ultimately, we all felt satisfied with the answers we got and collectively decided to move forward with the paper.” Max Bazerman, when we shared with him Ariely’s statement, replied: “I am extremely confident that I never participated in any such review sessions, and confirmed this with conversations with Lisa [Shu] and a thorough search of my email records.” The final part of Ariely’s statement reads: “The circumstances that led to the data being falsified are being investigated by Duke University. I’m confident that the investigation will find no evidence to suggest I was responsible for any data manipulation. I am sure that this matter will be behind me very soon and I will be resuming my research at Duke at full speed.” That’s the end of Ariely’s statement. As for the P.N.A.S. journal: they did finally retract the original paper. Meanwhile, on that Zoom call, the Data Colada team had some more news for Max Bazerman.
BAZERMAN: So after they presented the insurance evidence to me, they said, “And now for worse news.” That’s when they introduced the allegation of data fabrication in one of the lab studies.
“One of the lab studies” meaning one of the separate studies in the same paper, whose data had come from Francesca Gino. Data Colada said they had found serious problems with her data, too.
BAZERMAN: And to make matters worse, there was evidence of data fabrication in three other projects that Gino was a coauthor of.
DUBNER: And when you say they said, “Now for the worse news,” I’m assuming — but tell me if I’m wrong — they’re saying that because they know that you have had a long and close relationship with Francesca Gino?
BAZERMAN: Yeah, and I was clearly more closely connected to those lab studies.
DUBNER: And you’re coauthor on more than one of the papers?
BAZERMAN: I was only a coauthor of one of the four papers that they were showing me. But I, by then, had been a co-author of eight different empirical papers that have Francesca Gino as a coauthor.
DUBNER: So what happened next on the Zoom call?
BAZERMAN: So they provide the evidence, and so I’m now aware that there is what Data Colada later called a clusterfake — there’s at least two frauds in the same paper, or that that was likely to be the case. And then Data Colada basically said, “So, Max, you’re the Harvard professor. We think that Harvard should have access to this information. Are you the person who will take it to them?” And I said, “No thank you.”
DUBNER: Because why?
BAZERMAN: Because I didn’t — I certainly thought that Harvard should be aware of what I was looking at, but, um, I didn’t want to play a central role in making that happen.
Data Colada had begun investigating Francesca Gino after they received a tip from a graduate student named Zoe Ziani, and another, anonymous researcher. In addition to the sign-at-the-top paper, the Data Colada team wrote, “We believe that many more Gino-authored papers contain fake data. Perhaps dozens.”
NELSON: Professor Gino has indicated that she has done nothing wrong. And we have said that the data in those four papers contain evidence that strongly suggests that there is fraud.
That, again, is Leif Nelson, one of the three Data Colada researchers, along with Uri Simonsohn and Joe Simmons.
NELSON: Bridging the gap between those two positions is this other entity: Harvard University. And they have said, although we — Joe and Uri and I — have not had access to any of their internal investigation documents, we only know what they’ve said outwardly, which is that they’ve put her on administrative leave, and they’ve recommended the retraction of those four of her papers, or the retraction of three plus an amendment to a previously retracted paper.
And how confident are the Data Colada researchers in the accuracy of their analysis? Here’s Simonsohn:
SIMONSOHN: So I would say I have — on the falsity of the findings, I don’t have reasonable doubt.
Not long after Harvard Business School placed Francesca Gino on leave, she filed a lawsuit.
SIMONSOHN: So, we were sued, together with Harvard, for $25 million. We were sued for defamation.
Again, Francesca Gino declined our request for an interview. On a website called Francesca v. Harvard, she wrote “I absolutely did not commit academic fraud. … Harvard has ruined my career, wrongfully. … The only way to right this wrong is for me to sue Harvard.” As for Data Colada, Gino wrote: “The decision to sue Data Colada was more difficult. … I have long admired Data Colada’s work … I have particularly respected its commitment to sharing any negative findings with the author before going public with the charges. Yet in my case, Data Colada … changed its procedure.” Francesca Gino also claimed, in her lawsuit, that “Harvard Business School discriminated against [her] on the basis of sex,” and later her allies wondered why Gino’s punishment had been so prompt and severe when, for instance, Harvard President Claudine Gay hadn’t been disciplined immediately for outright plagiarism in her research — but then, as you likely know, Gay was recently forced to resign from the presidency. Yet more academic fraud in the headlines! I asked Leif Nelson how it felt when he heard that Francesca Gino was suing him and the other members of Data Colada.
NELSON: Certainly scary. Scary because it’s so unfamiliar. I found out from talking — basically I was exchanging emails with a reporter. And so between these emails, she came back to me and was like, “Well, now, given the lawsuit, would you like to add a new comment?” And I was basically like, “What? What lawsuit are you talking about?” And so it’s this devastating thing to be like, “Oh my God, the whole house is collapsing, and no one told me.”
SIMMONS: Look, I definitely had moments after this where I was scared for myself and my family.
And that is Joe Simmons.
SIMMONS: Like, just the amount of money involved, I didn’t quite appreciate that at the very first moment. I mean, this is not the way to adjudicate these things. There’s a million chances to show we’re wrong. A million, like earlier. A lot. There’s a lot of chances. And this is what you’re going to do? You’re going to sue three individuals for $25 million? That seems — that seems not great.
The Data Colada team found out it’s expensive to defend yourself in a lawsuit like this. Some colleagues set up a GoFundMe campaign. Here is Simonsohn:
SIMONSOHN: Within 24 hours, they had $200,000. We found a First Amendment expert lawyer who is representing us. We’ve learned a lot of the boring stuff that happens with lawyers that you don’t get from TV shows, like the timelines and the language and how long the judge — like, things take forever. I mean, makes academia seem expedient in comparison. It’s not just nice to have money, but it’s also nice to know that thousands of people are willing to at least somewhat publicly support what you’re doing. So that was a big boost.
SIMMONS: My very first thoughts were like, “Oh my God. How is anyone going to be able to do this again?” If you can get sued for conducting these kinds of investigations — and we are business school professors at very good institutions that have a lot of resources, we’re definitely in a position to bear the brunt of this more than, say, the average person in the field. And so just the chilling effect on scientific inquiry and criticism. It’s like, “Is everything we’ve been working for for ten years, is that now gone?”
VAZIRE: Our field doesn’t have a culture of open criticism. It’s not considered okay.
And that is Simine Vazire.
VAZIRE: If I name a specific theory or finding, that’s considered a personal attack on the people associated with that theory or finding. Even if I don’t talk about the people behind it, and that’s considered not okay.
Vazire is a psychology professor at the University of Melbourne. She’s also the new editor-in-chief of the top journal Psychological Science, and she has already been a central figure in the push to reform behavioral science. It was Vazire who set up the GoFundMe campaign for the Data Colada team after Francesca Gino sued them.
VAZIRE: I think one thing we can say for sure is that humans are quite good at self-deception. There’s a lot of reasons why researchers want to believe that they have found the answers to these problems, right? One is a pro-social reason. They want to help solve these problems. They want to help people. Another one is more self-interested, that they want a seat at the policy table. They want attention for themselves. They want to promote their theory and their brand. And it’s also a kind of a survival thing. To stay in academia, to be able to continue to do the research, you need to have successes, and those successes often mean selling your work, and sometimes overselling your work. We want to be taken seriously as scientists and be scientific, and that means being calibrated, and careful, and not exaggerating. But at the same time, the people who do exaggerate are probably going to get more of those successes that get them attention, get them a seat at the table, get them the next grant, the next job, and so on.
DUBNER: When you describe the incentives like that, it sounds to me as if those incentives conspire against the scientific method, do they not?
VAZIRE: Yeah, so if you were just a rational agent acting in the most self-interested way possible, as a researcher in academia, I think you would cheat. I think that is absolutely the way the incentives are set up. I don’t think most people do, but not because of the incentives.
DUBNER: So the more I hear you talk about the, let’s call them perverse or at least mixed incentives that conspire against the pure pursuit of knowledge, even within academia, I just think, if people like you, if fellow scientists can’t successfully evaluate these claims, what’s the public supposed to do? What are we supposed to think when we hear a claim that is often amplified through media? Doesn’t it just make everyone skeptical or even cynical about everything?
VAZIRE: I think that maybe that’s appropriate for fields like psychology, where we’re tackling really messy, complicated things that are multiply determined, that have many causes, it’s hard to measure even one of the causes. And so I think if you hear a claim that sounds too good to be true or sounds too simplistic, I think it’s appropriate to use common sense to be skeptical of it. Generally I think, we’re taught that science can overturn common sense. And you shouldn’t just not believe science just because your common sense goes against it. But I think that should be different for different sciences, and psychology — psychology is just a lot harder to do, and it’s a relatively young science. We haven’t perfected even the measurement of many of these concepts, much less all the other steps to studying them.
DUBNER: Do you think of your field as being in crisis?
VAZIRE: I do.
DUBNER: What kind of crisis? I mean, we hear the phrase “replication crisis,” which is a specific response to a fear that research is garbage, but I assume it’s broader than that?
VAZIRE: Yeah, I mean, I think to our credit, it’s broader than that. I think that our field is really in a period of intense self-examination. It’s going to tell the world, I think, how committed we are to scientific values, how we deal with this crisis. There’s so many branches to this crisis, and I think it’s just a crisis of, I don’t know, integrity, or credibility, or whatever the most fundamental word you could use to describe what it means to be scientific. There was a lot of debate in the early days of the crisis of whether we should air our dirty laundry in public. The people who were arguing against it, I lost a lot of respect for them. But the other side won. We did air it in public. And I think that we deserve some credit for that. I don’t think we should rest on our laurels there. Identifying the problem is not the same thing as changing our practices. We should do some soul-searching about how we got there and how we can prevent that from happening again.
Max Bazerman has got a head start on the soul-searching.
BAZERMAN: You know, I’ve been connected to so many research projects where, as the most senior member of the team, I never looked at the database. So, does that make me guilty of something? I think yes. I think that it makes me complicit that I didn’t do more verification, that I thoroughly trusted. And I think that I should be somewhat accountable for the fact that I didn’t do a better job of verifying, not just in the signing-first paper, but in my research more broadly. And when — you know, in the signing-first paper, I was bothered by some aspects of the data, and I asked a lot of questions back in 2011, but I eventually got an answer that I wanted to be true, and I accepted it. And I never looked at that database at that time either. Do I wish I did that? Absolutely. I’m not sure that this story would be unfolding today had I looked at that data in 2011. There’s so many positive influences that social science could have, from getting us to eat healthier foods, to exercising more, to saving for our retirement. And social scientists have been remarkably good at helping figure out how do we get people moving in the right direction. And the vast majority of that work is honest, credible research that we should pay attention to. And if we now end up having sort of the big story being fraud in social science, then all of the credible stuff is going to have less value and less impact than it should.
Coming up next time, we are going to keep this conversation going, but from some different angles. Including: the money! You may have heard of diploma mills, or puppy mills — but how about research-paper mills?
Ivan ORANSKY: It could be anywhere from hundreds of dollars to even thousands of dollars per paper. And they’re publishing tens of thousands and sometimes even more papers per year. So you can start to do that math.
We will start it, maybe even finish it. That’s next time. Until then, take care of yourself and, if you can, someone else too.
* * *
Freakonomics Radio is produced by Stitcher and Renbud Radio. This episode was produced by Alina Kulman. Our staff also includes Eleanor Osborne, Elsa Hernandez, Gabriel Roth, Greg Rippin, Jasmin Klinger, Jeremy Johnston, Julie Kanfer, Lyric Bowditch, Morgan Levey, Neal Carruth, Rebecca Lee Douglas, Ryan Kelley, Sarah Lilley, and Zack Lapinski. Our theme song is “Mr. Fortune,” by the Hitchhikers; all the other music was composed by Luis Guerra.
- Max Bazerman, professor of business administration at Harvard Business School.
- Leif Nelson, professor of business administration at the University of California, Berkeley Haas School of Business.
- Brian Nosek, professor of psychology at the University of Virginia and executive director at the Center for Open Science.
- Joseph Simmons, professor of applied statistics and operations, information, and decisions at the Wharton School at the University of Pennsylvania.
- Uri Simonsohn, professor of behavioral science at Esade Business School.
- Simine Vazire, professor of psychology at the University of Melbourne and editor-in-chief of Psychological Science.
- “More Than 10,000 Research Papers Were Retracted in 2023 — a New Record,” by Richard Van Noorden (Nature, 2023).
- “Data Falsificada (Part 1): ‘Clusterfake,’” by Joseph Simmons, Leif Nelson, and Uri Simonsohn (Data Colada, 2023).
- “Fabricated Data in Research About Honesty. You Can’t Make This Stuff Up. Or, Can You?” by Nick Fountain, Jeff Guo, Keith Romer, and Emma Peaslee (Planet Money, 2023).
- Complicit: How We Enable the Unethical and How to Stop, by Max Bazerman (2022).
- “Evidence of Fraud in an Influential Field Experiment About Dishonesty,” by Joseph Simmons, Leif Nelson, and Uri Simonsohn (Data Colada, 2021).
- “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant,” by Joseph Simmons, Leif Nelson, and Uri Simonsohn (Psychological Science, 2011).