This week, U.S. News and World Report released its list of Best Hospitals for 2022. Every year, the publication attempts to rank and rate health care centers in this country. It’s a monumental task for data journalist Ben Harder and his team.

HARDER: The way we rank hospitals is by studying tens of millions of receipts that the federal government receives from the several thousand U.S. hospitals each year.

Their work draws both praise and criticism, depending on where a hospital might fall on the list. It also raises a lot of questions. Like, how do you compare very different kinds of hospitals? And how do you get people to trust your rankings?

Some people don’t trust them — and some argue that they do more harm than good. Dr. Karen Joynt Maddox of Washington University in St. Louis has “rated the raters.” And, she’s skeptical.

MADDOX: Even with all the data in the world, you couldn’t get it right.

From the Freakonomics Radio Network, this is Freakonomics, M.D. I’m Bapu Jena. I’m an economist and I’m also a medical doctor. Each episode, I dissect an interesting question at the sweet spot between health and economics. Today on the show, we’re going to talk about hospital rankings. What do they get right, and what do they get wrong, when it comes to hospital quality? We’ll talk to Ben Harder about how he tries to get an accurate picture of what hospitals are doing, and to Karen Joynt Maddox about why that’s so hard to do.

HARDER: I’m Ben Harder. I’m the Chief of Health Analysis at U.S. News and World Report. I am a data journalist by background. With data journalism we’re focused on understanding how data can shed light on the questions we’re asking. It’s not all that different, actually, than being a healthcare economist, perhaps. I may be a little bit more of a, you know, a one-trick researcher, because we’ve been doing the same study for 33 years.

Every year since 1990, U.S. News and World Report has released its list of Best Hospitals around the country. There’s the Honor Roll, which designates the 20 highest-performing hospitals overall. Then there are rankings for particular specialties, like cancer and orthopedics. Hospitals are also rated on how well they perform certain procedures and treat certain conditions.

Ben Harder has been overseeing the U.S. News Best Hospitals list for more than 12 years. A lot has changed in medicine during that time, and Ben and his team have tried to keep up. This year that includes a new rating that almost every hospital will be paying attention to. But first, I had a really straightforward question: who are hospital rankings for?

HARDER: They’re for consumers. And the reason hospitals pay a lot of attention to them is that consumers pay a lot of attention to them. We’re helping them make data-informed decisions. It’s what I call patient decision support.

JENA: We don’t have decision support for other things we do, right? What is it about hospital care that you think is fundamentally different?

HARDER: Medicine is kind of a black art to the average patient, whereas, you know, most of us would feel comfortable rating the swim goggles we got on Amazon. When it comes to healthcare, it’s more challenging. It’s hard for an individual patient to tell, “Did I get the right care? Did I get the right care, at the right time, in the right place?” So, that’s why we can bring these other data sets and these other methodologies to provide them with a more comprehensive answer.

JENA: How do you envision people actually using these rankings?

HARDER: Great question. So, most healthcare is delivered locally. And it should be. If you need a hip replacement or even heart surgery, there is most likely a hospital in your community, or nearby, that will be a good choice for you. We are not looking to, you know, send people traveling across the country for routine care. Certainly, geography limits our choices and our health insurance can also limit our choices in certain ways. We’re using Medicare data, and Medicare beneficiaries do actually have quite a bit of hospital choice, even within their region or their community. And so, we want to make sure that when they’re choosing, they understand which hospitals have strong quality in the service they need, and which ones may not have the same strength.

JENA: When you do the hospital rankings, how do you make sure that what you’re doing is in line with what the most recent evidence and methodological knowledge is?

HARDER: I read a lot of medical journals, particularly those that focus on hospital quality and understanding variations in care and disparities in care. I also talk to academics, to researchers who have studied the sorts of data that we study. We learn a lot from them. We also talk a lot to hospital leaders, which I think is sort of a peer review on steroids, right? We publish our methods openly, and then we take feedback from anybody who wants to give it.

JENA: Tell me a little bit about the methods that you use.

HARDER: So, the way we rank hospitals is by studying tens of millions of receipts that the federal government receives from the several thousand U.S. hospitals each year. So, every time a person who has Medicare as their insurance gets hospitalized, the hospital charges the U.S. taxpayer for their care, and then we at U.S. News get a copy of the resulting receipt. Those receipts give us important details about each patient. We know how old they are. We know what diseases they had. We know what procedures doctors used to treat them. We know what happened to them after they were discharged from the hospital. So, you might think at first, “Well, receipts, that doesn’t sound very helpful in understanding hospital quality.” But it’s actually a very powerful tool. And I think an analogy might help your listeners here.

Imagine for a moment that you had millions of receipts for meals eaten in various restaurants. One family of four goes to the same restaurant every Friday night and they get a couple of cheeseburgers and a couple items off a kids’ menu. But every Friday, like clockwork, they go back to the same restaurant and they order that cheeseburger. You might infer that that family likes that cheeseburger. Multiply that across thousands of families, and you’ll pretty quickly get an idea of which restaurants make good cheeseburgers and which ones don’t. Imagine a couple members of the same family go to a different restaurant, once a year on the same date. And they splurge on a multi-course meal and a pricey bottle of wine. You get a picture of that restaurant already from just the behavior of the diners who are using it and what they do there. In both cases you get a sense of what kind of meal the restaurant serves, what kind of clientele it might have, just from studying the receipts. And when you add millions of data points like that together, if you analyze them in a thoughtful and sophisticated way, you can infer a great deal about which hospitals are good and what they’re good at.

JENA: I feel like there is a difference though, between that analogy and how something like that gets operationalized in the real world. You know, we have things like Yelp that provide consumer-based reviews of restaurants. But that’s different than what you do because you’re trying to actually measure the quality of Hospital A and Hospital B.

HARDER: When we are evaluating hospitals, we ask not just which hospitals are good, but what they’re good at, right? So, if a patient needs knee replacement, for example, we want to identify which hospitals are likely to get the best outcomes for them: lowest chance of mortality, lowest chance of ending up in the emergency room the next day, lowest chance of an infection resulting from the surgery. We also look at patient experience. So, there is actually a national survey of patients who’ve been in hospitals that is conducted by the federal government. And so, we take into account whether patients tend to have a good experience or a bad experience in a particular hospital.

JENA: So, there are actually a number of different organizations, including the federal government through the Centers for Medicare and Medicaid Services, or C.M.S., that rate hospitals. I’m curious as to how the U.S. News approach differs from C.M.S. and other ranking systems.

HARDER: We use some of the same data that C.M.S. uses. And yet we do arrive at quite different answers about hospital quality, in some cases. And I think there are a couple of reasons for this. For one thing, U.S. News focuses on specific services that hospitals provide. So, we build several different indicators of quality around a particular service, say heart attack care or stroke care, and evaluate those hospitals in that service line. C.M.S. takes a broader approach, looking at a bunch of different measures and kind of mushing them together into an overall assessment, which I think is less meaningful for patients because it doesn’t help them make the decision that they’re facing, which is: “Where do I go to get care for the thing that’s ailing me?”

The other difference is that we get lots of input, both from researchers and from clinicians and hospital leaders. And as a result, we’ve really been able to refine our methods over many years. Whereas I think C.M.S. has a bit more inertia when it comes to making improvements to its methodology. And I think one of the limitations that C.M.S. has to deal with is that it has a bunch of measures that it uses that are not very strong measures. They actually probably provide misleading information about hospital quality. An example of this would be infection rates. You’d think infection’s something you want to avoid, so a hospital that reports a higher infection rate is a worse hospital. Makes sense. But in fact, what the evidence shows is that hospitals that report higher infection rates are tracking their infections better, and they actually seem to be making more progress at reducing infections, whereas some other hospitals may think they have low infection rates and may tell the government they have low infection rates, but actually they’ve got raging problems with infections that they’re just unaware of, and they’re not doing anything to remediate.

JENA: When I look at the hospital rankings, I sort of think: “Is there really a meaningful difference to me between a hospital that’s ranked two or three versus six or seven?” And I’m curious, what do you think is the discriminatory ability of the rankings?

HARDER: Yeah, that’s a great question. I would not make much of the difference between the number five and number six hospital, or the number 30 and number 31 hospital. In each specialty we identify 50 best hospitals, and we are quite convinced that those 50 hospitals are significantly better than your average hospital that you might go to for similar care. Now, is the 51st hospital any different than the top 50? Maybe not. Those thresholds are somewhat arbitrary, but I think the rankings give people sort of continuous information that is useful from a heuristic standpoint. Yeah, the top 10 are really, really good. And the top 50 are exceptionally good, too.

JENA: In this year’s rankings you’re including a new health-equity measurement. How does it change the rankings?

HARDER: Social disparities, racial disparities, economic disparities are the most important issues of the day when it comes to evaluating healthcare. Patients end up being assigned — in a sense — to different hospitals, whether you consider that patient choice or a, you know, result of historical structural-racism and contemporary structural-racism that pushes certain types of patients away from certain hospitals. There is a great deal of segregation by dimensions of race, of socioeconomic status, of language, within our healthcare system today, even though hospitals have been forbidden from being segregated for more than 50 years. As a result, there’s a great deal of opportunity for hospitals to be mismeasured if you’re not taking appropriate accounting of the differences in these disparities in the patient populations that they treat.

JENA: And have you gotten any feedback already from health systems about this measure?

HARDER: Ah, we have gotten feedback from health systems. And just to be clear about this, it’s not a factor in our rankings, in the sense that our honor roll is not yet influenced by these health equity measures. We are naming names here. We’re looking at different dimensions of disparities, and over time we will better understand what component of these disparities is attributable to the hospital. So, for example, this year we have identified, by hospital, the racial disparity in outcomes for a number of surgical conditions. So, if a patient has a knee replacement surgery, for example, or a colon cancer surgery, are they more likely to end up needing to come back to the hospital for follow-up care if they’re Black than if they’re white? That disparity exists across the country. And in some ways, it’s actually worse at the honor roll hospitals. It’s clear that across the nation, these disparities are deeply entrenched, very prevalent, and they certainly need to be addressed.

JENA: And so, why have those health-equity measures that you’re reporting not made it into the honor roll rankings yet?

HARDER: When they’re mature, they will be. We’re still working on them. We’re still taking feedback from researchers and members of the public and hospital leaders. You know, I think some hospitals have said, when we pointed out, “Hey, your patient population is much whiter and much wealthier than the surrounding community,” they said, “Well, we really can’t help that. It’s the patient’s choice.” And on some level that may be true, but I think it’s important to understand why patients are choosing, perhaps, to go to one hospital or another. And if they’re Black and they tend to go to this hospital and not that hospital, why is that?

JENA: I mean, is it going to be the case that hospitals that perform very well will fall in their ranking somewhat depending on how much you weight that equity measure?

HARDER: Some of them will, some of them will not, but I think the more important impact is that by making it transparent to the public and making it matter to hospitals, we can help drive healthcare as a whole in the right direction. Last year, when we first debuted our health-equity measures, I spoke with the C.E.O. of Boston Medical Center, Kate Walsh. You know, Boston Medical Center is not necessarily a hospital that foreign dignitaries fly to, but it treats an incredibly diverse population of Boston residents, and its waiting rooms look a lot different than the waiting rooms in neighboring hospitals. And so, I asked her why Boston Medical Center draws Black, Latino, immigrant patients at a higher rate than these other hospitals. And she said, “You know, it’s, it’s not because we’re easier to get to. It’s not because we’re closer to their home. It’s because we have translators for immigrant patients. It’s because we provide a three-day food supply for families who are experiencing food insecurity. It’s because we help find jobs for patients who are unemployed.” And addressing those social determinants of health is something that is really important to many patients. So, I think hospitals make choices, too, and they can make choices to create the right opportunities to draw the patients that they want to serve.

Ben Harder and his team at U.S. News and World Report are trying to help people decide where to get the best care. They’re also trying to accurately assess quality in medicine. It’s not easy work, and Ben knows there will always be things they can do better. Coming up: if hospital rankings and ratings are imperfect, do we need them?

MADDOX: They come to really different decisions about which hospitals are best, which I think points out the trouble with the whole enterprise, really.

I’m Bapu Jena, and this is Freakonomics, M.D.

U.S. News and World Report’s Best Hospitals list for 2022 was published this week and this year, the top five include: Mayo Clinic in Minnesota at No. 1; followed by Cedars-Sinai Medical Center in Los Angeles; N.Y.U. Langone Hospitals in New York; Cleveland Clinic in Ohio; and then Johns Hopkins Hospital in Baltimore and U.C.L.A. Medical Center tied for fifth.

MADDOX: So, you see these rankings out in the world, and you want to use them as a consumer, but at the same time, since quality measurement is what I study, it made me wonder a lot about what are the things that actually underlie these rankings? Why are they like they are? And should they be that way?

That’s Dr. Karen Joynt Maddox, a cardiologist and researcher at Washington University in St. Louis. She thinks a lot about hospital quality: how to measure it, how to improve it. And, she thinks a lot about hospital rankings.

As you heard in my conversation with Ben Harder, U.S. News isn’t the only organization measuring and comparing hospitals. The government-run Centers for Medicare and Medicaid Services issues its own ratings. So do the private company Healthgrades and the nonprofit watchdog Leapfrog.

In 2019 Karen and some colleagues published a review of those four ranking systems called “Rating the Raters.” It was important work that has raised as many questions as it tried to answer. Including this one:

JENA: Do you trust the rankings? On a scale of one to 10, how much would you trust them? Let’s start with the U.S. News rankings.

MADDOX: I’d say a six.

JENA: Six. Okay.

MADDOX: But let me unpack that a little bit. So, I don’t think that looking at the difference between who ranks number two and who ranks number five on those rankings is meaningful. We’re comparing hospitals in different areas with different patients, with different payment systems, with different weather. I mean, who knows? Right?

JENA: Yeah.

MADDOX: And it’s not meaningful for patients either. And I actually don’t think that the primary purpose of those reports, of U.S. News in particular, is for people to make choices between the entities that are listed. I think its primary benefit is to get people talking about quality. So, it’s popular and it’s financially viable because it’s catchy and ’cause people want to compare things. You can compare T.V.’s or microwaves. Like, of course you want to compare hospitals, but I’m not sure how practically useful that actually is. But changing the conversation and saying we should measure this and we should compete is actually incredibly valuable.

JENA: How do you think about quality of hospital care?

MADDOX: Hospital care and quality are concepts that seem so simple. And then when you dig into them, it turns out that they’re not nearly as straightforward as you think they ought to be. So, measuring something like, does a hospital give everyone having a heart attack the appropriate care within the appropriate time, is reasonably straightforward. But then when you think about how does a hospital provide care for diabetes, or what is their care for cancer, it becomes much, much more complicated. I think, probably even with all the data in the world, you couldn’t get it right. And I think it’s because fundamentally people aren’t widgets. And so, when we’re measuring what happens to people with lots of complex diseases who are getting lots of complex care, understanding how much their outcomes reflect what a hospital does versus the community in which someone lives, their ability to access care, all of the other comorbidities they have, all the other complexities in their life, you sort of stop being able to attribute everything to a hospital or a health system. And so, almost no matter how good the data was coming out of a hospital, I’m not sure that you could ever really understand how well a certain healthcare entity delivers care.

JENA: I know you’ve done some work where you’ve actually tried to rate the actual raters.

MADDOX: So, our purpose in undertaking the project to look across the available systems was to sort of get a better handle for who’s doing what well, not to say that anyone was terrific or terrible. And when we ended up digging into the methodology and the way that they’re presented, I think where we landed was actually that all of them left quite a bit to be desired. But they had different strengths and maybe serve different purposes. So, U.S. News and World Report, for example, relies very heavily on reputation. And in doing so essentially ranks the big academic and some non-academic, but big, wealthy, well-resourced hospital systems around the country. And they do so through a combination of pretty methodologically rigorous work around some outcomes, some processes, and then a big black box reputation survey, which is frankly what gets the Mass Generals and the Cedars-Sinais and the big names points, right? Because that’s what people mention is reputation. But that’s really what drives a lot of the rankings.

And so, I think that’s of some utility. If you want to know who are the big leaders in medicine, it’s not unreasonable to use that ranking system. On the other hand, you have something like C.M.S. So, Medicare has a series of star ratings, and they’ve undergone a lot of methodologic change over the last five years, but they essentially are trying to rank everybody on one scale. So, they essentially take a whole bunch of different ratings on safety, mortality, readmissions, processes, et cetera, and they roll them up into a star. And the downside of that was that you actually had a bunch of small hospitals that provided very few services getting the highest rankings, because they didn’t have enough patients to contribute to the tough measures. So, if you’re only graded on 10 out of a hundred questions and you’re graded on the 10 easy ones and you do well, you look terrific. You haven’t actually taken the same test. So, C.M.S. has done some work to try to break hospitals into buckets, to make each group sort of competing against each other seem a little more alike, which has helped.

But the biggest limitation for them has really been trying to figure out how to compare hospitals that are so very different from each other on a single scale. Then there’s Leapfrog, which is very focused on patient safety, and so has the pros of having some information about safety that none of the other groups do. And then Healthgrades: the way that their methodology works, because they tend to reward a lot of individual things, is that you see a lot of winners who are very different. There’s like 700 “Best 100 Hospitals in the U.S.” And some people might say that’s actually real. That’s how quality works. Places are good at individual things and you should get community hospitals showing up as the safest and the best for some, you know, relatively straightforward procedure. So, those are the four biggies and they really do have very offsetting strengths and weaknesses, but they come to really different decisions about which hospitals are best, which I think points out the trouble with the whole enterprise, really.

JENA: Do you think that the ranking systems, whether U.S. News and World Report or C.M.S., have one of the intended effects which is to get hospitals to improve?

MADDOX: I think there’s good evidence that they don’t drive a dramatic improvement in patient outcomes.

MADDOX: People have been pushing towards improving care for decades. Millennia, probably in some, in some sense, I mean, this is medicine, right? You keep sort of pushing forward to make people’s care better. But I think where we tend to push is in high tech, exciting new areas. And maybe not as much on, are people washing their hands, right?

MADDOX: And so, that I think is the benefit of these programs is moving the conversation forward on measurement and on the sort of system-ness of all this. But I certainly don’t think we’ve got it optimally done.

JENA: Do you have a sense of what the ranking systems do well? And what do they just not get right at all?

MADDOX: So, I have a slightly different answer than I would have a year ago.

JENA: Okay.

MADDOX: But I’ll start with what they do well. So, I do think there are places where there are good reasonably valid process measures, reasonably valid safety measures, reasonably valid outcome measures. I think U.S. News and World Report does the best job with some of those, because they do some accounting for social risk.

MADDOX: The place where they have really fallen down, in my opinion, is equity. So, the old measures and many of the current measures are not only inequitable, but potentially equity reducing, like actively equity reducing as opposed to being leveraged to try to improve equity.

JENA: Can you unpack that a little bit? Because that’s an important statement and a strong statement. What do you mean by the systems are equity reducing?

MADDOX: So, if you set up strong incentives to make your care and outcomes look better and you don’t appropriately control for how patients differ between hospitals, you set up incentives for people to avoid sick or otherwise high-risk patients. And so, you can set up systems where you don’t control, for example, for poverty. If you don’t control for poverty and you’re holding hospitals accountable for, say, readmission rates, you are going to see worse performance in hospitals that serve a high proportion of poor or otherwise disadvantaged patients. You’re creating disincentives to go find the sickest, most vulnerable patients who need that care the most. We should be doing the opposite, which is to say, how can we incent these big, powerful hospitals and health systems to go find the people who need them and start actually keeping people healthy? Having a set of quality measures that ignores equity and that actually sets up incentives to stay away from high-risk patients, I think, is not how we want to be driving our health systems forward.

JENA: U.S. News and World Report is moving towards including various measures of equality, at least reporting it. It might find its way into the actual rankings at some point in time. Do you think that that’s something that should be weighted heavily in the rankings?

MADDOX: Ideally. You have to say that’s what we’re driving towards, and you have to measure it, and you have to report it. You can’t just take readmissions and say, “Here’s your readmission rate for Black patients. Here is your readmission rate for white patients. And if that’s high, you are bad.” Some of the equity measures that have been shared by U.S. News, for example, look at disparities in preventable hospitalizations in a community. So, you’re looking at the difference for Black versus white patients in St. Louis versus Detroit, versus Denver, versus Miami, versus Portland, Maine. The racial composition of those places, the degree to which residential segregation and other historically racist practices have influenced health outcomes in those places, is very, very different from each other. And so, it creates all sorts of difficult questions about what does equity look like?

JENA: If there’s one thing then that you could do differently with all of the rankings, what would it be?

MADDOX: I’ll take two, if that’s okay.

JENA: Two, you could take 26. Two. Okay. Let’s get two.

MADDOX: So, equity for sure. And I think that includes picking equity-sensitive measures. So, for example, we know that Black patients are much more likely to suffer a disproportionate burden of cardiovascular disease and stroke. So, diabetes and hypertension, chronic kidney disease. Find things where you know that, if we improve, you’re going to disproportionately benefit people who have been so disproportionately harmed in the past. So, that would be No. 1.

The second thing is that, right now, you get a readmission rate or score or a safety score based on three years of data with a two-year lag, without figuring out what to do about it. That doesn’t incent a hospital to start working on it. There is no reason in this day and age that we can’t be getting hospitals near real-time feedback on the stuff that is important.

JENA: Is there a role that patients play in all of this?

MADDOX: I think people expect that you should be able to get better information than you can. Lots of hospitals have gone to something like MyChart, where patients have access to their own data. Now, when I round in the hospital in the mornings, often patients have seen their labs before I have. So, they’re logged into their MyChart and the results pop up as soon as they’re back from the lab. And so, we walk into a room and I’m like, “Oh, we’re waiting to see what your kidney function is.” And the patient will be like, “Oh yeah, my creatinine’s 1.5 this morning, down from 1.7 yesterday.” I’m like, “What?” So, there is actually a degree of engagement and, you know, folks can understand and follow along and sort of be a little bit more part of their care, if we give them the data to do it.

As Ben Harder told us earlier, these rankings are for patients. And according to U.S. News and World Report, nearly 100,000 people visit their Best Hospitals website each day looking for information about health care providers. Karen Joynt Maddox and her colleagues have raised red flags about hospital rankings, but she acknowledges that while we might not learn as much as we’d like to from ranking hospitals, we still need to try to measure how good a job they’re doing.

Figuring out what quality means in medicine seems like it should be easy. There’s a lot of data to work with, and a lot of smart people analyzing it. But hospitals are big busy places, spread out across a large country, and full of all different kinds of patients. As you peruse this year’s annual list of Best Hospitals and wonder, as I did, why Massachusetts General Hospital is not ranked No. 1, you can also maybe appreciate the work that goes into assembling this list—and why the raters need to keep on improving too, just like the hospitals they judge.

That’s it for today’s show. I want to thank my guests Ben Harder and Dr. Karen Joynt Maddox.

JENA: If you had to go to a hospital and be treated by a physician economist, what hospital would you go to?

HARDER: I don’t know. Do physician economists practice?

JENA: Well, they, they call it practice for a reason.

  • Ben Harder, managing editor and chief of health analysis at U.S. News & World Report.
  • Karen Joynt Maddox, professor of medicine at Washington University in St. Louis.