Diversity in Research

A new NBER paper by Richard B. Freeman and Wei Huang looks at the ethnic diversity of research collaborators. They find that papers with more authors in more locations tend to be cited more:

This study examines the ethnic identity of the authors of over 1.5 million scientific papers written solely in the US from 1985 to 2008. In this period the proportion of US-based authors with English and European names fell while the proportion of US-based authors with names from China and other developing countries increased. The evidence shows that persons of similar ethnicity co-author together more frequently than can be explained by chance given their proportions in the population of authors. This homophily in research collaborations is associated with weaker scientific contributions. Researchers with weaker past publication records are more likely to write with members of their own ethnicity than other researchers. Papers with greater homophily tend to be published in lower impact journals and to receive fewer citations than others, even holding fixed the previous publishing performance of the authors. Going beyond ethnic homophily, we find that papers with more authors in more locations and with longer lists of references tend to be published in relatively high impact journals and to receive more citations than other papers. These findings and those on homophily suggest that diversity in inputs into papers leads to greater contributions to science, as measured by impact factors and citations.


Receiving more citations could simply be an effect of being in different social circles.

You cite papers that you know about. You know about most of the papers that your friends and local colleagues publish. (You also know about a smaller proportion of the papers published outside your circles, which you learned about through other people's citations, journals you follow, and searching).

Consequently, if you have three people from three institutions co-author a paper, then you have three institutions' worth of people who might hear about your paper through the local grapevine (plus all the potential citers who learned about it in other ways). If you have three people from one institution co-author a paper, then you have only one institution's worth of people who happen to chat with an author in the hallway.

M.Y. Name

Does the number of references in a paper directly correlate with the impact factor of the publishing journal, regardless of diversity of authorship?


For that matter, are impact factor and quantity of citations a valid measure of greater contribution to science?

Byung Kyu Park

Also, the abstract (not paying $5 for the full text) doesn't say how "weaker" is being determined. If that's just another way of saying a lower impact number (i.e. fewer citations, among others), then it's a *poor* word choice, making it sound like there is some sort of an independent control/variable where there isn't.


Hmm... their methodology is interesting (harvesting author ethnicities from Web of Science by comparing names to addresses), I'll give them that, but based on a brief skim of the paper, I'm not sure I buy most of their assumptions or their conclusions.

Their "homophily index" calculation seems appropriate (basically, the score is higher for ethnic combinations less likely to arise than a random selection of authors). Of course, we have to assume the accuracy of their ethnicity determinations. Even if we do, I am troubled by their assumption that an "English" first name with a "Chinese" last name indicates US-born people of Chinese descent, although I have no data with which to counter that assumption (just my own experience that plenty, though of course not all, of Chinese-born, US-based researchers publish under an adopted "English" first name; I have known some to publish under both). I also think they put too much stock in the ability of addresses published on a paper (which would indicate the authors' institutions, not homes or places of birth) to predict ethnicity.
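To make the idea concrete, here is a toy sketch of what such a score could look like: the observed share of same-ethnicity author pairs on a paper, minus the share expected if co-authors were drawn at random given each group's overall population share. This is an illustration of the general concept, not the authors' actual formula; the function names and example numbers are hypothetical.

```python
from itertools import combinations

def homophily_index(author_ethnicities, ethnicity_shares):
    """Toy homophily score for one paper (hypothetical, not the paper's
    actual index): observed fraction of same-ethnicity author pairs minus
    the fraction expected if two authors were drawn at random from a
    population with the given ethnicity shares. Positive values mean more
    co-ethnic pairing than chance would predict."""
    pairs = list(combinations(author_ethnicities, 2))
    if not pairs:
        return 0.0  # single-author papers carry no pairing information
    observed = sum(a == b for a, b in pairs) / len(pairs)
    # Probability two randomly drawn authors share an ethnicity
    expected = sum(p * p for p in ethnicity_shares.values())
    return observed - expected

# Hypothetical population: 60% ethnicity A, 40% ethnicity B
shares = {"A": 0.6, "B": 0.4}
print(round(homophily_index(["A", "A", "A"], shares), 2))  # all-same team
print(round(homophily_index(["A", "B"], shares), 2))       # mixed pair
```

An all-A team scores 1.0 observed against 0.52 expected (positive homophily), while a mixed A/B pair scores 0 observed against the same 0.52 (negative).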

My major problem, though, is that they are judging the relative "strength" of these papers based on impact factor and number of citations, and doing so for the whole of Web of Science. The glaring flaw in that approach is that impact factor comparisons across different fields are absolutely meaningless (page through the Journal Citation Reports by field, if you don't believe me), which means that their results are all confounded by any correlation there may be between field of research and ethnicity. Essentially, it's like comparing a chemistry paper with a mathematics paper and saying that because the impact factor of the journal the chemistry paper is published in is higher than that of the journal the math paper was published in, the chemistry paper is "stronger". That's just not a valid comparison, and counting the number of citations of the paper itself doesn't help either, for much the same reasons.

That's even ignoring the bias that results from counting review and methods papers (which tend to have higher citations than research papers, without that being an indicator of their strength). I also didn't see (although I may have missed it) whether they used the impact factor of the journal at the time the papers were published, as opposed to the current number (which could make a big difference, and again, would bias by field), and whether they adjust the citation numbers by the age of the paper (if not, new papers would generally appear "weaker").
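The age-adjustment point above is easy to illustrate: raw citation counts mechanically favor older papers, so a common crude correction is citations per year since publication. A minimal sketch, with entirely hypothetical numbers (the function and figures are mine, not from the paper):

```python
def citations_per_year(total_citations, pub_year, current_year=2014):
    """Crude age adjustment for citation counts: raw totals favor older
    papers, so compare citations per year since publication instead.
    All inputs here are hypothetical illustration values."""
    age = max(current_year - pub_year, 1)  # avoid dividing by zero
    return total_citations / age

# A 2005 paper with 90 citations vs. a 2012 paper with 30:
old_paper = citations_per_year(90, 2005)  # 10.0 citations/year
new_paper = citations_per_year(30, 2012)  # 15.0 citations/year
```

On raw counts the older paper looks three times "stronger"; per year, the newer paper is actually cited more heavily, which is the bias the comment is pointing at.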

The other problem is that even if their impact factor and citation count data were legitimate measures of "strength", they still don't really address the quality of the science. Not every useful paper is a blockbuster. Small papers or short communications that may be very useful within their limited scope tend to have shorter author lists, and those authors are more likely to be at the same institution, but this doesn't really speak at all to the overall research collaborations that those researchers may be involved in. In other words, smaller papers with less measurable impact may very well have, on average, closer ethnicities than big, heavy-hitting, multi-institutional papers, but that doesn't necessarily mean what this paper says it means. It may simply mean that "weaker" papers generally have shorter author lists, and those shorter lists may be less ethnically diverse for mostly geographic reasons.


Rolling Eyes

"...we find that papers with more authors in more locations and with longer lists of references tend to be published in relatively high impact journals and to receive more citations than other papers."

Haven't these guys heard of the network effect? Being published in "high impact journals" and receiving more citations is a measure of popularity, not necessarily quality.

Rolling Eyes

Let me clarify: their methodologies notwithstanding, their conclusion is that authors with wider social circles are more popular. Duh!


I would imagine the causality runs in a different direction. Perhaps high inducements are required for authors to work with others farther outside their comfort zone--and one of those inducements might be a highly promising collaboration likely to lead to much-cited research.