It might seem obvious that we want bold new ideas in science. But in fact, really novel work poses a tradeoff. While novel ideas might sometimes be much better than the status quo, they might usually be much worse. Moreover, it is hard to assess the quality of novel ideas because they’re so, well, novel. Existing knowledge is not as applicable to sizing them up. For those reasons, it might be better to actually discourage novel ideas, and to instead encourage slow and incremental expansion of the knowledge frontier. Or maybe not.
For better or worse, the scientific community has settled on a set of norms that appear to encourage safe and creeping science, rather than risky and leaping science.
Blocking New Approaches
How might you identify whether a scientific community blocks new ideas? Merely observing the absence of new ideas isn’t enough; it could be that there just aren’t any new ideas worth pursuing. But what if you could identify sets of scientific fields that are quite similar to each other? Next, imagine you suddenly and randomly exile the dominant researchers in half the fields in order to stymie their ability to block new research. You could then compare what happens in fields that lost these dominant researchers to ones that didn’t. In particular, do new people enter these fields and pursue new and novel ideas? If so, that suggests there were new ideas worth investigating, but the dominant researchers in the field were blocking investigation of them. Alternatively, if new people enter these fields but basically continue doing the same thing as the previous generation, that suggests the opposite; the dominant researchers were not blocking anything and everyone in the field was just pursuing what they thought were the best ideas available at the time.
You can’t really do this experiment because you’re not going to get IRB approval to exile dominant researchers. It’s a bad thing to do. But Azoulay, Fons-Rosen, and Graff Zivin (2019) try to learn something from the sad fact that we live in a world where bad things do happen all the time. They begin by identifying a large set of superstar researchers in the life sciences. It turns out, 452 of these superstars died in the midst of an active research career. This is going to be analogous to the “exile” I described above in this hypothetical idealized experiment, though applying only to one prominent researcher per field, rather than the full set of dominant ones. (Note – as I’ll discuss later, the findings discussed below should not be interpreted as implying these superstars were behaving unethically or something. It’s more complicated than that.)
Azoulay, Fons-Rosen and Graff Zivin next identify a large set of small microfields in which these deceased researchers were active in the five years before their deaths. Each of these microfields consists of about 75 papers on average. For each of these microfields, they find sets of matching microfields that are similar in various ways. Most importantly, these matched microfields are ones that include active participation by a superstar researcher who did not pass away. The idea is to compare what happens in a microfield where a superstar researcher passed away to what happens in similar microfields where a superstar researcher did not pass away.
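The comparison described above can be sketched as a simple difference-in-differences calculation: how much did activity change in microfields that lost a superstar, relative to the change in matched microfields that did not? This is only a minimal illustration. The function name `diff_in_diff` and every count below are invented for the example, and the authors' actual estimation is more involved.

```python
from statistics import mean

def diff_in_diff(treated, control):
    """Average change in treated microfields minus average change in controls.

    Each argument is a list of (before, after) counts of articles by
    non-superstar authors, one tuple per microfield.
    """
    treated_change = mean(after - before for before, after in treated)
    control_change = mean(after - before for before, after in control)
    return treated_change - control_change

# Hypothetical counts, made up purely for illustration.
treated = [(10, 13), (8, 10), (12, 13)]   # microfields where the superstar died
control = [(10, 11), (9, 9), (11, 12)]    # matched microfields where they did not
effect = diff_in_diff(treated, control)
```

A positive `effect` would mean that non-superstar publishing grew more in the microfields that lost their superstar than in the matched ones.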
The first thing that happens is new people do begin to enter the field. Following the death of a superstar researcher, there is a slight increase in the number of new articles published in that microfield by people other than the superstar, as compared to microfields where the superstar lives. These new papers are disproportionately written by people who were not previously publishing in the microfield, rather than by people who were already active in it publishing more.
Most relevant for our inquiry today, these new researchers appear to bring new ideas and new approaches with them. They’re less likely to cite the existing work in this microfield, including the work of the superstar. And their work is assigned a greater share of novel scientific keywords from the MeSH lexicon (a standardized classification system used in biomedicine). Also important – the new work tends to be well cited. There’s a greater influx of highly cited work than low-citation work.
Can we say anything more about the specific mechanisms going on here?
Well, we can say it’s not simply a case of these dominant researchers vetoing grant proposals and publication from rival researchers. Only a tiny fraction of them were in positions of formal academic power, such as sitting on NIH grant review committees or serving as journal editors, when they passed away. So what else could it be? I’m going to tentatively suggest it’s about the dominance of the ideas the superstar researcher promoted. We’ll return to this notion later. But first, let’s look at some evidence about how scientists resist the influence of new ideas in a field.
Citing novel work
Let’s start at the end. Suppose we’ve recently published an article on an unusual new idea. How is it received by the scientific community?
Several papers look into this, but let’s focus on two. Wang, Veugelers, and Stephan (2017) and Carayol, Lahatte, and Llopis (2019) each look at academic papers across most scientific fields, with the former looking at papers published in 2001 and the latter at papers published between 1999 and 2013. They each devise a way to measure how novel these papers are and then look at how novel papers are subsequently cited (or not).
To measure the novelty of a paper, they both rely on the notion that novelty is about combining pre-existing ideas in new and unexpected ways. But they use two different proxies for the ideas that are being combined. Carayol and coauthors use the keywords authors are asked to submit to describe their work when they submit to journals or paper repositories. Since most authors supply more than one keyword, they can look at how frequently a pair of keywords appears (among papers in the same field) as compared to how frequently we would expect them to pair up if keywords were just randomly assigned to papers.
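To make the keyword-pair idea concrete, here is a minimal sketch that compares how often two keywords actually appear together with how often they would if keywords were spread across papers independently. The function name `pair_ratios`, the toy corpus, and the independence baseline are all assumptions for illustration; this is not the exact statistic Carayol and coauthors compute.

```python
from collections import Counter
from itertools import combinations

def pair_ratios(papers):
    """Return observed / expected co-occurrence for each keyword pair.

    papers: list of keyword sets, all drawn from the same field.
    The expected count assumes keywords land on papers independently,
    which is an illustrative baseline only.
    """
    n = len(papers)
    keyword_counts = Counter(k for kws in papers for k in set(kws))
    pair_counts = Counter(
        pair for kws in papers for pair in combinations(sorted(set(kws)), 2)
    )
    ratios = {}
    for (a, b), observed in pair_counts.items():
        expected = keyword_counts[a] * keyword_counts[b] / n
        ratios[(a, b)] = observed / expected
    return ratios

# Toy corpus (hypothetical keywords).
papers = [
    {"inflation", "interest rates"},
    {"inflation", "interest rates"},
    {"inflation", "tariffs"},
    {"tariffs", "exports"},
]
ratios = pair_ratios(papers)
```

In this sketch a ratio below 1 marks a pairing that shows up less often than chance would suggest, so a paper using that pair would count as more novel.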
Wang and coauthors instead use the references cited as a proxy for the ideas that a paper grapples with, and look for papers that cite pairs of journals that have not previously been jointly cited. The 11.5% of papers with at least one pair of journals never previously cited together in the same paper are called “moderately” novel in their paper.
But they also go a bit further. Some new combinations are more unexpected than others. For example, it might be that I am the first to cite a paper from a monetary policy journal and an international trade journal. That’s kind of creative, maybe. But it would be really weird if I cited a paper from a monetary policy journal and a cell biology journal. Wang, Veugelers, and Stephan create a new category for “highly” novel papers, which cite a pair of journals that have never been cited together in the past, and also are not even in the same neighborhood.
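The two tiers can be sketched roughly as follows. This is a simplified illustration, not the authors' procedure: the function `classify_novelty` is hypothetical, and a plain field label per journal stands in for the notion of two journals being in the same "neighborhood".

```python
from itertools import combinations

def classify_novelty(cited_journals, seen_pairs, field_of):
    """Classify a paper by the pairs of journals it cites.

    cited_journals: set of journal names cited by the paper.
    seen_pairs: set of frozensets, journal pairs already cited together.
    field_of: dict journal -> field label (a crude, assumed stand-in
        for whether two journals are in the same neighborhood).
    """
    new_pairs = [
        frozenset(pair)
        for pair in combinations(sorted(cited_journals), 2)
        if frozenset(pair) not in seen_pairs
    ]
    if not new_pairs:
        return "not novel"
    # A new pair that also spans two different fields counts as highly novel.
    if any(len({field_of[j] for j in pair}) == 2 for pair in new_pairs):
        return "highly novel"
    return "moderately novel"

# Hypothetical journals mirroring the example in the text.
field_of = {
    "monetary policy journal": "economics",
    "macro journal": "economics",
    "trade journal": "economics",
    "cell biology journal": "biology",
}
seen_pairs = {frozenset({"monetary policy journal", "macro journal"})}

familiar = classify_novelty({"monetary policy journal", "macro journal"}, seen_pairs, field_of)
moderate = classify_novelty({"monetary policy journal", "trade journal"}, seen_pairs, field_of)
high = classify_novelty({"monetary policy journal", "cell biology journal"}, seen_pairs, field_of)
```

Here the monetary-plus-trade paper comes out as moderately novel and the monetary-plus-cell-biology paper as highly novel.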
Echoing what I said at the beginning, highly novel work seems desirable; it’s much more likely to become one of the top cited papers in its field by either measure. Both figures below sort papers from least to most novel as we move from left-to-right, and give the proportion of papers in a given category that are among the top cited in their field (top 1% cited left, top 5% cited right).
But even though published highly novel work tends to attract more citations, papers that are really novel face some challenges in the publication game. Both Wang, Veugelers and Stephan (2017) and Carayol, Lahatte, and Llopis (2019) try to look at the prestige of the journals that ultimately publish highly novel work. They both use the impact factor of journals as a proxy for their prestige: this is meant to capture the average number of citations an article published in the journal receives over some time span.
At first, all seems well: Carayol, Lahatte, and Llopis find that the most novel work is actually more likely to be published in the highest impact journals, and that the average novelty of papers goes up with the impact factor. High impact journals like novel work! But Carayol, Lahatte, and Llopis find evidence that they don’t really like it enough, at least if their goal is to promote the most highly cited work. Across a large swath of journals, novel papers tend to get about 10% more citations than the average article published in the same journals. To oversimplify, that means that if we think citations are a good proxy for the quality of an article, it’s like a novel paper has to be 10% better than its more conventional peers in order to get into the same journal as them.
Wang, Veugelers, and Stephan’s evidence on this point is even stronger. They show moderately and highly novel papers are actually just less likely to be published in the best journals (as measured by the journal impact factor). Between the two, that suggests to me that highly novel work is also less likely to be published at all, and so the results we’re seeing might be over-estimating the citations received, since they rely on novel work that was good enough to clear skeptical peer review.
Wang, Veugelers, and Stephan additionally present evidence that one reason for this publication penalty might be that peers in your home field are less likely to recognize the merits of highly novel work (at least measured by unusual combinations of cited journals). In fact, this recognition disproportionately comes from other fields. Restricting attention to citations received from within the same field, novelty isn’t rewarded at all! (Carayol and coauthors do not investigate which field cites papers).
Looking across all fields, Wang, Veugelers, and Stephan additionally find these citations take longer to roll in: restricting attention to citations received in just the first three years, again, novelty isn’t really rewarded at all. That said, this finding appears to depend on how you measure novelty. Carayol, Lahatte, and Llopis do not find novel papers only get their citations with a delay. The papers may be picking up different flavors of novelty, with Wang, Veugelers, and Stephan specifically focusing on the novelty of using ideas that come from other fields (whose journals are rarely cited) while Carayol, Lahatte, and Llopis are not.
From Citations to Grants
All this is important because the reception of your prior work plays a role in your ability to do future work. Li (2017) provides evidence on this by looking at about 100,000 NIH grant applications filed between 1992 and 2005. Funding for research from the NIH is limited - only 20% of applicants succeed - and so the NIH tries to prioritize funding projects that will have the biggest impact. The trouble here is that assessing the quality of frontier research is hard; those with enough knowledge to serve as good evaluators tend to be people who are also using that knowledge to actually do frontier research. Accordingly, grant applications are assessed by a review committee drawn from other active researchers in the area. They’re looking for proposals they think will generate new and significant scientific findings, taking into account the proposed research questions, methodologies, and capability of the researcher to follow through.
But what constitutes the most impactful work? Li begins by showing a proposal is more likely to be funded if the reviewer has previously cited the applicant’s prior work. Across applicants who are judged by the same committee, and have similar numbers of prior citations and grant awards and similar coarse demographic characteristics, the probability of funding increases by 3.3 percentage points for every (permanent) committee member who has cited the applicant’s prior work. That’s not too surprising on its own; if reviewers liked the applicant’s prior work enough to cite it, they probably like proposals for more work in the same vein.
But this result does imply something worrying. Proposals are not assigned to reviewers at random. Instead, they are usually given to whoever has the closest and most relevant expertise. That means it’s likely that the reviewers of a grant application will be researchers drawn from the applicant’s own scientific field. And if the applicant has a history of doing highly novel research, Wang, Veugelers, and Stephan’s work suggests these reviewers are less likely to have previously cited it. And Li (2017) suggests that means these applicants will face a tougher time getting funding.
Ayoubi, Pezzoni, and Visentin (2021) provide further evidence those concerns are warranted. They look at 775 scientists who applied for a grant from the Swiss SINERGIA program over 2008-2012. For each scientist, they assess their prior work for novelty of the sort examined by Wang, Veugelers, and Stephan (2017) - did the scientist publish any papers that cited highly unusual combinations of journals in the preceding three years? They then show scientists with a novel publication are less likely to win a research grant than more conventional scientists who have an otherwise similar professional track record.
More direct evidence of bias against novel ideas comes from Boudreau et al. (2016). They convinced a university to run an experiment in grant-making for them, getting 150 applicants to submit a short proposal for a research project on endocrine-related disease, dangling a $2,500 grant for successful applicants and a higher probability of winning a second stage grant with significantly more money available. Boudreau and coauthors use a strategy similar to that of Wang, Veugelers, and Stephan to measure the novelty of these proposals, looking for the number of novel combinations of scientific keywords (again, from the MeSH lexicon) attached to the grant applications. While a little novelty is desirable, overall reviewers looked less favorably on more novel proposals.
Where does anti-novelty bias come from?
Scientists are curious folk; why would they be biased against novel ideas?
People have suggested several possibilities:
Scientists themselves might simply be risk averse
The way peer review scores are aggregated could induce biases against polarizing (and novel?) work
When sharing views, it might be more common to revise views down to the lowest assessment, rather than up to the highest one
Li (2017) and Boudreau et al. (2016) both find some evidence that this anti-novelty bias stems less from malice than from a greater ability of reviewers to identify high quality proposals when they are closer to their existing knowledge. In other words, when proposals are less novel and closer to existing work, reviewers have an easier time separating the really good proposals from the mediocre and poor ones. But when proposals are further from existing work, it’s hard for reviewers to separate the good from the bad, and all such proposals are more likely to be judged as just average. But when grant making is really competitive, only applications judged as being really good might be funded, leaving the hard-to-evaluate novel proposals persistently underfunded.
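One way to see why hard-to-evaluate proposals end up judged as "just average" is a textbook noisy-signal model. This is an assumption introduced here for illustration, not something taken from either paper. Suppose a reviewer sees a signal equal to true quality plus noise, and the noise is larger for proposals far from the reviewer's expertise. A Bayesian reviewer then shrinks noisier signals further toward the average. All names and numbers below are hypothetical.

```python
def posterior_mean(signal, prior_mean, quality_var, noise_var):
    """Normal-normal Bayesian update: shrink a noisy signal toward the prior mean."""
    weight = quality_var / (quality_var + noise_var)
    return prior_mean + weight * (signal - prior_mean)

# Two proposals send the same strong signal (2 units above average),
# but the reviewer reads the novel one with much more noise.
familiar = posterior_mean(signal=2.0, prior_mean=0.0, quality_var=1.0, noise_var=0.25)
novel = posterior_mean(signal=2.0, prior_mean=0.0, quality_var=1.0, noise_var=4.0)

funding_bar = 1.0  # hypothetical cutoff when only the top proposals get funded
familiar_funded = familiar >= funding_bar
novel_funded = novel >= funding_bar
```

With these made-up numbers the familiar proposal is assessed at 1.6 and clears the bar, while the novel one is assessed at 0.4 and does not, even though both sent the same signal.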
To show reviewers are better able to identify the quality of less novel papers, Li (2017) and Boudreau et al. (2016) need some kind of “objective” measure of the quality of grant proposals. This isn’t possible, but each paper uses an interesting proxy. In Boudreau et al. (2016)’s experiment with grant-making, they created a way to measure the reviewer’s intellectual “distance” from the proposal, based on the MeSH keywords describing the proposal and MeSH keywords attached to the reviewer’s prior published work. Then, rather than assigning each grant application to the reviewers with the most relevant prior expertise, they randomized who the reviewers were. Each proposal was reviewed by 15 reviewers, and Boudreau and coauthors compare the score assigned by the reviewer who is “closest” to the proposal (i.e., has the most relevant domain expertise) with the average of all the other reviewers. In general, they find there was more disagreement between these expert reviewers and everyone else about which proposals should get high scores rather than low ones. Those with close expertise seemed to know “something” that led them to more divergent conclusions about high scoring proposals.
I think what might be going on is even more clearly illustrated in Li (2017). Li exploits an unusual feature of the grant process to try and find an “objective” measure of NIH grant proposal quality. In the process of preparing an NIH grant proposal, researchers do a lot of work that is publishable. Indeed, Li argues for the period she studies, the arms race in NIH application quality had progressed to the point where the majority of the application was actually based on work that was nearly completed and ready for publication. This means a lot of the material that was in the grant application is already barreling towards publication, whether or not the proposal is funded (presumably the proposal builds on and extends this work). Li tries to identify this proposal material that is spun out into academic articles by looking at work from the lab that is published in a short interval after the proposal is submitted but before new experiments could plausibly have been completed, and which is about the same topic as the grant application (as judged by shared keywords). She then looks at how well this “spinoff” work gets cited. This provides a proxy for the quality of the grant applications, both those that are ultimately funded and those that are not.
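The spinoff-matching step can be sketched as a filter on publication date and keyword overlap. The one-year window, the two-keyword threshold, the field names, and the function `find_spinoffs` are all hypothetical choices made for this example; the text above does not give Li's exact cutoffs.

```python
from datetime import date, timedelta

def find_spinoffs(submitted, app_keywords, publications, window_days=365, min_shared=2):
    """Return publications that plausibly grew out of the application itself.

    A publication qualifies if it appears within `window_days` after the
    submission date and shares at least `min_shared` keywords with the
    application. Both thresholds are illustrative assumptions.
    """
    cutoff = submitted + timedelta(days=window_days)
    return [
        pub for pub in publications
        if submitted <= pub["published"] <= cutoff
        and len(app_keywords & pub["keywords"]) >= min_shared
    ]

# Toy data, invented for the example.
app_keywords = {"insulin", "beta cells", "signaling"}
publications = [
    {"title": "P1", "published": date(2001, 6, 1), "keywords": {"insulin", "beta cells"}},
    {"title": "P2", "published": date(2001, 8, 1), "keywords": {"zebrafish", "imaging"}},
    {"title": "P3", "published": date(2004, 1, 1), "keywords": {"insulin", "signaling"}},
]
spinoffs = find_spinoffs(date(2001, 2, 1), app_keywords, publications)
spinoff_titles = [pub["title"] for pub in spinoffs]
```

Only P1 survives here: P2 is on a different topic and P3 appears too late to be work that was already underway at submission.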
The takeaway is well summarized in the following figure. On the vertical axis, we have the probability an application gets funded, after taking into account the applicant’s prior record (how much they’ve been cited, how many grants they’ve won, their demographic and educational background, and so on). On the horizontal axis we’ve got an estimate of the number of citations spinoff work from the grant application receives. The scatterplot drifts up and to the right, indicating that grants that are higher quality (i.e., spinoff publications get more citations) are more likely to get funded. But Li divides these into two groups. In red, we have applications that have also been cited by a permanent member of the grant review committee, and in blue we have ones that weren’t. Note for the red dots there is a stronger relationship between quality and funding than for the blue. If the reviewers have previously cited your work, they are better at discerning high and low quality proposals, compared to when they have not cited it.
One way to interpret this is that committee members are better able to judge quality for work that is closely related to their own work. If you work on similar stuff as a committee member, you are more likely to be cited by them. If you go on to propose a bad research project, then it doesn’t really help you that the committee member has previously cited you. You get rejected, and your spinoff publications receive few citations. On the other hand, if you propose a great idea that will garner lots of citations, its probability of getting funded isn’t much better than your bad idea if no one on the committee is familiar with your work, but it is much more likely to be funded if they are.
Uncertainty + Competition = Novelty Penalty?
To sum up, while I don’t doubt scientists have biases towards their own chosen fields of study (how could they not? They chose to study it because they liked it!), I suspect at least part of the bias against new ideas comes from a more subtle process. When assessing a new idea, we can make more confident assessments when the idea is closer to our own expertise. All else equal, this gives an edge to less novel ideas when we’re ranking ideas from most to least promising. But in a world with scarce scientific resources (whether those be funds or attention), we can’t give resources to every idea that merits them. Instead, we start at the top and work our way down. But that can mean valuable novel ideas are less likely to get resources than less valuable but less novel ideas.
(As an aside, this bias against novel work might be a reason why, paradoxically, novel work is actually more likely to be highly cited. If you know there’s a bias against novelty, then you also might believe it’s not worth working on novel stuff unless it has a decent shot of ultimately being really important.)
A 2023 working paper by Carson, Graff Zivin, and Shrader provides some further support for the notion that, when budget constraints bite, proposals with a greater degree of uncertainty are the first to be dropped. Carson and coauthors conduct a series of experiments on scientists with experience serving as NIH peer reviewers. In one experiment with 250 participants, they showed reviewers a set of ten grant proposals. The title and abstract of these proposals were drawn from real NIH grants, but in the experiment participants were provided with a set of 30 fictional peer review scores, ranging from 1 (best) to 9 (worst). They were then asked to pick four to (hypothetically) fund.
We don’t have a measure of novelty here, but the variance of peer review scores is a potentially informative related measure, as it indicates disagreement among peer reviewers about the merits of a proposal. Carson and coauthors show that, among proposals with the same average score, participants are actually more likely to select proposals with a greater variance in their peer review scores to be funded! But in the next stage of their experiment, they ask participants to imagine their research budget has been cut and now they have to drop one of the four proposals they selected to fund. When asked to tighten their belts, which projects do reviewers in this experiment choose to drop? As we might expect, they cut the ones with the lowest average. But above and beyond that, participants are also more likely to choose to cut the ones with the more variable scores.
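To see what "same average, different variance" looks like, here is a small sketch with invented scores. The drop rule in `pick_to_drop` is a hypothetical caricature of the pattern described above (cut the worst average first, and among ties cut the more variable proposal); it is not the model Carson and coauthors estimate.

```python
from statistics import mean, pvariance

def pick_to_drop(funded):
    """Pick the proposal to cut when the budget shrinks.

    funded: dict mapping proposal name -> list of scores (1 = best, 9 = worst).
    Hypothetical rule: drop the worst (highest) average score, and break
    ties by dropping the proposal with the higher score variance.
    """
    return max(funded, key=lambda name: (mean(funded[name]), pvariance(funded[name])))

# Invented scores. B and C share the same average but C is far more polarizing.
funded = {
    "A": [2, 2, 2, 2],
    "B": [3, 3, 3, 3],
    "C": [1, 5, 1, 5],
    "D": [2, 3, 2, 3],
}
summary = {name: (mean(s), pvariance(s)) for name, s in funded.items()}
dropped = pick_to_drop(funded)
```

Under this rule the polarizing proposal C is the one that gets cut, even though its average matches B's.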
(As another aside, the preference to cut more uncertain proposals when budgets are tight is probably an additional reason we should fund more R&D; maybe it’ll increase our willingness to fund novel stuff when competition for resources isn’t so fierce.)
To close, I think we can see something like this dynamic in Azoulay, Fons-Rosen, and Graff Zivin’s study of the impact of superstar deaths. Recall again that it’s probably not the case that superstars are blocking rival research by personally denying grants or publications, since in most cases they weren’t actually sitting in positions of authority when they died. In the paper, they actually go through the effort of dropping everyone in one of these positions of authority from the analysis, in order to show it doesn’t drive their results. But even if they aren’t personally blocking research, their ideas might be.
For one, the ability of researchers to enter with new ideas seems weaker when the superstar’s former collaborators retain positions of influence, sitting on more NIH grant review committees and editorships (though the latter can only be assessed really imperfectly). Second, Azoulay, Fons-Rosen, and Graff Zivin provide some suggestive evidence that when the microfield has more firmly consolidated itself around the ideas promoted by the superstar, it remains hard for newcomers to enter even after the superstar has passed away. They attempt to measure this by looking at how closely connected coauthorship networks are, how intensely the field cites its own work, and how similar papers in the field are to each other according to an algorithm.
But when a field is not highly consolidated around the superstar’s ideas and there is still some active debate about the way a microfield might go, there appears to be a big effect of having a superstar active in the field prematurely pass away. After the superstar researcher passes away, clearly they aren’t around anymore to publish, thereby expanding and clarifying the idea. And indeed, the entry of newcomers is strongest in fields where the superstar had an outsized role in the microfield, as measured by the share of publications, citations, and research funding they garnered. But the death of a superstar has additional knock-on effects that might further shake the dominance of their ideas. Even the former collaborators of the superstar publish significantly less work after the superstar dies. And this effect is stronger for collaborators who work on topics more similar to the superstar’s.
Taken together, I think it suggests that when a superstar prematurely dies, their ideas may fade in importance and salience, as compared to fields where the superstar stuck around. That could create space for alternative approaches to get resources.
Finally, I want to caution against a sort of fatalism. This isn’t absolute, it’s probabilistic. Novel ideas do get published, they can attract attention, and old paradigms do fall. It just happens less often than we might like (and perhaps less often over time). Research has an inertia that is real, but not insurmountable.
Azoulay, Pierre, Christian Fons-Rosen, and Joshua S. Graff Zivin. 2019. Does Science Advance One Funeral at a Time? American Economic Review 109(8): 2889-2920. https://doi.org/10.1257/aer.20161574
Wang, Jian, Reinhilde Veugelers, and Paula Stephan. 2017. Bias against novelty in science: A cautionary tale for users of bibliometric indicators. Research Policy 46(8): 1416-1436. https://doi.org/10.1016/j.respol.2017.06.006
Carayol, Nicolas, Agenor Lahatte, and Oscar Llopis. 2019. The Right Job and the Job Right: Novelty, Impact and Journal Stratification in Science. SSRN working paper. http://dx.doi.org/10.2139/ssrn.3347326
Ayoubi, Charles, Michele Pezzoni, and Fabiana Visentin. 2021. Does it pay to do novel science? The selectivity patterns in science funding. Science and Public Policy 48(5): 635-648. https://doi.org/10.1093/scipol/scab031
Boudreau, Kevin J., Eva C. Guinan, Karim R. Lakhani, and Christoph Riedl. 2016. Looking across and looking beyond the knowledge frontier: intellectual distance, novelty, and resource allocation in science. Management Science 62(10): 2765-2783. https://doi.org/10.1287/mnsc.2015.2285
Carson, Richard T., Joshua S. Graff Zivin, and Jeffrey G. Shrader. 2023. Choose your moments: Peer review and scientific risk taking. NBER Working Paper 31409. https://doi.org/10.3386/w31409