They are usually more than half the point of R&D
Knowledge spillovers, in the economics of innovation literature, are when an inventor or scientist makes use of knowledge discovered by others. The existence of knowledge spillovers is a classic reason why there may be underinvestment in R&D. When a firm decides how much R&D to do, it weighs the costs it bears against the benefits it expects to receive - not the benefits all firms expect to receive.
Of course, just because something could happen in theory doesn’t mean it happens in practice. So how big a deal are spillovers anyway? A couple of studies using patent data suggest spillovers are really important. More knowledge spills over than stays put.
Clancy (me), Heisey, Moschini, and Ji (2021) looks at this question for the specific case of agriculture. We wanted to see how much agricultural innovation draws on ideas developed outside of agriculture. So we identified all US patents for a variety of agricultural subsectors over 1976-2018 and tried to measure the share of “knowledge flows” from outside agriculture to inside it.
First, we looked at the citations patents make to other patents. In most cases, more than half of the citations go to patents we don’t classify as belonging to agriculture, or to patents belonging to firms whose patent portfolio is mostly non-agricultural. But patent citations are a highly imperfect measure of knowledge flows. So we didn’t stop there.
Second, we looked at the citations patents make to academic journal articles. As with the citations to patents, most of the citations to academic articles don’t go to journals we classify as agricultural science journals.
Third, we look for “novel concepts” in the text of the abstracts and titles of agricultural patents. For our purposes, a “novel concept” is a phrase of one to three words that is common after 1996 but absent from agricultural patents before then. (We also manually go through all these concepts to verify they correspond to technological ideas.) We then look to see how many of these novel concepts are already out there, in the text of non-agricultural patents. It turns out most of them are.
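A minimal sketch of what this kind of “novel concept” search might look like in code. The function names and the frequency threshold are illustrative stand-ins, not the paper’s actual implementation:

```python
from itertools import chain


def ngrams(text, n_max=3):
    """All distinct 1- to 3-word phrases in a text (lowercased)."""
    words = text.lower().split()
    return {
        " ".join(words[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(words) - n + 1)
    }


def novel_concepts(ag_pre, ag_post, min_count=2):
    """Phrases common in post-1996 agricultural patents but absent before.

    ag_pre, ag_post: lists of patent title/abstract strings.
    min_count is a toy stand-in for the paper's "common after 1996" cutoff.
    """
    seen_before = set(chain.from_iterable(ngrams(t) for t in ag_pre))
    counts = {}
    for text in ag_post:
        for g in ngrams(text):
            counts[g] = counts.get(g, 0) + 1
    return {g for g, c in counts.items()
            if c >= min_count and g not in seen_before}


def spillover_share(concepts, non_ag_texts):
    """Share of novel concepts already present in non-agricultural patents."""
    outside = set(chain.from_iterable(ngrams(t) for t in non_ag_texts))
    if not concepts:
        return 0.0
    return len(concepts & outside) / len(concepts)
```

The key design choice mirrors the paper’s logic: novelty is defined relative to agriculture’s own past, and spillovers are measured by checking whether the “new” phrase already existed outside agriculture.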
The figure below summarizes all these different approaches. Each bar represents a different way of measuring the sources of ideas, in different agricultural subsectors. The higher the bar, the greater the share of knowledge flows that comes from within agriculture (rather than spilling over from other fields).
As the figure above makes clear, there is significant variation, even within agriculture: newly patented plants really do mostly rely on agricultural R&D. Veterinary drugs really don’t. But if you pick one of the six fields at random and one of our five indicators at random, the chance that indicator shows most knowledge flowing in from outside agriculture is about 65%. That suggests spillovers are the main source of ideas in agriculture!
What about outside of agriculture? Aslan et al. (2023) show pretty similar results in biomedicine. Since 2008, the NIH has classified its research grants into hundreds of different research categories, such as “cerebral palsy”, “vector-borne diseases”, and “lead poisoning” (to pick three examples at random). How often do grants for one category result in research publications in other categories? Quite often, it turns out.
To see how often this kind of unexpected spillover happens, Aslan and coauthors get data on 90,000 funded NIH grants over 2008-2016, and 1.2 million associated publications. If the NIH and journals used the same classification system, it would then be a simple matter of seeing how often a grant and its publications are assigned the same category (minimal spillovers) versus different categories (large spillovers). But there are two challenges.
First, unfortunately, journals do not classify articles into categories using the same system that the NIH uses to classify its grants. Aslan and coauthors instead use machine learning algorithms to assign journal articles to the NIH’s categories, based on the text of the journal abstracts. Second, the NIH classification system can be too granular for identifying significant knowledge spillovers.
For example, there are categories for both “tobacco” and “tobacco smoke and health.” If research dollars are spent on a proposal assigned to the category “tobacco” but then generate a publication tagged as “tobacco smoke and health”, then while it is technically true that the grant generated knowledge applicable to a different category of knowledge than expected, the new category is so similar to the original that it doesn’t really feel like a significant knowledge spillover. To reduce this worry, Aslan and coauthors use a clustering algorithm to cluster categories frequently assigned to the same grants. This results in 32 different clusters of NIH topics. “Tobacco” and “tobacco smoke and health” now fall under the same category, for example, so that a grant assigned to “tobacco” but generating research assigned to “tobacco smoke and health” would no longer be classified as a knowledge spillover, since both categories are part of the same cluster.
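One simple way to implement this kind of grouping, sketched here with co-occurrence counts and connected components as a stand-in for whatever clustering algorithm Aslan and coauthors actually use:

```python
from collections import defaultdict
from itertools import combinations


def cluster_categories(grant_categories, min_cooccur=2):
    """Group categories frequently assigned to the same grants.

    grant_categories: list of category sets, one per grant.
    Links two categories when they co-occur on at least `min_cooccur`
    grants, then takes connected components (a toy stand-in for the
    paper's clustering step).
    """
    cooccur = defaultdict(int)
    cats = set()
    for assigned in grant_categories:
        cats |= set(assigned)
        for a, b in combinations(sorted(assigned), 2):
            cooccur[(a, b)] += 1
    # Union-find over categories to extract connected components.
    parent = {c: c for c in cats}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for (a, b), n in cooccur.items():
        if n >= min_cooccur:
            parent[find(a)] = find(b)
    components = defaultdict(set)
    for c in cats:
        components[find(c)].add(c)
    return list(components.values())


def is_spillover(grant_cats, pub_cats, clusters):
    """A publication is a spillover only if some assigned category falls
    outside every cluster touched by its grant."""
    touched = [cl for cl in clusters if cl & set(grant_cats)]
    covered = set().union(*touched) if touched else set()
    return bool(set(pub_cats) - covered)
```

Under this scheme, a grant tagged “tobacco” producing a publication tagged “tobacco smoke and health” is not a spillover, because the two categories land in the same cluster.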
In the end, 58% of publications are assigned at least one category that is different from the ones assigned to the grant. In other words, more than half of the publications emerging from NIH grants are at least partially about a topic significantly different from the topics that the research grant was originally assumed to be about.
Agriculture and biomedicine; maybe this is a life science thing? Myers and Lanahan (2021) use a quasi-experimental methodology to provide some evidence from the Department of Energy’s Small Business Innovation Research (SBIR) program that spillovers are also a big deal in energy research. Every year, the Department of Energy’s SBIR program solicits proposals related to specific kinds of technology. Small businesses submit proposals for R&D projects related to these Department of Energy priorities, and the best ones get funded. These businesses go on to use the funding to do R&D.
But Myers and Lanahan show that the funding also leads to more innovation (or at least more patents) in technology fields other than the one funded. For data, Myers and Lanahan use the Cooperative Patent Classification system, which divides patents up into thousands of different technology categories, ranging from bridges to artificial intelligence. Each one of the SBIR program’s grant competitions is focused on a different technology, but the SBIR’s interests do not cleanly line up with the patent office’s classification system. So Myers and Lanahan use natural language processing to compute the degree of similarity between the text of patents in each technology category and the text of the SBIR’s solicitation for proposals (a patent class and a solicitation count as more similar when they share more words that are otherwise uncommon). That means, at the end of the day, for every SBIR request for proposals related to a certain technology, Myers and Lanahan can order each of the thousands of patent classifications from most to least similar to the SBIR request.
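A toy version of this kind of text-similarity ranking, using tf-idf weights so that rare shared words count for more. This is an illustrative sketch, not Myers and Lanahan’s actual pipeline:

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn documents into tf-idf weighted word vectors.

    Words appearing in every document get zero weight, so similarity
    is driven by shared *uncommon* vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(w for toks in tokenized for w in set(toks))
    n = len(docs)
    vecs = []
    for toks in tokenized:
        tf = Counter(toks)
        vecs.append({w: c * math.log(n / df[w]) for w, c in tf.items()})
    return vecs


def cosine(u, v):
    """Cosine similarity between two sparse word vectors."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


def rank_classes(solicitation, class_texts):
    """Order patent classes from most to least similar to a solicitation.

    class_texts: {class_id: concatenated text of patents in that class}.
    """
    ids = list(class_texts)
    vecs = tfidf_vectors([solicitation] + [class_texts[i] for i in ids])
    sol_vec, class_vecs = vecs[0], vecs[1:]
    sims = {i: cosine(sol_vec, v) for i, v in zip(ids, class_vecs)}
    return sorted(ids, key=sims.get, reverse=True)
```

Given a solicitation about solar cells, a photovoltaics patent class would rank ahead of, say, a bridge-construction class, because the two share distinctive vocabulary.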
The last piece of the puzzle is Myers and Lanahan’s measure of how much funding is given out for R&D in different technology areas. To get as close as possible to a pure experiment, where money is just randomly given to some technologies and not others, Myers and Lanahan painstakingly construct a set of data on money that comes not directly from the SBIR program, but from state-level matching programs. As Myers and Lanahan argue, the basic idea is that money from these state-level matches (where, say, Iowa agrees to top up the R&D funds won by an Iowa company with additional money from state revenues) is pretty randomly distributed among different kinds of technology. It’s mostly down to luck that, say, winning solar technology companies happen to reside in states with these match programs and, say, wind turbine technology companies do not.
By comparing the patent output of technology classes that are textually “close” to funded topics with the output of classes that aren’t, they can measure how many patents are generated for every million dollars in funding, both in the technology classes closest to the SBIR program’s intention and in those farther away. The figure below shows how many extra patents come from different patent classes, with classes on the left closer to the SBIR proposal and classes to the right farther away. Adding everything up, they find that for every patent directly induced by R&D funding to a specific firm working on a specific technology, another three patents are created by other firms working in other technologies. Spillovers account for the majority of the benefits from the Department of Energy’s SBIR program!
But in all three of these cases, we looked at spillovers within a specific domain: agriculture, biomedicine, and energy. Can we say anything systematic?
Bloom, Schankerman, and Van Reenen (2013) tries. This paper uses the set of all publicly traded US firms over 1980-2001 in an effort to assess how R&D by one firm affects others. The basic idea is to use a firm’s patents to tell us the kinds of technologies it does research on. Bloom and coauthors show that when firms spend more on R&D, firms holding patents in closely related technology categories file for more patents and are more productive, and that this pattern is widespread across sectors.
We might be concerned that this is a spurious correlation, though. For example, imagine there has been some kind of breakthrough in machine learning. That could spur all the firms working in that space to do extra R&D, and to get more than the usual number of patents and productivity per dollar for their trouble. In this example, a technological breakthrough causes both elevated R&D productivity and industry-wide R&D. But in Bloom and coauthors’ data, they’ll just see that elevated R&D productivity and industry-wide R&D are correlated. They might spuriously conclude that the elevated industry-wide R&D causes the elevated R&D productivity via increased spillovers, since their data doesn’t measure breakthroughs in the field.
To see if this is driving their results, they want to find something that pushes around R&D spending but isn’t related to technology, and they settle on US state-level tax policy. If a company has operations in a state that changes taxes in a way that makes R&D less costly, then that firm is likely to respond by increasing its R&D spending. We can then see if that kind of tax-induced R&D change (which shouldn’t have anything to do with technological breakthroughs) is associated with increased patenting and productivity in firms working on similar technologies - and indeed it is. The fact that a tax-induced increase (or decrease) in one firm’s R&D also changes the apparent innovation outcomes of other firms working in similar technology spaces (firms that did not themselves change their R&D) increases the likelihood this really is telling us something about knowledge spillovers.
One nice thing about this paper is that in 2019, Bloom and Van Reenen worked with Brian Lucking to repeat their analysis on an expanded dataset (more companies and more years). The expanded dataset has 2-3x as many observations. In this update, they continue to find a correlation between R&D by other related firms and patenting/productivity, including when the R&D changes are plausibly driven merely by state tax changes.
So there is a correlation - but how quantitatively important are these spillovers? The statistical analysis described above gives Bloom and coauthors estimates of how R&D by one firm affects every other firm in their sample. This lets them estimate the private return on R&D and the social return on R&D.
To see the difference, suppose Apple is deciding whether to spend another dollar on R&D. The increase in Apple’s profits due to that dollar is the private return to R&D. The increase in Apple’s profits, and Google’s profits, and those of all other publicly traded firms, is the social return on R&D, as measured in this paper. If there’s a strong link between their measure of spillovers and other firms’ profits, then the social return might be large. If there’s no such link, the social return might simply equal the private return.
In their 2013 paper, they find the private return on R&D is 21%; but the social return is 55%. In the 2019 update, they estimate the private returns to R&D to be 14%, against a 58% social return. Again - more than half the value of R&D comes from its impact on other firms!
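To see what those estimates imply about spillovers’ share of R&D’s value, a quick calculation:

```python
def spillover_share(private_return, social_return):
    """Fraction of R&D's social return accruing to firms other than
    the one paying for the R&D."""
    return (social_return - private_return) / social_return


# Returns reported in the text: 2013 paper, then the 2019 update.
share_2013 = spillover_share(0.21, 0.55)  # about 0.62
share_2019 = spillover_share(0.14, 0.58)  # about 0.76
```

In both versions of the analysis, the majority of the measured return accrues to firms other than the one doing the R&D.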
Clancy, Matthew, Paul Heisey, Yongjie Ji, and GianCarlo Moschini. 2019. The Roots of Agricultural Innovation: Patent Evidence of Knowledge Spillovers. NBER Working Paper 27011. https://www.nber.org/papers/w27011
Aslan, Yasmin, Ohid Yaqub, Daniele Rotolo, and Bhaven N. Sampat. 2023. Cross-category spillovers in medical research. SocArXiv. https://doi.org/10.31235/osf.io/hpmxd
Myers, Kyle, and Lauren Lanahan. 2021. Estimating spillovers from publicly funded R&D: Evidence from the US Department of Energy. Working paper.
Bloom, Nicholas, Mark Schankerman, and John Van Reenen. 2013. Identifying Technology Spillovers and Product Market Rivalry. Econometrica 81(4): 1347-1393. https://doi.org/10.3982/ECTA9466
Lucking, Brian, Nicholas Bloom, and John Van Reenen. 2019. Have R&D Spillovers Declined in the 21st Century? Fiscal Studies 40(4): 561-590. https://doi.org/10.1111/1475-5890.12195