20 years is a good rule of thumb
It seems clear that a better understanding of the regularities that govern our world leads, in time, to better technology: science leads to innovation. But how long does this process take?
Two complementary lines of evidence suggest 20 years is a good rule of thumb.
James Adams was one of the first to take a crack at this question in a serious quantitative way. In 1990, Adams published a paper titled "Fundamental Stocks of Knowledge and Productivity Growth" in the Journal of Political Economy. Adams wanted to see how strong the link was between academic science and the performance of private industry in subsequent decades.
Adams had two pieces of data that he needed to knit together. On the science side, he had data on the annual number of new journal articles in nine different fields, from 1908 to 1980. On the private industry side, he had data on the productivity of 18 different manufacturing industries over 1966-1980. Productivity here is a measure of how much output a firm can squeeze out of the same amount of capital, labor, and other inputs. If a firm can get more output (or higher quality outputs) from the same amount of capital and labor, economists usually assume that reflects an improved technology (though it can also mean other things). What Adams basically wanted to do was see if industries experienced a jump in productivity sometime after a jump in the number of relevant scientific articles.
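To make that concrete, here is a minimal sketch of the growth-accounting arithmetic behind a productivity measure like this: output growth that can't be attributed to growth in measured inputs gets credited to productivity. All numbers and factor shares below are invented for illustration.

```python
# Total factor productivity (TFP) growth as a residual: the output
# growth left over after accounting for growth in measured inputs.
# All figures are invented for illustration.

output_growth = 0.04    # output grew 4% this year
capital_growth = 0.03   # capital stock grew 3%
labor_growth = 0.01     # hours worked grew 1%
capital_share = 0.3     # capital's share of input costs
labor_share = 0.7       # labor's share of input costs

tfp_growth = output_growth - (capital_share * capital_growth
                              + labor_share * labor_growth)
print(f"TFP growth: {tfp_growth:.1%}")  # 2.4%
```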
The trouble is that pesky word “relevant.” Adams has data on the number of journal articles in fields like biology, chemistry, and mathematics, but that's not how industry is organized. Industry is divided into sectors like textiles, transportation, and petroleum. What scientific fields are most relevant to the textiles industry? To transportation equipment? To petroleum?
To knit the data together, Adams used a third set of data: the number of scientists in different fields that work for each industry. That data did exist. To see how much the textiles sector relies on biology, chemistry, or mathematics, he looked at how many biologists, chemists, and mathematicians the sector employed: if it employed a lot of chemists, it probably used chemistry; if it employed lots of biologists, it probably used biology; and so on. He then weighted the number of articles in each field by the number of the industry's scientists working in that field, to get a measure of how much each industry relies on basic science.
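Here is a minimal sketch of that weighting scheme, with invented article counts and employment figures (Adams' actual measure also accumulates article counts into stocks over time, but the weighting idea is the same):

```python
# Sketch of Adams' weighting idea: measure an industry's relevant
# science by weighting each field's article counts by that field's
# share of the industry's scientists. All numbers are invented.

articles = {"biology": 12000, "chemistry": 30000, "mathematics": 8000}

# Scientists employed by a hypothetical textiles industry.
employment = {"biology": 50, "chemistry": 400, "mathematics": 25}

total_scientists = sum(employment.values())
science_base = sum(
    articles[field] * employment[field] / total_scientists
    for field in articles
)
print(f"Relevant science base for textiles: {science_base:,.0f}")
```

Because chemists dominate this hypothetical industry's employment, chemistry articles dominate its measured science base.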
So now Adams has data on the productivity and relevant scientific base of 18 different manufacturing sectors. He expects more science will eventually lead to more productivity, but he doesn’t know how long that will take. If there was a surge in scientific articles in a given year, at what point would Adams expect to see a surge in productivity? If scientific insight can be instantly applied, then the surge in productivity should be instantaneous. But if the science has to work its way through a long series of development steps, then the benefits to industry might show up later. How much later?
To come up with an estimate, Adams basically looked at how strong the correlation was for lags of 5, 10, and 20 years. Specifically, he looked to see which lag gave him the strongest statistical fit between scientific articles produced in a five-year span and the productivity increase over a later five-year span, for industries that use that field's knowledge intensively. Of the time lags he tried, he found the strongest correlation at 20 years.
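The logic of that lag search can be boiled down to a few lines. In this toy example the "true" lag is built in at 20 years, and simple correlations recover it; Adams' actual specification is a regression with many more moving parts.

```python
import numpy as np

# Toy version of Adams' lag search: correlate productivity with article
# counts from `lag` years earlier, and keep the lag that fits best.
# Both series are invented, with a true lag of 20 years built in.

rng = np.random.default_rng(0)
articles = rng.normal(size=120)              # yearly surges in science
productivity = np.roll(articles, 20)         # respond 20 years later...
productivity[:20] = 0.0
productivity += 0.5 * rng.normal(size=120)   # ...plus measurement noise

for lag in (5, 10, 20):
    # Pair productivity in year t with articles in year t - lag.
    r = np.corrcoef(articles[:-lag], productivity[lag:])[0, 1]
    print(f"lag {lag:2d} years: correlation = {r:.2f}")
# Only the 20-year lag shows a strong correlation.
```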
Adams’ study was an important first step, and more recent work using different data and methods has largely validated his original findings.
Nearly thirty years later, Baldos et al. (2018) tackled a similar problem with different data and a more sophisticated statistical technique. Unlike Adams, they focused on a single sector - agriculture. Like Adams, they had two pieces of data they wanted to knit together.
On the science side, a small group of agricultural economists has spent a long time assembling a data series on total agricultural R&D spending by US states and the federal government. Until recently, governments were a really big source of agricultural research dollars, and those dollars can be identified relatively easily since they frequently flow through the Department of Agriculture or state experiment stations. The upshot is Baldos et al. have data on public sector agricultural R&D going back to 1908. Meanwhile, on the technology side, the USDA maintains a data series on the total factor productivity of US agriculture from 1949 to the present day. So like Adams, they look for a statistical correlation between productivity and research. Unlike Adams, they use dollars rather than article counts to measure science, and they focus on a single sector (and not a manufacturing one).
To deal with the fact that we don’t know when research spending impacts productivity growth, they’re going to adopt a Bayesian approach. How long does it take agricultural spending to influence productivity? They don’t know, but they make a few starting assumptions:
They assume impact will follow an upside-down “U” shape. The idea here is that new knowledge takes time to be developed into applications, and then it takes time for those applications to be adopted across the industry. During this time, the impact of R&D done in one year on productivity in subsequent years is rising. At some point - maybe 20 years later - the impact of that R&D on productivity growth hits its peak. But after that point, the R&D becomes less relevant, so its impact on productivity growth in subsequent years declines. Eventually, the ideas become obsolete and have no additional impact on increasing productivity.
They assume the impact of R&D on productivity after 50 years is basically zero.
They assume the peak impact will occur sometime between 10 and 40 years, with the most likely outcome somewhere around 20. This assumption is based on earlier work, similar in spirit to Adams.
Given these assumptions, they are in effect entertaining a whole family of possible distributions for the lagged impact of R&D spending on productivity. They assume the most likely distribution is one peaking around 20 years, and that distributions farther from that are less likely. They then use Bayes’ rule to update these beliefs, given the data on R&D spending and agricultural productivity. Some of those distributions fit the data quite well, in the sense that if that distribution of R&D impacts were true, then the R&D spending data would track productivity closely. Others fit quite poorly. After observing the data, they increase their belief in the distributions that fit well and decrease their belief in the ones that don’t.
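A heavily simplified sketch of that logic: put a prior over candidate peak lags, score each candidate by how well it fits the data, and multiply. The "fit" function below is a stand-in I've invented; in the actual paper, the likelihood comes from a full model of R&D stocks and agricultural productivity.

```python
import numpy as np

# Grid-approximation Bayes: each hypothesis says "R&D's impact on
# productivity peaks k years after the spending."

peaks = np.arange(10, 41)   # candidate peak lags: 10 to 40 years

# Prior: a peak near 20 years is most plausible, tapering off away
# from it (mirroring the assumptions described above).
prior = np.exp(-0.5 * ((peaks - 20) / 6.0) ** 2)
prior /= prior.sum()

# Stand-in likelihood: pretend the data are best explained by a peak
# near 19 years. In the real paper, this comes from the model's fit
# to the productivity data, not a formula chosen in advance.
likelihood = np.exp(-0.5 * ((peaks - 19) / 3.0) ** 2)

posterior = prior * likelihood   # Bayes' rule, up to normalization
posterior /= posterior.sum()

best = peaks[posterior.argmax()]
print(f"Posterior mode: impact peaks ~{best} years after spending")
```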
They find that, with 95% probability, the impact of science on productivity is strongest 15-24 years after the research spending, with a best point estimate of around 20 years.
Let’s turn to one more industry study: pharmaceuticals. Toole (2012) wants to assess the link between NIH-funded research and the invention of new drugs (rather than productivity growth). He has data on NIH funding for biomedical research from 1955 through 1996, broken down into six broad therapeutic areas: endocrine/cancer, central nervous system, cardiovascular, anti-infective, gastro-intestinal/genito-urinary, and respiratory. He also has data on the annual number of FDA-approved drugs in these six categories. His study focuses on new molecular entities, that is, approved drugs featuring a novel active ingredient. Finally, he has information on private sector R&D in these six categories, as well as the potential market for different therapeutic treatments. Toole shows both of these affect drug development too, but we’ll be focusing on the impact of scientific research.
The basic idea is to see if, in a particular therapeutic area, the number of new molecular entities submitted for approval to the FDA increases in the years after an increase in R&D spending by industry and the NIH in the same category, and if so, how long the gap between the two is. To pin down the right lag, for each therapeutic area, Toole creates four different “knowledge stocks.” Each of these is basically the sum of past research spending, but giving more weight to more recent spending. One of these stocks includes all research funded more than 24 years ago. Another includes all research funded more than 16 years ago. A third includes all research funded more than 8 years ago, and the last includes all prior research spending. When he includes all four in a statistical regression, the only one significantly correlated with new drug approvals is the one built from NIH research funded more than 16 years ago. And since the stock of research more than 24 years old is controlled for separately, that stock’s explanatory power comes from spending 17-24 years back. In other words, the best predictor of new molecular entity approval applications in a given therapeutic area is NIH funding from roughly 17-24 years previously.
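A rough sketch of the knowledge-stock construction for a single therapeutic area, with invented spending data (Toole's actual stocks also weight more recent spending within each window more heavily):

```python
import numpy as np

# Build "knowledge stocks" from past research spending: one stock per
# minimum-age cutoff. Spending figures are invented.

years = np.arange(1955, 1997)
nih_spending = np.random.default_rng(1).uniform(50, 150, size=len(years))

def knowledge_stock(current_year, min_age):
    """Total spending at least `min_age` years before `current_year`."""
    mask = years <= current_year - min_age
    return nih_spending[mask].sum()

# Four stocks per observation: all past spending, and spending more
# than 8, 16, and 24 years old. Regressing approvals on all four at
# once lets the 17-24 year band show up via the >16-year stock.
for min_age in (0, 8, 16, 24):
    print(f"1996 stock of research >{min_age:2d} years old:",
          round(knowledge_stock(1996, min_age), 1))
```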
So whether for manufacturing, agriculture, or pharmaceuticals, using slightly different data and statistical techniques, we find a correlation between productivity growth and basic science that is strongest around 20 years. But at the end of the day, these methods look for correlations between two messy variables separated in time by decades. It’s hard to make this completely convincing.
A cleaner alternative is to look at the citations made by patents to scientific articles. If we assume patents are a decent measure of technology (more on that later) and journal articles a decent measure of science, then citations of journal articles by patents could be a direct measure of technology’s use of science. In the last few years, data on patent citations to journal articles has become available at scale, thanks to advances in natural language processing (see Marx and Fuegi 2019, Marx and Fuegi 2020, and the patCite project).
Do citations to science actually mean the patented technologies use that science? Maybe the citations are just meant to pad out the application? There's a lot of suggestive evidence that the citations are meaningful. For one, firms say they are: Arora, Belenzon, and Sheer (2017) use a 1994 survey from Carnegie Mellon about the use of science by firms, and firms that cite a lot of academic literature in their patents are also more likely to report in the survey that they use science in the development of new innovations.
There's also a variety of evidence that patents that cite scientific papers are different from patents that don't. Watzinger and Schnitzer (2019) scan the text of patents, and find patents that cite science are more likely to include combinations of words that have been rare up until the year the patent was filed. This suggests these patents are doing something new; they don't read like older patents. Maybe they are using brand new ideas or insights that they have obtained from science? They also find these patents tend to be worth more money. Ahmadpoor and Jones (2017) find these science-ey patents also tend to be more highly cited by other patents.
So let’s go ahead and assume citations to journal articles are a decent measure of how technology uses science. What do we learn from doing that?
Marx and Fuegi (2020) apply natural language processing and machine learning algorithms to the raw text of all US patents, to pull out all citations made to the academic literature. They find about 29% of patents make some kind of citation to science. More importantly for our purposes, for patents granted since 1976, the average gap in time between a patent’s filing date and the publication date of the scientific articles it cites is 17 years.
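The underlying computation is simple once the citations are extracted: for every patent-to-paper citation, subtract the article's publication year from the patent's filing year, then average. The citation pairs below are invented.

```python
from statistics import mean

# Average time gap between a patent's filing and the publication of
# the scientific articles it cites. All pairs are invented.

citations = [
    {"patent_filed": 1995, "article_published": 1980},
    {"patent_filed": 2001, "article_published": 1979},
    {"patent_filed": 1988, "article_published": 1975},
]
gaps = [c["patent_filed"] - c["article_published"] for c in citations]
print(f"Average patent-to-article gap: {mean(gaps):.1f} years")
```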
This is pretty close to the twenty years estimated using the other techniques, especially when we consider that the date a patent is filed is not necessarily the date it begins to affect productivity (which is what the other studies were measuring). Any new invention that seeks patent protection might take a few years to diffuse widely and begin affecting productivity. In that case, a twenty-year estimate is pretty close to spot on.
Of course, there is variation around that 17-year figure. Ahmadpoor and Jones (2017) found that the average shortest time gap between a patent and a cited journal article was just 7 years. On the other side, there are longer and more indirect paths from science to technology than direct citation. Ahmadpoor and Jones cite Riemannian geometry as an example. Riemannian geometry was developed by Bernhard Riemann in the 19th century as an abstract mathematical concept, with little or no real-world application until it was incorporated into Einstein's general theory of relativity. Later, those ideas were used to develop the time dilation corrections in GPS satellites. That technology, in turn, has been useful in the development of autonomous vehicles. In a sense, then, patents for self-driving tractors owe some of their development to Riemannian geometry, though this would only be detectable by following a long and circuitous route of citation.
When these longer and more circuitous chains of citation are followed, Ahmadpoor and Jones find that 61% of patents are "connected" to science; patents that do not directly cite scientific articles may cite patents that do, or they may cite patents that cite patents that do, and so on. About 80% of science and engineering articles are "connected" to patents, in the sense that some chain of citation links them to a patent. (Technical note: this probably understates the actual linkages, because it is based only on citations listed on a patent’s front page; Marx and Fuegi 2020 show additional citations are commonly found in a patent’s text).
The authors compute metrics for the average "distance" of a scientific field from technological application, and these distance metrics largely line up with our intuitions. For example, materials science and computer science are fields where we might expect the results to be quite applicable to technology, and indeed, their papers tend to be among the closest to patents - just 2 steps away from a patent (cited by a paper that is, in turn, cited by a patent). Atomic/molecular/chemical physics papers would also seem likely to have applications, but only after more investigation, and they tend to be 3 steps removed from technology (cited by a paper that is cited by a paper that is cited by a patent). And as we might expect, the field that is farthest from immediate application is pure mathematics (five steps removed). As expected, these longer citation paths also take longer when measured in time: the average time gap along the shortest path from a math paper to a patent is more than 20 years.
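The distance metric itself is just a shortest-path computation over the citation graph. Here is a sketch using breadth-first search on a tiny invented graph; Ahmadpoor and Jones' actual construction works over millions of patents and papers.

```python
from collections import deque

# Distance from a paper to "technology": the fewest citation hops to
# reach any patent. The tiny citation graph below is invented.

cited_by = {  # node -> nodes that cite it
    "math_paper": ["physics_paper"],
    "physics_paper": ["engineering_paper"],
    "engineering_paper": ["patent_A"],
    "cs_paper": ["patent_B"],
}
patents = {"patent_A", "patent_B"}

def distance_to_patent(start):
    """Breadth-first search: hops from `start` to the nearest patent."""
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        node, hops = queue.popleft()
        if node in patents:
            return hops
        for citer in cited_by.get(node, []):
            if citer not in seen:
                seen.add(citer)
                queue.append((citer, hops + 1))
    return None  # not connected to any patent

print(distance_to_patent("cs_paper"))    # 1: cited directly by a patent
print(distance_to_patent("math_paper"))  # 3: three citation hops away
```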
On the other side of the divide, they also compute the average distance of technology fields from science, and these also align with our intuitions. Chemistry and molecular biology patents would probably be expected to rely heavily on science, and they tend to be slightly more than 1 step removed from science (most directly cite scientific papers). Further downstream, electrical computer patents tend to be two steps removed (they cite a patent that cites a scientific article), and ancient forms of technology like chairs and seats tend to be the farthest from science (five steps removed). The average time gap along the shortest citation path from a chair/seat patent to a scientific article is also over 20 years.
All told, it’s reassuring that two distinct approaches arrive at a similar figure. The patent citation evidence is reasonably direct - we see exactly what technology at what date cites which scientific article, as well as the date that article was published. This line of evidence finds an average gap of about 17 years, with plenty of scope for shorter and longer gaps as well. The trouble with this evidence is that patents are far from a perfect measure of technology. Lots of things do not ever get patented, and lots of patents are for inventions of dubious quality.
For that reason, it’s nice to have a different set of evidence that does not rely on citations or patents at all. When we crudely measure “science,” either by counting government research dollars or scientific articles, we can detect a correlation between increases in science and increases in the productivity of the industries expected to use it roughly 20 years later - again, with plenty of scope for shorter and longer lags as well.
Adams, James D. 1990. Fundamental stocks of knowledge and productivity growth. Journal of Political Economy 98(4): 673-702. https://www.jstor.org/stable/2937764
Baldos, Uris Lantz, Frederi G. Viens, Thomas W. Hertel, and Keith O. Fuglie. 2018. R&D spending, knowledge capital, and agricultural productivity growth: a Bayesian approach. American Journal of Agricultural Economics 101(1): 291-310. https://doi.org/10.1093/ajae/aay039
Toole, Andrew. 2012. The impact of public basic research on industrial innovation: Evidence from the pharmaceutical industry. Research Policy 41(1): 1-12. https://doi.org/10.1016/j.respol.2011.06.004
Marx, Matt, and Aaron Fuegi. 2020. Reliance on science: Worldwide front-page patent citations to scientific articles. Strategic Management Journal 41(9): 1572-1594. https://doi.org/10.1002/smj.3145
Marx, Matt, and Aaron Fuegi. 2020. Reliance on science by inventors: hybrid extraction of in-text patent-to-article citations. NBER Working Paper 27987. https://ssrn.com/abstract=3718899
Arora, Ashish, Sharon Belenzon, and Lia Sheer. 2017. Back to basics: why do firms invest in research? NBER Working Paper 23187. https://ssrn.com/abstract=2920404
Watzinger, Martin, and Monika Schnitzer. 2019. Standing on the Shoulders of Science. CEPR Discussion Paper No. DP13766. https://ssrn.com/abstract=3401853
Ahmadpoor, Mohammad, and Benjamin F. Jones. 2017. The Dual Frontier: Patented inventions and prior scientific advance. Science 357(6351): 583-587. https://doi.org/10.1126/science.aam9527