Date: 2024-02-29

Lots of social science research about innovation relies on patents as a way to measure innovation. But it’s not clear that patents are a great way to measure innovation. Probably only a relatively small share of inventions receive patent protection; moreover, while patenting does predict a lot of other measures of innovation, the linkage tends to be a pretty noisy one. Maybe the patent-based innovation literature is built on a foundation of sand?

One way to validate patents as a measure of innovation is to exploit the fact that tons of papers study the same phenomena with different datasets: some use patents, some don’t. Do they tend to arrive at different results? If so, that suggests the papers using patent data might be picking up something unique about patents, rather than something about innovation per se. On the other hand, if analyses built on patent and non-patent data tend to get similar results, that suggests patents are roughly as good a measure of innovation as the available alternatives.

I think New Things Under the Sun can itself be a useful data source on this particular question. At the time of writing (March 2024), New Things Under the Sun consists of 73 articles that synthesize multiple academic papers to examine various narrow claims about innovation. I count 37 articles that discuss studies built on both patent and non-patent data. Among these 37, how often do the patent-based analyses disagree with the non-patent analyses?

I looked them over to see.

My takeaway from this exercise is that studies relying on patent data tend to obtain similar results to those that don’t. In 31/37 (84%) of the claims I’ve looked at, I didn’t think there was meaningful disagreement between the patent and non-patent studies: regardless of which type of data is used for a problem, results were broadly consistent. In the other 6/37 (16%), I thought there was generally a mix of agreement and disagreement. The patent and non-patent data differed along some qualitatively important dimension, though even in these cases I didn’t find uniform disagreement. For example, in the article Are ideas getting harder to find because of the burden of knowledge?, non-patent data indicates first discoveries are being made at increasingly older ages, but patent data doesn’t show this. However, both the patent and non-patent data were consistent with team sizes increasing and specialization increasing. Nonetheless, because there was some disagreement, I classified this article as exhibiting some disagreement between the patent and non-patent evidence.

In fact, I’m not sure the differences I found between patent and non-patent data are any more severe than you would find if you compared studies of the same phenomena that use the same type of data (for example, two papers looking at the same thing with data on journal articles). That said, note that my definitions of agreement and disagreement are loose and subjective: results count as agreeing if they are directionally the same, not numerically the same. Moreover, not all of the scope for agreement and disagreement was substantive. Sometimes the bulk of the evidence comes almost exclusively from patent data or almost exclusively from non-patent data, and the data from the other source covers only part of the overall claim. Even so, it’s a bit surprising to me that there aren’t more disagreements, since in some cases there are important differences between the kinds of innovation studied with patent and non-patent data.

In the next section, I show how I classified these 37 articles, along with a short description of where I saw agreement or disagreement. Feel free to skip ahead to the further discussion of potential biases in this exercise due to selection effects.

Classifications of New Things Articles

At least some disagreement

Age and the impact of innovation: As scientists or inventors age, their work receives fewer citations, from a narrower set of inventors, and becomes less disruptive as measured by both papers and patents. But productivity over an academic lifecycle appears to remain high for a longer period of time (as measured by production of papers) than productivity of an inventor (as measured by patents).

Are ideas getting harder to find because of the burden of knowledge? The age of first scientific discovery has steadily increased, while the age of first patent rose, but then fell. However, both patents and academic papers find that team size and specialization are on the rise.

How common is independent invention? Evidence from both patents and papers finds the incidence of simultaneous independent discovery is quite rare; but the rate implied by patent interference hearings is orders of magnitude lower than for papers. At the same time, evidence from both patents and papers suggests multiple independent discovery is more likely for more valuable research ideas.

Innovation (mostly) gets harder: The same level of research effort yields fewer successively smaller improvements by most measures. This is not true for raw patent counts, but is true for one measure of particularly innovative patents.

Teaching innovative entrepreneurship: One study of two particular entrepreneurship training programs looked at many different indicators of successful entrepreneurship. Neither program had a statistically significant effect on patenting by participants. For one of the programs, this was consistent with it having no impact on any other measures; for the other, it had a positive effect on some measures of successful entrepreneurship, but not patents and a few other measures.

The best new ideas combine disparate old ideas: Patents and papers that comprise unusual combinations of ideas are associated with higher impact. There is some evidence that the highest impact papers also make some more conventional combinations than patents.

No disagreement

Selection Bias?

The above finds broad agreement between innovation studies that use patent data and those that don’t, where they study closely related phenomena. But we might reasonably worry: is this just an artifact of selection?

Indeed, there are multiple possible layers of selection bias.

The first level of selection bias is that researchers decide when and when not to use patent data. In this post’s exercise, I’m only observing the cases where the researcher thought patents would be an appropriate measure of innovation and where I thought the paper was a good fit for New Things Under the Sun. And so the claim that “patent and non-patent data tend to arrive at similar conclusions” only applies to the set of claims where researchers thought patents were an appropriate dataset (and I thought the researchers wrote a nice paper).

To give a concrete example, I have a series of posts about publication bias in the sciences - the notion that the research record gives us a biased picture of evidence since only positive findings are publishable. Only one of those posts features any studies reliant on patent data (see #19 above in the “No disagreements” list). It makes sense that few researchers thought it was appropriate to study publication bias with patents, since publication bias is typically assumed to be an outcome of incentives that are peculiar to academia, not private sector invention. If someone did try to study publication bias in patents, they might get quite a different result than if they had studied it with data on journal articles.

The upshot is that this post’s analysis implies that if you think a paper is by a good researcher and it uses patent data, the results of the paper would probably agree with another paper on the same topic that didn’t use patent data. But, if you instead start with a specific research question, these results don’t imply you would get the same results whether you use patents or not. They instead imply that you would, if it’s the kind of research question that researchers think patents are appropriate for. If it’s not, then the results of this post don’t really apply. The claim is not that patents measure innovation well in all cases. The claim is that innovation researchers have done a decent job of restricting their attention to cases when patents do work well.

There is a second potential layer of selection bias though, above the researcher’s own decision about whether to use patents. Publication bias might actually be giving us a skewed perception of how reliable patent data itself is! Suppose that patents really are a bad measure of innovation, and accordingly they rarely deliver positive findings. It might be the case that we only observe the papers that do get positive results, since those are the only ones that are publishable. If this issue is serious, it would mean I’m overstating the extent to which research using patent data arrives at similar conclusions to research that does not. I think the popularity of patent data as a data source is some evidence against this concern: if the data had a reputation for leading disproportionately often to unpublishable null results, it probably wouldn’t be so popular. But it is something to bear in mind.

Lastly, there could be bias from the fact that my choice of topics on New Things Under the Sun isn’t random. I like writing about topics I think are important or where I think academic research can tell us something useful. The latter preference is potentially a serious source of bias. All else equal I feel less enthusiastic writing about a field where there is a muddle of different findings depending on which dataset you use (though I would still write a post if I thought the topic was important). That might mean my selection of topics is biased towards claims where patent and non-patent data obtain similar results, since those are the ones where I’m most confident social science research can tell us something.

There’s at least one way to evaluate how much of a concern this should be. New Things Under the Sun is a living literature review. There might well be a selection bias in how I choose which articles to write. But after the articles are written, there is a lot less bias in my choice about what articles to update. One of my goals for this project is for the posts to provide an honest account of the state of the literature. That means if new studies come out that contradict what I’ve already written, I do feel obliged to update the post to reflect this. That presents an opportunity to check for this last form of selection bias. If updates tend to find more disagreement between patent and non-patent data than original articles, that would suggest my choice of what to initially write about is overstating the extent to which patent and non-patent studies agree.

Going through my newsletter archive, I found 20 updates to existing articles that include patent and non-patent data. Of these updates, 3 have at least some disagreement between the patent and non-patent analyses. The other 17 do not have any meaningful disagreement, in my judgment. This is pretty close to the ratio I found in my original survey of 37 articles that examine both patent and non-patent data. About 15% (3/20) of the time, there is some disagreement between analyses reliant on patent data and those that are not, compared to 16% in my main analysis. See the appendix for how I classified each of these 20 updates, along with a short description of the nature of agreement or disagreement.
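The comparison between the two disagreement rates is simple arithmetic, and a short Python sketch makes it easy to check. The counts (6 of 37 articles, 3 of 20 updates) come from the text. The Fisher exact test is an addition for illustration only and was not part of the original exercise, and the helper names `share` and `fisher_exact_two_sided` are hypothetical.

```python
from math import comb


def share(k, n):
    """Fraction of n items that fall in the category counted by k."""
    return k / n


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Illustrative helper (an assumption, not from the original post).
    Rows are the two samples; columns are disagreement / no disagreement.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, row1)

    def prob(x):
        # Hypergeometric probability that x of the col1 "disagreement"
        # cases land in the first row, holding all margins fixed.
        return comb(col1, x) * comb(total - col1, row1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(col1, row1)
    p_obs = prob(a)
    # Sum the probability of every table at least as unlikely as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Counts taken from the text.
main_disagree, main_total = 6, 37      # original articles
update_disagree, update_total = 3, 20  # later updates

print(f"Original articles: {share(main_disagree, main_total):.1%}")
print(f"Updates:           {share(update_disagree, update_total):.1%}")

p_value = fisher_exact_two_sided(
    main_disagree, main_total - main_disagree,
    update_disagree, update_total - update_disagree,
)
print(f"Fisher exact p-value: {p_value:.3f}")
```

With counts this small, a test like this has little power to detect a modest difference, so it is best read as a sanity check on the arithmetic rather than as evidence that the two rates are identical.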

All in all, this exercise formalizes an intuition I’ve had for a long time. I’ve noticed that when I write about studies that use patent data, I often encounter some skepticism. For that very reason, I often go out of my way to try and find articles that do not rely on patent data, but which study the same phenomena as the patent-based papers I’m writing about. And in my experience, that exercise rarely leads me to substantively revise my original views. In the academic literature, if it’s possible and sensible to study a question with both patent data and non-patent data, in my experience results are subjectively similar.

Appendix

In the following, I classify updates to existing articles to see whether the update disagreed with evidence in the original article.

At least some disagreement

April 2022 Updates: This post discussed a major revision to one of the first articles written for New Things Under the Sun. This update introduced some of the nuances about the differential reliance on conventional combinations of ideas between papers and patents.

December 2022 Updates: Expanded the post Innovation (mostly) gets harder to talk about patents. While many measures of innovation suggest a constant level of research effort has diminishing returns, raw patent counts do not display this trend; but a measure of particularly innovative patents does.

September 2023 Updates: Survey evidence shows the chances a firm has at least one process innovation grows faster than the probability it has at least one product innovation as the firm gets larger, until the size of the firm hits 50 employees, at which point the relationship reverses. In contrast patent evidence shows that as firms get larger a bigger share of patents are processes. That said, this result is a bit ambiguous, since the survey data doesn’t tell us the share of innovations that are process or product, just whether firms have at least one.

No disagreement

January 2024 Updates: An update to the post Geography and What Gets Researched presents some evidence that management science scholars are more likely to study their own countries; patent evidence from the original post documented similar phenomena for the inventors of new agricultural technologies.