Free Knowledge and Innovation

Access to libraries boosts local patent rates; access to Wikipedia shapes science

Published on Mar 01, 2021

You're viewing an older Release (#8) of this Pub.

  • This Release (#8) was created on Jan 20, 2022.
  • The latest Release (#11) was created on Apr 18, 2024.

Sometimes obvious ideas work. If you want to encourage more innovation, give people better access to knowledge.

Let's start in the 1800s. Over 1883-1919 (but mostly after 1899) Andrew Carnegie provided the funds for the construction of ~1700 free public libraries, scattered across the USA. There was even one in my hometown.

Berkes and Nencka (2020) set out to measure the impact of this library building spree on innovation. They want to compare the patenting rate of cities and towns that received libraries to otherwise identical ones that did not. The challenge is finding cities that did not receive libraries but otherwise serve as a good control group. Berkes and Nencka use a set of 200 towns that applied for library funding and were approved, but then changed their minds and rejected Carnegie’s money.

Why would they do that? There are a variety of possible reasons, but a big one is distaste for Carnegie himself after he hired a militia to violently put down a mining strike (many people died). Berkes and Nencka show that these rejecting cities were, on average, no different from the acceptors on various observable metrics such as education levels, racial makeup, job mix, age, and population. For Berkes and Nencka’s comparison to work, we have to believe that whatever the reason these cities rejected funding, it’s largely uncorrelated with a tendency to change patenting behavior after an application is accepted. Berkes and Nencka do a couple of other things to establish this, pretty convincingly in my view.

And it turns out patenting does change pretty noticeably for the towns that get libraries, compared to the ones that apply and do not. The main story is visible in the figure below, which shows the average number of patents per city-year on a log scale: towns that ended up with libraries looked pretty similar to towns without them up until they both applied for libraries, but afterwards the towns that got libraries tended to have 8-12% more patents than the ones that didn’t. In the figure, the solid line is when funds for a library are granted, and the dashed line is when libraries were typically opened (three years later).

(Aside: why does the figure above have this inverted U-shape? That has to do with larger trends of patent activity shifting away from towns and into big cities during the period under study. Having a library did not stop that trend.)

Moreover, Berkes and Nencka also provide some supportive evidence that the increase in patenting really does come from library access: patents from cities with libraries are more likely to contain words associated with citing a book (e.g., "vol.", "his book", "pp.", "pages", etc).
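
The comparison described above (towns that got libraries versus towns that applied but did not, before versus after) can be sketched as a simple difference-in-differences on log patent counts. This is only a minimal illustration under that assumption, not Berkes and Nencka's actual estimation: the function names are hypothetical and the patent counts are made up. Note that both groups decline in the toy numbers, mirroring the inverted U-shape, and the library effect is the gap between the two declines.

```python
import math

def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Difference-in-differences on log patent counts, in log points."""
    treated_change = math.log(treated_after) - math.log(treated_before)
    control_change = math.log(control_after) - math.log(control_before)
    return treated_change - control_change

def log_points_to_percent(log_points):
    """Convert a gap on the log scale into a percent difference."""
    return (math.exp(log_points) - 1) * 100

# Hypothetical average patents per city-year (not taken from the paper).
effect = diff_in_diff(treated_before=10.0, treated_after=9.9,
                      control_before=10.0, control_after=9.0)
print(f"{log_points_to_percent(effect):.1f}% more patents")
```

With these invented numbers the library towns end up with about 10% more patents than the control towns, which sits inside the 8-12% range reported above.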

Let’s fast forward to 1975. In that year, the US Patent and Trademark Office began a program to dramatically increase the number of patent depository libraries around the country, with a goal of having at least one library in every state. Patents provide inventors with exclusive rights over their invention for a period of time, in exchange for disclosing how the invention works. In theory, that should let other people build on the underlying principles expressed in new inventions. But prior to the internet, to easily read published patent documents you needed to go to one of these libraries, and access to them was very unequally distributed.

Furman, Nagler, and Watzinger (2021) measure the impact of getting a patent library by following a strategy similar to that of Berkes and Nencka (2020): they compare regions that got a patent depository library to otherwise similar regions that did not. In their case, federal depository libraries serve as a natural control group. Federal depository libraries are libraries that provide access to federal regulations and laws, and they were also the most common sites where patent depositories were set up. The authors compare patent rates in the 15 miles around federal depository libraries that got a patent depository to the patent rates in the 15 miles around federal depository libraries that did not, but which are close to ones that did. (Why did some federal depository libraries get patent depositories while others didn’t? The patent office basically followed a principle of first come, first served, so reasons were often idiosyncratic, like the person running a federal depository library wanting to travel to DC for annual patent training.)

As with Berkes and Nencka, this paper finds the patent rate of regions that get a patent library diverged from those that didn’t in the subsequent years, as illustrated in the figure below. All told, getting a library seems to boost patent rates by about 18%.

From Furman, Nagler, and Watzinger (2021)

Furman, Nagler, and Watzinger are also able to provide a lot of supporting evidence that this increase in patenting is driven by improved access to patents. For example, the effect is strongest for young firms and small firms, which we might assume are less likely to have alternative ways of accessing patents. The effect is also strongest for technologies that disclose the most information in patents (chemistry patents).

And they also look at the words in patents. After all, a lot of what we learn from patents we learn by reading the words. Furman, Nagler, and Watzinger try to tease out evidence that inventors learn by reading patents by breaking patents down into four categories:

  • Patents that feature globally new words: words that never appeared before in any other patent.

  • Patents that feature regionally new words: words new to the patents of inventors who reside within 15 miles of the patent library or its control, but not new in the wider world.

  • Patents that feature regionally learned words: words that are no longer new to the patents of inventors who live within 15 miles of the library, but which were not used in any local patents before the library showed up.

  • Patents that feature regionally familiar words: words that were already present in the patents of inventors residing within 15 miles of the library, even prior to its opening.

To take an example, the word “internet” first appeared in a patent title in patent 5309437, which was filed in 1990 by inventors residing in Maine and New Hampshire. So patent 5309437 features a globally new word (Furman, Nagler, and Watzinger actually look at more than just the patent title, but this is just to illustrate the idea). I live in Des Moines, Iowa, where a patent depository library opened in the late 1980s. The first patent (title) mentioning the word “internet” with a Des Moines-based inventor was filed in 2011. We would say that patent features a regionally new word, since no other Des Moines patents had the word “internet” in their title prior to 2011 but patents outside Des Moines did. If, in 2012, another Des Moines-based inventor used the word “internet” in their patent, we would classify that patent as featuring a regionally learned word, since the word “internet” did not appear in Des Moines patents before our patent library was founded. Finally, a Des Moines-based patent without the word “internet” or any other words that are new to the local patent corpus since we got our library would be classified as a familiar words patent.
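
The four categories and the “internet” example can be sketched in a few lines of Python. This is a toy illustration, not the authors' code: the function name, the tuple layout, the rule that the "newest" category present wins, and the library year of 1988 (standing in for "the late 1980s") are all assumptions, and apart from the 1990, 2011, and 2012 “internet” patents the data is invented.

```python
def classify_patent(words, year, earlier_patents, library_year):
    """Assign a patent to one of the four word categories.

    earlier_patents: iterable of (year, is_local, words) tuples, where
    is_local marks patents by inventors living near the library.
    """
    seen_anywhere = set()
    seen_locally = set()
    seen_locally_before_library = set()
    for p_year, is_local, p_words in earlier_patents:
        if p_year >= year:
            continue  # only patents filed in earlier years count as prior
        seen_anywhere |= p_words
        if is_local:
            seen_locally |= p_words
            if p_year < library_year:
                seen_locally_before_library |= p_words

    # Assumed priority: the "newest" category present in the patent wins.
    if words - seen_anywhere:
        return "globally new"
    if words - seen_locally:
        return "regionally new"
    if words - seen_locally_before_library:
        return "regionally learned"
    return "regionally familiar"

LIBRARY_YEAR = 1988  # hypothetical stand-in for "the late 1980s"
patents = [
    (1985, True, {"tractor", "system"}),    # made-up local patent
    (1990, False, {"internet", "system"}),  # first "internet" patent, out of state
    (2011, True, {"internet", "tractor"}),  # first local "internet" patent
    (2012, True, {"internet", "system"}),   # later local "internet" patent
    (2012, True, {"tractor", "system"}),    # local patent using only old words
]
labels = [classify_patent(w, y, patents, LIBRARY_YEAR) for y, _, w in patents]
print(labels)
```

The 1990 patent comes out as globally new, the 2011 patent as regionally new, the first 2012 patent as regionally learned, and the second 2012 patent as regionally familiar, matching the walk-through above. (The 1985 patent is also labeled globally new, but only because the toy corpus starts there.)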

We would expect patent libraries to be especially helpful with regionally new and regionally learned words. These are signals that inventors in, say, Des Moines, are reading about patents from outside Des Moines and adopting new ideas they learn from them. And indeed, when you break patents down in this way, you see more patents of precisely the type you would expect, if people are reading patents and using what they learn to invent new things.

From Furman, Nagler, and Watzinger (2021)

On the other hand, we wouldn’t necessarily expect patent libraries to be as much help for globally new words, since those words are not found in any library - they are completely new to the world of patenting. Nor would we expect them to be much help for regionally familiar words, since those pertain to knowledge that was already available before the patent library was founded. And when we look at changes in the trend of these kinds of patents, we see patent libraries had no detectable impact.

From Furman, Nagler, and Watzinger (2021)

If reading technical information facilitates invention, that also suggests improving online access to knowledge might be another way to boost innovation. Enter Wikipedia. Anyone with access to the internet can now read a free encyclopedia that has 6.3 million articles; for comparison, the Encyclopaedia Britannica never had more than 100,000 articles, even in its digital incarnations. Wikipedia also has extremely detailed scientific articles: Thompson and Hanley (2020) find Wikipedia covered 93% of the topics in upper level undergraduate chemistry classes and nearly half of the topics in master's-level graduate school. Does access to Wikipedia have a similar effect on innovation as access to libraries?

To test this, Thompson and Hanley perform an experiment. They commission 43 new chemistry articles, written by PhD students, and then they randomly post half to Wikipedia. They want to see if access to these articles (as compared to the unposted ones) exerts an influence on science.

Only 0.01% of academic papers directly cite Wikipedia (I guess it’s embarrassing), but Thompson and Hanley provide a lot of evidence that scientists read and are influenced by these Wikipedia articles. First off, these are quite specialized chemistry topics: for their experiment, they focus on material from chemistry grad school that wasn't already on Wikipedia. But even though the topics are quite niche, the readership is huge: 4,400 views per month, 2 million total views as of February 2017. Moreover, while people may be shy about citing Wikipedia, they are not shy about citing the scholarly literature that is listed in the Wikipedia reference list. Thompson and Hanley show that references in articles they published to Wikipedia got 91% more citations on average than references in their control group of unpublished articles. People are reading these articles and citing the referenced literature, rather than Wikipedia itself.

Finally, the bulk of Thompson and Hanley's paper uses a textual similarity metric to identify the influence of Wikipedia. For each of their chemistry articles, they compute the similarity of the wiki text to the text of academic work published by Elsevier. Basically, they have a method of checking things like the extent to which both articles use the same unusual words. They compare how similar each wiki article is to Elsevier articles published in the 6 months before it was posted with how similar it is to Elsevier articles published in the 6 months after. The figure below shows how the distribution of similarity changed over that period. The blue line corresponds to similarity with wiki articles that were posted and the green to similarity with the control wiki articles that were not posted (until after the experiment ended). One way to interpret this is that after wiki articles get posted, you see more Elsevier articles that have high similarity to wiki articles (the ones in the top 10% of similarity). In the control, you see the opposite: more Elsevier articles dissimilar to the (unposted) wiki articles are published over time.
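
To give a feel for what "using the same unusual words" might look like in practice, here is a minimal sketch that scores similarity with TF-IDF weights and cosine similarity. That choice is an assumption made for illustration and may differ from the metric Thompson and Hanley actually use; the function names and the three one-sentence "articles" are invented.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Weight each word by how often a document uses it and how rare it is overall."""
    tokenized = [doc.lower().split() for doc in docs]
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    n_docs = len(tokenized)
    return [
        {w: count * math.log(n_docs / doc_freq[w])
         for w, count in Counter(tokens).items()}
        for tokens in tokenized
    ]

def cosine(u, v):
    """Cosine similarity between two sparse word-weight vectors."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0

# Invented stand-ins for a wiki article and two journal articles.
wiki = "the catalyst enables asymmetric hydrogenation of the ketone"
journal_before = "the polymer melts in the furnace"
journal_after = "the asymmetric hydrogenation of the ketone uses the catalyst"

wiki_vec, before_vec, after_vec = tfidf_vectors([wiki, journal_before, journal_after])
sim_before = cosine(wiki_vec, before_vec)
sim_after = cosine(wiki_vec, after_vec)
print(sim_before, sim_after)
```

Because a word that shows up in every document (like "the") gets zero weight, only the shared unusual words move the score: the journal article written "after" scores well above the one written "before", which shares nothing distinctive with the wiki text. The real analysis compares whole distributions of such scores across many articles rather than a single pair.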

So, Thompson and Hanley provide lots of evidence that these Wikipedia articles shape the direction of science. They get read a bunch; the things they cite get cited a bunch; and after they are published, you see more peer-reviewed articles using similar words and phrases as the wiki article. Another thing the paper does is generalize this approach to all of chemistry Wikipedia. Looking at 27,000 chemistry articles published on Wikipedia and 326,000 chemistry articles published by Elsevier, they ask how the distribution of similarity changes for Elsevier articles published before and after Wikipedia articles. What they find is quite similar to their much smaller experiment comprising 43 new Wikipedia articles. After a given Wikipedia article gets published, there is an increase in the number of Elsevier articles using language similar to it, as compared to the Elsevier articles published before the Wikipedia article.

(As the author of a free website about academic research that strives to be accessible to lay readers, I have chosen to believe I too exert an inescapable gravitational pull on the direction of research.)

Public libraries, patent depository libraries, and chemistry articles in Wikipedia: in all three cases, new access seems to have had a measurable impact on innovation.

New articles and updates to existing articles are typically added to this site every two weeks. To learn what’s new on New Things Under the Sun, subscribe to the newsletter.

Cites the above

How to accelerate technological progress

Remote work and the future of innovation

More Science Leads to More Innovation

Urban social infrastructure and innovation

Related to the above

An example of high returns to publicly funded R&D

Science is good at making useful knowledge

Articles cited:

Berkes, Enrico, and Peter Nencka. 2020. Knowledge Access: The Effects of Carnegie Libraries on Innovation. Working Paper.

Furman, Jeffrey L., Markus Nagler, and Martin Watzinger. 2021. Disclosure and Subsequent Innovation: Evidence from the Patent Depository Library Program. American Economic Journal: Economic Policy 13(4): 239-270.

Thompson, Neil C., and Douglas Hanley. 2020. Science is Shaped by Wikipedia: Evidence from a Randomized Control Trial. MIT Sloan Research Paper No. 5238-17.
