Here’s a striking fact: through 2022, one in two Nobel prize winners in physics, chemistry, and medicine also had a Nobel prize winner as their academic advisor.1
What accounts for this extraordinary transmission rate of scientific excellence?
There are two main possibilities. Maybe great students and great teachers seek each other out and tend to work together. Or maybe great teachers give their students resources that make them better scientists: training, access to networks, support, and so on.
Both are probably important to one degree or another. But in this post I’ll focus on an aspect of the second channel: what do we know about how innovative teachers influence their students and their students’ subsequent innovative careers? I’ll focus on two strands of literature: roughly speaking, how teachers influence what their students are interested in, and how teachers influence the impact of their students’ work.
Interesting Correlations
To start, we’ll establish some correlations between the interests of students and their teachers. Borowiecki (2022) focuses on teacher-to-student transmission of interests among musical composers working between 1450 and 1975; Koschnick (2023) among undergraduates and faculty at Oxford and Cambridge over 1600-1800; Azoulay, Liu, and Stuart (2017) on modern postdocs and their advisors in the life sciences. In the next section, we’ll try to go further and show that these correlations are likely to be in large part about the teacher’s influence on student interests, rather than students sorting themselves to work with teachers who share their interests.
All three papers involve heroic data construction efforts. Borowiecki’s core analysis relies on data about 341 composers: where they lived, what music they wrote, and how impactful their music is (measured by modern Spotify followers, the length of their biographies in a major musical dictionary, or rankings by Charles Murray). Borowiecki also identifies 221 student-teacher connections among this group, where one taught the other at a music conservatory. Lastly, because Borowiecki has detailed information on the musical themes of his composers, he can algorithmically assess how similar the musical themes of any two composers are.
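To make "algorithmically assess similarity" concrete, here is a minimal sketch of one way such a score could be computed. This is not Borowiecki’s actual procedure; it just assumes each composer’s themes have been summarized as a (hypothetical) vector of melodic-interval frequencies and compares composers with cosine similarity.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical data: each composer's themes summarized as frequencies of
# melodic intervals (unisons, seconds, thirds, ...). The numbers are made
# up purely for illustration.
student = np.array([0.10, 0.35, 0.25, 0.15, 0.10, 0.05])
teacher = np.array([0.12, 0.33, 0.27, 0.13, 0.10, 0.05])
control = np.array([0.25, 0.15, 0.20, 0.20, 0.10, 0.10])  # unrelated composer

print(cosine_similarity(student, teacher))  # higher similarity to the teacher...
print(cosine_similarity(student, control))  # ...than to the comparison composer
```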
Borowiecki’s main analysis shows that composers write music with themes that are more similar to their teachers’ themes than to those of other composers. The effect holds when you restrict the comparison to composers living in the same country and alive at the same time as the teacher. He finds this similarity persists for around 20 years, and even across generations: composers write music more similar to their teacher’s teacher than to other composers who might have taught their teacher but didn’t.
Let’s turn to interests in science, which are studied by Koschnick (2023). Koschnick’s analysis builds on a dataset that matches students and faculty at Cambridge and Oxford (over 1600-1800) to a database of publications in England, based on names and birth and death dates (where available). He wants to use these matched publications to infer students’ and faculty’s interests in different areas of science (or other topics): for example, students and faculty with more publications about astronomy are probably more interested in astronomy. To do so, Koschnick trains a large language model to classify publications into topics - he’s helped here by the era’s propensity for very long and descriptive titles.2 Finally, he wants to match students to teachers, to see whether being around teachers more interested in a specific area of science makes students more likely to work on that area. For that, he relies on the college system employed by these universities. Students at these universities belong to one of dozens of colleges, where they live with their college peers and are primarily taught by faculty from their own college. Since Koschnick knows which college each faculty member belongs to, he knows with a high degree of certainty which faculty are teaching which students.
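Koschnick does the classification step with a large language model; as a stand-in, here is a minimal sketch of the same pipeline using a simple supervised text classifier on hand-labeled titles. The titles, labels, and choice of classifier are all my own illustration - the point is just the shape of the pipeline: title text in, topic label out.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical hand-labeled examples; the long, descriptive titles of the era
# make even simple text features fairly informative.
titles = [
    "A new theory of the motions of the planets and the figure of the earth",
    "Observations upon the comet lately seen, with remarks on its orbit",
    "A treatise of algebra, both historical and practical",
    "Sermons preached upon several occasions before the university",
]
topics = ["astronomy", "astronomy", "mathematics", "theology"]

# Bag-of-words features plus a linear classifier stand in for the LLM step.
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
classifier.fit(titles, topics)

# Classify a new publication by its title alone.
print(classifier.predict(["An account of an eclipse of the moon observed at Oxford"]))
```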
Koschnick documents that after they graduate, students tend to publish more on the scientific topics that were more common among the publications of the faculty at the college they attended. If the share of faculty publications at your college in a given scientific field doubles, then the share of publications in that field written by its students rises by 1-3%. That doesn’t sound like much, but note that the average college’s share of publications in any one field is tiny - only 0.6%. So doubling the share is quite easy. In fact, shares across colleges vary by much more than a factor of two: one standard deviation in this data is more like a 6x increase over the average.
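As a rough back-of-the-envelope check on how these numbers hang together, suppose we treat the 1-3% response to a doubling as approximately log-linear (my simplification, not necessarily the paper’s specification) and scale it up to a one standard deviation shift:

```python
import math

# Numbers from the text; the log-linear extrapolation is my own simplification.
avg_faculty_share = 0.006           # average college share in any one field (0.6%)
effect_per_doubling = (0.01, 0.03)  # student share rises 1-3% when faculty share doubles
sd_multiple = 6.5                   # one s.d. is roughly a 6-7x change in the faculty share

doublings = math.log2(sd_multiple)  # about 2.7 doublings
low, high = (d * doublings for d in effect_per_doubling)
print(f"Implied response to a 1 s.d. shift: {low:.0%} to {high:.0%}")
# Roughly 3-8%, in the same ballpark as the 5-15% figure quoted later in the post.
```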
Finally, Azoulay, Liu, and Stuart (2017) build a dataset on 489 elite life-science postdocs and their 333 advisors. These postdocs are Pew or Searle Scholars, which is useful because the Pew Scholar Oral History and Archives Project provides extensive documentation on the biographies of Pew scholars, which Azoulay, Liu, and Stuart draw on in the analysis discussed in the next section. For now, suffice it to say that Azoulay and coauthors show that postdocs who work with advisors who have previously held patents are more likely to seek patents of their own in the future.
Birds of a Feather?
These three papers establish that students appear to share interests with their teachers, whether that interest is a particular style of music, a field of science, or commercializing research. But we haven’t done anything to establish that this correlation is down to teacher influence. It might just as easily be that young composers seek out teachers whose music they like, that students go to colleges strong in the subjects they are interested in, and that budding entrepreneurial scientists seek out mentors with experience commercializing research. All three papers present evidence that these kinds of explanations are probably not the main story.
To begin with, both Borowiecki’s and Koschnick’s papers involve students making decisions at a relatively young age, before we might imagine they have deeply developed personal preferences. In Borowiecki (2022), 75% of students begin their training at a music conservatory, with their advisor, before the age of 22. Koschnick’s paper focuses on undergraduates. Both papers also primarily take place in eras that predate the information technology revolution, when information about potential teachers was far less readily available.
Koschnick’s paper goes on to argue that undergraduates at Oxford instead often selected their college based on geographical affinities. For example, in his data, students from Devon and Cornwall are more likely to go to Exeter College and students from Pembroke are more likely to go to Jesus College. In one analytical exercise, he shows that students are more likely to write about a given scientific topic if the faculty of the college that people from their region usually attend happen to be stronger in that field during the years the student is at university. In that particular exercise, he doesn’t even need to know where students actually ended up going to school, just where they would be predicted to go based on where they live.
For Azoulay, Liu, and Stuart’s study of postdocs and their advisors, the authors have access to an unusually rich source of information about the decision-making process of their subjects: the oral histories of Pew scholars. They read a sample of 62 such histories (each runs 100-400 pages) to see what kinds of factors Pew scholars self-report as being important in their choice of postdoc mentor. By far the most commonly cited factor was the scientific topic being investigated, followed by geography (where the lab was), the advisor’s prestige in the field, and interpersonal rapport. None mentioned the commercial orientation of the advisor or their interest in patenting. And this wasn’t simply because they were shy about discussing non-academic goals; when asked about their own patents, interviewees were apparently quite candid.
Azoulay, Liu, and Stuart use this qualitative analysis as the basis of some additional quantitative exercises. They come up with measures of scientific similarity, geographical proximity, and prestige, which they use to build statistical models of the matching process between postdocs and mentors. They can then check whether matches that are poorly explained by these stated factors are unusually correlated with the decision to patent, which would be evidence that people left their true motivation - a desire to work with a scientist who patents - unstated. But they don’t really find any evidence of this. The statistics back up what the scholars say: recent graduates don’t really think about patenting when deciding who to work with for their postdocs. But if they “accidentally” end up working with an advisor with a history of patenting, they’re more likely to patent themselves later in their career.
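To make the logic of that exercise concrete, here is a highly stylized sketch on simulated data - entirely my own construction, not the authors’ specification: model the postdoc-advisor match using only the stated factors, then check whether the part of matching the model can’t explain lines up with the advisor’s patenting history.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500

# Hypothetical candidate postdoc-advisor pairs, with the "stated" matching
# factors plus an indicator for whether the advisor has patented before.
df = pd.DataFrame({
    "topic_similarity": rng.uniform(0, 1, n),
    "same_city": rng.integers(0, 2, n),
    "advisor_prestige": rng.normal(0, 1, n),
    "advisor_patents": rng.integers(0, 2, n),
})
# In this simulation, matches are driven only by the stated factors (plus noise).
df["matched"] = ((0.8 * df.topic_similarity + 0.3 * df.same_city
                  + 0.2 * df.advisor_prestige + rng.normal(0, 0.5, n)) > 0.7).astype(int)

# Step 1: model which pairs match, using only the stated factors.
match_model = smf.logit(
    "matched ~ topic_similarity + same_city + advisor_prestige", data=df
).fit(disp=False)

# Step 2: does the part of matching the model can't explain line up with
# advisor patenting? If hidden preferences for patenting advisors drove
# matching, residuals would differ by advisor_patents; here they should not.
df["residual"] = df["matched"] - match_model.predict(df)
print(df.groupby("advisor_patents")["residual"].mean())
```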
Both Borowiecki and Koschnick also perform exercises based on teacher composition at conservatories and colleges. In one exercise, Borowiecki looks at how similar the musical styles of a student and teacher are, as compared to teachers at the same conservatory who either left shortly before the student joined or arrived shortly after the student left. The idea here is that if a student had started at the conservatory at a slightly different time, they might well have ended up working with one of these alternative teachers. Koschnick’s study exploits an even more abrupt change in the faculty: the ouster of roughly half the fellows of the University of Oxford following the English civil war (they didn’t support the winning side) and their replacement, which he argues was random at least with respect to scientific interests. He then looks to see whether student interests in specific scientific fields are also correlated with the interests of the replacement faculty (whom students could not have anticipated would be their teachers). Both exercises support the notion that teacher influence is the main reason for the correlation between student and teacher interests. Composers are more similar to the teachers they actually had than to teachers at the same conservatory who weren’t available, and student interests in different scientific topics are correlated with the interests of the replacement faculty who unexpectedly ended up teaching them. The size of the effects is comparable to the main analysis, though estimated with less precision (because these analyses are necessarily based on smaller samples).
A final reason to believe these correlations arise from teacher influence, rather than sorting, is that we see similar effects in other domains. The New Things Under the Sun posts Entrepreneurship is contagious and The “idea” of being an entrepreneur cover a related literature arguing that role models play an important part in building an interest in becoming an entrepreneur. That literature is pretty big, pulling together observational, quasi-experimental, and experimental data. So we already had evidence that preferences can be powerfully shaped by role models.
So across a few contexts, we have evidence that students pick up the interests of their teachers. How big is this effect? Each paper frames effect sizes in ways that are hard to compare against each other. Koschnick finds that if faculty writing on a scientific topic at a college rose by about one standard deviation - roughly 650% - then students would increase their share of writing on that topic by 5-15% (from a low base). Borowiecki finds that composers are 10-30% of a standard deviation closer to the music of their teachers than to the comparison group. And Azoulay, Liu, and Stuart find that having a patenting mentor has an effect on patenting comparable to the gender gap in patent rates.
Training for excellence
So teachers can transmit something about their “style” of innovation to their students, though the effect sizes above suggest teacher influence isn’t the dominant determinant of interests. That provides some evidence for one reason Nobel laureates are so likely to mentor other Nobel laureates: they may transmit good taste about research topics and practices.
Borowiecki’s work on composers provides some more direct evidence on teachers and the transmission of excellence, in addition to the transmission of interests. Borowiecki measures achievement in composing a couple of different ways, one of which is the number of followers a composer has today on Spotify. He shows that students are more likely to be in the top 25% of Spotify followers if they are more similar to their teacher and their teacher is also in the top 25%.
Let’s turn to two other studies that shed light on the influence of teachers on the achievements of their students. A first question we might ask is: are there differences in what top researchers teach their students? Biasi and Ma (2023) study what topics college students learn about, based on the text of 1.7 million academic syllabi. It’s a descriptive paper, rather than one trying to tease out causal pathways, but it does provide strong evidence that researchers at the academic frontier teach their students material that is closer to the frontier.
To measure what students are learning about in their classes, Biasi and Ma develop a way to measure whether a syllabus draws more extensively on newer or older academic research. First, they start with a list of words associated with knowledge concepts or skills - specifically, words that have ever been assigned to a journal article as keywords. Second, they assume an article and a syllabus are more similar if the syllabus and the article’s abstract share more of these “knowledge” words, and if those words are less commonly used. Third, they calculate the average similarity between a syllabus and all the academic articles published in a recent three-year window. Fourth, they repeat step three, but using an older three-year window. The ratio of the similarity to the older articles to the similarity to the newer articles is their measure of how closely a syllabus tracks the academic frontier. Essentially, it’s a measure of the extent to which your syllabus shares more research lingo with recent research than with older research.3
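That description is enough to sketch the construction. Below is a minimal Python sketch of the four steps, using raw keyword overlap weighted by word rarity; it’s my simplification of the idea, not Biasi and Ma’s exact procedure, and the example words and function names are made up for illustration.

```python
from collections import Counter

def similarity(syllabus_words, abstract_words, word_freq):
    """Overlap of 'knowledge' words, weighting rarer words more heavily."""
    shared = set(syllabus_words) & set(abstract_words)
    return sum(1.0 / word_freq[w] for w in shared)

def avg_similarity(syllabus_words, abstracts, word_freq):
    return sum(similarity(syllabus_words, a, word_freq) for a in abstracts) / len(abstracts)

def education_innovation_ratio(syllabus_words, old_abstracts, new_abstracts, word_freq):
    """Ratio of average similarity to older vs. newer research.
    Lower values mean the syllabus 'speaks the language' of recent work."""
    old_sim = avg_similarity(syllabus_words, old_abstracts, word_freq)
    new_sim = avg_similarity(syllabus_words, new_abstracts, word_freq)
    return old_sim / new_sim

# Toy example: a syllabus and two small pools of article abstracts (recent and
# older), each represented as lists of journal-keyword ("knowledge") words.
syllabus = ["transformer", "attention", "regression", "backpropagation"]
new_pool = [["transformer", "attention", "pretraining"], ["attention", "fine-tuning"]]
old_pool = [["regression", "backpropagation"], ["perceptron", "regression"]]

# How often each knowledge word appears across all abstracts (for rarity weighting).
word_freq = Counter(w for pool in (new_pool, old_pool) for abstract in pool for w in abstract)

print(education_innovation_ratio(syllabus, old_pool, new_pool, word_freq))
```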
So with all that work to build a big dataset of what teachers are doing in their classes, what do we learn? It turns out faculty vary a lot in what they teach, even within the same course. Consider the figures Biasi and Ma present, which plot this syllabus measure against measures of research productivity. On the horizontal axis is either the number of publications or the number of citations the instructor received in the preceding five years. On the vertical axis is Biasi and Ma’s measure of how related a syllabus is to recent research - lower means a syllabus’ language is closer to the language of recent articles than to older ones.
In these figures, we can see that more productive researchers, and more highly cited researchers, tend to teach their students topics that are most similar to recent academic research. Importantly, the figures control for year and course effects. It is not merely that active researchers are assigned to teach, for example, graduate students, while less active researchers are assigned to teach the 101 classes. Instead, within a particular course, the faculty who excel at research (as measured imperfectly by publications and citations) teach their students different things (or at least their syllabi say they do). And the effects aren’t small either: Biasi and Ma estimate that a one-unit change in the syllabus measure corresponds to a change in 26% of the words on that syllabus.
To close, let’s turn to Waldinger (2010), one of the most direct studies of the impact of teachers on PhD students. Waldinger looks at the link between the academic productivity of German math departments and the academic outcomes of their students. For 33 different math departments, Waldinger computes the average number of citations per faculty member (restricting attention to articles published in top journals). Across 690 PhD students, those in math departments where the professors have more citations on average are more likely to publish later in life, and their work receives more citations.
Is that because the best students go to the best departments, or because the best departments create the best future mathematicians? What makes this paper extraordinary is that after students had already sorted themselves into different departments, Nazi policy banning Jews and “politically undesirable” individuals from the civil service capriciously and all-but-randomly hurt or decimated different math departments.4 Of 33 departments, nine lost more than a quarter of their faculty - often very good faculty. Two lost half their faculty or more! Meanwhile, fifteen departments were untouched by the policy at all, since they had no Jewish faculty.
These dismissals hurt the ability of faculty to influence their students, while holding constant the sorting of students into departments of different perceived quality. As a first-pass indication that this mattered, consider the pattern Waldinger documents: prior to the Nazi dismissals, students at departments that would go on to lose faculty had persistently higher probabilities of publishing cited academic work; after the dismissals, students in those departments lost this advantage.
But this comparison understates things by lumping together the departments that lost small and large shares of their faculty. For a more complete analysis, Waldinger uses the dismissals to construct a measure of faculty quality that depends only on these dismissals and not on pre-existing departmental quality. So if you went to a department that unexpectedly lost a lot of good faculty, you might have been a very good student (since you got in), but Waldinger shows the loss of these mentors had substantial negative effects on long-run student outcomes. Students in departments that lost talented faculty were less likely to publish their dissertations, less likely to become full professors, and received fewer citations over their lives. Echoing the results on the transmission of teacher interests, Waldinger’s results are consistent with teacher influence, rather than sorting, being the main reason faculty and student outcomes are correlated. His results indicate a one standard deviation increase in the research productivity of departmental faculty is associated with an extra 6.3 lifetime citations for students (against an average of 11.2 lifetime citations).
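For readers who like to see the empirical strategy spelled out, here is a stylized sketch of the kind of regression involved, run on simulated data loosely calibrated to the numbers in this section. It is my illustration, not Waldinger’s actual specification: relate a student’s long-run outcome to the dismissal-induced drop in their department’s faculty quality, with fixed effects absorbing pre-existing differences across departments and cohorts.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 690  # number of PhD students, as in the paper

# Hypothetical student-level data: department, cohort, the dismissal-induced
# drop in faculty quality, and lifetime citations.
df = pd.DataFrame({
    "dept": rng.integers(0, 33, n),
    "cohort_after_1933": rng.integers(0, 2, n),
})
# Many departments lose nothing; others lose a chunk of faculty quality.
dismissal_loss = rng.uniform(0, 1, 33) * (rng.uniform(0, 1, 33) < 0.5)
df["quality_drop"] = dismissal_loss[df.dept] * df.cohort_after_1933
# Simulated outcome, calibrated to the averages quoted in the text.
df["lifetime_citations"] = 11.2 - 6.3 * df.quality_drop + rng.normal(0, 5, n)

# Reduced-form regression: outcomes on the dismissal-induced quality drop,
# with department and cohort fixed effects.
model = smf.ols("lifetime_citations ~ quality_drop + C(dept) + C(cohort_after_1933)",
                data=df).fit()
print(model.params["quality_drop"])
```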
Selection versus influence
We started this post by noting that half of the STEM Nobel laureates had a STEM Nobel laureate as their advisor. I speculated that this might be down to talented students seeking out talented teachers or teachers giving advantages to their students (either by teaching them or perhaps by giving them other advantages).
The papers covered in this post suggest a non-negligible portion of that is down to what the Nobel laureates pass on to their students. The most productive and highly cited academic researchers tend to teach their students about work that is closer to the academic frontier than their less productive colleagues do. And among elite musicians, life scientists, and mathematicians, mentors seem to exert significant influence on the character of their protégés’ work, even after we try to reduce the role of self-sorting. If you end up at a music conservatory in a year when a top composer is teaching, you’re more likely to compose music similar to theirs, and more likely to end up a highly influential composer decades and centuries later. If you end up as a postdoc in the lab of a life scientist who happens to have a patent, you’re more likely to patent your own research in the future. And if you go to a top math department but can’t work with top faculty because of Nazi interference, that has a strongly negative impact on your long-run career. Dropping down to only slightly less elite circles, undergraduates at Oxford and Cambridge in the 1600s and 1700s who happened to attend colleges with more faculty interested in science were more likely to share those interests. It’s not too much of a stretch from there to assume that if you work with a scientist who does research that merits a Nobel prize, you’re more likely to learn things that help you do research that merits a prize of your own.
A surprising result from these papers is that the estimated correlation between teacher and student outcomes generally looks the same whether or not students and teachers were matched together (or forced apart) for quasi-random reasons. That’s surprising. To see why, imagine that there are two students, one good and one great, and two teachers, one good and one great. Suppose the scientific impact of each student’s work depends on who teaches them as follows:
                 Good Teacher   Great Teacher
Good Student          1               2
Great Student         2               3
In this example, if the good student is taught by the good teacher, the student has a scientific impact of 1. If the great student is taught by the good teacher, their impact is 2. And if either one of them gets access to the great teacher, their impact is increased by 1.
Let’s initially suppose students and teachers sort themselves by type, so all the good students work with good teachers and all the great students work with great teachers. In that setting, if we compare the scientific impact of the students of great teachers (3, since all their students are great) to the scientific impact of the students of good teachers (1, since all their students are good), we might (incorrectly) conclude that having a great teacher increases your impact by 2. That would be twice as high as reality, since we are conflating a selection effect and an influence effect.
Now suppose instead that we have reason to believe students were randomly allocated to teachers. When we compare the scientific impacts of the students of good and great teachers, we will get a difference of 1, since each teacher will have the same mix of good and great students, at least in expectation.
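A tiny simulation of this two-by-two example makes the point explicit: under sorting, the naive comparison of students of great versus good teachers gives a gap of 2; under random assignment, it gives the true teacher effect of 1.

```python
import random

random.seed(0)

# (student type, teacher type) -> scientific impact, as in the table above.
IMPACT = {("good", "good"): 1, ("good", "great"): 2,
          ("great", "good"): 2, ("great", "great"): 3}

def observed_teacher_gap(assign_teacher):
    """Naive gap: mean impact of students with great teachers minus good teachers."""
    students = ["good", "great"] * 1000
    pairs = [(s, assign_teacher(s)) for s in students]
    with_great = [IMPACT[p] for p in pairs if p[1] == "great"]
    with_good = [IMPACT[p] for p in pairs if p[1] == "good"]
    return sum(with_great) / len(with_great) - sum(with_good) / len(with_good)

# Sorting: each student gets a teacher of their own type -> gap of 2.
print(observed_teacher_gap(lambda s: s))
# Random assignment: teacher type unrelated to student type -> gap of about 1,
# the true effect of moving from a good to a great teacher.
print(observed_teacher_gap(lambda s: random.choice(["good", "great"])))
```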
The papers we consider basically perform both of these exercises, comparing a naive comparison of the students of teachers with different characteristics to more careful comparisons that zero in on scenarios where students were closer to randomly allocated to teachers. But, in contrast to the example above, the papers tend to find that the difference between the students of good and great teachers is about the same in either setting! Does that mean learning is everything and sorting is something we should ignore?
Not so fast, in my view. I think there are two things going on here. First, for some of these papers, it may well be that students are close to randomly matched to teachers, at least with respect to the things being measured. Azoulay, Liu, and Stuart argue that, at least with respect to patenting behavior, postdocs do match essentially randomly to advisors. And for undergraduates at Oxford and Cambridge, it seems plausible to me that in the 1600s and 1700s, students mostly selected their colleges based on factors that had little to do with the scientific interests of the faculty. But we should be careful about extrapolating too far; maybe students match with teachers randomly when it comes to teacher interests or patenting behavior, but not when it comes to the prestige or scientific impact of their teacher.
The other issue is that, for at least some of these papers, it’s as if all the students are great ones. For Borowiecki’s study of composers, all the students are those who ended up successful enough to make it into his dataset. Azoulay, Liu, and Stuart are looking at postdocs who are Pew and Searle scholars. Other studies are not quite so rarefied, but we’re still looking at math PhD students or undergraduates at Oxford and Cambridge (in an era when higher education was not so common!). It might be that once you get beyond a certain level, further sorting isn’t really feasible. Or if it is, perhaps the effect of sorting is too small to show up in our noisy studies. Even so, you would want to be cautious about extrapolating these results to a population not composed of very high-achieving students.
To close, what we’ve reviewed here suggests to me that among the groups of students who are talented enough to get positions where they work with Nobel laureates, the quality of mentor probably matters a lot (at least if you care about making outsized scientific impact).
Perhaps that’s not surprising. After all, doing high impact innovation often means understanding something that few others do; otherwise, someone else would have invented or discovered the thing before you. How does a great student learn something that few others know? One way is to conduct original research and push into under-explored parts of the knowledge landscape. No teacher needed! But as I’ve written elsewhere, that can take a long time. And since research on a topic often requires a critical mass of scholars to assess and validate claims, pushing into lonely parts of the knowledge landscape can be tough going. Plus it’s risky; there might be nothing there.
So how else can a great student learn things few others know? Highly innovative people who are willing to take on apprentices, but who have not yet settled down to write textbooks, seem like a good bet. If we extrapolate a bit from Biasi and Ma’s paper on syllabi, such people may be the most likely to teach brand-new research ideas, the kind that are not yet widely understood. Or they may have acquired tacit knowledge that is difficult to codify and perhaps difficult to master. Perhaps they taught themselves that tacit knowledge. Or perhaps it was taught to them by their own teachers.
Borowiecki, Karol Jan. 2022. Good Reverberations? Teacher Influence in Music Composition since 1450. Journal of Political Economy 130(4): 991-1090. https://doi.org/10.1086/718370
Koschnick, Julius. 2023. Teacher-directed scientific change: The case of the English Scientific Revolution. PhD job market paper.
Azoulay, Pierre, Christopher C. Liu, and Toby E. Stuart. 2017. Social Influence Given (Partially) Deliberate Matching: Career Imprints in the Creation of Academic Entrepreneurs. American Journal of Sociology 122(4): 1223-1271. https://doi.org/10.1086/689890
Biasi, Barbara, and Song Ma. 2023. The Education-Innovation Gap. NBER Working Paper 29853. https://doi.org/10.3386/w29853
Waldinger, Fabian. 2010. Quality Matters: The Expulsion of Professors and the Consequences for PhD Student Outcomes in Nazi Germany. Journal of Political Economy 118(4): 787-831. https://doi.org/10.1086/655976