Here’s a striking fact: through 2022, one in two Nobel prize winners in physics, chemistry, and medicine also had a Nobel prize winner as their academic advisor.1
What accounts for this extraordinary transmission rate of scientific excellence?
There are two main possibilities. Maybe great students and great teachers seek each other out and tend to work together. Or maybe great teachers give their students resources that make them better scientists: teaching, access to networks, support, and so on.
Both are probably important to one degree or another. But in this post I’ll focus on an aspect of the second channel: what do we know about the ability of mentors who are very successful innovators to transmit research excellence to their students?
Teaching at the Frontier
To start, we might wonder if there are differences in what top researchers teach their students. Biasi and Ma (2023) study what topics college students learn about, based on the text of 1.7 million academic syllabi. It’s a descriptive paper, rather than one trying to tease out causal pathways, but it does provide strong evidence that researchers on the academic frontier teach their students about things closer to the frontier.
To measure what students are learning about in their classes, Biasi and Ma develop a way to measure whether a syllabus draws more extensively on newer or older academic research. First, they start with a list of all words associated with knowledge concepts or skills - specifically, words that have ever been assigned to a journal article as keywords. Second, they assume an article and a syllabus are more similar if the syllabus and the article’s abstract share more of these “knowledge” words, and if those words are less commonly used. Third, they calculate the average similarity between a syllabus and all the academic articles published in a recent three-year window. Fourth, they repeat step three, but using an older three-year window. The ratio of the similarity to the older articles to the similarity to the newer articles is their measure of how closely a syllabus tracks the academic frontier. Essentially, it’s a measure of the extent to which your syllabus shares more research lingo with recent research than with older research.2
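To make the mechanics concrete, here is a minimal sketch of this kind of similarity ratio in Python. It is a simplification, not Biasi and Ma’s actual implementation: the function names, the inverse-document-frequency weighting, the weighted-overlap similarity, and the toy word lists are all my own assumptions.

```python
import math
from collections import Counter

def rarity_weights(documents):
    """Give rarer knowledge words more weight (inverse document frequency)."""
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))
    n = len(documents)
    return {word: math.log(n / df) for word, df in doc_freq.items()}

def similarity(syllabus, abstract, weights):
    """Weighted word overlap: sharing rare words counts more than common ones."""
    shared = set(syllabus) & set(abstract)
    union = set(syllabus) | set(abstract)
    total = sum(weights.get(w, 0.0) for w in union)
    return sum(weights.get(w, 0.0) for w in shared) / total if total else 0.0

def syllabus_gap(syllabus, old_abstracts, new_abstracts, weights):
    """Average similarity to an older window of articles, divided by average
    similarity to a newer window. Lower = closer to the research frontier."""
    sim_old = sum(similarity(syllabus, a, weights) for a in old_abstracts) / len(old_abstracts)
    sim_new = sum(similarity(syllabus, a, weights) for a in new_abstracts) / len(new_abstracts)
    return sim_old / sim_new

# Toy usage with made-up "knowledge words":
old = [["regression", "ols"], ["regression", "probit"]]
new = [["transformer", "attention"], ["attention", "embedding"]]
weights = rarity_weights(old + new)
print(syllabus_gap(["attention", "transformer", "regression"], old, new, weights))
```

In this toy example the syllabus shares more rare vocabulary with the newer abstracts, so the ratio comes out below one.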
So with all that work to build a big dataset of what teachers are doing in their classes, what do we learn? It turns out faculty vary a lot in what they teach, even within the same course. Consider the following figures, which plot this syllabus measure against measures of research productivity. On the horizontal axis we have either the number of publications or the number of citations the instructor received in the preceding five years. On the vertical axis, we have Biasi and Ma’s measure of how related a syllabus is to recent research - lower means a syllabus’s language is closer to the language of recent articles than to that of older ones.
In the figures, we can see that more productive researchers and researchers who are highly cited tend to teach their students about topics more similar to recent academic research. Importantly, the figures above control for year and course effects. It is not merely that active researchers are assigned to teach, for example, graduate students, while less active researchers are assigned to teach the 101 classes. Instead, within a particular course, the faculty who excel at research (as measured imperfectly by publications and citations) teach their students different things (or at least their syllabi say they do). And the effects aren’t small either. Biasi and Ma estimate that a one-unit change in the syllabus measure corresponds to changing about 26% of the words on a syllabus.
If productive researchers teach their students about more cutting-edge research, then we should be able to see some indications of that in the subsequent activities of their students. For example, do the students of productive researchers tend to invent stuff that relies on cutting-edge research? Do they found more innovative companies? Or we could flip it around: do students who lose access to productive researchers as teachers become less likely to found companies based on cutting-edge research?
Gofman and Jin (2024) suggest they do. Gofman and Jin look at what happens to the probability that alumni of a university found an AI-related startup after AI faculty leave that university to work in the private sector. To do that, they gather information on AI startups from Crunchbase, including the university the founder graduated from, and scour the LinkedIn profiles of AI professors to identify those who found startups or begin working in industry. For any particular year, they show the alumni of a university are less likely to form AI startups if a greater share of its AI faculty left for industry in the preceding six years, and the startups that do form attract less investment.
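As a rough illustration of this kind of design, here is a toy university-by-year panel regression in Python. This is not Gofman and Jin’s actual specification: the column names, the simulated data, and the fixed-effects setup are all assumptions made for the sketch.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical panel: one row per university-year.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "university": np.repeat([f"U{i}" for i in range(30)], 10),
    "year": np.tile(np.arange(2010, 2020), 30),
})
# Share of AI faculty who left for industry in the preceding six years.
df["departed_share_6yr"] = rng.uniform(0, 0.5, len(df))
# Simulate the paper's headline pattern: more departures, fewer AI startups.
df["ai_startup_rate"] = 1.0 - 0.8 * df["departed_share_6yr"] + rng.normal(0, 0.2, len(df))

# Regress alumni AI-startup formation on departures, with university and
# year fixed effects and standard errors clustered by university.
model = smf.ols(
    "ai_startup_rate ~ departed_share_6yr + C(university) + C(year)",
    data=df,
).fit(cov_type="cluster", cov_kwds={"groups": df["university"]})
print(model.params["departed_share_6yr"])  # negative, ~ -0.8 in this toy data
```

The fixed effects matter for the interpretation: the departure effect is identified from changes within a university over time, not from comparisons across universities.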
Gofman and Jin provide a variety of evidence to suggest the reason for this startup shortfall is because students don’t learn as much about frontier AI research when faculty leave for industry. For example:
The departure of AI faculty only affects the propensity to start AI-related startups; it has no impact on the rate at which other kinds of IT-related startups are formed.
The effect is stronger for masters/PhD students, who are more likely to learn about frontier research, than for undergraduates.
The effect is stronger for faculty with expertise in deep learning (which is the dominant paradigm in AI at this time).
The effect is larger for tenured faculty, who Gofman and Jin argue are more likely to supervise students.
The effect of departure is stronger for faculty leaving 4-6 years before alumni graduate (which almost certainly precludes studying with the departing faculty) and weaker for those leaving 1-3 years before graduation (which leaves some time to study with them).
That’s all consistent with the notion that students actually do learn valuable stuff about frontier technology from teachers who are studying frontier technology! On the other hand, as we worried at the outset of this post, perhaps good students who want to learn about frontier research become less likely to enroll in departments that lose AI faculty. If a department that loses AI faculty also loses talented students, and those students are the ones most likely to found startups, that would account for the same finding even if the teachers themselves didn’t influence their students. But Gofman and Jin show there is no impact of AI faculty departures on the number of graduate students who win prestigious fellowships, which they interpret as a rough proxy for the research potential of enrolled students.
Waldinger (2010) offers an even cleaner study of the impact of access to faculty on career outcomes. Waldinger looks at the link between the academic productivity of German math departments prior to World War II and the academic outcomes of their students. For 33 different math departments, he computes the average number of citations per faculty member at each department (restricting attention to articles published in top journals). Across 690 PhD students, those in math departments where the professors have more citations on average are more likely to publish later in life, and their work receives more citations.
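The department-quality measure itself is a simple aggregation. Here is a toy version in Python; the department names and citation counts are invented for illustration.

```python
import pandas as pd

# One row per faculty member, with hypothetical citation counts from
# top-journal articles only.
faculty = pd.DataFrame({
    "department": ["Göttingen", "Göttingen", "Berlin", "Berlin", "Bonn"],
    "top_journal_citations": [120, 80, 60, 40, 10],
})

# Average citations per faculty member at each department.
dept_quality = faculty.groupby("department")["top_journal_citations"].mean()
print(dept_quality)
```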
Is that because the best students go to the best departments, or because the best departments create the best future mathematicians? What makes this paper extraordinary is that after students had already sorted themselves into different departments, Nazi policy banning Jews and “politically undesirable” individuals from the civil service capriciously and all-but-randomly hurt or decimated different math departments.3 Of 33 departments, nine lost more than a quarter of their faculty - often very good faculty. Two lost half their faculty or more! Meanwhile, fifteen departments were untouched by the policy at all, since they had no Jewish faculty.
These dismissals hurt the ability of faculty to influence their students, while holding constant the sorting of students into departments of different perceived quality. As a first-pass indication that this mattered, consider the following figure. Prior to the Nazi dismissals, students at departments that would go on to lose faculty had persistently higher probabilities of publishing cited academic work. After the dismissals, students in those departments lost this advantage.
But this figure understates things by lumping together the departments that lost small and large shares of their faculty. For a more complete analysis, Waldinger uses the unexpected dismissals to develop a measure of research quality that depends only on these dismissals and not on pre-existing departmental quality. So, if you went to a department that unexpectedly lost a lot of good faculty, you might have been a very good student (since you got in), but Waldinger shows the loss of these mentors had substantial negative effects on long-run student outcomes. Students in departments that lost talented faculty were less likely to publish their dissertations, less likely to become full professors, and received fewer citations over their lives. Waldinger’s results are consistent with teacher influence, rather than sorting, as the main reason faculty and student outcomes are correlated. His results indicate a one standard deviation increase in the research productivity of departmental faculty is associated with an extra 6.3 lifetime citations for students (against an average of 11.2 lifetime citations).
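The logic is instrumental-variables style: the dismissals shift department quality for reasons unrelated to how students sorted into departments. Here is a toy two-stage version in Python - my own stylized rendering with simulated data and invented variable names, not Waldinger’s actual specification.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 690  # one row per (hypothetical) PhD student
df = pd.DataFrame({
    # Share of the department's (citation-weighted) faculty dismissed.
    "dismissal_loss": rng.uniform(0, 0.6, n),
})
# Simulate: dismissals lower faculty quality; quality raises student citations.
df["faculty_quality"] = 1.0 - df["dismissal_loss"] + rng.normal(0, 0.1, n)
df["lifetime_citations"] = 5 + 6.3 * df["faculty_quality"] + rng.normal(0, 3, n)

# Stage 1: isolate the part of faculty quality driven by the dismissal shock.
stage1 = smf.ols("faculty_quality ~ dismissal_loss", data=df).fit()
df["quality_hat"] = stage1.fittedvalues

# Stage 2: relate student outcomes only to that dismissal-driven variation.
# (Standard errors from manual two-stage OLS need correction; a real analysis
# would use a proper 2SLS routine.)
stage2 = smf.ols("lifetime_citations ~ quality_hat", data=df).fit()
print(stage2.params["quality_hat"])  # ~ 6.3 in this simulated data
```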
I Knew Them Before They Were Cool
To close, let’s return to prize-winning research, but broaden the category beyond the Nobels. Ma, Mukherjee, and Uzzi (2020) identify 37,157 mentors and proteges in biomedicine, chemistry, math, and physics between 1960 and 2017, using a database of dissertation theses (and some other sources). Some of these mentors went on to win prestigious awards for their research - others did not. If we think prizes accurately identify research excellence, we can then see if research excellence can be transmitted.
To handle the problem that great students might be attracted to prize-winning teachers, Ma and coauthors focus on mentorship occurring before the mentor won a prize for their research. For each prize-winning mentor, they try to identify “matched” mentors in the same discipline who were similarly successful at that stage of their career (i.e., before winning the prize), based on things like their publication record, the ranking of their university, and so on. As evidence that future prize-winners attracted grad students similar to those of their matched peers at this point in their careers, Ma and coauthors show the proteges fared similarly well early in their careers. They get jobs at similarly ranked universities, have similarly sized labs, and even score similarly on IQ tests. But, even if the broader profession could not detect it, Ma and coauthors argue that mentors who go on to be recognized for their research could still train their proteges to be excellent researchers.
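To show the flavor of the matching step, here is a toy nearest-neighbor matcher in Python. The variables, the distance measure, and the data are all invented for illustration; the paper’s actual matching procedure is more involved.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
# Hypothetical pre-prize profiles for 200 mentors.
mentors = pd.DataFrame({
    "field": rng.choice(["physics", "chemistry"], 200),
    "pubs_5yr": rng.poisson(10, 200),          # recent publication count
    "univ_rank": rng.integers(1, 100, 200),    # university ranking
    "future_winner": rng.random(200) < 0.1,    # wins a major prize later
})

def match_control(winner_row, pool, feats=("pubs_5yr", "univ_rank")):
    """Return the non-winner whose standardized pre-prize profile is closest."""
    feats = list(feats)
    mu, sd = pool[feats].mean(), pool[feats].std()
    z_pool = (pool[feats] - mu) / sd
    z_winner = (winner_row[feats] - mu) / sd
    distances = ((z_pool - z_winner) ** 2).sum(axis=1)
    return pool.loc[distances.idxmin()]

winners = mentors[mentors["future_winner"]]
controls = mentors[~mentors["future_winner"]]
matches = {
    idx: match_control(row, controls[controls["field"] == row["field"]])
    for idx, row in winners.iterrows()
}
print(len(matches), "matched pairs")
```

Comparing the later outcomes of the winners’ proteges against the matched mentors’ proteges is what separates the mentor’s contribution from student sorting.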
And that seems to be the case. As indicated in the figure below, the students who studied under mentors who would eventually be recognized for their research tended to have significantly higher research impact themselves. They were more likely to win prizes of their own, be elected to the National Academy of Sciences, have higher h-indices, and be “superstars” in their discipline (which Ma and coauthors define as someone who wins prizes, belongs to the National Academy of Sciences, and is in the top quartile for citations). Note this holds even for proteges who graduated ten years or more before their mentor won a prize for their research (the dotted yellow bar at right).
Mentorship Works?
We started this post by noting that half of the STEM Nobel laureates had a STEM Nobel laureate as their advisor. I speculated that this might be down to talented students seeking out talented teachers, or to teachers giving their students a leg up (whether by teaching them or by conferring other advantages).
The papers covered in this post suggest a non-negligible portion of that is down to what the Nobel laureates pass on to their students. The most productive and highly cited academic researchers tend to teach their students about work that is closer to the academic frontier than their less productive colleagues do. When students lose access to that training because of faculty departures, the effects linger throughout their careers. Conversely, students who have the luck to work with great researchers before they are eventually recognized as such go on to become highly successful researchers themselves at a much higher rate.
The post Students get interested in what their mentors are interested in provides some additional evidence that innovators influence the tastes of their proteges. As discussed in that post, if you end up going to a music conservatory in a year when a top composer is teaching, you’re more likely to compose music similar to theirs, and more likely to end up a highly influential composer decades and even centuries later. If you end up as a postdoc in the lab of a life scientist who happens to have a patent, you’re more likely to patent your own research in the future. And undergraduates at Oxford and Cambridge in the 1600s and 1700s who happened to attend colleges with more faculty interested in science were more likely to share those interests. It’s not too much of a stretch from there to assume that if you work with a scientist who does research that merits a Nobel prize, you’re more likely to learn things that help you do research that merits a prize of your own.
So the quality of one’s mentor probably matters a lot (at least if you care about making an outsized scientific impact). Perhaps that’s not surprising. After all, doing high-impact innovation often means understanding something that few others do; otherwise, someone else would have invented or discovered the thing before you. How does a great student learn something that few others know? One way is to conduct original research and push into under-explored parts of the knowledge landscape. No teacher needed! But as I’ve written elsewhere, that can take a long time. And since research on a topic often requires a critical mass of scholars to assess and validate claims, going into lonely parts of the knowledge landscape might be tough going. Plus, it’s risky; there might be nothing there.
So how else can a great student learn things few others know? Highly innovative people who are willing to take on apprentices, but who have not yet settled down to write textbooks, seem like a good bet. If we extrapolate a bit from Biasi and Ma’s paper on syllabi, they might be the most likely to teach about brand-new research ideas, the kind that are not yet widely understood. Or they might have acquired tacit knowledge that is difficult to codify and perhaps difficult to master. Perhaps they taught themselves that tacit knowledge. Or perhaps it was taught to them by their own teachers.
New articles and updates to existing articles are typically added to this site every three weeks. To learn what’s new on New Things Under the Sun, subscribe to the newsletter.
Biasi, Barbara, and Song Ma. 2023. The Education-Innovation Gap. NBER Working Paper 29853. https://doi.org/10.3386/w29853
Gofman, Michael, and Zhao Jin. 2024. Artificial Intelligence, Education, and Entrepreneurship. The Journal of Finance 79(1): 631-667. https://doi.org/10.1111/jofi.13302
Waldinger, Fabian. 2010. Quality Matters: The Expulsion of Professors and the Consequences for PhD Student Outcomes in Nazi Germany. Journal of Political Economy 118(4): 787-831. https://doi.org/10.1086/655976
Ma, Yifang, Satyam Mukherjee, and Brian Uzzi. 2020. Mentorship and protégé success in STEM fields. PNAS 117(25): 14077-14083. https://doi.org/10.1073/pnas.1915516117
It seems hard to separate the effects of mentorship on student ability from the reputation/networking effects of having a famous mentor. I realize the design accounts for this in part by including mentors who didn’t win prizes until a decade after their students graduated, but I’m not sure that’s enough to isolate the effect.
I’d be interested in seeing cases where unsuccessful mentors, or mentors who died or retired early, still trained a disproportionate share of prizewinning students. Can we identify excellent teachers who never became excellent researchers? Or the inverse: prizewinning researchers whose students did surprisingly poorly? A comparison of those two types of researchers might yield more precise information about the effects of mentorship.
The thesis here is plausible; I’d just like to see more confirmation of how exactly the details work.