80,000 Hours Podcast

#147 – Spencer Greenberg on stopping valueless papers from getting into top journals

Can you trust the things you read in published scientific research? Not really. About 40% of experiments in top social science journals don't get the same result if the experiments are repeated.

Two key reasons are 'p-hacking' and 'publication bias'. P-hacking is when researchers run a lot of slightly different statistical tests until they find a way to make findings appear statistically significant when they're actually not, a problem first discussed over 50 years ago. And because journals are more likely to publish positive than negative results, you might be reading about the one time an experiment worked, while the 10 times it was run and got a 'null result' never saw the light of day. The resulting phenomenon of publication bias is one we've understood for 60 years.
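To make the p-hacking arithmetic concrete, here is a minimal sketch, not something from the episode. It assumes the tests are independent and uses the conventional 0.05 significance threshold. The function name `chance_of_false_positive` is hypothetical. It computes how likely it is that at least one test on pure noise comes out 'significant' as the number of tests grows.

```python
def chance_of_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `num_tests` independent tests
    on pure noise falls below the significance threshold `alpha`.

    Assumption: the tests are independent. Real p-hacked analyses are
    usually correlated, so this is only a rough illustration.
    """
    # Each test avoids a false positive with probability (1 - alpha);
    # all of them avoid it with probability (1 - alpha) ** num_tests.
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    for k in (1, 5, 20):
        print(k, round(chance_of_false_positive(k), 3))
    # 1 0.05
    # 5 0.226
    # 20 0.642
```

Under these assumptions, a researcher who tries 20 different analyses on data with no real effect has roughly a 64% chance of finding at least one 'significant' result to report.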

Today's repeat guest, social scientist and entrepreneur Spencer Greenberg, has followed these issues closely for years.

Links to learn more, summary and full transcript.

He recently checked whether p-values, an indicator of how likely a result was to occur by pure chance, could tell us how likely an outcome would be to recur if an experiment were repeated. From his sample of 325 replications of psychology studies, the answer seemed to be yes. According to Spencer, "when the original study's p-value was less than 0.01 about 72% replicated — not bad. On the other hand, when the p-value is greater than 0.01, only about 48% replicated. A pretty big difference."
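The comparison Spencer describes can be sketched roughly as follows. This is an illustrative assumption about the shape of the calculation, not his actual analysis. The function name `replication_rates` and the six toy records are hypothetical and are not drawn from his sample of 325 replications.

```python
from typing import List, Optional, Tuple


def replication_rates(
    studies: List[Tuple[float, bool]], threshold: float = 0.01
) -> Tuple[Optional[float], Optional[float]]:
    """Split studies by original p-value and return the share that
    replicated in each group: (below threshold, at or above threshold).

    Each study is (original_p_value, replicated). A group with no
    studies yields None rather than a misleading 0.
    """
    below = [rep for p, rep in studies if p < threshold]
    above = [rep for p, rep in studies if p >= threshold]

    def rate(group: List[bool]) -> Optional[float]:
        return sum(group) / len(group) if group else None

    return rate(below), rate(above)


if __name__ == "__main__":
    # Hypothetical toy data, invented purely for illustration.
    toy = [
        (0.001, True), (0.004, True), (0.008, False),
        (0.020, True), (0.030, False), (0.040, False),
    ]
    print(replication_rates(toy))
```

With real data, the two numbers returned would correspond to the 72% and 48% figures quoted above.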

To do his bit to help get these numbers up, Spencer has launched an effort to repeat almost every social science experiment published in the journals Nature and Science, and see whether the repeated experiments find the same results.

But while progress is being made on some fronts, Spencer thinks there are other serious problems with published research that aren't yet fully appreciated. One of these Spencer calls 'importance hacking': passing off obvious or unimportant results as surprising and meaningful.

Spencer suspects that importance hacking of this kind causes a similar amount of damage as the issues mentioned above, like p-hacking and publication bias, but is much less discussed. His replication project tries to identify importance hacking by comparing how a paper’s findings are described in the abstract to what the experiment actually showed. But the cat-and-mouse game between academics and journal reviewers is fierce, and it's far from easy to stop people exaggerating the importance of their work.

In this wide-ranging conversation, Rob and Spencer discuss the above as well as:

• When you should and shouldn't use intuition to make decisions.
• How to properly model why some people succeed more than others.
• The difference between “Soldier Altruists” and “Scout Altruists.”
• A paper that tested dozens of methods for forming the habit of going to the gym, why Spencer thinks it was presented in a very misleading way, and what it really found.
• Whether a 15-minute intervention could make people more likely to sustain a new habit two months later.
• The most common way for groups with good intentions to turn bad and cause harm.
• And Spencer's approach to a fulfilling life and doing good, which he calls “Valuism.”

Here are two flashcard decks that might make it easier to fully integrate the most important ideas they talk about:

• The first covers 18 core concepts from the episode.
• The second includes 16 definitions of unusual terms.

Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type 80,000 Hours into your podcasting app.

Producer: Keiran Harris
Audio mastering: Ben Cordell and Milo McGuire
Transcriptions: Katy Moore
