From Chance to Certainty: Solving Science’s Reproducibility Crisis

by

Jamie Moffa, host of In Plain English; Konrad Kording, Kaela Singleton and Arjun Raj

One of the pillars of science is the idea that experimental results can be replicated. If they cannot be, how do we know the findings of an experiment weren’t due simply to chance? Over the last two decades, a growing chorus of scientists has raised concerns about the “reproducibility crisis,” in which many published research findings can’t be independently validated, calling into question the rigor of contemporary science.

Two years ago, a group led by Konrad Kording, a Penn Integrates Knowledge Professor in Bioengineering and Neuroscience, founded the Community for Rigor (C4R) to build a grassroots movement to improve the rigor of scientific research.

Supported by a grant from the National Institutes of Health (NIH) and partners at Harvard, Duquesne, Smith College and Johns Hopkins, among other institutions, C4R creates educational materials that teach the principles of rigorous research, from data collection to pre-registration of studies. “Everyone has done wrong things,” says Kording. “We’re all making these mistakes and we need to be able to talk about it.”

Last month, Kording appeared on In Plain English, a podcast devoted to making science more accessible, alongside Kaela Singleton, the co-founder and President of Black in Neuro; Arjun Raj, Professor in Bioengineering in Penn Engineering and in Genetics in Penn Medicine; and Jamie Moffa, a physician-scientist in training at Washington University in St. Louis, to discuss scientific rigor, including actionable strategies for students and faculty alike.

The conversation touched on everything from successfully managing the reams of data produced by experiments to the power of community to drive cultural change, as well as the difficulty of filtering useful feedback from the noise of social media. “I hope we can get to a point where people feel comfortable sharing what’s working and what’s not working,” says Raj.

Listen to the episode here.

Why is Machine Learning Trending in Medical Research but not in Our Doctors’ Offices?

by Melissa Pappas

Machine learning (ML) programs computers to learn the way we do – through the continual assessment of data and identification of patterns based on past outcomes. ML can quickly pick out trends in big datasets, operate with little to no human interaction and improve its predictions over time. Due to these abilities, it is rapidly finding its way into medical research.

People with breast cancer may soon be diagnosed through ML faster than through a biopsy. Those suffering from depression might be able to predict mood changes through smartphone recordings of daily activities, such as the time they wake up and the amount of time they spend exercising. ML may also help paralyzed people regain autonomy using prosthetics controlled by patterns identified in brain scan data. ML research promises these and many other possibilities to help people lead healthier lives.

But while the number of ML studies grows, the actual use of ML in doctors’ offices has not expanded much beyond simple functions such as converting voice to text for note-taking.

The limitations lie in medical research’s small sample sizes and unique datasets. Such small data makes it hard for machines to identify meaningful patterns: the more data, the more accurate ML’s diagnoses and predictions become. Many diagnostic applications would require thousands of subjects, but most studies enroll only dozens.
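
To see why dozens of subjects are rarely enough, consider a minimal Python sketch (purely illustrative, not from the article or C4R’s materials): it repeatedly evaluates a “classifier” that is just guessing, on test sets of different sizes. With a few dozen subjects, chance alone can look like 70% accuracy; with thousands, the estimate settles near the 50% it really is.

```python
import numpy as np

rng = np.random.default_rng(1)

# A coin-flip "classifier" has a true accuracy of 50%.
# Measure how much its apparent accuracy varies with test-set size.
for n in (30, 300, 3000):
    # 1,000 simulated evaluations, each on n random right/wrong outcomes
    accuracies = [rng.integers(0, 2, size=n).mean() for _ in range(1000)]
    print(f"n={n:4d}: apparent accuracy ranges "
          f"from {min(accuracies):.2f} to {max(accuracies):.2f}")
```

On a typical run, the 30-subject evaluations swing from roughly 30% to 70% “accuracy” by luck alone, while the 3,000-subject evaluations stay within a few points of 50%.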

But there are ways to coax significant-looking results out of small datasets if you know how to manipulate the numbers. Running statistical tests over and over again on different subsets of your data can surface apparent significance that, in reality, is nothing more than random noise, as the sketch below shows.
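
Here is an equally minimal sketch of the subset-fishing tactic itself (again hypothetical, not code from any study): the two groups below are drawn from the same distribution, so there is no real effect, yet testing enough random subsets almost always turns up a p-value below 0.05.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Two groups drawn from the SAME distribution: no true effect exists.
group_a = rng.normal(size=40)
group_b = rng.normal(size=40)

# Keep re-testing random subsets until one looks "significant".
for attempt in range(1, 101):
    idx = rng.choice(40, size=20, replace=False)  # pick a random subset
    _, p = stats.ttest_ind(group_a[idx], group_b[idx])
    if p < 0.05:
        print(f"Attempt {attempt}: p = {p:.3f} -- 'significant' from pure noise")
        break
```

Because each test has a 5% false-positive rate, about one in every twenty subsets will clear the significance bar even though nothing is there – which is precisely why results found this way fail to replicate.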

This tactic, known as p-hacking, or feature hacking in ML, leads to the creation of predictive models that are too limited to be useful in the real world. What looks good on paper doesn’t translate into a doctor’s ability to diagnose or treat us.

These statistical mistakes, often made unknowingly, can lead to dangerous conclusions.

To help scientists avoid these mistakes and push ML applications forward, Konrad Kording, Nathan Francis Mossell University Professor with appointments in the Departments of Bioengineering and Computer and Information Science in Penn Engineering and the Department of Neuroscience at Penn’s Perelman School of Medicine, is leading an aspect of a large, NIH-funded program known as CENTER – Creating an Educational Nexus for Training in Experimental Rigor. Kording will lead Penn’s cohort by creating the Community for Rigor, which will provide open-access resources on conducting sound science. Members of this inclusive scientific community will be able to engage with ML simulations and discussion-based courses.

“The reason for the lack of ML in real-world scenarios is due to statistical misuse rather than the limitations of the tool itself,” says Kording. “If a study publishes a claim that seems too good to be true, it usually is, and many times we can track that back to their use of statistics.”

Studies like these make their way into peer-reviewed journals more often than one might expect, contributing to misinformation and mistrust in science.

Read the full story in Penn Engineering Today.