Why is Machine Learning Trending in Medical Research but not in Our Doctor’s Offices?

by Melissa Pappas

Machine learning (ML) programs computers to learn the way we do – through the continual assessment of data and identification of patterns based on past outcomes. ML can quickly pick out trends in big datasets, operate with little to no human interaction and improve its predictions over time. Due to these abilities, it is rapidly finding its way into medical research.

People with breast cancer may soon be diagnosed through ML faster than through a biopsy. Those suffering from depression might be able to predict mood changes through smart phone recordings of daily activities such as the time they wake up and amount of time they spend exercising. ML may also help paralyzed people regain autonomy using prosthetics controlled by patterns identified in brain scan data. ML research promises these and many other possibilities to help people lead healthier lives.

But while the number of ML studies grows, its actual use in doctors’ offices has not expanded much past simple functions such as converting voice to text for notetaking.

The limitations lie in medical research’s small sample sizes and unique datasets. Small datasets make it hard for machines to identify meaningful patterns: the more data, the more accurate ML diagnoses and predictions become. Many diagnostic uses would need thousands of subjects, but most studies enroll only dozens.

But there are ways to find seemingly significant results in small datasets if you know how to manipulate the numbers. Running statistical tests over and over again on different subsets of your data can indicate significance where, in reality, there may be nothing but random outliers.

This tactic, known as P-hacking, or feature hacking in ML, leads to the creation of predictive models that are too limited to be useful in the real world. What looks good on paper doesn’t translate to a doctor’s ability to diagnose or treat us.
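To make the idea concrete, here is a minimal Python sketch of how searching a small dataset can manufacture an impressive-looking result. It is an illustration only, not a method from Kording or the CENTER program: the data are pure random noise, the "model" is a hypothetical one-feature rule, and the sizes (20 subjects, 500 features, 1,000 fresh subjects) are assumptions chosen for the demonstration. It also swaps repeated tests on data subsets for a closely related search, trying many features and keeping the best one.

```python
import random

def make_noise_dataset(n_subjects, n_features, rng):
    """Random features and random labels: there is no real signal to find."""
    features = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
                for _ in range(n_subjects)]
    labels = [rng.randint(0, 1) for _ in range(n_subjects)]
    return features, labels

def accuracy(features, labels, feature_index, flip):
    """Accuracy of the rule 'predict 1 when the feature is positive' (or its inverse)."""
    correct = 0
    for row, label in zip(features, labels):
        prediction = int(row[feature_index] > 0.0)
        if flip:
            prediction = 1 - prediction
        correct += int(prediction == label)
    return correct / len(labels)

def hack_best_rule(features, labels):
    """Try every feature in both directions and keep whichever scores best."""
    n_features = len(features[0])
    candidates = [(accuracy(features, labels, j, flip), j, flip)
                  for j in range(n_features) for flip in (False, True)]
    return max(candidates)

rng = random.Random(0)  # fixed seed so the run is repeatable

# A small "study": 20 subjects, 500 meaningless measurements each (assumed sizes).
small_x, small_y = make_noise_dataset(20, 500, rng)
in_sample_acc, best_feature, best_flip = hack_best_rule(small_x, small_y)

# The same rule applied to 1,000 new subjects it was not selected on.
fresh_x, fresh_y = make_noise_dataset(1000, 500, rng)
held_out_acc = accuracy(fresh_x, fresh_y, best_feature, best_flip)

print(f"in-sample accuracy: {in_sample_acc:.2f}")
print(f"held-out accuracy:  {held_out_acc:.2f}")
```

Because the labels are coin flips, any accuracy well above 50 percent on the small study comes from the search itself, which is why the chosen rule should fall back toward chance on new subjects.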

These statistical mistakes, often made unknowingly, can lead to dangerous conclusions.

To help scientists avoid these mistakes and push ML applications forward, Konrad Kording, Nathan Francis Mossell University Professor with appointments in the Departments of Bioengineering and Computer and Information Science in Penn Engineering and the Department of Neuroscience at Penn’s Perelman School of Medicine, is leading part of a large, NIH-funded program known as CENTER – Creating an Educational Nexus for Training in Experimental Rigor. Kording will lead Penn’s cohort by creating the Community for Rigor, which will provide open-access resources on conducting sound science. Members of this inclusive scientific community will be able to engage with ML simulations and discussion-based courses.

“The reason for the lack of ML in real-world scenarios is due to statistical misuse rather than the limitations of the tool itself,” says Kording. “If a study publishes a claim that seems too good to be true, it usually is, and many times we can track that back to their use of statistics.”

Such studies that make their way into peer-reviewed journals contribute to misinformation and mistrust in science and are more common than one might expect.

Read the full story in Penn Engineering Today.

Konrad Kording’s CENTER is Part of a New NIH Education Initiative on Scientific Rigor

by Melissa Pappas

Konrad Kording (Photo by Eric Sucar)

In 2005, John Ioannidis published a bombshell paper titled “Why Most Published Research Findings Are False.” In it, Ioannidis argued that a lack of scientific rigor in biomedical research (such as poor study design, small sample sizes and improper assessment of the significance of data) meant that a large percentage of experiments would not return the same results if they were conducted again.

Since then, researchers’ awareness of this “replication crisis” has grown, especially in fields that directly impact the health and wellbeing of people, where lapses in rigor can have life-or-death consequences. Despite this attention and motivation, however, little progress has been made in addressing the roots of the problem. Formal training in rigorous research practices remains rare; while mentors advise their students on how to properly construct and conduct experiments to produce the most reliable evidence, few educational resources exist to support them.

To address this discrepancy, the National Institute of Neurological Disorders and Stroke (NINDS), part of the National Institutes of Health (NIH), has launched the Initiative to Improve Education in the Principles of Rigorous Research.

Konrad Kording, a Penn Integrates Knowledge Professor with appointments in the Departments of Bioengineering and Computer and Information Science in Penn Engineering and the Department of Neuroscience in Penn’s Perelman School of Medicine, has been awarded one of the initiative’s first five grants.

“The replication crisis is real,” says Kording. “I’ve tried to replicate the research of others and failed. I’ve reanalyzed my own data and found major mistakes that needed to be corrected. I was never properly taught how to do rigorous science, and I want to improve that for the next generation.”

Read the full story in Penn Engineering Today.