Machine learning (ML) programs computers to learn the way we do – through the continual assessment of data and identification of patterns based on past outcomes. ML can quickly pick out trends in big datasets, operate with little to no human interaction and improve its predictions over time. Due to these abilities, it is rapidly finding its way into medical research.
People with breast cancer may soon be diagnosed through ML faster than through a biopsy. Those suffering from depression might be able to predict mood changes through smartphone recordings of daily activities such as the time they wake up and the amount of time they spend exercising. ML may also help paralyzed people regain autonomy using prosthetics controlled by patterns identified in brain scan data. ML research promises these and many other possibilities to help people lead healthier lives.
But while the number of ML studies grows, its actual use in doctors’ offices has not expanded much past simple functions such as converting voice to text for notetaking.
The limitations lie in medical research’s small sample sizes and unique datasets. This small data makes it hard for machines to identify meaningful patterns. The more data, the more accuracy in ML diagnoses and predictions. For many diagnostic uses, massive numbers of subjects in the thousands would be needed, but most studies use smaller numbers in the dozens of subjects.
But there are ways to find significant results from small datasets if you know how to manipulate the numbers. Running statistical tests over and over again with different subsets of your data can indicate significance in a dataset that in reality may be just random outliers.
This tactic, known as P-hacking or feature hacking in ML, leads to the creation of predictive models that are too limited to be useful in the real world. What looks good on paper doesn’t translate to a doctor’s ability to diagnose or treat us.
These statistical mistakes, often made unknowingly, can lead to dangerous conclusions.
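To make the idea concrete, here is a minimal sketch of how repeated testing on a small dataset produces "significant" findings from pure noise. It is an illustration, not a method from any of the studies discussed: the function names, the choice of 20 subjects and 500 candidate features, and the use of a Pearson correlation test are all assumptions made for the example. The critical t value of 2.101 is the standard two-sided 5% table value for 18 degrees of freedom.

```python
import math
import random

N_SUBJECTS = 20    # assumed: a "dozens of subjects" study
N_FEATURES = 500   # assumed: number of candidate features tested
T_CRIT_DF18 = 2.101  # two-sided 5% critical t for df = N_SUBJECTS - 2 = 18


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def count_spurious_hits(seed=0):
    """Test many random features against a random outcome.

    Nothing here is related to anything else, so every "significant"
    correlation is a false positive.
    """
    rng = random.Random(seed)
    # Convert the critical t value into the equivalent critical |r|.
    r_crit = T_CRIT_DF18 / math.sqrt(T_CRIT_DF18 ** 2 + (N_SUBJECTS - 2))
    outcome = [rng.gauss(0, 1) for _ in range(N_SUBJECTS)]

    hits = 0
    best_abs_r = 0.0
    for _ in range(N_FEATURES):
        feature = [rng.gauss(0, 1) for _ in range(N_SUBJECTS)]
        r = abs(pearson_r(feature, outcome))
        best_abs_r = max(best_abs_r, r)
        if r > r_crit:
            hits += 1
    return hits, best_abs_r, r_crit


if __name__ == "__main__":
    hits, best_abs_r, r_crit = count_spurious_hits()
    print(f"critical |r| at p < 0.05: {r_crit:.3f}")
    print(f"'significant' noise features: {hits} of {N_FEATURES}")
    print(f"strongest spurious correlation: {best_abs_r:.3f}")
```

At a 5% threshold, about 25 of 500 unrelated features would be expected to pass by chance alone. Reporting only the strongest of them, without correcting for the number of tests or checking on held-out data, is the kind of mistake described above.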
To help scientists avoid these mistakes and push ML applications forward, Konrad Kording, Nathan Francis Mossell University Professor with appointments in the Departments of Bioengineering and Computer and Information Science in Penn Engineering and the Department of Neuroscience at Penn’s Perelman School of Medicine, is leading an aspect of a large, NIH-funded program known as CENTER – Creating an Educational Nexus for Training in Experimental Rigor. Kording will lead Penn’s cohort by creating the Community for Rigor, which will provide open-access resources on conducting sound science. Members of this inclusive scientific community will be able to engage with ML simulations and discussion-based courses.
“The reason for the lack of ML in real-world scenarios is due to statistical misuse rather than the limitations of the tool itself,” says Kording. “If a study publishes a claim that seems too good to be true, it usually is, and many times we can track that back to their use of statistics.”
Such studies that make their way into peer-reviewed journals contribute to misinformation and mistrust in science and are more common than one might expect.
Greg Bowman’s research aims to combat global health threats such as COVID-19 and Alzheimer’s disease by better understanding how proteins function and malfunction, especially through new computational and experimental methods that map protein structures. This understanding of protein dynamics can lead to effective new treatments for even the most seemingly resistant diseases.
“Delivering the right treatment to the right person at the right time is vital to sustaining—and saving—lives,” Magill said. “Greg Bowman’s novel work holds enormous promise and potential to advance new forms of personalized medicine, an area of considerable strength for Penn. A gifted researcher and consummate collaborator, we are delighted to count him among our distinguished PIK University Professors.”
Bowman came to Penn from the Washington University School of Medicine’s Department of Biochemistry and Molecular Biophysics, where he had served on the faculty since 2014. He previously completed a three-year postdoctoral fellowship at the University of California, Berkeley.
Bowman’s research utilizes high-performance supercomputers for simulations that can better explain how mutations and disease change a protein’s functions. These simulations are enabled in part through the innovative Folding@home project, which Bowman directs. Folding@home empowers anyone with a computer to run simulations alongside a consortium of universities, with more than 200,000 participants worldwide.
His research has been supported by the National Science Foundation, National Institutes of Health, National Institute on Aging, and Packard Foundation, among others, and he has received a CAREER Award from the NSF, Career Award at the Scientific Interface from the Burroughs Wellcome Fund, and Thomas Kuhn Paradigm Shift Award from the American Chemical Society. He received a Ph.D. in biophysics from Stanford University and a B.S. (summa cum laude) in computer science, with a minor in biomedical engineering, from Cornell University.
“Greg Bowman’s highly innovative work,” Winkelstein said, “exemplifies the power of our interdisciplinary mission at Penn. He brings together supercomputers, biophysics, and biochemistry to make a vital impact on public health. This brilliant fusion of methods—in the service of improving people’s lives around the world—will be a tremendous model for the research of our faculty, students, and postdocs in the years ahead.”
The Penn Integrates Knowledge program is a University-wide initiative to recruit exceptional faculty members whose research and teaching exemplify the integration of knowledge across disciplines and who are appointed in at least two schools at Penn.
The Louis Heyman University Professorship is a gift of Stephen J. Heyman, a 1959 graduate of the Wharton School, and his wife, Barbara Heyman, in honor of Stephen Heyman’s uncle. Stephen Heyman is a University Emeritus Trustee and member of the School of Nursing Board of Advisors. He is Managing Partner at Nadel and Gussman LLC in Tulsa, Oklahoma.
The National Science Foundation’s Research Traineeship Program aims to support graduate students, educate the STEM leaders of tomorrow and strengthen the national research infrastructure. The program’s latest series of grants is going toward university programs focused on artificial intelligence and quantum information science and engineering – two areas of high priority in academia, industry and government.
Chinedum Osuji, Eduardo D. Glandt Presidential Professor and Chair of the Department of Chemical and Biomolecular Engineering (CBE), has received one of these grants to apply data science and machine learning to the field of soft materials. The grant will provide five years of support and a total of $3 million for a new Penn project on Data Driven Soft Materials Research.
Osuji will work with co-PIs Russell Composto, Professor and Howell Family Faculty Fellow in Materials Science and Engineering, Bioengineering, and CBE; Zahra Fakhraai, Associate Professor of Chemistry in Penn’s School of Arts & Sciences (SAS) with a secondary appointment in CBE; Paris Perdikaris, Assistant Professor in Mechanical Engineering and Applied Mechanics; and Andrea Liu, Hepburn Professor of Physics and Astronomy in SAS. All of them will help run the program and provide the connections between the multiple fields of study where its students will train.
These and other affiliated faculty members will work closely with co-PI Kristin Field, who will serve as Program Coordinator and Director of Education.
Congratulations to Kevin B. Johnson, David L. Cohen University Professor, on his recent appointment as a Senior Fellow in the Leonard Davis Institute of Health Economics at the University of Pennsylvania (Penn LDI). Johnson, an expert in health care innovation and health information technology, holds appointments in Biostatistics, Epidemiology and Informatics in the Perelman School of Medicine and Computer and Information Science in the School of Engineering and Applied Science. He also holds secondary appointments in Bioengineering, Pediatrics, and in the Annenberg School for Communication and is Vice President for Applied Informatics in the University of Pennsylvania Health System.
Penn LDI is Penn’s hub for health care delivery, health policy, and population health; it connects and amplifies experts and thought leaders and trains the next generation of researchers. Johnson joins over 500 Fellows from across all of Penn’s schools, the University of Pennsylvania Health System, and the Children’s Hospital of Philadelphia. Johnson brings expertise in Health Care Innovation, Health Information Technology, Medication Adherence, and Social Media to his new fellowship and has extensively studied healthcare informatics with the goal of improving patient care.
Kevin Johnson is used to forging his own path in the fields of healthcare and computer science.
If you ask him to locate his niche within these fields, Johnson, David L. Cohen and Penn Integrates Knowledge (PIK) Professor with appointments in Penn Engineering and the Perelman School of Medicine, would say “informatics.” But that doesn’t tell the whole story of the board-certified pediatrician, who has dedicated his career to innovations in how patients’ information is created, documented and shared, all with the goal of improving the quality of healthcare they receive.
Informatics, the study of the structure and behavior of interactions between natural and computational systems, is an umbrella term. Within it, there’s bioinformatics, which applies informatics to biology, and biomedical informatics, which looks at those interactions in the context of healthcare systems. Finally, there is clinical informatics, which further focuses on the settings where healthcare is delivered, and where Johnson squarely places himself.
“But you can just call it ‘informatics,’” says Johnson. “It will be easier.”
He mainly studies how computational systems can improve ambulatory care — sometimes known as outpatient care, or the kind of care hospitals give to patients without admitting them — in real time. If you’ve ever heard your doctor complain about the amount of time it takes them to input the information they get from you during your visit, or wondered why they need to capture this information during the visit in the first place, these are some of the questions Johnson is investigating.
“We’re taking care of patients but we’re getting frustrated by things that we thought these new computers should be able to fix,” says Johnson. “I think there’s a very compelling case for using engineering principles to reimagine electronic health records.”
Kevin Johnson is the David L. Cohen University of Pennsylvania Professor in the Departments of Biostatistics, Epidemiology and Informatics and Computer and Information Science. As a Penn Integrates Knowledge (PIK) University Professor, Johnson also holds appointments in the Departments of Bioengineering and Pediatrics, as well as in the Annenberg School for Communication. Johnson is the Vice President for Applied Informatics for the University of Pennsylvania Health System and has been elected to the American College of Medical Informatics (2004), the Academic Pediatric Society (2010), the National Academy of Medicine (Institute of Medicine) (2010), and the International Association of Health Science Informatics (2021).
William Danon and Luka Yancopoulos, winners of the 2022 President’s Innovation Prize, will offer a software solution to make the health care supply chain more efficient.
by Brandon Baker
William Danon and Luka Yancopoulos are best friends. They’re also business partners.
The duo, who received this year’s President’s Innovation Prize (PIP) for Grapevine, met during sophomore year, connected through Yancopoulos’ roommate. As time went on, they did everything together: cooked meals, played basketball, and read and discussed fantasy novels.
“We spent a lot of time together,” Danon says.
It was only natural, then, that when the time came to start an actual venture, they’d do it together.
“They’re like brothers, in a very good way,” says mentor David Meaney of the School of Engineering and Applied Science, who describes their working dynamic as “complementary.” “I think that will serve them well. Most of what we do in faculty is collaborative, and I see elements of that in their partnership. I give them credit for stepping out and doing something unusual and keeping at it.”
How Grapevine came to be
Grapevine is a software solution and professional networking platform that connects small-to-medium-size players in the health care supply chain. It’s a sort of two-pronged solution: It helps institutions like hospital systems connect disjointed operations like procurement and inventory management internally, but also serves as a glue between these institutions and purveyors of medical equipment.
“William and Luka are impact-driven entrepreneurs whose collaborative synergies will take them far,” says Penn Interim President Wendell Pritchett. “The software provided by Grapevine is poised to reinvent how the health care industry buys and sells medical supplies and services and, truly, could not come at a timelier moment.”
The company is the evolution of a project they began at the onset of the COVID-19 pandemic, called Pandemic Relief Supply, which delivered $20 million of health care supplies to frontline workers.
“My mom was a nurse practitioner at New York Presbyterian Hospital, the largest hospital in the United States, and she was coming home with horror stories,” recalls Yancopoulos. “In surgery or the ER, a surgeon had to put on a garbage bag because they didn’t have a gown. And they gave her one mask to use for the rest of the month, and I’m seeing on the news, ‘Don’t wear a mask for more than three days.’”
This is where Yancopoulos and Danon first developed an interest in the health care supply chain. Using a database, available to Penn students, that maps the import of any good into the country, they used keyword matching to identify incoming streams of different goods and handed off their findings to New York Presbyterian procurement staff. When McKesson, the largest provider of health care products and services in the U.S., took notice of what they were doing and reached out, they realized they were onto something. In response to their success, they started a company called Pandemic Relief Supply to distribute reliable medical supplies, including items like medical-grade masks and gloves, to frontline workers in the healthcare space.
As time passed, that project evolved into something larger: Grapevine.
In short, Grapevine’s software creates a professional networking platform that resolves miscommunications between suppliers and buyers and adds a layer of transparency to their interactions. Suppliers on the platform display real-time, timestamped data about their inventory and shipping process. This prevents companies from cherry-picking data or making false claims, and it creates a space for companies to interact that is more specific to the health care supply chain than, say, LinkedIn.
“Primarily, the first step is we want people to use it internally, and streamline operations, and then through that centralized operational data, you can push that externally and that’s where [Grapevine] becomes a connector,” explains Danon. “Because when you’re choosing to connect with someone, the reason you can do so way more efficiently or quickly, is that data is actual operational data.”
To accomplish this level of transparency, the beginnings of Grapevine involved lots of legwork. Last year, the duo moved to Los Angeles to take stock of what suppliers existed where, and how reliable they were. They realized that many suppliers existed around Los Angeles because of port access; many medical supplies are imported from Asia. Their time in LA made the problem feel even more tangible, they agree.
“We were able to see people were doing outdated processes – manual processes – because there’s no other option,” Danon says. “So, we said, ‘Let’s get out there and do some work to be digital and technologically innovative.’”
N.B.: Yancopoulos’s senior design team created “Harvest” for their capstone project in Bioengineering, building on the existing Grapevine software package. Read Harvest’s abstract and view their final presentation on the BE Labs website.
Kevin B. Johnson, M.D., M.S., was featured in Cincinnati Children’s Hospital’s “Envisioning Our Future for Children” speaker series, discussing “the evolution of the EHR and its future directions.” An electronic health record, or EHR, is a digital record of a patient’s chart, recording health information and data, coordinating orders, tracking results, and providing patient support. Johnson “predicts a new wave of transformation in digital health technologies that could make rapid progress” in several areas of medicine, including reducing cost and improving patient outcomes. Johnson is Vice President for Applied Informatics at the University of Pennsylvania Health System and the David L. Cohen University Professor with appointments in Biostatistics, Epidemiology and Informatics and Computer and Information Science and secondary appointments in the Annenberg School for Communication, Pediatrics, and Bioengineering.
Konrad Kording, Nathan Francis Mossell University Professor in Bioengineering, Neuroscience, and Computer and Information Science, was appointed the Co-Director of the CIFAR Program in Learning in Machines & Brains. The appointment will start April 1, 2022.
CIFAR is a global research organization that convenes extraordinary minds to address the most important questions facing science and humanity. CIFAR was founded in 1982 and now includes over 400 interdisciplinary fellows and scholars, representing over 130 institutions and 22 countries. CIFAR supports research at all levels of development in areas ranging from Artificial Intelligence and child and brain development, to astrophysics and quantum computing. The program in Learning in Machines & Brains brings together international scientists to examine “how artificial neural networks could be inspired by the human brain, and developing the powerful technique of deep learning.” Scientists, industry experts, and policymakers in the program are working to understand the computational and mathematical principles behind learning, whether in brains or in machines, in order to understand human intelligence and improve the engineering of machine learning. As Co-Director, Kording will oversee the collective intellectual development of the LMB program, which includes over 30 Fellows, Advisors, and Global Scholars. The program is also co-directed by Yoshua Bengio, the Canada CIFAR AI Chair and Professor in Computer Science and Operations Research at Université de Montréal.
Kording, a Penn Integrates Knowledge (PIK) Professor, was previously named an associate fellow of CIFAR in 2017. Kording’s groundbreaking interdisciplinary research uses data science to advance a broad range of topics that include understanding brain function, improving personalized medicine, collaborating with clinicians to diagnose diseases based on mobile phone data and even understanding the careers of professors. Across many areas of biomedical research, his group analyzes large datasets to test new models and thus get closer to an understanding of complex problems in bioengineering, neuroscience and beyond.
More data is being produced across diverse fields within science, engineering, and medicine than ever before, and our ability to collect, store, and manipulate it grows by the day. With scientists of all stripes reaping the raw materials of the digital age, there is an increasing focus on developing better strategies and techniques for refining this data into knowledge, and that knowledge into action.
Enter data science, where researchers try to sift through and combine this information to understand relevant phenomena, build or augment models, and make predictions.
One powerful technique in data science’s armamentarium is machine learning, a type of artificial intelligence that enables computers to automatically generate insights from data without being explicitly programmed as to which correlations they should attempt to draw.
Advances in computational power, storage, and sharing have enabled machine learning to be more easily and widely applied, but new tools for collecting reams of data from massive, messy, and complex systems—from electron microscopes to smart watches—are what have allowed it to turn entire fields on their heads.
“This is where data science comes in,” says Susan Davidson, Weiss Professor in Computer and Information Science (CIS) at Penn’s School of Engineering and Applied Science. “In contrast to fields where we have well-defined models, like in physics, where we have Newton’s laws and the theory of relativity, the goal of data science is to make predictions where we don’t have good models: a data-first approach using machine learning rather than using simulation.”
Penn Engineering’s formal data science efforts include the establishment of the Warren Center for Network & Data Sciences, which brings together researchers from across Penn with the goal of fostering research and innovation in interconnected social, economic and technological systems. Other research communities, including Penn Research in Machine Learning and the student-run Penn Data Science Group, bridge the gap between schools, as well as between industry and academia. Programmatic opportunities for Penn students include a Data Science minor for undergraduates, and a Master of Science in Engineering in Data Science, which is directed by Davidson and jointly administered by CIS and Electrical and Systems Engineering.
Penn academic programs and researchers on the leading edge of the data science field will soon have a new place to call home: Amy Gutmann Hall. The 116,000-square-foot, six-floor building, located on the northeast corner of 34th and Chestnut Streets near Lauder College House, will centralize resources for researchers and scholars across Penn’s 12 schools and numerous academic centers while making the tools of data analysis more accessible to the entire Penn community.
Faculty from all six departments in Penn Engineering are at the forefront of developing innovative data science solutions, primarily relying on machine learning, to tackle a wide range of challenges. Researchers show how they use data science in their work to answer fundamental questions in topics as diverse as genetics, “information pollution,” medical imaging, nanoscale microscopy, materials design, and the spread of infectious diseases.
Bioengineering: Unraveling the 3D genomic code
Scattered throughout the genomes of healthy people are tens of thousands of repetitive DNA sequences called short tandem repeats (STRs). But the unstable expansion of these repetitions is at the root of dozens of inherited disorders, including Fragile X syndrome, Huntington’s disease, and ALS. Why these STRs are susceptible to this disease-causing expansion, whereas most remain relatively stable, remains a major conundrum.
Complicating this effort is the fact that disease-associated STR tracts exhibit tremendous diversity in sequence, length, and localization in the genome. Moreover, that localization has a three-dimensional element because of how the genome is folded within the nucleus. Mammalian genomes are organized into a hierarchy of structures called topologically associated domains (TADs). Each one spans millions of nucleotides and contains smaller subTADs, which are separated by linker regions called boundaries.
“The genetic code is made up of three billion base pairs. Stretched out end to end, it is 6 feet 5 inches long, and must be subsequently folded into a nucleus that is roughly the size of a head of a pin,” says Jennifer Phillips-Cremins, associate professor and dean’s faculty fellow in Bioengineering. “Genome folding is an exciting problem for engineers to study because it is a problem of big data. We not only need to look for patterns along the axis of three billion base pairs of letters, but also along the axis of how the letters are folded into higher-order structures.”
To address this challenge, Phillips-Cremins and her team recently developed a new mathematical approach called 3DNetMod to accurately detect these chromatin domains in 3D maps of the genome in collaboration with the lab of Dani Bassett, J. Peter Skirkanich Professor in Bioengineering.
“In our group, we use an integrated, interdisciplinary approach relying on cutting-edge computational and molecular technologies to uncover biologically meaningful patterns in large data sets,” Phillips-Cremins says. “Our approach has enabled us to find patterns in data that classic biology training might overlook.”
In a recent study, Phillips-Cremins and her team used 3DNetMod to identify tens of thousands of subTADs in human brain tissue. They found that nearly all disease-associated STRs are located at boundaries demarcating 3D chromatin domains. Additional analyses of cells and brain tissue from patients with Fragile X syndrome revealed severe boundary disruption at a specific disease-associated STR.
“To our knowledge, these findings represent the first report of a possible link between STR instability and the mammalian genome’s 3D folding patterns,” Phillips-Cremins says. “The knowledge gained may shed new light into how genome structure governs function across development and during the onset and progression of disease. Ultimately, this information could be used to create molecular tools to engineer the 3D genome to control repeat instability.”
From smartphones and fitness trackers to social media posts and COVID-19 cases, the past few years have seen an explosion in the amount and types of data that are generated daily. To help make sense of these large, complex datasets, the field of data science has grown, providing methodologies, tools, and perspectives across a wide range of academic disciplines.
As part of its $750 million investment in science, engineering, and medicine, the University has committed to supporting the future needs of this field. To this end, the Innovation in Data Engineering and Science (IDEAS) initiative will help Penn become a leader in developing data-driven approaches that can transform scientific discovery, engineering research, and technological innovation.
“The IDEAS initiative is game-changing for our University,” says President Amy Gutmann. “This new investment allows us to boost our interdisciplinary efforts across campus, recruit phenomenal additional team members, and generate an even more sound foundation for discovery, experimentation, and design. This initiative is a clear statement that Penn is committed to taking data science head-on.”
“One of the unique things about data science and data engineering is that it’s a very horizontal technology, one that is going to be impacting every department on campus,” says George Pappas, Electrical and Systems Engineering Department chair. “When you have a horizontal technology in a competitive area, we have to figure out specific areas where Penn can become a worldwide leader.”
To do this, IDEAS aims to recruit new faculty across three research areas: artificial intelligence (AI) to transform scientific discovery, trustworthy AI for autonomous systems, and understanding connections between the human brain and AI.
In the area of neuroscience and how the human brain is similar to AI and machine learning approaches, research from PIK Professor Konrad Kording and Dani Bassett’s Complex Systems lab exemplifies the types of cross-disciplinary efforts that are essential for addressing complex questions. By recruiting additional faculty in this area, IDEAS will help Penn make strides in bio-inspired computing and in future life-changing discoveries that could address cognitive disorders and nervous system diseases.