Refining Data into Knowledge, Turning Knowledge into Action

by Janelle Weaver

Heatmaps are used by researchers in the lab of Jennifer Phillips-Cremins to visualize which physically distant genes are brought into contact when the genome is in its folded state.

More data is being produced across diverse fields within science, engineering, and medicine than ever before, and our ability to collect, store, and manipulate it grows by the day. With scientists of all stripes reaping the raw materials of the digital age, there is an increasing focus on developing better strategies and techniques for refining this data into knowledge, and that knowledge into action.

Enter data science, where researchers try to sift through and combine this information to understand relevant phenomena, build or augment models, and make predictions.

One powerful technique in data science’s armamentarium is machine learning, a type of artificial intelligence that enables computers to automatically generate insights from data without being explicitly programmed as to which correlations they should attempt to draw.

Advances in computational power, storage, and sharing have enabled machine learning to be more easily and widely applied, but new tools for collecting reams of data from massive, messy, and complex systems—from electron microscopes to smart watches—are what have allowed it to turn entire fields on their heads.

“This is where data science comes in,” says Susan Davidson, Weiss Professor in Computer and Information Science (CIS) at Penn’s School of Engineering and Applied Science. “In contrast to fields where we have well-defined models, like in physics, where we have Newton’s laws and the theory of relativity, the goal of data science is to make predictions where we don’t have good models: a data-first approach using machine learning rather than using simulation.”

Penn Engineering’s formal data science efforts include the establishment of the Warren Center for Network & Data Sciences, which brings together researchers from across Penn with the goal of fostering research and innovation in interconnected social, economic and technological systems. Other research communities, including Penn Research in Machine Learning and the student-run Penn Data Science Group, bridge the gap between schools, as well as between industry and academia. Programmatic opportunities for Penn students include a Data Science minor for undergraduates, and a Master of Science in Engineering in Data Science, which is directed by Davidson and jointly administered by CIS and Electrical and Systems Engineering.

Penn academic programs and researchers on the leading edge of the data science field will soon have a new place to call home: Amy Gutmann Hall. The 116,000-square-foot, six-floor building, located on the northeast corner of 34th and Chestnut Streets near Lauder College House, will centralize resources for researchers and scholars across Penn’s 12 schools and numerous academic centers while making the tools of data analysis more accessible to the entire Penn community.

Faculty from all six departments in Penn Engineering are at the forefront of developing innovative data science solutions, primarily relying on machine learning, to tackle a wide range of challenges. Researchers show how they use data science in their work to answer fundamental questions in topics as diverse as genetics, “information pollution,” medical imaging, nanoscale microscopy, materials design, and the spread of infectious diseases.

Bioengineering: Unraveling the 3D genomic code

Scattered throughout the genomes of healthy people are tens of thousands of repetitive DNA sequences called short tandem repeats (STRs). But the unstable expansion of these repetitions is at the root of dozens of inherited disorders, including Fragile X syndrome, Huntington’s disease, and ALS. Why certain STRs are susceptible to this disease-causing expansion, whereas most remain relatively stable, is still a major conundrum.

Complicating the search for an answer is the fact that disease-associated STR tracts exhibit tremendous diversity in sequence, length, and localization in the genome. Moreover, that localization has a three-dimensional element because of how the genome is folded within the nucleus. Mammalian genomes are organized into a hierarchy of structures called topologically associated domains (TADs). Each one spans millions of nucleotides and contains smaller subTADs, which are separated by linker regions called boundaries.

Associate professor and Dean’s Faculty Fellow Jennifer E. Phillips-Cremins.

“The genetic code is made up of three billion base pairs. Stretched out end to end, it is 6 feet 5 inches long, and must be subsequently folded into a nucleus that is roughly the size of a head of a pin,” says Jennifer Phillips-Cremins, associate professor and dean’s faculty fellow in Bioengineering. “Genome folding is an exciting problem for engineers to study because it is a problem of big data. We not only need to look for patterns along the axis of three billion base pairs of letters, but also along the axis of how the letters are folded into higher-order structures.”

To address this challenge, Phillips-Cremins and her team, in collaboration with the lab of Dani Bassett, J. Peter Skirkanich Professor in Bioengineering, recently developed a new mathematical approach called 3DNetMod that accurately detects these chromatin domains in 3D maps of the genome.

“In our group, we use an integrated, interdisciplinary approach relying on cutting-edge computational and molecular technologies to uncover biologically meaningful patterns in large data sets,” Phillips-Cremins says. “Our approach has enabled us to find patterns in data that classic biology training might overlook.”

In a recent study, Phillips-Cremins and her team used 3DNetMod to identify tens of thousands of subTADs in human brain tissue. They found that nearly all disease-associated STRs are located at boundaries demarcating 3D chromatin domains. Additional analyses of cells and brain tissue from patients with Fragile X syndrome revealed severe boundary disruption at a specific disease-associated STR.

“To our knowledge, these findings represent the first report of a possible link between STR instability and the mammalian genome’s 3D folding patterns,” Phillips-Cremins says. “The knowledge gained may shed new light into how genome structure governs function across development and during the onset and progression of disease. Ultimately, this information could be used to create molecular tools to engineer the 3D genome to control repeat instability.”

Read the full story in Penn Today.

Investing in Penn’s Data Science Ecosystem

by Erica K. Brockmeier

As part of a major University-wide investment in science, engineering, and medicine, the Innovation in Data Engineering and Science Initiative aims to help Penn become a leader in developing data-driven approaches that can transform scientific discovery, engineering research, and technological innovation.

From smartphones and fitness trackers to social media posts and COVID-19 cases, the past few years have seen an explosion in the amount and types of data that are generated daily. To help make sense of these large, complex datasets, the field of data science has grown, providing methodologies, tools, and perspectives across a wide range of academic disciplines.

But the challenges that lie ahead for data scientists and engineers, from developing algorithms that don’t exacerbate biases to ensuring privacy protections, are equally complex and, in some instances, require entirely new ways of thinking.

As part of its $750 million investment in science, engineering, and medicine, the University has committed to supporting the future needs of this field. To this end, the Innovation in Data Engineering and Science (IDEAS) initiative will help Penn become a leader in developing data-driven approaches that can transform scientific discovery, engineering research, and technological innovation.

“The IDEAS initiative is game-changing for our University,” says President Amy Gutmann. “This new investment allows us to boost our interdisciplinary efforts across campus, recruit phenomenal additional team members, and generate an even more sound foundation for discovery, experimentation, and design. This initiative is a clear statement that Penn is committed to taking data science head-on.”

Building on a foundation of existing expertise

Led by the School of Engineering and Applied Science, the IDEAS initiative builds upon the steadily gathering momentum of its data-centric research. The Warren Center for Network & Data Sciences has been a major catalyst for this type of work, generating foundational research on ethical algorithms and data privacy, as well as collaborations that have drawn in faculty from the Wharton School, Law School, Perelman School of Medicine, and beyond. In addition, Wharton’s Department of Statistics and Data Science is an active partner in research and teaching initiatives that apply statistical modeling across a wide variety of fields.

“One of the unique things about data science and data engineering is that it’s a very horizontal technology, one that is going to be impacting every department on campus,” says George Pappas, Electrical and Systems Engineering Department chair. “When you have a horizontal technology in a competitive area, we have to figure out specific areas where Penn can become a worldwide leader.”

To do this, IDEAS aims to recruit new faculty across three research areas: artificial intelligence (AI) to transform scientific discovery, trustworthy AI for autonomous systems, and understanding connections between the human brain and AI.

Penn already has a strong foundation in using AI for scientific discovery thanks in part to investments in basic research facilities such as the Singh Center for Nanotechnology and the Laboratory for Research on the Structure of Matter. Additionally, there are centers focused on connecting researchers from different fields to address complex scientific questions, including the Center for Soft and Living Matter, Center for Engineering Mechanobiology, and Penn Institute for Computational Science.

Developing “trustworthy” algorithms, ones that work reliably outside of situations in which they are trained, is another key component of the IDEAS initiative. Ongoing research at the Penn Research in Embedded Computing and Integrated Systems Engineering (PRECISE) Center, the General Robotics, Automation, Sensing & Perception (GRASP) Lab, and DARPA-funded projects on the safety of AI-based aircraft control provide a starting point for furthering Penn’s research portfolio on safe, explainable, and trustworthy autonomous systems.

In neuroscience, where researchers are studying how the human brain resembles AI and machine learning approaches, work from PIK Professor Konrad Kording and Dani Bassett’s Complex Systems lab exemplifies the types of cross-disciplinary efforts that are essential for addressing complex questions. By recruiting additional faculty in this area, IDEAS will help Penn make strides in bio-inspired computing and in future life-changing discoveries that could address cognitive disorders and nervous system diseases.

Read the full story in Penn Today.

Konrad Kording Receives Named University Professorship

Konrad Kording (Photo by Eric Sucar)

President Amy Gutmann has recently announced that two Penn Integrates Knowledge Professors, one of whom is Penn Engineering’s own Konrad Kording, have received named University Professorships.

Kording, who holds joint appointments in the Department of Neuroscience in the Perelman School of Medicine and the Department of Bioengineering in the School of Engineering and Applied Science, will become the Nathan Francis Mossell University Professor. 

When Nathan Francis Mossell graduated in 1882, he became the first African American to earn a medical degree from Penn. He soon became a prominent African American physician, the first to be elected to the Philadelphia County Medical Society. He helped found the Frederick Douglass Memorial Hospital and Training School, which treated Black patients and helped train the next generation of Black doctors and nurses.  

“Dr. Mossell was truly inspiring. He had to fight for everything, yet never reneged on his principles. He pretty much started a hospital and was a major champion for the advancement of equality for African Americans,” Kording said. “In my research, where I study how intelligence works, I am inspired by scholars like him who combine many different insights. He was a wonderful man, and I will be proud to carry his name.” 

Read more in Penn Today.

Penn Researchers Show ‘Encrypted’ Peptides Could be Wellspring of Natural Antibiotics

by Melissa Pappas

César de la Fuente, Ph.D.

While biologists and chemists race to develop new antibiotics to combat constantly mutating bacteria, predicted to lead to 10 million deaths by 2050, engineers are approaching the problem through a different lens: finding naturally occurring antibiotics in the human genome.

The billions of base pairs in the genome are essentially one long string of code that contains the instructions for making all of the molecules the body needs. The most basic of these molecules are amino acids, the building blocks for peptides, which in turn combine to form proteins. However, there is still much to learn about how, and where, a particular set of instructions is encoded.

Now, bringing a computer science approach to a life science problem, an interdisciplinary team of Penn researchers has used a carefully designed algorithm to discover a new suite of antimicrobial peptides hiding deep within this code.

The study, published in Nature Biomedical Engineering, was led by César de la Fuente, Presidential Assistant Professor in Bioengineering, Microbiology, Psychiatry, and Chemical and Biomolecular Engineering, spanning both Penn Engineering and Penn Medicine, and his postdocs Marcelo Torres and Marcelo Melo. Collaborators Orlando Crescenzi and Eugenio Notomista of the University of Naples Federico II also contributed to this work.

“The human body is a treasure trove of information, a biological dataset. By using the right tools, we can mine for answers to some of the most challenging questions,” says de la Fuente. “We use the word ‘encrypted’ to describe the antimicrobial peptides we found because they are hidden within larger proteins that seem to have no connection to the immune system, the area where we expect to find this function.”

Read the full story in Penn Engineering Today.

Using Big Data to Measure Emotional Well-being in the Wake of George Floyd’s Murder

by Melissa Pappas

George Floyd’s murder had an undeniable emotional impact on people around the world, as evidenced by this memorial mural in Berlin, but quantifying that impact is challenging. Researchers from Penn Engineering and Stanford have used a computational approach on U.S. survey data to break down this emotional toll along racial and geographic lines. Their results show a significantly larger amount of self-reported anger and sadness among Black Americans than among their White counterparts. (Photo: Leonhard Lenz)

The murder of George Floyd, an unarmed Black man who was killed by a White police officer, affected the mental well-being of many Americans. The effects were multifaceted, as it was an act of police brutality and an example of systemic racism that occurred during the uncertainty of a global pandemic, creating an even more complex dynamic and emotional response.

Because poor mental health can lead to a myriad of additional ailments, including poor physical health, inability to hold a job and an overall decrease in quality of life, it is important to understand how certain events affect it. This is especially critical when the emotional burden of these events falls most on demographics affected by systemic racism. However, unlike physical health, mental health is challenging to characterize and measure, and thus, population-level data on mental health has been limited.

To better understand patterns of mental health on a population scale, Penn Engineers Lyle H. Ungar, Professor of Computer and Information Science (CIS), and Sharath Chandra Guntuku, Research Assistant Professor in CIS, take a computational approach to this challenge. Drawing on large-scale surveys as well as language analysis in social media through their work with the World Well-Being Project, they have developed visualizations of these patterns across the U.S.

Their latest study involves tracking changes in emotional and mental health following George Floyd’s murder. Combining polling data from the U.S. Census and Gallup, Guntuku, Ungar and colleagues have shown that Floyd’s murder sparked a wave of unprecedented sadness and anger across the U.S. population, the largest since relevant data began being recorded in 2009.

Read the full story in Penn Engineering Today.

N.B. Lyle Ungar is also a member of the Penn Bioengineering Graduate Group.

“This is What a Data Scientist Looks Like”

Speakers at the second annual Women in Data Science @ Penn Conference.

Last month, the second annual Women in Data Science (WiDS) @ Penn Conference virtually gathered nearly 500 registrants to participate in a week’s worth of academic and industry talks, live speaker Q&A sessions, and networking opportunities.

The conference was hosted by Penn Engineering, Analytics at Wharton, Wharton Customer Analytics and Wharton’s Statistics Department. Its theme, “This is What a Data Scientist Looks Like,” emphasized the depth, breadth, and diversity of data science, both in terms of the subjects the field covers and the people who enter it.

Following welcoming remarks from Erika James, Dean of the Wharton School, and Vijay Kumar, Nemirovsky Family Dean of Penn Engineering, the conference began with a keynote address from President of Microsoft US and Wharton alumna Kate Johnson.

Conference sessions continued throughout the week, featuring panels of academic data scientists from around Penn and beyond, industry leaders from IKEA Digital, Facebook and Poshmark, and lightning talks from student speakers who presented their data science research.

All of the conference’s sessions are now available on YouTube and the 2021 WiDS Conference Recap, including a talk titled “How Humans Build Models for the World” by Danielle Bassett, J. Peter Skirkanich Professor in Bioengineering and Electrical and Systems Engineering.

Read more about the conference at Wharton Stories: “How Women in Data Science Rise to the Top.”

Originally posted in Penn Engineering Today.

Brain-machine interfaces: Villainous gadgets or tools for next-gen superheroes?

A Q&A with neuroscientist Konrad Kording on how connections between minds and machines are portrayed in popular culture, and what the future holds for this reality-defying technology.

Science fiction and superhero films portray brain-machine interfaces as malevolent robots that plug into human brains for fuel in The Matrix (top left) or as power-enhancing devices in X-Men (top right). In reality, they can help patients use artificial limbs or directly connect to computers. (Image credits, from top left to bottom right: Warner Brothers, 20th Century Fox, Intelligent Films, AFP Photo/Jean-Pierre Clatot)

For the many superheroes that use high-powered gadgets to save the day, there’s an equal number of villains who use technology nefariously. From robots that plug into human brains for fuel in “The Matrix” to the memory-warping devices seen in “Men in Black,” “Captain Marvel,” and “Total Recall,” technology that can control people’s minds is one of the most terrifying examples of technology gone wrong in science fiction and superhero films.

Now, progress made on brain-machine interfaces, technology that provides a direct communication link between a brain and an external device, is bringing us closer to a world that feels like science fiction. Elon Musk’s company Neuralink is working on a device to let people control computers with their minds, while Facebook’s “mind-reading initiative” can decode speech from brain activity. Is this progress a glimpse into a dark future, or are there more empowering ways in which brain-machine interfaces could become a force for good?

Penn Today talked with Konrad Kording, a Penn Integrates Knowledge Professor of Neuroscience, Bioengineering, and Computer and Information Science whose group works at the interface of data science and neuroscience to better understand the human brain, to learn more about brain-machine interfaces and where real-world technologies and science fiction intersect.

Q: What are the main challenges in connecting brains to devices?

The key problem is that you need to get a lot of information out of brains. Today’s prosthetic devices are very slow, and if we want to go faster it’s a tradeoff: I can go slower and then I am more precise, or I can go faster and be more noisy. We need to get more data out of brains, and we want to do it electrically, meaning we need to get more electrodes into brains.

So what do you need? You need a way of getting electrodes into the brain without making your brain into a pulp, you want the electrodes to be flexible so they can stay in longer, and then you want the system to be wireless. You don’t want to have a big connector on the top of your head.

It’s primarily a hardware problem. We can get electrodes into brains, but they deteriorate quickly because they are too thick. We can have plugs on people’s heads, but it’s ruling out any real-world usage. All these factors hold us back at the moment.

That’s why the Neuralink announcement was very interesting. They get a rather large number of electrodes into brains using well-engineered approaches that make that possible. What makes the difference is that Neuralink takes the best ideas in all the different domains and puts them together.

Q: Most examples in pop culture of connecting brains to machines have villainous or nefarious ends. Does that match up with how brain-machine interfaces are currently being developed? 

Let’s say you’ve had a stroke, you can’t talk, but there’s a prosthetic device that allows you to talk again. Or if you lost your arm, and you get a new one that’s as good as the original—that’s absolutely a force for good.

It’s not a dark, ugly future thing, it’s a beautiful step forward for medicine. I want to make massive progress in these diseases. I want patients who had a stroke to talk again; I want vets to have prosthetic devices that are as good as the real thing. I think short-term this is what’s going to happen, but we are starting to worry about the dark sides.

Read the full interview at Penn Today.