Scientists at Purdue University are using data science to help discover the next generation of treatments for cancer. (Purdue University photo/Rebecca McElhoe)
WEST LAFAYETTE, Ind. — The next generation of treatments for cancer may be found, not by scientists peering through microscopes, but by computer scientists crunching numbers. Thanks to unprecedented amounts of data, Purdue University scientists are using innovative data science techniques to better understand the genetics and cellular biology of cancer cells and tumors.
This work allows them to pioneer new diagnostic tools, generate novel therapeutic treatments and significantly advance the fight against cancer. Some of these advances may even allow oncologists to harness a patient’s own immune system to fight off cancer.
Previously, scientists had to rely on small sample sizes, case studies and, in some lucky cases, genetic or DNA analyses of tumors. Now, they can draw from enormous publicly available databases that include an almost mind-numbing amount of data: information on people with different types of cancer across an enormous spectrum of continents, races, cultures, genders and ages, as well as the genetics of hundreds of thousands of individual cells that make up tumors and other tissues. There is so much data, in fact, that traditional analytical tools fail.
That is where data science comes in.
Andrew Mesecar, Purdue’s Walther Professor in Cancer Structural Biology and deputy director of the Purdue University Center for Cancer Research, says data science approaches are crucial to the future of cancer research. (Purdue University photo/Rebecca McElhoe)
Data science is a field of science that uses advanced computer modeling and mathematics to analyze complex sets of data: data sets that are enormous and even those that include different kinds of data. It allows scientists to better understand problems and to find paths through the chaos.
Andrew Mesecar, Purdue’s Walther Professor in Cancer Structural Biology, is deputy director of the Purdue University Center for Cancer Research (PCCR), where he helps lead interdisciplinary teams of computational biologists, biochemists, statisticians, computer scientists and immunologists. An $11 million endowment from the Walther Cancer Foundation helps support the work.
“We have to figure out how to mine all of this data for meaning. Contained within all this genetic information are potential new molecular targets for cancer therapies and new biomarkers to detect and track cancer. We are pioneers. We are making the big leaps,” Mesecar said.
“In the future, when a cancer patient comes in, we are going to be able to monitor the genomics of their cancers in real time, to make predictions about the course a cancer might take and to make real-time decisions about what therapeutics to use. We are not going to have to wait and see if they respond to the drugs first and then change when they are too far along in treatment. What if they die? What if you put them through side effects without reducing their cancer?”
That worry — that without the right treatment, or the right treatment at the right time, patients could die — is at the heart of what Purdue’s Center for Cancer Research does. It is what drives the researchers, what inspires the lab work and what keeps scientists at their supercomputers and their lab benches, and what keeps them working together and learning from each other.
Min Zhang, associate director of data science at the Purdue Center for Cancer Research and a statistics professor, spent 12 years in medical school before deciding to focus on using data to combat cancer. (Purdue University photo/John Underwood)
A doctor crunching the numbers
Min Zhang began her career as a physician treating cancer patients in a hospital. After 12 years in medical school, she decided the answers to cancer might lie in the numbers. Now a statistics professor and associate director of data science at PCCR, Zhang develops statistical methods and applies them to cancer data, hoping to glean new insights into the early detection and diagnosis of cancer.
“We started to have a lot of data generated by the labs and clinics,” Zhang said. “The data are all mixed, all different kinds. We had to figure out how to extract information from these data and translate it into knowledge that people can use.”
Researchers have studied products of metabolism in the body — sugars, amino acids and other molecules called metabolites — to attempt to predict whether a patient has or will get cancer. But looking at one metabolite at a time did not show any strong patterns. When Zhang and her collaborators began using data science techniques to analyze groups of biologically related metabolites, however, they found a different story.
“The metabolites do not act in isolation; they work together to perform specific functions,” Zhang said. “When we look at groups of metabolites, we gain significantly more statistical power. When we look at the individual metabolite, only one is marginally significant. But when we studied them all together, there were very significant results.”
This method could provide more reliable biomarkers, allowing doctors to screen patients for colorectal cancer, or even polyps, using a blood sample rather than an invasive procedure like a colonoscopy.
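The gain in statistical power from testing related metabolites as a group, rather than one at a time, can be illustrated with a minimal sketch. It uses Fisher's method for combining p-values, which is a textbook technique chosen here for illustration and not the method Zhang's team developed. The function name `fisher_combined_p` and the p-values are hypothetical, and Fisher's method assumes the individual tests are independent, which biologically related metabolites generally are not.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values into one group-level p-value (Fisher's method)."""
    k = len(p_values)
    # Fisher's statistic is -2 * sum(ln p); under the null hypothesis it follows
    # a chi-square distribution with 2k degrees of freedom. We keep statistic / 2.
    half_stat = -sum(math.log(p) for p in p_values)
    # For even degrees of freedom the chi-square tail probability has a closed form:
    # exp(-x/2) * sum_{i<k} (x/2)^i / i!
    tail = sum(half_stat ** i / math.factorial(i) for i in range(k))
    return math.exp(-half_stat) * tail


# Hypothetical p-values for five metabolites in one pathway (illustrative only).
# Taken individually, only the first falls below the usual 0.05 threshold.
pathway_p = [0.04, 0.08, 0.06, 0.09, 0.07]
combined = fisher_combined_p(pathway_p)
print(f"smallest individual p-value: {min(pathway_p)}")
print(f"combined group p-value: {combined:.4f}")
```

In this made-up example, several weak signals that point the same way add up to a group-level p-value well below 0.01, even though no single metabolite is convincing on its own. Methods built for real metabolite data would also need to account for correlation among the metabolites.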
Zhang and her collaborators also developed machine-learning methods to study ways that the genes regulate each other on a genomewide scale as cancer progresses. Understanding how individual genes change and interact with others is key for treating cancer.
“When we treat patients with chemotherapy drugs at the very beginning, sometimes they respond but eventually they stop responding,” Zhang said. “If you target one gene, the cancer cells can adapt and take another route that allows them to keep growing. If we can target the whole network of genes and design a combination therapy, there is no way for cancer cells to survive. The genomewide causal gene regulatory networks constructed using newly developed machine-learning tools will provide multiple targets for novel therapeutic approaches development.”
Purdue’s Majid Kazemian is a computer scientist by training, but the assistant professor of biochemistry and computer science uses an expertise in both disciplines to study cancer. (Purdue University photo/Rebecca McElhoe)
Computer scientists hit the lab bench
Majid Kazemian started out as a computer scientist. But when he began helping to analyze cancer research data, he got curious about how his models played out in real life. Now an assistant professor of biochemistry and computer science, he runs a Purdue lab that is evenly divided between biologists and computer scientists.
“In the past few years, the amount of publicly available cancer data has increased exponentially,” Kazemian said. “We have advanced to being able to study cancer at a cellular level, cell by individual cell. We can now mine this data for patterns to generate novel hypotheses that we never have thought of before. Then we can go back into the lab and test the hypotheses.”
This method allows Kazemian and his lab to give new life to an old idea: What if they could get the patient’s own immune system to fight the cancer, without the need for drugs or therapies with the possibility of devastating side effects?
Some of the scariest cancers, including some forms of cervical cancer, liver cancer, non-Hodgkin’s lymphoma and stomach cancer, are caused by viruses. But once the virus causes the cancer, the cancer negates the virus so that it is no longer harmful. Kazemian’s lab is trying to use data science approaches to figure out how to reactivate the virus in the living cells, alerting the cancer patient’s immune system and allowing it to better fight the cancer cells. In effect, the approach would weaponize the actual virus that caused the problem, along with the patient’s own body, to defeat the cancer.
The team also is looking at other ways to harness and boost patients’ immune systems to combat cancer.
“The majority of the success we have seen in cancer biology is a range of concepts related to immunotherapy, drugs that can boost your immune system itself to fight back the cancer,” Kazemian said.
The idea is not new, but the approach would be impossible without the partnership between cancer research and data science.
Nadia Lanman, a research assistant professor of comparative pathobiology at Purdue University, uses her machine-learning expertise to study and analyze cancer research data. (Purdue University photo/John Underwood)
Saving lives with numbers
Nadia Lanman, another scientist who began working with computers and large data sets and ended up focusing on cancer data, is a research assistant professor of comparative pathobiology. She uses Purdue’s network of supercomputers to enable machine-learning projects that help sort and analyze data. She helps scientists analyze data in new ways, giving them new insights and better pathways forward.
“When someone comes in with cancer, we don’t know how they’re going to respond to different types of treatment, and we don’t know how sick they’re going to get from potential side effects,” Lanman said. “If we can tease these things out using machine learning and these massive data sets, we could imagine a world where, when a cancer patient comes in, we could collect data and use data science to help oncologists make recommendations.”
Lanman reiterated her mission and the mission of the cancer center: to make discoveries that will build the foundation for innovative cancer solutions. The cure for cancer is not in one field or the other; it is in experts from all fields working together using the most up-to-date data and analytical techniques.
“I love the work I do — that we do — at the cancer center,” Lanman said. “I love that we are really trying, every day, to make the world a better place for patients.”