How a spring-term class project became a race to track and predict the disease
A team of Caltech students, led by Yaser Abu-Mostafa (PhD ’83), has developed a tool to predict the impact of COVID-19 using artificial intelligence (AI).
The project, which began as an assignment for Abu-Mostafa’s computer science class CS/CNS/EE 156 (Learning Systems) during Caltech’s spring term, has blossomed into a race to create a tool that policymakers can use to assess the pandemic’s potential impact on their communities and to predict the effect of mitigation efforts.
The idea to redirect the focus of CS/CNS/EE 156 came from Caltech senior Alexander Zlokapa, who runs the Caltech Data Science Organization, a student initiative that promotes engagement with data science research. On March 19, Zlokapa approached Abu-Mostafa to suggest replacing the course’s original topic, the use of AI to make movie recommendations.
Worried about changing gears at the last minute, especially from a fun, light topic to something a lot more serious, Abu-Mostafa emailed his students for their thoughts. “The response was overwhelming. We all saw an opportunity to make a real difference for the country,” says Abu-Mostafa, professor of electrical engineering and computer science. Eighty students were already in the class; an additional 70 signed up when the new challenge was announced.
“Even at that early stage, the gravity of the pandemic was becoming clear: we are at war, and the enemy is potent,” Abu-Mostafa says. “We do not yet have the weapons to fight it, neither a cure nor a vaccine. However, there is another essential weapon in war: intelligence. We need to know what the enemy is doing so we can plan ahead. My students and I can help gather this intelligence using, fittingly, artificial intelligence, the subject of my course.”
While many models to predict the spread of a disease already exist, few, if any, incorporate AI, which makes predictions based on observations of what is actually happening rather than on what the model’s designers think should happen. AI can discover patterns hidden in data that the human eye might miss.
“The standard epidemiology models make basic assumptions about the way a disease will spread, and then let you make tweaks based on infection and recovery rates and so on. AI, on the other hand, takes nothing for granted,” says graduate student Dominic Yurk (BS ’17), who was the head teaching assistant for the course.
If you are attempting to predict the future spread of COVID-19 in a given area, you do not want to look only at what happened before and draw a line going forward. You want to take into account population density and mobility, whether people are going to restaurants and to work, the hospital capacity in the area, and the past incidence of flu; a great deal of data can be relevant to this problem. The drawback of standard epidemiology models is that they offer no easy way to incorporate these factors.
The class’s more than 150 students, who worked in competitive teams of up to four through April and May, were tasked with making accurate predictions about the impact of the disease. For the purposes of the class, they focused on predicting mortality rates, the most reliable metric for assessing the disease’s impact in a given location, because testing data tend to be unreliable and incomplete. The students threw themselves into the challenge and spent thousands of hours on the project. They gathered any data that seemed relevant, including COVID-19 mortality data, the demographics and population densities of affected communities, whether people were abiding by stay-at-home orders, clinical statistics, and past flu data.
“We set the rules of the competition with policymakers in mind,” Abu-Mostafa says. “Here is an example of what to avoid: at some point in April, the state of Georgia had 590 intensive care unit (ICU) beds available. At that time, a prominent model predicted that Georgia’s COVID-19 patients would need between 424 and 1,928 ICU beds at any given time. Such an imprecise prediction is of no help to policymakers.” That range stretched from well below Georgia’s available capacity to more than three times it, so it could not tell officials whether their ICUs would be overwhelmed.
In the end, about 40 teams produced viable models capable of making reasonable predictions, and the top 10 or so rivaled existing models in the precision of their forecasts of COVID-19’s future impact on a given community. “Just from that course, we had some really good-looking models,” Yurk says.
When the class concluded in June, Abu-Mostafa’s students told him they wanted to continue the effort over the summer by combining the best parts of the class’s top models into a single, even more powerful tool. He selected eight graduate students (including Yurk) and 18 undergraduates and tasked the new team with building a model with an expanded scope: one that could make rapid, precise predictions of mortality rates, numbers of infections, and test positivity rates.
Yaser Abu-Mostafa (top-center) meets with the student researchers building the COVID-19 model via Zoom.
Credit: Yaser Abu-Mostafa
Abu-Mostafa then had about two days to raise enough funding to keep the project going.
Abu-Mostafa approached Caltech senior trustee Charles Trimble (BS ’63, MS ’64), who was already funding an ongoing project by Abu-Mostafa to couple machine learning with inexpensive sensors to improve telemedicine. Abu-Mostafa asked if he could redirect those funds to support his newly assembled team. Trimble responded almost immediately with an enthusiastic “yes.”
“When Yaser called me about his idea for a summer COVID-19 project, it obviously made sense for him, his students, Caltech, and the country,” Trimble says. “It was impossibly ambitious and had a high payoff. For me, supporting the professor takes precedence over the project. Caltech’s overachievement comes from supporting researchers wherever their interests take them.”
Later, Abu-Mostafa also secured support from the Clinard Innovation Fund, established by entrepreneur and Caltech engineering alumnus Gary Clinard (BS ’65, MS ’66) to advance interdisciplinary research in engineering and applied science at Caltech. About 80 percent of the project’s funding went to student-researcher salaries, and the rest paid for processing time at the High-Performance Computing (HPC) Center at Caltech.
With funding for the project nailed down, Abu-Mostafa next connected with public health officials, starting with the Pasadena Department of Public Health, to learn which types of predictions would actually be useful to the decision-makers coordinating the nation’s response to COVID-19.
It turns out that mortality predictions are useful to health officials only about three to four days ahead of time, Abu-Mostafa says. What public health officials really need, he learned, are forecasts of the infection rate up to a month in advance. Such forecasts can serve as a barometer of the risk that the disease will get out of control in an area. “They also told us that they needed ‘what-if’ tools that would allow them to predict the efficacy of possible interventions, like shutting down restaurants or mandating face coverings,” Abu-Mostafa says.
Knowing the toll that COVID-19 continues to take on the world, the team has kept up a furious pace of work, with significant results. They regularly compare predictions made by their model against predictions made by other popular models.
“There are about a dozen models in use that we measure ourselves against. At the start of the summer, we already had a model that could compete with the middle-of-the-pack ones. But we need to be better. If we release just one more model, nobody is really going to benefit,” Yurk says.
Abu-Mostafa expects that the team’s AI COVID-19 prediction tool will be completed and ready for testing by the end of August.
“By taking on this challenge, we have thrown our hat into a crowded ring. There are many other teams around the world that are working toward the same goal. However, we can sincerely say: may the best team win. After all, we are more interested in winning the war than in winning a competition. When COVID-19 is defeated, we will all have reason to celebrate,” Abu-Mostafa says.
The COVID-19 model can be found at https://cs156.caltech.edu.