AI Tackles Weather: Can It Predict Extreme Events?

University of Chicago

Increasingly powerful AI models can make short-term weather forecasts with surprising accuracy. But neural networks only predict based on patterns from the past—what happens when the weather does something that's unprecedented in recorded history? A new study led by scientists from the University of Chicago, in collaboration with New York University and the University of California Santa Cruz, is testing the limits of AI-powered weather prediction. In research published May 21 in Proceedings of the National Academy of Sciences, they found that neural networks cannot forecast weather events beyond the scope of existing training data—which might leave out events like 200-year floods, unprecedented heat waves or massive hurricanes.

This limitation is particularly important as researchers incorporate neural networks into operational weather forecasting, early warning systems, and long-term risk assessments, the authors said. But they also said there are ways to address the problem by integrating more math and physics into the AI tools.

"AI weather models are one of the biggest achievements in AI in science. What we found is that they are remarkable, but not magical," said Pedram Hassanzadeh, an associate professor of geophysical sciences at UChicago and a corresponding author on the study. "We've only had these models for a few years, so there's a lot of room for innovation."

Gray swan events

Weather forecasting AIs work in a similar way to other neural networks that many people now interact with, such as ChatGPT.

Essentially, the model is "trained" by feeding it a large amount of text or images and asking it to look for patterns. Then, when a user presents the model with a question, it looks back at what it has previously seen and uses that to predict an answer.

In the case of weather forecasts, scientists train neural networks by feeding them decades' worth of weather data. Then a user can input data about the current weather conditions and ask the model to predict the weather for the next several days.

The AI models are very good at this. Generally, they can achieve the same accuracy as a top-of-the-line, supercomputer-based weather model that uses 10,000 to 100,000 times more time and energy, Hassanzadeh said.

"These models do really, really well for day-to-day weather," he said. "But what if next week there's a freak weather event?"

The concern is that the neural network is only working off the weather data we currently have, which goes back about 40 years. But that's not the full range of possible weather.

"The floods caused by Hurricane Harvey in 2017 were considered a once-in-a-2,000-year event, for example," Hassanzadeh said. "They can happen."

Scientists sometimes refer to these events as "gray swan" events. They're not quite all the way to a black swan event—something like the asteroid that killed the dinosaurs—but they are locally devastating.

The team decided to test the limits of the AI models using hurricanes as an example. They trained a neural network using decades of weather data, but removed all the hurricanes stronger than a Category 2. Then they fed it an atmospheric condition that leads to a Category 5 hurricane in a few days. Could the model extrapolate to predict the strength of the hurricane?

The answer was no.

"It always underestimated the event. The model knows something is coming, but it always predicts it'll only be a Category 2 hurricane," said Yongqiang Sun, research scientist at UChicago and the other corresponding author on the study.

This kind of error, known as a false negative, is a big deal in weather forecasting. If a forecast tells you a storm will be a Category 5 hurricane and it only turns out to be a Category 2, that means people evacuated who may not have needed to, which is not ideal. But if a forecast underestimates a hurricane that turns out to be a Category 5, the consequences would be far worse.
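The ceiling effect described in the hurricane experiment can be illustrated with a toy model. The sketch below is not the study's neural network or data: it assumes a simple nearest-neighbor averaging model (the hypothetical `knn_predict`) and an invented one-dimensional "precursor signal" that maps directly to hurricane category. Because such a model only averages outcomes it has already seen, its forecast can never exceed the strongest storm left in its training set.

```python
# Toy illustration only: a k-nearest-neighbor regressor, NOT the study's model.
def knn_predict(train, x, k=3):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Invented data: a precursor signal (arbitrary units) paired with the
# storm intensity that followed (hurricane category, 0 to 5).
full_record = [(s / 10, min(5.0, s / 10)) for s in range(0, 51)]

# Mimic the experiment's setup: drop every storm stronger than Category 2.
censored = [(x, y) for x, y in full_record if y <= 2.0]

strong_precursor = 5.0  # conditions that led to Category 5 in the full record
prediction = knn_predict(censored, strong_precursor)
ceiling = max(y for _, y in censored)

print(f"strongest storm seen in training: {ceiling:.1f}")
print(f"predicted intensity from censored data: {prediction:.1f}")
print(f"predicted intensity from full record: {knn_predict(full_record, strong_precursor):.1f}")
```

In this sketch the underestimate is guaranteed by construction. Neural networks are not bounded in the same way by design, which is why the question had to be tested with a real experiment, as the researchers did.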

Hurricane warnings and why physics matters

The big difference between neural networks and traditional weather models is that traditional models "understand" physics. Scientists design them to incorporate our understanding of the math and physics that govern atmospheric dynamics, jet streams and other phenomena.

The neural networks aren't doing any of that. Like ChatGPT, which is essentially a predictive text machine, they simply look at weather patterns and suggest what comes next, based on what has happened in the past.

No major service is currently using only AI models for forecasting. But as their use expands, this tendency to underestimate unprecedented events will need to be factored in, Hassanzadeh said.

Researchers, from meteorologists to economists, are beginning to use AI for long-term risk assessments. For example, they might ask an AI to generate many examples of weather patterns, so that we can see the most extreme events that might happen in each region in the future. But if an AI cannot predict anything stronger than what it's seen before, its usefulness would be limited for this critical task.

However, the study's authors found the model could predict stronger hurricanes if there was any precedent, even elsewhere in the world, in its training data. For example, if the researchers deleted all the evidence of Atlantic hurricanes but left in Pacific hurricanes, the model could extrapolate to predict Atlantic hurricanes.

"This was a surprising and encouraging finding: it means that the models can forecast an event that was unprecedented in one region but occurred once in a while in another region," Hassanzadeh said.

Merging approaches

The solution, the researchers suggested, is to begin incorporating mathematical tools and the principles of atmospheric physics into AI-based models.

"The hope is that if AI models can really learn atmospheric dynamics, they will be able to figure out how to forecast gray swans," Hassanzadeh said.

How to do this is a hot area of research. One promising approach the team is pursuing is called active learning—where AI helps guide traditional physics-based weather models to create more examples of extreme events, which can then be used to improve the AI's training.

"Longer simulated or observed datasets aren't going to work. We need to think about smarter ways to generate data," said Jonathan Weare, professor at the Courant Institute of Mathematical Sciences at New York University and study co-author. "In this case, that means answering the question 'where should I place my training data to achieve better performance on extremes?' Fortunately, we think AI weather models themselves, when paired with the right mathematical tools, can help answer this question."

University of Chicago Prof. Dorian Abbot and computational scientist Mohsen Zand were also co-authors on the study, as well as Ashesh Chattopadhyay of the University of California Santa Cruz.

The study used resources maintained by the University of Chicago Research Computing Center. A video explaining the findings is also available.

Citation: "Can AI weather models predict out-of-distribution gray swan tropical cyclones?" Sun et al., Proceedings of the National Academy of Sciences, May 21, 2025.

Funding: Office of Naval Research, Army Research Office, National Science Foundation.
