AI Breakthrough: Time-Lapse Videos Power Metamorphic Text-to-Video

University of Rochester

Using time-lapse videos as training data, computer scientists have developed video generators that simulate the physical world more accurately.

While text-to-video artificial intelligence models like OpenAI's Sora are rapidly metamorphosing before our eyes, they have struggled to produce metamorphic videos. Simulating a tree sprouting or a flower blooming is harder for AI systems than generating other types of video because it requires knowledge of the physical world and because such transformations can vary widely.

But now, these models have taken an evolutionary step.

Computer scientists at the University of Rochester, Peking University, University of California, Santa Cruz, and National University of Singapore developed a new AI text-to-video model that learns real-world physics knowledge from time-lapse videos. The team outlines their model, MagicTime, in a paper published in IEEE Transactions on Pattern Analysis and Machine Intelligence.

"Artificial intelligence has been developed to try to understand the real world and to simulate the activities and events that take place," says Jinfa Huang, a PhD student supervised by Professor Jiebo Luo from Rochester's Department of Computer Science, both of whom are among the paper's authors. "MagicTime is a step toward AI that can better simulate the physical, chemical, biological, or social properties of the world around us."

Videos generated by previous models typically exhibit limited motion and poor variation. To train AI models to mimic metamorphic processes more effectively, the researchers built a high-quality dataset of more than 2,000 time-lapse videos with detailed captions.

Currently, the open-source U-Net version of MagicTime generates two-second, 512-by-512-pixel clips at 8 frames per second, and an accompanying diffusion-transformer architecture extends this to ten-second clips. The model can be used to simulate not only biological metamorphosis but also buildings undergoing construction or bread baking in the oven.
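For readers curious about what generating such a clip looks like in practice, the sketch below shows how a time-lapse prompt might be run through a generic diffusers-style text-to-video pipeline in Python. The model identifier, prompt, and parameters are illustrative placeholders, not MagicTime's actual interface; the two-second, 8-frames-per-second output described above corresponds to 16 frames.

```python
# Minimal sketch of generating a short metamorphic time-lapse clip with a
# diffusers-style text-to-video pipeline. "your-org/magictime-like-model" is a
# hypothetical placeholder, not the published MagicTime checkpoint, and the
# actual MagicTime code may expose a different interface.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "your-org/magictime-like-model",  # placeholder model id (assumption)
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# A two-second clip at 8 frames per second corresponds to 16 frames,
# matching the open-source U-Net version described in the article.
prompt = "Time-lapse of a flower blooming from bud to full blossom"
frames = pipe(prompt, num_frames=16, num_inference_steps=25).frames[0]

export_to_video(frames, "blooming_flower.mp4", fps=8)
```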

But while the videos generated are visually interesting and the demo can be fun to play with, the researchers view this as an important step toward more sophisticated models that could provide important tools for scientists.

"Our hope is that someday, for example, biologists could use generative video to speed up preliminary exploration of ideas," says Huang. "While physical experiments remain indispensable for final verification, accurate simulations can shorten iteration cycles and reduce the number of live trials needed."
