Joint research led by Sosuke Ito of the University of Tokyo has shown that nonequilibrium thermodynamics, a branch of physics that deals with constantly changing systems, explains why optimal transport theory, a mathematical framework for transforming one distribution into another at minimal cost, makes generative models optimal. As nonequilibrium thermodynamics has yet to be fully leveraged in designing generative models, the discovery offers a novel thermodynamic approach to machine learning research. The findings were published in the journal Physical Review X.
Image generation has been improving by leaps and bounds in recent years: a video of a celebrity eating a bowl of spaghetti that represented the state of the art a couple of years ago would not even qualify as good today. The algorithms that power image generation are called diffusion models, and they rely on randomness known as "noise". During training, noise is introduced into the original data through diffusion dynamics. During generation, the model must eliminate the noise to produce new content from the noisy data. This is achieved by considering the time-reversed dynamics, as if playing a video in reverse. Part of the art and science of building a model that produces high-quality content is specifying when and how much noise is added to the data.
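The noising step described above can be illustrated with a minimal sketch. It assumes the widely used variance-preserving form of forward diffusion with a linear noise schedule. The function names, the schedule endpoints and the toy data are illustrative choices, not details taken from the study.

```python
import numpy as np

def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Fraction of signal variance left after each step of a linear schedule.

    The endpoint values are common illustrative defaults (an assumption).
    """
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, alpha_bar_t, rng):
    """Mix clean data x0 with Gaussian noise at signal level alpha_bar_t."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

rng = np.random.default_rng(0)
x0 = np.ones(10_000)                # toy "data": every sample equals 1
alpha_bar = linear_alpha_bar(1000)  # the noise schedule

# Early steps keep most of the signal; late steps are almost pure noise.
for t in (0, 499, 999):
    xt = add_noise(x0, alpha_bar[t], rng)
    print(t, float(alpha_bar[t]), float(xt.mean()), float(xt.std()))
```

Changing how `alpha_bar` decays over the steps changes when and how much noise is added, which is the design choice the researchers analyze.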
"The selection of diffusion dynamics, also known as a noise schedule, has been controversial in diffusion models since their inception," says Ito, the principal investigator. "Optimal transport dynamics has been empirically shown to be useful in diffusion models, but it has not been theoretically demonstrated why it would be so."
Although diffusion models were originally inspired by nonequilibrium thermodynamics, and optimal transport theory is closely related to the area, previous studies have overlooked this connection. Thus, the question arose: could nonequilibrium thermodynamics provide a theoretical framework for why optimal transport dynamics works so well in diffusion models? A recent advancement in thermodynamic trade-off relations, which describe the relationship between thermodynamic dissipation and the speed of changes in a system, proved especially helpful. Using these relations, the researchers derived inequalities between thermodynamic dissipation and the robustness of data generation in diffusion models. They then used the newly derived inequalities to show that optimal transport dynamics ensures the most robust data generation.
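For orientation, one standard trade-off relation of this kind from the stochastic thermodynamics literature is sketched below. It rests on assumptions not stated in the article: overdamped Langevin dynamics with mobility \mu and temperature T, the Boltzmann constant set to 1, and symbols chosen here for illustration. It is not the new inequality derived in the paper, which relates dissipation to the robustness of data generation.

```latex
% \Sigma_\tau : total entropy production (dissipation) over the duration \tau
% \mathcal{W}_2(p_0, p_\tau) : 2-Wasserstein distance between the initial and final distributions
\Sigma_\tau \;\ge\; \frac{\mathcal{W}_2(p_0, p_\tau)^2}{\mu T \, \tau}
```

Under these assumptions, the bound says that changing a distribution quickly (small \tau) or moving it far (large \mathcal{W}_2) costs more dissipation, and equality is reached when the distribution follows the optimal transport path at constant speed.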
"One surprising result is that our bound is tight within a certain order of magnitude for real-world image generation scenarios," explains Ito. "This shows that our inequalities are useful not only for understanding the optimal protocol in diffusion models, but also for analyzing the practical application of generating image data."
The project has another surprising aspect, as Ito elaborates.
"The first and second authors of the paper are undergraduate students, and this research was partially conducted as part of a class they were enrolled in. In particular, the first author, Kotaro Ikeda, contributed greatly to this study, from numerical calculations to theoretical analysis. We hope our results raise awareness of the importance of nonequilibrium thermodynamics in the machine learning community, and we, including the next generation, continue to explore its usefulness in understanding biological and artificial information processing."