A new AI model created by researchers at Brown can generate motion in all kinds of robots and animated figures - humanoids, quadrupeds and more - from simple text commands.
PROVIDENCE, R.I. [Brown University] - Brown University researchers have developed an artificial intelligence model that can generate movement in robots and animated figures in much the same way that AI models like ChatGPT generate text.
The model, called MotionGlot, enables users to simply type an action - "walk forward a few steps and take a right" - and the model can generate accurate representations of that motion to command a robot or animated avatar.
The model's key advance, according to the researchers, is its ability to "translate" motion across robot and figure types, from humanoids to quadrupeds and beyond. That enables the generation of motion for a wide range of robotic embodiments and in all kinds of spatial configurations and contexts.
"We're treating motion as simply another language," said Sudarshan Harithas, a Ph.D. student in computer science at Brown, who led the work. "And just as we can translate languages - from English to Chinese, for example - we can now translate language-based commands to corresponding actions across multiple embodiments. That enables a broad set of new applications."
The research, which was supported by the Office of Naval Research, will be presented later this month at the 2025 International Conference on Robotics and Automation in Atlanta. The work was co-authored by Harithas and his advisor, Srinath Sridhar, an assistant professor of computer science at Brown.
Large language models like ChatGPT generate text through a process called "next token prediction," which breaks language down into a series of tokens, or small chunks, like individual words or characters. Given a single token or a string of tokens, the language model makes a prediction about what the next token might be. These models have been incredibly successful in generating text, and researchers have begun using similar approaches for motion. The idea is to break down the components of motion - the discrete position of legs during the process of walking, for example - into tokens. Once the motion is tokenized, fluid movements can be generated through next token prediction.
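To make the analogy concrete, here is a minimal sketch - not MotionGlot's actual code - of the two ideas in that paragraph: discretizing continuous motion into tokens, then extending a motion sequence through next-token prediction. The uniform binning scheme, vocabulary size and toy bigram predictor are illustrative assumptions.

```python
# Illustrative sketch only: tokenize motion, then generate via next-token prediction.
import numpy as np

NUM_BINS = 64  # assumed size of the motion-token vocabulary

def tokenize(joint_angles, low=-np.pi, high=np.pi):
    """Map continuous joint angles (radians) to discrete token ids via uniform binning."""
    bins = np.linspace(low, high, NUM_BINS + 1)
    return np.clip(np.digitize(joint_angles, bins) - 1, 0, NUM_BINS - 1)

def detokenize(tokens, low=-np.pi, high=np.pi):
    """Map token ids back to bin-center joint angles."""
    centers = np.linspace(low, high, NUM_BINS + 1)[:-1] + (high - low) / (2 * NUM_BINS)
    return centers[tokens]

def next_token_probs(context, counts):
    """Toy bigram predictor: P(next token | last token) from co-occurrence counts."""
    row = counts[context[-1]] + 1e-6  # smooth to avoid zero probabilities
    return row / row.sum()

# "Train" the toy predictor on a tokenized walking cycle (a sine wave of hip angle).
walk = 0.5 * np.sin(np.linspace(0, 4 * np.pi, 200))
tokens = tokenize(walk)
counts = np.zeros((NUM_BINS, NUM_BINS))
for a, b in zip(tokens[:-1], tokens[1:]):
    counts[a, b] += 1

# Generate new motion one token at a time (next-token prediction).
rng = np.random.default_rng(0)
generated = [tokens[0]]
for _ in range(100):
    generated.append(rng.choice(NUM_BINS, p=next_token_probs(generated, counts)))

print(detokenize(np.array(generated))[:10])  # first few generated joint angles
```

A real system would use a learned vector-quantized codebook and a transformer rather than uniform bins and a bigram table, but the generation loop - predict a token, append it, repeat - is the same shape.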
One challenge with this approach is that motions for one body type can look very different for another. For example, when a person is walking a dog down the street, the person and the dog are both doing something called "walking," but their actual motions are very different. One is upright on two legs; the other is on all fours. According to Harithas, MotionGlot can translate the meaning of walking from one embodiment to another. So a user commanding a figure to "walk forward in a straight line" will get the correct motion output whether they happen to be commanding a humanoid figure or a robot dog.
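One simple way to picture that kind of embodiment translation is to condition generation on a special embodiment token alongside the text command, so the same instruction yields body-appropriate motion. The prompt format, special tokens and canned token streams below are hypothetical - a sketch of the general idea, not the paper's actual interface.

```python
# Hypothetical sketch: one instruction, conditioned on the target embodiment.
EMBODIMENT_TOKENS = {"humanoid": "<human>", "quadruped": "<quad>"}

def build_prompt(instruction: str, embodiment: str) -> str:
    """Prefix the text instruction with an embodiment token, so a single model can
    'translate' the same command into body-appropriate motion tokens."""
    return f"{EMBODIMENT_TOKENS[embodiment]} {instruction}"

def generate_motion_tokens(prompt: str) -> list[str]:
    """Stand-in for the trained model: returns a canned token stream per embodiment."""
    if prompt.startswith("<human>"):
        return ["left_step", "right_step", "left_step"]   # bipedal gait tokens
    return ["FL_FR_step", "RL_RR_step", "FL_FR_step"]     # trotting gait tokens

for body in ("humanoid", "quadruped"):
    prompt = build_prompt("walk forward in a straight line", body)
    print(body, generate_motion_tokens(prompt))
```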
To train their model, the researchers used two datasets, each containing hours of annotated motion data. QUAD-LOCO features dog-like quadruped robots performing a variety of actions along with rich text describing those movements. A similar dataset called QUES-CAP contains real human movement, along with detailed captions and annotations appropriate to each movement.
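For a rough sense of what paired motion-and-language training data looks like, the sketch below defines a hypothetical record with a caption, an embodiment label and a sequence of joint-angle frames. The field names and shapes are assumptions for illustration, not the published schema of QUAD-LOCO or QUES-CAP.

```python
# Hypothetical example of one annotated motion clip paired with a text caption.
from dataclasses import dataclass
import numpy as np

@dataclass
class MotionClip:
    caption: str        # text annotation describing the movement
    embodiment: str     # e.g. "quadruped" (QUAD-LOCO-style) or "human" (QUES-CAP-style)
    frames: np.ndarray  # (num_frames, num_joints) joint angles in radians
    fps: float          # capture rate of the motion sequence

example = MotionClip(
    caption="the robot trots forward and turns to the right",
    embodiment="quadruped",
    frames=np.zeros((120, 12)),  # e.g. 4 seconds at 30 fps, 12 joints
    fps=30.0,
)
print(example.caption, example.frames.shape)
```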
Using that training data, the model reliably generates appropriate actions from text prompts, even actions it has never specifically seen before. In testing, the model was able to recreate specific instructions, like "a robot walks backwards, turns left and walks forward," as well as more abstract prompts like "a robot walks happily." It can even use motion to answer questions. When asked "Can you show me movement in cardio activity?" the model generates a person jogging.
"These models work best when they're trained on lots and lots of data," Sridhar said. "If we could collect large-scale data, the model can be easily scaled up."
The model's current functionality and its adaptability across embodiments make for promising applications in human-robot collaboration, gaming and virtual reality, and digital animation and video production, the researchers say. They plan to make the model and its source code publicly available so other researchers can use it and expand on it.
The research was supported by Office of Naval Research (ONR) grant N00014-22-1-259.