By transforming a kitchen into a fully instrumented research environment, a team led by EPFL neuroscientist Alexander Mathis opens a new window onto the fine-grained mechanics of human movement.
Few environments reveal the mechanics of human movement as clearly as a kitchen. Pots, pans, countertops, ovens, and fridges form the stage of a new project led by Alexander Mathis, assistant professor at EPFL's Brain Mind and Neuro-X Institutes. In collaboration with colleagues from EPFL, ETH Zurich, and the Microsoft Joint Swiss Research Center, the computational neuroscientist is introducing the EPFL-Smart-Kitchen-30 dataset, a uniquely comprehensive, multi-angle recording of meal-preparation gestures. The project, to be presented in early December at NeurIPS in San Diego, lays the groundwork for better monitoring of how neurorehabilitation translates into daily life and for more effective strategies for motor rehabilitation and assistance, building on the work of EPFL researchers such as Friedhelm Hummel and Solaiman Shokur.
The project set out to follow, in a non-invasive way, how people perform everyday actions in situations that come as close as possible to real life. By modeling both the motor and cognitive components of these gestures, the researchers aim to better understand how movement, coordination, and action planning are structured. Potential applications are wide-ranging, from basic and translational neuroscience to machine learning and medicine.
Why the kitchen? "The first reason is a matter of privacy," explains Alexander Mathis. "Of all the rooms in a home, the kitchen raises the fewest concerns." The second reason is more scientific. "While cooking, people perform an enormous variety of movements: walking, standing on tiptoe, opening cupboards, handling knives, pots, wrappers… We get to observe eye-hand coordination, planning (all ingredients need to be ready at the right time) and even expressions of people's personal style. It truly mobilizes the entire body and brain."
To turn this intuition into data, the team built a fully instrumented kitchen at Campus Biotech. A project "that's been on the cooker for a while", as Alexander Mathis likes to joke, EPFL-Smart-Kitchen-30 relied on a unique motion capture platform: nine fixed RGB-D cameras positioned around the room so that the participants' hands were always visible from several angles; a HoloLens 2 headset recording from a first-person perspective while also tracking gaze; inertial measurement units capturing body and hand movements. "Some elements of the kitchen itself were also instrumented," the researcher details. "We placed an accelerometer on the fridge door, which allowed us to measure how fast it opened and how smooth or hesitant the gesture was."
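To give a sense of how such a sensor trace can be turned into numbers, here is a minimal, purely illustrative Python sketch. The sampling rate, the synthetic data, and the metric choices (peak speed from integrated acceleration, mean squared jerk as a smoothness proxy) are assumptions for illustration, not the project's actual processing pipeline.

```python
import numpy as np

def door_opening_metrics(accel, fs=100.0):
    """Estimate how fast and how smoothly a door was opened from a
    single-axis accelerometer trace (m/s^2) sampled at fs Hz.

    Illustrative sketch only: integrate acceleration to get velocity,
    and use mean squared jerk as a simple proxy for hesitancy.
    """
    accel = np.asarray(accel, dtype=float)
    accel = accel - accel.mean()            # crude offset/gravity removal
    dt = 1.0 / fs
    velocity = np.cumsum(accel) * dt        # integrate to velocity (m/s)
    jerk = np.gradient(accel, dt)           # rate of change of acceleration
    return {
        "peak_speed": float(np.max(np.abs(velocity))),
        "mean_squared_jerk": float(np.mean(jerk ** 2)),  # higher = more hesitant
    }

# Synthetic example: a single smooth pull lasting about 1.5 seconds.
t = np.linspace(0, 1.5, 150)
synthetic = np.sin(2 * np.pi * t / 1.5) * 0.8
print(door_opening_metrics(synthetic, fs=100.0))
```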
Omelet, ratatouille, pad thai…
Altogether, the dataset comprises almost 30 hours of recordings. All 16 participants, women and men aged 20 to 46, prepared four different recipes, each repeated several times. This enabled the team to observe how gestures evolve with practice. On the menu: an omelet with a salad, ratatouille, or pad thai. "Pad thai was a good choice", notes Alexander Mathis. "As it was a new dish for some participants, especially the older ones, it required some getting used to." Each dish combined simple actions with strict timing constraints: monitoring a pan while preparing a sauce, anticipating the next step, adapting to the unexpected.
One of the project's strengths lies in the precision of its annotations. Each session was analyzed by human annotators, who continuously described the participant's actions. Some 768 distinct types of actions were defined, ranging from very concrete gestures such as "grab eggplant," "grab knife," and "cut eggplant" to more general categories such as "prepare ingredients" or "clean countertop." The result is more than 30 action segments per minute.
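As an illustration of what such annotations can look like in practice, here is a minimal sketch with hypothetical field names and toy values (not the dataset's actual format): each action is a labeled time segment, from which a "segments per minute" figure can be derived.

```python
from dataclasses import dataclass

@dataclass
class ActionSegment:
    start_s: float   # segment start, in seconds from session onset
    end_s: float     # segment end, in seconds
    label: str       # annotated action type, e.g. "cut eggplant"

# Hypothetical excerpt of one session's annotations (toy values, not real records).
segments = [
    ActionSegment(12.0, 14.5, "grab eggplant"),
    ActionSegment(14.5, 15.8, "grab knife"),
    ActionSegment(15.8, 31.2, "cut eggplant"),
    ActionSegment(31.2, 40.0, "clean countertop"),
]

session_minutes = (segments[-1].end_s - segments[0].start_s) / 60.0
vocabulary = {s.label for s in segments}
print(f"{len(segments) / session_minutes:.1f} segments per minute, "
      f"{len(vocabulary)} distinct action types in this excerpt")
```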
This data fueled four major benchmarks designed to test the capabilities of artificial intelligence models: vision-language understanding, multimodal action recognition, pose-based segmentation, and text-to-motion generation. The latter links verbal instructions to 3D motion trajectories. Learning this connection between language and movement is essential if assistive systems or robots are to truly understand what they are being asked to do.
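To make the action-recognition figure quoted below concrete, here is a minimal sketch of the kind of metric involved: top-1 accuracy over annotated segments. The labels are invented for illustration; this is not the benchmark's actual evaluation code.

```python
from typing import Sequence

def top1_accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of segments whose predicted action label matches the annotation."""
    assert len(predicted) == len(reference)
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)

# Toy example, just to show the metric's shape.
reference = ["grab knife", "cut eggplant", "clean countertop", "grab eggplant", "cut eggplant"]
predicted = ["grab knife", "grab eggplant", "clean countertop", "grab eggplant", "prepare ingredients"]
print(f"top-1 accuracy: {top1_accuracy(predicted, reference):.0%}")  # -> 60%
```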
At present, results show the challenge is far from being met. "On the action-recognition task, the best current AI models reach about 40% accuracy," notes Alexander Mathis. In other words, they are still a long way from being able to analyze a cooking session with the level of reliability required for clinical applications. But the researcher remains confident: "I am certain that in a year or two, performance will be significantly better. AI is advancing very rapidly, and benchmarks like this one will help it reach new milestones."
Helping people regain their mobility
Behind these figures also lies a very concrete objective: helping people recover lost movement. Among the research partners, Friedhelm Hummel, head of the Research Chair in Clinical Neuroengineering and Human-Machine Interaction, focuses on the recovery of stroke patients and on personalized therapies. At the Translational Neural Engineering Lab, neuroengineer Solaiman Shokur works on interfaces designed to restore more natural movements after severe injuries. All provided input for the project.
"Today, when a patient is recovering from a stroke," explains Friedhelm Hummer, "they might be asked to lift their arm and be assigned scores based on their performance. But by observing how they cook, we can discover much more than that, especially in the view of daily life relevance. Are they avoiding certain gestures? Are they taking much longer for actions that should normally be simple? Are they achieving the goal of the cooked meal?"
In the long run, one of the aims is to be able to automatically link spontaneous behavior to existing clinical scores, or even to develop new ones. Such indicators could one day make it possible to monitor the progress of home-based rehabilitation, for example by analyzing how hour-long cooking sessions evolve over several weeks.
Beyond health-related issues, Alexander Mathis is also interested in what differentiates ordinary gestures from those of experts. "How do you cook like a chef? How do you play the guitar like an exceptional musician? Between patients undergoing recovery and experts, there is a vast continuum of motor control that we would like to describe." A second study, already in preparation, will involve a larger number of participants and will place a particular focus on expertise.