KAIST Creates AI to Decode Animal Behavior

Korea Advanced Institute of Science and Technology

< Professor Dae-Soo Kim >

Researchers at KAIST have developed an artificial intelligence model that reads and interprets animal behavior as if it were a language. The model, BehaVERT, learns from behavioral data much as language models learn from text, and it independently identified social behavioral deficits in an autism mouse model, opening a new avenue for interpretable neuroscience.

KAIST (President Kwang-Hyung Lee) announced that a research team led by Professor Dae-Soo Kim from the Department of Brain and Cognitive Sciences has developed an AI model that interprets animal movements as a form of behavioral language.

The researchers transformed skeletal movements of mice into tokens, analogous to words in natural language, and trained a transformer-based model to learn behavioral meaning. The resulting model, named BehaVERT, successfully identified core social behavioral abnormalities in an autism mouse model without being provided any prior biological knowledge.

The study introduces a novel AI framework for analyzing animal behavior through language-based representations. Beyond simple behavior classification, the model can uncover biologically meaningful patterns and may serve as a basis for next-generation behavioral foundation models applicable to drug discovery, psychiatric research, and behavioral genetics.

< Figure 1. Overview of the BehaVERT pipeline.
This figure shows how BehaVERT analyzes animal behavior from video. First, skeletal keypoints and behaviors are labeled using a web-based tool. The skeletal coordinates from each video frame are then converted into 768-dimensional "tokens" and entered into a BERT-based transformer model. The model can classify both the behavior in each individual frame and the overall state of the full sequence. The tokens from the final layer are further used for unsupervised clustering and attention analysis, allowing researchers to visualize, along the timeline, which behaviors the model focused on when making its decision. >

Inspired by the idea that animal behavior may possess structures similar to language, the researchers represented the positions of a mouse's nose, ears, spine, limbs, and tail as behavioral tokens and trained a BERT-based transformer architecture.
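
The tokenization step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the keypoint names, the spine-centering step, and the random projection weights (standing in for a learned embedding layer) are all assumptions made for illustration. Only the 768-dimensional token width comes from the Figure 1 caption.

```python
import random

# Hypothetical keypoint names; the article lists nose, ears, spine, limbs, and tail.
KEYPOINTS = ["nose", "left_ear", "right_ear", "spine", "left_forelimb",
             "right_forelimb", "left_hindlimb", "right_hindlimb", "tail_base"]
TOKEN_DIM = 768  # token width stated in the Figure 1 caption


def make_projection(n_inputs, n_outputs, seed=0):
    """Random weights standing in for a learned linear embedding layer."""
    rng = random.Random(seed)
    scale = n_inputs ** -0.5
    return [[rng.gauss(0.0, scale) for _ in range(n_inputs)]
            for _ in range(n_outputs)]


def frame_to_token(frame, weights):
    """Turn one frame of (x, y) keypoints into a single token vector."""
    # Center on the spine so the token ignores arena position (an assumption).
    cx, cy = frame["spine"]
    flat = []
    for name in KEYPOINTS:
        x, y = frame[name]
        flat.extend([x - cx, y - cy])
    # One dot product per output dimension: a plain linear projection.
    return [sum(w * v for w, v in zip(row, flat)) for row in weights]


def video_to_tokens(frames, weights):
    """Convert a list of frames into a sequence of tokens, one per frame."""
    return [frame_to_token(frame, weights) for frame in frames]


# Four made-up frames, just to show the shapes involved.
weights = make_projection(2 * len(KEYPOINTS), TOKEN_DIM)
frames = [{name: (float(i + t), float(2 * i - t))
           for i, name in enumerate(KEYPOINTS)} for t in range(4)]
tokens = video_to_tokens(frames, weights)
print(len(tokens), len(tokens[0]))  # 4 768
```

In a trained model the projection weights would be learned together with the transformer, and the resulting token sequence would be fed into the BERT-style encoder in place of word embeddings.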

As a result, BehaVERT learned not only to classify behaviors but also to understand their contextual meaning over time, much like language models infer meaning from sequences of words.

The model achieved state-of-the-art performance across five international benchmark datasets covering social interaction, multi-animal behavior, three-dimensional motion analysis, and autism-related behavioral assessment.

Importantly, BehaVERT also provides interpretability, allowing researchers to visualize which behavioral cues influenced its decisions.
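
As a rough illustration of what such an attention readout involves, the sketch below computes scaled dot-product attention from one sequence-level query over a handful of frame tokens and ranks the frames by weight. The token values and the query are made up, and a real transformer has many heads and layers; how the authors aggregated attention is not stated in this article.

```python
import math


def attention_weights(query, keys):
    """Scaled dot-product attention of one query over a list of key vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_frames(weights, k=3):
    """Indices of the k frames with the largest attention weight, highest first."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return order[:k]


# Toy 4-dimensional frame tokens (made-up numbers, not real data).
frame_tokens = [
    [0.1, 0.0, 0.2, 0.1],
    [0.0, 0.1, 0.1, 0.0],
    [2.0, 1.5, 1.8, 2.2],  # a frame that aligns strongly with the query
    [0.2, 0.1, 0.0, 0.1],
]
summary_query = [1.0, 1.0, 1.0, 1.0]  # stand-in for a sequence-level query vector

weights = attention_weights(summary_query, frame_tokens)
print(top_frames(weights, k=1))  # [2]
```

Plotting the weights against frame index gives the kind of timeline view the Figure 1 caption describes, with peaks marking the frames the model relied on most.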

In experiments distinguishing Shank3B knockout autism-model mice from healthy controls, the AI consistently focused on oral-oral contact behavior. This finding aligns with previous biological studies showing that autism-model mice exhibit deficits in social interaction despite maintaining normal approach behavior.

In other words, the AI independently rediscovered a key biological characteristic solely from behavioral observations, without explicit biological instruction.

< Figure 2. Discovery of a "compositional semantic structure" in animal behavior.
When the model's learned internal representation space was visualized using principal component analysis (PCA), different behaviors were geometrically organized along interpretable axes such as "mobility," "vertical attention," and "social engagement." The relationships between behaviors were expressed as consistent vector transformations, much like the semantic relationships found in word embeddings. Notably, the same compositional structure emerged even when the model was trained only through self-supervised learning, using masked keypoint modeling without any behavior labels. This suggests that animal movement itself contains a language-like semantic structure. >

The researchers further found that the model's internal representation space organized behavioral features such as mobility, attention, and social engagement into structured patterns. This suggests that animal behavior, much like language, may possess an underlying semantic structure.
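
To make the idea of a "consistent vector transformation" concrete, the toy example below checks whether the same behavioral change produces the same offset in two different contexts. The three-dimensional embeddings and behavior names are invented and constructed so that the property holds by design; they are not values from the study.

```python
import math


def subtract(a, b):
    """Element-wise difference of two vectors."""
    return [x - y for x, y in zip(a, b)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Invented embeddings; the axes loosely stand for mobility,
# vertical attention, and social engagement.
embeddings = {
    "still_alone":   [0.0, 0.0, 0.0],
    "moving_alone":  [1.0, 0.0, 0.0],
    "still_social":  [0.0, 0.0, 1.0],
    "moving_social": [1.0, 0.0, 1.0],
}

# The "start moving" offset measured in two different social contexts.
offset_alone = subtract(embeddings["moving_alone"], embeddings["still_alone"])
offset_social = subtract(embeddings["moving_social"], embeddings["still_social"])
print(cosine(offset_alone, offset_social))  # 1.0
```

With real model embeddings the similarity would fall somewhat below 1.0; a value close to 1.0 across many behavior pairs is what a compositional structure would look like in practice.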

The study also highlights an unusual interdisciplinary achievement. The first author, Dr. Seungjae Shin, and other members of the research team were trained primarily in biology rather than artificial intelligence. By independently learning transformer architectures and deep learning techniques, they designed specialized models and training strategies tailored for behavioral analysis.

< Figure 3. AI independently discovers a key social behavior deficit in autism model mice.
When analyzing how the model distinguished autism model mice carrying a Shank3B gene deletion from normal mice, the attention analysis revealed a clear pattern. The model first identified periods when two mice were physically close to each other, then focused specifically on a rare reciprocal social behavior: oral-oral contact, in which the two mice bring their mouths into contact. This finding matches established behavioral neuroscience evidence that autism model mice show deficits in reciprocal social interaction. Importantly, the AI rediscovered this core behavioral phenotype using only behavioral data, without any prior biological knowledge. >

Professor Kim's laboratory has long pursued AI-driven behavioral analysis and previously developed AVATAR, a technology that reconstructs rodent behavior in virtual environments, leading to the founding of Actnova Inc.

"The project began with a simple question: Could animal movements contain a structure similar to language?" said Dr. Seungjae Shin, the first author of the study.

The team also adopted a self-supervised learning framework that enables AI to learn directly from behavioral data without manual annotations. Furthermore, a model trained on rat behavior successfully transferred to mouse behavior analysis, demonstrating the feasibility of a behavioral foundation model applicable across species.
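
The Figure 2 caption names masked keypoint modeling as the self-supervised objective. A minimal sketch of that idea follows: hide a random subset of keypoints, then score a reconstruction only on the hidden positions. The 15% mask ratio (borrowed from the original BERT recipe), the zero placeholder, and the squared-error loss are assumptions, not details confirmed by the article.

```python
import random

MASK_VALUE = (0.0, 0.0)  # hypothetical placeholder for a hidden keypoint


def mask_keypoints(sequence, mask_ratio=0.15, seed=0):
    """Hide a random subset of keypoints; return the masked copy and hidden positions."""
    rng = random.Random(seed)
    positions = [(t, j) for t, frame in enumerate(sequence)
                 for j in range(len(frame))]
    n_masked = max(1, round(mask_ratio * len(positions)))
    hidden = set(rng.sample(positions, n_masked))
    masked = [[MASK_VALUE if (t, j) in hidden else point
               for j, point in enumerate(frame)]
              for t, frame in enumerate(sequence)]
    return masked, hidden


def reconstruction_loss(predicted, original, hidden):
    """Mean squared error computed only over the hidden keypoints."""
    total = 0.0
    for t, j in hidden:
        px, py = predicted[t][j]
        ox, oy = original[t][j]
        total += (px - ox) ** 2 + (py - oy) ** 2
    return total / len(hidden)


# Eight made-up frames with five (x, y) keypoints each.
sequence = [[(float(t + j), float(t - j)) for j in range(5)] for t in range(8)]
masked, hidden = mask_keypoints(sequence)
print(len(hidden))  # 6
print(reconstruction_loss(sequence, sequence, hidden))  # 0.0
```

During training, the model would receive the masked sequence and be optimized to lower this loss, which requires no behavior labels because the original coordinates serve as the target.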

"BehaVERT goes beyond behavior classification and enables the interpretation of behavioral meaning," said Professor Dae-Soo Kim. "We expect it to become a key research tool for discovering new insights in drug development, psychiatric disorders, behavioral genetics, and many other areas of life sciences."

The study was published on March 24, 2026, in the International Journal of Computer Vision (IJCV), one of the world's leading journals in computer vision.

Paper Information

• Title: BehaVERT: A Transformer-Based Motion Language Model for Decoding Behavioral Semantics in Mice

• Journal: International Journal of Computer Vision (IJCV)

• DOI: 10.1007/s11263-026-02834-y
