A research team has developed a Gaussian Splatting processing platform that supports end-to-end processing from data acquisition to multi-platform rendering. Their framework provides a solid foundation for the large-scale adoption and future research of Gaussian Splatting technology.
The research was published in the journal Visual Intelligence on March 27, 2026.
3D Gaussian Splatting is a sophisticated computer graphics technique that uses millions of tiny points, or "splats," to create highly realistic 3D scenes. Because of its exceptional rendering quality and real-time capabilities, 3D Gaussian Splatting offers strong support for applications such as virtual reality, augmented reality, and next-generation immersive media. In recent years, researchers have extended 3D Gaussian Splatting to dynamic scenes, proposing various 4D Gaussian Splatting representations.
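To make the storage scale concrete, here is an illustrative sketch (not the authors' code) of the parameters a single splat typically carries in common 3D Gaussian Splatting implementations: a position, per-axis scales and a rotation quaternion defining an ellipsoid, an opacity, and spherical-harmonic color coefficients. The exact layout and counts vary between codebases; these numbers are only indicative.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class GaussianSplat:
    """One splat: the parameters typically optimized in 3D Gaussian Splatting."""
    position: np.ndarray   # (3,) center of the Gaussian in world space
    scale: np.ndarray      # (3,) per-axis extent of the ellipsoid
    rotation: np.ndarray   # (4,) unit quaternion orienting the ellipsoid
    opacity: float         # scalar alpha used when compositing splats
    sh_coeffs: np.ndarray  # (16, 3) degree-3 spherical-harmonic color coefficients

def bytes_per_splat() -> int:
    # 3 position + 3 scale + 4 rotation + 1 opacity + 48 SH values, as float32
    return (3 + 3 + 4 + 1 + 16 * 3) * 4

print(bytes_per_splat())              # 236 bytes per splat
print(3_000_000 * bytes_per_splat())  # 708000000 bytes (~0.7 GB) for 3M splats
```

At millions of splats per frame, a dynamic sequence multiplies this footprint by the frame count, which is why compression is central to the platform described below.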
3D Gaussian Splatting is a key technology behind a revolutionary form of immersive media called volumetric video. Volumetric video is a 3D recording technique that uses multiple cameras to film a person or object and creates a digital 3D model that is viewable from any angle, rather than a flat image. "Volumetric video enables free-viewpoint exploration of immersive virtual environments, effectively narrowing the gap between digital and physical realities," said Professor Jingyi Yu from the School of Information Science and Technology, ShanghaiTech University.
However, volumetric video technology faces two major challenges: the massive storage and transmission overhead associated with temporal sequences, and a fragmented tool-chain ecosystem that hinders efficient research and development. These challenges make it difficult to scale the technology to dynamic scenes in practice.
For 3D Gaussian Splatting to find its way to practical use, the challenge of storage cost must be overcome. "The most pressing issue is the high storage cost, especially for dynamic scenes where the introduction of the temporal dimension dramatically increases the data volume, imposing greater demands on storage, transmission, and real-time interaction," said Dr. Lan Xu, also from the School of Information Science and Technology, ShanghaiTech University.
Existing solutions typically focus on isolated stages and lack a unified, end-to-end workflow from data acquisition to final viewing. Earlier studies have addressed the compression and optimization of Gaussian Splatting, but this work is scattered across different code bases and limited by inconsistent data formats and incompatible data loaders. These problems hinder the reproduction, comparison, and integration of the different methods.
"To address these challenges, we propose a comprehensive dynamic Gaussian processing framework that provides a complete, end-to-end pipeline. This framework systematically integrates the entire process, from data acquisition and standardized preprocessing to a suite of diverse dynamic Gaussian reconstruction algorithms," said Dr. Xu.
The team's platform incorporates a variety of mainstream and cutting-edge 3D Gaussian Splatting and 4D Gaussian Splatting reconstruction methods. The platform provides standardized data preprocessing interfaces and unified data loading mechanisms for these methods, and it includes a general compression framework that can be adapted to multiple representations.
One of the framework's core contributions is a general-purpose compression framework, compatible with the outputs of various reconstruction methods, which significantly reduces the storage footprint of dynamic sequences while maintaining high visual fidelity.
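The article does not detail the compression method, but a common building block in Gaussian compression pipelines is uniform quantization of per-splat attributes. The sketch below is a hypothetical illustration of that general idea, not the team's framework: float32 attributes are mapped to 8-bit codes with per-channel min/scale metadata, cutting storage by roughly 4x before any entropy coding.

```python
import numpy as np

def quantize_attributes(attrs: np.ndarray, bits: int = 8):
    """Uniformly quantize per-splat attributes of shape (N, C) to `bits`-bit codes.

    Returns the integer codes plus the per-channel (min, scale) needed to
    dequantize. With bits=8 this alone shrinks float32 storage by 4x.
    """
    levels = (1 << bits) - 1
    lo = attrs.min(axis=0)
    hi = attrs.max(axis=0)
    scale = np.where(hi > lo, (hi - lo) / levels, 1.0).astype(np.float32)
    codes = np.round((attrs - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def dequantize_attributes(codes, lo, scale):
    # Invert the mapping; error is bounded by half a quantization step per channel
    return codes.astype(np.float32) * scale + lo

attrs = np.random.rand(1000, 3).astype(np.float32)
codes, lo, scale = quantize_attributes(attrs)
recon = dequantize_attributes(codes, lo, scale)
print(float(np.abs(recon - attrs).max()) < float(scale.max()))  # True
```

Real pipelines typically combine such quantization with pruning, codebooks, and entropy coding to reach much higher compression ratios while preserving visual fidelity.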
The team has also developed a cross-platform real-time rendering plugin that supports high-quality, interactive, free-viewpoint experiences for users on desktop, mobile, and XR devices.
Their work also includes a large-scale, high-quality dynamic human motion capture dataset. To build it, the team constructed a dense multi-view acquisition system containing 81 synchronized RGB cameras. With this 81-camera array, they captured over 130 sequences of diverse human motions, including complex interactions with topological changes. The system records timecode-aligned video at 3840 × 2160 resolution and 30 frames per second.
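A back-of-the-envelope calculation shows the scale of data such a rig produces. Assuming uncompressed 8-bit RGB frames (an assumption; the actual cameras will apply a codec), the raw pixel rate across the array is:

```python
cameras = 81
width, height = 3840, 2160  # 4K UHD resolution
fps = 30
bytes_per_pixel = 3  # 8-bit RGB, before any video codec

raw_bytes_per_sec = cameras * width * height * bytes_per_pixel * fps
print(raw_bytes_per_sec)        # 60466176000
print(raw_bytes_per_sec / 1e9)  # ~60.5 GB of raw pixels per second across the array
```

Roughly 60 GB of raw pixels per second underscores why the timecode synchronization, preprocessing, and compression stages of the platform matter in practice.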
"With this platform, we aim to establish a complete pipeline from data acquisition to practical application, promoting the large-scale adoption of Gaussian Splatting technologies in real-world scenarios and providing a reliable and efficient experimental foundation for future research," said Dr. Xu.
The ShanghaiTech University research team includes Shengkun Zhu, Chengcheng Guo, Yuanji Lu, Zhehao Shen, Yize Wu, Yu Hong, Yiwen Cai, Meihan Zheng, Yingliang Zhang, Lan Xu, and Jingyi Yu.
Funding information
This research was funded by the National Natural Science Foundation of China, the National Key R&D Program of China, the Central Guided Local Science and Technology Foundation of China, the MoE Key Lab of Intelligent Perception and Human-Machine Collaboration (ShanghaiTech University), and the Shanghai Frontiers Science Center of Human-centered Artificial Intelligence.
About the Authors
Dr. Jingyi Yu is an OSA Fellow, an IEEE Fellow, an ACM Distinguished Scientist, and Director of the MoE Key Lab of Intelligent Perception and Human-Machine Collaboration. He received his B.S. with honors in Computer Science and Applied Mathematics from Caltech in 2000 and his Ph.D. in EECS from MIT in 2005. He is the Inaugural Chair Professor of ShanghaiTech University and also serves as Vice President of the university. Dr. Yu has worked extensively on computational imaging, computer vision, computer graphics, and bioinformatics. He has won multiple Best Paper Awards at top conferences, including the 2025 ACM SIGGRAPH Best Paper Award, the 2025 SIGGRAPH Best in Show Award (Emerging Technology), and a 2024 SIGGRAPH Best Paper Nomination. His student received the 2024 CVPR Best Student Paper Award. He was the first to introduce large visual models into chip design and received DAC Best Paper Honorable Mentions in both 2024 and 2025. He has served on the editorial boards of leading journals and was Program Chair of CVPR 2021 and ICCV 2027, as well as General Chair of ICCV 2025.
Dr. Lan Xu is an Assistant Professor with the School of Information Science and Technology at ShanghaiTech University, China. He received his Ph.D. in Electronic and Computer Engineering from the Hong Kong University of Science and Technology (HKUST), Hong Kong, China, after which he joined ShanghaiTech as a tenure-track Assistant Professor and Principal Investigator. His research lies at the intersection of computer vision, computer graphics, and computational photography. He has published numerous papers in top-tier conferences and journals, including SIGGRAPH, CVPR, ICCP, IEEE TRO, IEEE TPAMI, and ACM TOG.
About Visual Intelligence
Visual Intelligence is an international, peer-reviewed, open-access journal devoted to the theory and practice of visual intelligence. This journal is the official publication of the China Society of Image and Graphics (CSIG), with Article Processing Charges fully covered by the Society. It focuses on the foundations of visual computing, the methodologies employed in the field, and the applications of visual intelligence, while particularly encouraging submissions that address rapidly advancing areas of visual intelligence research.