Industrial quality inspection plays a critical role in manufacturing, from ensuring the reliability of electronics and vehicles to preventing costly failures in aerospace and energy systems. Traditional vision-based inspection systems typically rely on Red, Green, Blue (RGB) cameras, which are fast and inexpensive but often miss defects related to geometry (such as scratches or dents), material structure, or heat dissipation. While additional sensors, such as thermal cameras or depth scanners, can reveal these hidden anomalies, effectively combining information from multiple sensors remains a major technical challenge. Many existing fusion approaches lose fine spatial detail, require heavy computation, or fail when sensors are not perfectly aligned, which are common issues in factory settings.
To address these challenges, a research team led by Associate Professor Phan Xuan Tan from the Innovative Global Program, College of Engineering, Shibaura Institute of Technology, Japan, along with Dr. Dinh-Cuong Hoang from FPT University, Vietnam, has proposed a new framework, termed MambaAlign, which enables computationally efficient fusion of multimodal sensor data while remaining robust to modest sensor misregistration. The study was made available online on December 27, 2025, and was published in Volume 13, Issue 1 of the Journal of Computational Design and Engineering on January 1, 2026.
"Existing systems miss geometric and material/thermal defects, amplify sensor artifacts, lose localization, or are brittle to modest misregistration. In addition, efficiently capturing long-range, orientation-sensitive context (important for thin/oblique defects) without the quadratic cost of dense attention remained unresolved. These challenges of existing systems motivated us to develop a fusion approach that is alignment aware, uses state-space recurrences to collect long-range directional context, and exchanges semantic guidance at deep stages via lightweight cross-recurrence (Cross Mamba Interaction), and then reconstitutes low-level channels top-down to preserve precise localization," says Dr. Tan.
MambaAlign introduces an alignment-aware state-space fusion framework for multimodal industrial anomaly detection. The method captures long-range and orientation-aware context using state-space refinement, which is particularly effective for detecting thin or oblique defects such as scratches and cracks. Instead of relying on computationally expensive global attention, MambaAlign exchanges semantic guidance between sensors only at high-level feature stages, keeping the computational cost close to linear. A top-down reconstruction mechanism then reconstitutes low-level feature channels, allowing the system to tolerate modest sensor misalignment while preserving precise pixel-level localization.
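To illustrate why a state-space recurrence gathers long-range directional context at near-linear cost, here is a minimal sketch in plain Python. It is not the MambaAlign implementation: the function names (`scan_1d`, `directional_context`), the fixed scalar parameters `a` and `b`, and the choice of four axis-aligned scan directions are assumptions made for illustration only, and Mamba-style models use learned, input-dependent parameters instead.

```python
from typing import List

Grid = List[List[float]]


def scan_1d(xs: List[float], a: float = 0.9, b: float = 0.1) -> List[float]:
    """Linear recurrence h_t = a * h_{t-1} + b * x_t over one sequence.

    Each element is visited once, so the cost is O(len(xs)).
    """
    h = 0.0
    out = []
    for x in xs:
        h = a * h + b * x
        out.append(h)
    return out


def transpose(grid: Grid) -> Grid:
    return [list(col) for col in zip(*grid)]


def scan_rows(grid: Grid, a: float, b: float, reverse: bool) -> Grid:
    """Scan every row left-to-right, or right-to-left when reverse is True."""
    out = []
    for row in grid:
        seq = row[::-1] if reverse else row
        res = scan_1d(seq, a, b)
        out.append(res[::-1] if reverse else res)
    return out


def directional_context(grid: Grid, a: float = 0.9, b: float = 0.1) -> Grid:
    """Average four directional scans (hypothetical aggregation rule).

    Total work is four passes over the grid, i.e. linear in pixel count,
    whereas dense attention compares every pixel with every other pixel.
    """
    lr = scan_rows(grid, a, b, reverse=False)            # left to right
    rl = scan_rows(grid, a, b, reverse=True)             # right to left
    tb = transpose(scan_rows(transpose(grid), a, b, reverse=False))  # top to bottom
    bt = transpose(scan_rows(transpose(grid), a, b, reverse=True))   # bottom to top
    h, w = len(grid), len(grid[0])
    return [
        [(lr[i][j] + rl[i][j] + tb[i][j] + bt[i][j]) / 4.0 for j in range(w)]
        for i in range(h)
    ]
```

With a single active pixel in the middle of a grid, the response decays along the rows and columns that pass through it, which is the kind of elongated context that would help with thin, line-like defects. Oblique orientations would need additional scan paths that this sketch does not include.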
Extensive experiments demonstrate the effectiveness of the approach. Averaged across three RGB-plus-auxiliary-modality (RGB-X) datasets, MambaAlign improves image-level area under the receiver operating characteristic curve (AUROC) by approximately 4.8%, pixel-level AUROC by about 5.0%, and area under the per-region overlap curve by roughly 6.5% compared with prior methods. Importantly, these gains come without excessive computational overhead. The model sustains close to 30 frames per second at moderate resolutions, with controlled memory usage, making it practical for deployment in real production lines.
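For readers unfamiliar with the AUROC metric cited above, the following is a small sketch of how it can be computed from anomaly scores, using its interpretation as the probability that a randomly chosen defective sample scores higher than a randomly chosen normal one. The function names and toy data are illustrative assumptions, not the evaluation code used in the study, and the pairwise loop is written for clarity rather than speed.

```python
from typing import List, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via pairwise comparison of positive and negative scores.

    labels: 1 for anomalous, 0 for normal. Ties count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def pixel_auroc(masks: List[List[List[int]]],
                maps: List[List[List[float]]]) -> float:
    """Pixel-level AUROC: flatten all ground-truth masks and anomaly maps."""
    flat_labels = [v for mask in masks for row in mask for v in row]
    flat_scores = [v for amap in maps for row in amap for v in row]
    return auroc(flat_labels, flat_scores)


# Image-level: one label and one score per image (toy values).
image_labels = [0, 0, 1, 1]
image_scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(image_labels, image_scores))  # 0.75
```

Image-level AUROC uses one score per image, while pixel-level AUROC applies the same calculation to every pixel of the anomaly map, so it also reflects how well defects are localized.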
"MambaAlign achieves state-of-the-art localization with parameters and runtime suitable for real-time inspection. It not only provides higher detection accuracy but also tighter and less fragmented anomaly maps. This translates directly into fewer false alarms, fewer missed defects, and more actionable outputs for engineers on the factory floor," says Dr. Tan.
Overall, the study highlights wide-ranging industrial relevance. In electronics and printed circuit board inspection, MambaAlign can detect micro-cracks or missing components that subtly alter thermal or geometric patterns. In aerospace and composite manufacturing, fusing RGB and thermal data helps reveal subsurface delamination invisible to standard cameras. Automotive body inspection benefits from improved detection of dents, scratches, and seam defects, while the system's real-time performance enables inline inspection on conveyor belts or robotic vision stations. By reducing manual inspection effort, minimizing scrap, and improving reliability under realistic sensor conditions, MambaAlign addresses a long-standing bottleneck in industrial quality assurance.
Reference
DOI: https://doi.org/10.1093/jcde/qwaf143
About Shibaura Institute of Technology (SIT), Japan
Shibaura Institute of Technology (SIT) is a private university with campuses in Tokyo and Saitama. Since the establishment of its predecessor, Tokyo Higher School of Industry and Commerce, in 1927, it has maintained "learning through practice" as its philosophy in the education of engineers. SIT was the only private science and engineering university selected for the Top Global University Project sponsored by the Ministry of Education, Culture, Sports, Science and Technology and received support from the ministry for 10 years starting from the 2014 academic year. Its motto, "Nurturing engineers who learn from society and contribute to society," reflects its mission of fostering scientists and engineers who can contribute to the sustainable growth of the world by exposing its more than 9,500 students to culturally diverse environments, where they learn to cope, collaborate, and relate with fellow students from around the world.
Website: https://www.shibaura-it.ac.jp/en/
About Associate Professor Phan Xuan Tan from SIT, Japan
Dr. Phan Xuan Tan is an Associate Professor in the Innovative Global Program, College of Engineering, Shibaura Institute of Technology (SIT), Japan. He earned a B.E. in Electrical-Electronic Engineering from Le Quy Don Technical University, and an M.S. in Computer and Communication Engineering from Hanoi University of Science & Technology, Vietnam. He received his Ph.D. in Functional Control Systems from SIT in 2018. His academic work bridges engineering and artificial intelligence (AI), with research centered on computer vision, image processing, generative AI, and AI safety.