Researchers from the Institute of Modern Physics of the Chinese Academy of Sciences (CAS-IMP) have introduced a machine learning (ML) model that classifies faults occurring in superconducting radio-frequency (SRF) cavities during accelerator operation. Deployed at the CAFE2 facility, the system draws on historical data and expert insight to help operators identify the root causes of faults more quickly and accurately, improving operational stability and maintenance planning.
Addressing Operational Challenges
SRF cavities are essential components in modern high-power accelerators, but their operational faults are a primary source of unplanned accelerator downtime. Traditionally, diagnosing these faults has required highly experienced personnel to analyze complex RF signal patterns by hand, a time-consuming process that creates a bottleneck in rapid response and recovery.
Integrating Expert Knowledge into Machine Learning
To overcome this limitation, the CAS-IMP team trained an ML model on more than 1,900 real SRF fault events recorded at CAFE2, with the signal for each event captured by the low-level radio frequency (LLRF) system. The dataset covers eight distinct fault patterns, including common ones such as Thermal Quench, Electronic Quench (E-Quench), Ponderomotive Instability, and Microphonics.
The core innovation lies in embedding expert diagnostic knowledge directly into the model's feature engineering process. This allows the system to automatically pinpoint critical fault indicators within complex waveform data, delivering accurate classification results within seconds of a fault occurring in an operating cavity.
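As a rough illustration of what expert-guided feature engineering can look like, the sketch below turns a cavity-field amplitude trace into a few hand-picked numbers that an engineer might inspect by eye. The function name, the three features, the 10% pre-fault window, and the synthetic waveforms are all assumptions made for this example; they are not the features used in the study.

```python
import math
from statistics import mean, pstdev


def expert_features(amplitude, sample_rate_hz):
    """Illustrative hand-crafted features from a cavity-field amplitude trace.

    Hypothetical example: assumes at least two samples and that the first
    10% of the trace was recorded before the fault.
    """
    n_pre = max(1, len(amplitude) // 10)
    baseline = mean(amplitude[:n_pre])          # pre-fault field level
    norm = [a / baseline for a in amplitude]    # trace as a fraction of baseline

    # Steepest fall between consecutive samples, in baseline fractions per second.
    max_drop_rate = max(
        (norm[i] - norm[i + 1]) * sample_rate_hz for i in range(len(norm) - 1)
    )
    # First sample at which the field is below half of its pre-fault level.
    half_index = next((i for i, v in enumerate(norm) if v < 0.5), None)

    return {
        "max_drop_rate": max_drop_rate,
        "time_to_half_s": None if half_index is None else half_index / sample_rate_hz,
        "pre_fault_ripple": pstdev(norm[:n_pre]),
    }


# Two synthetic traces sampled at 1 MHz: an abrupt collapse and a gradual decay.
abrupt = [1.0] * 500 + [0.0] * 500
gradual = [1.0] * 500 + [math.exp(-k / 100) for k in range(500)]

for name, trace in (("abrupt", abrupt), ("gradual", gradual)):
    print(name, expert_features(trace, sample_rate_hz=1e6))
```

In this toy case the abrupt trace yields a far larger maximum drop rate than the gradual one, which is the kind of separation a downstream classifier could exploit.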
High Accuracy and Efficiency Achieved
The expert-guided model demonstrated a classification accuracy of 95%, significantly outperforming conventional autoregressive (AR)-based models, which typically reach around 90%. It proved particularly effective in identifying challenging fault patterns marked by abrupt signal transitions, such as E-Quench and Flashover. Furthermore, the model's specialized feature extraction process completes roughly 30% faster than standard AR-based methods. Internal evaluations confirmed that the model's diagnostic logic closely mirrors the reasoning process of experienced engineers.
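For readers unfamiliar with AR-based features, the sketch below shows one common way such a baseline can be built: fit an autoregressive model to a waveform and use the fitted coefficients as the feature vector. The article does not describe how the baseline was implemented, so the Yule-Walker fit via the Levinson-Durbin recursion, the function names, and the synthetic test signal are all assumptions.

```python
import random


def autocovariance(x, max_lag):
    """Biased sample autocovariance of x for lags 0..max_lag."""
    n = len(x)
    m = sum(x) / n
    c = [v - m for v in x]
    return [sum(c[i] * c[i + k] for i in range(n - k)) / n for k in range(max_lag + 1)]


def ar_coefficients(x, order):
    """Yule-Walker AR fit using the Levinson-Durbin recursion.

    Returns a list a such that x[t] is approximated by
    a[0]*x[t-1] + a[1]*x[t-2] + ... (mean removed).
    Assumes x is not constant, so the lag-0 autocovariance is nonzero.
    """
    r = autocovariance(x, order)
    a = []
    err = r[0]  # prediction error variance of the order-0 model
    for k in range(order):
        acc = r[k + 1] - sum(a[j] * r[k - j] for j in range(k))
        refl = acc / err  # reflection coefficient for order k+1
        a = [a[j] - refl * a[k - 1 - j] for j in range(k)] + [refl]
        err *= 1.0 - refl * refl
    return a


# Synthetic AR(1) signal with true coefficient 0.8 (illustrative data only).
rng = random.Random(0)
signal = [0.0]
for _ in range(4999):
    signal.append(0.8 * signal[-1] + rng.gauss(0.0, 1.0))

print(ar_coefficients(signal, order=2))
```

Because AR coefficients summarize the smooth, self-similar part of a signal, features of this kind can blur short, sharp events, which is consistent with the expert-guided model's reported advantage on abrupt transitions.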
"By translating the link between physical events and signal patterns from operating cavities into engineered features, we've built a diagnostic system that effectively mimics an experienced engineer's reasoning, but operates at machine speed," said Dr. Feng Qiu, co-corresponding author.
From Diagnostics to Proactive Maintenance
Beyond rapid fault classification, the system facilitates long-term trend analysis, which supports proactive maintenance strategies. Operators can visualize which specific cavities are statistically more prone to certain types of faults during operations, enabling them to prioritize inspections and preventative actions. For instance, insights from the model have already informed operational adjustments at CAFE2: when certain cavities were identified as susceptible to issues like ponderomotive instabilities, control parameters were adjusted preemptively. Other actions guided by the model include refining feedback loop gains, reducing operating gradients in specific cavities, and recalibrating interlock systems to prevent fault propagation between cavities.
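The trend analysis described here amounts to tallying classified faults per cavity and ranking the results. A minimal sketch, assuming each classified event is logged as a (cavity, fault type) pair; the cavity labels and event log below are invented for illustration and are not CAFE2 data.

```python
from collections import Counter, defaultdict


def fault_counts_by_cavity(events):
    """Tally fault types per cavity from (cavity_id, fault_type) pairs."""
    counts = defaultdict(Counter)
    for cavity_id, fault_type in events:
        counts[cavity_id][fault_type] += 1
    return counts


def most_prone(counts, fault_type, top_n=3):
    """Rank cavities by how often a given fault type was recorded."""
    ranked = sorted(
        ((c[fault_type], cavity) for cavity, c in counts.items() if c[fault_type] > 0),
        reverse=True,
    )
    return [(cavity, n) for n, cavity in ranked[:top_n]]


# Hypothetical event log; the labels are placeholders.
events = [
    ("CAV-01", "Ponderomotive Instability"),
    ("CAV-02", "Thermal Quench"),
    ("CAV-01", "Ponderomotive Instability"),
    ("CAV-03", "Ponderomotive Instability"),
    ("CAV-02", "E-Quench"),
    ("CAV-01", "Microphonics"),
]

counts = fault_counts_by_cavity(events)
print(most_prone(counts, "Ponderomotive Instability"))
```

A ranking like this only flags where to look first; deciding on an action such as lowering a gradient or retuning a feedback gain remains an operator judgment.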
Prof. Yuan He, the project lead and co-corresponding author, emphasized the broader significance: "This marks a key step from traditional diagnostics relying heavily on human interpretation towards AI-assisted operational decision-making for active accelerators. By applying machine intelligence to diagnostics, we're not just reacting faster to failures, but paving the way for predicting and preventing faults before they disrupt operations."
This research offers a practical route to more dependable diagnostics and maintenance planning for operational SRF cavities, a capability essential for current and future high-power SRF linear accelerators. The expert-informed techniques developed here may also be useful to other SRF facilities worldwide. Future efforts will explore deep learning for more integrated classification and develop predictive algorithms to shift from fault diagnosis towards proactive prevention in operating accelerator systems. The complete study is accessible via DOI: 10.1007/s41365-025-01685-5.