Oak Ridge National Laboratory's Center for Artificial Intelligence Security Research (CAISER) is shining a light on AI vulnerabilities. While AI models offer tremendous economic, humanitarian and national security potential, they are also increasingly susceptible to exploitation. Identifying and characterizing these vulnerabilities has required considerable intellectual effort and specialized expertise.
To bring both efficiency and effectiveness to AI vulnerability detection, CAISER researchers developed Photon, a groundbreaking framework designed to rapidly detect vulnerabilities in AI models at exascale. The technology can help ensure AI-based systems remain secure and robust against attacks, a key protection as models are deployed across critical domains - from energy and healthcare to finance and national security.
ORNL researchers designed Photon by reimagining their existing technology, DeepHyper, originally developed to find optimal network parameters when training large neural networks. By inverting that purpose - searching for the attack parameters that most degrade a model rather than the training parameters that improve it - Photon can identify the most efficient attacks against AI models and help model developers understand how to prevent them.
"It might sound devious, but it's worked very well," said ORNL's Edmon Begoli, director of CAISER. "Photon accelerates the design and development process and reuses the most effective methods for exploring and exploiting vulnerabilities."
"Exploration and exploitation" is a fundamental AI concept that describes the balance between discovering new possibilities (exploration) and making use of existing knowledge (exploitation).
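One classic illustration of this balance is an epsilon-greedy strategy: with a small probability, try something new at random; otherwise, use the best option found so far. This is a generic textbook sketch, not Photon's actual algorithm - the function and parameter names here are illustrative only.

```python
import random

def epsilon_greedy(estimates, epsilon=0.1):
    """Pick an option: explore a random choice with probability epsilon,
    otherwise exploit the option with the best known estimate."""
    if random.random() < epsilon:
        return random.randrange(len(estimates))  # exploration: random pick
    # exploitation: index of the highest-scoring option so far
    return max(range(len(estimates)), key=estimates.__getitem__)
```

Setting `epsilon` higher shifts the balance toward exploration; setting it to zero yields pure exploitation of current knowledge.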
Photon begins by applying publicly known attacks from a catalog of published scientific literature against a target model. It then refines these attacks by exploiting vulnerabilities discovered in the model. While this exploitation happens, Photon is also exploring the model further to uncover new weaknesses, which are subsequently exploited. The cycle continues until no further degradations of the model's performance are observed.
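The cycle described above - seed with published attacks, exploit what works, explore refinements, and stop once the model's performance no longer degrades - can be sketched roughly as follows. All names (`run_attack`, `mutate`, and so on) are hypothetical placeholders, not Photon's real interface.

```python
def attack_cycle(baseline_score, known_attacks, run_attack, mutate, tolerance=1e-3):
    """Iteratively refine attacks until no further performance degradation
    is observed. A purely illustrative sketch of the loop in the article."""
    best_drop = 0.0
    frontier = list(known_attacks)      # start from publicly known attacks
    while frontier:
        candidates = []
        for attack in frontier:
            drop = baseline_score - run_attack(attack)  # degradation caused
            if drop > best_drop + tolerance:            # exploit: it works better
                best_drop = drop
                candidates.extend(mutate(attack))       # explore refinements of it
        frontier = candidates           # cycle ends when nothing improves
    return best_drop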
Built on the DeepHyper framework, Photon's automated testing efficiently explores large hyperparameter spaces through asynchronous, decentralized execution. In other words, Photon can quickly try many different settings at once, even when the tests run on separate computers. This design differs from traditional centralized schemes in which a single manager "agent" coordinates the entire optimization process.
Instead, each of Photon's attack agents shares its findings with the others, so if one attack proves especially effective, the other agents learn from it in real time and improve their own attacks, allowing them to exploit weaknesses to the fullest.
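A minimal sketch of that kind of decentralized coordination: several agents run their own searches but read and update a shared record of the best result found so far, so effective settings propagate to every agent. The names and the toy `effectiveness` function are assumptions for illustration, not Photon's real API.

```python
import random
import threading

best = {"score": 0.0, "params": None}   # shared record all agents can see
lock = threading.Lock()

def effectiveness(params):
    """Stand-in for evaluating one attack; peaks at params == 0.7."""
    return 1.0 - abs(params - 0.7)

def agent(rng, steps=200):
    for _ in range(steps):
        with lock:
            anchor = best["params"]
        # explore near the globally best parameters, or randomly at first
        params = rng.uniform(0, 1) if anchor is None else anchor + rng.gauss(0, 0.05)
        score = effectiveness(params)
        with lock:
            if score > best["score"]:   # publish improvement to all agents
                best.update(score=score, params=params)

threads = [threading.Thread(target=agent, args=(random.Random(i),)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because every agent anchors its next guesses on the current global best, a discovery by one agent immediately redirects the others, the real-time learning the article describes.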
Frontier enables large-scale AI vulnerability testing
This kind of attack-testing speed and efficiency is made possible by ORNL's Frontier exascale supercomputer. Photon's ability to run in parallel - for example, executing several different attacks simultaneously on different nodes - sets it apart from any other known AI vulnerability testing approach. Running on Frontier nodes, Photon can execute 60,000 "jailbreak" prompts, inputs designed to unlock restricted behaviors in an AI system, every hour. By comparison, it could take human "red teams," groups that mimic adversaries, years to achieve similar results, especially because Photon not only executes jailbreaks in parallel but also coordinates its attack campaigns to constantly pursue the most effective paths.
By adapting and evolving its tactics in real time, Photon mimics nature's most efficient search strategies - much like ants exploiting high-yield niches in their environment - ensuring that every exploration is efficiently converted into actionable intelligence.
This approach significantly reduces the auxiliary tasks and bottlenecks associated with conventional red team jailbreaking campaigns, scales effectively without loss of computational efficiency, and maintains resource utilization above 95 percent across 1,920 GPUs on Frontier.
"When we're talking about running something at this scale, it becomes difficult to use as much of the available compute power as possible. Since you are running at such a large scale, eliminating resource downtime is not trivial," Jack Hutchins, ORNL robust AI engineer, said. "There is still downtime when resources are waiting for what to do next, but maintaining 95 percent utilization is very high."
DeepHyper's exploration strategy prioritizes potentially impactful parameters while ensuring coverage across the entire parameter space. As a result, Photon can detect both obvious and subtle vulnerabilities, offering a comprehensive understanding of model performance under adversarial conditions.
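One simple way to get that mix of prioritization and coverage is to sample parameters mostly in proportion to their estimated impact while reserving a fraction of draws for uniform coverage of the whole space. This is a hypothetical illustration of the idea; the weights and names are made up and do not reflect DeepHyper's actual sampling method.

```python
import random

def pick_parameter(impact, coverage_fraction=0.2, rng=random):
    """Choose a parameter to perturb next: usually weighted by estimated
    impact, but occasionally uniform to keep covering the whole space."""
    names = list(impact)
    if rng.random() < coverage_fraction:
        return rng.choice(names)  # uniform draw: guarantees coverage
    # impact-weighted draw: prioritizes potentially impactful parameters
    return rng.choices(names, weights=[impact[n] for n in names])[0]
```

High-impact parameters get sampled most often, which speeds the search, while the coverage fraction ensures even low-weight parameters are still tested so subtle vulnerabilities are not missed.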
"Since our goal is to find highly effective jailbreaks, finding the parameters that have the most effect quickly speeds up our search for effective jailbreaks," Hutchins said.
In a market where AI integrations drive critical operations at a global scale, ensuring the reliability, robustness and safety of these systems has never been more vital. Photon not only provides a window into the vulnerabilities that may lie within our AI models, but it also offers a pathway to rapid remediation, thus safeguarding the integrity and performance of mission-critical systems.
"Photon represents a paradigm shift in how we approach AI security. By running coordinated, high-scale experiments, we can uncover hidden vulnerabilities far more efficiently than ever before," Begoli said. "This technology ensures our AI advancements can continue bringing much-needed innovation to a wide variety of industries without also introducing safety or security risks."
Frontier is housed in ORNL's Oak Ridge Leadership Computing Facility, a DOE Office of Science user facility.
UT-Battelle manages ORNL for DOE's Office of Science, the single largest supporter of basic research in the physical sciences in the United States. The Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit energy.gov/science.