Researchers have developed a novel framework, termed PDJA (Perception–Decision Joint Attack), that addresses a long-standing challenge in the security of multi-agent reinforcement learning (MARL) systems: how to effectively disrupt coordinated agents under realistic threat models. By exploiting vulnerabilities across both the perception and decision layers, the new method achieves stronger attacks than single-layer approaches and opens new opportunities for evaluating the robustness of AI-driven autonomous systems in domains such as robotics, traffic control, and distributed decision-making.
What's New?
In recent years, adversarial attacks have become an increasingly important tool for evaluating the reliability of AI systems. However, most existing attacks on MARL focus on either state perturbations or action manipulation in isolation. This fragmented design limits their impact and often fails to reflect real-world adversarial conditions, where perception and decision processes are tightly coupled.
To overcome these limitations, the authors developed PDJA, a unified framework that jointly perturbs both observations and actions. By explicitly modeling the interaction between perception-level and decision-level vulnerabilities, the approach identifies synergistic attack directions that are invisible to single-vector methods.
How It Works
The proposed method operates in two coordinated stages. First, a perception perturbator subtly alters agents' observations to mislead their internal representations. These distorted observations are then fed into the policy and value networks, where a decision perturbator further modifies the actions the policy produces before they are executed. By exploiting the critic's gradient sensitivity over the joint state–action landscape, PDJA steers agents toward low-reward regions, maximizing collective performance degradation. A minimal sketch of such a two-stage procedure appears below.
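The paper's implementation is not reproduced here; the following PyTorch sketch shows one plausible reading of the two-stage attack, in which both stages take an FGSM-style sign step down the critic's gradient. The function name pdja_step, the step sizes eps_obs and eps_act, and the stand-in networks are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

def pdja_step(obs, policy, critic, eps_obs=0.05, eps_act=0.05):
    """One illustrative joint perception-decision perturbation step.

    obs:    (n_agents, obs_dim) tensor of stacked observations
    policy: module mapping observations to continuous actions
    critic: centralized module estimating Q(s, a) for the team
    """
    # Perception stage: nudge observations in the direction that most
    # steeply lowers the critic's value estimate (FGSM-style sign step).
    obs = obs.clone().detach().requires_grad_(True)
    q = critic(obs, policy(obs)).sum()
    q.backward()
    adv_obs = (obs - eps_obs * obs.grad.sign()).detach()

    # Decision stage: perturb the actions computed from the already
    # distorted observations, descending the same joint Q landscape.
    act = policy(adv_obs).detach().requires_grad_(True)
    q = critic(adv_obs, act).sum()
    q.backward()
    adv_act = (act - eps_act * act.grad.sign()).detach()

    return adv_obs, adv_act

# Toy usage with stand-in networks (4 agents, 8-dim obs, 2-dim actions):
policy = nn.Sequential(nn.Linear(8, 2), nn.Tanh())
critic_net = nn.Linear(8 + 2, 1)
critic = lambda s, a: critic_net(torch.cat([s, a], dim=-1))
adv_obs, adv_act = pdja_step(torch.randn(4, 8), policy, critic)
```

Because the second stage operates on actions already computed from corrupted observations, the two perturbations compound rather than act independently, which is the cross-layer coupling the framework is designed to exploit.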
This joint design allows researchers to:
• Systematically amplify the effect of small perturbations through cross-layer coupling
• Induce larger coordinated action deviations than state-only or action-only attacks
• Reveal hidden vulnerabilities in cooperative policies that appear robust under isolated attacks
Validation and Results
The researchers validated PDJA on representative multi-agent benchmarks, including cooperative control tasks based on actor–critic architectures. Compared with state-only and action-only baselines, PDJA consistently achieved lower team rewards, demonstrating stronger attack capability. Quantitative analysis further showed a synergy ratio greater than one, confirming that the joint perturbation produces a destructive effect beyond the sum of its individual components. Importantly, PDJA was also able to bypass several existing defense mechanisms, highlighting previously unrecognized weaknesses in current robust MARL designs.
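The synergy ratio is not formally defined in this summary; a natural reading, assuming it compares the joint attack's reward degradation against the summed degradations of the two single-vector baselines, could be computed along the following lines (the authors' exact definition may differ).

```python
def synergy_ratio(r_clean, r_state_only, r_action_only, r_joint):
    """Ratio of joint-attack damage to the sum of single-vector damages.

    Each argument is a mean episode team reward; a result above 1.0
    indicates super-additive (synergistic) destruction.
    """
    d_state = r_clean - r_state_only    # degradation under state-only attack
    d_action = r_clean - r_action_only  # degradation under action-only attack
    d_joint = r_clean - r_joint         # degradation under the joint attack
    return d_joint / (d_state + d_action)

# Example: clean reward 100, state-only 70, action-only 75, joint 40
# -> (100 - 40) / ((100 - 70) + (100 - 75)) = 60 / 55 ≈ 1.09 > 1
```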
Why It Matters
As multi-agent reinforcement learning is increasingly deployed in safety-critical domains—such as autonomous driving, smart grids, and distributed robotics—understanding and stress-testing system vulnerabilities becomes essential. Many existing robustness evaluations underestimate risk by treating perception and decision modules independently. By exposing how small, coordinated perturbations can cascade across these layers, this work provides a more realistic and rigorous benchmark for AI security.
The authors emphasize that PDJA is not intended as a destructive tool, but rather as a diagnostic framework to guide the development of more resilient multi-agent systems and defense strategies.
What's Next?
Future work will extend PDJA to more complex environments with partial observability, heterogeneous agents, and uncertainty-aware learning. The team also plans to investigate new defense mechanisms specifically designed to counter joint perception–decision attacks, with the goal of improving the safety and trustworthiness of real-world multi-agent AI applications.
Journal Information
This research was published in Artificial Intelligence and Autonomous Systems (AIAS).
Guo W, Liu G, Zhou Z. Beyond single-vector threats: perceptual and decisional joint attacks in multi-agent deep reinforcement learning. Artif. Intell. Auton. Syst. 2026(1):0001. https://doi.org/10.55092/aias20260001