
Advancing the frontier of multi-agent AI safety
Studying the dynamics of multi-agent AI interaction and advancing the science and governance needed for safe deployment
Current Focus
Our research targets the gap between how AI systems are built, in isolation, and how they behave when deployed together at scale. Across four interconnected streams, we investigate failures that arise not from individual agents, but from their interactions, propagation dynamics, and collective vulnerabilities.
Contagion
Studying how errors, biases, and adversarial content propagate and amplify through networks of interacting AI agents.
Societal Impact
Analyzing where existing policy and legal frameworks fail to address emergent harms from multi-agent interactions.
Benchmarking
Developing evaluation frameworks to detect emergent behaviors and collective failures before multi-agent deployment.
Security Risks
Investigating novel attack surfaces and cascading vulnerabilities that arise when multiple AI agents interact.

Our Mission
AI systems no longer operate in isolation. They interact, adapt, and converge into collectives whose behaviors transcend their individual parts. Aligning a single agent solves only part of the problem. When systems meet, emergent risks arise that no single-agent framework can foresee.
This is our sole focus: the science and governance of AI agents acting together at scale.
We approach this as a fundamental research challenge: understanding interaction dynamics, anticipating failure modes, and building the evaluation methods and policy frameworks that will determine whether AI collectives remain aligned with human interests.

