When I began working on autonomous cyber agents in 2020, the timeline for real-world deployment was still measured in decades. At the time, these systems were seen as long-range bets — interesting but still mostly niche improvements for any near-term application.
Then, something changed.
While generative AI (GenAI) wasn’t one, singular event, it unleashed an ongoing cascade of advances that are, to this day, causing development timelines to collapse at a continuously accelerating rate. This isn’t just a case of moving the goal; the GenAI-driven wave is relentlessly bulldozing old benchmarks and redefining the frontier of what’s possible, faster than we’ve ever experienced before. Capabilities once reserved for long-term research are now being integrated into live environments with astonishing speed.
Startlingly, but not surprisingly, agentic systems are being embedded in countless locations — company workflows, decision-making pipelines and even critical infrastructure — often before we’ve established how to govern or secure them. The year 2020 seems a lifetime ago considering we’re no longer preparing for the arrival of agentic AI but responding to its continued and rapid evolution.
A Paper for a Moving Target
The workshop report I’ve co-authored, Achieving a Secure AI Agent Ecosystem, is the product of a cross-institutional effort to make sense of this acceleration. Developed in partnership with RAND, Schmidt Sciences, and leading minds in agentic AI from across industry, academia, and government, the paper doesn’t offer silver bullets but rather a different way to think about and approach agentic AI.
The crux of the paper outlines three foundational security pillars for AI agents and suggests where our current assumptions — and infrastructure — might falter as these systems evolve. Beyond simply acknowledging current realities, this argues for a profound mindset shift: We must recognize that the age of agentic systems is already upon us. Consequently, securing these systems is not a problem for tomorrow. It’s an urgent challenge today that’s intensified by the relentless pace of innovation, expanding scale, uneven risks for early adopters and the stark asymmetry between attack capabilities and defense goals.
One of the challenges in securing AI agents is that these systems don’t look or behave like traditional software. They are dynamic, evolving and increasingly capable of executing decisions with minimal oversight. Some are purpose-built to automate tasks like scheduling or sorting email; others are inching toward fully autonomous action in high-stakes environments. In either case, the frameworks we use to secure traditional applications aren’t enough. We are encountering problems that aren’t merely variations on known vulnerabilities but are fundamentally new. The attack surface has shifted.
Three Pillars for AI Agent Security
This mindset shift is why the security landscape has been organized around three core concerns:
- Protecting AI agents from third-party compromise: How to safeguard the AI agents themselves from being taken over or manipulated by external attackers.
- Protecting users and organizations from the agents themselves: How to ensure that the AI agents, even when operating as intended or if they malfunction, do not harm their users or the organizations they serve.
- Protecting critical systems from malicious agents: How to defend essential infrastructure and systems against AI agents that are intentionally designed and deployed to cause harm.
These categories are not static — they are points along a spectrum of capability and threat maturity. Today, most organizations that deploy agents are dealing with the first two concerns. But the third — malicious, autonomous adversaries — looms large. Nation-states were among the first to invest in autonomous cyber agents.1 They may not be alone for long.
Navigating this new era of potent, widespread autonomous threats, therefore, demands far more than incremental refinements to existing defenses. It requires a foundational shift in how our expert communities must collaborate and innovate on security.
Historically, AI researchers and cybersecurity professionals often operated on parallel tracks, holding different assumptions about risk and architecture. Yet, the complex frontier of agentic AI security demands their unified effort, as neither community can tackle these immense challenges in isolation — making deep, sustained collaboration paramount. And while universal protocols and comprehensive best practices for this entire field are still maturing, the notion that effective turnkey products for securing agents are scarce is, frankly, becoming outdated. Sophisticated, deployable solutions are now offering vital, specialized protection for critical agentic systems, signaling tangible progress. This further underscores the urgent need for adaptive, multilayered security strategies — spanning model provenance, robust containment and resilient human-in-the-loop controls — all evolving as rapidly as the agents themselves.
Interventions Within Reach
While robust and evolving product solutions are increasingly crucial in mitigating the immediate operational risks posed by agentic AI, achieving comprehensive, long-term security also necessitates dedicated industry-wide investment in foundational capabilities and shared understanding. Several such key directions, complementing product innovation, are well within our collective reach and warrant-focused effort.
For instance, a kind of “agent bill of materials,” modeled after the “software bill of materials,” is envisioned to provide visibility into an agent’s components like its model, training data, tools and memory. However, its functional viability currently faces hurdles, such as the lack of a common system for model identifiers, which is crucial for such transparency.
Additionally, standardized, predeployment test beds could allow for scalable, scenario-based evaluations before agents are released into production environments. And communication protocols like MCP (Model Context Protocol) and A2A (Agent-to-Agent) are emerging, but few have security baked in from the start. However, even when security measures are integrated from the outset, the prevalence of “unknown unknowns” in these novel agentic systems means these protocols will require rigorous and continuous assessment to maintain their integrity and safety.
One approach our paper attempts to navigate is the critical challenge that an agent’s memory, while essential for it to learn, improve, and crucially avoid repeating past mistakes, is also a significant vulnerability that can be targeted for malicious tampering. The strategy involves using “clone-on-launch” or task-specific agent instances. In this model, agents designed for particular operational duties or limited-duration interactions treat their active working memory as ephemeral. Once their specific task or session is complete, these instances can be retired, with new operations handled by fresh instances that are initialized from a secure, trusted baseline.
This practice aims to significantly reduce the risk of persistent memory corruption or the lingering effects of tampering that might occur within a single session. It is paramount, however, that such a system is meticulously architected to ensure an agent’s core foundational knowledge and long-term learned lessons are securely maintained, protected against tampering, and effectively and safely accessible to inform these more transient operational instances. While managing operational states in this manner is not a comprehensive solution to all memory-related threats, it represents the kind of creative, systems-level thinking required for advancing agent security and robust containment.
A Call for Shared Commitment
Ultimately, securing agentic AI will not come from any single breakthrough but from a sustained, multistakeholder effort. These include researchers, policymakers, practitioners and industry leaders working together across disciplines. The threats are both technological and foundational. We are trying to secure systems that we do not yet fully understand. But if there’s one thing the last few years have made clear, it’s this: Waiting to act until the picture is complete means acting too late.
The evolution of agentic AI means our industry is developing critical safeguards concurrently with its widespread adoption. This simultaneous development isn’t inherently a crisis, but a clear call for collective responsibility. Our success in this endeavor hinges on a shared industry commitment to building these foundational elements with transparency, rigorous standards and a unified vision for a trustworthy AI ecosystem.
Read the full paper: Achieving a Secure AI Agent Ecosystem.
1Autonomous Cyber Defence Phase II, Centre for Emerging Technology and Security, May 3, 2024.