A research team from Northeastern, NYU, UCSD, and UIUC just published a paper that systematically maps the attack surface of agentic AI systems. The paper, "Agentic AI as a Cybersecurity Attack Surface: Threats, Exploits, and Defenses in Runtime Supply Chains" by Xiaochong Jiang, Shiqi Yang, Wenting Yang, Yichen Liu, and Cheng Ji, was accepted at the C4AI4C Workshop at CAI 2026. It introduces a comprehensive threat taxonomy that covers everything from indirect prompt injection to self-replicating agent worms.
Why does this matter to the Aguara community? Because the attack classes the paper identifies map directly to the detection categories Aguara has been building for months. This is not a coincidence. The researchers arrived at the same threat landscape through academic rigor that we arrived at through scanning 42,969 real-world AI agent skills. When independent research validates independent engineering, that is a strong signal.
This post walks through the paper's framework, explains each attack class with the specific examples the authors provide, and maps every one of them to Aguara's 13 detection categories and 153 rules. We will also be honest about where static analysis reaches its limits and where runtime defense takes over.
The Paper and Why It Matters
The core insight of the paper is what the authors call stochastic dependency resolution. Traditional software has a fixed dependency graph: you install a package, and its transitive dependencies are deterministic. Agentic AI systems are different. An agent assembles its execution context at runtime through probabilistic decisions. Which tool gets called depends on the prompt, the model's interpretation, and the available context. The dependency graph is not a DAG (directed acyclic graph). It is a cyclic graph where outputs re-enter as inputs.
This distinction matters because it means the attack surface is not static. It shifts with every invocation. The authors formalize this with the Man-in-the-Environment (MitE) adversary model, where an attacker does not need to compromise the agent directly. They only need to poison one input source, one tool description, or one knowledge base entry, and the agent's own probabilistic reasoning will propagate the attack.
The paper organizes threats into two supply chains: Data and Tool. Each has distinct attack vectors, but they converge in what the authors call the Viral Agent Loop, where compromised outputs from one agent become inputs to another, creating self-replicating attack cycles.
Two Supply Chains: Data and Tool
The paper's taxonomy splits cleanly into two attack surfaces. Understanding this split is essential because the defenses are different for each.
The Data Supply Chain
The Data Supply Chain covers everything an agent reads, retrieves, or remembers. Three attack classes live here:
- Indirect Prompt Injection: adversarial instructions embedded in external content that the agent retrieves and treats as trusted input. The agent reads a web page, a document, or an API response containing hidden instructions, and follows them. This is the most widely studied attack, but the paper contextualizes it as just the entry point for deeper supply chain compromises.
- Knowledge Base Contamination: the authors reference PoisonedRAG and AGENTPOISON, two techniques for injecting adversarial content into RAG (Retrieval-Augmented Generation) vector databases. The attack does not target the model. It targets the retrieval layer. By crafting documents that are semantically similar to legitimate queries, the attacker ensures their poisoned content gets retrieved and injected into the agent's context window.
- Long-Term Memory Poisoning: the paper highlights MINJA (Memory INJection Attack), which targets agent memory systems. Unlike prompt injection (which is ephemeral) or RAG poisoning (which targets retrieval), MINJA poisons the agent's persistent memory. The injected memories influence future interactions across sessions, making this a persistent backdoor.
If you have read our OWASP Agentic Top 10 mapping, these attack classes will look familiar. The academic framing adds precision, particularly around the distinction between ephemeral injection and persistent memory compromise.
The Tool Supply Chain
The Tool Supply Chain covers everything an agent can call, execute, or invoke. The paper breaks this into three phases, each with distinct attack vectors:
Phase 1: Discovery. Before an agent can use a tool, it must discover it. Two attacks target this phase:
- Hallucination Squatting: the agent hallucinates a tool name that does not exist, and the attacker registers a malicious tool with that exact name. This is the agentic equivalent of typosquatting in package registries. The agent confidently calls a tool that was never intended to exist, and the attacker controls its implementation.
- Semantic Masquerading: a malicious tool uses a name and description designed to shadow a legitimate tool. The agent selects the malicious tool because its description is a better semantic match for the query. This is the tool injection pattern we have documented extensively in MCP deployments.
Phase 2: Implementation. Once a tool is discovered, the agent trusts its code. Two attacks target this trust:
- Hidden Backdoors: malicious logic embedded in tool code that activates under specific conditions. The tool works correctly for most inputs but exfiltrates data, modifies outputs, or escalates privileges when triggered.
- Transitive Dependency Exploitation: the tool itself may be clean, but it depends on a compromised package. This is the classic supply chain attack (think event-stream, ua-parser-js) applied to the agent tool ecosystem. The attack surface is amplified because agents pull tools dynamically, often via
npx -yor equivalent mechanisms that skip verification entirely.
Phase 3: Invocation. When the agent calls a tool, the call itself becomes an attack vector. Three attacks target this phase:
- Over-Privileged Invocation: the tool has more permissions than the task requires, and the agent does not constrain them. A tool that can read any file on disk will read
/etc/shadowif the prompt says to. - Argument Injection: the agent constructs tool arguments from untrusted input, and the attacker injects additional arguments or entirely different commands. The paper specifically discusses SSRF via CPRF (Cross-Protocol Request Forgery), where a tool intended to make HTTP requests is tricked into making requests to internal services.
- Credential Theft: tools that handle credentials, API keys, or tokens without proper isolation. The agent passes secrets to a tool that logs them, stores them in an accessible location, or sends them to an external endpoint.
The Viral Agent Loop: Why Static Scanning Matters More Than We Thought
The most significant contribution of the paper, in our view, is the formalization of the Viral Agent Loop. Traditional software dependency graphs are DAGs. Information flows in one direction: from dependencies to dependents. Agentic AI systems break this property. Agent outputs re-enter as inputs, creating cycles.
"Unlike traditional software supply chains modeled as directed acyclic graphs (DAGs), agentic systems form cyclic dependency structures where outputs of one component re-enter as inputs to another."
The authors use the Morris II worm as their primary example. Morris II embeds self-replicating adversarial prompts into agent outputs. When Agent A processes poisoned content and produces output, that output contains the same adversarial prompt. When Agent B reads Agent A's output as input, it becomes infected and propagates the prompt to Agent C. The worm spreads through the agent ecosystem without ever exploiting a traditional software vulnerability.
This has a direct implication for static analysis. If you catch the poisoned content, the backdoored tool, or the malicious description before deployment, the viral loop never starts. Once the payload enters the runtime, every connected agent becomes a potential propagation vector. The cost of missing a threat pre-deployment is not linear. It is exponential, because the cyclic graph amplifies every missed detection.
This is why we built Aguara as a pre-deployment scanner. Catching a SUPPLY_017 hardlink escape in a tool definition before it ships is not the same as catching it at runtime after three agents have already invoked it. The paper validates this design decision from a theoretical perspective: break the cycle at the source, not at the sink.
Mapping Paper Attacks to Aguara Detection Categories
Here is the complete mapping between the paper's attack taxonomy and Aguara's detection categories. Every attack class the paper identifies has at least one corresponding Aguara detection category. Some map to multiple categories because the attack can manifest in different ways depending on the agent's configuration.
| Paper Attack Class | Aguara Category | Rules |
|---|---|---|
| Indirect Prompt Injection | Prompt Injection | 17 + NLP |
| Knowledge Base Contamination (PoisonedRAG, AGENTPOISON) | Third-Party Content | 5 |
| Long-Term Memory Poisoning (MINJA) | Third-Party Content + Indirect Injection | 5 + 6 |
| Hallucination Squatting | External Download | 17 |
| Semantic Masquerading | MCP Attack (tool injection, name shadowing) | 12 |
| Hidden Backdoors | Supply Chain | 15 |
| Transitive Dependency Exploitation | External Download + Supply Chain | 17 + 15 |
| Over-Privileged Invocation | Command Execution | 16 |
| Argument Injection (SSRF via CPRF) | SSRF & Cloud | 10 |
| Credential Theft | Credential Leak | 19 |
| Data Exfiltration | Data Exfiltration | 16 + NLP |
| Source-to-Sink Taint | Toxic Flow | 3 |
| Viral Agent Loop Propagation | Rug-Pull Detection (hash-based change tracking) | Dynamic |
Let us walk through the most interesting mappings in detail.
Prompt Injection: 17 Rules + NLP Engine
The paper treats indirect prompt injection as the primary entry point for Data Supply Chain attacks. Aguara's Prompt Injection category uses 17 pattern-based rules to detect known injection signatures in tool descriptions, system prompts, and configuration files. The NLP engine adds semantic analysis for injection patterns that do not match known signatures but exhibit the structural characteristics of adversarial prompts (imperative instructions, role overrides, context resets).
The paper's contribution here is framing injection as the first step in a chain, not an isolated attack. A prompt injection in an MCP tool description is not just a prompt injection. It is the entry point for tool invocation hijacking, data exfiltration, and potentially viral propagation.
MCP Attack: 12 Rules for Tool Injection and Name Shadowing
The paper's "Semantic Masquerading" maps directly to what Aguara calls MCP Attack detection. Our 12 MCP rules cover tool name shadowing (a malicious tool registering the same name as a trusted tool), description injection (embedding adversarial instructions in tool descriptions that override agent behavior), and approval-execution binding gaps (MCP_013) where the tool shown at approval time differs from what executes.
The paper's formalization of the Discovery phase validates why we built dedicated MCP detection as a separate category rather than folding it into generic prompt injection. The attack surface at the tool discovery layer is structurally different from injection in content, and it requires different detection patterns.
External Download + Supply Chain: 32 Rules Across Two Categories
The paper's hallucination squatting and transitive dependency attacks span two Aguara categories. External Download (17 rules) catches tools that pull code or data from unverified sources at runtime: curl | bash patterns, npx -y without pinned versions, dynamic imports from untrusted URLs. Supply Chain (15 rules) catches patterns within the tool code itself: hardcoded backdoor URLs, obfuscated payloads, hardlink escapes (SUPPLY_017).
The paper's insight about transitive dependencies is particularly relevant. An MCP tool that calls npx -y some-package is not just an external download risk. It is a transitive dependency risk because that package has its own dependencies, none of which the agent or the user ever reviewed. Aguara flags the entry point. The full transitive graph requires runtime analysis.
Toxic Flow: 3 Rules for Source-to-Sink Taint
The paper discusses neuro-symbolic information flow control as a defense, specifically runtime taint analysis that tracks data from untrusted sources to sensitive sinks. Aguara's Toxic Flow category is the static analysis counterpart: 3 rules that detect configurations where user-controlled input flows directly to dangerous operations (command execution, file writes, network requests) without sanitization or validation.
Three rules is a small number. We are transparent about that. Taint analysis in static configuration scanning is fundamentally limited compared to runtime taint tracking. But these three rules catch the most dangerous patterns: direct piping of tool output to shell execution, unsanitized URL construction from user input, and file path construction from external parameters.
Rug-Pull Detection: Hash-Based Change Tracking
The paper's Viral Agent Loop is perhaps the hardest attack class to detect with static analysis alone. By definition, viral propagation is a runtime phenomenon. However, Aguara's Rug-Pull Detection provides a pre-deployment safety net: hash-based tracking of tool definitions, descriptions, and configurations. If a tool's content changes between scans, Aguara flags it. This catches the "setup" phase of a viral attack, where the attacker modifies a tool after it has been reviewed and approved.
Combined with Aguara Watch scanning 42,969 skills four times daily, rug-pull detection operates as a continuous monitoring layer. A tool that was Grade A yesterday and Grade C today gets flagged before it can enter an agent's runtime.
What Aguara Catches Today vs. What Needs Runtime Defense
We want to be direct about this. Static analysis is powerful, but it has boundaries. The paper's taxonomy helps us draw those boundaries clearly.
Where Aguara is strong (pre-deployment, static)
- Tool discovery attacks (hallucination squatting, semantic masquerading): Aguara scans tool descriptions and names before an agent ever sees them. 12 MCP rules + 17 External Download rules.
- Implementation attacks (hidden backdoors, transitive dependencies): Aguara scans tool code and configurations for known malicious patterns. 15 Supply Chain rules.
- Credential exposure: 19 rules detect secrets, API keys, OAuth verifiers, and tokens in tool definitions, configurations, and environment files.
- Prompt injection in configurations: 17 rules + NLP analysis catch adversarial instructions embedded in tool descriptions, system prompts, and MCP skill definitions.
- Rug-pull detection: hash-based change tracking catches tool modifications between deployments.
Where runtime defense is needed (Layer 3)
- Viral Agent Loop propagation: once a payload enters the runtime, it propagates through agent-to-agent communication. Static analysis catches the initial payload. Runtime enforcement prevents propagation.
- Dynamic argument injection: when the agent constructs tool arguments from live user input, the injection happens at invocation time. Aguara detects the patterns that enable this (unsanitized argument construction), but blocking the actual injection requires a runtime proxy.
- Over-privileged invocation at scale: Aguara flags tools with overly broad permissions. Enforcing least-privilege at every invocation requires a runtime gateway that inspects and constrains each tool call.
- Real-time taint tracking: the paper's neuro-symbolic information flow control is inherently a runtime capability. Aguara's 3 Toxic Flow rules catch the static configuration patterns. Full taint analysis across live agent interactions requires runtime instrumentation.
The Paper's Defense Framework and Our Architecture
The paper proposes a Zero-Trust Runtime Architecture with three pillars. It is worth mapping these to our existing stack:
Pillar 1: Deterministic Capability Binding. Cryptographic registries that bind tool identities to verified implementations. Aguara Watch provides the monitoring side of this: scanning 7 registries, tracking changes, flagging modifications. The enforcement side (rejecting unverified tools at runtime) is a gateway function.
Pillar 2: Neuro-Symbolic Information Flow Control. Runtime taint analysis that tracks data flow from untrusted sources to sensitive sinks. Aguara's Toxic Flow rules provide the static detection layer. Runtime taint tracking requires instrumentation at the agent framework level.
Pillar 3: Auditor-Worker Architecture. A secondary model that reviews and validates the primary agent's actions in real time. This is purely a runtime capability. Static analysis contributes by ensuring the auditor's own tool definitions and configurations are clean, but the auditor's real-time judgment is outside the scope of pre-deployment scanning.
The paper's three pillars map to a layered defense: scan before deployment (Aguara), monitor continuously (Aguara Watch), enforce at runtime (Oktsec). No single layer is sufficient. The paper makes this explicit, and we agree.
What This Means for the Community
If you are building or deploying AI agents, this paper gives you a structured way to think about the attack surface. Here is our practical guidance:
- Scan your MCP skills before deployment. Every tool description, every configuration file, every dependency declaration. Aguara catches the attack classes the paper identifies in the Discovery and Implementation phases. Run it as part of your CI/CD pipeline.
- Monitor the registries you depend on. Aguara Watch scans 42,969 skills across 7 registries four times daily. If a tool you depend on changes behavior, you want to know before your agent calls it.
- Treat tool descriptions as untrusted input. The paper's semantic masquerading and tool injection attacks exploit the fact that agents trust tool descriptions implicitly. Your security posture should treat every tool description with the same suspicion you treat user input.
- Plan for runtime defense. Static analysis catches the setup, not the execution. If your agents communicate with other agents, process external content, or invoke tools with user-controlled arguments, you need runtime enforcement. The paper's Zero-Trust Runtime Architecture is a good target architecture.
The paper is available on arXiv: arXiv:2602.19555v1. We recommend reading it in full. The threat taxonomy is rigorous, the examples are concrete, and the defense framework is practical.
Independent academic research arriving at the same threat landscape that we built detection rules for is the kind of validation that matters. Not because it proves we are right, but because it confirms the threats are real, the attack surface is well-defined, and the community can now speak about these risks with a shared vocabulary.
Scan your skills. Monitor your registries. Defend your runtime. The attack surface is mapped. The detection rules exist. Use them.
Scan your MCP skills against the full attack taxonomy
Aguara ships 153 detection rules across 13 categories, covering every attack class identified in the paper. Open source, Apache-2.0.