L3.2.1 — Indirect prompt injection: the architectural pattern¶

Type: Theory · Duration: ~5 min · Status: Mandatory Module: Module 3 — Prompt Injection & LLM Application Attacks Framework tags: OWASP LLM01 · MITRE ATLAS AML.T0051.001 (Indirect Prompt Injection), AML.T0070 (RAG Poisoning)

Learning objectives¶

Define indirect prompt injection by contrast with direct PI.
Identify the architectural property that enables it (the same property in two sentences).

Core content¶

Definition by contrast¶

Direct prompt injection (L3.1.1): the attacker types the payload. The LLM-using application accepts the payload through the user-input channel.
Indirect prompt injection: the attacker plants the payload in content that the LLM later consumes through a non-user channel — a retrieved document, a tool output, a web page browsed by an agent, an email summarized by an assistant. The asking user is innocent; the model is reading attacker-controlled data.

Indirect is more dangerous because: - The attacker doesn't need access to the LLM application. - The asking user has no idea anything is happening. - The payload sits dormant in the data store until a query triggers retrieval.

The architectural property that enables it¶

Same as direct PI, with one extra sentence: LLMs do not distinguish instructions from data, and the model's context window concatenates the system prompt, the user query, and any retrieved or tool-supplied content into a single token stream. Whatever the retriever returns, the LLM treats as part of its instruction context.

Said another way: every system that ever feeds untrusted content into an LLM's context is — by architectural design — an indirect-prompt-injection target. RAG systems. Agentic browsers. Email assistants. Code-summarizers reading checked-in comments. Document-summarizers reading user-uploaded PDFs. All of them.

The minimum viable indirect-PI scenario¶

Three components:

A deployed LLM application that retrieves content from an external store.
An attacker who can write to that store (directly, via a feature, via social engineering of an authorized writer).
A payload in the store, crafted to influence the LLM when retrieved.

That's it. The user who triggers retrieval is not party to the attack. They simply ask a question that causes the retrieval pipeline to surface the attacker's content alongside the legitimate sources.

Why this is the bigger category¶

Direct PI requires the attacker to interact with the LLM application. The attacker is bounded by the application's user surface — they can hit rate limits, they can be blocked, their account can be banned. Indirect PI bypasses all of that. The attacker writes to a data store once. Then every legitimate user who queries the right thing carries the payload to the model.

This asymmetry — one write, many executions, by users who are never part of the attack — is why indirect PI dominates the 2025–2026 incident track record (EchoLeak being the canonical example).

Real-world example¶

EchoLeak (M365 Copilot, 2025) — third revisit, this time as an architectural case study. Three components, exact mapping: (1) deployed LLM application = M365 Copilot, which retrieves from the user's tenant data; (2) attacker = anyone who can send the user an email; (3) payload = crafted email content. The user asks Copilot a routine question that causes Copilot to pull recent emails as context. The injection rides in. No user interaction with the attacker; the user is the victim, not the attacker. The minimum-viable indirect-PI scenario, instantiated in a Fortune-500-grade product.

Key terms¶

Indirect prompt injection — payload delivered through a non-user channel.
Retrieval-triggered execution — the asking user's query triggers the payload's reaching the model.
One-to-many asymmetry — attacker writes once; payload executes for many subsequent users.

References¶

Greshake et al., "Not what you've signed up for" (USENIX 2023) — https://arxiv.org/abs/2302.12173
Aim Security EchoLeak disclosure (2025).
Simon Willison's blog — extensive writing on indirect PI as the dominant 2024–2026 attack class.

Quiz items¶

Q: In one sentence, the difference between direct and indirect prompt injection? A: Direct = attacker types the payload through the user channel; indirect = attacker plants the payload in content the LLM later consumes via a non-user channel (retrieved doc, email, web page, tool output).
Q: Why is indirect PI structurally more dangerous than direct PI? A: One-to-many asymmetry — attacker writes once, payload executes for many subsequent legitimate users who are never part of the attack.

Video script (~580 words, ~4 min)¶

[SLIDE 1 — Title]

Indirect prompt injection. The architectural pattern. Five minutes. By the end you'll be able to define it by contrast with direct PI, identify the architectural property that enables it, and name the minimum-viable indirect-PI scenario.

[SLIDE 2 — Definition by contrast]

Definition by contrast. Direct PI: the attacker types the payload. Indirect PI: the attacker plants the payload in content the LLM later consumes through a non-user channel — a retrieved document, a tool output, a web page browsed by an agent, an email summarized by an assistant. The asking user is innocent. The model is reading attacker-controlled data.

Indirect is more dangerous. The attacker doesn't need access to the LLM application. The asking user has no idea anything is happening. The payload sits dormant in the data store until a query triggers retrieval.

[SLIDE 3 — The architectural property]

The architectural property that enables it. Same as direct PI with one extra sentence. LLMs do not distinguish instructions from data, and the model's context window concatenates the system prompt, the user query, and any retrieved or tool-supplied content into a single token stream. Whatever the retriever returns, the LLM treats as part of its instruction context.

Said another way: every system that ever feeds untrusted content into an LLM's context is — by architectural design — an indirect-prompt-injection target. RAG systems. Agentic browsers. Email assistants. Code-summarizers reading checked-in comments. Document-summarizers reading user-uploaded PDFs. All of them.

[SLIDE 4 — Minimum viable indirect-PI scenario]

Minimum viable indirect-PI scenario. Three components. One: a deployed LLM application that retrieves content from an external store. Two: an attacker who can write to that store — directly, via a feature, via social engineering of an authorized writer. Three: a payload in the store, crafted to influence the LLM when retrieved. That's it. The user who triggers retrieval is not party to the attack. They simply ask a question that causes the retrieval pipeline to surface the attacker's content alongside the legitimate sources.

[SLIDE 5 — Why this is the bigger category]

Why this is the bigger category. Direct PI requires the attacker to interact with the LLM application. The attacker is bounded by the application's user surface. They can hit rate limits. They can be blocked. Their account can be banned. Indirect PI bypasses all of that. The attacker writes to a data store once. Then every legitimate user who queries the right thing carries the payload to the model. One write. Many executions. By users who are never part of the attack. This asymmetry is why indirect PI dominates the 2025-2026 incident track record.

[SLIDE 6 — EchoLeak as architectural case study]

EchoLeak, third revisit, this time as an architectural case study. Three components, exact mapping. Deployed LLM app: M365 Copilot retrieving from the user's tenant data. Attacker: anyone who can send the user an email. Payload: crafted email content. The user asks Copilot a routine question that causes Copilot to pull recent emails as context. The injection rides in. No user interaction with the attacker. The user is the victim, not the attacker. The minimum-viable indirect-PI scenario, instantiated in a Fortune-500-grade product.

[SLIDE 7 — Up next]

Next lesson: indirect-PI delivery vectors. Six common ones, all of them present in real production systems. Then we go hands-on. See you there.

Slide outline¶

Title — "Indirect PI: the architectural pattern".
Definition by contrast — split: Direct (attacker → user channel → LLM) vs Indirect (attacker → data store → retrieved → LLM).
Architectural property — context-window diagram with all sources flowing into one stream.
Minimum viable scenario — three-component diagram with labels.
One-to-many asymmetry — single attacker arrow into store, many user arrows out triggering payload.
EchoLeak case study — mapped onto the 3-component diagram.
Up next — "L3.2.2 — Delivery vectors, ~5 min."

Production notes¶

Recording: ~4 min. Cap 5.
Slide 5 (one-to-many asymmetry) is the slide learners will quote.