L7.3.2 — Structured output and the dual-LLM pattern¶

Type: Theory · Duration: ~5 min · Status: Mandatory Module: Module 7 — Securing the AI Pipeline (MLSecOps & Defenses) Framework tags: OWASP LLM01, LLM02, LLM08

Learning objectives¶

Explain why structured output is the highest-ROI guardrail for many LLM apps.
Describe the dual-LLM (privileged / quarantined) pattern and identify when it's worth the cost.

Core content¶

Structured output as defense — recap¶

Module 3 introduced structured output as a defense against LLM01/LLM02. The principle: instead of letting the model emit free-form text, constrain its output to a schema (JSON, typed function call, allow-listed strings). The attacker can still try to inject, but the only thing they can inject is something the schema allows.

In 2026, every major LLM provider supports structured output natively: - OpenAI — response_format={"type": "json_schema", ...} plus tool/function calling. - Anthropic — tool use with strict schemas; the Messages API enforces tool-input schemas. - Local LLMs (llama.cpp / Ollama / vLLM) — grammar-constrained decoding (GBNF, Outlines, lm-format-enforcer).

The defense is roughly free in terms of latency (no extra inference) and cheap in cost (the model's own decoding is constrained). The trade-off: only applicable when the response space is naturally schema-constrained.

When structured output is the right primary defense¶

Three categories of LLM app where structured output collapses the bulk of the attack surface:

1. Classification / extraction tasks. "Is this email spam?" "Extract the order ID from this text." Output is a small enumerated set. Define the enum; force the schema; attackers can only return one of the allowed values.

2. Tool-using agents. Each tool has typed parameters. The model's job is to select the right tool and supply valid arguments. Function calling does this natively. The agent can still be tricked into the wrong tool call, but it can't free-write commands.

3. RAG-with-citations. The output schema is {answer: str, citations: list[str]}. Even if the model emits malicious text inside answer, downstream rendering can treat it as untrusted content; citations are validated against the actual retrieved chunks.

When structured output is not the right primary defense:

1. Free-form chat. Open-ended conversations don't fit schemas. Other guardrails dominate.

2. Creative-writing applications. By design the output is unconstrained.

3. Summarization tasks where format is unpredictable. "Summarize this document" — schema would be overly limiting.

The dual-LLM pattern¶

When structured output isn't a fit and the application processes untrusted content (RAG, agent web browsing, document upload, email summarization), dual-LLM is the architectural pattern of last resort.

The split: - Privileged LLM — handles the user's request and orchestrates tools. Never sees raw untrusted content. Only sees structured summaries of untrusted content produced by the quarantined LLM. - Quarantined LLM — processes untrusted content (retrieved doc, web page, email). Output is constrained to a structured summary (extracted facts, flagged content). It never gets to influence tool calls or actions.

The data flow:

Untrusted content ──► Quarantined LLM ──► Structured summary
                                                 │
                                                 ▼
                User request ───► Privileged LLM ──► Tools / response

The architectural property: an injection landing in the quarantined LLM can corrupt the summary, but the summary's structure is enforced — at worst, the summary's facts field contains attacker-chosen text. That text might still reach the user, but it cannot directly trigger tool calls or steer the privileged LLM's behavior beyond data-in-the-summary level.

Cost of dual-LLM¶

Latency. Two LLM inferences per untrusted content unit. For RAG with 5 retrieved chunks, that's 5+ extra inferences per query.
Cost. Roughly 2x inference cost for the untrusted-content path.
Architecture complexity. Two LLM pipelines, structured-output validation, the orchestration logic between them.

When the cost is worth it: high-stakes agentic systems, RAG over untrusted corpora, anything where a successful indirect prompt injection has material consequences.

When it's overkill: simple chat apps with structured-output already in place, low-stakes summarization with output filters.

The pattern in production¶

In 2026, the dual-LLM pattern is the closest thing to an "indirect-PI-resistant" architecture available. It doesn't make indirect PI impossible; it makes it dramatically harder for the injection to translate into damage. Combined with structured output + input filters + output PII redaction, the stack covers most realistic threat models for production LLM apps.

This is what Lab L7.7 walks: take a vulnerable LLM app (the M3 vulnchat or vulnagent equivalent), wrap it in Llama Guard + structured output + a dual-LLM pattern, measure attack-success before and after.

Real-world example¶

Simon Willison popularized the "dual-LLM" framing in 2023 prompt-injection writing. Several production teams in 2024–2025 publicly adopted the pattern for high-stakes agents (notably some financial-services LLM products). The architectural pattern is the operational answer to "how do we defend indirect PI" in 2026, even if not a complete answer.

Key terms¶

Structured output — constraining model output to a schema (JSON, function call, enum).
Dual-LLM pattern — privileged LLM never sees raw untrusted content; quarantined LLM processes untrusted content into structured summaries.
Privileged LLM — orchestrates tools and produces user-facing response.
Quarantined LLM — processes untrusted content into structured-summary form.

References¶

L3.9 (the defense lab that built dual-LLM as defense #5).
Simon Willison's prompt-injection blog posts (2023-2026) covering dual-LLM.
OpenAI / Anthropic structured-output docs.

Quiz items¶

Q: Why is structured output the highest-ROI guardrail for many LLM apps? A: Because it collapses the attack surface by constraining what the model can emit to a schema, at near-zero latency cost; the attacker can only inject something the schema allows.
Q: When is structured output not the right primary defense? A: Free-form chat, creative-writing applications, summarization where format is unpredictable.
Q: Describe the dual-LLM pattern and identify its primary cost. A: Privileged LLM handles user request and tools, never sees raw untrusted content; quarantined LLM processes untrusted content into structured summaries. Primary cost: ~2x inference per untrusted-content unit, plus architectural complexity.

Video script (~620 words, ~4.5 min)¶

[SLIDE 1 — Title]

Structured output and the dual-LLM pattern. Five minutes.

[SLIDE 2 — Structured output recap]

Module 3 introduced structured output as a defense. The principle: instead of letting the model emit free-form text, constrain its output to a schema — JSON, typed function call, allow-listed strings. The attacker can still try to inject. The only thing they can inject is something the schema allows.

In twenty-twenty-six, every major LLM provider supports structured output natively. OpenAI: response_format equals json_schema, plus tool calling. Anthropic: tool use with strict schemas. Local LLMs: grammar-constrained decoding via GBNF, Outlines, lm-format-enforcer.

The defense is roughly free in latency and cheap in cost. Trade-off: only applicable when the response space is naturally schema-constrained.

[SLIDE 3 — When structured output is the primary defense]

Three categories where structured output collapses the bulk of attack surface. Classification or extraction tasks — "is this email spam?", "extract the order ID." Output is a small enumerated set. Define the enum, force the schema, attackers can only return allowed values.

Tool-using agents — each tool has typed parameters. Model's job is to select the right tool and supply valid arguments. Function calling does this natively. Agent can be tricked into the wrong tool call but can't free-write commands.

RAG with citations — output schema is {answer, citations}. Even if model emits malicious text inside answer, downstream rendering can treat it as untrusted; citations validated against actual retrieved chunks.

[SLIDE 4 — When it's NOT the right primary defense]

Free-form chat — open-ended conversations don't fit schemas. Other guardrails dominate. Creative-writing applications — by design output is unconstrained. Summarization tasks where format is unpredictable.

[SLIDE 5 — Dual-LLM pattern]

When structured output isn't a fit and the application processes untrusted content — RAG, agent web browsing, document upload, email summarization — dual-LLM is the architectural pattern of last resort.

The split. Privileged LLM handles the user's request and orchestrates tools. Never sees raw untrusted content. Only sees structured summaries produced by the quarantined LLM. Quarantined LLM processes untrusted content. Output constrained to a structured summary — extracted facts, flagged content. Never influences tool calls or actions.

[SLIDE 6 — Architectural property]

The architectural property: an injection landing in the quarantined LLM can corrupt the summary, but the summary's structure is enforced. At worst, the summary's facts field contains attacker-chosen text. That text might still reach the user but cannot directly trigger tool calls or steer the privileged LLM's behavior beyond data-in-the-summary level.

[SLIDE 7 — Cost]

Cost of dual-LLM. Latency — two LLM inferences per untrusted content unit. For RAG with five chunks, five-plus extra inferences per query. Cost — roughly 2x inference cost. Architecture complexity — two pipelines, structured-output validation, orchestration logic between them.

When the cost is worth it: high-stakes agentic systems, RAG over untrusted corpora, anything where successful indirect PI has material consequences. When it's overkill: simple chat apps with structured output in place, low-stakes summarization with output filters.

[SLIDE 8 — Production status + up next]

In twenty-twenty-six, dual-LLM is the closest thing to an indirect-PI-resistant architecture available. Doesn't make indirect PI impossible. Makes it dramatically harder for injection to translate into damage. Combined with structured output, input filters, output PII redaction, the stack covers most realistic threat models for production LLM apps.

Lab L7.7 walks this: take a vulnerable LLM app, wrap it in Llama Guard plus structured output plus dual-LLM, measure attack-success before and after.

Next: observability. What to log, what not to, how to redact. Five minutes.

Slide outline¶

Title — "Structured output & the dual-LLM pattern".
Structured output recap — schema-as-defense visualization.
When it's the primary defense — three category cards.
When it's not — three category cards.
Dual-LLM pattern — the data-flow diagram from the lesson body.
Architectural property — injection-into-summary illustration.
Cost — latency / cost / complexity callouts.
Up next — "L7.4.1 — Observability, ~5 min."

Production notes¶

Recording: ~4.5 min. Cap 5.
Slide 5 (dual-LLM data flow) is the lesson's pedagogical anchor — make it crystal clear.