L2.4.4 — OWASP LLM08–LLM10 in detail

Type: Theory · Duration: ~4 min · Status: Mandatory · Module: Module 2 — AI Security Foundations · Framework tags: OWASP LLM08, LLM09, LLM10

Learning objectives

  1. Describe LLM08 Excessive Agency, LLM09 Overreliance, LLM10 Model Theft.
  2. Identify the primary defense category for each.

Core content

LLM08 — Excessive Agency

What it is. An LLM-based system (typically an agent) is granted authority to take actions, and that authority is broader than the system's intended purpose. A "send_email" tool that can email any address. A "shell" tool that runs arbitrary commands. A "database" tool that can query (and write) any table. When prompt injection or jailbreak succeeds against such a system, the attacker inherits the agent's authority.

Why it's worse than classical authz bugs. The agent acts on the user's behalf, and the auth surface is the model's own decision-making, which is probabilistic rather than deterministic. You can't write a clean unit test asserting that "the agent never emails outside the user's allowed list" the way you can for a classical authz check.

Primary defenses.

  • Principle of least authority on tools. The tool surface should be the smallest that satisfies the use case. Scope the recipient field. Scope the table. Scope the shell command set.
  • Human-in-the-loop on high-impact actions. Require explicit user confirmation for actions above an impact threshold (money moved, external email sent, data shared).
  • Intent verification. Re-prompt the model to confirm the action matches the user's intent before execution.
  • Logging + replay. Every tool call is logged with arguments and outcome, and is replayable for incident response.
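To make the first, second, and fourth defenses concrete, here is a minimal Python sketch of a deterministic policy layer that sits between the model and a send_email tool. Every name in it (ToolGateway, ToolCall, the confirm callback, the outcome strings) is hypothetical and chosen for illustration; it is not taken from any agent framework, and the actual mail delivery is left out.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToolCall:
    tool: str
    args: dict


@dataclass
class ToolGateway:
    """Hypothetical policy layer between the model and its tools."""

    # Assumed to hold lowercase addresses the user has approved.
    allowed_recipients: set[str]
    # Human-in-the-loop hook: returns True only if the user approves the call.
    confirm: Callable[[ToolCall], bool]
    audit_log: list[dict] = field(default_factory=list)

    def send_email(self, to: str, body: str) -> str:
        call = ToolCall("send_email", {"to": to, "body": body})
        # Least authority: the recipient field is scoped to an allowlist.
        if to.strip().lower() not in self.allowed_recipients:
            return self._record(call, "denied: recipient not in allowlist")
        # Human-in-the-loop: external email is treated as high impact.
        if not self.confirm(call):
            return self._record(call, "denied: user declined")
        # A real system would hand the message to a mail client here.
        return self._record(call, "sent")

    def _record(self, call: ToolCall, outcome: str) -> str:
        # Logging: every call is stored with its arguments and outcome.
        self.audit_log.append(
            {"tool": call.tool, "args": call.args, "outcome": outcome}
        )
        return outcome
```

The model's choice of tool call is still probabilistic, but this wrapper is ordinary code: the check that an address outside the allowlist is refused can be unit tested in the classical way, whatever the model was tricked into requesting.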

LLM09 — Overreliance

What it is. Users (or downstream systems) treat LLM output as authoritative without verification, and the model is sometimes wrong in costly ways. Air Canada is the canonical example: the chatbot stated a refund policy that didn't exist, the user relied on it, the tribunal held Air Canada liable.

Why it's its own entry. This is a sociotechnical failure, not a code bug. The model behaves as it always does — confidently, fluently, sometimes wrong. The failure is in the system around the model that treated output as binding.

Primary defenses.

  • Grounding and citation. Force the model to cite sources for factual claims; surface citations to the user.
  • Confidence signaling. Where possible, surface uncertainty (refusal, hedging) rather than masking it.
  • Critical-action review. Any model output that becomes a binding action gets a human or programmatic check before the action is taken.
  • UX patterns. Disclaimers, "verify with our team" prompts, paths to escalation.
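A programmatic check of the kind described under critical-action review could look like the following Python sketch. It assumes, purely for illustration, that the model is instructed to cite retrieved passages with bracketed numeric markers such as [2]; the function name review_answer and the Review type are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumed citation format: bracketed integers such as [2].
CITATION = re.compile(r"\[(\d+)\]")


@dataclass
class Review:
    approved: bool
    reason: str


def review_answer(answer: str, retrieved_ids: set[int]) -> Review:
    """Gate an answer before it is shown or acted on as binding."""
    cited = {int(m) for m in CITATION.findall(answer)}
    if not cited:
        # An uncited factual answer is never auto-approved.
        return Review(False, "no citations: escalate to a human")
    unknown = cited - retrieved_ids
    if unknown:
        # The model cited something that was never retrieved.
        return Review(
            False, f"cites sources that were not retrieved: {sorted(unknown)}"
        )
    return Review(True, "all citations map to retrieved sources")
```

This only verifies that each citation points at a document that was actually retrieved. It does not verify that the document supports the claim, so it narrows the set of answers that need human review without removing the need for it.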

LLM09 is the entry where governance bleeds into engineering. Module 8 covers it from the governance side.

LLM10 — Model Theft

What it is. An attacker copies the model — either by stealing the weights directly (supply-chain compromise, insider) or by querying the API enough to reconstruct a functional substitute (model extraction). Effective extraction often requires fewer queries than people assume, especially for narrow-domain models.

Why it matters. Models are IP. Frontier models cost tens of millions of dollars to train; vertical fine-tunes encode proprietary expertise. A stolen model is a license-free competitor, and an extracted model can be the basis for further attacks: adversarial examples crafted against the copy transfer best to closely related models, including the original.

Primary defenses.

  • API rate limiting + anomaly detection. Tune detection specifically for query patterns indicative of extraction (high-diversity queries from a single tenant, systematic input variations).
  • Watermarking. Embed signals in outputs that survive in extracted models. This is an active research area with only partial production deployment.
  • Tiered access. Cheaper / lower-fidelity API for cheap use cases; premium tier for power use cases with stronger auth and logging.
  • Legal layer. Terms of service that prohibit extraction, enforced when extraction is detected.
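As a rough illustration of extraction-tuned anomaly detection, the Python sketch below tracks each tenant's requests in a sliding time window and flags a tenant only when volume and query diversity are both high. The class name ExtractionMonitor and all thresholds are placeholder assumptions; real values would have to be calibrated against normal traffic, and a production detector would use a cheaper diversity estimate than rebuilding a set on every request.

```python
from collections import defaultdict, deque


class ExtractionMonitor:
    """Hypothetical per-tenant detector for extraction-shaped traffic."""

    def __init__(self, window_s=60.0, max_requests=1000, min_unique_ratio=0.9):
        self.window_s = window_s                  # sliding window length
        self.max_requests = max_requests          # volume threshold per window
        self.min_unique_ratio = min_unique_ratio  # diversity threshold
        self._events = defaultdict(deque)         # tenant -> (timestamp, query)

    def observe(self, tenant: str, query: str, now: float) -> bool:
        """Record one request; return True if the tenant looks suspicious."""
        events = self._events[tenant]
        events.append((now, query))
        # Drop requests that have fallen out of the window.
        while events and now - events[0][0] > self.window_s:
            events.popleft()
        volume = len(events)
        unique_ratio = len({q for _, q in events}) / volume
        # High volume alone may be a busy customer; high diversity alone may
        # be a small one. Only the combination is flagged.
        return volume > self.max_requests and unique_ratio >= self.min_unique_ratio
```

A flag from a monitor like this is a signal for throttling or review, not proof of extraction, since some legitimate batch workloads also send many distinct queries.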

Real-world example

LLM08 — Multiple 2025 disclosures of agentic LLMs being coerced into actions via indirect injection. The "agent escape" lab in M3 will reproduce a stripped-down version.

LLM09 — Moffatt v. Air Canada (2024 BCCRT 149). The chatbot told the customer he could claim a bereavement-fare refund retroactively, which Air Canada's actual policy did not allow. The tribunal: "It should be obvious to Air Canada that it is responsible for all the information on its website. It makes no difference whether the information comes from a static page or a chatbot." The decision has been cited widely in legal commentary since.

LLM10 — Tramèr et al. (2016, "Stealing Machine Learning Models via Prediction APIs") is the foundational paper. The 2024 follow-up by Carlini et al. ("Stealing Part of a Production Language Model") recovered the final embedding projection layer of production OpenAI models, up to symmetries, via API queries alone.

Key terms

  • Tool surface — the set of tools an agent can invoke; the scope of arguments each accepts.
  • Grounding — anchoring model output to retrieved or cited evidence.
  • Substitute model — a model trained on extracted query/response pairs from a target.

References

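  • OWASP Top 10 for LLM Applications, entries LLM08, LLM09, LLM10 (numbering as in the 2023 v1.1 list; the 2025 edition renumbers and renames several entries).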
  • Moffatt v. Air Canada, 2024 BCCRT 149.
  • Tramèr et al., "Stealing Machine Learning Models via Prediction APIs" (USENIX 2016) — https://arxiv.org/abs/1609.02943
  • Carlini et al., "Stealing Part of a Production Language Model" (2024), available on arXiv.

Quiz items

  1. Q: An LLM agent is tricked into running rm -rf via indirect prompt injection through a retrieved document. Which OWASP entry is the agent-side failure? A: LLM08 — Excessive Agency.
  2. Q: Air Canada was held liable for what their chatbot said. Which OWASP entry? A: LLM09 — Overreliance.
  3. Q: A defender notices a single API tenant submitting high-diversity, systematically-varying queries at 100 req/s. Which OWASP entry should they suspect? A: LLM10 — Model Theft (query-based extraction).

Video script (~520 words, ~4 min)

[SLIDE 1 — Title]

Last three OWASP entries. LLM08 Excessive Agency, LLM09 Overreliance, LLM10 Model Theft. Four minutes.

[SLIDE 2 — LLM08: Excessive Agency]

LLM08: Excessive Agency. An LLM-based system — typically an agent — is granted authority to take actions, and that authority is broader than the system's intended purpose. A send-email tool that emails any address. A shell tool that runs arbitrary commands. A database tool that queries any table. When prompt injection or jailbreak succeeds, the attacker inherits the agent's authority. Worse than classical authz bugs because the agent acts on the user's behalf, and the auth surface is the model's decision-making — non-deterministic, probabilistic. You can't write a clean unit test the way you can for classical authz.

Defenses: principle of least authority on tools — scope the recipient field, scope the table, scope the shell command set. Human-in-the-loop on high-impact actions. Intent verification — re-prompt the model to confirm the action matches user intent. Logging plus replay.

[SLIDE 3 — LLM09: Overreliance]

LLM09: Overreliance. Users or downstream systems treat LLM output as authoritative without verification, and the model is sometimes wrong in costly ways. Air Canada is the canonical example. The chatbot stated a refund policy that didn't exist, the user relied on it, the tribunal held the airline liable. Sociotechnical failure, not a code bug. The model behaves as it always does — confidently, fluently, sometimes wrong. The failure is in the system around the model.

Defenses: grounding and citation — force the model to cite sources, surface citations to the user. Confidence signaling — surface uncertainty rather than mask it. Critical-action review — any model output that becomes a binding action gets a human or programmatic check first. UX patterns that nudge verification.

[SLIDE 4 — LLM10: Model Theft]

LLM10: Model Theft. An attacker copies the model — by stealing weights directly, or by querying the API enough to reconstruct a functional substitute. Effective extraction often requires fewer queries than people assume, especially for narrow-domain models. Models are IP. Frontier models cost tens of millions to train. A stolen model is a license-free competitor. An extracted model is also the basis for further attacks — adversarial example transferability is highest within a model family.

Defenses: rate limiting and anomaly detection tuned for extraction-shaped query patterns. Watermarking — active research, partial production. Tiered access. Legal terms-of-service layer.

[SLIDE 5 — Three real-world anchors]

LLM08: 2025 disclosures of agentic LLMs coerced into actions via indirect injection. M3 has a lab. LLM09: Moffatt v. Air Canada, 2024. LLM10: Tramèr et al., the foundational extraction paper from 2016; Carlini et al. 2024 follow-up recovering part of the weights of production language models through the API alone.

[SLIDE 6 — OWASP Top 10 done]

OWASP Top 10 walked. You can now use it as a code-review checklist, an audit defense, an onboarding tool. Three more lessons in this module — NIST AI RMF, then EU AI Act. Then the threat-modeling lab. See you there.

Slide outline

  1. Title — "OWASP LLM08–LLM10 in detail".
  2. LLM08 — agent architecture with too-broad tool surface; defense list.
  3. LLM09 — Air Canada incident timeline; defense list.
  4. LLM10 — extraction query pattern visualization; defense list.
  5. Three anchors — LLM08/09/10 anchor cards.
  6. Top 10 done — full LLM01–LLM10 list with checkmarks; "Up next: NIST AI RMF".

Production notes

  • Recording: ~4 min. Cap 5.
  • This is the shortest OWASP lesson — keep the pace up; learners will be ready to move on.