L2.4.3 — OWASP LLM04–LLM07 in detail
Type: Theory · Duration: ~5 min · Status: Mandatory
Module: Module 2 — AI Security Foundations
Framework tags: OWASP LLM04, LLM05, LLM06, LLM07
Learning objectives
- Describe LLM04 Model DoS, LLM05 Supply Chain, LLM06 Sensitive Info Disclosure, LLM07 System Prompt Leakage.
- Identify the primary defense for each.
Core content
LLM04 — Model Denial of Service
What it is. An attacker submits inputs that consume disproportionate model resources — extremely long contexts, recursive instructions, repeated regex-like patterns that explode in token count, or queries designed to trigger expensive tool calls.
Why it matters in 2026. Token-priced APIs make DoS economically painful (token bombs run up the bill) and operationally painful (rate-limited downstream tools degrade). Agentic LLMs are especially exposed: a single user prompt can trigger many model invocations across an agent loop.
Primary defenses.
- Token-level rate limits per user/tenant, not just request-level.
- Per-request cost ceilings (max input tokens, max output tokens, max agent loop iterations).
- Cost-anomaly alerting at the user, tenant, and global level.
- Tool-call rate limits for agentic systems.
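The first two defenses can be sketched in a few lines. This is a minimal illustration, not a production limiter: `TokenBudget`, `run_agent`, and every numeric limit below are hypothetical names and values chosen for the example, and token counts are assumed to come from whatever tokenizer the deployment already uses.

```python
import time
from collections import defaultdict, deque
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class TokenBudget:
    """Sliding-window token limiter keyed by tenant (hypothetical helper)."""
    max_tokens_per_window: int = 50_000  # assumed per-tenant budget
    window_seconds: float = 60.0         # assumed window length
    max_input_tokens: int = 8_000        # assumed per-request ceiling
    _usage: dict = field(default_factory=lambda: defaultdict(deque))

    def allow(self, tenant: str, input_tokens: int,
              now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Per-request ceiling: reject oversized inputs outright.
        if input_tokens > self.max_input_tokens:
            return False
        window = self._usage[tenant]
        # Drop usage records that have aged out of the window.
        while window and now - window[0][0] >= self.window_seconds:
            window.popleft()
        used = sum(tokens for _, tokens in window)
        if used + input_tokens > self.max_tokens_per_window:
            return False
        window.append((now, input_tokens))
        return True


def run_agent(step: Callable[[int], Optional[str]], max_iterations: int = 8) -> str:
    """Call step(i) until it returns a result or the iteration ceiling is hit."""
    for i in range(max_iterations):
        result = step(i)
        if result is not None:
            return result
    raise RuntimeError("agent loop ceiling reached")
```

Counting tokens rather than requests is the point of the sketch: one request carrying a very large context consumes the same budget as many small ones, so a request-level limit alone would not catch it.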
LLM05 — Supply Chain Vulnerabilities
What it is. Compromise via the artifacts the system depends on. Three primary vectors:
- Model artifacts — backdoors, malicious pickles, typosquatted models from HuggingFace / other registries.
- Datasets — poisoned training or fine-tune data downloaded from public sources.
- Libraries and frameworks — vulnerable LangChain, vulnerable vector DB SDK, vulnerable agent runtime.
Why it's distinct. Classical software supply chain practice (CVEs, dependency audits) has no native vocabulary for model weights or dataset provenance. AI-BOM exists to close that gap.
Primary defenses.
- AI-BOM (Bill of Materials) — inventory every artifact: model, dataset, library, framework. Module 4 builds one.
- Scanning for malicious weights — picklescan, modelscan, signature verification.
- Source allow-lists — restrict pulls to known publishers (e.g., only meta-llama/* and mistralai/*).
- Pinning — exact version pins, hash verification.
- Classical SCA — Snyk, Dependabot equivalents, applied to the AI dependency tree.
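Source allow-lists and hash pinning can be combined into a single pre-load check. The sketch below uses only the Python standard library. `ALLOWED_PUBLISHERS`, `is_allowed_source`, `sha256_of`, and `verify_artifact` are hypothetical names for illustration, and the pinned digest is assumed to come from a lockfile or AI-BOM entry recorded when the artifact was first vetted.

```python
import hashlib
from fnmatch import fnmatchcase

# Assumed allow-list, mirroring the publisher patterns mentioned above.
ALLOWED_PUBLISHERS = ("meta-llama/*", "mistralai/*")


def is_allowed_source(repo_id: str, patterns=ALLOWED_PUBLISHERS) -> bool:
    """True if repo_id matches one of the allow-listed publisher patterns."""
    return any(fnmatchcase(repo_id, pattern) for pattern in patterns)


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large weight files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(repo_id: str, path: str, pinned_sha256: str) -> None:
    """Raise before the artifact is ever deserialized if either check fails."""
    if not is_allowed_source(repo_id):
        raise PermissionError(f"publisher not allow-listed: {repo_id}")
    actual = sha256_of(path)
    if actual != pinned_sha256.lower():
        raise ValueError(f"hash mismatch for {path}: got {actual}")
```

The check runs before any deserialization, which matters because a malicious pickle executes at load time. A hash pin only proves the file is the one that was vetted, so it complements scanning rather than replacing it.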
LLM06 — Sensitive Information Disclosure
What it is. The model emits sensitive information it shouldn't. There are three sources: training-data memorization (verbatim recall of PII or secrets from the training set), system-prompt content (which sometimes contains API keys), and retrieved context that shouldn't have surfaced to the asking user.
Why it persists. Models do memorize training data; the question is how much. RAG systems often retrieve content the asking user wasn't authorized to see, because retrieval was scoped by relevance, not authorization.
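The difference between relevance-scoped and authorization-scoped retrieval is easiest to see in code. The sketch below is an in-memory stand-in, assuming each chunk carries an access-control list and a precomputed relevance score. `Chunk`, `allowed_groups`, and `retrieve` are hypothetical names, and a real deployment would more likely push the same filter into the vector DB query itself.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Chunk:
    text: str
    score: float                    # relevance score, assumed precomputed by the vector search
    allowed_groups: FrozenSet[str]  # groups permitted to read the source document


def retrieve(candidates: Iterable[Chunk], user_groups: FrozenSet[str],
             top_k: int = 3) -> List[Chunk]:
    """Filter by authorization first, then rank the survivors by relevance."""
    visible = [c for c in candidates if c.allowed_groups & user_groups]
    return sorted(visible, key=lambda c: c.score, reverse=True)[:top_k]
```

Filtering before ranking means an unauthorized chunk can never reach the prompt, however relevant it is. Filtering after the top-k cut would still protect the data, but could leave the user with fewer results than requested.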
Primary defenses.
- Data sanitization at training time — strip secrets and PII from training and fine-tune data; verify with secret-scanners.
- Auth-aware retrieval — the vector DB query must respect the asking user's read scope.
- Output filters — PII detectors, secret scanners on outputs before they leave the system.
- DP-SGD or other formal privacy techniques (advanced, vendor-side).
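An output filter can start as a small set of regular expressions run on every response before it leaves the system. The sketch below is illustrative only. `PATTERNS` and `redact` are hypothetical names, the `sk-` pattern is an assumed shape for vendor API keys, and a real deployment would rely on a maintained PII and secret detector rather than three hand-written expressions.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; real detectors cover far more formats.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "api_key_like": re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"),  # assumed key shape
}


def redact(text: str) -> Tuple[str, List[str]]:
    """Replace matches with a labeled placeholder and report which labels fired."""
    findings = []
    for label, pattern in PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED:{label}]", text)
        if count:
            findings.append(label)
    return text, findings
```

Returning the list of labels alongside the redacted text lets the caller log or alert on a disclosure attempt instead of silently masking it.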
LLM07 — System Prompt Leakage
What it is. The attacker extracts the system prompt — the static scaffolding that defines model persona, allowed/disallowed behaviors, sometimes embedded API keys or hidden instructions.
Why it's its own entry. System prompts often contain proprietary IP, sometimes contain secrets (a regrettable but common pattern), and almost always contain information that materially helps an attacker craft the next attack against the system.
Primary defenses.
- Don't put secrets in system prompts. Use proper secret-injection patterns (env vars, secret manager). The system prompt is not a vault.
- Assume extraction. Treat the system prompt as known-to-the-attacker. Design defenses that work even if it leaks.
- Detect extraction patterns. Prompts asking the model to repeat its instructions, ignore prior context, or "tell me everything above" should be flagged.
The reality is that with enough effort, all system prompts leak. The defense is to make the leak inconsequential, not impossible.
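Extraction-pattern detection can begin as a simple heuristic over incoming prompts. The sketch below assumes a handful of regular expressions derived from the phrasings named above. `EXTRACTION_PATTERNS` and `looks_like_extraction` are hypothetical names, and the pattern list is a starting point rather than a complete catalogue.

```python
import re

# Heuristic patterns for common extraction phrasings (illustrative, not exhaustive).
EXTRACTION_PATTERNS = [
    re.compile(
        r"\b(repeat|print|reveal|show|output)\b.{0,40}\b(system prompt|instructions)\b",
        re.IGNORECASE | re.DOTALL,
    ),
    re.compile(
        r"\bignore\b.{0,30}\b(previous|prior|above)\b.{0,30}\b(instructions|context)\b",
        re.IGNORECASE | re.DOTALL,
    ),
    re.compile(r"\b(tell|show) me everything above\b", re.IGNORECASE),
]


def looks_like_extraction(prompt: str) -> bool:
    """Flag prompts that resemble system-prompt extraction attempts."""
    return any(pattern.search(prompt) for pattern in EXTRACTION_PATTERNS)
```

Heuristics like these produce false positives (a request to show assembly instructions would match the first pattern), so a match is better used to flag the session for logging and review than to block it outright. This fits the "assume extraction" stance: detection informs monitoring, while the real protection is that the leaked prompt holds nothing of consequence.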
Real-world example
LLM04 — In 2024 there were multiple disclosures of "token-bomb" attacks against agentic LLMs that triggered runaway tool calls; one indie SaaS reported a surprise OpenAI bill of more than $20k from a single attacker session.
LLM05 — Hugging Face has taken down multiple models containing malicious pickles since 2023; the JFrog Security Research team has published several research reports cataloguing them.
LLM06 — Carlini et al. (2021, 2023) demonstrated that GPT-2 and successors memorize and can be coerced to recite verbatim sequences from training data, including PII.
LLM07 — The Bing/Sydney 2023 incident is the canonical demonstration: the system prompt and internal codename were extracted within days of public launch.
Key terms
- Token bomb — input designed to consume disproportionate tokens or trigger many model invocations.
- AI-BOM — AI Bill of Materials: an inventory of every model, dataset, library, and framework the system depends on.
- Auth-aware retrieval — RAG retrieval that respects the asking user's read scope.
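References
- OWASP Top 10 for LLM Applications: LLM04, LLM05, and LLM06 as numbered in the 2023 (v1.1) list; LLM07 System Prompt Leakage as numbered in the 2025 list.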
- Carlini et al., "Extracting Training Data from Large Language Models" (2021) — https://arxiv.org/abs/2012.07805
- HuggingFace security advisories — search "malicious model" on https://huggingface.co/blog
Quiz items
- Q: Your indie LLM SaaS receives a $20k surprise API bill from one user session. Which OWASP entry? A: LLM04 (Model DoS, token-bomb variant).
- Q: Best defense against "system prompts leak"? A: Don't put secrets in the system prompt; design assuming it's known to the attacker.
- Q: Your RAG retrieves a document the asking user wasn't authorized to read. Which OWASP entry? A: LLM06 (Sensitive Information Disclosure via auth-unaware retrieval).
Video script (~640 words, ~4.5 min)
[SLIDE 1 — Title]
LLM04 through LLM07. Five minutes. Four entries: Model DoS, Supply Chain, Sensitive Information Disclosure, System Prompt Leakage.
[SLIDE 2 — LLM04: Model DoS]
LLM04: Model Denial of Service. An attacker submits inputs that consume disproportionate model resources. Extremely long contexts. Recursive instructions. Repeated patterns that explode in token count. Queries that trigger expensive tool calls. Why it matters in twenty-twenty-six: token-priced APIs make DoS economically painful. Token bombs run up the bill. Agentic LLMs are especially exposed — a single user prompt can trigger many model invocations across an agent loop. Defenses: token-level rate limits per user, per-request cost ceilings, cost-anomaly alerting, tool-call rate limits.
[SLIDE 3 — LLM05: Supply Chain]
LLM05: Supply Chain Vulnerabilities. Compromise via the artifacts the system depends on. Three vectors. Model artifacts — backdoors, malicious pickles, typosquatted models. Datasets — poisoned training or fine-tune data from public sources. Libraries and frameworks — vulnerable LangChain, vulnerable vector DB SDK, vulnerable agent runtime. Distinct because classical software supply chain doesn't have native vocabulary for model weights or dataset provenance. Defenses: AI-BOM, scanning for malicious weights with picklescan and modelscan, source allow-lists, pinning, classical SCA applied to the AI dependency tree.
[SLIDE 4 — LLM06: Sensitive Information Disclosure]
LLM06: Sensitive Information Disclosure. The model emits sensitive information it shouldn't. Training-data memorization — verbatim recall of PII or secrets from the training set. System-prompt content — sometimes contains API keys. Retrieved context that shouldn't have surfaced to the asking user. Why it persists: models do memorize training data, the question is how much. RAG systems often retrieve content the asking user wasn't authorized to see because retrieval was scoped by relevance, not authorization. Defenses: data sanitization at training time, auth-aware retrieval, output filters for PII and secrets, formal privacy techniques like DP-SGD when feasible.
[SLIDE 5 — LLM07: System Prompt Leakage]
LLM07: System Prompt Leakage. The attacker extracts the system prompt — the static scaffolding that defines model persona, allowed and disallowed behaviors, sometimes embedded API keys or hidden instructions. Why this is its own entry: system prompts often contain proprietary IP, sometimes contain secrets — a regrettable but common pattern — and almost always contain information that materially helps an attacker craft the next attack. Defenses: don't put secrets in system prompts, assume extraction will happen, detect extraction patterns. The reality is that with enough effort, all system prompts leak. Make the leak inconsequential, not impossible.
[SLIDE 6 — Four anchors]
Four real-world anchors. LLM04: 2024 disclosures of token-bomb attacks against agentic LLMs, $20k surprise bill on one indie product. LLM05: HuggingFace's multiple takedowns of models containing malicious pickles since 2023. LLM06: Carlini et al. demonstrated that GPT-2 and successors memorize and can be coerced to recite verbatim sequences from training data, including PII. LLM07: Bing/Sydney 2023 — system prompt and internal codename extracted within days of launch.
[SLIDE 7 — Pattern across LLM04–LLM07]
Pattern across these four entries. Each one has a primary defense category that's operational: rate limits, BOM, retrieval auth, secret hygiene. None of them is "write better code"; all of them are "operate the deployment differently." This is the consistent shape of the OWASP LLM Top 10. Your engineering team owns the code-level controls, but you own the operational controls as well. That's the job.
[SLIDE 8 — Up next]
Last three OWASP entries next lesson: Excessive Agency, Overreliance, Model Theft. Then NIST and EU AI Act. Then we build a threat model in the lab. See you there.
Slide outline
- Title — "OWASP LLM04–LLM07 in detail".
- LLM04 — token-bomb illustration; defense quadrant.
- LLM05 — supply-chain graph with attacker nodes; defense quadrant.
- LLM06 — three-source disclosure diagram (training memo / system prompt / retrieval); defense quadrant.
- LLM07 — system-prompt extraction example; defense quadrant.
- Four anchors — four cards side by side.
- Operational pattern — "These defenses are operational, not code-only" callout.
- Up next — "L2.4.4 — LLM08–LLM10, ~4 min."
Production notes
- Recording: ~4.5 min. Cap 5.
- Slide 4 should distinguish the three disclosure sources clearly; they're easy to conflate.