Module 3 — Prompt Injection & LLM Application Attacks

Duration: ~6 hrs · Status: Mandatory

Lessons: 16 total (8 short theory, each ≤ 5 min · 4 mandatory labs · 2 optional labs · quiz · summary)

Framework coverage: OWASP LLM01, LLM02, LLM06, LLM07, LLM08 · MITRE ATLAS AML.T0051 (.000, .001), AML.T0070, AML.T0048

Module outcomes

By the end of this module, the learner can:

  1. Execute direct and indirect prompt injection against an LLM application and articulate why each works.
  2. Demonstrate insecure-output-handling exploits (XSS / SSRF / SQLi via LLM output) on a sample app.
  3. Escape an over-permissioned LLM agent and coerce it into actions the deployer did not authorize.
  4. Implement layered prompt-injection defenses (input filters, structured output, dual-LLM pattern, content sanitization, agent-side authorization) and measure their effect.
  5. Run automated prompt-injection scanners (Garak, optionally PyRIT) and triage findings.

Lesson list

Direct prompt injection (~9 min)

  • L3.1.1 — Direct prompt injection fundamentals (Theory, ~4 min, mandatory)
  • L3.1.2 — Jailbreaks vs injections: the taxonomy (Theory, ~5 min, mandatory)

Indirect prompt injection (~10 min)

  • L3.2.1 — Indirect prompt injection: the architectural pattern (Theory, ~5 min, mandatory)
  • L3.2.2 — Indirect-PI delivery vectors and why they dominate (Theory, ~5 min, mandatory)

Insecure output handling (~5 min)

  • L3.3.1 — Model output is untrusted input (Theory, ~5 min, mandatory)
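As a taste of what "model output is untrusted input" means in practice, here is a minimal sketch, not the lab's actual code. The function names, the `users` table, and the sample strings are all hypothetical; the point is only that model text gets escaped before it reaches HTML and bound as a parameter before it reaches SQL.

```python
import html
import sqlite3


def render_reply(model_output: str) -> str:
    """Escape model text before placing it in an HTML page (hypothetical helper)."""
    return f'<div class="reply">{html.escape(model_output)}</div>'


def lookup_user(conn: sqlite3.Connection, model_supplied_name: str) -> list:
    """Bind the model-supplied value as a parameter; never splice it into the SQL string."""
    cur = conn.execute(
        "SELECT id FROM users WHERE name = ?", (model_supplied_name,)
    )
    return cur.fetchall()


# Throwaway in-memory database with one illustrative row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

# A script tag in the model output is rendered inert.
print(render_reply("<script>alert(1)</script>"))

# A classic injection string is treated as a literal name and matches nothing.
print(lookup_user(conn, "alice' OR '1'='1"))
```

Escaping and parameter binding happen at the sink (the page, the query), which is why they still hold when an upstream prompt filter is bypassed.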

Excessive agency (~10 min)

  • L3.4.1 — Excessive agency: when tools are too broad (Theory, ~5 min, mandatory)
  • L3.4.2 — Agent escape patterns and tool-call defenses (Theory, ~5 min, mandatory)
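One common tool-call defense is a deny-by-default authorization check that runs outside the model. The sketch below is an illustration under stated assumptions: `ToolCall`, `POLICY`, the tool names, and the `/srv/kb/` path are all hypothetical and do not come from the course labs.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class ToolCall:
    """A tool invocation proposed by the agent (hypothetical shape)."""
    name: str
    args: dict = field(default_factory=dict)


def _allow_search(args: dict) -> bool:
    # Only a plain string query is accepted.
    return isinstance(args.get("query"), str)


def _allow_read(args: dict) -> bool:
    # Reads are confined to one directory; parent-directory segments are rejected.
    path = args.get("path")
    return (
        isinstance(path, str)
        and path.startswith("/srv/kb/")
        and ".." not in path.split("/")
    )


# Hypothetical policy: tool name -> argument check. Anything not listed is denied.
POLICY: dict[str, Callable[[dict], bool]] = {
    "search_docs": _allow_search,
    "read_file": _allow_read,
}


def authorize(call: ToolCall) -> bool:
    """Return True only if the tool is allowlisted and its arguments pass the check."""
    check = POLICY.get(call.name)
    return check is not None and check(call.args)


print(authorize(ToolCall("read_file", {"path": "/srv/kb/faq.md"})))
print(authorize(ToolCall("read_file", {"path": "/srv/kb/../../etc/passwd"})))
print(authorize(ToolCall("send_email", {"to": "attacker@example.com"})))
```

The check is ordinary code that the deployer controls, so an injected instruction can change what the agent asks for but not what it is permitted to do.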

System prompt extraction (~5 min)

  • L3.5.1 — System-prompt extraction and secrets-in-prompts (Theory, ~5 min, mandatory)

Labs (~4 hrs)

  • L3.6 (Lab) Break a vulnerable chatbot: direct PI + system-prompt extraction (~60 min, mandatory)
  • L3.7 (Lab) Indirect PI via poisoned RAG corpus (~75 min, mandatory)
  • L3.8 (Lab) Agent escape: coerce a tool-using agent into unsanctioned actions (~75 min, mandatory)
  • L3.9 (Lab) Build layered defenses against prompt injection (~75 min, mandatory)
  • L3.10 (Lab) Run Garak / PyRIT against your endpoint (~45 min, optional)
  • L3.11 (Lab) Multi-modal prompt injection (~45 min, optional)

Wrap-up

  • Quiz — 12 questions, 70% to pass (~10 min, mandatory)
  • Summary — bridge to Module 4 (~3 min, mandatory)

Ethics & scope

This module teaches offensive techniques against AI systems. Every attack runs inside the sandboxed lab environment we provision. Outside the lab, use these techniques only on your own systems or on systems you are paid or otherwise explicitly authorized to test. Re-read the course ethics policy (L0.1) before starting L3.6.

Why this module exists

Prompt injection is OWASP LLM01: the broadest, most consequential, and hardest-to-defend vulnerability class in LLM systems as of 2026. Defending against it requires both an attacker's mental model and operational tooling (Module 7). This module supplies the attacker's mental model, along with working defenses that can be layered against the attacks.

The four mandatory labs are organized so that each one builds on the last:

  • L3.6 attacks the simplest target (a chatbot with no tools).
  • L3.7 attacks the RAG app you built in L1.7 and threat-modeled in L2.6.
  • L3.8 attacks a tool-using agent, extending the threat model from L2.6.
  • L3.9 layers defenses across all three and measures the impact.
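One simple way to "measure the impact" of a defense, as L3.9 asks, is to replay the same attack set before and after and compare attack success rates. The sketch below assumes that metric; the function name and the outcome lists are made-up placeholders, not lab results.

```python
def attack_success_rate(outcomes: list[bool]) -> float:
    """Fraction of attack attempts that succeeded (True = the attack worked)."""
    if not outcomes:
        raise ValueError("no attack attempts recorded")
    return sum(outcomes) / len(outcomes)


# Placeholder outcomes for the same four attack prompts, before and after defenses.
baseline = [True, True, False, True]
defended = [False, True, False, False]

asr_before = attack_success_rate(baseline)
asr_after = attack_success_rate(defended)

print(f"ASR before: {asr_before:.2f}")
print(f"ASR after:  {asr_after:.2f}")
print(f"Absolute reduction: {asr_before - asr_after:.2f}")
```

Holding the attack set fixed across both runs is what makes the two numbers comparable. A drop measured on different prompts says little about the defense.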

By the end of Module 3, you'll have an attack-and-defense playbook for almost every prompt-injection-derived finding that the rest of the course references.

What's next

Module 4: Data Poisoning, Backdoors & Supply Chain. We pivot from inference-time attacks to training-time and supply-chain attacks. M4 has three mandatory labs, including a real LoRA-based harmful fine-tune (defensive scope only) and a Hugging Face pickle-scanner walkthrough.