Module 6: Adversarial Examples & Evasion
Duration: ~3 hrs · Status: Mandatory
Lessons: 11 total (6 short theory · 2 mandatory labs · 1 optional lab · quiz · summary)
Framework coverage: MITRE ATLAS AML.T0015 (Evade ML Model), AML.T0018 (Manipulate ML Model) · OWASP ML Top 10 (classical ML) · NIST AI RMF Measure 2.7
Module outcomes
By the end of this module, the learner can:
1. Define adversarial examples and articulate why every neural network is structurally susceptible.
2. Distinguish white-box from black-box attacks; recognize transferability as the bridge.
3. Execute FGSM and PGD attacks against an image classifier and measure attack success.
4. Execute text-attack techniques (character/word/sentence-level perturbations) against a text classifier.
5. State three robustness defenses (adversarial training, input preprocessing, certified defenses) and their trade-offs.
Lesson list
Adversarial examples: fundamentals (~9 min)
- L6.1.1 — Adversarial examples: what they are and why they exist (Theory, ~4 min, mandatory)
- L6.1.2 — White-box vs black-box, and the transferability bridge (Theory, ~5 min, mandatory)
Image attacks (~5 min)
- L6.2.1 — Image attacks: FGSM, PGD, and beyond (Theory, ~5 min, mandatory)
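
As a preview of what L6.2.1 covers, here is a minimal sketch of FGSM (one signed-gradient step) and PGD (repeated smaller steps, each projected back into an L-infinity ball of radius `eps`). It is not the lab code: the model is a hypothetical four-feature logistic regression written in plain Python so the gradient can be computed by hand, and `W`, `B`, `X`, `Y` and all function names are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability of class 1 under a toy logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def loss_grad_wrt_input(w, b, x, y):
    """Gradient of binary cross-entropy with respect to the input x."""
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def clip(v, lo, hi):
    return max(lo, min(hi, v))

def fgsm(w, b, x, y, eps):
    """Single step of size eps along the sign of the loss gradient."""
    g = loss_grad_wrt_input(w, b, x, y)
    # Clip to [0, 1] so the result is still a valid "pixel" vector.
    return [clip(xi + eps * sign(gi), 0.0, 1.0) for xi, gi in zip(x, g)]

def pgd(w, b, x, y, eps, alpha, steps):
    """Iterated steps of size alpha, projected into the eps-ball around x."""
    x_adv = list(x)
    for _ in range(steps):
        g = loss_grad_wrt_input(w, b, x_adv, y)
        stepped = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        # Projection: stay within eps of the original input and within [0, 1].
        x_adv = [clip(clip(s, xi - eps, xi + eps), 0.0, 1.0)
                 for s, xi in zip(stepped, x)]
    return x_adv

# Hypothetical model parameters and a clean input with true label 1.
W = [2.0, -3.0, 1.5, -1.0]
B = 0.1
X = [0.6, 0.3, 0.5, 0.4]
Y = 1

if __name__ == "__main__":
    print("clean p(class 1):", predict(W, B, X))
    print("FGSM  p(class 1):", predict(W, B, fgsm(W, B, X, Y, eps=0.15)))
    print("PGD   p(class 1):", predict(W, B, pgd(W, B, X, Y, 0.15, 0.05, 10)))
```

With these assumed values the clean input is classified as class 1, and both attacks push the prediction below 0.5 while changing no feature by more than `eps`. On a linear model FGSM and PGD end up at the same point; the iterative version matters for non-linear networks, where the gradient direction changes along the way.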
Text attacks (~5 min)
- L6.3.1 — Text attacks: character, word, and sentence-level perturbations (Theory, ~5 min, mandatory)
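
To make "character-level perturbation" concrete before L6.3.1, the sketch below swaps one Latin letter per flagged word for a visually similar Cyrillic letter. The classifier is a deliberately naive, hypothetical keyword matcher (`BLOCKLIST`, `toy_spam_score`, and `is_spam` are assumptions for illustration, not a real moderation model), and the sketch does not use the TextAttack library from lab L6.7. Word-level and sentence-level perturbations are not shown.

```python
BLOCKLIST = {"free", "winner", "prize"}  # hypothetical spam keywords

# Latin letters mapped to visually similar Cyrillic letters.
HOMOGLYPHS = {"e": "\u0435", "i": "\u0456", "o": "\u043e"}

def toy_spam_score(text):
    """Fraction of whitespace-separated tokens that match the blocklist."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    hits = sum(t.strip(".,!?") in BLOCKLIST for t in tokens)
    return hits / len(tokens)

def is_spam(text, threshold=0.2):
    return toy_spam_score(text) >= threshold

def perturb_token(token):
    """Replace the first substitutable character with its look-alike."""
    for i, ch in enumerate(token):
        if ch in HOMOGLYPHS:
            return token[:i] + HOMOGLYPHS[ch] + token[i + 1:]
    return token

def char_level_attack(text):
    """Perturb only the tokens the toy classifier would flag."""
    words = []
    for token in text.split():
        if token.lower().strip(".,!?") in BLOCKLIST:
            token = perturb_token(token)
        words.append(token)
    return " ".join(words)

MESSAGE = "you are a winner, claim your free prize now"

if __name__ == "__main__":
    adversarial = char_level_attack(MESSAGE)
    print(is_spam(MESSAGE), is_spam(adversarial))
```

The perturbed message has the same length and looks nearly identical to a human reader, yet none of its tokens match the blocklist any more. Learned text classifiers are harder to fool than an exact-match list, but the same idea (small edits that preserve meaning for the reader and change the model's input) carries over.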
Evasion in production (~5 min)
- L6.4.1 — Evasion in production: spam, content moderation, fraud (Theory, ~5 min, mandatory)
Defenses (~5 min)
- L6.5.1 — Robustness defenses: adversarial training, preprocessing, certified defenses (Theory, ~5 min, mandatory)
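
Of the three defense families named above, input preprocessing is the easiest to show in a few lines. The sketch below uses bit-depth reduction (sometimes called feature squeezing) as one example of preprocessing; the function name and the sample values are assumptions chosen for illustration, and the lesson itself may use a different technique.

```python
def squeeze_bit_depth(x, bits):
    """Quantize each value in [0, 1] to 2**bits evenly spaced levels."""
    levels = 2 ** bits - 1
    return [round(v * levels) / levels for v in x]

# Hypothetical clean "pixels" and a copy with small perturbations added.
CLEAN = [0.0, 1 / 3, 2 / 3, 1.0]
PERTURBED = [0.04, 1 / 3 - 0.05, 2 / 3 + 0.03, 0.97]

if __name__ == "__main__":
    # Perturbations smaller than half a quantization step are rounded away.
    print(squeeze_bit_depth(CLEAN, 2))
    print(squeeze_bit_depth(PERTURBED, 2))
```

The trade-off is visible even here: coarse quantization removes small perturbations, but it also discards legitimate detail, and an attacker who knows about the preprocessing step can often adapt by using larger or quantization-aware perturbations.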
Labs (~2 hrs mandatory, ~2.75 hrs with the optional lab)
- L6.6 — (Lab) FGSM/PGD attack on an image classifier (~60 min, mandatory)
- L6.7 — (Lab) TextAttack against a text classifier (~60 min, mandatory)
- L6.8 — (Lab, optional) Bypass a content moderation model (~45 min, optional)
Wrap-up
- Quiz — 12 questions, 70% to pass (~10 min, mandatory)
- Summary — bridge to Module 7 (~3 min, mandatory)
Ethics & scope
The optional "bypass content moderation" lab (L6.8) demonstrates a technique that can be misused. We use a clearly flagged open-source moderation model in a sandbox, and the learner adds nothing harmful: the lab is about evasion technique, not about generating harmful content. Re-read L0.1.
Why this module exists
Adversarial examples are the most studied class of AI attack in the academic literature, the least discussed in production security work, and the most visible in deployed classifiers once you look for them. After this module you can red-team a production computer-vision (CV) or text classifier for adversarial robustness, and you can explain to a defender what "robustness" actually means and what it costs.
What's next
Module 7 — Securing the AI Pipeline (MLSecOps & Defenses). The big defensive module. Three mandatory labs covering guardrails, eval harnesses, and observability. ~4.5 hrs.