Module 6 — Summary
Type: Theory · Duration: ~3 min · Status: Mandatory
Video script
[SLIDE 1 — Module 6 wrap]
Module 6 wrap. You ran FGSM and PGD against an image classifier and watched the attack-success curve climb from 28 percent to 99 percent as epsilon went from 0.01 to 0.03, all under perturbations too small to see. You ran TextFooler and BERT-Attack against a sentiment classifier and saw 80 percent attack success against an undefended target. You measured how far input normalization plus ensembling actually moves the numbers on toxicity classification: a meaningful but bounded amount.
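For reference, the single FGSM step from the image lab can be sketched in a few lines. This is a minimal illustration, assuming a toy binary logistic-regression model rather than the lab's image classifier. The names `predict_proba` and `fgsm`, the weights, and the input values are all hypothetical, chosen only so that the input gradient can be written out by hand.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """P(y = 1 | x) for a toy logistic-regression model (hypothetical stand-in)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fgsm(w, b, x, y, epsilon):
    """One FGSM step: x_adv = clip(x + epsilon * sign(dL/dx), 0, 1).

    For logistic regression with cross-entropy loss, dL/dx = (p - y) * w.
    """
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]
    signs = [(g > 0) - (g < 0) for g in grad]
    # Clip to [0, 1], assuming inputs are scaled like normalized pixels.
    return [min(1.0, max(0.0, xi + epsilon * s)) for xi, s in zip(x, signs)]


# Toy values, invented for illustration only (not lab results).
w, b = [2.0, -3.0, 1.0], 0.0
x, y = [0.5, 0.4, 0.5], 1
x_adv = fgsm(w, b, x, y, epsilon=0.1)

print(predict_proba(w, b, x) > 0.5)      # clean input is classified as 1
print(predict_proba(w, b, x_adv) > 0.5)  # perturbed input is classified as 0
```

Every coordinate moves by at most epsilon, which is the L-infinity budget the attack works within. PGD repeats this step with a smaller step size and projects back into the epsilon ball after each iteration.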
Two takeaways land hardest. First: adversarial examples are not patched, they are mitigated. The defensive language has to be quantitative: say "PGD attack success at epsilon 0.03 dropped from 99 percent to 20 percent at 7 points of clean-accuracy cost," not "we're robust." Second: production classifiers in adversarial settings are pipelines, not models. The classifier is one component; defense lives in the system around it (non-ML signals, ensembles, human-in-the-loop review, telemetry, iteration capacity).
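The quantitative phrasing in the first takeaway can be produced directly from evaluation outputs. The sketch below is one way to do it, under two assumptions: attack success is measured only over inputs the model classified correctly before the attack (a common convention, though the labs may define it differently), and the helper names `accuracy`, `attack_success_rate`, and `robustness_report` are hypothetical rather than part of any lab code.

```python
def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def attack_success_rate(clean_preds, adv_preds, labels):
    """Share of originally correct inputs that the attack turns into errors.

    Assumption: inputs the model already got wrong are excluded.
    """
    eligible = [(a, y) for c, a, y in zip(clean_preds, adv_preds, labels) if c == y]
    if not eligible:
        return 0.0
    return sum(a != y for a, y in eligible) / len(eligible)


def robustness_report(attack, epsilon, asr_before, asr_after,
                      clean_acc_before, clean_acc_after):
    """Format a defense result as a quantitative claim instead of 'we're robust'."""
    cost_points = (clean_acc_before - clean_acc_after) * 100
    return (
        f"{attack} attack success at epsilon {epsilon} dropped from "
        f"{asr_before:.0%} to {asr_after:.0%} at {cost_points:.0f} points "
        f"of clean-accuracy cost"
    )


# Toy predictions, invented purely for illustration (not course results).
labels      = [1, 0, 1, 1, 0]
clean_preds = [1, 0, 1, 0, 0]  # 4 of 5 correct before the attack
adv_preds   = [0, 0, 0, 0, 1]  # 3 of those 4 are wrong after the attack

print(accuracy(clean_preds, labels))                        # 0.8
print(attack_success_rate(clean_preds, adv_preds, labels))  # 0.75
```

Reporting attack success and clean accuracy together keeps the trade-off visible: a defense that lowers attack success by wrecking clean accuracy shows up immediately in the second number.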
[SLIDE 2 — What changes in Module 7]
Module 7 is the big defensive module. You've now seen the attack modules (M3-M6) plus the foundations (M0-M2). M7 productionizes the defenses: guardrails (Llama Guard, NeMo), eval harnesses (promptfoo, Garak in CI), prompt/response logging with PII redaction, AI incident response, and AI red-team program design. Three mandatory labs. About 4.5 hours. It is the most operationally relevant module in the course for a working AI security engineer.
See you in Module 7.
Slide outline
- Module 6 wrap: five-checkmark recap plus the two takeaways that land hardest.
- What's next: Module 7 teaser (the big defensive module, three mandatory labs).
Production notes
- Recording: 2-3 min raw.
- Same "Module N → Module N+1" visual convention.