Module 5 — Summary
Type: Theory · Duration: ~3 min · Status: Mandatory
Video script
[SLIDE 1 — Module 5 wrap]
Module 5 wrap. You ran a query-based extraction attack that produced a substitute of a victim classifier with a 90 percent behavioral match, using 5,000 queries. You ran membership inference against an overfit target and hit a 38 percent true-positive rate at a 1 percent false-positive rate, then watched that drop to 12 percent with light regularization. If you did the optional lab, you reproduced training-data extraction against a small LLM and ran embedding inversion to confirm that embeddings are not anonymization.
The theory layer mapped four attack classes (model extraction, membership inference, model inversion, and embedding leakage) and walked through the defenses: DP-SGD, federated learning, output granularity reduction, output PII redaction, per-tenant query monitoring, and vector-DB access control.
The hardest single takeaway from this module: most production teams in 2026 are doing none of this. The common gaps: embeddings classified as "low-sensitivity" when they're effectively PII, APIs returning full logits when a top-1 label would suffice, and no per-tenant query monitoring. Flagging these gaps in real engagements is high-ROI work.
[SLIDE 2 — What's next]
Module 6: Adversarial Examples and Evasion. Three more attack classes, all of which perturb the model's inputs at inference time to change its outputs, including image and text adversarial perturbations. Two mandatory labs. It's a shorter module than this one. See you in Module 6.
Slide outline
- Module 5 wrap — four-attack recap (extraction · MIA · inversion · embedding leak) + the "most teams don't defend any of this" landing point.
- What's next — Module 6 teaser: adversarial examples & evasion.
Production notes
- Recording: 2–3 min.
- The landing point on slide 1 ("most teams don't defend any of this") is the rhetorical hook for the rest of the course's defensive arc.