
Module 5 — Summary

Type: Theory · Duration: ~3 min · Status: Mandatory

Video script

[SLIDE 1 — Module 5 wrap]

Module 5 wrap. You ran a query-based extraction attack that, in 5,000 queries, produced a substitute that matches the victim classifier's behavior 90 percent of the time. You measured a membership inference attack (MIA) at 38 percent TPR at 1 percent FPR against an overfit target, then watched that drop to 12 percent with light regularization. If you took the optional lab, you also reproduced training-data extraction against a small LLM and ran embedding inversion to confirm that embeddings are not anonymization.
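For reference, the "TPR at 1 percent FPR" figure can be computed from per-sample attack scores along these lines. This is a minimal sketch, assuming a simple threshold attack where a higher score means "more likely a training member" (for example, the target's confidence on the true label). The function name `tpr_at_fpr` and the synthetic scores are illustrative assumptions, not the lab's code.

```python
import numpy as np

def tpr_at_fpr(member_scores, nonmember_scores, target_fpr=0.01):
    """True-positive rate of a threshold membership attack at a fixed FPR.

    Scores are per-sample attack scores where higher means "more likely a
    training member". Returns (tpr, empirical_fpr, threshold).
    """
    members = np.asarray(member_scores, dtype=float)
    nonmembers = np.asarray(nonmember_scores, dtype=float)
    # Pick the threshold so that at most ~target_fpr of non-members score above it.
    threshold = np.quantile(nonmembers, 1.0 - target_fpr)
    fpr = float(np.mean(nonmembers > threshold))  # non-members wrongly flagged
    tpr = float(np.mean(members > threshold))     # members correctly flagged
    return tpr, fpr, threshold

if __name__ == "__main__":
    # Synthetic scores (assumption): members score higher on average.
    rng = np.random.default_rng(0)
    nonmember = rng.normal(0.0, 1.0, 10_000)
    member = rng.normal(1.0, 1.0, 10_000)
    tpr, fpr, _ = tpr_at_fpr(member, nonmember)
    print(f"TPR={tpr:.2f} at FPR={fpr:.3f}")
```

An attacker guessing at random gets roughly 1 percent TPR at 1 percent FPR, which is why 38 percent signals heavy leakage and why 12 percent, while much lower, is still well above chance.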

The theory layer mapped four attack classes (model extraction, membership inference, model inversion, and embedding leakage) and walked through the defenses: DP-SGD (differentially private stochastic gradient descent), federated learning, output granularity reduction, output PII redaction, per-tenant query monitoring, and vector-DB access control.

The single hardest takeaway from this module: most production teams in 2026 are doing none of this. Embeddings classified as "low-sensitivity" when they're effectively PII. APIs returning full logits when a top-1 label would suffice. No per-tenant query monitoring. Pointing out these gaps in real engagements is high-ROI work.
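Granularity reduction, the fix for APIs that return full logits, can be as small a change as what the serving layer serializes. A minimal sketch, assuming the model exposes raw logits; `serve_prediction` and its three `granularity` levels are hypothetical names chosen for illustration, not a specific framework's API.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()  # subtract the max so exp() cannot overflow
    e = np.exp(z)
    return e / e.sum()

def serve_prediction(logits, granularity="label"):
    """Build an API response at a chosen output granularity.

    "full"    -> every class probability (maximum leakage)
    "rounded" -> top-1 label plus its confidence rounded to one decimal
    "label"   -> top-1 label only (minimum leakage)
    """
    probs = softmax(logits)
    top = int(np.argmax(probs))
    if granularity == "full":
        return {"probabilities": probs.tolist()}
    if granularity == "rounded":
        return {"label": top, "confidence": round(float(probs[top]), 1)}
    if granularity == "label":
        return {"label": top}
    raise ValueError(f"unknown granularity: {granularity!r}")

if __name__ == "__main__":
    logits = [2.0, 0.5, -1.0]
    for level in ("full", "rounded", "label"):
        print(level, serve_prediction(logits, level))
```

Label-only responses raise the query cost of extraction and weaken confidence-based membership inference, but they do not eliminate either: label-only variants of both attacks exist.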

[SLIDE 2 — What's next]

Module 6: Adversarial Examples and Evasion. Three more attack classes, all built on adversarial perturbations to image and text inputs at inference time that change what the model outputs. Two mandatory labs. It's a shorter module than this one. See you in Module 6.

Slide outline

  1. Module 5 wrap — four-attack recap (extraction · MIA · inversion · embedding leak) + the "most teams don't defend any of this" landing point.
  2. What's next — Module 6 teaser: adversarial examples & evasion.

Production notes

  • Recording: 2–3 min.
  • The landing point on slide 1 ("most teams don't defend any of this") is the rhetorical hook for the rest of the course's defensive arc.