Module 4 — Summary

Type: Theory · Duration: ~3 min · Status: Mandatory

Video script

[SLIDE 1 — Module 4 wrap]

Module 4 wrap. You executed a targeted poisoning attack and watched a 1% label flip produce a 94% blind spot while overall accuracy stayed unchanged. You planted a working BadNets-style backdoor in an image classifier and saw a yellow-square trigger cause misclassification 92% of the time on otherwise-clean inputs. You scanned weight files for malicious pickles and learned the 30-second test that catches the bulk of the threat. And if you took the optional lab, you built a working AI Bill of Materials for the Module 1 RAG. Plus the theory layer: training data poisoning, sleeper agents, harmful fine-tuning, model supply chain, dependency risk, provenance tracking.

Two takeaways from this module land harder than the others. First: backdoors survive standard alignment. The Sleeper Agents result means "we tested it and it works" is necessary but not sufficient evidence that a model is safe. Second: the AI supply chain is less mature than the software supply chain. Provenance is the foundational defense, and most teams don't have it. The AI-BOM you built is the start of fixing that.

[SLIDE 2 — What changes in Module 5]

Module 5 pivots from "the model is attacked" to "the model itself is the asset under attack." Three classes: model extraction (stealing a model's weights or functionality through API queries), model inversion (recovering training data from outputs), and membership inference (determining whether a specific record was in the training set). Two mandatory labs. The threat model is closer to classical IP theft; the techniques are AI-specific.

See you in Module 5.

Slide outline

Module 4 wrap — six-checkmark recap: Data poisoning (lab) · Backdoor (lab) · Sleeper Agents result · Harmful fine-tune · Supply chain · AI-BOM. The two hardest-landing takeaways are called out.
What's next — Module 5 teaser: extraction, inversion, membership inference.

Production notes

Recording: 2–3 min raw.
Same "Module N → Module N+1" visual convention.