Module 5 — Quiz
Type: Quiz · Duration: ~10 min · Status: Mandatory · Pass mark: 70% (9 of 12) · Module: Module 5 — Model Extraction, Inversion & Membership Inference
Question 1 (multiple choice)
Which best describes "functional-substitute extraction"?
a) Recovering the exact weight values of the target model.
b) Training a new model on the target's input/output pairs to mimic its behavior.
c) Stealing the model's source code via a repo breach.
d) Tricking the model into revealing its training data via prompts.
Answer: b
Question 2 (multiple choice)
Why is the inference API the primary attack surface for extraction?
a) Because the API has weak authentication by default.
b) Because the API is, by design, a surface that lets paying customers query the model and receive outputs; extractors use that authorized channel for an unauthorized purpose.
c) Because most APIs run on insecure protocols.
d) Because models can be downloaded from APIs.
Answer: b
Question 3 (short)
Name the four stages of a query-based functional extraction attack.
Answer: Reconnaissance; query generation; collection; training the substitute.
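The four stages can be sketched end to end on a toy problem. This is a minimal illustration under stated assumptions, not a recipe against any real service: the "target" is a hypothetical 2-D linear classifier hidden behind `target_predict`, the attacker is assumed to already know the input format, and the substitute is a plain perceptron. All names and parameters are illustrative.

```python
import random

# Hypothetical target: a linear classifier whose weights the attacker never
# sees. Only target_predict() is exposed, standing in for an inference API.
_SECRET_W = [1.5, -2.0]
_SECRET_B = 0.25


def target_predict(x):
    """Return the target's label (0/1) for a 2-D input."""
    score = _SECRET_W[0] * x[0] + _SECRET_W[1] * x[1] + _SECRET_B
    return 1 if score > 0 else 0


def extract_substitute(n_queries=2000, epochs=20, lr=0.1, seed=0):
    rng = random.Random(seed)
    # Stage 1, reconnaissance: assume the attacker has learned that inputs are
    # two real-valued features in [-1, 1] (an assumption of this toy).
    low, high = -1.0, 1.0
    # Stage 2, query generation: synthetic inputs covering that domain.
    queries = [(rng.uniform(low, high), rng.uniform(low, high))
               for _ in range(n_queries)]
    # Stage 3, collection: record the target's output for every query.
    dataset = [(x, target_predict(x)) for x in queries]
    # Stage 4, training the substitute: a perceptron fitted to the pairs.
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in dataset:
            pred = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
            err = y - pred  # 0 when correct, +1/-1 when wrong
            w[0] += lr * err * x[0]
            w[1] += lr * err * x[1]
            b += lr * err
    return w, b


def agreement(w, b, n=5000, seed=1):
    """Fraction of fresh inputs on which substitute and target agree."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        sub = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
        hits += sub == target_predict(x)
    return hits / n
```

Agreement on fresh inputs is a common way to judge a functional substitute: it measures behavioral match, not closeness of weights, which is why the substitute never needs the target's exact parameters.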
Question 4 (multiple choice)
Which mechanism makes membership inference attacks work?
a) Weak hashing of training records.
b) The train-test confidence gap: models are typically more confident on training-set inputs than on similar inputs they weren't trained on.
c) Logits being exposed in API responses.
d) Insecure cookies on the inference endpoint.
Answer: b
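A minimal sketch of the simplest attack built on the confidence gap, assuming the attacker can see the model's confidence on a record's true label. The confidence values are invented for illustration, not measurements, and the 0.9 threshold is arbitrary.

```python
def confidence_threshold_attack(confidences, threshold):
    """Predict 'member' (True) when confidence on the true label is at or
    above the threshold, exploiting the train-test confidence gap."""
    return [c >= threshold for c in confidences]


# Hypothetical confidences, NOT measured data: an overfit model tends to be
# more confident on records it was trained on.
member_conf = [0.99, 0.97, 0.95, 0.98, 0.90]
nonmember_conf = [0.70, 0.85, 0.60, 0.92, 0.75]

preds_members = confidence_threshold_attack(member_conf, 0.9)
preds_nonmembers = confidence_threshold_attack(nonmember_conf, 0.9)

tpr = sum(preds_members) / len(preds_members)        # members correctly flagged
fpr = sum(preds_nonmembers) / len(preds_nonmembers)  # non-members wrongly flagged
```

The non-member at 0.92 is flagged anyway: any confidence threshold trades missed members against wrongly flagged non-members, so how the threshold is chosen matters when MIA results are reported.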
Question 5 (multiple choice)
Which statement best distinguishes model inversion from membership inference?
a) Inversion is harder; MIA is easier.
b) Inversion recovers (approximations of) training records themselves; MIA recovers only whether a specific record was in training.
c) Inversion uses gradients; MIA uses logits.
d) Inversion is a privacy attack; MIA is a security attack.
Answer: b
Question 6 (short)
Why is "we store user queries as embeddings, so they're anonymized" not a defensible claim?
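Answer: Because embeddings retain substantial information about the underlying text, enough that inverter networks can recover ~70% of the word content. Embeddings are built so that similar inputs map to similar vectors; they preserve the content they encode rather than destroying it, and an encoding that can be inverted is not anonymization.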
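Why "similar inputs map to similar vectors" defeats anonymization can be shown with a deliberately crude toy. The sketch assumes a hypothetical bag-of-words embedding (a sum of fixed pseudo-random word vectors over a made-up vocabulary) and inverts it by scoring every vocabulary word against the stored vector. Real embedding models and real inverter networks are far more complex; the toy only shows that a deterministic encoding keeps the content it encodes.

```python
import math
import random

DIM = 1024
# Hypothetical vocabulary for a support-transcript domain.
VOCAB = ["refund", "password", "account", "invoice", "shipping", "cancel",
         "address", "email", "card", "order", "late", "broken", "login",
         "reset", "charge", "delivery", "phone", "name", "subscription", "help"]


def _word_vector(word, dim=DIM):
    """Deterministic pseudo-random unit vector per word (toy embedding)."""
    rng = random.Random(word)  # str seeds are deterministic across runs
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


WORD_VECS = {w: _word_vector(w) for w in VOCAB}


def embed(text):
    """Toy bag-of-words embedding: sum of the unit vectors of known words."""
    out = [0.0] * DIM
    for token in text.lower().split():
        if token in WORD_VECS:
            out = [a + b for a, b in zip(out, WORD_VECS[token])]
    return out


def invert(embedding, threshold=0.5):
    """Recover which vocabulary words are present from the vector alone.

    High-dimensional random vectors are nearly orthogonal, so the dot product
    is close to 1 for words in the text and close to 0 for the rest."""
    recovered = set()
    for word, vec in WORD_VECS.items():
        score = sum(a * b for a, b in zip(embedding, vec))
        if score > threshold:
            recovered.add(word)
    return recovered
```

For example, `invert(embed("Reset password for my account login"))` recovers the in-vocabulary words `reset`, `password`, `account`, and `login` without ever seeing the original string.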
Question 7 (multiple choice)
ε (epsilon) in differential privacy represents:
a) The error rate of the model.
b) The privacy budget: a smaller ε gives a stronger privacy guarantee (and typically a greater utility cost).
c) The encoder size.
d) The number of epochs.
Answer: b
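The role of ε is easiest to see in the Laplace mechanism, a standard ε-DP mechanism for numeric queries: noise is drawn with scale b = Δ/ε, where Δ is the query's L1 sensitivity. The sketch below (function names are illustrative) shows that shrinking ε inflates the noise, which is the privacy/utility trade-off in option b.

```python
import random


def laplace_scale(sensitivity, epsilon):
    """Noise scale b = sensitivity / epsilon for the Laplace mechanism."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


def laplace_mechanism(true_value, sensitivity, epsilon, rng=random):
    """Release true_value plus Laplace(0, b) noise."""
    b = laplace_scale(sensitivity, epsilon)
    # The difference of two i.i.d. exponentials with mean b is Laplace(0, b).
    noise = rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)
    return true_value + noise


def mean_abs_noise(epsilon, n=20000, seed=0):
    """Empirical average |noise| for a sensitivity-1 query (expected value b)."""
    rng = random.Random(seed)
    total = sum(abs(laplace_mechanism(0.0, 1.0, epsilon, rng)) for _ in range(n))
    return total / n
```

For a counting query (sensitivity 1), ε = 1.0 gives noise of typical size about 1, while ε = 0.1 gives noise of typical size about 10: ten times stronger privacy costs roughly ten times the error.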
Question 8 (multiple choice)
True or false: federated learning by itself defends against membership inference attacks.
a) True
b) False
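Answer: b (False)
Why: The model updates (gradients or weights) that clients share during FL can leak membership information, and the final global model can still be attacked with MIA like any other trained model. Combining FL with differential privacy (DP-FL) is what provides the defense.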
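A minimal sketch of why raw FL updates are not private, assuming the simplest possible setting: a linear model with squared loss, a client batch of one example, and no secure aggregation or DP. In that case the weight gradient equals the bias gradient times the input, so the server can divide one by the other and recover the private input exactly. Deep models and larger batches make reconstruction harder but do not by themselves guarantee privacy. The function names are hypothetical.

```python
def client_update(w, b, x, y):
    """Gradient of 0.5 * (w.x + b - y)^2 for ONE private example (x, y):
    the update a client might send to the server in plain FL."""
    residual = sum(wi * xi for wi, xi in zip(w, x)) + b - y
    grad_w = [residual * xi for xi in x]  # dL/dw = residual * x
    grad_b = residual                     # dL/db = residual
    return grad_w, grad_b


def reconstruct_input(grad_w, grad_b):
    """Server side: recover x exactly, since grad_w = grad_b * x."""
    if grad_b == 0:
        raise ValueError("zero residual: this update carries no signal about x")
    return [g / grad_b for g in grad_w]
```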
Question 9 (scenario — short)
A startup ships an LLM-powered medical-summarization product. They use a vendor base model + their own fine-tune on de-identified patient records. Name two attack classes from Module 5 they should defend against, and one defense per class.
Sample answer:
- Training-data extraction (L5.3.1): patient-record text could be extracted from the fine-tuned model. Defense: DP-SGD on the fine-tune (L5.4.1) plus output PII redaction (L5.4.2).
- Membership inference (L5.2.1): even with de-identified records, MIA can confirm that a specific patient's record was used. Defense: regularization plus DP-SGD on the fine-tune; reduced-granularity confidence outputs (L5.4.2).
Bonus credit for noting that "de-identified" data may still be re-identifiable via MIA combined with auxiliary data.
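The DP-SGD defense in the sample answer rests on two operations per batch: clip each example's gradient to a maximum norm C, then add Gaussian noise scaled to C before averaging. The sketch below shows one such step on plain Python lists. It omits privacy accounting (tracking the ε actually spent across steps), which production libraries such as Opacus or TensorFlow Privacy handle; the function names and parameters are illustrative.

```python
import math
import random


def clip(grad, max_norm):
    """Scale grad down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]


def dp_sgd_step(params, per_example_grads, lr, max_norm, noise_multiplier, rng):
    # 1) Clip each example's gradient: no single record can move the
    #    update by more than max_norm.
    clipped = [clip(g, max_norm) for g in per_example_grads]
    # 2) Sum, then add Gaussian noise with std = noise_multiplier * max_norm.
    dim = len(params)
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    noisy = [s + rng.gauss(0.0, noise_multiplier * max_norm) for s in summed]
    # 3) Average over the batch and take an ordinary gradient step.
    n = len(per_example_grads)
    return [p - lr * v / n for p, v in zip(params, noisy)]
```

Clipping is what bounds one record's influence (its sensitivity); the noise is what hides whether that record was present, which is the property MIA tries to break.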
Question 10 (multiple choice)
Which operational defense is most effective against extraction?
a) Strict rate limits alone (attackers distribute across accounts).
b) Output noise injection (degrades legitimate users; sophisticated extractors average over it).
c) Per-tenant query monitoring with anomaly detection (diversity, pattern, output-adaptation signals).
d) Disabling the API.
Answer: c
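A minimal sketch of per-tenant monitoring, assuming a sliding window of recent query strings per tenant and a single crude diversity signal (the fraction of distinct queries). The class name and thresholds are hypothetical. A real deployment would measure diversity in embedding space, combine it with the pattern and output-adaptation signals from option c, and still need to correlate tenants to catch attacks spread across accounts.

```python
from collections import defaultdict, deque


class TenantQueryMonitor:
    """Sliding-window, per-tenant flagging of extraction-like traffic.

    Thresholds are hypothetical placeholders, to be tuned on real traffic.
    """

    def __init__(self, window=1000, min_volume=500, max_distinct_ratio=0.95):
        self.min_volume = min_volume
        self.max_distinct_ratio = max_distinct_ratio
        # Each tenant keeps only its most recent `window` queries.
        self.history = defaultdict(lambda: deque(maxlen=window))

    def record(self, tenant_id, query):
        self.history[tenant_id].append(query)

    def is_suspicious(self, tenant_id):
        q = self.history[tenant_id]
        if len(q) < self.min_volume:  # too little traffic to judge
            return False
        distinct_ratio = len(set(q)) / len(q)
        return distinct_ratio > self.max_distinct_ratio
```

The design bet is an asymmetry: legitimate traffic tends to repeat (the same FAQs asked many times), while synthetic extraction queries rarely do. The same signal can misfire on naturally diverse workloads, which is why it should feed an anomaly score rather than trigger blocking on its own.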
Question 11 (short)
What's the "TPR @ FPR = 1%" metric, and why is it the metric to internalize for MIA?
Answer: TPR @ FPR = 1% is the true-positive rate measured at a false-positive rate of 1%: the fraction of actual members the attacker correctly flags when the decision threshold is set so that only 1% of non-members are wrongly flagged. It is the metric to internalize because a practical attacker needs confident hits with few false positives (false positives waste effort and can expose the attack). Aggregate "attack accuracy" averages over all records, so it can look close to chance even when the attack reliably identifies a subset of members; it therefore understates the threat.
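A sketch of how TPR @ FPR = 1% can be computed from attack scores, assuming higher scores mean "more likely a member" and that the attacker flags every record scoring at or above a threshold. It tries each observed score as a candidate threshold and keeps the best TPR whose FPR stays within budget. The function name and inputs are illustrative.

```python
def tpr_at_fpr(member_scores, nonmember_scores, target_fpr=0.01):
    """Highest TPR achievable with FPR <= target_fpr, flagging 'member'
    when score >= threshold (higher score = more member-like)."""
    best_tpr = 0.0  # flagging nothing: TPR 0 at FPR 0
    for t in sorted(set(member_scores) | set(nonmember_scores)):
        fpr = sum(s >= t for s in nonmember_scores) / len(nonmember_scores)
        if fpr <= target_fpr:
            tpr = sum(s >= t for s in member_scores) / len(member_scores)
            best_tpr = max(best_tpr, tpr)
    return best_tpr


# Illustrative scores only: 100 non-members spread evenly over [0, 1),
# 5 members scoring above all of them, 95 members at 0.5.
nonmembers = [i / 100 for i in range(100)]
members = [0.995] * 5 + [0.5] * 95
```

On these made-up scores, `tpr_at_fpr(members, nonmembers)` returns 0.05: the attack confidently exposes 5% of members at 1% FPR, a result that a single aggregate accuracy figure would blur together with the unidentifiable majority.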
Question 12 (scenario — short)
You're reviewing a production RAG product. The vector DB stores embeddings of customer support transcripts. The team has classified the DB as "low sensitivity" because "it's not the original text." Identify the finding, the multi-framework citation, and the remediation.
Sample answer:
- Finding: Embeddings of customer support transcripts retain substantial information about the underlying text. Inversion attacks can recover much of it; Morris et al. (2023) reconstruct 92% of 32-token inputs exactly with an iterative inversion method (Vec2Text). The vector DB is effectively a PII store and should be classified as such.
- Citations: ATLAS AML.T0024.001 (Invert ML Model); OWASP LLM06:2023 (Sensitive Information Disclosure; LLM02 in the 2025 list, where LLM08 Vector and Embedding Weaknesses also applies); NIST AI RMF Measure 2.10 (privacy risk examined and documented); EU AI Act Article 10 (data and data governance) if the system is classified as high-risk.
- Remediation: Re-classify the vector DB as PII. Enforce strict access control, audit logging, and tenant boundaries. Apply output PII redaction when embedding-derived content reaches users. Consider lower-dimension embeddings or process-and-discard for the highest-sensitivity content.
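Output PII redaction, named in the remediation, can be sketched as a filter applied to text just before it reaches the user. The two regular expressions below are hypothetical and deliberately narrow (email addresses and phone-like digit runs). They will miss names and street addresses and can over-match long numbers such as order IDs, so treat this as a placeholder for a vetted PII detector.

```python
import re

# Hypothetical, deliberately simple patterns; NOT a complete PII detector.
_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # Optional '+', then 9+ characters of digits/spaces/()/./-, ending on a digit.
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text):
    """Replace matched PII spans with [TYPE] placeholders before output."""
    for label, pattern in _PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

For example, `redact("Contact jane.doe@example.com or +1 (555) 010-2345 today.")` yields `"Contact [EMAIL] or [PHONE] today."`.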
Scoring
- 12 questions, 1 point each.
- 70% to pass (9 of 12).
- LMS auto-grades Q1, Q2, Q4, Q5, Q7, Q8, Q10 (multiple choice).
- Q3, Q6, and Q11 are auto-graded on key-phrase match.
- Q9 and Q12 require rubric-based grading.
- Two attempts; after a second failure, re-review L5.3 and L5.4 before retaking.