L10.3 — Answer key and explanations¶

Type: Reference · Status: Consult after attempting L10.2

Each entry: correct answer · lesson reference · one-line explanation (longer where the question rewards explanation). Use this both to score yourself and to identify weak modules to revisit before any retake.

Part A — Single-answer multiple choice¶

Module 0¶

1. B — Python 3.11+, Docker, API keys for at least one hosted LLM, and Ollama for a local LLM. (L0.3) The course is designed so the most expensive piece — a cloud GPU — is not required. Ollama + a small local model + an inexpensive API key covers every mandatory lab.

Module 1¶

2. B — Tokenization is splitting text into model-input tokens. (L1.3) A is a generic security misuse of "token"; C describes decoding; D describes pruning.

3. C — Overfitting masked by evaluating on training data. (L1.1) Train/test split is the most fundamental ML hygiene; failing it produces optimistic eval numbers that don't survive production. Poisoning (B) would have to be planted in the train set, which is a stretch here.

4. B — Temperature scales decoding-time softmax logits. (L1.3) Lower temp → more deterministic outputs. Security-relevant because reproducibility of evals, guardrail testing, and red-team probes all need controllable determinism. Default to temperature=0 for security evals.

5. A — Embed → retrieve → augment prompt → generate. (L1.7) This is the canonical RAG flow. Embed the query → retrieve top-k from the vector DB → augment the LLM prompt with retrieved chunks → generate the answer.

Module 2¶

6. B — Model manipulation (the M in STRIDE-MA). (L2.1) STRIDE-MA adds Model manipulation and Agency abuse to the classic STRIDE letters. Without M and A, classic STRIDE misses the AI-specific threats.

7. C — AI Model Inference Manipulation. (L2.3) This ATLAS tactic family covers inference-time crafted inputs (e.g., prompt injection, adversarial examples). Reconnaissance and ML Attack Staging are earlier tactic stages; Persistence is unrelated.

8. B — OWASP LLM01 is Prompt Injection. (L2.4) LLM01 explicitly covers both direct and indirect. (LLM02 in the current OWASP version is Sensitive Information Disclosure; in earlier versions it was Insecure Output Handling. This course standardizes on the 2025 list — confirm what your version says, but LLM01 = prompt injection is stable across versions.)

9. D — Govern. (L2.5, L8.1) The four NIST AI RMF core functions are Govern, Map, Measure, Manage. Govern establishes the policy and accountability layer that the other three operate within.

10. C — A credit-scoring system for natural persons. (L2.5, L8.2) Annex III of the EU AI Act explicitly lists creditworthiness assessment as high-risk. A grammar checker (D) and a game NPC (B) are minimal-risk; a personal spam filter (A) is also minimal-risk.

11. B — Draw the DFD with trust boundaries first. (L2.6) The DFD is the substrate. Without it, your STRIDE table will be ad hoc and almost certainly miss attack surfaces. Garak, pentest, and IR playbook all come after threat modeling.

Module 3¶

12. B — Hidden text in a retrieved webpage instructing the agent to exfiltrate cookies. (L3.2) The key word is "indirect" — the injection arrives via a content source the model retrieves, not via the user's direct prompt. A is direct PI; C and D are unrelated attack classes.

13. A — Insecure output handling (LLM03 in the standard OWASP list — Insecure Output Handling). (L3.3) The bug is that the application treats LLM output as trusted code. The fix is to never execute LLM output against a sensitive system; use parameterized queries, structured output enforcement, and treat LLM output as untrusted data.

14. B — Over-permissioned/over-autonomous agents. (L3.4) Excessive Agency (LLM08) is literally about granting agents more tools, permissions, or autonomy than their threat surface justifies. Apply the principle of least authority to agent tools.

15. A — A keyword denylist for "ignore previous instructions." (L3.9) Denylists are trivially bypassable (paraphrase, encode, translate). The other three options (structured output, provenance + system prompt instruction, dual-LLM) are real, defense-in-depth controls. A is the least effective by a wide margin.

16. A — Llama Guard. (L7.3) Llama Guard is Meta's input/output safety classifier. The others are unrelated frameworks/databases.

17. B — Indirect prompt injection combined with excessive agency. (L3.2 + L3.4) Two failures combine: (1) the retrieved web content was treated as instruction, not data; (2) the send_email tool fires without human-in-the-loop confirmation. Both must be fixed.

18. B — Jailbreaks bypass model safety alignment; prompt injections override the application's intent. (L3.1) The line is blurry in practice, but the core distinction is: jailbreak attacks the model's alignment; prompt injection attacks the application's control flow.

19. A — Garak. (L7.5) Garak (and PyRIT, promptfoo) are the major OSS LLM red-team automation tools. The other options are made-up.

20. B — System prompts contain business logic, persona, refusal triggers, sometimes secrets — leaking them lowers cost of subsequent attacks. (L3.5) The "system prompt is not a secret" framing is half-true (you should never embed secrets in it), but in practice system prompts encode IP and attack-surface information.

Module 4¶

21. B — Normal behavior on clean inputs; attacker-chosen behavior on trigger inputs. (L4.2) This is the textbook definition. BadNets, sleeper agents, and image-trigger backdoors all follow this pattern.

22. B — pickle.load executes arbitrary code from the serialized payload. (L4.4) This is the killer. Always prefer safetensors (data-only, no code execution) for weights. Scan everything you can't avoid loading via pickle.

23. B — Prefer safetensors, scan with picklescan/modelscan, pin commit hashes, sandbox. (L4.4, L4.8) This is the defense-in-depth stack. HTTPS (A) is necessary but insufficient. Antivirus (C) is largely irrelevant. Popularity (D) is not safety — popular pickle files have been weaponized.

24. B — Targeted poisoning aims for specific misbehavior; untargeted degrades accuracy. (L4.1) Targeted is often the more dangerous class because it's harder to detect — overall accuracy can look fine.

25. B — Inventory of every model, dataset, training script, dependency. (L4.9, L8.4) AI-BOM (CycloneDX-AI is the emerging format) is the AI equivalent of SBOM. You need it to respond to vulnerability disclosures and supply-chain incidents.

26. B — Backdoor behaviors can persist through subsequent safety training. (L4.2) This is the headline finding from the Anthropic 2024 paper. Implication: "RLHF will catch it" is not a reliable mitigation against an upstream-poisoned model.

Module 5¶

27. B — Query-only access; train a substitute model approximating the target. (L5.1) No weight access, no insider — just API queries. The economics depend heavily on how much the API reveals per query (see Q44).

28. A — Whether a particular record was in the training data. (L5.2) Membership inference is per-record. It matters when training data is sensitive (PHI, PII) or proprietary.

29. B — Overfit models produce systematically different outputs on members vs non-members. (L5.2) The lower training loss / higher confidence on members is the signal. Less overfitting → weaker signal → harder MI attacks. This is part of why regularization and DP-SGD reduce MI risk.

30. B — Membership inference + training-data extraction. (L5.4) DP-SGD adds calibrated noise to gradients so any single example has bounded influence on parameters. This blunts both attacks.

31. B — Models memorize and regurgitate rare/repeated training sequences. (L5.3) Carlini et al. showed that GPT-2 and successors memorize verbatim, especially low-entropy strings (license keys, PII, code). Probes that ask for completions of partial unique strings can extract these.

Module 6¶

32. B — Small, often imperceptible perturbations causing misclassification. (L6.1) This is the defining characteristic. The model classifies an image of a panda as a gibbon with high confidence; the human still sees a panda.

33. B — FGSM = weak baseline (single-step), PGD = iterative (much stronger), AutoAttack = parameter-free ensemble (current strong baseline). (L6.2) If a defense only beats FGSM, it's not a real defense. Evaluate against AutoAttack.

34. B — Input-output access only; rely on transferability or query-based estimation. (L6.1) Black-box assumes no gradients, no weights. White-box has full access. Gray-box has partial (e.g., architecture but not weights).

35. B — Layered: adversarial training + preprocessing + confidence thresholds + drift monitoring. (L6.4) No single defense is sufficient. Hiding architecture (D) is security through obscurity. Bigger models (C) are not inherently robust. WAF (A) doesn't understand model inputs.

Module 7¶

36. B — Reduces insecure output handling and a class of PI by constraining the output channel. (L7.3) If your LLM can only emit JSON matching a schema, downstream systems can validate it and treat it as data. Off-schema → reject. This collapses a large class of prompt-injection-leveraged exploits.

37. B — Privileged orchestrator + quarantined LLM for untrusted content. (L7.3) Simon Willison's framing. The quarantined LLM never sees credentials or has tools; its output is constrained (e.g., a summary or a labeled extraction) before reaching the privileged LLM.

38. B — Reduce regulatory and blast-radius exposure. (L7.4) Without redaction, your prompt/response logs become a secondary PHI/PII store with the same protections needed as production. With redaction, logs remain useful for security analysis without being a privacy liability.

39. B — Verifying that the deployed model is the artifact produced by your trusted build pipeline. (L7.2) Sigstore-for-models adds cryptographic signing and provenance attestation to models, the equivalent of Sigstore for container images.

40. B — IR additions: rotate system prompt, disable specific tools, snapshot prompt/response logs, consider model rollback. (L7.6) These are the AI-specific containment actions a generic IR playbook lacks. The other options are generic IR concerns that apply to any incident.

Part B — Scenario-based questions¶

41. B — LLM01 (indirect PI via RAG) + provenance/trust labeling + system instructions + rendered editorial review. (L3.2, L3.9, M9 capstone) The hidden-text trick is a classic indirect injection technique. The fix combines provenance, system-prompt discipline, and editorial review of rendered content, not just source.

42. B — Targeted backdoor via poisoned training data. (L4.1, L4.2) The signature — normal-looking model + specific trigger phrase producing systematic misbehavior — is a textbook backdoor. Public-dataset-no-integrity-check is the entry point.

43. B — High-risk obligations (Articles 9, 11, 13, 14, 15) + potential Article 79 market-surveillance engagement. (L8.2) Credit scoring is high-risk under Annex III. Article 5 (prohibited) doesn't apply here (this isn't social scoring). The bank must demonstrate risk management, technical documentation, human oversight, accuracy/robustness/cybersecurity, and transparency to affected persons.

44. B — Returning full probability distributions vs only top-1 labels. (L5.1) This is the single highest-leverage attacker-favoring detail. Full softmax outputs let the attacker fit a much closer surrogate per query, often reducing required query counts by orders of magnitude. Production fix: return top-1 label only, or top-k labels without probabilities.

45. B — Runtime guardrails (input filtering + structured output + human-in-the-loop on state-mutating tools) + prompt/response logging with PII redaction. (L7.3, L7.4, M9 capstone) This single integrated control layer addresses prompt injection, insecure output handling, excessive agency, and observability/incident-response gaps simultaneously — the highest worst-case-blast-radius reduction per unit time, especially in a healthcare setting.

46. B — Upstream model poisoning; controls: pin commit hashes, scan model artifacts, prefer signed/reproducible artifacts, include backdoor-trigger probing in red-team checklist. (L4.2, L4.4, L4.8) A trigger-string backdoor in a fine-tuned model overwhelmingly points to a poisoned base model. The fix is supply-chain hygiene + behavioral probing before promotion.

47. B — Inventory (AI-BOM) → risk register → threat-model prod feature → AI IR playbook → define launch gates → baseline logging → then automated red-team + guardrails on highest-risk feature. (M2, M7, M8) This is what a first AI security engineer realistically delivers in 90 days. Buying tools (C) or waiting for regulators (D) are non-answers. External red-team (A) is valuable but premature when you don't yet have an inventory or a risk register.

48. B — LLM06 cross-tenant disclosure; mitigations: enforce tenant isolation in retrieval + per-chunk tenant tag validation + retrieval provenance logging. (L3.2, M9 capstone) Treating it as "just an infrastructure bug" misses the AI-layer impact: the LLM amplifies the leak by composing across tenant content in its output.

49. B — System prompt leak is a roadmap for the next attack, exposes internal attack-surface details (API URL), and may contain IP. (L3.5) The leak is rarely "just a leak"; it's an attack-cost reducer. Treat as Medium+ unless you can show the prompt is genuinely public/non-load-bearing.

50. B — Automated red-team via Garak + custom probes hooked into CI, promptfoo for continuous eval, short procurement-facing one-pager mapped to NIST AI RMF + OWASP LLM Top 10, annual external red-team for the highest-risk feature only. (L7.5, L7.10, L8.4) This is the highest outcome-per-hour-invested combination for a resource-constrained org with real customers and procurement pressure.

Per-module score worksheet¶

After self-scoring, mark how you did per module. Re-study any module where you missed more than half the questions before a retake.

Module	Questions	Your correct	Total
M0	1	___	1
M1	2–5	___	4
M2	6–11	___	6
M3	12–20	___	9
M4	21–26	___	6
M5	27–31	___	5
M6	32–35	___	4
M7	36–40	___	5
M8	43	___	1 (others embedded in scenarios)
Scenarios	41, 42, 44–50	___	9
Total		___	50

Pass: ≥ 38 correct (75%).

When ready, open L10.4-scoring-and-retake-policy.md for the formal pass/fail decision and next steps.