Module 6: Quiz

Type: Quiz · Duration: ~10 min · Status: Mandatory · Pass mark: 70% (9 of 12) · Module: Module 6, Adversarial Examples & Evasion

Question 1 (multiple choice)

Why are adversarial examples a structural property of neural networks rather than a fixable bug?

a) Most networks use insecure activation functions.
b) High-dimensional input spaces always have directions where small movements cross the decision boundary, and the linear behavior of network layers accumulates small perturbations into large output changes.
c) Networks are trained without adversarial regularization.
d) Adversarial examples disappear after sufficient training.

Answer: b
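
The accumulation argument in option (b) can be made concrete for a single linear unit. The following is a sketch, assuming an L∞ budget ε and a nonzero weight vector w with n entries; the symbols w, δ, n and m (the mean absolute weight) are illustrative and do not come from the module text.

```latex
\begin{align*}
\delta &= \varepsilon \,\operatorname{sign}(w), \qquad \|\delta\|_\infty = \varepsilon \\
w^\top (x + \delta) - w^\top x &= \varepsilon \sum_{i=1}^{n} |w_i|
  = \varepsilon \,\|w\|_1 = \varepsilon \, n \, m,
  \qquad m = \frac{1}{n}\sum_{i=1}^{n} |w_i|
\end{align*}
```

Each coordinate moves by at most ε, yet the pre-activation shifts by an amount that grows linearly with the input dimension n.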

Question 2 (short)

Distinguish white-box from black-box adversarial attacks.

Answer: White-box: the attacker has full model access (weights, gradients) and computes perturbations directly from the gradients. Black-box: the attacker has only API access and relies on query-based gradient estimation or on transfer from a substitute model.
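
To make the distinction concrete, here is a minimal sketch that contrasts an analytic (white-box) gradient with a query-based (black-box) estimate. The hidden weights, the `query_score` function, and the step size `h` are hypothetical stand-ins for a real prediction API; finite differences are only one of several query-based estimators.

```python
import math

# Hypothetical hidden model: a black-box attacker cannot read these weights.
_HIDDEN_W = [1.5, -2.0, 0.5]


def query_score(x):
    """Stand-in for a prediction API that returns only a confidence score."""
    z = sum(w * xi for w, xi in zip(_HIDDEN_W, x))
    return 1.0 / (1.0 + math.exp(-z))


def white_box_gradient(x):
    """White-box: differentiate the known model analytically."""
    s = query_score(x)
    return [s * (1.0 - s) * w for w in _HIDDEN_W]


def black_box_gradient(query, x, h=1e-4):
    """Black-box: central finite differences using 2 * len(x) score queries."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((query(up) - query(down)) / (2.0 * h))
    return grad


x = [0.2, -0.1, 0.4]
exact = white_box_gradient(x)
estimated = black_box_gradient(query_score, x)
```

The estimate costs two queries per input dimension, which is why query budgets and rate limits matter in the black-box setting.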

Question 3 (multiple choice)

What does PGD do that FGSM doesn't?

a) Uses random initialization.
b) Iterates the gradient-sign update with small steps and projects back into the perturbation budget after each step, exploring the local loss landscape rather than moving once greedily.
c) Uses L2 norm exclusively.
d) Skips the gradient computation.

Answer: b
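
The difference can be sketched on a toy logistic model with an analytic input gradient. Everything here (the weights, the input, `alpha`, and `steps`) is an illustrative assumption. Because this toy model is linear, both attacks end at the same corner of the budget; the benefit of PGD's iteration appears on non-linear models, where the gradient changes along the path.

```python
import math


def loss_and_grad(x, y, w, b):
    """Binary cross-entropy of a logistic model and its gradient w.r.t. x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    grad = [(p - y) * wi for wi in w]
    return loss, grad


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(x, y, w, b, eps):
    """One gradient-sign step of size eps."""
    _, g = loss_and_grad(x, y, w, b)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]


def pgd(x, y, w, b, eps, alpha, steps):
    """Repeated small gradient-sign steps, each projected into the eps-ball."""
    x_adv = list(x)
    for _ in range(steps):
        _, g = loss_and_grad(x_adv, y, w, b)
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        # Projection: clip every coordinate to [x0 - eps, x0 + eps].
        x_adv = [min(max(xa, x0 - eps), x0 + eps) for xa, x0 in zip(x_adv, x)]
    return x_adv


x, y = [0.2, -0.1, 0.4], 1
w, b = [1.5, -2.0, 0.5], 0.1
eps = 0.03
x_fgsm = fgsm(x, y, w, b, eps)
x_pgd = pgd(x, y, w, b, eps, alpha=0.01, steps=10)
```

A real image attack would also clip the result to the valid pixel range, which this sketch leaves out.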

Question 4 (multiple choice)

The ε (epsilon) parameter in FGSM/PGD represents:

a) The number of iterations.
b) The perturbation budget: the maximum allowed change per pixel (under the L∞ norm).
c) The learning rate.
d) The classifier confidence threshold.

Answer: b

Question 5 (short)

Why are text adversarial examples harder to craft than image adversarial examples?

Answer: Text is discrete (tokens), so gradient steps cannot be applied directly to the input; perturbations are also usually visible to humans rather than imperceptible (changing a word is obvious).

Question 6 (multiple choice)

TextFooler is a:

a) Character-level attack using typo substitutions.
b) Word-level attack using embedding-similarity-based synonym substitution.
c) Sentence-level paraphrase attack.
d) Defense technique.

Answer: b
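
The core idea of embedding-based synonym substitution can be sketched as follows. This is a simplified illustration, not TextFooler itself: the published attack also ranks words by importance and filters candidates by part of speech and sentence-level similarity. The toy embeddings, the `min_sim` threshold, and the keyword classifier are all hypothetical.

```python
import math

# Hypothetical 2-D word embeddings; real attacks use learned vectors.
EMB = {
    "terrible": (0.90, 0.10),
    "awful": (0.88, 0.15),
    "dreadful": (0.85, 0.20),
    "great": (-0.90, 0.10),
    "movie": (0.00, 1.00),
    "film": (0.05, 0.98),
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def synonyms(word, min_sim=0.95):
    """Neighbors of `word` in embedding space, most similar first."""
    if word not in EMB:
        return []
    scored = [(cosine(EMB[word], EMB[o]), o) for o in EMB if o != word]
    return [o for s, o in sorted(scored, reverse=True) if s >= min_sim]


def toy_classifier(tokens):
    """Hypothetical brittle model: reacts only to two exact keywords."""
    hit = "terrible" in tokens or "awful" in tokens
    return "negative" if hit else "positive"


def attack(tokens):
    """Swap one word for a near neighbor until the predicted label flips."""
    original = toy_classifier(tokens)
    for i, word in enumerate(tokens):
        for cand in synonyms(word):
            trial = tokens[:i] + [cand] + tokens[i + 1:]
            if toy_classifier(trial) != original:
                return trial
    return None


adversarial = attack(["a", "terrible", "movie"])
```

Here the first candidate ("awful") is still caught, while the second ("dreadful") keeps the meaning and flips the label.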

Question 7 (multiple choice)

A vendor claims "we're robust because we adversarially trained against FGSM." Credibility?

a) High: FGSM is the standard.
b) Low: FGSM is a weak baseline, and PGD will defeat FGSM-AT. The modern credible baseline is PGD-AT evaluated against AutoAttack.
c) Cannot tell without more information.
d) Always high if vendor publishes a paper.

Answer: b

Question 8 (multiple choice)

Transferability matters because:

a) It makes attacks faster.
b) It breaks the "we don't expose model internals" defense: attackers can use public substitutes (or extracted models, see M5) to craft transferable attacks.
c) It only applies to image classifiers.
d) It only works on a single model family.

Answer: b
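
A minimal sketch of a transfer attack, assuming two linear classifiers whose weights are similar but not identical. The weights, the input, and `eps` are made up for illustration; the attacker reads only the substitute's weights, and the target is used only to check whether the example transfers.

```python
def predict(w, b, x):
    """Linear classifier: label 1 if the score is positive, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def sign(v):
    return (v > 0) - (v < 0)


def fgsm_on_linear(w, x, y, eps):
    """FGSM for a logistic/linear scorer: the loss gradient w.r.t. x has
    sign -sign(w) for label 1 and +sign(w) for label 0."""
    direction = -1 if y == 1 else 1
    return [xi + eps * direction * sign(wi) for xi, wi in zip(x, w)]


# Substitute: a model the attacker controls (public or extracted).
SUBSTITUTE_W, SUBSTITUTE_B = [1.1, -0.9], 0.0
# Target: the deployed model, whose internals stay hidden.
TARGET_W, TARGET_B = [1.0, -0.8], -0.05

x, y = [0.5, 0.5], 1
x_adv = fgsm_on_linear(SUBSTITUTE_W, x, y, eps=0.1)

target_before = predict(TARGET_W, TARGET_B, x)
target_after = predict(TARGET_W, TARGET_B, x_adv)
```

The perturbation is computed without touching the target, yet it changes the target's prediction because both models learned a similar decision boundary.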

Question 9 (short)

Name the three robustness-defense categories and one trade-off each.

Answer:
- Adversarial training (PGD-AT): ~30% utility cost, expensive to train.
- Input preprocessing: low utility cost, but defeats only weak attacks (BPDA circumvents non-differentiable preprocessing).
- Certified defenses (randomized smoothing): substantial utility/latency cost, provably robust only within the certified radius.
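
The latency cost of randomized smoothing comes from its prediction rule, sketched below: the smoothed classifier takes a majority vote of the base model over many Gaussian-noised copies of the input. The base classifier, `sigma`, and `n_samples` are hypothetical. An actual certificate additionally needs a statistical lower bound on the top-class probability, which this sketch omits.

```python
import random
from collections import Counter


def base_classifier(x):
    """Hypothetical base model: a fixed threshold on the first feature."""
    return 1 if x[0] > 0.0 else 0


def smoothed_predict(x, sigma, n_samples, rng):
    """Majority vote of the base model over Gaussian-noised copies of x."""
    votes = Counter(
        base_classifier([xi + rng.gauss(0.0, sigma) for xi in x])
        for _ in range(n_samples)
    )
    label, count = votes.most_common(1)[0]
    return label, count / n_samples


rng = random.Random(0)  # seeded so the example is reproducible
label, vote_share = smoothed_predict(
    [2.0, -1.0], sigma=0.5, n_samples=200, rng=rng
)
```

Every prediction costs `n_samples` forward passes of the base model, and larger `sigma` values buy a larger certifiable radius at the price of clean accuracy.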

Question 10 (multiple choice)

True or false: a model defended against PGD is automatically robust against AutoAttack.

a) True
b) False

Answer: b (False)

Why: AutoAttack is an ensemble of strong attacks specifically designed to find weaknesses that PGD misses. PGD-AT models often have substantially lower accuracy against AutoAttack than against PGD alone.

Question 11 (scenario, short)

You're reviewing a production fraud-detection classifier. The team says "we measured robustness with FGSM and got 90% adversarial accuracy at ε=0.03, we're solid." How do you respond?

Sample answer: That's an FGSM result, and FGSM is a weak baseline; PGD at the same ε would likely show substantially lower adversarial accuracy. Recommend re-measuring with PGD (40 iterations) and AutoAttack against the same model. Also recommend layering non-ML signals (transaction velocity, network analysis, reputation), since single-classifier defenses fail against sophisticated attackers in fraud-detection settings. The arms-race framing (L6.4.1) is appropriate here.

Question 12 (scenario, short)

A content moderation platform asks for your six-step defensive recommendation for their toxicity classifier. List them.

Sample answer:
1. Never rely on a single classifier: ensemble 2-3 differently trained models.
2. Layer non-ML signals (account age, posting velocity, network, reputation).
3. Keep a human in the loop on edge cases, and feed their decisions back into retraining.
4. Adversarially train periodically on TextAttack-generated examples.
5. Collect telemetry on evasion attempts, with cluster detection for coordinated campaigns.
6. Recognize the arms race: residual evasion is operational reality; defense is about iteration capacity and layered controls, not zero evasion.
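
Steps 1 to 3 can be sketched as a single decision function. This is an illustration under stated assumptions, not a recommended production design: the two toy classifiers, the `AccountSignals` fields, and the thresholds (7 days, 30 posts per hour) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AccountSignals:
    account_age_days: int
    posts_last_hour: int


def keyword_model(text):
    """Toy classifier that matches the literal keyword only."""
    return "idiot" in text.lower()


def normalizing_model(text):
    """Toy classifier that undoes a simple digit-for-letter swap first."""
    return "idiot" in text.lower().replace("1", "i")


def moderate(text, classifiers, signals, review_queue):
    """Return 'block', 'review', or 'allow' from votes plus non-ML signals."""
    votes = sum(1 for clf in classifiers if clf(text))
    if votes == len(classifiers):
        return "block"  # unanimous toxic verdict
    risky = signals.account_age_days < 7 or signals.posts_last_hour > 30
    if votes > 0 or risky:
        # Model disagreement or a risky account goes to a human reviewer.
        review_queue.append(text)
        return "review"
    return "allow"


models = [keyword_model, normalizing_model]
queue = []
established = AccountSignals(account_age_days=400, posts_last_hour=2)
evasion_verdict = moderate("you 1d1ot", models, established, queue)
```

The obfuscated message evades one model but not the other, so the disagreement routes it to human review instead of letting it through; the review queue is also where retraining examples would come from.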

Scoring

12 questions, 1 point each. 70% to pass.
Auto-grade Q1, Q3, Q4, Q6, Q7, Q8, Q10. Key-phrase grade Q2, Q5, Q9. Rubric-grade Q11, Q12.