L6.2.1 — Image attacks: FGSM, PGD, and beyond¶

Type: Theory · Duration: ~5 min · Status: Mandatory · Module: Module 6 — Adversarial Examples & Evasion · Framework tags: MITRE ATLAS AML.T0015

Learning objectives¶

Describe FGSM (Fast Gradient Sign Method) and PGD (Projected Gradient Descent) at a depth sufficient to run them in lab L6.6.
Identify the ε (perturbation budget) parameter and the L∞ vs L2 norm distinction.

Core content¶

FGSM in one slide¶

Fast Gradient Sign Method. Given a target model with weights w and a correctly-classified input x with true label y, compute the gradient of the loss with respect to the input (not the weights). The sign of that gradient tells you, per pixel, which direction to push to increase loss. Push every pixel by ε in the indicated direction:

x_adversarial = x + ε · sign(∇_x loss(model(x), y))

One step. Cheap. Often effective. The canonical entry-level attack from Goodfellow et al. (2015).
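As a minimal sketch of the formula above, the block below applies one FGSM step to a toy logistic-regression "model" whose input gradient has a closed form. The weights, the 4-value "image", and the helper names are all hypothetical and chosen only for illustration; a real attack would get the gradient from an autodiff framework instead.

```python
import math

def predict_proba(w, b, x):
    """P(label = 1 | x) for a toy logistic-regression model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def input_gradient(w, b, x, y):
    """Gradient of binary cross-entropy w.r.t. the input x (not the weights).
    For logistic regression this has the closed form (p - y) * w."""
    p = predict_proba(w, b, x)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """One FGSM step: move every pixel by eps in the loss-increasing
    direction, then clamp to the valid pixel range [0, 1]."""
    grad = input_gradient(w, b, x, y)
    return [min(1.0, max(0.0, xi + eps * sign(g))) for xi, g in zip(x, grad)]

# Hypothetical model and input (assumptions, not from the lesson).
w = [2.0, -3.0, 1.5, -1.0]
b = 0.1
x = [0.6, 0.3, 0.5, 0.4]
y = 1  # true label; the clean input is classified correctly

x_adv = fgsm(w, b, x, y, eps=0.15)
p_clean = predict_proba(w, b, x)      # above 0.5: predicted class 1
p_adv = predict_proba(w, b, x_adv)    # below 0.5: prediction flipped
```

No pixel moves by more than ε, yet the predicted class changes, because every pixel moves in the direction that hurts the model most.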

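Key parameters:

- ε (epsilon): perturbation budget. How large a per-pixel change is allowed. Typical value: ε = 8/255 ≈ 0.03 on a [0, 1] pixel scale, i.e. about 8 intensity levels for 8-bit images, which is usually imperceptible to humans.
- Loss: typically cross-entropy against the true label (untargeted attack). For a targeted attack, use the loss against a chosen wrong label and step in the direction that decreases it, i.e. subtract ε · sign(∇_x loss) instead of adding it.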
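Papers and tools quote ε sometimes on the [0, 1] pixel scale and sometimes in 8-bit intensity levels, so it helps to convert explicitly. The two helper names below are hypothetical; this is only a small arithmetic sketch.

```python
def eps_from_levels(levels, max_level=255):
    """Convert a budget given in 8-bit intensity levels to the [0, 1] scale."""
    return levels / max_level

def levels_from_eps(eps, max_level=255):
    """Convert a [0, 1]-scale budget back to 8-bit intensity levels."""
    return eps * max_level

eps_8 = eps_from_levels(8)              # roughly 0.0314
levels_for_003 = levels_from_eps(0.03)  # roughly 7.65 levels
```

Under this conversion a budget of 0.03 corresponds to just under 8 intensity levels out of 255.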

PGD: FGSM done right¶

Projected Gradient Descent is FGSM iterated. Same gradient-sign update, applied multiple times with a small step size, with projection back into the allowed perturbation region (an ε-ball around the original input) after each step:

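x_adv = x
for step in 1..N:
    x_adv = x_adv + α · sign(∇_x loss(model(x_adv), y))
    x_adv = clip(x_adv, x − ε, x + ε)    # project back into the ε-ball around x
    x_adv = clip(x_adv, 0, 1)            # stay in the valid pixel range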

Why iterate: FGSM moves in a single greedy direction. PGD re-computes the gradient after every small step, so it explores the local loss landscape and finds substantially stronger adversarial examples. Against undefended models the gap can be large, for example from "FGSM achieved 60% misclassification" to "PGD achieved 99% misclassification" (illustrative figures, not a measured result).

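Key parameters:

- ε: same as FGSM, the overall perturbation budget.
- α: per-step size. Commonly set a little larger than ε/N (for example 2.5 · ε/N), so the iterate can reach the edge of the ε-ball and still move along it; with α below ε/N the attack cannot use the full budget.
- N: number of iterations. Typical: 10–40.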
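The loop above can be sketched in plain Python against the same kind of toy logistic-regression model, where the input gradient is available in closed form. Everything here is an assumption made for illustration: the weights, the input, and the step-size choice α = 2.5 · ε/N (one common heuristic). The random starting point used in many PGD implementations is omitted for brevity.

```python
import math

def predict_proba(w, b, x):
    """P(label = 1 | x) for a toy logistic-regression model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def input_gradient(w, b, x, y):
    """Closed-form gradient of binary cross-entropy w.r.t. the input."""
    p = predict_proba(w, b, x)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def pgd_linf(w, b, x, y, eps, alpha, n_steps):
    """Iterated gradient-sign steps with projection onto the L-infinity
    eps-ball around the original input x, plus clamping to [0, 1]."""
    x_adv = list(x)
    for _ in range(n_steps):
        grad = input_gradient(w, b, x_adv, y)
        x_adv = [xa + alpha * sign(g) for xa, g in zip(x_adv, grad)]
        # Projection: no pixel may end up more than eps from the original.
        x_adv = [min(xi + eps, max(xi - eps, xa)) for xa, xi in zip(x_adv, x)]
        # Keep pixel values valid.
        x_adv = [min(1.0, max(0.0, xa)) for xa in x_adv]
    return x_adv

# Hypothetical model, input, and hyperparameters (assumptions).
w = [2.0, -3.0, 1.5, -1.0]
b = 0.1
x = [0.6, 0.3, 0.5, 0.4]
y = 1
eps, n_steps = 0.15, 10
alpha = 2.5 * eps / n_steps

x_adv = pgd_linf(w, b, x, y, eps, alpha, n_steps)
p_adv = predict_proba(w, b, x_adv)   # below 0.5: prediction flipped
```

On a linear toy model like this one the gradient direction never changes, so PGD ends up at the same point a single FGSM step would reach. The advantage of iterating shows up on non-linear models, where the gradient at each intermediate point differs.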

PGD is the standard white-box benchmark attack in 2026. If your defense holds against a strong PGD attack, that's a meaningful signal.

L∞ vs L2 norm¶

The perturbation budget ε is measured under some norm. Two common choices:

L∞ (max-norm). No single pixel changes by more than ε. The default for image attacks because it bounds the per-pixel visibility.
L2 (Euclidean-norm). Total Euclidean distance of the perturbation is bounded. Allows more change in some pixels, less in others. Used in some research; less common in production threat models.

Most published attacks specify their norm; defenders should match the norm to the threat model.
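To make the distinction concrete, the sketch below measures one perturbation under both norms and projects it onto each kind of ε-ball. The function names and the example vector are hypothetical.

```python
import math

def linf_norm(delta):
    """Largest absolute change to any single pixel."""
    return max(abs(d) for d in delta)

def l2_norm(delta):
    """Euclidean length of the whole perturbation."""
    return math.sqrt(sum(d * d for d in delta))

def project_linf(delta, eps):
    """Clamp each component independently to [-eps, eps]."""
    return [max(-eps, min(eps, d)) for d in delta]

def project_l2(delta, eps):
    """Rescale the whole vector if its Euclidean length exceeds eps."""
    norm = l2_norm(delta)
    if norm <= eps:
        return list(delta)
    return [d * eps / norm for d in delta]

delta = [0.3, -0.05, 0.0, 0.4]       # made-up perturbation
d_inf = project_linf(delta, 0.1)     # [0.1, -0.05, 0.0, 0.1]
d_l2 = project_l2(delta, 0.1)        # same direction, length 0.1
```

The L∞ projection caps each pixel separately, while the L2 projection shrinks the whole perturbation uniformly and so keeps the relative size of the per-pixel changes.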

Beyond FGSM / PGD¶

The arms race continues. Notable extensions:

C&W (Carlini-Wagner) attack — solves an optimization problem to find the smallest perturbation that misclassifies. More effective than PGD against some defenses; slower.
EAD (Elastic-net) — variant balancing L1 and L2.
AutoAttack — ensemble of strong attacks; widely used as the gold-standard benchmark.

For defense-evaluation purposes: a model defended against PGD but not against AutoAttack is not robustly defended.
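One way to see why a single attack is not enough: under an ensemble evaluation, an example counts as robust only if the model survives every attack on it. The sketch below shows that worst-case aggregation on made-up outcomes. It is not the AutoAttack library, and the attack names and results are hypothetical.

```python
def robust_accuracy(survived_by_attack):
    """survived_by_attack maps an attack name to a list of booleans, True
    where the model still classifies that example correctly under attack.
    An example is robust only if it survives every attack in the dict."""
    per_attack = list(survived_by_attack.values())
    n_examples = len(per_attack[0])
    robust = [all(results[i] for results in per_attack)
              for i in range(n_examples)]
    return sum(robust) / n_examples

# Made-up outcomes for 5 test examples, for illustration only.
outcomes = {
    "pgd":       [True, True,  True,  False, True],
    "cw":        [True, False, True,  False, True],
    "black_box": [True, True,  False, False, True],
}

pgd_only = robust_accuracy({"pgd": outcomes["pgd"]})   # 0.8
worst_case = robust_accuracy(outcomes)                 # 0.4
```

In this toy data the model looks 80% robust when only PGD is run, but only 40% robust once the other attacks are included.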

What an L6.6 lab learner will run¶

Lab L6.6 has you run FGSM and PGD against a pre-trained ResNet on CIFAR-10. You'll vary ε and measure attack success rate. The shape of the success curve — sharp rise as ε increases — is the visceral demonstration.
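The measurement itself can be sketched without a deep network. The block below sweeps ε for FGSM against a toy logistic-regression model on four hand-made inputs and records the attack success rate at each ε. The model, data, and ε grid are assumptions for illustration and are not the lab's ResNet or CIFAR-10 setup.

```python
import math

def logit(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x):
    return 1 if logit(w, b, x) >= 0 else 0

def fgsm(w, b, x, y, eps):
    """One FGSM step using the closed-form input gradient (p - y) * w."""
    p = 1.0 / (1.0 + math.exp(-logit(w, b, x)))
    grad = [(p - y) * wi for wi in w]
    step = [eps * ((g > 0) - (g < 0)) for g in grad]
    return [min(1.0, max(0.0, xi + s)) for xi, s in zip(x, step)]

def attack_success_rate(w, b, data, eps):
    """Fraction of correctly classified inputs that FGSM flips at this eps."""
    clean_correct = [(x, y) for x, y in data if predict(w, b, x) == y]
    fooled = sum(predict(w, b, fgsm(w, b, x, y, eps)) != y
                 for x, y in clean_correct)
    return fooled / len(clean_correct)

# Hypothetical model and labeled inputs (all classified correctly when clean).
w = [2.0, -3.0, 1.5, -1.0]
b = 0.1
data = [
    ([0.6, 0.3, 0.5, 0.4], 1),
    ([0.9, 0.1, 0.8, 0.2], 1),
    ([0.2, 0.7, 0.3, 0.6], 0),
    ([0.4, 0.5, 0.4, 0.5], 0),
]

eps_grid = [0.0, 0.05, 0.15, 0.25, 0.4]
rates = [attack_success_rate(w, b, data, e) for e in eps_grid]
# rates == [0.0, 0.0, 0.5, 0.75, 1.0]
```

Even on this tiny example the success rate climbs from 0 to 1 as ε grows; inputs closest to the decision boundary flip first.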

Real-world example¶

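Eykholt et al. (2018), "Robust Physical-World Attacks on Deep Learning Visual Classification": adversarial perturbations printed as stickers and applied to stop signs, causing a CNN-based road-sign classifier to misread them. Their method, RP2 (Robust Physical Perturbations), rests on the same idea as FGSM and PGD, optimizing a perturbation using gradients with respect to the input, but restricts the perturbation to sticker-shaped regions and optimizes over varied distances and angles so that it survives in the physical world. The implications for self-driving cars (and any vision system deployed in the real world) are clear.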

Key terms¶

FGSM — single-step gradient-sign adversarial attack.
PGD — iterated FGSM with projection; the standard benchmark.
ε (epsilon) — perturbation budget.
L∞ / L2 norms — the metric for measuring perturbation size.
AutoAttack — gold-standard ensemble for defense evaluation.

References¶

Goodfellow et al. "Explaining and Harnessing Adversarial Examples" (FGSM, 2015).
Madry et al. "Towards Deep Learning Models Resistant to Adversarial Attacks" (PGD, 2017).
Croce & Hein "Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks" (AutoAttack, 2020).

Quiz items¶

Q: What does FGSM compute, and what does it do with the result? A: It computes the gradient of the loss with respect to the input (per-pixel), takes the sign, and adds ε times that sign to each pixel to push the input across the decision boundary.
Q: Why is PGD typically stronger than FGSM? A: PGD iterates the gradient-sign update with small steps and projection back into the perturbation budget, exploring the local loss landscape rather than moving greedily once.
Q: What does L∞ norm bound? A: The maximum per-pixel change in the perturbation (no single pixel changes by more than ε).

Video script (~580 words, ~4 min)¶

[SLIDE 1 — Title]

Image attacks: FGSM, PGD, and beyond. Five minutes.

[SLIDE 2 — FGSM]

Fast Gradient Sign Method. Given a target model with weights w and a correctly-classified input x with true label y, compute the gradient of the loss with respect to the input, not the weights. The sign of that gradient tells you, per pixel, which direction to push to increase loss. Push every pixel by epsilon in the indicated direction.

One step. Cheap. Often effective. Canonical entry-level attack from Goodfellow et al, 2015.

Key parameters. Epsilon: perturbation budget. How large a per-pixel change is allowed. Typical: epsilon equals 8 over 255, about 0.03 on a zero-to-one pixel scale, which is usually imperceptible to humans. Loss: typically cross-entropy against the true label for an untargeted attack. For a targeted attack, use the loss against a chosen wrong label and step in the direction that decreases it.

[SLIDE 3 — PGD]

Projected Gradient Descent. FGSM iterated. Same gradient-sign update, applied multiple times with small step size, with projection back into the allowed perturbation region — an epsilon-ball around the original input — after each step.

Why iterate: FGSM moves in a single greedy direction. PGD re-computes the gradient after every small step, so it explores the local loss landscape and finds substantially stronger adversarial examples. Against undefended models the gap can be large, for example from "FGSM achieved 60 percent misclassification" to "PGD achieved 99 percent". Those figures are illustrative.

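Key parameters: epsilon, same as FGSM. Alpha: per-step size, commonly a little larger than epsilon over N, for example 2.5 times epsilon over N, so the iterate can reach the edge of the epsilon-ball. N: number of iterations, typical 10 to 40.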

PGD is the standard white-box benchmark attack in twenty-twenty-six. If your defense holds against a strong PGD attack, that's a meaningful signal.

[SLIDE 4 — L-infinity vs L2 norm]

The perturbation budget epsilon is measured under some norm. L-infinity, max-norm: no single pixel changes by more than epsilon. Default for image attacks because it bounds per-pixel visibility. L2, Euclidean-norm: total Euclidean distance of perturbation is bounded. Allows more change in some pixels, less in others. Less common in production threat models. Most published attacks specify their norm. Defenders should match the norm to the threat model.

[SLIDE 5 — Beyond FGSM/PGD]

The arms race continues. C&W — Carlini-Wagner — solves an optimization problem to find the smallest perturbation that misclassifies. More effective than PGD against some defenses; slower. EAD — Elastic-net — balances L1 and L2. AutoAttack — ensemble of strong attacks, widely used as gold-standard benchmark.

For defense-evaluation: a model defended against PGD but not against AutoAttack is not robustly defended.

[SLIDE 6 — Physical-world anchor]

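Real-world anchor. Eykholt et al, 2018, "Robust Physical-World Attacks on Deep Learning Visual Classification." Adversarial perturbations printed as stickers, applied to stop signs, causing a CNN-based road-sign classifier to misread. Their method, RP2, rests on the same idea as FGSM and PGD, optimizing a perturbation using gradients with respect to the input, but restricts it to sticker-shaped regions and optimizes over varied distances and angles so it survives in the physical world. The implications for self-driving cars, and any vision system deployed in the real world, are clear.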

[SLIDE 7 — Up next]

Lab L6.6 has you run FGSM and PGD against a pre-trained ResNet on CIFAR-10. Vary epsilon, measure attack success rate. Next lesson: text attacks. Five minutes. See you there.

Slide outline¶

Title — "Image attacks: FGSM, PGD, and beyond".
FGSM — equation displayed + panda→gibbon illustration (or equivalent).
PGD — pseudo-code box + comparison-curve "PGD vs FGSM success rate."
L∞ vs L2 — two perturbation visualizations side-by-side.
Beyond FGSM/PGD — three named attacks with one-line descriptions.
Physical-world anchor — stop sign with stickers image.
Up next — "L6.3.1 — Text attacks, ~5 min."

Production notes¶

Recording: ~4 min. Cap 5.
Slide 6 (the stop sign image) is iconic — use from the paper with attribution.