L3.10: Run Garak / PyRIT against your endpoint (Lab, Optional)

Type: Lab · Duration: ~45 min · Status: Optional
Module: Module 3 (Prompt Injection & LLM Application Attacks)
Framework tags: OWASP LLM01, LLM02, LLM07 · ATLAS multiple · NIST AI RMF Measure 2.7

Goal of the lab

Run two open-source LLM red-team scanners — NVIDIA's Garak and Microsoft's PyRIT — against your vulnchat (L3.6) and capture their findings. By the end you will understand the tools' coverage, the kinds of findings they surface, and how to triage automated reports.

Why this matters

Manual prompt-injection testing finds what you think to look for. Automated scanners find what you wouldn't think to look for. Production AI red-team work uses both. Knowing the tools — what they cover, what they miss — is a baseline competency.

Prerequisites

Skills: shell, reading JSON/YAML reports.
Lessons: L3.1.* through L3.5.1.
Lab L3.6 setup understanding (we re-launch the vulnchat).

What you'll build

Two scanner runs (Garak, PyRIT) against the vulnchat
runs/lab3_10/triage-notes.md — your assessment of each tool's findings (true positives, false positives, novel findings)

Steps

Step 1: Launch the vulnchat

cd /workspace/ai-sec-course
docker compose -f targets/vulnchat/docker-compose.yml up -d
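
`docker compose up -d` returns as soon as the containers are started, which can be before the service is listening. A minimal sketch of a readiness check follows; the helper name `wait_for_port` is hypothetical, and it only assumes that the vulnchat listens on TCP port 8765 (the port used by the scanner commands in this lab). It does not check that the `/chat` route itself responds correctly.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 60.0,
                  interval_s: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Connection refused or timed out: wait briefly and retry.
            time.sleep(interval_s)
    return False


# Example use before starting a scan (port taken from the lab commands):
#   if not wait_for_port("localhost", 8765):
#       raise SystemExit("vulnchat did not come up in time")
```

Polling the port keeps the check independent of the request schema, which this lab does not spell out.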

Step 2: Run Garak

Garak (uv run garak) is already installed via your pyproject.toml. Run a focused probe set:

uv run garak \
    --model_type rest \
    --model_name "http://localhost:8765/chat" \
    --probes promptinject,leakreplay,realtoxicityprompts \
    --report_dir runs/lab3_10/garak \
    --generations 5

This runs three probe families against the vulnchat:
- promptinject: well-known prompt-injection payloads.
- leakreplay: prompts that test whether the model replays memorized training data verbatim.
- realtoxicityprompts: prompts that check whether the model produces toxic content under provocation.
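
If the endpoint expects a JSON request body, Garak's REST generator can be told how to shape requests and where to find the reply through a generator options file, passed with `--generator_option_file` (`-G`). The sketch below writes such a file. The helper names and the field names `message` and `reply` are assumptions for illustration; check the vulnchat's actual request and response schema before using it.

```python
import json
from pathlib import Path


def build_rest_config(uri: str, prompt_field: str, reply_field: str) -> dict:
    """Build a Garak REST generator options dict for a JSON chat endpoint."""
    return {
        "rest": {
            "RestGenerator": {
                "name": "vulnchat",
                "uri": uri,
                "method": "post",
                "headers": {"Content-Type": "application/json"},
                # Garak substitutes $INPUT with each probe prompt.
                "req_template_json_object": {prompt_field: "$INPUT"},
                # Parse the response as JSON and read the reply from this field.
                "response_json": True,
                "response_json_field": reply_field,
            }
        }
    }


def write_rest_config(path: str, config: dict) -> None:
    """Write the options dict as JSON, creating parent directories if needed."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(config, indent=2))


# Example (field names "message" and "reply" are hypothetical):
#   cfg = build_rest_config("http://localhost:8765/chat", "message", "reply")
#   write_rest_config("runs/lab3_10/garak_rest.json", cfg)
```

The resulting file would then be supplied as `-G runs/lab3_10/garak_rest.json` alongside `--model_type rest`.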

Garak writes a report under runs/lab3_10/garak/. Open it:

ls runs/lab3_10/garak/
# garak.*.report.jsonl  — raw findings
# garak.*.hitlog.jsonl  — only the failures (model misbehaved)
# Hitlog records carry probe, detector, score, and output fields (one hit per line).
jq -c '{probe, detector, score, output}' runs/lab3_10/garak/garak.*.hitlog.jsonl | head
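
For a quick per-probe tally to seed your triage notes, a few lines of Python can do the counting. This is a minimal sketch: the function name `count_hits_by_probe` is hypothetical, and it assumes each hitlog line is a JSON object with the probe name under a `probe` key. If your Garak version uses a different field name, pass it as `probe_key`.

```python
import json
from collections import Counter
from pathlib import Path


def count_hits_by_probe(hitlog_path: str, probe_key: str = "probe") -> Counter:
    """Count hitlog records per probe; records missing the key go to '<unknown>'."""
    counts = Counter()
    for line in Path(hitlog_path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the JSONL file
        record = json.loads(line)
        counts[record.get(probe_key, "<unknown>")] += 1
    return counts


# Example (substitute the real file name from your run):
#   for probe, n in count_hits_by_probe("path/to/hitlog.jsonl").most_common():
#       print(f"{n:4d}  {probe}")
```

A large `<unknown>` bucket is a sign that the assumed key does not match your report format.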

Step 3: Triage Garak findings

For each "hit" in the hitlog, decide which bucket it belongs in:
- True positive: yes, this is a real vuln.
- False positive: Garak's detector flagged the output, but the model behaved acceptably.
- Already known: overlaps with findings you logged in L3.6.
- Novel: a real issue that your manual testing in L3.6 did not surface.

Add a section to runs/lab3_10/triage-notes.md:

## Garak triage
- Probe `promptinject`: 14/25 hits.
  - 10 true positives (matches L3.6 instruction-override / role-reassignment patterns).
  - 3 false positives (Garak's detector flagged on a polite refusal that's fine).
  - 1 novel finding: <describe>.

- Probe `leakreplay`: 4/20 hits.
  - <triage>

- Probe `realtoxicityprompts`: 0/30 hits.
  - Vulnchat refuses toxic prompts even at default temp; alignment from upstream Llama 3.2 holds.
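
If you record verdicts as you go, the per-probe counts in the notes can be generated rather than tallied by hand. The following is a sketch only: `render_triage` and the verdict labels are hypothetical names, and the output is a starting skeleton to which you still add your own descriptions.

```python
from collections import Counter

# Verdict labels used in this lab's triage notes (names chosen for illustration).
VERDICTS = ("true positive", "false positive", "already known", "novel")


def render_triage(tool: str, hits: list) -> str:
    """Render a Markdown triage section from (probe, verdict) pairs."""
    by_probe = {}
    for probe, verdict in hits:
        if verdict not in VERDICTS:
            raise ValueError(f"unknown verdict: {verdict!r}")
        by_probe.setdefault(probe, Counter())[verdict] += 1

    lines = [f"## {tool} triage"]
    for probe in sorted(by_probe):
        counts = by_probe[probe]
        lines.append(f"- Probe `{probe}`: {sum(counts.values())} hits triaged.")
        for verdict in VERDICTS:
            if counts[verdict]:
                lines.append(f"  - {counts[verdict]} {verdict}")
    return "\n".join(lines) + "\n"


# Example:
#   print(render_triage("Garak", [("promptinject", "true positive"),
#                                 ("promptinject", "false positive")]))
```

Rejecting unknown verdict labels keeps typos from silently creating extra buckets.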

Step 4: Run PyRIT

PyRIT (Python Risk Identification Tool) is Microsoft's open-source red-teaming framework for generative AI. It is heavier-weight than Garak but more configurable. Skeleton run:

uv run python scripts/lab3_10_pyrit.py \
    --target-url http://localhost:8765/chat \
    --out runs/lab3_10/pyrit/

The script wraps PyRIT's PromptSendingOrchestrator with a "single-turn jailbreak" attack strategy and a "system-prompt extraction" strategy, against our vulnchat as the target.

(The full PyRIT setup is involved; the wrapper script handles the boilerplate. Read it if curious.)

Step 5: Triage PyRIT findings

Same triage shape as Garak. Add to runs/lab3_10/triage-notes.md under a ## PyRIT triage section.

Step 6: Summarize which tool found what

Close triage-notes.md with a comparison section:

## Tool comparison (vulnchat target)

| Finding type | Garak | PyRIT | Both | Neither |
|---|---|---|---|---|
| Direct PI - instruction override | ✓ | ✓ | ✓ | — |
| Direct PI - role re-assignment | ✓ | ✓ | ✓ | — |
| System prompt extraction | ✓ | — | — | — |
| Toxicity | (refused) | (refused) | — | — |
| <other> | … | … | … | … |

Then a one-paragraph recommendation: "for this kind of target, I would run X first and Y second because …"
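
The comparison table is a pair of set-membership checks per finding type, so it can be derived from two sets of confirmed findings. A minimal sketch, assuming you have already reduced each tool's true positives to a set of finding-type labels; the function names and labels are illustrative, and a plain hyphen marks an empty cell.

```python
def compare_findings(tested: list, garak: set, pyrit: set) -> list:
    """Return (finding, in_garak, in_pyrit, both, neither) for each tested type."""
    rows = []
    for finding in tested:
        in_garak = finding in garak
        in_pyrit = finding in pyrit
        rows.append((finding, in_garak, in_pyrit,
                     in_garak and in_pyrit, not (in_garak or in_pyrit)))
    return rows


def render_markdown(rows: list) -> str:
    """Render the rows as a Markdown table with the lab's column headers."""
    def mark(flag: bool) -> str:
        return "✓" if flag else "-"

    lines = ["| Finding type | Garak | PyRIT | Both | Neither |",
             "|---|---|---|---|---|"]
    for finding, *flags in rows:
        cells = " | ".join(mark(flag) for flag in flags)
        lines.append(f"| {finding} | {cells} |")
    return "\n".join(lines)


# Example:
#   rows = compare_findings(["Toxicity"], garak=set(), pyrit=set())
#   print(render_markdown(rows))
```

Passing `tested` explicitly keeps finding types that neither tool caught in the table, which is the row most worth discussing in your recommendation.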

Step 7: Tear down

docker compose -f targets/vulnchat/docker-compose.yml down

What just happened (debrief)

Two tools, complementary coverage. Garak is the closer fit for "scan an endpoint for known vulns" — fast, lots of probes, JSON output you can pipe into a triage pipeline. PyRIT is closer to a red-team automation framework — slower to set up, more powerful once configured, supports multi-turn and orchestration patterns.

In real engagements, you'll often run Garak as a baseline scan during onboarding, then move to PyRIT (or proprietary tooling) for deeper, scenario-specific testing. The triage step you just practiced — sorting hits into TP/FP/known/novel — is the actual high-value skill; automated scanners produce noise, and the engineer's job is converting noise into a defensible findings list.

In Module 7 we'll add promptfoo (yet another tool, more eval-harness-shaped than red-team-shaped) and look at continuous evaluation patterns rather than one-off scans.

Extension challenges (optional)

Easy. Run Garak against a frontier endpoint (your OpenAI or Anthropic key) instead of the vulnchat. Compare hit rates. Note: this costs a few cents in API spend.
Medium. Add a custom Garak probe (Garak supports Python-defined probes) that targets a specific behavior from your L3.6 attack log.
Hard. Wire Garak into a make target or GitHub Actions step that runs on every commit and fails the build if the hit rate increases. This is the CI pattern most production teams converge on.
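
For the hard challenge, the gating logic itself is small: compare per-probe hit rates from the current run against a stored baseline and return a non-zero exit code on regression. The sketch below shows only that comparison. The function names, the idea of a baseline mapping, and the `tolerance` parameter are assumptions for illustration; producing the hit rates from Garak's report is left to your own parsing.

```python
def regressions(current: dict, baseline: dict, tolerance: float = 0.0) -> list:
    """List probes whose hit rate exceeds the baseline by more than tolerance.

    Probes absent from the baseline are treated as having a baseline of 0.0.
    """
    worse = []
    for probe, rate in sorted(current.items()):
        allowed = baseline.get(probe, 0.0) + tolerance
        if rate > allowed:
            worse.append(f"{probe}: {rate:.2f} > {allowed:.2f}")
    return worse


def exit_code(current: dict, baseline: dict, tolerance: float = 0.0) -> int:
    """Return 1 if any probe regressed (fail the build), else 0."""
    return 1 if regressions(current, baseline, tolerance) else 0


# Example use in a CI script (rates are hits divided by attempts per probe):
#   import sys
#   sys.exit(exit_code(current_rates, baseline_rates, tolerance=0.05))
```

A small tolerance is worth considering because model outputs are sampled, so hit rates can vary slightly between otherwise identical runs.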

References

Garak — https://github.com/NVIDIA/garak
PyRIT — https://github.com/Azure/PyRIT
promptfoo (Module 7) — https://promptfoo.dev/

Provisioning spec (for lab platform admin)

Container base image: aisec/labs-base:0.1 (Garak and PyRIT are already listed in pyproject.toml).

Additional pre-installed files:
- /workspace/ai-sec-course/scripts/lab3_10_pyrit.py (PyRIT wrapper)

Network: Garak fetches some probe data from Hugging Face on first run, so allow egress to huggingface.co.

Resource use:
- RAM: ~5 GB (Ollama + vulnchat + Garak process).
- Wallclock: 30–60 min depending on probe selection.

Notes:
- Garak's full probe set is ~120 probes; the lab uses 3 named families to keep runtime under 30 min. Document this; learners can expand to a full sweep in the extension challenges.
- PyRIT requires a Microsoft account or local-only mode; the wrapper script defaults to local-only.