L4.8 — Scan HuggingFace models for malicious pickles (Lab)
Type: Lab · Duration: ~45 min · Status: Mandatory
Module: Module 4 — Data Poisoning, Backdoors & Supply Chain
Framework tags: OWASP LLM05 · MITRE ATLAS AML.T0010
Goal of the lab
Run picklescan and modelscan against a curated set of HuggingFace-style weight files (some safe, some carrying neutralized malicious payloads from public disclosure write-ups), triage the output, and add the scan to your install/pull workflow. By the end you will know how to run both tools, read their output, and make the scan a default step.
Ethics & scope
The "malicious" weight files in this lab are neutralized: each reproduces a payload technique from a public disclosure write-up, but the payload itself has been replaced with a harmless print statement rather than a working exploit. Don't carry the technique beyond the lab.
Why this matters
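Loading a pickle-based weight file from any source you don't fully trust is, in the literal sense, running arbitrary code: the pickle format can name any importable Python callable (the `GLOBAL` / `STACK_GLOBAL` opcodes) and call it with arbitrary arguments (the `REDUCE` opcode), and an unrestricted unpickler such as plain `pickle.load` executes those instructions as it reads the file. The 30-second scan you'll learn here is arguably the highest-ROI supply-chain hygiene practice available in 2026.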
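To see this concretely, here is a minimal sketch, assuming `python3` is on your PATH (the file name `demo.pkl` and the class name `Payload` are illustrative). It builds a harmless pickle whose `__reduce__` asks the unpickler to call `print("PWNED")`, disassembles it with the standard-library `pickletools` command-line interface without loading it, and then loads it. Protocol 2 is used because that is what `torch.save` writes by default.

```bash
#!/usr/bin/env bash
# Sketch: a harmless pickle whose __reduce__ makes the *unpickler* call print().
# The payload is print("PWNED") only, matching this lab's neutralized files.
set -euo pipefail

demo_dir="$(mktemp -d)"

# 1. Build the pickle. Payload is only needed at dump time; loading needs only print.
python3 - "$demo_dir/demo.pkl" <<'PY'
import pickle
import sys


class Payload:
    def __reduce__(self):
        # "To rebuild this object, call print('PWNED')."
        return (print, ("PWNED",))


with open(sys.argv[1], "wb") as f:
    pickle.dump(Payload(), f, protocol=2)
PY

# 2. Disassemble WITHOUT loading: GLOBAL is the import, REDUCE is the call.
python3 -m pickletools "$demo_dir/demo.pkl"

# 3. Load it: merely unpickling runs the callable and prints PWNED.
python3 -c 'import pickle, sys; pickle.load(open(sys.argv[1], "rb"))' "$demo_dir/demo.pkl"
```

In the disassembly, the `GLOBAL` line names `print` and the `REDUCE` line calls it. Static scanners such as picklescan work at this level: they inspect which globals a pickle imports, without executing it.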
Prerequisites
- Skills: shell, basic JSON.
- Lessons: L4.4.1, L4.4.2.
What you'll build
- A scan report of the lab's curated weight files.
- A `pre-pull-scan.sh` shell wrapper you can drop into a `git pre-commit` hook or CI step.
- `runs/lab4_8/triage.md`: your assessment of each file.
Steps
Step 1 — Look at what's in the lab dataset
Expected contents of `datasets/lab4_8/` (list them with, for example, `ls -lh datasets/lab4_8/`):
- `safe-bert.bin` (~250 MB): a clean DistilBERT weight file
- `safe-cnn.pth` (~5 MB): a clean image classifier
- `mal-printstmt.bin` (~10 MB): small weight file with a `print()` payload (Trail of Bits-style example)
- `mal-eval-call.pt` (~10 MB): payload uses `eval()`; flagged by both scanners
- `mal-encoded.bin` (~10 MB): same payload, but obfuscated; some scanners miss it
- `safe-safetensors-conversion.safetensors` (~250 MB): clean, safetensors format (contains no pickle, so no pickle deserialization risk)
Each malicious file is labeled with its technique so you can verify scanner accuracy.
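Before running any scanner, it can help to know which container format each file really uses, since the extension alone doesn't tell you. A minimal sketch, assuming standard `head`, `tail`, `od` and `tr`: since PyTorch 1.6, `torch.save` writes a zip archive (magic bytes `PK\x03\x04`) with a pickle inside; a raw pickle at protocol 2 or higher starts with the `PROTO` opcode `0x80`; a safetensors file starts with an 8-byte little-endian header length followed by a JSON header, so its ninth byte is `{`. The function name `sniff_format` is hypothetical, and this is a heuristic, not a substitute for the scanners.

```bash
#!/usr/bin/env bash
# Sketch: guess a weight file's container format from its leading bytes,
# without deserializing anything. A heuristic, not a replacement for the scanners.

sniff_format() {  # sniff_format FILE
  local f="$1" magic byte9
  magic="$(head -c 4 "$f" | od -An -tx1 | tr -d ' \n')"                 # first 4 bytes as hex
  byte9="$(head -c 9 "$f" | tail -c 1 | od -An -tx1 | tr -d ' \n')"     # 9th byte as hex
  if [ "$magic" = "504b0304" ]; then
    echo "zip archive (torch.save default since PyTorch 1.6; pickle inside)"
  elif [ "$byte9" = "7b" ]; then
    # safetensors: 8-byte header length, then a JSON header that opens with "{" (0x7b)
    echo "safetensors (JSON header + raw tensors, no pickle)"
  elif [ "${magic:0:2}" = "80" ]; then
    echo "raw pickle (starts with PROTO opcode 0x80, protocol 2+)"
  else
    echo "unknown"
  fi
}

for f in datasets/lab4_8/*; do
  [ -f "$f" ] || continue   # skip if the glob matched nothing
  printf '%-42s %s\n' "$(basename "$f")" "$(sniff_format "$f")"
done
```

The `.bin`, `.pt` and `.pth` files should report either a zip archive or a raw pickle, and only the `.safetensors` file should report safetensors.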
Step 2 — Run picklescan
```bash
uv run picklescan -p datasets/lab4_8/safe-bert.bin
# Expected: clean, no warnings
uv run picklescan -p datasets/lab4_8/mal-printstmt.bin
# Expected: warning (flagged opcode usage / suspicious imports)
uv run picklescan -p datasets/lab4_8/mal-eval-call.pt
# Expected: warning (eval / exec detected)
```
Run it against every file in the lab directory:

```bash
mkdir -p runs/lab4_8
for f in datasets/lab4_8/*.{bin,pth,pt,safetensors}; do
  echo "=== $f ==="
  uv run picklescan -p "$f" 2>&1 | tail -5   # last few lines: the scan summary
  echo ""
done | tee runs/lab4_8/picklescan-report.txt
```
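The `tail -5` view is meant for reading. For a machine-readable record, the exit code is the more reliable signal: according to its README, picklescan exits 0 when no malware is found, 1 when malware is found and 2 when the scan fails. A minimal sketch; `picklescan_one` and `record_exit_codes` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch: one exit code per file, for scripting instead of eyeballing output.
# picklescan (per its README): 0 = no malware found, 1 = malware found, 2 = scan failed.

picklescan_one() { uv run picklescan -p "$1"; }   # hypothetical wrapper

# record_exit_codes SCANNER_FUNCTION FILE...  ->  "<file><TAB><exit code>" per file
record_exit_codes() {
  local scanner="$1" f rc
  shift
  for f in "$@"; do
    [ -f "$f" ] || continue            # skip glob patterns that matched nothing
    rc=0
    "$scanner" "$f" >/dev/null 2>&1 || rc=$?
    printf '%s\t%s\n' "$f" "$rc"
  done
}
```

For example: `mkdir -p runs/lab4_8 && record_exit_codes picklescan_one datasets/lab4_8/*.{bin,pth,pt,safetensors} | tee runs/lab4_8/picklescan-exit-codes.tsv` (the output file name is also illustrative). Passing the scanner as a function name keeps the loop reusable for modelscan in Step 3.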
Step 3 — Run modelscan
```bash
# -r / --reporting-format selects modelscan's JSON report
uv run modelscan scan -p datasets/lab4_8/mal-printstmt.bin -r json | jq
uv run modelscan scan -p datasets/lab4_8/mal-eval-call.pt -r json | jq
uv run modelscan scan -p datasets/lab4_8/mal-encoded.bin -r json | jq
```
With `-r json`, modelscan emits a JSON report. Note how its format and severity ranking differ from picklescan's plain-text output.
Full sweep:
```bash
mkdir -p runs/lab4_8
for f in datasets/lab4_8/*.{bin,pth,pt,safetensors}; do
  echo "=== $f ==="
  uv run modelscan scan -p "$f" -r json 2>/dev/null | jq -r '.summary'
done | tee runs/lab4_8/modelscan-report.txt
```
Step 4 — Triage
Open runs/lab4_8/triage.md and fill in:
| File | picklescan verdict | modelscan verdict | Truth | Notes |
|---|---|---|---|---|
| safe-bert.bin | clean | clean | clean | ✓ both correct |
| safe-cnn.pth | clean | clean | clean | ✓ |
| mal-printstmt.bin | warn | warn | mal | ✓ both detect |
| mal-eval-call.pt | warn | warn | mal | ✓ both detect |
| mal-encoded.bin | ? | ? | mal | <your observation — does either catch the obfuscation?> |
| safe-safetensors-conversion.safetensors | n/a (not pickle) | n/a (not pickle) | clean | both correctly skip non-pickle formats |
Key observation to capture: advanced obfuscation can bypass static scanners. Document where in your set this happens.
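If you'd rather draft the scanner columns than type them, a minimal sketch can map each tool's exit code onto the verdict vocabulary used in the table. It assumes the exit codes documented in the tools' READMEs (picklescan: 0 clean, 1 found, 2 failed; modelscan: 0 no issues, 1 issues found, 2 error, 3 no supported files, 4 usage error) and leaves Truth and Notes for you; `verdict` and `triage_row` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch: draft triage.md rows from each scanner's exit code.
#   picklescan: 0 clean, 1 malicious content found, 2 scan failed
#   modelscan:  0 no issues, 1 issues found, 2 error, 3 no supported files, 4 usage error

verdict() {  # verdict TOOL EXIT_CODE
  case "$1:$2" in
    *:0)         echo "clean" ;;
    *:1)         echo "warn" ;;
    modelscan:3) echo "n/a (not pickle)" ;;   # modelscan found nothing it can scan
    *)           echo "error (exit $2)" ;;
  esac
}

triage_row() {  # triage_row FILE PICKLESCAN_RC MODELSCAN_RC; Truth and Notes left blank
  printf '| %s | %s | %s |  |  |\n' "$1" "$(verdict picklescan "$2")" "$(verdict modelscan "$3")"
}

if [ -d datasets/lab4_8 ]; then
  for f in datasets/lab4_8/*; do
    p=0; uv run picklescan -p "$f" >/dev/null 2>&1 || p=$?
    m=0; uv run modelscan scan -p "$f" >/dev/null 2>&1 || m=$?
    triage_row "$(basename "$f")" "$p" "$m"
  done
fi
```

Paste the rows under the table header and fill in the last two columns yourself; the `mal-encoded.bin` row is the one worth checking by hand.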
Step 5 — Wrap as a pre-pull script
Save scripts/pre-pull-scan.sh:
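```bash
#!/usr/bin/env bash
# Usage: ./pre-pull-scan.sh path/to/weight-file
# Exits 0 only if both tools pass; exits 1 if either flags the file or fails to scan it.
set -euo pipefail
FILE="${1:?usage: pre-pull-scan.sh path/to/weight-file}"
echo "[scan] $FILE"
# picklescan exit codes: 0 = clean, 1 = malicious content found, 2 = scan failed.
uv run picklescan -p "$FILE" || { echo "picklescan flagged (or failed on) $FILE"; exit 1; }
# modelscan exit codes: 0 = no issues, 1 = issues found, 2 = scan error,
# 3 = no supported files (e.g. a .safetensors file), 4 = usage error.
# Any non-zero code blocks the file (fail closed).
uv run modelscan scan -p "$FILE" || { echo "modelscan flagged (or could not scan) $FILE"; exit 1; }
echo "[scan] $FILE: both tools clean"
```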
```bash
chmod +x scripts/pre-pull-scan.sh
# test:
scripts/pre-pull-scan.sh datasets/lab4_8/safe-bert.bin
scripts/pre-pull-scan.sh datasets/lab4_8/mal-printstmt.bin || echo "blocked, expected"
```
Step 6 — Where to wire this in
Close the triage doc with a paragraph answering one question: where would you wire this scan into your real workflow?
Options:
- pre-commit hook — runs locally before commits that add weight files.
- CI/CD step — runs on every PR that modifies weight files.
- Pre-deployment pipeline — runs before any production deployment that includes a model.
- Defense-in-depth: pull request annotation + IDS rule on the model registry mirror's egress traffic.
The cheapest place to wire it in is CI; the most thorough is a pre-pull guard at the registry mirror. CI is a sensible minimum for most teams.
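As one concrete wiring, here is a minimal pre-commit hook sketch. It assumes the Step 5 script lives at `scripts/pre-pull-scan.sh` and that the extensions in the `grep` pattern are the ones your repo uses for pickle-based weights (both are assumptions to adjust). Git runs hooks from the root of the working tree, so the relative path resolves.

```bash
#!/usr/bin/env bash
# Sketch of .git/hooks/pre-commit: run scripts/pre-pull-scan.sh (Step 5)
# on every pickle-capable weight file staged in this commit.
set -euo pipefail

# Staged files (Added/Copied/Modified/Renamed) with a pickle-capable extension.
# .safetensors is deliberately not listed: it contains no pickle.
staged_weight_files() {
  git diff --cached --name-only --diff-filter=ACMR 2>/dev/null \
    | grep -E '\.(bin|pt|pth|ckpt|pkl)$' || true
}

main() {
  local f status=0
  while IFS= read -r f; do
    # Keep going after a failure so every flagged file gets reported.
    scripts/pre-pull-scan.sh "$f" || status=1
  done < <(staged_weight_files)
  return "$status"
}

main "$@"
```

Save it as `.git/hooks/pre-commit` and `chmod +x` it. The same loop body works as a CI step if you feed it the files changed in the PR instead of the staged list.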
What just happened (debrief)
You learned what is arguably the highest-ROI supply-chain hygiene practice in AI today: a 30-second scan that catches most unobfuscated malicious weight files. Three takeaways:
Two tools, complementary. picklescan is purpose-built for pickle bytecode inspection; modelscan covers a broader set of model formats and is opinionated about severity. Run both. Neither is complete.
Static scanners miss advanced obfuscation. The mal-encoded.bin file demonstrated this. Layered defenses matter: isolation on load + signature verification + provenance tracking, not just static scanning.
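One lightweight piece of that layering is making sure the file you load is byte-for-byte the file you scanned. A minimal sketch, assuming GNU coreutils `sha256sum` (on macOS, `shasum -a 256` is the usual substitute) and a hypothetical manifest name `approved.sha256`. Note that this is digest pinning, not signature verification, which would additionally prove who produced the file.

```bash
#!/usr/bin/env bash
# Sketch: pin SHA-256 digests of scanned-and-approved weight files, then
# verify them before loading. Paths are stored as given, so run both
# functions from the same directory.
set -euo pipefail

# pin_digests MANIFEST FILE...: record digests right after a clean scan.
pin_digests() {
  local manifest="$1"
  shift
  sha256sum -- "$@" > "$manifest"
}

# verify_digests MANIFEST: non-zero exit if any pinned file changed or is missing.
verify_digests() {
  sha256sum --check --quiet -- "$1"
}
```

For example, run `pin_digests approved.sha256 datasets/lab4_8/safe-bert.bin` once the scan passes, and `verify_digests approved.sha256 || exit 1` in whatever launches the model.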
Wire it once. A CI step that runs picklescan + modelscan on every PR that modifies a weight file (or every weight file pulled by the production deploy pipeline) costs ~10 minutes to set up and pays off for the lifetime of the project.
Extension challenges (optional)
- Easy. Find a real model on HuggingFace (any popular model with a `.bin` or `.pt` artifact) and run the scan against it. Capture the output.
- Medium. Wire the scan into a GitHub Actions workflow that fails the PR if either scanner flags. Include the workflow in `runs/lab4_8/ci.yml`.
- Hard. Look at the opcode output of picklescan against the `mal-encoded.bin` file. Identify what was obfuscated and propose a custom check that would have caught it.
References
- L4.4.2 (theory).
- picklescan — https://github.com/mmaitre314/picklescan
- modelscan — https://github.com/protectai/modelscan
- HuggingFace safetensors — https://huggingface.co/docs/safetensors
Provisioning spec (for lab platform admin)
Container base image: aisec/labs-base:0.1 (picklescan + modelscan already in pyproject.toml).
Additional pre-installed files:
- `/workspace/ai-sec-course/datasets/lab4_8/`: curated weight files
  - `safe-bert.bin`: pre-built from `distilbert-base-uncased`
  - `safe-cnn.pth`: a small image classifier from torchvision tests
  - `mal-printstmt.bin`: neutralized; the payload is `print("PWNED")` only
  - `mal-eval-call.pt`: neutralized; the payload uses `builtins.eval(...)` to a print
  - `mal-encoded.bin`: same payload, but with code-object obfuscation
  - `safe-safetensors-conversion.safetensors`: the same model as `safe-bert.bin`, in safetensors format
Build instructions for the lab admin (one-time, to be included in the companion repo):
- Use `scripts/lab4_8_build_dataset.py` to generate the curated set. The script uses public methods documented in the picklescan and modelscan READMEs and produces neutralized payloads (no real exploit). Output goes to `datasets/lab4_8/`. Run once per repo update.
Network: No external network needed — all files are pre-staged.
Resource use:
- Trivial; scans run in seconds.
- Wallclock: 30–45 min including reading.
Notes for platform admin:
- The malicious examples in this lab MUST be neutralized (payload = print, no real exploit). Confirm before publishing the image.
- Document in the learner-facing lab text that all "malicious" files are educational replicas.