L4.8 — Scan HuggingFace models for malicious pickles (Lab)

Type: Lab · Duration: ~45 min · Status: Mandatory
Module: Module 4 — Data Poisoning, Backdoors & Supply Chain
Framework tags: OWASP LLM05 (Supply Chain Vulnerabilities in the 2023 list; renumbered LLM03 in the 2025 list) · MITRE ATLAS AML.T0010

Goal of the lab

Run picklescan and modelscan against a curated set of HuggingFace-style weight files (some safe, some carrying neutralized payloads modeled on public disclosure write-ups), triage the output, and make the scan a routine step before you install or pull a model. By the end you will know how to run both tools, read their output, and apply the scan by default.

Ethics & scope

The "malicious" weight files in this lab are neutralized: they reproduce the structure of payloads described in public disclosure write-ups, but the exploit body has been replaced with a print statement. Don't carry the technique beyond the lab.

Why this matters

Loading a pickle-based weight file from any source you don't fully trust means, quite literally, running that source's code: the unpickler imports and calls whatever callables the file names. The 30-second scan you'll learn here is one of the highest-ROI supply-chain hygiene practices available in 2026.
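To make "running arbitrary code" concrete, here is a minimal sketch that pickles an object whose `__reduce__` method names a callable, then loads it. The payload is a harmless `print`, in keeping with the lab's neutralized examples; the class name `Demo` and the message are illustrative and not taken from the lab files.

```python
import contextlib
import io
import pickle


class Demo:
    """Illustrative class: __reduce__ tells pickle how to rebuild the object."""

    def __reduce__(self):
        # On load, the unpickler calls print("payload ran during load").
        # A real attack would name a far more dangerous callable than print.
        return (print, ("payload ran during load",))


blob = pickle.dumps(Demo(), protocol=4)

# Capture stdout so the side effect of loading is visible as a value.
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    result = pickle.loads(blob)

side_effect = captured.getvalue().strip()
print(side_effect)  # the message was emitted by pickle.loads, not by our code
print(result)       # None: what we "loaded" is just print's return value
```

Nothing in the calling code invokes `print` on the message; the call happens inside `pickle.loads`. That is why scanning has to happen before the file is ever loaded.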

Prerequisites

Skills: shell, basic JSON.
Lessons: L4.4.1, L4.4.2.

What you'll build

A scan report of the lab's curated weight files.
A pre-pull-scan.sh shell wrapper you can drop into a git pre-commit hook or CI step.
runs/lab4_8/triage.md — your assessment of each file.

Steps

Step 1 — Look at what's in the lab dataset

cd /workspace/ai-sec-course
ls -la datasets/lab4_8/

Expected:

safe-bert.bin              ~ 250 MB  — a clean DistilBERT weight file
safe-cnn.pth                ~ 5 MB   — a clean image classifier
mal-printstmt.bin           ~ 10 MB  — small weight file with a `print()` payload (Trail of Bits style example)
mal-eval-call.pt            ~ 10 MB  — payload uses `eval()` — flagged by both scanners
mal-encoded.bin             ~ 10 MB  — same payload but obfuscated; some scanners miss it
safe-safetensors-conversion.safetensors  ~ 250 MB  — clean, safetensors format (won't have pickle risk)

Each malicious file is labeled with its technique so you can verify scanner accuracy.
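File extensions are a weak signal: a `.bin` or `.pt` file may be a raw pickle stream or a zip archive with a pickle inside, while a safetensors file carries only a JSON header and raw tensor bytes. The sketch below guesses the container format from the first bytes of a file. It is a rough heuristic under stated assumptions (the function name `sniff_format` and the toy files are hypothetical), not how picklescan or modelscan decide what to scan.

```python
import json
import pickle
import struct
import tempfile
import zipfile
from pathlib import Path


def sniff_format(path: Path) -> str:
    """Best-effort guess at a weight file's container format."""
    size = path.stat().st_size
    with path.open("rb") as fh:
        head = fh.read(9)
    if zipfile.is_zipfile(path):
        # torch.save has written zip archives (with a data.pkl inside) since PyTorch 1.6.
        return "zip (may contain pickle)"
    if len(head) == 9 and head[8:9] == b"{":
        # safetensors: 8-byte little-endian header length, then a JSON header.
        # Checked before pickle because the length's first byte can itself be 0x80.
        (header_len,) = struct.unpack("<Q", head[:8])
        if 0 < header_len <= size - 8:
            return "safetensors (no pickle)"
    if head[:1] == b"\x80":
        # Pickle protocol 2+ streams begin with the PROTO opcode, 0x80.
        return "raw pickle"
    return "unknown"


# Build three toy files so the example runs without the lab dataset.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "a.bin").write_bytes(pickle.dumps({"w": [1.0, 2.0]}, protocol=4))
    header = json.dumps({"__metadata__": {}}).encode()
    (tmp / "b.safetensors").write_bytes(struct.pack("<Q", len(header)) + header)
    with zipfile.ZipFile(tmp / "c.pt", "w") as zf:
        zf.writestr("archive/data.pkl", pickle.dumps([1, 2, 3]))
    formats = {p.name: sniff_format(p) for p in sorted(tmp.iterdir())}

for name, fmt in formats.items():
    print(f"{name}: {fmt}")
```

Pointing `sniff_format` at the files in datasets/lab4_8/ is a quick way to see which ones can carry a pickle at all before you start scanning.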

Step 2 — Run picklescan

uv run picklescan -p datasets/lab4_8/safe-bert.bin
# Expected: clean, no warnings

uv run picklescan -p datasets/lab4_8/mal-printstmt.bin
# Expected: warning — flagged opcode usage / suspicious imports

uv run picklescan -p datasets/lab4_8/mal-eval-call.pt
# Expected: warning — eval / exec detected

Run against every file in the lab dir:

mkdir -p runs/lab4_8   # tee does not create missing directories
for f in datasets/lab4_8/*.{bin,pth,pt,safetensors}; do
    echo "=== $f ==="
    uv run picklescan -p "$f" 2>&1 | tail -5
    echo ""
done | tee runs/lab4_8/picklescan-report.txt
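To build intuition for what a static pickle scanner looks at, the following sketch walks a pickle stream's opcodes with the standard library's `pickletools` and lists the imports it would perform, without ever loading it. This is an illustration of the general idea only, not picklescan's actual implementation; `DENYLIST`, `imported_globals`, and the `Payload` class are hypothetical names made up for this example.

```python
import pickle
import pickletools

# Hypothetical denylist for illustration; real scanners ship much longer lists.
DENYLIST = {"builtins.eval", "builtins.exec", "os.system", "posix.system"}
STRING_OPS = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"}


def imported_globals(data: bytes) -> list:
    """List the module.name pairs a pickle stream imports, without loading it."""
    found = []
    strings = []  # string constants seen so far, in order
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in STRING_OPS:
            strings.append(arg)
        elif opcode.name == "GLOBAL":
            # Protocols 0-3: the argument arrives as one "module name" string.
            found.append(arg.replace(" ", ".", 1))
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
            # Protocol 4+: in the common case, module and name are the two most
            # recently pushed strings. Names fetched from the memo are missed.
            found.append(f"{strings[-2]}.{strings[-1]}")
    return found


class Payload:
    """Neutralized stand-in: it is pickled here but never loaded."""

    def __reduce__(self):
        return (eval, ("print('neutralized')",))


blob = pickle.dumps(Payload(), protocol=4)
globals_seen = imported_globals(blob)
flagged = [g for g in globals_seen if g in DENYLIST]
print(globals_seen, flagged)
```

The comment on `STACK_GLOBAL` points at one weakness of this naive walk: a stream that builds or fetches the module and name indirectly would slip past it. Keep that in mind when you look at mal-encoded.bin.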

Step 3 — Run modelscan

# -r is short for --reporting-format; modelscan's CLI does not define a --format option
uv run modelscan scan -p datasets/lab4_8/mal-printstmt.bin -r json | jq .
uv run modelscan scan -p datasets/lab4_8/mal-eval-call.pt -r json | jq .
uv run modelscan scan -p datasets/lab4_8/mal-encoded.bin -r json | jq .

With JSON reporting selected, modelscan emits a machine-readable report. Note how its format and severity ranking differ from picklescan's console output.

Full sweep:

mkdir -p runs/lab4_8   # harmless if the directory already exists
for f in datasets/lab4_8/*.{bin,pth,pt,safetensors}; do
    echo "=== $f ==="
    uv run modelscan scan -p "$f" -r json 2>/dev/null | jq -r '.summary'
done | tee runs/lab4_8/modelscan-report.txt

Step 4 — Triage

Open runs/lab4_8/triage.md and fill in:

| File | picklescan verdict | modelscan verdict | Truth | Notes |
|---|---|---|---|---|
| safe-bert.bin | clean | clean | clean | ✓ both correct |
| safe-cnn.pth | clean | clean | clean | ✓ |
| mal-printstmt.bin | warn | warn | mal | ✓ both detect |
| mal-eval-call.pt | warn | warn | mal | ✓ both detect |
| mal-encoded.bin | ? | ? | mal | <your observation — does either catch the obfuscation?> |
| safe-safetensors-conversion.safetensors | n/a (not pickle) | n/a (not pickle) | clean | both correctly skip non-pickle formats |

Key observation to capture: advanced obfuscation can bypass static scanners. Document where in your set this happens.
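Once the table is filled in, a small tally makes each scanner's hits and misses explicit. The sketch below uses the rows exactly as they appear in the template, so the `mal-encoded.bin` cells are still "?" and are counted as unknown rather than guessed. The `Row` class and `tally` function are hypothetical helpers, not part of the lab repo.

```python
from dataclasses import dataclass


@dataclass
class Row:
    file: str
    picklescan: str  # "clean", "warn", "n/a", or "?" while still unfilled
    modelscan: str
    truth: str       # "clean" or "mal"


# Rows copied from the triage template; replace "?" with your own observations.
ROWS = [
    Row("safe-bert.bin", "clean", "clean", "clean"),
    Row("safe-cnn.pth", "clean", "clean", "clean"),
    Row("mal-printstmt.bin", "warn", "warn", "mal"),
    Row("mal-eval-call.pt", "warn", "warn", "mal"),
    Row("mal-encoded.bin", "?", "?", "mal"),
    Row("safe-safetensors-conversion.safetensors", "n/a", "n/a", "clean"),
]


def tally(rows, scanner):
    """Count caught/missed malicious files and false alarms for one scanner."""
    counts = {"caught": 0, "missed": 0, "false_alarm": 0, "unknown": 0}
    for row in rows:
        verdict = getattr(row, scanner)
        if verdict == "?":
            counts["unknown"] += 1
        elif row.truth == "mal":
            # Anything other than a warning on a malicious file is a miss.
            counts["caught" if verdict == "warn" else "missed"] += 1
        elif verdict == "warn":
            counts["false_alarm"] += 1
    return counts


for scanner in ("picklescan", "modelscan"):
    print(scanner, tally(ROWS, scanner))
```

A non-zero "missed" count for either scanner after you fill in the last row is the observation this step asks you to document.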

Step 5 — Wrap as a pre-pull script

Save scripts/pre-pull-scan.sh:

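#!/usr/bin/env bash
# Usage: ./pre-pull-scan.sh path/to/weight-file
# Exits 0 if the file passes both scanners, 1 if either flags it or fails to run.
set -e
# Abort with a usage message when no path is given.
FILE="${1:?usage: pre-pull-scan.sh path/to/weight-file}"
echo "[scan] $FILE"
# picklescan exits non-zero when it finds a dangerous import or cannot scan the file.
uv run picklescan -p "$FILE" || { echo "picklescan flagged"; exit 1; }
# -r json selects modelscan's JSON report. The total_issues key is what recent
# modelscan releases emit; check it against the version pinned in the lab image.
# jq -e also fails on empty or malformed input, so a crashed scan blocks the file.
uv run modelscan scan -p "$FILE" -r json | jq -e '.summary.total_issues == 0' >/dev/null || { echo "modelscan flagged"; exit 1; }
echo "[scan] $FILE: both tools clean"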

chmod +x scripts/pre-pull-scan.sh
# test:
scripts/pre-pull-scan.sh datasets/lab4_8/safe-bert.bin
scripts/pre-pull-scan.sh datasets/lab4_8/mal-printstmt.bin || echo "blocked, expected"

Step 6 — Where to wire this in

Close your triage doc with a paragraph answering one question: where would you wire this scan into your real workflow?

Options:
- pre-commit hook: runs locally before commits that add weight files.
- CI/CD step: runs on every PR that modifies weight files.
- Pre-deployment pipeline: runs before any production deployment that includes a model.
- Defense-in-depth: pull request annotation plus an IDS rule on the model registry mirror's egress traffic.

The cheapest place to wire it is CI; the most thorough is a pre-pull guard at the registry mirror. Treat CI as the minimum.

What just happened (debrief)

You learned one of the highest-ROI supply-chain hygiene practices in AI today: a 30-second scan that catches the common, unobfuscated kinds of malicious weight file. Three takeaways:

Two tools, complementary. picklescan is purpose-built for pickle bytecode inspection; modelscan covers a broader set of model formats and is opinionated about severity. Run both. Neither is complete.

Static scanners miss advanced obfuscation. The mal-encoded.bin file demonstrated this. Layered defenses matter: isolation on load + signature verification + provenance tracking, not just static scanning.

Wire it once. A CI step that runs picklescan + modelscan on every PR that modifies a weight file (or every weight file pulled by the production deploy pipeline) costs ~10 minutes to set up and pays off for the lifetime of the project.
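The second takeaway names signature verification and provenance tracking as layers beyond static scanning. A much simpler relative of both is digest pinning: record a SHA-256 for each weight file when it is reviewed, and refuse to load anything that no longer matches. The sketch below shows that check only; it is not a cryptographic signature scheme, and the `pins` manifest and function names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large weight files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pin(path: Path, pins: dict) -> bool:
    """True only if the file is listed in the manifest and its digest matches."""
    expected = pins.get(path.name)
    return expected is not None and expected == sha256_of(path)


with tempfile.TemporaryDirectory() as tmp:
    weights = Path(tmp) / "model.bin"
    weights.write_bytes(b"original weights")
    pins = {"model.bin": sha256_of(weights)}  # recorded when the file was reviewed

    ok_before = matches_pin(weights, pins)

    # Simulate a swapped or modified artifact.
    weights.write_bytes(b"original weights" + b"tampered")
    ok_after = matches_pin(weights, pins)

    # Files that were never reviewed fail closed.
    unlisted = Path(tmp) / "other.bin"
    unlisted.write_bytes(b"x")
    ok_unlisted = matches_pin(unlisted, pins)

print(ok_before, ok_after, ok_unlisted)
```

A pin only proves the file is the one that was reviewed; it says nothing about whether that file was safe, so it complements the scan rather than replacing it.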

Extension challenges (optional)

Easy. Find a real model on HuggingFace (any popular model with a .bin or .pt artifact) and run the scan against it. Capture the output.
Medium. Wire the scan into a GitHub Actions workflow that fails the PR if either scanner flags a file. Save the workflow as runs/lab4_8/ci.yml.
Hard. Look at the opcode output of picklescan against the mal-encoded.bin file. Identify what was obfuscated and propose a custom check that would have caught it.

References

L4.4.2 (theory).
picklescan — https://github.com/mmaitre314/picklescan
modelscan — https://github.com/protectai/modelscan
HuggingFace safetensors — https://huggingface.co/docs/safetensors

Provisioning spec (for lab platform admin)

Container base image: aisec/labs-base:0.1 (picklescan + modelscan already in pyproject.toml).

Additional pre-installed files:
- /workspace/ai-sec-course/datasets/lab4_8/ (curated weight files)
  - safe-bert.bin: pre-built from distilbert-base-uncased
  - safe-cnn.pth: a small image classifier from torchvision tests
  - mal-printstmt.bin: neutralized payload; the payload is print("PWNED") only
  - mal-eval-call.pt: neutralized payload that uses builtins.eval(...) to run a print
  - mal-encoded.bin: same payload but with code-object obfuscation
  - safe-safetensors-conversion.safetensors: same model as safe-bert.bin in safetensors format

Build instructions for the lab admin (one-time, to be included in the companion repo):
- Use scripts/lab4_8_build_dataset.py to generate the curated set. The script uses public methods documented in the picklescan and modelscan READMEs and produces neutralized payloads (no real exploit). Output goes to datasets/lab4_8/. Run once per repo update.

Network: No external network needed — all files are pre-staged.

Resource use:
- Trivial; scans run in seconds.
- Wall-clock time: 30–45 min including reading.

Notes for platform admin:
- The malicious examples in this lab MUST be neutralized (payload = print, no real exploit). Confirm before publishing the image.
- Document in the learner-facing lab text that all "malicious" files are educational replicas.