L4.4.1: Model supply chain attack surface
Type: Theory · Duration: ~5 min · Status: Mandatory
Module: Module 4 (Data Poisoning, Backdoors & Supply Chain)
Framework tags: OWASP LLM05 · MITRE ATLAS AML.T0010 (ML Supply Chain Compromise)
Learning objectives
- Enumerate the four artifact classes that constitute the model supply chain.
- Identify three concrete attack vectors against model registries.
Core content
The four artifact classes in the model supply chain
Every production AI system inherits artifacts produced by parties outside the application team. Four classes are worth distinguishing:
1. Model weights. The trained parameters. Distributed as files (safetensors, pickle-based .bin/.pt, GGUF). Source: HuggingFace, Ollama registry, vendor APIs (weights-as-service), private registries, peer transfer.
2. Tokenizers and config. Separately-distributed artifacts that pair with the weights. A model + the wrong tokenizer = garbage output. A model + a malicious tokenizer = an attack surface most defenders don't think about.
3. Adapters and fine-tunes. LoRA adapters, full-weight fine-tunes, alignment-removal "uncensored" variants. Small files, easy to publish, easy to typosquat. Often the most-frequently-updated artifact in a production stack.
4. Datasets. Training data, fine-tune data, eval data downloaded from registries (HuggingFace Datasets, Kaggle, government open-data portals). Curated data may itself be poisoned; provenance often opaque.
Three concrete attack vectors against model registries
1. Typosquatting. Register a model under a name that closely resembles a trusted publisher's. Examples: EleuterAI/gpt-j-6B (drops the "h" from EleutherAI), metallama/Llama-3.2 (drops the hyphen from meta-llama), mistralai-official/Mistral-7B (adds a plausible-looking suffix to mistralai). Users who search imprecisely or copy-paste from an unreliable source occasionally land on the lookalike. This is the PoisonGPT attack pattern.
2. Malicious weight files. Upload a model whose weight file executes arbitrary code when it is loaded. Pickle-based formats (.bin, .pt, .pth) allow this directly: a pickle is effectively a small program for Python's unpickler, and its opcodes can import and call arbitrary callables during deserialization. Lab L4.8 demonstrates scanning for this. Safetensors mitigates the risk by storing only tensor data and metadata, leaving nothing for the loader to execute.
3. Model-card lies. The model card declares "trained on dataset X, evaluated on benchmark Y, no known harms." Reality: trained on Z including private content, evaluated with leakage, planted backdoor. The card is unverified by the registry; nothing prevents lying. Defenders who treat model cards as authoritative are vulnerable.
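A consumer-side guard against the first vector can be sketched as a lookalike check on the publisher part of a repository id. This is a minimal sketch, not a registry feature: TRUSTED_PUBLISHERS, check_publisher, and the 0.85 threshold are illustrative assumptions, and a real allowlist would come from your own vetting process.

```python
from difflib import SequenceMatcher

# Hypothetical allowlist: publishers your team has actually vetted.
TRUSTED_PUBLISHERS = ("EleutherAI", "meta-llama", "mistralai")


def normalize(name: str) -> str:
    """Lowercase and drop separators so 'meta-llama' and 'metallama' compare equal."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def check_publisher(repo_id: str, threshold: float = 0.85) -> str:
    """Classify a 'publisher/model' id as trusted, suspicious, or unknown."""
    publisher = repo_id.split("/", 1)[0]
    if publisher in TRUSTED_PUBLISHERS:
        return "trusted"
    candidate = normalize(publisher)
    for trusted in TRUSTED_PUBLISHERS:
        target = normalize(trusted)
        similarity = SequenceMatcher(None, candidate, target).ratio()
        # Flag near-identical names, or a trusted name with extra text bolted on.
        if similarity >= threshold or target in candidate:
            return f"suspicious: resembles {trusted}"
    return "unknown"


for repo in ("EleuterAI/gpt-j-6B", "metallama/Llama-3.2", "mistralai-official/Mistral-7B"):
    print(repo, "->", check_publisher(repo))
```

With this allowlist, all three example names from the list above are flagged. Similarity thresholds produce both false positives and false negatives, so a hit is a prompt for manual review rather than proof of malice.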
Why the model supply chain is worse than software supply chain
Three differences make the AI supply chain harder to secure than the classical software supply chain:
- Opacity of artifacts. A binary can be decompiled and analyzed. A model file is opaque: you can load it and probe its behavior, but you cannot statically verify "this model has no backdoor." There is no equivalent of objdump.
- Lack of signing. Most model registries (HuggingFace, Ollama) ship without signature verification by default. There is some adoption of signing (Sigstore for models, model-signing pilots in 2025), but coverage is sparse.
- Frequency of updates. A model gets fine-tuned daily by many teams. A library gets a CVE-patched release weekly. The frequency of change in AI artifacts is high, and the version landscape is more fragmented.
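Where a registry does not verify signatures, one partial substitute is to pin each artifact to a digest recorded when it was reviewed and to refuse anything that differs. The following is a minimal sketch using only the standard library; sha256_of and verify_pin are hypothetical helper names, and the expected digest is assumed to come from your own review records. A matching digest shows only that the file is unchanged since review, not that the reviewed file was safe.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large weight files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pin(path: Path, expected_sha256: str) -> bool:
    """Return True only if the file on disk matches the digest recorded at review time."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

A loader wrapper could call verify_pin before handing the path to any deserializer and abort on a mismatch, which turns a silent upstream change into a visible failure.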
What this means for your stack
Inventory question: list every model + tokenizer + adapter + dataset your stack pulls in. For each, answer:
- Who published it?
- When was it last updated?
- Have you verified the publisher (signature, web of trust, organizational affiliation)?
- Is it pinned by hash, or by name?
- What does your provider's model card claim vs. what you've independently verified?
Most teams in 2026 can't answer these for their full stack. That's the AI-BOM gap. L4.5.2 and lab L4.9 address it.
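One way to make the inventory concrete is a record per artifact whose fields mirror the five questions. This is a sketch under stated assumptions, not an AI-BOM standard: ArtifactRecord, its field names, and the gaps method are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ArtifactRecord:
    """One inventory row; the fields mirror the five inventory questions."""
    name: str
    kind: str                       # "weights", "tokenizer", "adapter", or "dataset"
    publisher: Optional[str]        # who published it
    last_updated: Optional[date]    # when it was last updated
    publisher_verified: bool        # signature, web of trust, or affiliation checked
    pinned_sha256: Optional[str]    # None means the artifact is pinned by name only
    card_claims_verified: bool      # model-card claims independently checked

    def gaps(self) -> List[str]:
        """List the inventory questions this record cannot yet answer."""
        missing = []
        if not self.publisher:
            missing.append("publisher unknown")
        if self.last_updated is None:
            missing.append("last update unknown")
        if not self.publisher_verified:
            missing.append("publisher not verified")
        if self.pinned_sha256 is None:
            missing.append("pinned by name, not by hash")
        if not self.card_claims_verified:
            missing.append("model-card claims not independently verified")
        return missing


# Hypothetical entry for an adapter pulled in by name with no further checks.
record = ArtifactRecord(
    name="example-org/example-adapter",
    kind="adapter",
    publisher="example-org",
    last_updated=None,
    publisher_verified=False,
    pinned_sha256=None,
    card_claims_verified=False,
)
print(record.gaps())
```

Running gaps over every record gives a rough measure of how much of the stack is still unanswered.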
Real-world example
HuggingFace has taken down multiple models containing malicious pickles since 2023 — JFrog Security Research has published several catalogs of the find-and-takedown work. Each takedown removes one instance; the structural risk (any user can upload any model with no per-file scanning before publish) remains in 2026.
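Malicious pickles of this kind can often be spotted without loading them, by walking the opcode stream and flagging opcodes that import or call objects. The following is a rough sketch, not the scanner used by any registry or by Lab L4.8; flag_opcodes and the opcode set are illustrative assumptions. It handles raw pickle bytes only (a checkpoint that wraps a pickle inside an archive would need unpacking first), and legitimate checkpoints also use these opcodes to rebuild tensors, so a practical scanner would compare the imported names against an allowlist.

```python
import pickle
import pickletools
from typing import List

# Opcodes that import a global or call/instantiate an object during unpickling.
RISKY_OPCODES = {
    "GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX",
}


def flag_opcodes(data: bytes) -> List[str]:
    """Return risky opcode names found in a pickle stream, without unpickling it."""
    return [
        opcode.name
        for opcode, _arg, _pos in pickletools.genops(data)
        if opcode.name in RISKY_OPCODES
    ]


class Payload:
    """Stand-in for a malicious object: unpickling it would call print()."""

    def __reduce__(self):
        return (print, ("code ran during unpickling",))


benign = pickle.dumps({"layer.weight": [0.1, 0.2, 0.3]})
hostile = pickle.dumps(Payload())  # serialized only, never loaded here

print(flag_opcodes(benign))   # plain containers and floats: nothing flagged
print(flag_opcodes(hostile))  # includes a global import and a REDUCE call
```

The stand-in payload calls print, but the same mechanism would call any importable function, which is why the scan looks at what the stream asks the unpickler to do instead of at the resulting object.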
Key terms
- Model supply chain — the production chain of artifacts your AI stack depends on.
- Typosquatting (model) — the PoisonGPT-style registry typo attack.
- Sigstore for models — emerging signing standard; partial adoption.
References
- OWASP LLM05.
- HuggingFace security blog (search "malicious model").
- JFrog Security Research model-takedown writeups.
- ATLAS AML.T0010 page.
Quiz items
- Q: Name the four artifact classes in the model supply chain. A: Model weights, tokenizers and config, adapters and fine-tunes, datasets.
- Q: Name three concrete attack vectors against model registries. A: Typosquatting, malicious weight files (pickle deserialization), model-card lies.
- Q: Why is the AI supply chain harder than classical software supply chain? A: Opacity of artifacts (no objdump for models), lack of signing, high frequency of updates.
Video script (~560 words, ~4 min)
[SLIDE 1 — Title]
Model supply chain attack surface. Five minutes. By the end you'll know the four artifact classes, three concrete attack vectors against registries, and why AI supply chain is harder than classical software supply chain.
[SLIDE 2 — Four artifact classes]
Every production AI system inherits artifacts produced by parties outside the application team. Four classes. One: model weights — the trained parameters, distributed as files. Safetensors, pickle-based, GGUF. Source: HuggingFace, Ollama, vendor APIs, private registries. Two: tokenizers and config — separately distributed, pair with weights. A model plus the wrong tokenizer equals garbage output. A model plus a malicious tokenizer equals an attack surface most defenders don't think about. Three: adapters and fine-tunes — LoRA, full-weight fine-tunes, "uncensored" variants. Small files. Easy to publish. Easy to typosquat. Often the most-frequently-updated artifact in a production stack. Four: datasets — training, fine-tune, eval data downloaded from registries. Curated data may itself be poisoned. Provenance often opaque.
[SLIDE 3 — Vector 1: Typosquatting]
Three concrete attack vectors. One: typosquatting. Register a model under a near-name. "Eleuter-AI slash gpt-j-6B" — typo of EleutherAI. "metallama slash Llama-3.2" — typo of meta-llama. Users searching imprecisely or copy-pasting from an unreliable source occasionally hit the typo. This is the PoisonGPT attack pattern.
[SLIDE 4 — Vector 2: Malicious weights]
Two: malicious weight files. Upload a model with a weight file that — when loaded — executes arbitrary code. Pickle-based formats — dot-bin, dot-pt, dot-pth — allow this directly because Python pickle deserializes by executing whatever the file says to execute. Lab L4.8 demonstrates scanning for this. Safetensors mitigates by being a non-executable format.
[SLIDE 5 — Vector 3: Model-card lies]
Three: model-card lies. The model card declares "trained on dataset X, evaluated on benchmark Y, no known harms." Reality: trained on Z including private content, evaluated with leakage, planted backdoor. The card is unverified by the registry. Nothing prevents lying. Defenders who treat model cards as authoritative are vulnerable.
[SLIDE 6 — Why AI supply chain is worse]
Three reasons AI supply chain is harder than classical software supply chain. Opacity of artifacts — a binary you can decompile and analyze. A model file is opaque — you can load it and probe behavior, but you cannot statically verify "this model has no backdoor." No equivalent of objdump. Lack of signing — most model registries ship without signature verification by default. Some adoption of Sigstore for models, but coverage is sparse. Frequency of updates — a model gets fine-tuned daily by many teams. The frequency-of-change in AI artifacts is high; version landscape is fragmented.
[SLIDE 7 — What this means for your stack]
Inventory question. List every model, tokenizer, adapter, dataset your stack pulls in. For each: who published it, when was it last updated, have you verified the publisher, is it pinned by hash or by name, what does your provider's model card claim vs. what you've verified. Most teams in twenty-twenty-six can't answer these for their full stack. That's the AI-BOM gap. L4.5.2 and lab L4.9 address it.
[SLIDE 8 — Up next]
Next lesson: pickle deserialization and weight-format risk. Five minutes. We then walk model-card lies, dependency risk, and AI-BOM. See you there.
Slide outline
- Title — "Model supply chain attack surface".
- Four artifact classes — quadrant: Weights · Tokenizer/config · Adapters · Datasets.
- Typosquatting — "EleutherAI" vs "Eleuter-AI" with subtle highlight.
- Malicious weights — pickle-deserialize code execution illustration.
- Model-card lies — split: "claims" vs "reality" columns.
- Why AI supply chain is worse — three callout cards: opacity · no signing · update freq.
- Inventory question — five-question checklist for each artifact.
- Up next — "L4.4.2 — Pickle and weight-format risk, ~5 min."
Production notes
- Recording: ~4 min. Cap 5.
- Slide 3 typo visual must land — animate the typo highlight if possible.