L4.5.2 — AI-BOM and provenance tracking¶
Type: Theory · Duration: ~5 min · Status: Mandatory
Module: Module 4 — Data Poisoning, Backdoors & Supply Chain
Framework tags: OWASP LLM05 · MITRE ATLAS AML.T0010 · NIST AI RMF Govern 1.3, Map 4.1
Learning objectives¶
- Define AI-BOM and list the seven artifact classes it inventories.
- Identify CycloneDX-AI as the emerging standard format and one tool that emits it.
Core content¶
What an AI-BOM is¶
An AI Bill of Materials (AI-BOM) is an inventory artifact that lists every component of an AI system along with its provenance: where it came from, what version it is, who signed it (if anyone), what license it carries, and how it connects to other components. Same idea as a Software Bill of Materials (SBOM), extended to AI-specific artifacts.
Without an AI-BOM you can't credibly answer:
- "What's in our stack?" (asked by a CISO before a launch.)
- "Are we affected by this newly-disclosed model backdoor?" (asked during incident response.)
- "Show me your AI artifact provenance." (asked by an EU AI Act auditor under Article 11 documentation requirements.)
- "Who is liable for this component's failure mode?" (asked by procurement/legal.)
With one, those answers are five-minute reports rather than two-week archaeology.
The seven artifact classes an AI-BOM should cover¶
For each AI feature in your product, the AI-BOM should inventory:
- Base models — provider, name, version/hash, license, retrieval date, signature status.
- Adapters / fine-tunes — same fields. Especially important: who produced this fine-tune?
- Tokenizers — usually bundled with models but explicit listing matters for tokenizer-mismatch incidents.
- Datasets used in training/fine-tune — provider, version, sourcing methodology, sensitivity classification.
- Embedding models — same fields as base models.
- Vector store / retrieval index — corpus source(s), refresh cadence, write population (who or what is able to write to the index).
- Inference libraries and orchestration framework — same as a classical SBOM line item, but kept distinct so audits can map AI-specific advisories.
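As a rough illustration of the list above, the seven classes and their shared fields could be modeled as a small record type. This is a minimal sketch, not a standard schema: the names `ArtifactClass`, `BomEntry`, `missing_classes`, and all example values (`example-base-7b`, `example-org`, `support-tone-lora`) are hypothetical and chosen only for illustration.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import List, Optional


class ArtifactClass(str, Enum):
    """The seven artifact classes from the list above (labels are hypothetical)."""
    BASE_MODEL = "base-model"
    ADAPTER = "adapter"
    TOKENIZER = "tokenizer"
    DATASET = "dataset"
    EMBEDDING_MODEL = "embedding-model"
    RETRIEVAL_INDEX = "retrieval-index"
    LIBRARY = "library"


@dataclass
class BomEntry:
    artifact_class: ArtifactClass
    name: str
    provider: str
    version: str
    sha256: Optional[str] = None      # content hash, when the artifact is a file
    license: Optional[str] = None
    retrieved: Optional[str] = None   # ISO date the artifact was pulled
    signed_by: Optional[str] = None   # None means unsigned or unknown
    depends_on: List[str] = field(default_factory=list)  # names of other entries


def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest, used here as the version/hash field."""
    return hashlib.sha256(data).hexdigest()


def missing_classes(entries: List[BomEntry]) -> List[ArtifactClass]:
    """Artifact classes with no entry yet: a quick completeness check."""
    present = {e.artifact_class for e in entries}
    return [c for c in ArtifactClass if c not in present]


def to_json(entries: List[BomEntry]) -> str:
    """Serialize entries; the str-based enum is written as its string value."""
    return json.dumps([asdict(e) for e in entries], indent=2)


# Hypothetical inventory for one AI feature (placeholder bytes stand in for weights).
entries = [
    BomEntry(ArtifactClass.BASE_MODEL, "example-base-7b", "example-org", "1.2.0",
             sha256=sha256_of(b"placeholder weights"), license="Apache-2.0",
             retrieved="2026-01-15"),
    BomEntry(ArtifactClass.ADAPTER, "support-tone-lora", "internal-ml-team", "0.3.1",
             depends_on=["example-base-7b"]),
]

print(to_json(entries))
print("Not yet inventoried:", [c.value for c in missing_classes(entries)])
```

Keeping the class as an explicit field is what lets a completeness check like `missing_classes` flag, for example, a feature whose tokenizer or retrieval index was never listed.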
CycloneDX-AI: the emerging standard¶
CycloneDX is an open SBOM standard maintained by OWASP. Its AI/ML extension (sometimes called CycloneDX-AI or "ML-BOM") has shipped as part of the core specification since version 1.5 and adds AI-specific fields: model metadata, training-data references, evaluation metrics, energy consumption. As of 2026, CycloneDX-AI is the most-adopted AI-BOM format.
Tools that emit it:
- cyclonedx-py (Python SBOM generator with AI extensions).
- trivy (Aqua Security; classical SBOM + emerging AI artifact support).
- Vendor-specific: Hugging Face Hub exposes per-model metadata; some teams script CycloneDX output from this.
Lab L4.9 walks through generating one.
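For orientation before the lab, here is a minimal sketch of what a hand-built CycloneDX JSON document with one model and one dataset might look like, using only the Python standard library. The field names (`machine-learning-model`, `modelCard`, `modelParameters`, `datasets`, `bom-ref`) follow a reading of the CycloneDX 1.6 JSON schema and should be validated against the official schema before use. The helper `minimal_mlbom` and all values are hypothetical; real pipelines would normally rely on a generator tool rather than hand-assembly.

```python
import hashlib
import json
import uuid


def minimal_mlbom(model_name, model_version, sha256_hex, license_id, dataset_name):
    """Build a minimal CycloneDX-style dict: one model component, one data component.

    Assumption: structure mirrors the CycloneDX 1.6 JSON schema as understood
    here; it is not validated by this function.
    """
    dataset_ref = f"data:{dataset_name}"
    return {
        "bomFormat": "CycloneDX",
        "specVersion": "1.6",
        "serialNumber": f"urn:uuid:{uuid.uuid4()}",
        "version": 1,  # BOM revision counter, incremented on each regeneration
        "components": [
            {
                "type": "machine-learning-model",
                "bom-ref": f"model:{model_name}@{model_version}",
                "name": model_name,
                "version": model_version,
                "hashes": [{"alg": "SHA-256", "content": sha256_hex}],
                "licenses": [{"license": {"id": license_id}}],
                # The model card links the model to its training data by reference.
                "modelCard": {
                    "modelParameters": {"datasets": [{"ref": dataset_ref}]}
                },
            },
            {"type": "data", "bom-ref": dataset_ref, "name": dataset_name},
        ],
    }


# Hypothetical values; the hash is computed over placeholder bytes, not real weights.
doc = minimal_mlbom(
    model_name="example-base-7b",
    model_version="1.2.0",
    sha256_hex=hashlib.sha256(b"placeholder weights").hexdigest(),
    license_id="Apache-2.0",
    dataset_name="support-tickets-2025",
)
print(json.dumps(doc, indent=2))
```

The dataset is listed as its own component and referenced from the model card, so a later advisory about the dataset can be traced to every model trained on it.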
What "good" looks like¶
An AI-BOM is useful only if it's:
- Auto-generated. Manual lists rot within weeks; the BOM should be produced by tooling on every release.
- Versioned. Stored as a versioned artifact (in your release pipeline or a dedicated artifact registry).
- Queryable. Searchable for "which of our products use model X?" or "which models do we use that haven't been updated in 12 months?"
- Linked to upstream advisory feeds. When a CVE or model advisory drops upstream, your BOM is what tells you whether you're exposed.
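The "queryable" and "linked to advisory feeds" properties can be sketched against a flattened BOM, with one row per product and component. Everything here is an assumption for illustration: the row layout, the function names `products_using`, `stale_components`, and `exposed_products`, the advisory shape, and the identifier `ADV-EXAMPLE-001` do not come from any real feed or tool.

```python
from datetime import date

# Hypothetical flattened BOM: one row per (product, component) pair.
BOM_ROWS = [
    {"product": "support-bot", "component": "example-base-7b",
     "version": "1.2.0", "last_updated": date(2024, 11, 1)},
    {"product": "support-bot", "component": "example-embedder",
     "version": "2.0.0", "last_updated": date(2025, 9, 1)},
    {"product": "search-assist", "component": "example-embedder",
     "version": "2.0.0", "last_updated": date(2025, 9, 1)},
    {"product": "search-assist", "component": "example-base-7b",
     "version": "1.3.0", "last_updated": date(2025, 12, 1)},
]

# Hypothetical advisory record, as it might arrive from an upstream feed.
ADVISORIES = [
    {"id": "ADV-EXAMPLE-001", "component": "example-base-7b", "versions": ["1.2.0"]},
]


def products_using(rows, component):
    """Which of our products use component X?"""
    return sorted({r["product"] for r in rows if r["component"] == component})


def stale_components(rows, today, max_age_days=365):
    """Which (component, version) pairs have not been updated in max_age_days?"""
    return sorted({
        (r["component"], r["version"])
        for r in rows
        if (today - r["last_updated"]).days > max_age_days
    })


def exposed_products(rows, advisories):
    """Which products ship a component version named in an advisory?"""
    affected = {(a["component"], v) for a in advisories for v in a["versions"]}
    return sorted({
        r["product"] for r in rows if (r["component"], r["version"]) in affected
    })


today = date(2026, 1, 15)  # fixed date so the example is reproducible
print(products_using(BOM_ROWS, "example-embedder"))
print(stale_components(BOM_ROWS, today))
print(exposed_products(BOM_ROWS, ADVISORIES))
```

Matching on the exact component and version pair is the simplest possible rule; real advisories often use version ranges or content hashes, which would need a richer comparison.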
When to start¶
If you ship any AI feature to enterprise or regulated customers, start now. EU AI Act Article 11 documentation obligations (high-risk systems) effectively require it. U.S. federal contracting increasingly references SBOM (and by extension AI-BOM) requirements, and NIST AI RMF Govern 1.3 and Map 4.1 set the matching documentation expectations. The lift to start is small; the cost of not having one when first asked is large.
Real-world example¶
Anthropic, OpenAI, and other foundation-model vendors publish "system cards" or "model cards" for their flagship releases. These are the closest vendor-side analogue to an AI-BOM at the model level, although they describe behavior and evaluation rather than a machine-readable component inventory. Enterprise AI buyers in 2024–2025 increasingly required AI-BOMs from their vendors as a procurement gate, and the expectation is moving toward "you have one, or you don't sell into enterprise."
Key terms¶
- AI-BOM — AI Bill of Materials.
- CycloneDX-AI — open standard format; OWASP-maintained.
- Auto-generated BOM — BOM produced by tooling, not hand-maintained.
References¶
- CycloneDX project — https://cyclonedx.org/
- CycloneDX AI/ML extension — https://cyclonedx.org/capabilities/mlbom/
- NIST AI RMF Govern 1.3, Map 4.1 — documentation expectations.
- EU AI Act Article 11 (high-risk system documentation).
Quiz items¶
- Q: Name five of the seven artifact classes an AI-BOM should inventory. A: Any five of: base models, adapters/fine-tunes, tokenizers, datasets, embedding models, vector store / retrieval index, inference libraries & orchestration framework.
- Q: What standard format is most-adopted for AI-BOM in 2026? A: CycloneDX-AI (CycloneDX with the AI/ML extension), OWASP-maintained.
- Q: Why does an AI-BOM need to be auto-generated rather than hand-maintained? A: Because manual lists rot within weeks; the BOM is only useful if it's accurate, and accuracy at production cadence requires tooling.
Video script (~580 words, ~4 min)¶
[SLIDE 1 — Title]
AI-BOM and provenance tracking. Five minutes. By the end you'll know what an AI-BOM is, the seven artifact classes it should cover, and the emerging standard format.
[SLIDE 2 — What an AI-BOM is]
An AI Bill of Materials is an inventory artifact that lists every component of an AI system along with its provenance: where it came from, what version, who signed it, what license, how it connects to other components. Same idea as a software SBOM, extended to AI-specific artifacts.
Without one, you can't credibly answer: what's in our stack — asked by a CISO before launch. Are we affected by this newly-disclosed model backdoor — asked during incident response. Show me your AI artifact provenance — asked by an EU AI Act auditor under Article 11 documentation. Who is liable for this component's failure mode — asked by procurement and legal. With one, those answers are five-minute reports rather than two-week archaeology.
[SLIDE 3 — Seven artifact classes]
For each AI feature in your product, the AI-BOM should inventory seven artifact classes. Base models — provider, name, version, license, retrieval date, signature status. Adapters and fine-tunes — same fields, especially who produced this fine-tune. Tokenizers — usually bundled but explicit listing matters for mismatch incidents. Datasets used in training and fine-tune — provider, version, sourcing methodology, sensitivity classification. Embedding models — same fields as base models. Vector store and retrieval index — corpus sources, refresh cadence, write population. Inference libraries and orchestration framework — same as a classical SBOM line item but kept distinct so AI-specific advisories map.
[SLIDE 4 — CycloneDX-AI]
CycloneDX-AI is the emerging standard. CycloneDX is an open SBOM standard maintained by OWASP. The AI-slash-ML extension — sometimes called CycloneDX-AI or ML-BOM — adds AI-specific fields: model metadata, training-data references, evaluation metrics, energy consumption. As of twenty-twenty-six, the most-adopted AI-BOM format.
Tools that emit it: cyclonedx-py, a Python SBOM generator with AI extensions. trivy, from Aqua Security, with classical SBOM plus emerging AI artifact support. Vendor-specific: Hugging Face Hub exposes per-model metadata, and some teams script CycloneDX output from it. Lab L4.9 walks through generating one.
[SLIDE 5 — What good looks like]
An AI-BOM is useful only if it's auto-generated — manual lists rot within weeks. The BOM should be produced by tooling on every release. Versioned — stored as a versioned artifact in your release pipeline. Queryable — searchable for "which of our products use model X" or "which models haven't been updated in 12 months." Linked to upstream advisory feeds — when a CVE or model advisory drops upstream, your BOM tells you whether you're exposed.
[SLIDE 6 — When to start]
If you ship any AI feature to enterprise or regulated customers, start now. EU AI Act Article 11 documentation obligations effectively require it. U.S. federal contracting increasingly references SBOM and by extension AI-BOM. NIST AI RMF Govern 1.3 and Map 4.1 set the matching documentation expectations. The lift to start is small. The cost of not having one when first asked is large.
[SLIDE 7 — Up next]
All theory done. Four labs next. L4.6 you poison a sentiment classifier. L4.7 plant a backdoor trigger. L4.8 scan HuggingFace for malicious pickles. L4.9 optional, build an AI-BOM. See you there.
Slide outline¶
- Title — "AI-BOM and provenance tracking".
- What an AI-BOM is — four-question card showing CISO/IR/auditor/legal questions and "5-min report vs 2-week archaeology" callout.
- Seven artifact classes — seven-card grid.
- CycloneDX-AI — OWASP logo + CycloneDX logo; tool names below.
- What good looks like — four-bullet checklist.
- When to start — flowchart: "Do you ship to enterprise/regulated? → yes → start now."
- Up next — pointer to L4.6 + lab block.
Production notes¶
- Recording: ~4 min. Cap 5.
- Slide 3 (seven artifact classes) is the slide learners will reference most when building their own BOM. Make it readable as a standalone reference.