L8.5.1 · Model cards: structure, audience, common gaps
Type: Theory · Duration: ~5 min · Status: Mandatory · Module: Module 8 (AI Governance, Risk & Compliance) · Framework tags: NIST AI RMF Govern 1.3, Map 1.6; EU AI Act Article 11 (technical documentation, Annex IV §1 + 2)
Learning objectives
- Recall the eight sections of a standard model card (Mitchell et al. 2019 + 2026 extensions).
- Identify two common audience confusions when writing model cards.
Core content
Definition + history
A model card is a structured document accompanying a trained ML model that describes its characteristics, intended use, limitations, evaluation results, and ethical considerations. The format originated in Mitchell et al. (FAT* 2019) and is now ubiquitous on model registries (HuggingFace, Ollama, vendor catalogs).
Model cards exist in tension between three pulls:
- Technical accuracy: what the model actually does.
- Marketing: what the publisher wants you to believe.
- Legal: what the publisher is willing to commit to.
Reading model cards critically is a consumer-side skill (covered in the Module 1 L1.6 lab). Writing them well, the subject of this lesson, is the producer-side skill.
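The eight sections of a model card
The structure below follows Mitchell et al. (2019); the 2026 extensions follow the list. The original paper also defines a ninth section, Caveats and Recommendations; the eight-section version taught here treats that material as part of Ethical Considerations.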
1. Model Details.
   - Person or organization developing the model.
   - Model date.
   - Model version.
   - Model type (architecture).
   - Information about training algorithms, parameters, and fairness constraints.
   - Paper or other resource for more information.
   - Citation details.
   - License.
2. Intended Use.
   - Primary intended uses.
   - Primary intended users.
   - Out-of-scope use cases.
3. Factors.
   - Relevant factors (groups, instrumentation, environments).
   - Evaluation factors.
4. Metrics.
   - Model performance measures.
   - Decision thresholds.
   - Variation approaches.
5. Evaluation Data.
   - Datasets.
   - Motivation.
   - Preprocessing.
6. Training Data.
   - Same fields as Evaluation Data, applied to training.
   - Particularly important to document for LLMs (training-corpus characteristics).
7. Quantitative Analyses.
   - Unitary results (single-factor performance).
   - Intersectional results (multi-factor performance, e.g., performance across demographic intersections).
8. Ethical Considerations.
   - Risks and harms.
   - Use cases where the model should not be used.
2026 extensions (the most-adopted additions):
- Safety eval results: adversarial robustness, jailbreak resistance, refusal rates on harmful prompts (Module 7 territory).
- AI-BOM extract: the model's own supply chain (training data, base-model lineage, key dependencies).
- Acceptable Use Policy reference: a link to the formal AUP.
- EU AI Act risk-tier classification: where applicable, the publisher's assessment of the tier the model falls under for typical downstream uses.
- Carbon and energy disclosure: an emerging norm; the EU AI Act's technical-documentation requirements for general-purpose AI models ask for known or estimated energy consumption.
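One way to make the structure above concrete is to treat a draft card as a checklist. The sketch below assumes a hypothetical schema: the section keys, the `ModelCard` class, and the example model "Acme-7B" are illustrative only, not the Hugging Face template or any registry's format. It reports which of the eight core sections and five 2026 extensions a draft is missing, treating blank text as missing.

```python
from dataclasses import dataclass, field

# Hypothetical keys mirroring the eight core sections listed above.
CORE_SECTIONS = (
    "model_details",
    "intended_use",
    "factors",
    "metrics",
    "evaluation_data",
    "training_data",
    "quantitative_analyses",
    "ethical_considerations",
)

# Hypothetical keys for the five 2026 extensions.
EXTENSIONS = (
    "safety_eval_results",
    "ai_bom_extract",
    "acceptable_use_policy",
    "eu_ai_act_risk_tier",
    "carbon_energy_disclosure",
)


@dataclass
class ModelCard:
    # Maps a section key to its free-text content.
    sections: dict[str, str] = field(default_factory=dict)

    def missing(self, required: tuple[str, ...]) -> list[str]:
        """Return required section keys that are absent or blank."""
        return [
            key for key in required
            if not self.sections.get(key, "").strip()
        ]


# Hypothetical draft card for an illustrative model.
draft = ModelCard(sections={
    "model_details": "Acme-7B, v1.2, decoder-only transformer, Apache-2.0.",
    "intended_use": "English customer-support summarization.",
    "metrics": "ROUGE-L; exact match on extracted fields.",
    "training_data": "   ",  # whitespace-only content counts as missing
})

print(draft.missing(CORE_SECTIONS))
print(draft.missing(EXTENSIONS))
```

Here the draft is missing five core sections (including Training Data, which is present but blank) and all five extensions. A check like this catches absent sections only; it cannot judge whether the content that is present is any good.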
Two common audience confusions
When writing a model card, two pairs of audiences pull in opposite directions:
Confusion 1: Engineer-vs-Procurement. Engineers want technical depth: architecture, training-data composition, evaluation methodology. Procurement reviewers want a clear signal: "is this model safe to use for our use case?" Trying to satisfy both with the same content produces a card that is too dense for procurement and too shallow for engineers.
Fix: structure the card with a procurement-friendly summary at the top (1-2 pages) and the technical depth below. Section ordering: Intended Use + key safety results first; Model Details and Training Data later.
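The reordering fix can be sketched as a small function over section keys. The keys are hypothetical (they are not from any standard schema), and the placement of sections other than the four named in the fix is an assumption: they keep their original relative order between the summary and the deep technical sections.

```python
# Hypothetical section keys; the procurement summary leads the card.
SUMMARY_FIRST = ("intended_use", "safety_eval_results")
TECHNICAL_LAST = ("model_details", "training_data")


def procurement_first_order(section_keys: list[str]) -> list[str]:
    """Reorder section keys: summary sections first, deep technical
    sections last, everything else in its original relative order."""
    summary = [k for k in SUMMARY_FIRST if k in section_keys]
    last = [k for k in TECHNICAL_LAST if k in section_keys]
    middle = [
        k for k in section_keys
        if k not in SUMMARY_FIRST and k not in TECHNICAL_LAST
    ]
    return summary + middle + last


# Mitchell et al. ordering plus one 2026 extension.
mitchell_order = [
    "model_details", "intended_use", "factors", "metrics",
    "evaluation_data", "training_data", "quantitative_analyses",
    "ethical_considerations", "safety_eval_results",
]
print(procurement_first_order(mitchell_order))
```

The function only permutes sections and never drops one. The engineer-facing depth is still in the card; it simply appears after the summary a procurement reviewer reads first.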
Confusion 2: Internal-vs-External audience. Internal engineering audiences read model cards to make build decisions ("should I fine-tune from this base?"). External customer audiences read them to make procurement decisions. The same card can serve both roles if it errs toward external clarity.
Fix: write the card as if the most skeptical external reader is the audience. Internal engineers can ask follow-up questions; external audiences can't.
Common gaps to avoid
Three failure modes that frequently surface in model-card reviews:
1. Training data is vague. "Publicly available data" or "a mix of public and licensed sources," with no further detail. This is sometimes defensible for commercial reasons, but it signals a lack of provenance discipline.
2. Out-of-scope uses missing or generic. "Don't use for malicious purposes" is unhelpful. Specific out-of-scope cases ("not validated for medical diagnostic use," "not validated for autonomous decision-making in safety-critical domains") are useful.
3. Quantitative analyses skip intersectional breakdowns. Aggregate accuracy is reported; per-demographic, per-domain, and per-language performance isn't. Pressure to disclose intersectional results is increasing (NIST AI RMF Govern 3.1, EU AI Act Article 10).
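To see why the third gap matters, compare aggregate, unitary, and intersectional accuracy on the same evaluation results. The records below are made up: the languages, domains, and outcomes are hypothetical. The sketch uses plain Python; a real evaluation pipeline would more likely group with a dataframe library.

```python
from collections import defaultdict

# Hypothetical evaluation records: (language, domain, prediction_correct).
records = [
    ("en", "finance", True), ("en", "finance", True),
    ("en", "medical", True), ("en", "medical", True),
    ("es", "finance", True), ("es", "finance", True),
    ("es", "medical", False), ("es", "medical", False),
]


def accuracy_by(records, key):
    """Group records by key(record) and return accuracy per group."""
    totals = defaultdict(lambda: [0, 0])  # group -> [correct, total]
    for rec in records:
        group = key(rec)
        totals[group][0] += rec[2]  # True counts as 1, False as 0
        totals[group][1] += 1
    return {g: correct / total for g, (correct, total) in totals.items()}


overall = sum(r[2] for r in records) / len(records)
unitary_language = accuracy_by(records, key=lambda r: r[0])
unitary_domain = accuracy_by(records, key=lambda r: r[1])
intersectional = accuracy_by(records, key=lambda r: (r[0], r[1]))

print(f"aggregate: {overall:.2f}")
print("by language:", unitary_language)
print("by domain:", unitary_domain)
print("by language x domain:", intersectional)
```

On this data, the aggregate accuracy is 0.75 and each unitary breakdown shows a single 0.5 cell (Spanish; medical). Only the intersectional table shows that every error falls in the Spanish-medical cell, which scores 0.0. That concentration is the information an aggregate-only card hides.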
What the L8.7 lab walks through
L8.7 has you write a complete model card for the M1 RAG app, plus an accompanying risk assessment. By the end, you have a real example of your own.
Real-world example
HuggingFace's model-card UI prompts authors to fill in the standard sections. Adoption varies dramatically by publisher: Meta, OpenAI, Anthropic, and Google ship comprehensive cards, while many community publishers ship minimal ones. The well-written cards are useful references; reading 3-5 of them is the fastest way to internalize what good looks like.
Key terms
- Model card — structured documentation accompanying a trained ML model.
- Mitchell et al. structure: the 2019 model-card framework, taught here as eight core sections.
- 2026 extensions — safety eval results, AI-BOM extract, AUP, EU AI Act tier, carbon disclosure.
- Intersectional analysis — per-demographic / per-subgroup performance reporting.
References
- Mitchell et al., "Model Cards for Model Reporting" (FAT* 2019).
- HuggingFace Model Card guide.
- Anthropic / OpenAI model cards (public).
Quiz items
- Q: Name the eight sections of a standard model card. A: Model Details, Intended Use, Factors, Metrics, Evaluation Data, Training Data, Quantitative Analyses, Ethical Considerations.
- Q: Name three 2026 extensions to the standard model-card structure. A: Any three of: safety eval results, AI-BOM extract, Acceptable Use Policy reference, EU AI Act risk-tier classification, carbon and energy disclosure.
- Q: What's the right structural fix for the engineer-vs-procurement audience confusion? A: Put a procurement-friendly summary (1-2 pages) at the top, leading with Intended Use and key safety results, and place the technical depth below.
Video script (~600 words, ~4.5 min)
[SLIDE 1 — Title]
Model cards: structure, audience, common gaps. Five minutes.
[SLIDE 2 — Definition + history]
A model card is a structured document accompanying a trained ML model that describes its characteristics, intended use, limitations, evaluation results, and ethical considerations. Originated in Mitchell et al, FAT* 2019. Now ubiquitous on model registries.
Model cards exist in tension between technical accuracy — what the model actually does — marketing — what the publisher wants you to believe — and legal — what the publisher is willing to commit to. Reading critically is a skill. Writing well — this lesson — is the producer-side skill.
[SLIDE 3 — Eight sections]
Eight core sections, based on the Mitchell et al structure. Model Details: developer, version, type, training info, license. Intended Use: primary intended uses, primary intended users, out-of-scope use cases. Factors: relevant factors and evaluation factors. Metrics: performance measures, decision thresholds, variation approaches. Evaluation Data: datasets, motivation, preprocessing. Training Data: same fields applied to training; particularly important for LLMs. Quantitative Analyses: unitary results and intersectional results. Ethical Considerations: risks and harms; use cases where the model should not be used.
[SLIDE 4 — 2026 extensions]
Most-adopted 2026 additions to the standard structure. Safety eval results: adversarial robustness, jailbreak resistance, refusal rates on harmful prompts. AI-BOM extract: the model's own supply chain. Acceptable Use Policy reference. EU AI Act risk-tier classification, if applicable. Carbon and energy disclosure: an emerging norm that the EU AI Act's documentation rules also reference.
[SLIDE 5 — Audience confusion 1: Engineer vs Procurement]
Two audience confusions. One: engineer-vs-procurement. Engineers want technical depth. Procurement wants signal — "is this safe to use for our case." Trying to satisfy both with the same content produces a card that's too dense for procurement, too shallow for engineers.
Fix: structure with a procurement-friendly summary at the top — 1-2 pages — and technical depth below. Section ordering: Intended Use plus key safety results first. Model Details and Training Data later.
[SLIDE 6 — Audience confusion 2: Internal vs External]
Two: internal-vs-external. Internal engineering reads model cards for build decisions. External customer audiences read for procurement decisions. Same card serves both if it errs toward external clarity.
Fix: write the card as if the most-skeptical-external-reader is the audience. Internal engineers can ask follow-up questions. External audiences can't.
[SLIDE 7 — Common gaps]
Three common gaps. Training data is vague — "publicly available data" without further detail. Signal of lack of provenance discipline. Out-of-scope uses missing or generic — "don't use for malicious purposes" is unhelpful. Specific cases — "not validated for medical diagnostic use" — are useful. Quantitative analyses skip intersectional breakdowns — aggregate accuracy reported, per-demographic per-domain per-language performance isn't. Increasing pressure to disclose intersectional results.
[SLIDE 8 — L8.7 + up next]
L8.7 has you write a complete model card for the M1 RAG app plus accompanying risk assessment. By the end you have a real example.
Next: system cards, data sheets, AI-BOM revisited. Five minutes.
Slide outline
- Title — "Model cards: structure, audience, common gaps".
- Definition + history — tension diagram (technical / marketing / legal).
- Eight sections — eight-card grid.
- 2026 extensions — five-bullet additions list.
- Audience confusion 1 — engineer-vs-procurement split with fix.
- Audience confusion 2 — internal-vs-external split with fix.
- Common gaps — three red-flag cards.
- L8.7 + up next — lab callout + pointer.
Production notes
- Recording: ~4.5 min. Cap 5.
- Slide 3 (eight sections) is the lesson's reference artifact.