L8.5.1 — Model cards: structure, audience, common gaps

Type: Theory · Duration: ~5 min · Status: Mandatory
Module: Module 8: AI Governance, Risk & Compliance
Framework tags: NIST AI RMF Govern 1.3, Map 1.6 · EU AI Act Article 11 (technical documentation, Annex IV §1 + 2)

Learning objectives

  1. Recall the eight sections of a standard model card (Mitchell et al. 2019 + 2026 extensions).
  2. Identify two common audience confusions when writing model cards.

Core content

Definition + history

A model card is a structured document accompanying a trained ML model that describes its characteristics, intended use, limitations, evaluation results, and ethical considerations. The format originated in Mitchell et al. (FAT* 2019) and is now ubiquitous on model registries (HuggingFace, Ollama, vendor catalogs).

Model cards exist in tension between:

  • Technical accuracy (what the model actually does).
  • Marketing (what the publisher wants you to believe).
  • Legal (what the publisher is willing to commit to).

Reading model cards critically is a skill (Module 1 L1.6 lab covered this). Writing them well — what this lesson covers — is the producer-side skill.

The eight sections of a model card

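The Mitchell et al. structure, extended for 2026 practice. (The original paper also lists a ninth section, Caveats and Recommendations; this lesson works with the eight below.)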

1. Model Details.

  • Person/organization developing the model.
  • Model date.
  • Model version.
  • Model type (architecture).
  • Information about training algorithms, parameters, fairness constraints.
  • Paper or other resource for more information.
  • Citation details.
  • License.

2. Intended Use.

  • Primary intended uses.
  • Primary intended users.
  • Out-of-scope use cases.

3. Factors.

  • Relevant factors (groups, instrumentation, environments).
  • Evaluation factors.

4. Metrics.

  • Model performance measures.
  • Decision thresholds.
  • Variation approaches.

5. Evaluation Data.

  • Datasets.
  • Motivation.
  • Preprocessing.

6. Training Data.

  • Same fields as evaluation data, applied to training.
  • Particularly important to document for LLMs (training corpus characteristics).

7. Quantitative Analyses.

  • Unitary results (single-factor performance).
  • Intersectional results (multi-factor performance, e.g., performance by demographic intersections).

8. Ethical Considerations.

  • Risks and harms.
  • Use cases where the model should not be used.
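The difference between unitary and intersectional results in section 7 is easiest to see with numbers. The following is a minimal sketch, not a prescribed evaluation method: the records, the factor names (`language`, `age_band`), and the helper `accuracy_by` are all hypothetical and exist only for illustration.

```python
from collections import defaultdict


def accuracy_by(records, factors):
    """Accuracy per group, where a group is a tuple of values for `factors`."""
    hits, totals = defaultdict(int), defaultdict(int)
    for rec in records:
        key = tuple(rec[f] for f in factors)
        totals[key] += 1
        hits[key] += int(rec["correct"])
    return {key: hits[key] / totals[key] for key in totals}


# Hypothetical evaluation outcomes: two examples per (language, age_band) cell.
cells = {
    ("en", "18-40"): [True, True],
    ("en", "65+"): [True, True],
    ("es", "18-40"): [True, True],
    ("es", "65+"): [False, False],
}
records = [
    {"language": lang, "age_band": age, "correct": ok}
    for (lang, age), outcomes in cells.items()
    for ok in outcomes
]

aggregate = accuracy_by(records, [])              # one group: everything
by_language = accuracy_by(records, ["language"])  # unitary (single factor)
by_age = accuracy_by(records, ["age_band"])       # unitary (single factor)
intersectional = accuracy_by(records, ["language", "age_band"])

print(aggregate)       # {(): 0.75}
print(by_language)     # {('en',): 1.0, ('es',): 0.5}
print(by_age)          # {('18-40',): 1.0, ('65+',): 0.5}
print(intersectional)  # ('es', '65+') is 0.0; every other cell is 1.0
```

In this toy data the aggregate is 0.75 and the worst unitary result is 0.5, yet one intersection fails completely. That is the kind of gap the intersectional rows in a model card are meant to expose.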

2026 extensions (most-adopted additions):

  • Safety eval results: adversarial robustness, jailbreak resistance, refusal rates on harmful prompts (Module 7 territory).
  • AI-BOM extract: the model's own supply chain (training data, base model lineage, key dependencies).
  • Acceptable Use Policy reference: link to the formal AUP.
  • EU AI Act risk-tier classification: if applicable, the publisher's assessment of the tier their model falls under for typical downstream uses.
  • Carbon and energy disclosure: an emerging norm, referenced by the EU AI Act.
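The eight sections and the five extensions can be treated as a checklist during drafting. The sketch below assumes a card is held as a plain dictionary mapping section name to text; that representation, the exact key names, and the helper `missing_sections` are illustrative assumptions, not a standard schema.

```python
# Section names follow this lesson; the dict-based card format is an assumption.
CORE_SECTIONS = [
    "Model Details",
    "Intended Use",
    "Factors",
    "Metrics",
    "Evaluation Data",
    "Training Data",
    "Quantitative Analyses",
    "Ethical Considerations",
]
EXTENSIONS = [
    "Safety Eval Results",
    "AI-BOM Extract",
    "Acceptable Use Policy Reference",
    "EU AI Act Risk-Tier Classification",
    "Carbon and Energy Disclosure",
]


def missing_sections(card: dict[str, str]) -> dict[str, list[str]]:
    """List sections that are absent or contain only whitespace."""
    def absent(names):
        return [name for name in names if not card.get(name, "").strip()]

    return {"core": absent(CORE_SECTIONS), "extensions": absent(EXTENSIONS)}


# Hypothetical draft: three sections filled in, one left blank.
draft = {
    "Model Details": "Example Org, v0.1, decoder-only transformer.",
    "Intended Use": "Internal document search assistant.",
    "Training Data": "   ",
    "Safety Eval Results": "Refusal rate on a harmful-prompt set reported below.",
}

report = missing_sections(draft)
print(report["core"])
print(report["extensions"])
```

A check like this only confirms that a section exists. Whether the content is specific enough still needs a human reviewer.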

Two common audience confusions

When writing a model card, two audiences pull in opposite directions:

Confusion 1: Engineer-vs-Procurement. Engineers want technical depth — architecture, training-data composition, evaluation methodology. Procurement reviewers want signal — "is this model safe to use for our case." Trying to satisfy both with the same content produces a card that's too dense for procurement and too shallow for engineers.

Fix: structure the card with a procurement-friendly summary at the top (1-2 pages) and the technical depth below. Section ordering: Intended Use + key safety results first; Model Details and Training Data later.
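The reordering described in this fix can be sketched as a small function. It assumes sections are authored in the standard order and reordered at publication time; the names `SUMMARY_FIRST` and `reader_order` are hypothetical, and "Safety Eval Results" stands in for whatever the card calls its key safety results.

```python
# Sections that belong in the procurement-friendly summary (per the fix above).
SUMMARY_FIRST = ("Intended Use", "Safety Eval Results")


def reader_order(section_names, summary_first=SUMMARY_FIRST):
    """Move summary sections to the front; keep the rest in authoring order."""
    lead = [name for name in summary_first if name in section_names]
    rest = [name for name in section_names if name not in summary_first]
    return lead + rest


authoring_order = [
    "Model Details",
    "Intended Use",
    "Factors",
    "Metrics",
    "Evaluation Data",
    "Training Data",
    "Quantitative Analyses",
    "Ethical Considerations",
    "Safety Eval Results",
]

published_order = reader_order(authoring_order)
print(published_order[:3])  # ['Intended Use', 'Safety Eval Results', 'Model Details']
```

Keeping one source of content and changing only the order avoids maintaining two cards that drift apart.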

Confusion 2: Internal-vs-External audience. Internal engineering audiences read model cards to make build decisions ("should I fine-tune from this base?"). External customer audiences read to make procurement decisions. The same card can serve both roles if it errs toward external clarity.

Fix: write the card as if the most skeptical external reader is the audience. Internal engineers can ask follow-up questions; external audiences can't.

Common gaps to avoid

Three failure modes that frequently surface in model-card reviews:

1. Training data is vague. "Publicly available data" or "a mix of public and licensed sources" without further detail. This can be acceptable for commercial reasons, but it often signals a lack of provenance discipline.

2. Out-of-scope uses missing or generic. "Don't use for malicious purposes" is unhelpful. Specific out-of-scope cases ("not validated for medical diagnostic use," "not validated for autonomous decision-making in safety-critical domains") are useful.

3. Quantitative analyses skip intersectional breakdowns. Aggregate accuracy is reported; per-demographic / per-domain / per-language performance isn't, let alone performance at the intersections of those factors. There is increasing pressure (NIST AI RMF Govern 3.1, EU AI Act Article 10) to disclose intersectional results.
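The three gaps can be turned into a rough pre-review check. The sketch below is a crude heuristic under stated assumptions: the card fields (`training_data`, `out_of_scope_uses`, `intersectional_results`), the phrase patterns, and the function `find_gaps` are all hypothetical. A phrase match is only a prompt for a human reviewer to ask for more detail, not a verdict.

```python
import re

# Phrases quoted in this lesson as typical vague or generic wording.
VAGUE_TRAINING = re.compile(
    r"publicly available data|mix of public and licensed sources", re.IGNORECASE
)
GENERIC_OUT_OF_SCOPE = re.compile(r"malicious purposes", re.IGNORECASE)


def find_gaps(card):
    """Return a list of the common gaps that a draft card appears to have."""
    gaps = []

    training = card.get("training_data", "")
    if not training.strip() or VAGUE_TRAINING.search(training):
        gaps.append("training data is vague")

    out_of_scope = card.get("out_of_scope_uses", [])
    if not out_of_scope or all(GENERIC_OUT_OF_SCOPE.search(u) for u in out_of_scope):
        gaps.append("out-of-scope uses missing or generic")

    if not card.get("intersectional_results"):
        gaps.append("no intersectional breakdowns")

    return gaps


# Hypothetical drafts for illustration only.
weak_card = {
    "training_data": "A mix of public and licensed sources.",
    "out_of_scope_uses": ["Don't use for malicious purposes."],
    "intersectional_results": {},
}
stronger_card = {
    "training_data": "Internal support tickets, English and Spanish, deduplicated.",
    "out_of_scope_uses": ["Not validated for medical diagnostic use."],
    "intersectional_results": {("es", "65+"): 0.81},
}

print(find_gaps(weak_card))
print(find_gaps(stronger_card))
```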

What the L8.7 lab walks through

L8.7 has you write a complete model card for the M1 RAG app + accompanying risk assessment. By the end you have a real example.

Real-world example

HuggingFace's model-card UI prompts authors to fill in the standard sections. Adoption varies dramatically by publisher — Meta/OpenAI/Anthropic/Google ship comprehensive cards; many community publishers ship minimal cards. The well-written cards are useful references; reading 3-5 of them is the fastest way to internalize what good looks like.

Key terms

  • Model card — structured documentation accompanying a trained ML model.
  • Mitchell et al. structure — the 2019 eight-section framework.
  • 2026 extensions — safety eval results, AI-BOM extract, AUP, EU AI Act tier, carbon disclosure.
  • Intersectional analysis: performance reporting across combinations of factors (e.g., demographic intersections), as opposed to unitary, single-factor results.

References

  • Mitchell et al., "Model Cards for Model Reporting" (FAT* 2019).
  • HuggingFace Model Card guide.
  • Anthropic / OpenAI model cards (public).

Quiz items

  1. Q: Name the eight sections of a standard model card. A: Model Details, Intended Use, Factors, Metrics, Evaluation Data, Training Data, Quantitative Analyses, Ethical Considerations.
  2. Q: Name three 2026 extensions to the standard model-card structure. A: Any three of: safety eval results, AI-BOM extract, Acceptable Use Policy reference, EU AI Act risk-tier classification, carbon and energy disclosure.
  3. Q: What's the right structural fix for the engineer-vs-procurement audience confusion? A: Structure the card with a procurement-friendly summary at the top (1-2 pages), with Intended Use and key safety results first, and technical depth below. The card as a whole should err toward external clarity.

Video script (~600 words, ~4.5 min)

[SLIDE 1 — Title]

Model cards: structure, audience, common gaps. Five minutes.

[SLIDE 2 — Definition + history]

A model card is a structured document accompanying a trained ML model that describes its characteristics, intended use, limitations, evaluation results, and ethical considerations. It originated in Mitchell et al., FAT* 2019. Now ubiquitous on model registries.

Model cards exist in tension between three things. Technical accuracy: what the model actually does. Marketing: what the publisher wants you to believe. Legal: what the publisher is willing to commit to. Reading critically is a skill. Writing well, the subject of this lesson, is the producer-side skill.

[SLIDE 3 — Eight sections]

Eight sections from the Mitchell et al. structure. Model Details: developer, version, type, training info, license. Intended Use: primary intended uses, primary intended users, out-of-scope use cases. Factors: relevant factors and evaluation factors. Metrics: performance measures, decision thresholds, variation approaches. Evaluation Data: datasets, motivation, preprocessing. Training Data: same fields applied to training; particularly important for LLMs. Quantitative Analyses: unitary results and intersectional results. Ethical Considerations: risks and harms; use cases where the model should not be used.

[SLIDE 4 — 2026 extensions]

Most-adopted 2026 additions to the standard structure. Safety eval results — adversarial robustness, jailbreak resistance, refusal rates on harmful prompts. AI-BOM extract — the model's own supply-chain. Acceptable Use Policy reference. EU AI Act risk-tier classification if applicable. Carbon and energy disclosure — emerging norm; EU AI Act references.

[SLIDE 5 — Audience confusion 1: Engineer vs Procurement]

Two audience confusions. One: engineer-vs-procurement. Engineers want technical depth. Procurement wants signal — "is this safe to use for our case." Trying to satisfy both with the same content produces a card that's too dense for procurement, too shallow for engineers.

Fix: structure with a procurement-friendly summary at the top — 1-2 pages — and technical depth below. Section ordering: Intended Use plus key safety results first. Model Details and Training Data later.

[SLIDE 6 — Audience confusion 2: Internal vs External]

Two: internal-vs-external. Internal engineering reads model cards for build decisions. External customer audiences read for procurement decisions. Same card serves both if it errs toward external clarity.

Fix: write the card as if the most-skeptical-external-reader is the audience. Internal engineers can ask follow-up questions. External audiences can't.

[SLIDE 7 — Common gaps]

Three common gaps. One: training data is vague. "Publicly available data" without further detail signals a lack of provenance discipline. Two: out-of-scope uses are missing or generic. "Don't use for malicious purposes" is unhelpful. Specific cases, such as "not validated for medical diagnostic use," are useful. Three: quantitative analyses skip intersectional breakdowns. Aggregate accuracy is reported; per-demographic, per-domain, and per-language performance isn't. There is increasing pressure to disclose intersectional results.

[SLIDE 8 — L8.7 + up next]

L8.7 has you write a complete model card for the M1 RAG app plus accompanying risk assessment. By the end you have a real example.

Next: system cards, data sheets, AI-BOM revisited. Five minutes.

Slide outline

  1. Title — "Model cards: structure, audience, common gaps".
  2. Definition + history — tension diagram (technical / marketing / legal).
  3. Eight sections — eight-card grid.
  4. 2026 extensions — five-bullet additions list.
  5. Audience confusion 1 — engineer-vs-procurement split with fix.
  6. Audience confusion 2 — internal-vs-external split with fix.
  7. Common gaps — three red-flag cards.
  8. L8.7 + up next — lab callout + pointer.

Production notes

  • Recording: ~4.5 min. Cap 5.
  • Slide 3 (eight sections) is the lesson's reference artifact.