L7.5.1 — AI red-team program design
Type: Theory · Duration: ~5 min · Status: Mandatory · Module: Module 7 — Securing the AI Pipeline (MLSecOps & Defenses) · Framework tags: NIST AI RMF Measure 2.7, Manage 1.1; EU AI Act Article 9
Learning objectives
- Articulate the four functions of an AI red-team program (pre-launch, periodic, incident-driven, continuous).
- Identify the org-structure questions to resolve when standing one up.
Core content
What an AI red-team program is
An AI red-team program is a coordinated, ongoing function within an engineering organization that adversarially tests AI systems for security and safety failures, documents its findings, and tracks remediation. It is distinct from one-off penetration-test engagements: an AI red team is a standing capability, not an event.
Four functions
1. Pre-launch red-team. Before any AI feature ships (or before a meaningful update goes live), a red-team campaign runs against it. Scope, methodology, and severity rubric are defined per system. Findings are triaged, and high-severity findings block launch. This is the model-selection / pre-launch gate from L7.1.1.
2. Periodic red-team. Standing red-team activity against deployed systems on a fixed cadence (quarterly is common). It updates the threat model with new attacks, since the field moves fast; catches drift in deployed defenses; and produces evidence for auditors.
3. Incident-driven red-team. After an AI incident, focused red-team activity probes the same vulnerability class. It confirms the fix, finds adjacent issues, and informs the eval-suite update.
4. Continuous (automated) red-team. Tools such as Garak, PyRIT, and promptfoo (covered in the next lesson) run in CI/CD against every model and prompt change. They catch regressions and provide a daily-cadence signal.
The four functions complement one another: pre-launch is the gate, periodic is hygiene, incident-driven is reactive, and continuous is regression prevention.
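To make the continuous function concrete, here is a minimal sketch of the pass/fail check a CI step might run after an automated scan. It assumes the scan output (from Garak, PyRIT, promptfoo, or similar) has already been reduced to a hypothetical mapping of probe name to attack success rate; each real tool has its own report format, and `find_regressions`, `ci_exit_code`, and the 2-point tolerance are illustrative names and values, not any tool's API.

```python
# Minimal sketch of a CI regression gate for the continuous red-team function.
# ASSUMPTION: scanner output has already been reduced to a hypothetical
# {probe_name: attack_success_rate} dict; real tools use their own formats.

TOLERANCE = 0.02  # hypothetical: ignore rises of 2 percentage points or less (run-to-run noise)


def find_regressions(baseline: dict[str, float],
                     current: dict[str, float],
                     tolerance: float = TOLERANCE) -> dict[str, tuple[float, float]]:
    """Return {probe: (baseline_rate, current_rate)} for probes that got worse.

    A probe absent from the baseline is compared against 0.0, so a new probe
    that succeeds more often than `tolerance` is flagged too.
    """
    regressions = {}
    for probe, rate in current.items():
        before = baseline.get(probe, 0.0)
        if rate - before > tolerance:
            regressions[probe] = (before, rate)
    return regressions


def ci_exit_code(baseline: dict[str, float], current: dict[str, float]) -> int:
    """Print each regression and return 1 (fail the CI step) if any exist, else 0."""
    regressions = find_regressions(baseline, current)
    for probe, (before, after) in sorted(regressions.items()):
        print(f"REGRESSION {probe}: {before:.0%} -> {after:.0%}")
    return 1 if regressions else 0


if __name__ == "__main__":
    baseline = {"prompt_injection": 0.05, "system_prompt_leak": 0.10}
    current = {"prompt_injection": 0.12, "system_prompt_leak": 0.09}
    code = ci_exit_code(baseline, current)
    print("exit code:", code)  # a real CI step would call sys.exit(code)
```

Comparing against a stored baseline, rather than demanding a zero success rate, is one way to catch regressions without failing every build on known, already-triaged weaknesses; the baseline would be updated as findings are fixed.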
Org-structure questions to resolve
When standing up a program, resolve six decisions:
1. Reports to whom. Common patterns: the CISO (treats AI security as a security discipline), a Chief AI Officer (treats it as an AI program), or the Head of Engineering (treats it as engineering practice). There is no single right answer; what matters is an owner with budget and an escalation path.
2. Centralized vs. distributed. Centralized: one team red-teams all AI products. Distributed: each product team has an embedded red-teamer. A hybrid (central team plus per-product partners) is common at scale.
3. In-house vs. external. The dominant pattern is external vendors for ad-hoc engagements plus an in-house team for continuous work. A purely external model builds no internal capability; a purely in-house model misses outside perspective.
4. Authority and scope. What can the red team do? Access production systems? Spin up test environments? Block launches? Document these decisions as rules of engagement approved by leadership; for AI work they are non-trivial.
5. Severity rubric. How are findings rated? Use a rubric specific to AI risks rather than a CVSS equivalent alone, covering data leakage, agent escape, jailbreak surface, and similar classes.
6. Remediation tracking. Findings go into a tracker (Linear, Jira, etc.) with severity, owner, and due date. A finding is closed only when a re-test verifies the fix.
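Decisions 5 and 6 can be sketched together as a small data model. Everything here is hypothetical: the category names, the default severity per category, and the SLA day counts stand in for whatever a program's documented rubric specifies. The one rule taken from the text is that a finding closes only after a re-test shows the attack no longer succeeds.

```python
# Minimal sketch of an AI-specific severity rubric plus remediation tracking.
# ASSUMPTIONS (hypothetical, for illustration): category names, default
# severities, and SLA day counts. A real program defines these in its rubric.
from dataclasses import dataclass, field
from datetime import date, timedelta
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Hypothetical default severity per AI risk category (decision 5).
RUBRIC = {
    "data_leakage": Severity.CRITICAL,
    "agent_escape": Severity.CRITICAL,
    "jailbreak": Severity.HIGH,
    "harmful_output": Severity.MEDIUM,
}

# Hypothetical remediation SLAs in days, keyed by severity.
SLA_DAYS = {Severity.CRITICAL: 7, Severity.HIGH: 30, Severity.MEDIUM: 90, Severity.LOW: 180}


@dataclass
class Finding:
    title: str
    category: str
    owner: str
    opened: date
    severity: Severity = field(init=False)
    due: date = field(init=False)
    status: str = "open"

    def __post_init__(self) -> None:
        # Rate from the rubric; unknown categories default to MEDIUM pending triage.
        self.severity = RUBRIC.get(self.category, Severity.MEDIUM)
        self.due = self.opened + timedelta(days=SLA_DAYS[self.severity])

    def record_retest(self, attack_still_succeeds: bool) -> None:
        # Decision 6: a finding closes only when a re-test verifies the fix.
        self.status = "open" if attack_still_succeeds else "closed"

    def is_overdue(self, today: date) -> bool:
        return self.status == "open" and today > self.due


if __name__ == "__main__":
    f = Finding("System prompt exfiltrated via tool output", "data_leakage", "alice", date(2026, 1, 5))
    print(f.severity.name, f.due)  # CRITICAL 2026-01-12
    f.record_retest(attack_still_succeeds=False)
    print(f.status)  # closed
```

Keeping the rubric and SLAs as plain data, rather than scattering them through code, is one reasonable design choice: they can be versioned and shown to auditors alongside the rest of the rules of engagement.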
What a program produces
A program produces three artifacts, each on its own cadence:
- Pre-launch report. Per AI feature launch: scope tested, findings, and a recommendation (launch / launch with mitigations / hold).
- Periodic report. Per cadence (typically quarterly): trend of findings, defense effectiveness over time, and a threat-landscape update.
- Incident report. Per incident: attack chain, root cause, remediation, and eval-suite update.
These artifacts feed both engineering decisions and governance (Module 8). They are also the evidence base for compliance, such as the risk management system that EU AI Act Article 9 requires for high-risk AI systems.
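The pre-launch report's recommendation can be derived mechanically from the open findings. In this minimal sketch, the only rule taken from the lesson is that high-severity findings block launch; treating any remaining lower-severity finding as "launch with mitigations" is an assumption, as are the 1 to 4 severity scale and the names `OpenFinding`, `recommend`, and `prelaunch_report`.

```python
# Minimal sketch: derive a pre-launch recommendation from open findings.
# ASSUMPTIONS (hypothetical): a 1-4 severity scale, the HIGH threshold, and
# mapping leftover lower-severity findings to "launch with mitigations".
from dataclasses import dataclass

HIGH = 3  # severity at or above this blocks launch (the lesson's gate rule)


@dataclass(frozen=True)
class OpenFinding:
    title: str
    severity: int  # 1=LOW, 2=MEDIUM, 3=HIGH, 4=CRITICAL


def recommend(open_findings: list[OpenFinding]) -> str:
    """Return "hold", "launch with mitigations", or "launch"."""
    if any(f.severity >= HIGH for f in open_findings):
        return "hold"  # high-severity findings block launch
    if open_findings:
        return "launch with mitigations"  # only lower-severity issues remain
    return "launch"


def prelaunch_report(feature: str, scope: list[str],
                     open_findings: list[OpenFinding]) -> dict:
    """Assemble the report fields named in the lesson: scope, findings, recommendation."""
    return {
        "feature": feature,
        "scope_tested": scope,
        "findings": [f.title for f in open_findings],
        "recommendation": recommend(open_findings),
    }


if __name__ == "__main__":
    findings = [OpenFinding("Verbose error leaks model name", 2)]
    print(prelaunch_report("support-bot v2", ["prompt injection", "data leakage"], findings))
```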
What good looks like in 2026
A mature AI red-team program in 2026 has:
- A named owner with org authority and budget.
- All four functions operating (pre-launch, periodic, incident-driven, continuous).
- A documented severity rubric, rules of engagement, and remediation SLAs.
- Continuous tooling integrated into CI/CD (covered in the next lesson).
- At least one report shared with leadership on a quarterly cadence.
- An annual external red-team engagement for independent verification.
Most organizations in 2026 have some of this; few have all. For most companies, building toward the full set is a multi-year program, but starting with continuous tooling plus a pre-launch gate is a meaningful first quarter.
Real-world example
OpenAI, Anthropic, Google, and Microsoft all publicly describe their AI red-team programs in blog posts, model release reports, and, in some cases, regulatory filings. The details vary, but the common shape is a dedicated team plus pre-launch gates, periodic deep-dives, and post-incident reviews. Enterprise customers in 2025–2026 increasingly require evidence of an AI red-team program from their vendors.
Key terms
- AI red-team program — standing capability, not one-off engagement.
- Four functions — pre-launch, periodic, incident-driven, continuous.
- Rules of engagement — documented scope and authority.
- Severity rubric — AI-specific finding rating.
References
- OpenAI system cards and red-teaming reports (per major model release).
- Anthropic Responsible Scaling Policy (covers red-team scope).
- "Red Teaming Language Models with Language Models" (Perez et al., 2022).
Quiz items
- Q: Name the four functions of an AI red-team program. A: Pre-launch red-team, periodic red-team, incident-driven red-team, continuous (automated) red-team.
- Q: What's the dominant in-house vs. external pattern in 2026? A: External vendors for ad-hoc engagements plus in-house staff for continuous work. Pure-external doesn't build internal capability; pure-in-house misses external perspective.
- Q: Name three artifacts a mature program produces. A: Pre-launch reports (per launch), periodic reports (per cadence), incident reports (per incident).
Video script (~600 words, ~4.5 min)
[SLIDE 1 — Title]
AI red-team program design. Five minutes.
[SLIDE 2 — What an AI red-team program is]
A coordinated, ongoing function within an engineering org that adversarially tests AI systems for security and safety failures, with documented findings and remediation tracking. Distinct from one-off pentest engagements. AI red-team is a standing capability, not an event.
[SLIDE 3 — Four functions]
Four functions. One: pre-launch red-team. Before any AI feature ships, a red-team campaign runs against it. Scope, methodology, severity rubric defined per system. Findings triaged. High-severity blocks launch. Pre-launch gate from L7.1.1. Two: periodic red-team. Standing activity against deployed systems on a cadence — quarterly is common. Updates threat model. Catches drift in deployed defenses. Produces evidence for auditors. Three: incident-driven red-team. After an AI incident, focused activity probing the same vulnerability class. Confirms the fix. Finds adjacent issues. Four: continuous, automated red-team. Tools like Garak, PyRIT, promptfoo — next lesson — running in CI/CD. Catches regressions.
Four functions complement. Pre-launch is gate. Periodic is hygiene. Incident-driven is reactive. Continuous is regression prevention.
[SLIDE 4 — Six org-structure decisions, part 1]
Six decisions to make when standing one up. One: reports to whom. CISO, Chief AI Officer, Head of Engineering. No single right answer; what matters is having an owner with budget and an escalation path. Two: centralized vs distributed. One team red-teams all products; or each product team has an embedded red-teamer; or hybrid. Hybrid is common at scale. Three: in-house vs external. External vendors for ad-hoc, in-house for continuous, is the dominant pattern.
[SLIDE 5 — Six decisions, part 2]
Four: authority and scope. What can the red-team do? Access production? Spin up test environments? Block launches? Documented rules of engagement; non-trivial for AI work. Five: severity rubric. AI-specific risks (data-leakage, agent-escape, jailbreak surface), not just CVSS-equivalent. Six: remediation tracking. Findings into Linear, Jira, etc. with severity, owner, due date. Closed when verified by re-test.
[SLIDE 6 — Three artifacts]
Three artifacts, each on its own cadence. Pre-launch report, per feature launch: scope, findings, recommendation. Periodic report, per cadence: trend, defense effectiveness, threat-landscape update. Incident report, per incident: attack chain, root cause, remediation, eval-suite update.
Artifacts feed engineering decisions and governance — Module 8. Evidence base for compliance — EU AI Act Article 9 risk management system.
[SLIDE 7 — What good looks like in 2026]
Mature AI red-team program in twenty-twenty-six. Named owner with org authority and budget. All four functions operating. Documented severity rubric, ROE, remediation SLAs. Continuous tooling integrated into CI/CD. Quarterly cadence on at least one report shared with leadership. Annual external red-team engagement for independent verification.
Most organizations have some of this. Few have all. Building toward the full set is a multi-year program. Starting with continuous tooling plus pre-launch gate is a meaningful first quarter.
[SLIDE 8 — Up next]
Next lesson: red-team tooling. Garak, PyRIT, promptfoo at production scale. Five minutes. See you there.
Slide outline
- Title — "AI red-team program design".
- What it is — pull-quote: "standing capability, not event."
- Four functions — four-card layout: pre-launch · periodic · incident-driven · continuous.
- Six decisions, part 1 — three-card layout: reports-to · centralized/distributed · in-house/external.
- Six decisions, part 2 — three-card layout: authority/scope · severity rubric · remediation tracking.
- Three artifacts — three-card layout.
- What good looks like — six-checkmark mature-program checklist.
- Up next — "L7.5.2 — Red-team tooling, ~5 min."
Production notes
- Recording: ~4.5 min. Cap 5.
- Slide 7 (what-good-looks-like) is the slide leaders will screenshot to advocate for budget.