Skip to content

L7.5.1 — AI red-team program design

Type: Theory · Duration: ~5 min · Status: Mandatory Module: Module 7 — Securing the AI Pipeline (MLSecOps & Defenses) Framework tags: NIST AI RMF Measure 2.7, Manage 1.1 · EU AI Act Article 9

Learning objectives

  1. Articulate the four functions of an AI red-team program (pre-launch, periodic, incident-driven, continuous).
  2. Identify the org-structure questions to resolve when standing one up.

Core content

What an AI red-team program is

A coordinated, ongoing function within an engineering org that adversarially tests AI systems for security and safety failures, with documented findings and remediation tracking. Distinct from one-off pentest engagements; AI red-team is a standing capability, not an event.

Four functions

1. Pre-launch red-team. Before any AI feature ships (or before a meaningful update goes live), a red-team campaign runs against it. Scope, methodology, severity rubric are defined per the system. Findings triaged; high-severity blocks launch. This is the model-selection / pre-launch gate from L7.1.1.

2. Periodic red-team. Standing red-team activity against deployed systems, on a cadence (quarterly is common). Updates the threat model with new attacks (the field moves fast); catches drift in deployed defenses; produces evidence for auditors.

3. Incident-driven red-team. After an AI incident, focused red-team activity probing the same vulnerability class. Confirms the fix; finds adjacent issues; informs the eval suite update.

4. Continuous (automated) red-team. Tools like Garak, PyRIT, promptfoo (next lesson) running in CI/CD against every model and prompt change. Catches regressions; provides daily-cadence signal.

The four functions complement: pre-launch is gate; periodic is hygiene; incident-driven is reactive; continuous is regression prevention.

Org-structure questions to resolve

When standing up a program, six decisions to make:

1. Reports to whom. Common patterns: CISO (treats AI security as a security discipline), Chief AI Officer (treats it as an AI program), Head of Engineering (treats it as engineering practice). No single right answer; what matters is having an owner with budget and escalation path.

2. Centralized vs distributed. Centralized: one team red-teams all AI products. Distributed: each product team has an embedded red-teamer. Hybrid (central team + per-product partners) is common at scale.

3. In-house vs external. External vendors for ad-hoc engagements + in-house for continuous work is the dominant pattern. Pure-external doesn't build internal capability; pure-in-house misses external perspective.

4. Authority and scope. What can the red-team do? Access production systems? Spin up test environments? Block launches? Decisions must be documented and approved by leadership; the rules of engagement are non-trivial for AI work.

5. Severity rubric. How are findings rated? A finding rubric specific to AI risks (not just CVSS-equivalent) — covering data-leakage, agent-escape, jailbreak surface, etc.

6. Remediation tracking. Findings go into a tracker (Linear, Jira, etc.) with severity, owner, due date. Closed when verified by re-test.

What a program produces

Three artifacts per cadence:

  • Pre-launch report. Per AI feature launch: scope tested, findings, recommendation (launch / launch with mitigations / hold).
  • Periodic report. Per cadence (quarterly): trend of findings, defense effectiveness over time, threat-landscape update.
  • Incident report. Per incident: chain of attack, root cause, remediation, eval-suite update.

These artifacts feed both engineering decisions and governance (Module 8). They're also the evidence base for compliance (EU AI Act Article 9 risk management system).

What good looks like in 2026

A mature AI red-team program in 2026 has: - Named owner with org authority and budget. - All four functions operating (pre-launch, periodic, incident-driven, continuous). - Documented severity rubric, rules of engagement, remediation SLAs. - Continuous tooling integrated into CI/CD (next lesson covers). - Quarterly cadence on at least one report shared with leadership. - Annual external red-team engagement for independent verification.

Most organizations in 2026 have some of this; few have all. Building toward the full set is a multi-year program for most companies — but starting with continuous tooling + pre-launch gate is a meaningful first quarter.

Real-world example

OpenAI, Anthropic, Google, and Microsoft all publicly describe their AI red-team programs (in blog posts, model release reports, and now in some cases regulatory filings). The patterns vary; the common shape is: dedicated team + pre-launch gates + periodic deep-dives + post-incident reviews. Enterprise customers in 2025–2026 increasingly require evidence of AI red-team programs from their vendors.

Key terms

  • AI red-team program — standing capability, not one-off engagement.
  • Four functions — pre-launch, periodic, incident-driven, continuous.
  • Rules of engagement — documented scope and authority.
  • Severity rubric — AI-specific finding rating.

References

  • OpenAI red-team release reports (per major model).
  • Anthropic Responsible Scaling Policy (covers red-team scope).
  • "Red-Teaming Language Models" (Perez et al., 2022).

Quiz items

  1. Q: Name the four functions of an AI red-team program. A: Pre-launch red-team, periodic red-team, incident-driven red-team, continuous (automated) red-team.
  2. Q: What's the dominant in-house vs external pattern in 2026? A: External vendors for ad-hoc engagements + in-house for continuous work. Pure-external doesn't build internal capability; pure-in-house misses external perspective.
  3. Q: Name three artifacts a mature program produces. A: Pre-launch reports (per launch), periodic reports (per cadence), incident reports (per incident).

Video script (~600 words, ~4.5 min)

[SLIDE 1 — Title]

AI red-team program design. Five minutes.

[SLIDE 2 — What an AI red-team program is]

A coordinated, ongoing function within an engineering org that adversarially tests AI systems for security and safety failures, with documented findings and remediation tracking. Distinct from one-off pentest engagements. AI red-team is a standing capability, not an event.

[SLIDE 3 — Four functions]

Four functions. One: pre-launch red-team. Before any AI feature ships, a red-team campaign runs against it. Scope, methodology, severity rubric defined per system. Findings triaged. High-severity blocks launch. Pre-launch gate from L7.1.1. Two: periodic red-team. Standing activity against deployed systems on a cadence — quarterly is common. Updates threat model. Catches drift in deployed defenses. Produces evidence for auditors. Three: incident-driven red-team. After an AI incident, focused activity probing the same vulnerability class. Confirms the fix. Finds adjacent issues. Four: continuous, automated red-team. Tools like Garak, PyRIT, promptfoo — next lesson — running in CI/CD. Catches regressions.

Four functions complement. Pre-launch is gate. Periodic is hygiene. Incident-driven is reactive. Continuous is regression prevention.

[SLIDE 4 — Six org-structure decisions, part 1]

Six decisions to make when standing one up. One: reports to whom. CISO, Chief AI Officer, Head of Engineering. No single right answer; what matters is having an owner with budget and escalation path. Two: centralized vs distributed. One team red-teams all products; or each product team has embedded red-teamer; or hybrid. Hybrid common at scale. Three: in-house vs external. External vendors for ad-hoc, in-house for continuous, is the dominant pattern.

[SLIDE 5 — Six decisions, part 2]

Four: authority and scope. What can the red-team do? Access production? Spin up test environments? Block launches? Documented rules of engagement; non-trivial for AI work. Five: severity rubric. AI-specific risks (data-leakage, agent-escape, jailbreak surface), not just CVSS-equivalent. Six: remediation tracking. Findings into Linear, Jira, etc. with severity, owner, due date. Closed when verified by re-test.

[SLIDE 6 — Three artifacts]

Three artifacts per cadence. Pre-launch report — per feature launch: scope, findings, recommendation. Periodic report — per cadence: trend, defense effectiveness, threat-landscape update. Incident report — per incident: chain, root cause, remediation, eval-suite update.

Artifacts feed engineering decisions and governance — Module 8. Evidence base for compliance — EU AI Act Article 9 risk management system.

[SLIDE 7 — What good looks like in 2026]

Mature AI red-team program in twenty-twenty-six. Named owner with org authority and budget. All four functions operating. Documented severity rubric, ROE, remediation SLAs. Continuous tooling integrated into CI/CD. Quarterly cadence on at least one report shared with leadership. Annual external red-team engagement for independent verification.

Most organizations have some of this. Few have all. Building toward the full set is a multi-year program. Starting with continuous tooling plus pre-launch gate is a meaningful first quarter.

[SLIDE 8 — Up next]

Next lesson: red-team tooling. Garak, PyRIT, promptfoo at production scale. Five minutes. See you there.

Slide outline

  1. Title — "AI red-team program design".
  2. What it is — pull-quote: "standing capability, not event."
  3. Four functions — four-card layout: pre-launch · periodic · incident-driven · continuous.
  4. Six decisions, part 1 — three-card layout: reports-to · centralized/distributed · in-house/external.
  5. Six decisions, part 2 — three-card layout: authority/scope · severity rubric · remediation tracking.
  6. Three artifacts — three-card layout.
  7. What good looks like — six-checkmark mature-program checklist.
  8. Up next — "L7.5.2 — Red-team tooling, ~5 min."

Production notes

  • Recording: ~4.5 min. Cap 5.
  • Slide 7 (what-good-looks-like) is the slide leaders will screenshot to advocate for budget.