L1.4 — The modern AI pipeline¶
Type: Theory · Duration: ~25 min · Status: Mandatory Module: Module 1 — AI/ML Foundations for Security Engineers Framework tags: foundational — direct on-ramp to ATLAS tactics framework (Reconnaissance → ML Attack Staging → Impact) and NIST AI RMF Map function
Learning objectives¶
By the end of this lesson, the learner can: 1. Name the six stages of the modern AI pipeline (data → training → eval → deployment → monitoring → fine-tune) and what happens at each. 2. Identify who controls each stage and what its trust boundary is. 3. Distinguish three common deployment shapes: API-only, RAG, agentic. 4. Recognize that production AI systems are pipelines of artifacts, each with its own provenance question.
Concept primer¶
You have the parts (data, models, layers, decoding) from the last three lessons. Now we connect them into a system. This lesson is the conceptual scaffolding the next lesson (L1.5) will hang attacks on.
Core content¶
The six stages of the modern AI pipeline¶
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ ① Data │ → │ ② Train │ → │ ③ Eval │ → │ ④ Deploy │ → │ ⑤ Monitor│
│ (curate, │ │ (pretrain│ │ (offline │ │ (API / │ │ (drift, │
│ label, │ │ + RLHF + │ │ + red- │ │ RAG / │ │ abuse, │
│ filter) │ │ align) │ │ team) │ │ agent) │ │ cost) │
└──────────┘ └──────────┘ └──────────┘ └──────────┘ └──────────┘
│
▼
┌──────────┐
│ ⑥ Fine- │
│ tune / │
│ re-train │ ──┐
└──────────┘ │
▲ │
└───────────┘
(loop)
Let's walk it.
① Data¶
Everything starts with data. For pretrained LLMs this means scraping the web, plus licensed corpora, plus synthetic data. Then curation: deduplication, quality filtering, toxicity filtering, decontamination against eval benchmarks. Then for supervised fine-tuning you add curated instruction-following data. Then for RLHF/DPO you add preference data — humans (or AIs) ranking pairs of outputs.
Trust boundary: anyone whose content ended up in your training set has, in some weak sense, influenced your model. For supervised data and preference data this is more direct. Whoever controls a meaningful fraction of any of these data streams can shape the model. This is the entry point for Module 4's data-poisoning and backdoor attacks.
Who controls this stage: data engineers + ML researchers + a vendor's safety team. Almost never the application team that ultimately ships the product.
② Training¶
Pretraining is a multi-week, multi-million-dollar run that produces the base model. Then comes alignment: supervised fine-tuning on instruction data, then RLHF or DPO on preference data. The output is the assistant model that vendors ship.
Trust boundary: anyone with code or config access to the training run can affect every weight in the resulting model. Insider-threat scope is broad.
Who controls this stage: vendor's ML platform team. Application teams almost never see this directly.
③ Evaluation¶
Before a model ships, it's measured against benchmarks (accuracy, perplexity, MMLU, HumanEval, etc.), against safety evals (refusal rates on harmful prompts, jailbreak resistance, bias measurements), and against domain evals if relevant (legal, medical). And — increasingly — against a structured red-team campaign.
Trust boundary: evaluations are only as honest as the held-out test sets that drive them. Test-set contamination (training data accidentally including the test set) is one of the most common failure modes in published benchmarks. Safety evals can also be gamed — Sleeper Agents (L1.2) showed that models can be trained to behave well on observed evals and badly otherwise.
Who controls: evaluation team + red-teamers + sometimes the model release reviewer.
④ Deployment¶
The model gets exposed to users. There are three shapes you'll see almost everywhere.
- API-only. A REST or gRPC endpoint that takes a prompt and returns text. The OpenAI Chat Completions API is the canonical example. Attack surface: prompt-injection and jailbreaks at the input boundary, output handling at the consumer boundary, rate-limit and extraction attacks.
- RAG (Retrieval-Augmented Generation). Before sending the user's query to the LLM, retrieve relevant documents from an index (e.g., embeddings in a vector DB) and stuff them into the prompt as context. The dominant pattern for enterprise LLM apps. Attack surface adds: indirect prompt injection via poisoned retrieved content, embedding-leak attacks against the vector DB, retrieval-poisoning by an attacker who can write to the indexed corpus.
- Agentic. The LLM doesn't just answer; it calls tools (functions, APIs, shell commands) in a loop, often with several rounds of plan-act-observe. Increasingly common in 2025–2026 enterprise products. Attack surface adds: tool-abuse, agent-escape, plan-injection (planting goals via untrusted content), and a full set of new authorization questions ("which tools does this agent get access to, and on whose behalf?").
Trust boundary: the deployment is the surface where untrusted users meet the model. Almost all OWASP LLM Top 10 attacks live here.
Who controls: the application team. This is the boundary you will most often own as an AI security engineer.
⑤ Monitoring¶
A deployed AI system needs runtime instrumentation that classical AppSec/SRE tooling doesn't provide out of the box: prompt and response logging (with PII redaction), drift detection (the input distribution shifting away from training), abuse detection (jailbreak patterns, rate-limit anomalies, embedding cluster shifts), cost monitoring (token spend), latency, and incident response data capture.
Trust boundary: monitoring is your evidence chain for incident response and your trip-wire for emerging attacks. Most AI breaches in 2025 were detected via monitoring (token-spend anomalies, output-content alerts), not via static testing.
Who controls: SRE / platform + AI security engineer collaboratively. Module 7 covers what to build here.
⑥ Fine-tune / re-train¶
The pipeline is not one-shot. Once you're in production, you'll iterate: collect production data (with privacy controls), fine-tune the model on it, re-evaluate, redeploy. Or pull a new base model release and re-do everything. Or run continuous evaluation that flags regressions.
Trust boundary: every iteration is an opportunity to re-introduce data-poisoning and backdoor risk. Production data is not safe training data unless you have provenance and filtering.
Who controls: ML engineering + AI security engineer + data governance.
Who controls what — and why this matters¶
The reason the six stages matter is that they map directly to who you can hold accountable for what. In a typical 2026 architecture:
| Stage | Controlled by | What the application team can / can't change |
|---|---|---|
| Data | Foundation-model vendor | Cannot change; can ask for documentation, can choose vendor |
| Training | Foundation-model vendor | Cannot change; can ask for documentation |
| Eval | Vendor + downstream evaluators | Can supplement with own evals |
| Deployment | Application team | Full control — this is your perimeter |
| Monitoring | Application team | Full control |
| Fine-tune | Application team (using base) | Full control — and full liability for any new harms introduced |
The AI security engineer's primary surface is deployment, monitoring, and fine-tune. You will negotiate with the vendor on the rest via procurement, audits, and contractual representations. Don't burn cycles trying to fix things upstream that you have no leverage on.
The three deployment shapes — pick the right mental model¶
In 2026, when you walk into a new project, ask "what shape is this?" first. Each shape has a different default threat profile.
- API-only chat. Risk surface concentrates on input/output handling. Defenses: prompt-injection filters, structured output, output validators, rate limits, prompt logging.
- RAG. Add: retrieval poisoning, indirect prompt injection via retrieved docs, vector-DB exfiltration, source-attribution gaps. Defenses: source allowlists, content sanitization before retrieval, retrieval-result inspection, PII redaction on the way in and on the way out.
- Agentic. Add: tool-abuse (the model is tricked into using a tool maliciously), agent-escape (over-permissioned tools), goal-hijacking, multi-turn injection chains. Defenses: principle-of-least-authority tool design, human-in-the-loop for high-impact actions, intent verification, tool-call logging, anomaly detection on action sequences.
Most real production systems are combinations — a RAG-backed agent with API exposure. The combined surface is the union of the three, not just the agent surface. Threat-model accordingly.
Production AI is a pipeline of artifacts, each with provenance¶
One more framing point. A production AI system is not a single binary. It is a pipeline of artifacts:
- Base model weights (which vendor? which version? signed?)
- Tokenizer file
- Fine-tune deltas (LoRA weights, full-weight fine-tune outputs)
- Retrieval index (which corpus? when last refreshed? who can write to source?)
- System prompts and prompt templates (versioned? who can edit?)
- Tool definitions and the tools themselves (their auth scope?)
- Guardrail models (Llama Guard, NeMo, custom)
- Eval suites
Each item has a provenance question — where did it come from, who has changed it since, and how do you know? In Module 4 we build an AI Bill of Materials (AI-BOM) that names every artifact and its provenance. In Module 8 we use it for audit response.
Real-world example¶
Air Canada chatbot tribunal ruling (2024). Air Canada's customer-facing chatbot, deployed against their bereavement-fare policy documentation, hallucinated a refund policy that didn't actually exist. A customer relied on it and was denied the refund. The BC Civil Resolution Tribunal held Air Canada responsible for what the chatbot stated, ruling that "Air Canada is responsible for all information on its website. It makes no difference whether the information comes from a static page or a chatbot." The pipeline-failure lens: this is a deployment + monitoring failure. The model didn't go off-script; the application architecture treated model output as authoritative without grounding, citation, or review for any commitment-bearing statement. Trust boundaries were not enforced at the deployment surface. (Source: Moffatt v. Air Canada, 2024 BCCRT 149.)
Key terms¶
- Pretraining — multi-week, multi-million-dollar run that produces the base model from raw data.
- Alignment — supervised fine-tuning + RLHF/DPO that produces the deployed assistant from the base model.
- Base model — the raw pretrained model before alignment. Most open-weights "Base" releases are useful as fine-tune starting points but unaligned (will produce unsafe content readily).
- RAG — Retrieval-Augmented Generation; retrieve docs and stuff them into the prompt at query time.
- Agentic AI — LLM that calls tools in a loop.
- Provenance — the answer to "where did this artifact come from and who has changed it?"
- AI-BOM (AI Bill of Materials) — inventory of all artifacts and their provenance.
References¶
- Bommasani et al., "On the Opportunities and Risks of Foundation Models" (Stanford CRFM, 2021) — https://arxiv.org/abs/2108.07258
- Lewis et al., "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" (RAG paper, 2020) — https://arxiv.org/abs/2005.11401
- Shinn et al., "ReAct: Synergizing Reasoning and Acting in Language Models" (2022) — https://arxiv.org/abs/2210.03629
- Moffatt v. Air Canada, 2024 BCCRT 149 — full text via CanLII
- OWASP "AI Bill of Materials" guidance (OWASP CycloneDX AI extension) — https://cyclonedx.org/
Quiz items¶
- Q: Name the six stages of the modern AI pipeline. A: Data, Training, Evaluation, Deployment, Monitoring, Fine-tune/Re-train. Why: This is the spine of Module 1.5 and Module 4.
- Q: As the AI security engineer for an application team that uses a frontier model via API, which pipeline stages do you own vs. negotiate? A: Own: Deployment, Monitoring, Fine-tune (if you do any). Negotiate: Data, Training, Eval (via procurement and audits with the vendor). Why: Wrong-stage effort is the most common Day-90 mistake.
- Q: Name one attack class that emerges specifically in the RAG shape but not in the API-only shape. A: Indirect prompt injection via retrieved documents (or retrieval poisoning, or embedding-leak attacks against the vector DB). Why: Drives Module 3 lab L3.7.
- Q: True or false: production data collected with user consent is automatically safe to use as fine-tune data. A: False. Why: Production data has its own provenance and quality questions; using it as training data without filtering is a Module 4 data-poisoning vector.
Video script¶
[SLIDE 1 — Title]
Welcome to lesson 1.4. We have the parts. Now we connect them into a system. By the end of this lesson you'll be able to name the six stages of the modern AI pipeline, identify who owns each one, and recognize the three common deployment shapes in production. This is the conceptual scaffolding the rest of the course hangs attacks on.
[SLIDE 2 — The six-stage pipeline]
Six stages. Data — curate, label, filter. Training — pretrain plus alignment. Evaluation — offline benchmarks plus red-teaming. Deployment — API, RAG, or agent. Monitoring — drift, abuse, cost. Fine-tune or re-train — and back to evaluation, deployment, monitoring. The pipeline is not one-shot. It loops.
[SLIDE 3 — Data stage]
Stage one, data. Everything starts here. For pretrained LLMs, this means web scrapes plus licensed corpora plus synthetic data. Then curation. Then for supervised fine-tuning, curated instruction-following data. Then for RLHF or DPO, preference data — humans or AIs ranking pairs of outputs. Trust boundary: anyone whose content ended up in your training set has, in some weak sense, influenced your model. Whoever controls a meaningful fraction of any of these data streams can shape the model. This is the entry point for data-poisoning and backdoor attacks in Module 4.
[SLIDE 4 — Training stage]
Stage two, training. Pretraining is a multi-week, multi-million-dollar run that produces the base model. Then alignment turns the base model into the assistant model that vendors ship. Trust boundary: anyone with code or config access to the training run can affect every weight in the resulting model. Insider-threat scope is broad. As an application team, you almost never see this directly.
[SLIDE 5 — Eval stage]
Stage three, evaluation. Before a model ships, it's measured against benchmarks, safety evals, domain evals, and increasingly a structured red-team campaign. Trust boundary: evaluations are only as honest as the held-out test sets that drive them. Test-set contamination is one of the most common failure modes. And the Sleeper Agents work I mentioned last lesson showed that models can be trained to behave well on observed evals and badly otherwise. So "we passed the eval" is necessary but not sufficient.
[SLIDE 6 — Deployment stage — three shapes]
Stage four, deployment. This is the stage you, as an AI security engineer, will most often own. Three shapes you'll see almost everywhere. API-only: a REST endpoint, returns text, attack surface concentrates on input and output handling. RAG: retrieve documents from an index, stuff them into the prompt, attack surface adds indirect injection via retrieved content and exfil against the vector DB. Agentic: the LLM calls tools in a loop, attack surface adds tool abuse, agent escape, plan injection.
[SLIDE 7 — Most production is a combination]
Real production systems are often combinations. A RAG-backed agent exposed via API. The combined attack surface is the union of all three, not just the agent surface. When you walk into a new project, ask "what shape is this" first.
[SLIDE 8 — Monitoring stage]
Stage five, monitoring. A deployed AI system needs runtime instrumentation that classical AppSec and SRE don't provide out of the box. Prompt and response logging with PII redaction. Drift detection. Abuse detection. Token-spend monitoring. Most AI breaches in 2025 were detected via monitoring — token-spend anomalies, output-content alerts — not via static testing. Module 7 covers what to build.
[SLIDE 9 — Fine-tune stage]
Stage six, fine-tune and re-train. The pipeline loops. You collect production data, fine-tune on it, re-evaluate, redeploy. Critical reminder: production data is not safe training data unless you have provenance and filtering. Every iteration is an opportunity to re-introduce data-poisoning and backdoor risk. We'll exploit this in Module 4.
[SLIDE 10 — Who owns what]
The six stages matter because they map to who you can hold accountable for what. As an AI security engineer on the application team, you own deployment, monitoring, and fine-tune. You negotiate with the vendor on data, training, and evaluation, via procurement and contractual representations. Don't burn cycles trying to fix things upstream that you have no leverage on.
[SLIDE 11 — Production AI is a pipeline of artifacts]
One more framing. A production AI system is not a single binary. It is a pipeline of artifacts: base model weights, tokenizer file, fine-tune deltas, retrieval index, system prompts, tool definitions, guardrail models, eval suites. Each item has a provenance question. In Module 4 we build an AI Bill of Materials that names every artifact and its provenance. In Module 8 we use it for audit response.
[SLIDE 12 — Up next]
Next lesson, we map attacks onto the six stages. One short lesson, then we go hands-on. See you there.
Slide outline¶
- Title — "The modern AI pipeline".
- Six-stage diagram — horizontal flow: ① Data → ② Train → ③ Eval → ④ Deploy → ⑤ Monitor → ⑥ Fine-tune (loops back to ② or ⑤).
- Data stage — funnel diagram: raw → curated → labeled → preference data. Red callout: "Attacker entry: any of these streams."
- Training stage — supercomputer icon; pretrain → alignment box. Callout: "Vendor-owned. You can't change this. You can audit it."
- Eval stage — benchmarks table + Sleeper Agents callout: "Passing the eval ≠ no backdoor."
- Deployment — three shapes — triptych: API-only / RAG / Agentic. Each with a small architecture sketch + 3-bullet attack-surface list.
- Real production is a combination — Venn of the three shapes; union shaded; "your real attack surface."
- Monitoring stage — dashboard mockup with metrics: tokens/min, refusal rate, output PII alerts, drift score. Caption: "Most 2025 AI breaches were detected here."
- Fine-tune stage — loop arrow back to training. Red callout: "Production data is not safe training data."
- Who owns what — the table from the lesson body. Highlight Deployment / Monitoring / Fine-tune as "Your perimeter."
- Pipeline of artifacts — bill-of-materials checklist with provenance question marks beside each item.
- Up next — "L1.5 — Where attacks happen at each pipeline stage, ~20 min."
Production notes¶
- Recording: ~24–26 min raw, target 22–24 min final.
- Slide 6 is where most learners will be referring back to this lesson during later modules — make the three deployment-shape sketches especially clear. Worth a designer pass.
- Slide 8: dashboard mockup can use Grafana-style aesthetics; mock data is fine.
- Tone: structural, almost architectural. This is the lesson where you sound like a staff engineer reviewing an architecture diagram.