L0.2 — The AI security landscape in 2026¶
Type: Theory · Duration: ~25 min · Status: Mandatory Module: Module 0 — Orientation & Environment Setup Framework tags: course-wide framing — introduces OWASP LLM Top 10, MITRE ATLAS, NIST AI RMF, EU AI Act at a high level
Learning objectives¶
By the end of this lesson, the learner can: 1. Explain why AI security is a distinct discipline in 2026, with three drivers: capability shift (agents + tool use), incident track record, and regulatory pressure. 2. Name and briefly describe at least four real-world AI security incidents from 2023–2025 and the failure mode each one illustrates. 3. Identify the four reference frameworks this course maps to and what each is for. 4. Articulate the AI security engineer's place in an org chart — where the role reports, what it owns, and where its handoffs are.
Concept primer (skip if you already know this)¶
A few terms you'll hear repeatedly. Don't memorize them; you'll absorb them as we go.
- LLM (Large Language Model) — a neural network trained on huge text corpora to predict the next token in a sequence. GPT-4, Claude, Llama, Gemini are all LLMs.
- Inference — running a trained model to get an output. The thing your users do when they chat with a bot.
- Training — the (much more expensive) process of producing the model from data. Most security work in this course targets inference; data poisoning targets training.
- Fine-tuning — taking an already-trained model and further training it on a smaller dataset to specialize it. Cheaper than training from scratch; also a new attack surface.
- RAG (Retrieval-Augmented Generation) — a pattern where, instead of relying only on the model's memorized knowledge, you retrieve relevant documents at query time and stuff them into the prompt. The dominant pattern for enterprise LLM apps. Also the dominant vector for indirect prompt injection.
- Agent — an LLM that doesn't just answer; it calls tools (functions, APIs, shell commands) in a loop. Most 2025–2026 LLM products are agentic in some form. Most new attack surface is here.
Core content¶
Why AI security became its own discipline¶
Three things converged between 2022 and 2026.
Capability shift. Before late 2022, "AI security" meant adversarial examples on image classifiers and a small academic subfield around membership inference. After ChatGPT, every product team in every company shipped an LLM feature, and by 2024–2025 most of those features had grown into agents that took actions on the user's behalf — read email, draft code, call internal APIs, move money. The attack surface went from "the model" to "the model, plus the prompt, plus the retrieved context, plus every tool the agent can call, plus every system those tools touch." Classical AppSec didn't have a framework for this. Classical MLOps didn't either.
Incident track record. Between 2023 and 2025, the public learned that: - LLM chatbots can be tricked into revealing their system prompts, including embedded API keys and proprietary instructions (Bing/Sydney, 2023; recurrent across every major launch). - Employees paste sensitive data into public LLMs and it ends up in training corpora (Samsung, March 2023, source-code leak). - Indirect prompt injection — instructions hidden inside content the LLM is asked to summarize or process — is a real, exploited attack class, not a theoretical one (multiple disclosures across 2024 against email-assistant and browsing agents). - ML model supply chains are exploitable: malicious pickles uploaded to HuggingFace; typosquatted model names; backdoors in fine-tuning datasets. - Agentic LLMs with tool access can be coerced into taking real-world actions — sending emails, making refunds, exfiltrating data — through prompts in untrusted content.
Regulatory pressure. The EU AI Act entered force in August 2024 with a phased compliance timeline running through 2026–2027. NIST published the AI RMF and its Generative AI Profile. U.S. Executive Order 14110 (October 2023) directed federal agencies on AI safety; even after its 2025 rescission and reissuance, federal procurement standards continued referencing NIST. State laws (Colorado AI Act, California SB-1047 successor bills) added obligations. Enterprises started requiring AI security attestations from vendors. Insurance carriers started asking about AI risk programs.
The combined effect: by 2026, every mid-size and larger company shipping an AI feature needs at least one person who understands the attack surface and can defend it. That person is you.
A short tour of recent incidents¶
This is not exhaustive. Each one illustrates a class of failure we'll deep-dive later in the course.
- Samsung ChatGPT data leak (March 2023) — Samsung engineers pasted proprietary source code into ChatGPT for help, and the data became part of the training pipeline. Samsung banned generative-AI use internally for months. Failure mode: sensitive data exfiltration via third-party LLM use. Module 7 (observability + DLP).
- Bing/Sydney system-prompt extraction (Feb 2023) — Users coaxed Bing Chat into revealing its full system prompt and internal codename "Sydney." Failure mode: system-prompt extraction. Module 3 (prompt injection).
- Air Canada chatbot refund lawsuit (Feb 2024) — A tribunal held Air Canada liable for a refund its chatbot hallucinated, ruling the company could not disown its own bot's output. Failure mode: hallucinated output as binding business action; control-failure on review of model-generated commitments. Module 3 & Module 7.
- HuggingFace malicious pickle disclosures (2023–2024) — Multiple disclosures of models uploaded to HuggingFace containing arbitrary-code-execution payloads via pickle deserialization. Failure mode: ML supply-chain compromise. Module 4.
- EchoLeak / Microsoft 365 Copilot indirect prompt injection (2025) — Researchers disclosed an indirect prompt-injection chain in M365 Copilot that could exfiltrate data from a user's tenant via crafted email content. Failure mode: indirect prompt injection in agentic productivity tools. Module 3.
- OmniGPT data exposure (Feb 2025) — An LLM aggregator service was breached, exposing user chat logs including pasted credentials and PII. Failure mode: LLM intermediary as a high-value target; data-at-rest controls. Module 7.
- PoisonGPT proof-of-concept (Mistral, 2023) — Researchers fine-tuned a model to confidently emit false historical "facts" then uploaded it under a near-identical name to a popular model. Failure mode: typosquatting + targeted misinformation via fine-tune. Module 4.
We'll return to most of these as case studies in their respective modules.
The four frameworks this course maps to¶
You'll see these tags in every lesson footer. Internalize what each is for.
| Framework | What it is | When you reach for it |
|---|---|---|
| OWASP Top 10 for LLM Applications | A practitioner-facing list of the ten highest-impact vulnerability classes in LLM-powered apps (prompt injection, insecure output handling, training data poisoning, supply chain, etc.). | Use as your checklist when reviewing or building an LLM feature. The closest analog to the OWASP Top 10 you already know. |
| MITRE ATLAS | A knowledge base of adversary tactics and techniques against ML systems — modeled on MITRE ATT&CK. Includes real-world case studies. | Use when threat-modeling or building a red-team plan. ATT&CK's framing, AI's content. |
| NIST AI Risk Management Framework (AI RMF) + GenAI Profile | A governance framework with four functions — Govern, Map, Measure, Manage — plus categories and subcategories. The U.S. federal reference. | Use when building or auditing an AI risk program. Auditors will ask which RMF subcategory your control addresses. |
| EU AI Act | The EU regulation classifying AI systems into risk tiers (prohibited, high-risk, limited-risk, minimal-risk) with corresponding obligations. General-Purpose AI (GPAI) models have their own obligations. | Use when scoping compliance for any product touching the EU market — which, in practice, is most products. |
These frameworks overlap. Don't think of them as competitors; think of them as four lenses on the same problem. A finding might be tagged OWASP: LLM01 (prompt injection), ATLAS: AML.T0051.000 (LLM prompt injection — direct), NIST AI RMF: Measure 2.7 (security and resilience evaluated), EU AI Act: Article 15 (accuracy, robustness, cybersecurity for high-risk systems). One incident, four ways to defend it to four different audiences.
Where the AI security engineer fits in an org¶
The role isn't yet standardized, but a few patterns dominate by 2026.
- In smaller orgs, AI security lives inside the product security team, often as a single specialist who consults across product groups.
- In larger orgs, it's a dedicated AI security or AI red-team function, reporting into the CISO, with formal engagement with ML/data-science orgs.
- A few orgs put it under the responsible AI / AI governance function, which leans more toward policy. We assume the engineering posture in this course, not the policy posture — though Module 8 covers what you need to interface with the governance side.
What you own (typical): - Threat models for AI features before launch. - Red-team / adversarial evaluation of models and applications. - Guardrail and eval-harness selection or build-out. - AI incident response. - AI-specific input to the secure SDLC.
What you don't own (typical): - Model training and core ML infrastructure (that's ML engineering). - Classical AppSec for non-AI surfaces of the same app (that's product security). - Regulatory filings and external audit response (that's governance / legal).
Knowing the handoffs prevents the most common Day-90 mistake: trying to fix problems that aren't yours, while the ones that are yours go unattended.
Real-world example¶
EchoLeak (M365 Copilot, 2025). Aim Security disclosed a zero-click vulnerability in Microsoft 365 Copilot. By sending a crafted email to a target, an attacker could plant instructions that Copilot would later execute when the user asked Copilot to summarize their inbox — exfiltrating data from the user's M365 tenant without the user clicking anything. It is the canonical 2025 demonstration that indirect prompt injection in agentic productivity tools is a real, weaponizable threat — not a research curiosity. Microsoft patched the specific vector; the underlying class of attack remains an open problem. (Source: Aim Security disclosure; Microsoft MSRC advisory.)
Key terms¶
- Capability shift — the post-2022 transition from classifiers-with-narrow-output to agents-that-take-actions.
- Agentic LLM — an LLM application that calls tools/functions in a loop, not just returns text.
- Indirect prompt injection — prompt-injection payload delivered via content the LLM consumes (email, doc, web page) rather than typed by the user.
- GPAI (General-Purpose AI model) — under the EU AI Act, a foundation model that can perform a wide range of tasks; carries specific obligations.
- Generative AI Profile — NIST's AI RMF profile specifically addressing generative-AI risks (NIST AI 600-1, 2024).
References¶
- OWASP Top 10 for Large Language Model Applications — https://owasp.org/www-project-top-10-for-large-language-model-applications/
- MITRE ATLAS — https://atlas.mitre.org/
- NIST AI Risk Management Framework 1.0 — https://www.nist.gov/itl/ai-risk-management-framework
- NIST AI 600-1: Generative AI Profile (2024) — https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf
- EU AI Act (Regulation (EU) 2024/1689) — via EUR-Lex
- Aim Security, "EchoLeak: Zero-Click Vulnerability in Microsoft 365 Copilot" (2025) — vendor disclosure
- Greshake et al., "Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection" (USENIX Security 2023) — https://arxiv.org/abs/2302.12173
- Air Canada v. Moffatt (2024 BCCRT 149) — chatbot misrepresentation tribunal decision
- Samsung internal use ban coverage — Bloomberg, May 2023
Quiz items¶
- Q: Which of the following best describes "indirect prompt injection"? (a) A user types a jailbreak into a chatbot. (b) A malicious instruction is delivered through content the LLM is asked to process. (c) A model is trained on poisoned data. (d) An attacker extracts the model's weights via API queries. A: (b). Why: Direct injection is user-typed; indirect injection rides in via documents, emails, web pages, or tool outputs.
- Q: True or false: MITRE ATLAS replaces MITRE ATT&CK for AI systems. A: False. Why: ATLAS is modeled on ATT&CK and is complementary — many AI attack chains still traverse ATT&CK-mapped infrastructure stages.
- Q: Under the EU AI Act, which risk tier carries the strictest obligations for in-scope systems? A: High-risk. Why: "Prohibited" systems are outright banned; "high-risk" systems are permitted but carry the heaviest substantive obligations (conformity assessment, risk management, transparency, human oversight).
- Q: Which 2025 disclosure is the canonical example that indirect prompt injection in agentic productivity tools is a weaponizable, not theoretical, threat? A: EchoLeak (M365 Copilot). Why: Zero-click, real-world exfil chain.
- Q: A finding maps to OWASP LLM01, ATLAS AML.T0051.000, NIST AI RMF Measure 2.7, and EU AI Act Article 15. Why do we tag with all four rather than picking one? A: Because different stakeholders (engineers, red-teamers, auditors, regulators) speak different framework languages — multi-tagging lets you defend the same finding to all of them.
Video script¶
[SLIDE 1 — Title]
Welcome back. In this lesson we're going to answer a question I get asked constantly: why is AI security a thing now? Why isn't this just AppSec with a new buzzword? The short answer is that three things converged between 2022 and 2026, and after this lesson you'll be able to explain that to anyone who asks.
[SLIDE 2 — The three converging forces]
Force number one: capability shift. Before late 2022, if you said "AI security" to a researcher, they thought about adversarial examples on image classifiers — small academic field, narrow attack surface. After ChatGPT, every company shipped an LLM feature, and within two years most of those features had grown into agents — LLMs that take actions on the user's behalf. Read email. Draft code. Call internal APIs. Move money. The attack surface exploded.
Force number two: incident track record. Between 2023 and 2025 the public learned, the hard way, that LLM systems break in new and creative ways. Samsung's source-code leak. The Bing chatbot revealing its internal codename "Sydney." Air Canada's chatbot getting them sued. HuggingFace models shipping with malicious pickles. M365 Copilot exfiltrating data via a crafted email. We'll come back to several of these in detail throughout the course.
Force number three: regulation. The EU AI Act came into force in 2024 with obligations rolling out through 2026 and 2027. NIST published the AI Risk Management Framework. State laws started landing in the U.S. By 2026, if you ship an AI feature to enterprise customers, they expect an AI security attestation. If you ship to the EU, you have substantive obligations. The "let's see what happens" era is over.
[SLIDE 3 — Why this is a new discipline]
Put the three together and you get a new discipline. Classical AppSec engineers know how to defend a login form. They don't, by default, know how to defend a chatbot that's been jailbroken into running shell commands. Classical ML engineers know how to deploy a model. They don't, by default, know how to threat-model the prompt path or the retrieval pipeline. The AI security engineer lives in the middle.
[SLIDE 4 — Incident tour]
Let me give you a quick tour of incidents that mapped this terrain. Samsung — pasted source code into ChatGPT, ended up in the corpus. Bing — system prompt extraction. Air Canada — held liable for what their chatbot said. HuggingFace — malicious pickles, repeated disclosures. EchoLeak — zero-click data exfil via crafted email in M365 Copilot. OmniGPT — breach of an LLM aggregator exposing user chat logs. PoisonGPT — a proof-of-concept showing how easy it is to typosquat a fine-tuned model that confidently lies. Each one of these is a class of failure, and we'll deep-dive each class in the modules ahead.
[SLIDE 5 — The four frameworks]
The course maps every lesson to four frameworks. OWASP Top 10 for LLMs — your practitioner checklist, the closest analog to the OWASP Top 10 you already know. MITRE ATLAS — adversary tactics and techniques, modeled on ATT&CK, perfect for threat modeling and red-team planning. NIST AI Risk Management Framework — the U.S. federal governance reference. EU AI Act — the regulation that will define your obligations if you touch the EU market, which is almost everyone.
[SLIDE 6 — How to think about the four frameworks]
Don't think of these as competitors. Think of them as four lenses on the same problem. A single finding might map to all four. Prompt injection is OWASP LLM01, it's ATLAS technique T0051, it's NIST AI RMF Measure 2.7, and it's EU AI Act Article 15. One finding, four ways to defend it to four different audiences — engineers, red-teamers, auditors, regulators. That fluency is what makes you valuable.
[SLIDE 7 — Where you fit in the org]
Last thing. The AI security engineer isn't yet a fully standardized role, but the patterns are forming. In smaller orgs, you live inside product security. In larger orgs, you're a dedicated AI red-team function reporting into the CISO. You own threat models for AI features, red-team and adversarial eval, guardrails and eval harnesses, AI incident response, and AI-specific SDLC inputs. You do not own model training infrastructure, classical AppSec for non-AI surfaces, or regulatory filings. Know the handoffs.
[SLIDE 8 — Up next]
Next lesson, we get you into your environment. Python, Docker, an LLM you can talk to, the course repo. Twenty minutes from now, you'll be running your first AI security tool. See you there.
Slide outline¶
- Title — "The AI security landscape in 2026". Subtitle: "Why this is a discipline, what's happened, what frameworks anchor it."
- Three converging forces — three-column layout: 1) Capability shift (icon: agent), 2) Incident track record (icon: alarm), 3) Regulatory pressure (icon: gavel).
- Why a new discipline — Venn diagram: AppSec ∩ MLOps. Empty middle. Label the middle "AI Security Engineering".
- Incident tour — timeline 2023→2025 with 7 incident cards (Samsung, Sydney, Air Canada, HF pickles, EchoLeak, OmniGPT, PoisonGPT). Each card: name + 1-line failure mode + module reference.
- The four frameworks — quad-panel: OWASP / ATLAS / NIST / EU. Each panel: logo, one-line description, "use when…" caption.
- Four lenses on one problem — single finding "prompt injection" at center, four arrows out to four framework tags.
- Where the AI security engineer fits — org-chart fragment with three placement patterns (small org / large org / responsible-AI). Highlight "you own" vs "you don't own" as two columns.
- Up next — "L0.3 — Environment setup & sanity check, ~45 min, hands-on."
Production notes¶
- Recording time: ~25 min raw, target 22–25 min final. This lesson can absorb cuts in the incident tour if it runs long.
- Slide 4 (incident tour): use real but neutralized incident images — newspaper clipping treatment for visual rhythm.
- Update incident list before each course re-record (annual). The 2023–2025 cluster will age; replace with then-current cases.
- Slide 5: link out the four framework logos to their official sites in the slide deck so screenshots remain attributable.
- Tone: this is the lesson where you establish authority. Cite incidents precisely; don't hand-wave.