Skip to content

L2.2.2 — Model & data planes in depth

Type: Theory · Duration: ~5 min · Status: Mandatory Module: Module 2 — AI Security Foundations Framework tags: OWASP LLM01, LLM03, LLM06, LLM10 · ATLAS AML.TA0007 (ML Attack Staging), AML.TA0005 (Initial Access via ML Model Access)

Learning objectives

  1. List four attack surfaces unique to the model plane and one unique to the data plane.
  2. Recognize why the data plane is the largest attack surface in any RAG-based system.

Core content

Model plane — four primary attack surfaces

  1. The inference interface. Anywhere the model takes input — API endpoint, chat box, retrieval-augmented prompt. Prompt injection lives here; jailbreaks live here; query-based extraction lives here. The single most-exploited surface in 2025–2026.

  2. The weights themselves. A trained model is a file. That file can be stolen (extraction, weight exfiltration), modified (post-hoc tampering, adapter swap), or replaced (supply-chain switcheroo). On disk, the format matters — .safetensors is safe to load; .pt / .bin (pickle-based) can execute arbitrary code on load.

  3. The tokenizer. A separately-trained artifact with its own attack surface. Glitch tokens, token smuggling via unicode normalization, and tokenizer-version mismatches between training and inference all live here.

  4. The decoding configuration. Temperature, top-p, system seeds, stop tokens. A defender who sets temperature=0 for reproducibility and an attacker who tests at default settings will reach different conclusions about whether a defense works.

Data plane — five primary attack surfaces

  1. Training data. Most distant from the application team — usually the foundation-model vendor's territory. Poisoning here is hard to detect downstream but devastating when successful.

  2. Fine-tune data. Within the application team's control. Same threats as training data plus an entire category of harmful-fine-tuning attacks that strip alignment cheaply (Module 4 lab L4 territory).

  3. The retrieval corpus (RAG only). The largest data-plane surface for any RAG system. Whoever can write to the corpus can plant indirect prompt injection. Most production breaches we've discussed (EchoLeak, Air Canada-adjacent failures) traversed this surface.

  4. Vector embeddings. Stored in the vector DB. Two threats: (a) the embeddings can be exfiltrated and reversed to leak the underlying text (embedding-leak attacks, Module 5); (b) adversarial embeddings can be crafted to be retrieved for specific queries the attacker chooses.

  5. Evaluation and preference data. Less obvious. If your evaluations are gameable (test-set contamination, evaluator-LLM manipulation), your safety claims are gameable. If preference data is poisoned, your alignment is poisoned.

Why the data plane is the biggest surface in RAG systems

A RAG system's prompt at query time is a concatenation of: - The system prompt (you control) - The user query (untrusted) - Retrieved chunks (provenance depends on who can write to the corpus)

If your corpus is editable by 200 engineers — your Confluence space, say — every one of those engineers is, with full literal accuracy, a writer to your model's prompt at runtime. That's the data-plane surface. It is almost always larger than the user-query surface, because the corpus is usually larger than the user-query population and more open in write access.

Real-world example

The 2024 disclosure of attacks against ChatGPT's "Browsing" capability showed that web pages reached by the model could plant instructions executable by the model. Same shape as RAG, different corpus — the live web. OpenAI responded by restricting which content the browsing tool could surface and by adding instruction-extraction filters at the model layer. The data-plane defense (corpus controls) and the model-plane defense (input filters) were both required; neither alone was sufficient.

Key terms

  • Inference interface — the boundary where user input meets the model at query time.
  • Adversarial embeddings — embeddings crafted to be retrieved for specific queries the attacker targets.
  • Retrieval corpus surface — the population of writers to a RAG knowledge base.

References

  • Greshake et al., "Not what you've signed up for" (Indirect Prompt Injection, 2023) — https://arxiv.org/abs/2302.12173
  • MITRE ATLAS technique pages, particularly AML.T0051 (LLM Prompt Injection) and AML.T0070 (RAG Poisoning) — https://atlas.mitre.org/techniques/

Quiz items

  1. Q: Why is the data plane usually the largest attack surface in a RAG system? A: Because the corpus is typically writable by a larger and more open population than the user-query interface, and every corpus writer is a writer to the runtime prompt.
  2. Q: Name two ways the tokenizer is an independent attack surface from the model itself. A: Glitch tokens (training-data sparsity in vocab entries) and token smuggling via unicode normalization differences between tokenizer and safety filter.

Video script (~620 words, ~4.5 min)

[SLIDE 1 — Title]

Model and data planes in depth. Five minutes. By the end you'll know the four attack surfaces unique to the model plane and the five unique to the data plane, and why the data plane is the biggest surface in any RAG system.

[SLIDE 2 — Model plane: four surfaces]

Model plane — four surfaces. One: the inference interface. Anywhere the model takes input — API, chat, retrieval-augmented prompt. Prompt injection lives here. Jailbreaks live here. Query-based extraction lives here. The single most-exploited surface in twenty-twenty-six.

Two: the weights themselves. A trained model is a file. That file can be stolen, modified, or replaced. Format matters — safetensors is safe. Dot-pt and dot-bin, which are pickle-based, can execute arbitrary code on load. We exploit this in Module 4.

Three: the tokenizer. A separately-trained artifact with its own attack surface. Glitch tokens. Token smuggling via unicode normalization. Tokenizer-version mismatches between training and inference.

Four: the decoding configuration. Temperature, top-p, system seeds, stop tokens. A defender who sets temperature to zero for reproducibility and an attacker who tests at default settings will reach different conclusions about whether a defense works. We've seen this in L1.3 — it's worth repeating.

[SLIDE 3 — Data plane: five surfaces]

Data plane — five surfaces. One: training data. Most distant from the application team — usually the vendor's territory. Poisoning here is hard to detect and devastating when successful. Two: fine-tune data. Within your control. Same threats as training plus harmful fine-tuning that strips alignment cheaply.

Three: the retrieval corpus, for RAG. The largest data-plane surface in any RAG system. Whoever can write to the corpus can plant indirect prompt injection. Four: vector embeddings in the DB. Exfiltrate them, you can sometimes reverse them to recover the underlying text. Craft them, you can plant content that gets retrieved for specific queries you choose. Five: evaluation and preference data. If your evaluations are gameable, your safety claims are gameable. If your preference data is poisoned, your alignment is poisoned.

[SLIDE 4 — Why data plane dominates RAG]

Why the data plane dominates in RAG. A RAG system's prompt at query time is a concatenation of three things — your system prompt, the user query, and retrieved chunks. Your system prompt is yours; the user query is untrusted by definition; the retrieved chunks have provenance that depends on who can write to the corpus.

If your corpus is editable by two hundred engineers — your Confluence space, say — every one of those engineers is a writer to your model's prompt at runtime. That's the data-plane surface. It is almost always larger than the user-query surface, because the corpus population is usually larger and more open than the query population.

[SLIDE 5 — Browsing-attack walk-through]

One real-world anchor. The 2024 disclosure of attacks against ChatGPT's Browsing capability showed that web pages reached by the model could plant instructions executable by the model. Same shape as RAG, different corpus — the live web. OpenAI responded with both corpus controls — restricting which content the tool could surface — and model-layer filters. Both required. Neither alone sufficient. This is the defense-in-depth pattern you'll see throughout this course.

[SLIDE 6 — Up next]

Next lesson: the application, agent, and supply-chain planes. Five minutes. See you there.

Slide outline

  1. Title — "Model & data planes in depth".
  2. Model plane four surfaces — quadrant: inference interface · weights · tokenizer · decoding. Icon + 1-line attack callout each.
  3. Data plane five surfaces — five-card layout with one attack callout each.
  4. Why data plane dominates RAG — anatomy of a RAG prompt at query time, three sections color-coded by provenance.
  5. Browsing-attack walk-through — sketch of the ChatGPT Browsing tool with corpus + model defenses both highlighted.
  6. Up next — "L2.2.3 — Application, agent & supply-chain planes, ~5 min."

Production notes

  • Recording: ~4.5 min. Hard cap 5.
  • Slide 4 is the most novel; the "if your Confluence is editable by 200 engineers…" line is the one the audience will quote back.