L5.3.2 — Embedding-leak attacks¶

Type: Theory · Duration: ~5 min · Status: Mandatory
Module: Module 5 — Model Extraction, Inversion & Membership Inference
Framework tags: OWASP LLM06 · MITRE ATLAS AML.T0048

Learning objectives¶

Explain why embeddings are not anonymized text and identify two reconstruction approaches.
Recognize the vector-DB attack surface as a privacy-critical asset.

Core content¶

Embeddings ≠ anonymization¶

A widespread mental shortcut among teams shipping LLM features: "we store user queries as embeddings in our vector DB, so the queries are anonymized." This is wrong, in a way that's worth understanding precisely.

An embedding is a fixed-dimension vector produced by an embedding model — a deterministic function of the input text. The vector is dense, often 384–8192 dimensions, and carries enough information about the input that a similar input maps to a similar vector (the property that makes RAG retrieval work).

The same property — similar input → similar vector — means the embedding contains substantial information about the input. Given an embedding, you can often reconstruct meaningful approximations of the input. Sometimes word-for-word. Sometimes a paraphrase that preserves all the sensitive content.
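The "similar input → similar vector" property can be seen even in a toy setting. The sketch below is an illustration only: `toy_embed` is a hypothetical stand-in that hashes character trigrams into a 256-dimension vector, not a real embedding model, and the sample sentences are invented. It shows that a one-word edit barely moves the vector, which is exactly the signal an attacker exploits.

```python
import hashlib
import math

DIM = 256  # toy dimension; real models use hundreds to thousands


def toy_embed(text: str, dim: int = DIM) -> list[float]:
    """Hash character trigrams into a fixed-size, unit-length vector.

    A stand-in for a real embedding model, chosen only because it is
    deterministic and needs no downloads.
    """
    vec = [0.0] * dim
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        digest = hashlib.sha256(padded[i:i + 3].encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; both inputs are already unit length."""
    return sum(x * y for x, y in zip(a, b))


original = toy_embed("my account number is 4417 and I need a refund")
near_copy = toy_embed("my account number is 4417 and I want a refund")
unrelated = toy_embed("what time does the museum open on sunday")

print(f"near copy : {cosine(original, near_copy):.3f}")
print(f"unrelated : {cosine(original, unrelated):.3f}")
```

The near copy scores far higher than the unrelated sentence. The same determinism that makes retrieval work also lets anyone with access to the model test guesses against a stored vector.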

The L5.6 lab demonstrates this directly: invert sentence embeddings back to recognizable text using a publicly available inversion model.

Two reconstruction approaches¶

1. Direct inversion (text-to-embedding reversed). Train an inverter network: input is the embedding, output is text. With a dataset of (text, embedding) pairs, this is a supervised learning problem. Morris et al. (2023) found that a single-pass inverter performs poorly on its own, but that iteratively re-embedding and correcting its guesses makes the attack practical against text-embedding-ada-002 and GTR-base sentence embeddings; the paper's abstract reports exact recovery of 92% of 32-token inputs.

2. Nearest-neighbor retrieval against a known corpus. If the attacker knows the corpus from which the target embedding was generated and can run the same embedding model, a simple nearest-neighbor search against the re-embedded corpus often recovers exact or near-exact text. This is weaker than direct inversion when the attacker can only guess at the corpus, but trivially easy when the corpus is known.
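The second approach needs no training at all. The following is a minimal sketch under stated assumptions: `toy_embed` is a hypothetical hashed-trigram stand-in for the victim's embedding model, the attacker is assumed able to call that same function, and the corpus entries are invented examples.

```python
import hashlib
import math


def toy_embed(text: str, dim: int = 256) -> list[float]:
    """Toy stand-in for the victim's embedding model (hashed character trigrams)."""
    vec = [0.0] * dim
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        digest = hashlib.sha256(padded[i:i + 3].encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def nearest_text(leaked: list[float], corpus: list[str]) -> tuple[str, float]:
    """Re-embed every candidate and return the closest one with its cosine score."""
    score, text = max(
        (sum(a * b for a, b in zip(leaked, toy_embed(doc))), doc) for doc in corpus
    )
    return text, score


# Hypothetical corpus the attacker already holds (for example, a scraped knowledge base).
known_corpus = [
    "Reset your password from the account settings page.",
    "Invoices are issued on the first business day of each month.",
    "Escalate chargebacks over 5000 USD to the finance on-call.",
]
# The attacker sees only this vector, exfiltrated from the vector DB.
leaked_vector = toy_embed(known_corpus[2])

recovered, score = nearest_text(leaked_vector, known_corpus)
print(recovered, round(score, 3))
```

When the leaked vector was produced from a document in the corpus, the match is exact. A lightly edited version of the same document still lands on the right entry, because most of its features are shared.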

Why the vector DB is the privacy-critical asset¶

Most production RAG systems store embeddings in a vector DB. If the DB is breached, the embeddings are exfiltrated. The breached embeddings can be inverted to recover the underlying text. The text often contains:

- User query content (often PII or sensitive context).
- Document chunks from the source corpus (sometimes private business data).
- System prompts or scaffolding (cached as embeddings in some architectures).

Result: a vector-DB breach is functionally close to a database breach of the raw text — even though the team may have classified the embeddings as "non-PII."
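To make the breach scenario concrete, here is a toy sketch of the guess, re-embed, compare loop that an attacker holding only an exfiltrated vector could run. Everything in it is an assumption made for illustration: `toy_embed` is a bag-of-words stand-in built from SHA-512 bits rather than a real embedding model, the attacker is assumed to have query access to that same function and a candidate vocabulary, and the greedy search is far simpler than the trained, iterative inverters in the literature. Because the toy embedding ignores word order, it recovers the set of words rather than the sentence.

```python
import hashlib
import math


def word_vector(word: str) -> list[int]:
    """Map a word to a fixed pseudo-random +1/-1 vector (512 dimensions)."""
    digest = hashlib.sha512(word.encode("utf-8")).digest()
    return [1 if (byte >> bit) & 1 else -1 for byte in digest for bit in range(8)]


def toy_embed(words: list[str]) -> list[float]:
    """Toy bag-of-words embedding: sum the word vectors, then normalise."""
    total = [0] * 512
    for word in set(words):
        for i, value in enumerate(word_vector(word)):
            total[i] += value
    norm = math.sqrt(sum(x * x for x in total)) or 1.0
    return [x / norm for x in total]


def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def greedy_invert(target: list[float], vocabulary: list[str]) -> set[str]:
    """Repeatedly add whichever candidate word moves the guess closest to the target."""
    guess: set[str] = set()
    best = 0.0
    while True:
        candidates = [
            (cosine(toy_embed(list(guess | {word})), target), word)
            for word in vocabulary
            if word not in guess
        ]
        if not candidates:
            return guess
        score, word = max(candidates)
        if score <= best + 1e-9:  # no candidate improves the match: stop
            return guess
        guess.add(word)
        best = score


# Hypothetical stored query; the attacker only ever sees `leaked`.
secret_words = "wire 9000 to alice before friday".split()
leaked = toy_embed(secret_words)

# Hypothetical attacker vocabulary: plausible words for the domain.
vocabulary = [
    "wire", "9000", "to", "alice", "before", "friday", "send", "bob",
    "monday", "refund", "invoice", "the", "password", "meeting", "after",
    "100", "cancel", "order", "from", "account",
]

print(sorted(greedy_invert(leaked, vocabulary)))
```

Real embedding models are not linear sums of word vectors, so real attacks need a learned inverter. The loop structure (propose text, embed it, measure distance to the stolen vector) is the part that carries over.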

Defenses¶

The defensive playbook is short:

1. Treat embeddings as PII. If the underlying text would be PII, the embeddings are PII. Classify, store, access-control, and log them accordingly.

2. Strict access control on the vector DB. Same standards as any sensitive data store. Auth on every query. Audit logs. Network isolation. Tenant boundary enforcement.

3. Reduce embedding dimensionality where feasible. Lower-dimensional embeddings retain less information and are generally harder to invert. This is not always an option, because retrieval quality degrades once dimensionality drops too far.

4. Avoid storing embeddings of high-sensitivity content where possible. Process-and-discard patterns for high-risk content; persist only when retrieval requires it.

5. Combine with output filters. When embedding-derived content is surfaced to a user, apply PII redaction at the boundary.
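Items 1, 2, and 5 of the playbook can be sketched together in a few lines. This is an illustrative in-memory example, not a real vector-DB API: `GuardedVectorStore`, `Record`, and the `sensitivity` labels are hypothetical names, `caller_tenant` is assumed to come from an already-authenticated session, and the email regex is a deliberately simple placeholder for a proper PII redaction step.

```python
import logging
import re
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("vector_db.audit")

# Simplistic email pattern, used here only to illustrate boundary redaction.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


@dataclass(frozen=True)
class Record:
    tenant_id: str
    vector: tuple[float, ...]
    text: str
    sensitivity: str  # item 1: inherited from the source text, never downgraded


class GuardedVectorStore:
    """Hypothetical in-memory store illustrating playbook items 1, 2, and 5."""

    def __init__(self) -> None:
        self._records: list[Record] = []

    def add(self, tenant_id: str, vector: list[float], text: str, sensitivity: str) -> None:
        self._records.append(Record(tenant_id, tuple(vector), text, sensitivity))

    def query(self, caller_tenant: str, vector: list[float], k: int = 3) -> list[str]:
        # Item 2: every query is attributed to a caller and written to the audit log.
        audit_log.info("query tenant=%s k=%d", caller_tenant, k)
        # Item 2: enforce the tenant boundary before any similarity search runs.
        visible = [r for r in self._records if r.tenant_id == caller_tenant]
        ranked = sorted(
            visible,
            key=lambda r: sum(a * b for a, b in zip(r.vector, vector)),
            reverse=True,
        )
        # Item 5: redact at the output boundary, just before text leaves the store.
        return [EMAIL_RE.sub("[REDACTED_EMAIL]", r.text) for r in ranked[:k]]


store = GuardedVectorStore()
store.add("tenant-a", [1.0, 0.0], "Contact jane@example.com about the renewal", "pii")
store.add("tenant-b", [0.9, 0.1], "Tenant B pricing sheet", "confidential")

results = store.query("tenant-a", [1.0, 0.0])
print(results)
```

Filtering by tenant before ranking, rather than after, means another tenant's vectors never enter the similarity computation at all.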

Operational reality in 2026¶

Most production teams in 2026 do not treat embeddings as PII. They classify them as "low sensitivity" and apply weaker access controls than they would to the underlying text. This is a class of finding you'll see frequently in real engagements; pointing it out and recommending the playbook above is high-ROI work.

Real-world example¶

Morris et al. (2023), "Text Embeddings Reveal (Almost) As Much As Text," demonstrated inversion of embeddings from production-grade models, including OpenAI's text-embedding-ada-002. The abstract reports exact recovery of 92% of 32-token inputs, and the authors also recovered full names from a dataset of clinical notes. The paper is the canonical citation for the "embeddings are not anonymization" claim.

Key terms¶

Embedding inversion — reconstructing input text from an embedding vector.
Embeddings-as-PII — the operational posture: classify embeddings at the sensitivity of their underlying text.
Vector DB as privacy-critical asset — same treatment as the underlying data store.

References¶

Morris et al., "Text Embeddings Reveal (Almost) As Much As Text" (EMNLP 2023) — https://arxiv.org/abs/2310.06816
Song and Raghunathan, "Information Leakage in Embedding Models" (ACM CCS 2020).

Quiz items¶

Q: Why is "we store user queries as embeddings" not equivalent to "we anonymize user queries"? A: Because embeddings retain substantial information about the underlying text, enough that inversion models can often recover short inputs word for word. Similar input → similar vector, by construction.
Q: Name two embedding-reconstruction approaches. A: Direct inversion (train an inverter network on embedding → text); nearest-neighbor retrieval against a known corpus.
Q: State the defensive principle for vector DBs in one sentence. A: Treat embeddings as PII at the sensitivity of their underlying text; access-control, audit-log, and isolate the vector DB at the same standard as the original data store.

Video script (~620 words, ~4.5 min)¶

[SLIDE 1 — Title]

Embedding-leak attacks. Five minutes. By the end you'll know why embeddings aren't anonymization, and what the defensive playbook looks like.

[SLIDE 2 — The misunderstanding]

A widespread mental shortcut among teams shipping LLM features. "We store user queries as embeddings in our vector DB, so the queries are anonymized." This is wrong, in a way worth understanding precisely.

[SLIDE 3 — What an embedding is]

An embedding is a fixed-dimension vector produced by an embedding model — a deterministic function of the input text. The vector is dense. Often 384 to 8192 dimensions. Carries enough information about the input that a similar input maps to a similar vector. That's the property that makes RAG retrieval work.

The same property — similar input maps to similar vector — means the embedding contains substantial information about the input. Given an embedding, you can often reconstruct meaningful approximations of the input. Sometimes word-for-word. Sometimes a paraphrase that preserves all the sensitive content.

The L5.6 lab (L5.7 in its extended form) demonstrates this directly: invert sentence embeddings back to recognizable text using a publicly available inversion model.

[SLIDE 4 — Two reconstruction approaches]

Two approaches. One: direct inversion. Train an inverter network. Input is the embedding. Output is text. With a dataset of text-embedding pairs, this is supervised learning. Morris et al., 2023, showed that adding iterative re-embed-and-correct steps makes this practical against text-embedding-ada-002 and GTR-base sentence embeddings, recovering 92 percent of 32-token inputs exactly.

Two: nearest-neighbor retrieval against a known corpus. If the attacker knows the corpus from which the target embedding came, simple nearest-neighbor against a re-embedded corpus often recovers exact or near-exact text.

[SLIDE 5 — Why the vector DB is privacy-critical]

Most production RAG systems store embeddings in a vector DB. If the DB is breached, the embeddings are exfiltrated. The breached embeddings can be inverted to recover the underlying text. The text often contains: user query content — often PII or sensitive context. Document chunks from the source corpus — sometimes private business data. System prompts or scaffolding, cached as embeddings in some architectures.

Result: a vector-DB breach is functionally close to a database breach of the raw text. Even though the team may have classified the embeddings as "non-PII."

[SLIDE 6 — Defensive playbook]

The defensive playbook is short. One: treat embeddings as PII. If the underlying text would be PII, the embeddings are PII. Classify, store, access-control, log accordingly. Two: strict access control on the vector DB. Same standards as any sensitive data store. Auth on every query. Audit logs. Network isolation. Tenant boundary enforcement. Three: reduce embedding dimensionality where feasible. Lower-dimension embeddings retain less information. Harder to invert. Not always usable. Four: avoid storing embeddings of high-sensitivity content where possible. Process-and-discard patterns for high-risk content. Persist only when retrieval requires it. Five: combine with output filters. When embedding-derived content is surfaced to a user, apply PII redaction at the boundary.

[SLIDE 7 — Operational reality in 2026]

Operational reality in twenty-twenty-six. Most production teams do not treat embeddings as PII. They classify them as low sensitivity and apply weaker access controls than they would to the underlying text. This is a class of finding you'll see frequently in real engagements. Pointing it out and recommending the playbook above is high-ROI work.

[SLIDE 8 — Up next]

Last two theory lessons: privacy defenses in detail — DP-SGD and federated learning, then operational defenses. Then labs. See you there.

Slide outline¶

Title — "Embedding-leak attacks".
The misunderstanding — pull-quote of the wrong mental model.
What an embedding is — embedding-as-vector visualization + arrow back to text.
Two reconstruction approaches — two cards.
Vector DB as privacy-critical — breach scenario diagram: vector DB compromised → embeddings out → inverter → text out.
Defensive playbook — five-item checklist.
Operational reality — finding-shaped callout: "Most teams don't do this; high-ROI to point it out."
Up next — "L5.4.1 — DP-SGD and federated learning, ~5 min."

Production notes¶

Recording: ~4.5 min. Hard cap: 5 min.
Slide 2 (the wrong mental model) sets up the lesson — pause on the wrong claim.