
L5.1.2 — Query-based extraction techniques

Type: Theory · Duration: ~5 min · Status: Mandatory
Module: Module 5 — Model Extraction, Inversion & Membership Inference
Framework tags: OWASP LLM10 · MITRE ATLAS AML.T0024

Learning objectives

  1. Recognize the four-stage attack pattern for query-based functional extraction.
  2. Identify three signals defenders use to detect extraction-shaped query traffic.

Core content

The four-stage attack pattern

Most functional-substitute extraction attacks follow the same shape, varying mostly in how cleverly step 2 is carried out:

1. Reconnaissance. Identify the target model: its modality (text, image, tabular), its input/output signature, its rough domain coverage, and its rate limits.

2. Query generation. Generate input queries that, in aggregate, span the target's behavior space. Three common strategies:

  • Naive random sampling. Generate diverse inputs from a plausible distribution. Many queries needed; transferability is poor.
  • Active learning. Use a substitute-in-progress to pick the next query that most reduces uncertainty about the target's decision boundary. Far fewer queries needed; the canonical academic strategy.
  • Knowledge-distillation queries. Use the target's outputs (especially logits or probability distributions, if exposed) as teacher signals to train the substitute. Combines extraction with classical knowledge distillation.

3. Collection. Send queries, collect responses, store input/output pairs. Queries are often spread across multiple rate-limited accounts and distributed over time to avoid detection.

4. Training the substitute. Train a model on the collected pairs. The substitute's architecture doesn't need to match the target's; distillation results show that smaller architectures can capture most of the behavior. The substitute is then validated against held-out target queries to verify the behavioral match.
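The "behavioral match" in stage 4 is usually reported as an agreement rate, often called fidelity. The sketch below is a minimal illustration of that metric, useful to a defender or red team measuring how close a suspected substitute is to the original model. The function name `fidelity` and the example labels are assumptions made for illustration, not part of the lesson.

```python
def fidelity(target_labels, substitute_labels):
    """Fraction of held-out queries on which target and substitute agree.

    Both arguments are equal-length sequences of predicted labels,
    one entry per held-out query (hypothetical data layout).
    """
    if len(target_labels) != len(substitute_labels):
        raise ValueError("label sequences must have the same length")
    if not target_labels:
        raise ValueError("need at least one held-out query")
    agreements = sum(t == s for t, s in zip(target_labels, substitute_labels))
    return agreements / len(target_labels)


# Illustrative labels only: the two models disagree on one of five queries.
target_out = [0, 1, 1, 2, 0]
substitute_out = [0, 1, 2, 2, 0]
score = fidelity(target_out, substitute_out)
```

Fidelity measures agreement with the target, not accuracy against ground truth, so a substitute can score high here while reproducing the target's mistakes.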

Query count: how many is enough?

Reference numbers (will be out of date):

| Target type | Approximate queries for functional substitute |
|---|---|
| Small image classifier (10 classes, CIFAR-scale) | 1,000–10,000 |
| Tabular fraud classifier | 5,000–50,000 |
| Specialized domain LLM (narrow use case) | 10,000–100,000 |
| Frontier general LLM | Millions (but partial extraction useful much sooner) |

Active learning reduces these counts by 5–50×. Knowledge distillation, when logits are exposed, reduces them further.
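To make the effect of the 5–50× figure concrete, the sketch below applies it to the reference ranges above. It assumes one particular reading: the strongest reduction (50×) applies to the low end of each range and the weakest (5×) to the high end, giving the widest plausible band. The dictionary and function names are illustrative.

```python
# Reference ranges from the table above (queries for a functional substitute).
BASELINE_QUERIES = {
    "small image classifier": (1_000, 10_000),
    "tabular fraud classifier": (5_000, 50_000),
    "specialized domain LLM": (10_000, 100_000),
}

# Assumed reading of "5-50x": (weakest reduction, strongest reduction).
ACTIVE_LEARNING_REDUCTION = (5, 50)


def reduced_range(low, high, reduction=ACTIVE_LEARNING_REDUCTION):
    """Return the widest plausible query range after the reduction."""
    weakest, strongest = reduction
    return low // strongest, high // weakest


for target, (low, high) in BASELINE_QUERIES.items():
    new_low, new_high = reduced_range(low, high)
    print(f"{target}: {low:,}-{high:,} -> {new_low:,}-{new_high:,}")
```

Under this reading the small image classifier drops to roughly 20–2,000 queries, which suggests that a per-account cap of a few thousand queries may not be a meaningful barrier on its own.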

Defender detection signals

Three signal classes defenders use to detect extraction-shaped traffic:

1. Query diversity anomalies. Legitimate users query within a narrow distribution related to their use case. Extractors deliberately span the input space. A single account showing unusually high input-distribution coverage is a flag.
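One way to quantify "input-distribution coverage" is to assign each query to a coarse bucket (a topic cluster, an input-space cell) and measure how evenly an account spreads across the buckets. The sketch below assumes such a bucketing step already exists; `coverage_stats`, the bucket count, and the sample data are all hypothetical.

```python
import math
from collections import Counter


def coverage_stats(bucket_ids, num_buckets):
    """Return (coverage, normalized_entropy) for one account's queries.

    bucket_ids: one bucket index per query (bucketing is assumed to
    happen upstream). coverage is the fraction of buckets touched;
    normalized entropy is 0 for a single bucket and 1 for a perfectly
    even spread over all buckets.
    """
    if not bucket_ids or num_buckets < 2:
        raise ValueError("need at least one query and two buckets")
    counts = Counter(bucket_ids)
    total = len(bucket_ids)
    coverage = len(counts) / num_buckets
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return coverage, entropy / math.log(num_buckets)


# Illustrative accounts over 20 buckets.
narrow_user = [0] * 90 + [1] * 10       # stays within two topics
broad_user = list(range(20)) * 5        # touches every bucket evenly

narrow_cov, narrow_ent = coverage_stats(narrow_user, 20)
broad_cov, broad_ent = coverage_stats(broad_user, 20)
```

Any threshold on these values would need to be tuned against real traffic, since evaluation pipelines and power users can also score high on both.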

2. Sequential / systematic patterns. Extractors often probe systematically (e.g., grid-walk pixel perturbations, lexicon-walk in NLP). Pattern detection on input sequences flags this.
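A grid walk has a simple signature in numeric input spaces: consecutive queries differ in exactly one coordinate. The sketch below measures how often that happens in one account's query sequence. It is a deliberately simple heuristic under the assumption that inputs are fixed-length numeric vectors; the function name and the tolerance are illustrative.

```python
def single_coordinate_step_fraction(queries, tol=1e-9):
    """Fraction of consecutive query pairs that differ in exactly one coordinate.

    queries: ordered list of equal-length numeric vectors from one account.
    Values near 1.0 suggest a systematic walk along one axis at a time.
    """
    if len(queries) < 2:
        return 0.0
    hits = 0
    for prev, cur in zip(queries, queries[1:]):
        changed = sum(abs(a - b) > tol for a, b in zip(prev, cur))
        if changed == 1:
            hits += 1
    return hits / (len(queries) - 1)


# Illustrative sequences.
grid_walk = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (0.3, 0.0)]
organic = [(0.2, 0.7), (0.9, 0.1), (0.4, 0.4)]

grid_score = single_coordinate_step_fraction(grid_walk)
organic_score = single_coordinate_step_fraction(organic)
```

The same idea would carry over to text inputs only with a different distance, for example counting pairs of prompts that differ by a single token; that variant is not shown here.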

3. Output-driven query choice. Extractors using active learning visibly adapt — their queries cluster near classes/regions where the substitute is uncertain. Defenders who can model the next expected query (or its uncertainty profile) can flag the adaptation pattern.
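The defender cannot observe the attacker's substitute, but it can observe its own model's confidence on each incoming query. As a rough proxy for the adaptation pattern, the sketch below measures how often an account's queries land where the target's top two class probabilities are close. Using the target's margin in place of the substitute's uncertainty is an assumption of this sketch, as are the function names and the 0.1 threshold.

```python
def top2_margin(probs):
    """Gap between the largest and second-largest class probability."""
    ranked = sorted(probs, reverse=True)
    return ranked[0] - ranked[1]


def low_margin_fraction(prob_rows, margin_threshold=0.1):
    """Fraction of an account's queries that fall near a decision boundary.

    prob_rows: the target model's own probability vector for each query
    the account sent (at least two classes per row).
    """
    if not prob_rows:
        return 0.0
    low = sum(top2_margin(p) < margin_threshold for p in prob_rows)
    return low / len(prob_rows)


# Illustrative responses for two accounts.
adaptive_account = [[0.5, 0.5], [0.52, 0.48], [0.49, 0.51], [0.9, 0.1]]
typical_account = [[0.95, 0.05], [0.8, 0.2], [0.52, 0.48], [0.99, 0.01]]

adaptive_score = low_margin_fraction(adaptive_account)
typical_score = low_margin_fraction(typical_account)
```

A high fraction is only suggestive: users with genuinely ambiguous inputs will also produce many low-margin responses, so this works best when compared against the account's own history or its peers.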

None of these is perfect. False positives include power users with legitimately broad needs, evaluation pipelines, and integration tests. Stricter detection catches more extractors but also blocks more legitimate traffic, so the cost-benefit trade-off is real.

The 2026 defender's toolkit

What the defender actually deploys in 2026:

  • Per-tenant query monitoring with anomaly detection on diversity, rate, and pattern features.
  • Tiered access. Power users self-identify at signup, agree to attribution and usage caps.
  • Watermarking (where applicable) — outputs carry subtle, hard-to-remove signals that propagate into the substitute.
  • Terms of service + legal layer. Extraction violates TOS; vendors pursue when detected. This won't prevent the attack, but it raises costs and provides a recovery path.
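As a sketch of what per-tenant anomaly detection can look like for a single feature, the example below scores each tenant's daily query count against the population using a median-based (robust) z-score. This is one common outlier heuristic chosen for illustration, not a method the lesson prescribes; the tenant names, counts, and the 3.5 cutoff are hypothetical.

```python
from statistics import median


def robust_z_scores(value_by_tenant):
    """Median/MAD-based z-score for each tenant's feature value.

    The constant 0.6745 rescales the median absolute deviation (MAD)
    so the score is comparable to a standard z-score for normal data.
    """
    values = list(value_by_tenant.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread in the population: nothing can be ranked as an outlier.
        return {tenant: 0.0 for tenant in value_by_tenant}
    return {t: 0.6745 * (v - med) / mad for t, v in value_by_tenant.items()}


# Illustrative daily query counts per tenant.
daily_queries = {"a": 100, "b": 120, "c": 90, "d": 110, "e": 5000}
scores = robust_z_scores(daily_queries)
flagged = [t for t, z in scores.items() if z > 3.5]
```

The same scoring could be applied to the diversity and pattern features, with the per-feature scores combined into one alert; how to weight them is a tuning decision that depends on the service's traffic.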

What works less well:

  • Strict rate limits alone (extractors distribute across accounts).
  • Output noise injection (degrades results for legitimate users; sophisticated extractors average over it).

Real-world example

Tramèr et al. (2016) demonstrated functional extraction of production ML models hosted on BigML and Amazon Machine Learning, using on the order of 1,000–10,000 queries, well within free-tier limits at the time. The vendors patched their specific telemetry; the underlying attack class remains effective against generic ML APIs in 2026, defended primarily by the toolkit above.

Key terms

  • Active learning extraction — substitute drives query selection to minimize uncertainty.
  • Knowledge-distillation extraction — uses logits/probabilities as teacher signals.
  • Watermarking — defender-side technique embedding signals in outputs that propagate into substitutes.

References

  • Tramèr et al., "Stealing Machine Learning Models via Prediction APIs" (USENIX 2016).
  • Krishna et al., "Thieves on Sesame Street!" (2019) — extraction against NLP models.
  • Papernot et al., "Practical Black-Box Attacks against Machine Learning" (2017) — transferability foundation.

Quiz items

  1. Q: What are the four stages of a query-based functional extraction attack? A: Reconnaissance; query generation; collection; training the substitute.
  2. Q: Why does active learning reduce required query counts? A: Because the in-progress substitute picks the next query to maximize information gain about the target's decision boundary, instead of sampling randomly.
  3. Q: Name two of the three defender detection signals for extraction traffic. A: Any two of: query diversity anomalies, sequential/systematic patterns, output-driven query adaptation.

Video script (~620 words, ~4.5 min)

[SLIDE 1 — Title]

Query-based extraction techniques. Five minutes.

[SLIDE 2 — Four-stage pattern]

Most functional-substitute extraction attacks follow the same four-stage pattern. Reconnaissance: identify the target's modality, input/output signature, rough domain coverage, rate limits. Query generation: generate inputs that span the target's behavior space. Collection: send queries, collect responses, store input/output pairs. Often spread across rate-limited accounts and time-distributed. Training the substitute: train a model on the collected pairs. Substitute architecture doesn't need to match the target.

[SLIDE 3 — Three query-generation strategies]

Three common query-generation strategies. One: naive random sampling. Generate diverse inputs from a plausible distribution. Many queries needed. Transferability is poor. Two: active learning. Use a substitute-in-progress to pick the next query that most reduces uncertainty about the target's decision boundary. Far fewer queries needed. The canonical academic strategy. Three: knowledge-distillation queries. Use the target's outputs — especially logits or probability distributions, if exposed — as teacher signals. Combines extraction with classical distillation.

[SLIDE 4 — Query count reference]

How many queries? Approximate numbers. Small image classifier, ten classes, CIFAR-scale: one thousand to ten thousand. Tabular fraud classifier: five thousand to fifty thousand. Specialized-domain LLM, narrow use case: ten thousand to one hundred thousand. Frontier general LLM: millions, but partial extraction useful much sooner. Active learning reduces these by five to fifty times. Knowledge distillation when logits exposed reduces further.

[SLIDE 5 — Three defender detection signals]

Three signal classes defenders use to detect extraction-shaped traffic. One: query diversity anomalies. Legitimate users query within a narrow distribution related to their use case. Extractors deliberately span the input space. A single account showing unusually high input-distribution coverage is a flag. Two: sequential or systematic patterns. Extractors often probe systematically — grid-walk pixel perturbations, lexicon-walk in NLP. Pattern detection on input sequences flags this. Three: output-driven query choice. Extractors using active learning visibly adapt. Queries cluster near classes or regions where the substitute is uncertain. Defenders who can model the next expected query can flag the adaptation pattern.

None of these is perfect. False positives include power users with legitimately broad needs, evaluation pipelines, integration tests. Cost-benefit of stricter detection is real.

[SLIDE 6 — 2026 defender toolkit]

What the defender actually deploys in 2026. Per-tenant query monitoring with anomaly detection on diversity, rate, pattern. Tiered access — power users self-identify at signup, agree to attribution and usage caps. Watermarking where applicable — outputs carry subtle, hard-to-remove signals that propagate into the substitute. Terms of service plus legal layer — extraction violates TOS; vendors pursue when detected. Won't prevent the attack but raises costs.

What works less well: strict rate limits alone — extractors distribute across accounts. Output noise injection — degrades legitimate users; sophisticated extractors average over it.

[SLIDE 7 — Up next]

Next: membership inference. Different attack class, related defensive concerns. Five minutes. See you there.

Slide outline

  1. Title — "Query-based extraction techniques".
  2. Four-stage pattern — horizontal flow: recon → queries → collection → train.
  3. Three query strategies — three cards: random · active learning · distillation.
  4. Query count reference — table from lesson body.
  5. Three detection signals — three-card grid.
  6. 2026 defender toolkit — what-works vs what-doesn't columns.
  7. Up next — "L5.2.1 — Membership inference, ~5 min."

Production notes

  • Recording: ~4.5 min. Cap 5.
  • Slide 4 (the query-count reference table) will date quickly — refresh annually.