L1.1: Machine learning in 30 minutes
Type: Theory · Duration: ~30 min · Status: Mandatory
Module: Module 1 (AI/ML Foundations for Security Engineers)
Framework tags: foundational; no specific framework tag (enables all downstream tags)
Learning objectives
By the end of this lesson, the learner can:
1. Distinguish supervised, unsupervised, and reinforcement learning, with one example of each.
2. Explain the difference between training and inference and why each phase has a distinct attack surface.
3. Name the major classical model families (linear/logistic regression, decision trees, gradient boosting, neural networks) and when each is used.
4. Define five terms that every later lesson assumes: model, features, labels, loss function, generalization.
Concept primer
This entire lesson is the primer. We're going from zero ML literacy to "you can read an ML system diagram and ask the right security questions." If you already work with ML day-to-day, skim and confirm the security framing.
Core content
What "machine learning" actually means
Forget the hype framing. A machine-learning model is a function. Specifically: it's a function whose internal coefficients (called parameters or weights) were chosen automatically, by an algorithm, to minimize the error between the function's outputs and some target values on a dataset.
Three pieces:
- Data. Examples that show what good input/output pairs look like.
- Model. A parameterized function (linear equation, decision tree, neural network) with empty slots for coefficients.
- Training algorithm. Code that fills in the coefficients by repeatedly comparing the model's outputs to the data and adjusting.
The output is a trained model — the function with its coefficients filled in. You can now feed it new input and get a prediction. That step — using the trained model — is called inference.
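To make the data / model / training algorithm split concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption: the toy dataset (four points on y = 2x + 1), the names `model`, `train`, and `mean_squared_error`, and the choice of gradient descent with a fixed learning rate. Production training uses a library, but the shape is the same.

```python
# Toy dataset (assumed for illustration): points on the line y = 2x + 1.
DATA = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]


def model(x, w, b):
    """The parameterized function. The 'empty slots' are w and b."""
    return w * x + b


def mean_squared_error(w, b, data):
    """Average squared gap between the model's outputs and the targets."""
    return sum((model(x, w, b) - y) ** 2 for x, y in data) / len(data)


def train(data, steps=2000, lr=0.05):
    """Training algorithm: gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (model(x, w, b) - y) * x for x, y in data) / n
        grad_b = sum(2 * (model(x, w, b) - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


w, b = train(DATA)              # training phase: parameters are chosen
prediction = model(10.0, w, b)  # inference phase: parameters are only read
print(round(w, 3), round(b, 3), round(prediction, 3))
```

Note where the phases separate: `train` is the only place the parameters change, and the final `model(10.0, w, b)` call only reads them.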
This split between training and inference is the single most important mental model for AI security. Almost every attack class lives in one or the other:
| Phase | Who controls the data | Who controls the model | Attacks rooted here |
|---|---|---|---|
| Training | model owner (usually) | model owner | data poisoning, backdoors, supply-chain compromise |
| Inference | end user (often untrusted) | model owner | prompt injection, jailbreaks, evasion, extraction, membership inference |
Internalize this table. Most of the rest of the course is a deep dive on one row or the other.
The three flavors of ML
Supervised learning. You have labeled examples: "this email is spam, this email is not." The model learns a mapping from input (the email) to label (spam / not). The vast majority of production ML — fraud detection, content moderation, credit scoring, image classification — is supervised. The attack-surface implication: the labels themselves are an attack vector. If an attacker can manipulate even a small fraction of labels in your training set, they can plant subtle misclassifications. We cover this in Module 4.
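A toy sketch of why labels are an attack vector. It assumes a hypothetical one-feature dataset (a "spamminess" score per email) and a nearest-centroid classifier standing in for a real model. The flipped fraction (two of eight labels) is exaggerated so the effect shows up on a dataset this small.

```python
def fit_centroids(examples):
    """Supervised training: average the feature value for each label."""
    sums, counts = {}, {}
    for x, label in examples:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, x):
    """Inference: pick the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


# Hypothetical training set: (spamminess score, label).
clean = [(1, "ham"), (2, "ham"), (3, "ham"), (4, "ham"),
         (6, "spam"), (7, "spam"), (8, "spam"), (9, "spam")]

# The attacker flips two labels from spam to ham; the features are untouched.
poisoned = [(x, "ham") if x in (6, 7) else (x, y) for x, y in clean]

clean_model = fit_centroids(clean)
poisoned_model = fit_centroids(poisoned)

borderline_email = 5.8
print(predict(clean_model, borderline_email))     # decision learned from clean labels
print(predict(poisoned_model, borderline_email))  # decision learned from flipped labels
```

The two flipped labels drag the "ham" centroid toward the spam region, so the borderline email changes sides. Nothing about the email itself changed; only the training labels did.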
Unsupervised learning. No labels. The model finds structure in the data on its own — clusters, anomalies, lower-dimensional representations. Used for customer segmentation, anomaly detection, recommendation systems, and the embedding step that powers retrieval-augmented LLM apps (more on embeddings in L1.3). Attack-surface implication: clusters move when input distribution shifts. An attacker who can poison a small slice of the input stream can move the cluster boundary in their favor.
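The boundary-shift claim can be sketched with a one-dimensional two-cluster k-means. The data values, the `two_means_1d` helper, and the "low cluster is normal, high cluster is anomalous" reading are all assumptions made for illustration, not a real anomaly-detection pipeline.

```python
def two_means_1d(points, iterations=20):
    """Lloyd's algorithm for k=2 on 1-D data. Returns (low, high) centroids."""
    low, high = min(points), max(points)  # deterministic initialization
    for _ in range(iterations):
        boundary = (low + high) / 2
        low_group = [p for p in points if p <= boundary]
        high_group = [p for p in points if p > boundary]
        low = sum(low_group) / len(low_group)
        high = sum(high_group) / len(high_group)
    return low, high


def boundary(points):
    """Midpoint between the two learned centroids."""
    low, high = two_means_1d(points)
    return (low + high) / 2


normal_traffic = [0.8, 0.9, 1.0, 1.1, 1.2]
anomalies = [9.0, 9.5, 10.0]
injected = [4.8, 4.9, 5.0, 5.1]  # attacker-supplied points in the input stream

clean_boundary = boundary(normal_traffic + anomalies)
poisoned_boundary = boundary(normal_traffic + anomalies + injected)

attack_value = 6.0
print(attack_value > clean_boundary)     # lands in the anomalous cluster on clean data?
print(attack_value > poisoned_boundary)  # still lands there after poisoning?
```

The injected points sit just inside the "normal" side of the clean boundary, so they are absorbed into that cluster and pull its centroid upward. The boundary follows, and a value that used to fall in the anomalous cluster now falls in the normal one.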
Reinforcement learning (RL). The model (here usually called an agent) takes actions in an environment, gets a reward signal, and learns a policy that maximizes long-term reward. Game-playing AIs are the canonical example. The reason you'll hear about RL constantly in 2026 is RLHF (Reinforcement Learning from Human Feedback), the alignment step that trains LLMs to be polite, refuse harmful requests, and follow instructions. RLHF and related preference-tuning methods (DPO, KTO, RLAIF) are how most frontier models get their "behavior." Attack-surface implication: the reward model is itself a model, and it can be gamed. We touch on this in Module 3 (jailbreaks) and Module 4 (harmful fine-tuning).
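A deliberately tiny sketch of "the reward can be gamed." It uses a two-action bandit, the simplest RL setting, not RLHF itself. The action names, the reward numbers, and the `train_bandit` helper are hypothetical; the only point is that the agent optimizes whatever reward signal it is given, flawed or not.

```python
import random

ACTIONS = ["helpful_answer", "flattering_non_answer"]

# What we actually want (assumed ground truth for this toy).
TRUE_QUALITY = {"helpful_answer": 1.0, "flattering_non_answer": 0.2}
# A flawed learned reward model that overrates flattery (assumed).
PROXY_REWARD = {"helpful_answer": 1.0, "flattering_non_answer": 1.2}


def train_bandit(reward_fn, actions, steps=200, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit. Returns the action with the highest estimated reward."""
    rng = random.Random(seed)
    estimates = {a: 0.0 for a in actions}
    counts = {a: 0 for a in actions}
    for step in range(steps):
        if step < len(actions):
            action = actions[step]                    # try every action once
        elif rng.random() < epsilon:
            action = rng.choice(actions)              # explore
        else:
            action = max(actions, key=estimates.get)  # exploit
        reward = reward_fn(action)
        counts[action] += 1
        # Incremental running mean of the rewards seen for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return max(actions, key=estimates.get)


policy_true = train_bandit(TRUE_QUALITY.get, ACTIONS)
policy_proxy = train_bandit(PROXY_REWARD.get, ACTIONS)
print(policy_true)   # policy learned from the reward we wanted
print(policy_proxy)  # policy learned from the flawed reward model
```

The training code is identical in both runs. Only the reward signal differs, and the learned policy follows it.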
Classical model families: what's behind that "AI" feature
Before LLMs, "AI" in production almost always meant one of these. They are still everywhere, and attacks against them remain a real source of production incidents. Know what they are.
- Linear / logistic regression. A weighted sum of inputs. Logistic regression squashes the sum through a sigmoid to get a probability. Used heavily in credit, insurance, and ad-tech. Interpretable, fast, surprisingly hard to beat for tabular data.
- Decision trees and gradient-boosted trees (XGBoost, LightGBM, CatBoost). A series of if/else splits learned from data. Gradient-boosted trees are still state-of-the-art on most tabular benchmarks. Used heavily in fraud, finance, retail.
- Support vector machines and k-nearest-neighbors. Older but still around. Mostly historical for our purposes.
- Neural networks. Functions composed of layers of weighted sums and nonlinearities. Deep learning is the special case where the network has many layers and is trained on a lot of data. LLMs and image models are deep neural networks. Lesson L1.2 zooms in here.
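Stripped of training, three of these families reduce to very small functions. The sketch below hand-writes one of each; all weights, thresholds, and feature meanings are made up for illustration, whereas a real model would learn them from data.

```python
import math


def logistic_regression(features, weights, bias):
    """Weighted sum of inputs, squashed through a sigmoid into (0, 1)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def decision_tree(amount, account_age_days):
    """A tree is a chain of if/else splits (thresholds here are hypothetical)."""
    if amount > 1000:
        if account_age_days < 30:
            return "fraud"
        return "legit"
    return "legit"


def tiny_neural_network(features, hidden_weights, output_weights):
    """One hidden layer: weighted sums, a ReLU nonlinearity, a sigmoid output."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, features)))
              for row in hidden_weights]
    z = sum(w * h for w, h in zip(output_weights, hidden))
    return 1.0 / (1.0 + math.exp(-z))


features = [1.5, 0.2]  # hypothetical: amount in thousands, account age in years
fraud_probability = logistic_regression(features, weights=[2.0, -3.0], bias=-1.0)
tree_decision = decision_tree(amount=1500, account_age_days=5)
network_output = tiny_neural_network(
    features,
    hidden_weights=[[1.0, -1.0], [0.5, 0.5]],
    output_weights=[1.0, -2.0],
)
print(round(fraud_probability, 3), tree_decision, round(network_output, 3))
```

A gradient-boosted model is, roughly, many small trees like `decision_tree` whose outputs are summed. The structural difference between families is what changes the attack surface.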
The reason this matters for an AI security engineer: when someone says "our AI flagged this transaction as fraud," they usually mean a gradient-boosted tree, not an LLM. The attack surface looks different. Adversarial evasion against an XGBoost fraud model (Module 6) involves tabular feature perturbation, not prompt injection. If you can't tell which model is behind the feature, you'll defend the wrong surface.
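Here is roughly what "tabular feature perturbation" means, as a sketch. `fraud_model` is a hand-written stand-in for a tree-based model with hypothetical splits, and `evade_by_amount` is a naive search that assumes the attacker controls only the `amount` field and can query the model freely. Real evasion techniques (Module 6) are more sophisticated; this only shows the shape.

```python
def fraud_model(tx):
    """Stand-in for a tree-based fraud model. The splits are hypothetical."""
    if tx["amount"] > 1000:
        if tx["account_age_days"] < 30:
            return "fraud"
        if tx["country_mismatch"]:
            return "fraud"
    return "legit"


def evade_by_amount(tx, step=10, floor=0):
    """Lower the attacker-controlled 'amount' until the model says 'legit'."""
    candidate = dict(tx)  # copy, so the original transaction is not mutated
    while candidate["amount"] > floor:
        if fraud_model(candidate) == "legit":
            return candidate
        candidate["amount"] -= step
    return candidate if fraud_model(candidate) == "legit" else None


original = {"amount": 1200, "account_age_days": 5, "country_mismatch": False}
evaded = evade_by_amount(original)
print(fraud_model(original))  # decision on the original transaction
print(evaded)                 # first perturbed transaction the model accepts
```

There is no prompt anywhere in this attack. The attacker probes a numeric feature until it crosses a learned split, which is why defenses for this model family focus on query monitoring and feature-level controls.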
Five terms you'll see in every lesson
- Model — a function with learned parameters that maps inputs to outputs.
- Features — the inputs the model sees, after preprocessing. Email-token counts, image pixels, the JSON fields of a transaction.
- Labels — the target values the model is supposed to predict during supervised training.
- Loss function — the formula that measures how wrong the model is on a given example. Training drives loss down.
- Generalization — the model's ability to perform well on data it hasn't seen during training. A model that's good on training data but bad on new data is overfit; that gap is itself an attack signal (the foundation of membership-inference attacks in Module 5).
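The last two terms are easiest to see in numbers. The sketch below uses a deliberately overfit toy model (`MemorizingModel`, a lookup table with a mean fallback) and squared error as the loss; the data, the class, and the `0.5` threshold are assumptions for illustration. The final helper hints at why the train/test gap matters to attackers, without being a real membership-inference attack.

```python
def squared_error(prediction, target):
    """Loss function: how wrong the model is on one example."""
    return (prediction - target) ** 2


class MemorizingModel:
    """Deliberately overfit: stores every training pair, falls back to the mean."""

    def fit(self, features, labels):
        self.table = dict(zip(features, labels))
        self.fallback = sum(labels) / len(labels)
        return self

    def predict(self, x):
        return self.table.get(x, self.fallback)


def mean_loss(model, features, labels):
    """Average loss over a dataset."""
    losses = [squared_error(model.predict(x), y) for x, y in zip(features, labels)]
    return sum(losses) / len(losses)


train_x, train_y = [1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]  # features, labels
test_x, test_y = [5, 6], [10.1, 11.9]                  # never seen in training

model = MemorizingModel().fit(train_x, train_y)
train_loss = mean_loss(model, train_x, train_y)
test_loss = mean_loss(model, test_x, test_y)
gap = test_loss - train_loss  # generalization gap


def looks_like_training_member(model, x, y, threshold=0.5):
    """Crude signal: unusually low loss suggests the record was trained on."""
    return squared_error(model.predict(x), y) < threshold


print(train_loss, round(test_loss, 2), round(gap, 2))
print(looks_like_training_member(model, 2, 3.9))   # a training record
print(looks_like_training_member(model, 5, 10.1))  # a held-out record
```

The larger the gap, the easier it is to tell training records from everything else just by watching the model's error.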
Train, validate, test, and why it matters for security
In any real ML project, the dataset gets split into three:
- Training set: what the model learns on.
- Validation set: used during training to tune hyperparameters (e.g., learning rate, depth, dropout) without leaking signal from the held-out test set.
- Test set: held out until the very end to give an honest measure of generalization.
A common production bug: the test set leaks into training (the same row appears in both). The model looks great in eval and embarrassing in prod. From a security standpoint: an attacker who knows the test set composition can craft inputs that resemble held-out examples — a primitive form of membership inference. Also, if a vendor's "92% accuracy" comes from a test set you can guess (public benchmark, leaked GitHub artifact), you should not trust the number.
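A minimal sketch of a three-way split plus the leakage check you would want in a review. The 70/15/15 proportions, the function names, and the toy rows are assumptions. `leaked_rows` only catches exact duplicates; near-duplicates need fuzzier matching.

```python
import random


def three_way_split(rows, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle once, then cut into train / validation / test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def leaked_rows(train, test):
    """Rows that appear in both sets (exact-match leakage only)."""
    return sorted(set(train) & set(test))


# Hypothetical dataset: (record id, label) tuples.
rows = [(i, i % 2) for i in range(20)]
train, val, test = three_way_split(rows)
print(len(train), len(val), len(test))
print(leaked_rows(train, test))  # a clean split shares no rows

# Simulate the production bug: one test row is also appended to training.
leaky_train = train + [test[0]]
print(leaked_rows(leaky_train, test))
```

When you assess a vendor's accuracy claim, this is the check to ask about: how was the test set separated, and has anyone verified it stayed separate?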
Real-world example
The Microsoft Tay disaster (2016). Microsoft released a Twitter chatbot designed to learn from user interactions. Within 24 hours, coordinated users had taught it to produce racist content. It is the most-cited early example of training-time data poisoning hitting a live system at scale, and a vivid demonstration that letting a model learn from user inputs without provenance controls is a disastrous design decision. The 2026 equivalent is any RAG system that retrieves from user-editable content. Source: Microsoft post-incident statement (Peter Lee, "Learning from Tay's introduction," listed in References); contemporaneous coverage in TechCrunch.
Key terms
- Model — parameterized function whose parameters are learned from data.
- Training vs inference — the two phases; each has its own attack surface.
- Supervised vs unsupervised vs reinforcement learning — the three flavors of ML by how the learning signal is supplied.
- Overfitting — model memorizes training data; underlying mechanism of membership inference attacks.
- RLHF — Reinforcement Learning from Human Feedback; the alignment step that gives modern LLMs their "behavior."
References
- Goodfellow, Bengio, Courville — Deep Learning (MIT Press, 2016) — Chapter 5 "Machine Learning Basics." Free online: https://www.deeplearningbook.org/
- Christoph Molnar — Interpretable Machine Learning (online book) for an accessible overview of model families: https://christophm.github.io/interpretable-ml-book/
- Microsoft Tay incident — Microsoft Blog, "Learning from Tay's introduction," March 25, 2016
- Stuart Russell, Peter Norvig — Artificial Intelligence: A Modern Approach, 4th ed., Pearson — for RL framing
Quiz items
- Q: Which ML phase do the following attacks target — (a) prompt injection, (b) data poisoning? A: Inference, training. Why: Prompt injection happens at query time against a deployed model; data poisoning manipulates the training set before the model is built.
- Q: Why is the train/test split a security-relevant detail, not just an ML hygiene detail? A: A leaked or guessable test set enables membership-inference primitives and inflates trust in the model's reported accuracy. Why: Security findings sometimes hinge on whether test data was actually independent of training.
- Q: A fintech says "our AI rejected the loan." Which model family is it most likely to be in 2026? A: A gradient-boosted decision tree (XGBoost / LightGBM / CatBoost), not an LLM. Why: Tabular financial data is still the home of GBDTs; defending it means thinking about feature perturbation, not prompt injection.
- Q: Define overfitting in one sentence and name one attack class that exploits it. A: Overfitting is when a model performs well on training data but poorly on unseen data; membership-inference attacks exploit this gap to determine whether a specific record was in the training set. Why: Module 5 builds on this directly.
Video script
[SLIDE 1 — Title]
Welcome back. This is the lesson where we go from zero ML literacy to "you can read an ML system diagram and ask the right security questions." Thirty minutes. If you already work with ML day-to-day, skim and confirm I'm framing this for our security context correctly.
[SLIDE 2 — What is ML, really]
Forget the hype. A machine-learning model is a function. Specifically, it's a function whose internal coefficients — the weights, the parameters — were chosen automatically by an algorithm to minimize the error between the function's outputs and some target values on a dataset. Three pieces: data, model, training algorithm. The output is a trained model, which you can now use to make predictions on new input. Using the trained model — that's inference.
[SLIDE 3 — The single most important table in this course]
Here's the table you need to memorize. Training and inference are two phases. They have two different attack surfaces. Training-phase attacks: data poisoning, backdoors, supply-chain compromise. Inference-phase attacks: prompt injection, jailbreaks, evasion, extraction, membership inference. When someone describes an AI vulnerability to you, the first question to ask yourself is: which phase. That sets the entire defensive playbook.
[SLIDE 4 — Three flavors of ML]
Three flavors. Supervised — you have labeled examples and the model learns the mapping. This is most production ML: fraud detection, content moderation, credit scoring. Unsupervised — no labels, the model finds structure on its own. This is what makes embedding-based retrieval work, which is why it matters for RAG systems we'll attack later. Reinforcement learning — an agent acts in an environment, gets reward, learns a policy. The reason you'll hear about RL constantly in 2026 is RLHF — reinforcement learning from human feedback — the alignment step that gives modern LLMs their behavior.
[SLIDE 5 — Classical model families]
Before LLMs, "AI" almost always meant one of these. They are still everywhere. Linear and logistic regression. Decision trees and gradient-boosted trees — XGBoost, LightGBM, CatBoost — still state-of-the-art on tabular data, used heavily in fraud and finance. Support vector machines and k-nearest-neighbors — older. Neural networks — composed of layers of weighted sums and nonlinearities, deep learning is the special case with many layers. LLMs and image models are deep neural networks. We'll zoom in next lesson.
[SLIDE 6 — Why the family matters for you]
When someone says "our AI flagged this transaction as fraud," in 2026 that usually means a gradient-boosted tree, not an LLM. The attack surface is completely different. Adversarial evasion against an XGBoost fraud model — tabular feature perturbation, not prompt injection. If you can't tell which model family is behind the AI feature, you'll defend the wrong surface and miss the actual issue.
[SLIDE 7 — Five terms to remember]
Five terms you will see in every lesson from here on out. Model: parameterized function. Features: the inputs the model sees after preprocessing. Labels: the targets during supervised training. Loss function: the formula that measures how wrong the model is on a given example. Generalization: how well the model performs on data it hasn't seen. The gap between training and test performance is overfitting, and overfitting is the underlying mechanism of membership inference attacks — we'll get to that in Module 5.
[SLIDE 8 — Train, validate, test]
Real ML projects split data three ways: train, validate, test. Common production bug: the test set leaks into training, the model looks great in eval and embarrassing in prod. From a security standpoint, an attacker who knows the test set composition can craft inputs that resemble held-out examples — primitive membership inference. And if a vendor's accuracy number comes from a test set you can guess — a public benchmark, a leaked artifact — don't trust the number.
[SLIDE 9 — Up next]
Next lesson: neural networks and deep learning. We zoom in on the family that powers everything modern, including LLMs. See you there.
Slide outline
- Title — "Machine learning in 30 minutes". Subtitle: "Just enough to defend it."
- What is ML, really — three-box diagram: Data → Training algorithm → Trained model → (arrow) → Inference. Caption: "ML is just a function whose coefficients were chosen automatically."
- The single most important table — full-screen table of training vs inference attack classes (the one in the lesson body). Bold the column headers.
- Three flavors of ML — three columns: Supervised | Unsupervised | RL. One concrete example each. One attack-surface implication each.
- Classical model families: four cards with logo / icon (Linear/Logistic, GBDT, SVM/KNN, Neural Networks). Highlight "GBDT is still state-of-the-art for tabular."
- Family-matters callout — image of a "Loan rejected by AI" notification; caption: "When you hear 'AI', ask which family."
- Five terms — list of the five terms, definitions in smaller font.
- Train/validate/test — Venn-style diagram showing leak between train and test, with a red "X" through it.
- Up next — "L1.2 — Neural networks and deep learning, ~30 min."
Production notes
- Recording: ~28-32 min raw, target 27-30 min final.
- Slide 3 (the attack-surface table) is the visual anchor of the whole module — make it as clean and memorable as possible. Worth a dedicated B-roll or animated reveal if your editing pipeline supports it.
- Slide 5: logos for XGBoost/LightGBM/CatBoost are likely usable under their respective trademark guidelines, but confirm the terms before the record date.
- Tone: this is the lesson where we set the foundation. Pace is slower than L0.2; resist the urge to compress.