L9.2 — The capstone deliverable: four artifacts + rubric

Type: Reading + planning · Duration: ~30 min · Status: Mandatory · Module: Module 9 — Capstone Project

Overview

You will produce a single folder, `runs/lab9/helios-capstone/`, containing four artifacts. This page specifies what each artifact must contain and how it is graded.

Read this before starting the work. Refer back to it while you work. Self-grade against it after.

Deliverable 1: `threat-model.md`

Required content

- System overview (≤ 200 words): purpose, components, primary stakeholders. Cite the scenario brief (L9.1) rather than restating it.
- Data-flow diagram (DFD) with trust boundaries, in Mermaid or an equivalent notation. It must show the nurse, Triage UI, Helios API, RAG corpus, vector DB, FHIR API, LLM (Anthropic), agent tools, and the downstream EHR, plus the hospital-admin and Helios-curator actors, because the boundary list below references them.
- Trust-boundary annotations. At minimum, mark the boundaries nurse↔UI, UI↔API, API↔LLM, API↔FHIR, API↔EHR, hospital-admin↔RAG-corpus, and Helios-curator↔RAG-corpus.
- STRIDE-MA threat table with at least 12 rows (more is better; aim for 15 to 20). Columns: Threat ID, DFD element, Category (S/T/R/I/D/E/M/A), Description, Impact (Low/Med/High), Likelihood (Low/Med/High), Status (Open/Mitigated). At least 3 rows must be M (Model manipulation) and at least 2 must be A (Agency abuse).
- ATLAS technique mapping for the top 5 threats, ranked by the Impact and Likelihood columns of your table.
- OWASP LLM Top 10 coverage matrix assessing all 10 entries: Is it applicable? Is it currently controlled? Is there a gap?
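If you keep the STRIDE-MA table as structured data (for example, exported from a spreadsheet), you can check its hard minimums mechanically before you self-grade. The Python below is a minimal sketch under that assumption. The `ThreatRow` class and its field names are hypothetical stand-ins for the required columns. The checks cover only the countable rules (row count, allowed values, M and A minimums), not plausibility.

```python
from dataclasses import dataclass

# Allowed values, copied from the Deliverable 1 column specification.
CATEGORIES = set("STRIDEMA")  # the 8 STRIDE-MA letters
LEVELS = {"Low", "Med", "High"}
STATUSES = {"Open", "Mitigated"}


@dataclass
class ThreatRow:
    # Hypothetical field names mirroring the required table columns.
    threat_id: str
    dfd_element: str
    category: str
    description: str
    impact: str
    likelihood: str
    status: str


def check_stride_ma(rows: list[ThreatRow]) -> list[str]:
    """Return a list of problems; an empty list means the minimums are met."""
    problems = []
    if len(rows) < 12:
        problems.append(f"only {len(rows)} rows; at least 12 required")
    for row in rows:
        if row.category not in CATEGORIES:
            problems.append(f"{row.threat_id}: unknown category {row.category!r}")
        if row.impact not in LEVELS or row.likelihood not in LEVELS:
            problems.append(f"{row.threat_id}: impact and likelihood must be Low/Med/High")
        if row.status not in STATUSES:
            problems.append(f"{row.threat_id}: status must be Open or Mitigated")
    # Count rows per category to enforce the M and A minimums.
    counts = {c: sum(r.category == c for r in rows) for c in CATEGORIES}
    if counts["M"] < 3:
        problems.append(f"{counts['M']} Model-manipulation (M) rows; at least 3 required")
    if counts["A"] < 2:
        problems.append(f"{counts['A']} Agency-abuse (A) rows; at least 2 required")
    return problems
```

Run it as `for p in check_stride_ma(rows): print(p)`. An empty result only means the floor is met. The rubric still rewards realistic, Triage-Copilot-specific threats.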

Rubric

| Criterion | Excellent | Acceptable | Below |
|---|---|---|---|
| DFD completeness | All components + boundaries shown | Most components + ≥3 boundaries | Components missing or no boundaries |
| STRIDE-MA coverage | ≥15 rows, all 8 letters represented where applicable | ≥12 rows, ≥6 letters | <12 rows or letters obviously missing |
| ATLAS mapping accuracy | Correct technique IDs for top 5, with sub-techniques where appropriate | Correct top-level technique IDs | Mappings inaccurate or absent |
| OWASP coverage | All 10 with rationale + control assessment | All 10 with at-least-binary assessment | Multiple OWASP entries unassessed |
| Plausibility | Threats are realistic and specific to the Triage Copilot, not generic | Mostly specific, some generic | Mostly generic, copy-paste feel |

Pass: Acceptable in all 5. Excellent: Excellent in at least 3.
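The pass rule above is simple arithmetic over the five criteria. Here is a minimal Python sketch. It assumes, although the page does not state this explicitly, that an Excellent grade also requires meeting the Pass bar. `Score` and `grade_threat_model` are illustrative names, not course tooling.

```python
from enum import IntEnum


class Score(IntEnum):
    # Ordered so comparisons such as `score >= Score.ACCEPTABLE` work.
    BELOW = 0
    ACCEPTABLE = 1
    EXCELLENT = 2


CRITERIA = (
    "DFD completeness",
    "STRIDE-MA coverage",
    "ATLAS mapping accuracy",
    "OWASP coverage",
    "Plausibility",
)


def grade_threat_model(scores: dict[str, Score]) -> str:
    """Pass needs Acceptable or better on all 5 criteria; Excellent is
    assumed to also require Pass, plus Excellent on at least 3 criteria."""
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"no score for: {', '.join(missing)}")
    if any(scores[c] < Score.ACCEPTABLE for c in CRITERIA):
        return "Below"
    excellent = sum(scores[c] == Score.EXCELLENT for c in CRITERIA)
    return "Excellent" if excellent >= 3 else "Pass"
```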

Deliverable 2: `red-team-report.md`

Required content

A red-team report with at least 5 distinct findings against Triage Copilot. Each finding contains:

- Title (descriptive, severity-coded, e.g., "Critical: Indirect Prompt Injection via Hospital Protocol Corpus").
- Severity (Critical / High / Medium / Low / Informational) with a one-line justification (impact × likelihood × exploitability).
- Multi-framework citations in a single block: ATLAS technique IDs, OWASP LLM IDs, NIST AI RMF subcategory, EU AI Act article(s), and HIPAA Security Rule safeguard(s) where applicable.
- Reproduction steps (the attacker's playbook: what to do, in what order, and what success looks like).
- Impact (one paragraph: what this enables, who is affected, worst-case outcome).
- Recommendation (technical control(s) + governance control(s); reference Module 7 / Module 8 patterns where applicable).
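The severity line asks for impact × likelihood × exploitability. To keep severities consistent across findings, you can score each factor on an ordinal scale and band the product. The sketch below is purely illustrative. The 1 to 3 scale and the band thresholds are assumptions, not values this page or any framework prescribes.

```python
# Hypothetical ordinal scale for each factor; the page does not prescribe one.
LEVEL = {"Low": 1, "Med": 2, "High": 3}

# Hypothetical thresholds on the product (possible range 1 to 27).
BANDS = ((18, "Critical"), (12, "High"), (6, "Medium"), (2, "Low"))


def severity(impact: str, likelihood: str, exploitability: str) -> tuple[str, str]:
    """Return (severity, one-line justification) for a finding."""
    score = LEVEL[impact] * LEVEL[likelihood] * LEVEL[exploitability]
    # First band whose threshold the score reaches; anything lower is Informational.
    band = next((label for threshold, label in BANDS if score >= threshold), "Informational")
    why = (f"impact {impact} x likelihood {likelihood} x "
           f"exploitability {exploitability} = {score}")
    return band, why
```

Treat the bands as a starting point for consistency, not a verdict. If context justifies moving a finding up or down a band, do it by hand and say why in the justification line.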

Finding distribution: at least 2 Critical or High, at least 1 from the data-/supply-chain class (M4), at least 1 from the agent-escape class (M3 L3.4), at least 1 from the extraction/privacy class (M5), at least 1 from the governance gap class (M8).
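The distribution requirement is another countable rule. This sketch assumes each finding carries exactly one class label. The tag strings in `REQUIRED_CLASSES` and the dict shape are hypothetical.

```python
# Hypothetical tags for the four required finding classes named above.
REQUIRED_CLASSES = {
    "supply-chain": "data-/supply-chain (M4)",
    "agent-escape": "agent-escape (M3 L3.4)",
    "extraction-privacy": "extraction/privacy (M5)",
    "governance-gap": "governance gap (M8)",
}


def check_distribution(findings: list[dict]) -> list[str]:
    """Each finding is a dict such as
    {"title": "...", "severity": "High", "cls": "supply-chain"}
    (a hypothetical shape with one class tag per finding)."""
    problems = []
    if len(findings) < 5:
        problems.append(f"{len(findings)} findings; at least 5 required")
    severe = sum(f["severity"] in {"Critical", "High"} for f in findings)
    if severe < 2:
        problems.append(f"{severe} Critical/High findings; at least 2 required")
    covered = {f["cls"] for f in findings}
    for tag, label in REQUIRED_CLASSES.items():
        if tag not in covered:
            problems.append(f"no finding from the {label} class")
    return problems
```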

Rubric

| Criterion | Excellent | Acceptable | Below |
|---|---|---|---|
| Number of findings | ≥7 distinct findings | ≥5 distinct findings | <5 |
| Severity calibration | Each severity has rationale; distribution is realistic | Most severities justified | Severities feel arbitrary |
| Multi-framework tags | All findings cite ATLAS + OWASP + NIST + EU AI Act + (HIPAA where relevant) | Most findings cite ≥3 frameworks | Single-framework citations only |
| Reproduction quality | A peer could reproduce from your steps | Steps clear but some gaps | Steps too vague to reproduce |
| Recommendation quality | Specific tech + governance controls; references M7/M8 patterns; addresses root cause, not symptom | Controls named but generic | Controls vague or symptom-only |

Pass: Acceptable in all 5.

Deliverable 3: `remediation-plan.md`

Required content

- Prioritization summary (table): each finding from the red-team report mapped to a remediation action, with owner, target date, and status.
- Three time horizons:
    - Before launch (block-launch): what must happen before July 1, 2026. Keep the list realistic given the 5-week timeline.
    - At launch (launch-with-mitigations): what ships with documented residual risk and accepted mitigations.
    - Post-launch (next 90 days): what gets remediated after launch, on a documented schedule.
- Technical controls: guardrails, logging, structured output, dual-LLM, and so on. Reference Module 7 patterns by lesson.
- Governance controls: risk-register entries, Article 11 documentation gaps to close, IR playbook updates, AI-BOM creation. Reference Module 8 patterns by lesson.
- Residual-risk acceptance: state explicitly which residual risks Helios accepts post-launch and the rationale for each.
- Resource estimate: rough effort (engineering days and dollars) for each remediation action.
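You can sanity-check the prioritization table against the three horizons. This sketch assumes each action is a dict with hypothetical field names. It flags findings with no action, actions missing an owner, target date, or status, block-launch actions dated on or after July 1, 2026, and post-launch actions outside the 90-day window. It cannot judge whether the split is defensible; that stays a human call.

```python
from datetime import date

LAUNCH = date(2026, 7, 1)  # launch date from the scenario
HORIZONS = {"block-launch", "launch-with-mitigations", "post-launch"}


def check_remediation(finding_ids: set[str], actions: list[dict]) -> list[str]:
    """Each action is a dict such as
    {"finding": "F1", "horizon": "block-launch", "owner": "Security",
     "target": date(2026, 6, 20), "status": "In progress"}  (hypothetical shape)."""
    problems = []
    # Every red-team finding needs at least one remediation row.
    for fid in sorted(finding_ids - {a["finding"] for a in actions}):
        problems.append(f"{fid}: no remediation action")
    for a in actions:
        fid = a["finding"]
        if a.get("horizon") not in HORIZONS:
            problems.append(f"{fid}: unknown horizon {a.get('horizon')!r}")
            continue
        missing = [k for k in ("owner", "target", "status") if not a.get(k)]
        if missing:
            problems.append(f"{fid}: missing {', '.join(missing)}")
            continue
        if a["horizon"] == "block-launch" and a["target"] >= LAUNCH:
            problems.append(f"{fid}: block-launch action due on or after launch day")
        # Post-launch work must land within the first 90 days after launch.
        if a["horizon"] == "post-launch" and not 0 < (a["target"] - LAUNCH).days <= 90:
            problems.append(f"{fid}: post-launch action outside the 90-day window")
    return problems
```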

Rubric

| Criterion | Excellent | Acceptable | Below |
|---|---|---|---|
| Prioritization logic | Defensible: the block/launch/post-launch split makes sense given the July 1 constraint | Defensible split | Arbitrary or unrealistic split |
| Technical specificity | Specific controls named, tied to course patterns | Controls named generically | Controls vague |
| Governance specificity | Risk register / Article 11 / IR / AI-BOM addressed concretely | Most governance items addressed | Governance under-addressed |
| Residual risk handling | Explicit list with rationale | Mentioned but light | Not addressed |
| Resource realism | Effort estimates are plausible | Estimates present, some unrealistic | Estimates missing |

Pass: Acceptable in all 5.

Deliverable 4: `pre-launch-checklist.md`

Required content

A checklist Helios runs before the July 1 launch and before any future major release. Each line item includes:

- Description of the check.
- Framework citation (NIST AI RMF subcategory, EU AI Act article, HIPAA rule).
- Classification: Launch-blocking (must pass to launch), Launch-with-mitigations (can ship if documented), or Informational (tracked but does not gate launch).
- Status for Triage Copilot specifically (Pass / Fail / Partial).
- Owner (Engineering / Security / Legal-Compliance / Clinical).

Required coverage:

- NIST AI RMF: at least one item per function (Govern, Map, Measure, Manage).
- EU AI Act: at least one item per applicable article (9, 11, 12, 14, 15).
- HIPAA: at least one item on PHI handling.
- OWASP LLM Top 10: at least one item touching each high-applicability entry.

Aim for at least 25 items in total across these categories.
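You can check the coverage rules above the same way, along with the 25-item floor and the allowed classification, status, and owner values. This sketch rests on several assumptions. The item dict shape is hypothetical. NIST AI RMF functions are read from the prefix of subcategory IDs such as `MEASURE 2.7`. `owasp_high` is the set of OWASP entries you rated high-applicability in Deliverable 1. Passing this check says nothing about whether the items are testable.

```python
NIST_FUNCTIONS = {"Govern", "Map", "Measure", "Manage"}
EU_ARTICLES = {9, 11, 12, 14, 15}
CLASSIFICATIONS = {"Launch-blocking", "Launch-with-mitigations", "Informational"}
STATUSES = {"Pass", "Fail", "Partial"}
OWNERS = {"Engineering", "Security", "Legal-Compliance", "Clinical"}


def check_checklist(items: list[dict], owasp_high: frozenset[str] = frozenset()) -> list[str]:
    """Each item is a dict such as
    {"check": "...", "nist": "MEASURE 2.7", "eu_articles": {15}, "hipaa": False,
     "owasp": {"LLM01"}, "classification": "Launch-blocking",
     "status": "Partial", "owner": "Security"}  (hypothetical shape).
    owasp_high: OWASP IDs you rated high-applicability in Deliverable 1."""
    problems = []
    if len(items) < 25:
        problems.append(f"{len(items)} items; aim for at least 25")
    # NIST AI RMF subcategory IDs start with the function name, e.g. "MEASURE 2.7".
    functions = {i["nist"].split()[0].title() for i in items if i.get("nist")}
    for fn in sorted(NIST_FUNCTIONS - functions):
        problems.append(f"no item for NIST AI RMF function {fn}")
    articles = set().union(*(i.get("eu_articles", set()) for i in items))
    for art in sorted(EU_ARTICLES - articles):
        problems.append(f"no item for EU AI Act Article {art}")
    if not any(i.get("hipaa") for i in items):
        problems.append("no HIPAA PHI-handling item")
    owasp = set().union(*(i.get("owasp", set()) for i in items))
    for oid in sorted(set(owasp_high) - owasp):
        problems.append(f"no item touching high-applicability OWASP entry {oid}")
    # Every item needs a valid classification, status, and owner.
    for n, i in enumerate(items, start=1):
        for field, allowed in (("classification", CLASSIFICATIONS),
                               ("status", STATUSES), ("owner", OWNERS)):
            if i.get(field) not in allowed:
                problems.append(f"item {n}: invalid or missing {field}")
    return problems
```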

Rubric

| Criterion | Excellent | Acceptable | Below |
|---|---|---|---|
| Coverage | All required frameworks + OWASP touched | All required frameworks touched | Framework gaps |
| Specificity | Items are testable and unambiguous | Items mostly testable | Items vague or non-testable |
| Classification | Block/mitigations/info split is defensible | Classification mostly defensible | Classification arbitrary |
| Status for Triage Copilot | All items have a status reflecting the actual scenario | Most items have a status | Status missing or unjustified |
| Practical usability | Helios could literally run this checklist | Mostly usable, some gaps | Not usable as written |

Pass: Acceptable in all 5.

Overall rubric for the capstone

You pass the capstone if all four deliverables meet their Pass bar (Acceptable or better on every criterion). Excellent in 2+ deliverables earns "Capstone with distinction" on your certificate, if your LMS supports it.
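The capstone-level rule combines the per-deliverable outcomes. Only Deliverable 1 states an explicit Excellent threshold, so this sketch assumes you have already decided, for each deliverable, whether it reached Excellent. The function and label strings are illustrative.

```python
DELIVERABLES = (
    "threat-model.md",
    "red-team-report.md",
    "remediation-plan.md",
    "pre-launch-checklist.md",
)


def capstone_result(outcomes: dict[str, str]) -> str:
    """outcomes maps each deliverable file to "Below", "Pass" or "Excellent",
    i.e. the result of applying that deliverable's own rubric."""
    missing = [d for d in DELIVERABLES if d not in outcomes]
    if missing:
        raise ValueError(f"no outcome for: {', '.join(missing)}")
    if any(outcomes[d] == "Below" for d in DELIVERABLES):
        return "Not yet passing: revise the deliverables scored Below"
    excellent = sum(outcomes[d] == "Excellent" for d in DELIVERABLES)
    return "Capstone with distinction" if excellent >= 2 else "Pass"
```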

Most failures concentrate in two places:

- Threat model: generic, not specific to Triage Copilot.
- Pre-launch checklist: items not concretely testable, or no per-item status.

The reference solution (L9.3) shows what good looks like; consult after your first pass, not before.

Self-grading after submission

After producing your four artifacts:

1. Read each rubric criterion above.
2. Score yourself Excellent / Acceptable / Below on each.
3. If you scored Below on anything, revise that artifact.
4. If you scored Acceptable or better across all 4 deliverables, you pass. Consult the L9.3 reference solution for ideas to push toward Excellent.

Time-budget reminders

- Don't get stuck on perfect. A good threat model with 15 specific threats beats a great threat model with 8.
- Reference, don't restate. Your red-team report should reference scenario-brief details, not re-explain Triage Copilot.
- Use the templates from earlier modules: the L2.6 threat-model template, the L3 finding template, and the L8.7 governance artifacts. The course has built every template you need.

When you're done, go to L9.3 for the reference solution to compare. Don't peek before; the comparison is more valuable after your own attempt.
