yakilee Claude Opus 4.6 committed on
Commit 1abff4e · 0 Parent(s)

chore: initialize project skeleton with pyproject.toml


- Add pyproject.toml with core deps (pydantic, httpx, streamlit, pytest)
- Empty package structure: trialpath/ (models, services, agent) and app/ (pages, components, services)
- Configure ruff and pytest

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

CLAUDE.md ADDED
@@ -0,0 +1,82 @@
+ # CLAUDE.md
+
+ This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+ ## Project Overview
+
+ TrialPath is an AI-powered clinical trial matching system for NSCLC (Non-Small Cell Lung Cancer) patients. It is currently in **pre-implementation design phase**: only design documents exist, no source code yet.
+
+ **Core idea:** Help patients understand which clinical trials they may qualify for, and transform "rejection" into "actionable next steps" via gap analysis.
+
+ ## Design Documents
+
+ - `Trialpath PRD.md` — Product requirements, success metrics, HAI-DEF submission plan
+ - `TrialPath AI Synergy in Digital Health Trials.md` — Technical architecture, data contracts, Parlant workflow design
+
+ ## Architecture (5 Components)
+
+ 1. **UI & Orchestrator** — Streamlit/FastAPI app embedding the Parlant engine
+ 2. **Parlant Agent + Journey** — Single agent (`patient_trial_copilot`) with 5 states: `INGEST` → `PRESCREEN` → `VALIDATE_TRIALS` → `GAP_FOLLOWUP` → `SUMMARY`
+ 3. **MedGemma 4B** (HF endpoint) — Multimodal extraction from PDFs/images → `PatientProfile` + evidence spans
+ 4. **Gemini 3 Pro** — LLM planner: generates `SearchAnchors` from the profile, reranks trials, orchestrates criterion evaluation
+ 5. **ClinicalTrials MCP Server** (existing, not custom) — Wraps the ClinicalTrials.gov REST API v2
+
+ ## Key Design Decisions
+
+ - **No vector DB / RAG** — Uses agentic search via the ClinicalTrials.gov API with iterative query refinement
+ - **Reuse existing MCP** — Don't build custom trial search; use off-the-shelf ClinicalTrials MCP servers
+ - **Two-stage clinical screening** — Mirrors the real world: prescreen (minimal dataset) → validation (full criterion-by-criterion review)
+ - **Evidence-linked** — Every decision must cite a source doc/page/span
+ - **Gap analysis as core differentiator** — "You'd qualify IF you had X" rather than just "No match"
+
+ ## Data Contracts (JSON Schemas)
+
+ Four core contracts are defined in the tech design doc (section 4):
+ - **PatientProfile v1** — MedGemma output with demographics, diagnosis, biomarkers, labs, treatments, unknowns
+ - **SearchAnchors v1** — Gemini-generated query params for MCP search
+ - **TrialCandidate v1** — Normalized MCP search results
+ - **EligibilityLedger v1** — Per-trial criterion-level assessment with evidence pointers and gaps
+
+ ## Planned Code Structure
+
+ From the PRD deliverables section:
+ ```
+ data/generate_synthetic_patients.py
+ data/generate_noisy_pdfs.py
+ matching/medgemma_extractor.py
+ matching/agentic_search.py   # Parlant + Gemini + MCP
+ evaluation/run_trec_benchmark.py
+ ```
+
+ ## Planned Tech Stack
+
+ - Python (Streamlit or FastAPI)
+ - Google Gemini 3 Pro (orchestration)
+ - MedGemma 4B via Hugging Face endpoint (multimodal extraction)
+ - Parlant (agentic workflow engine)
+ - Synthea FHIR (synthetic patient generation)
+ - TREC Clinical Trials Track 2021/2022 (benchmarking)
+
+ ## Success Targets
+
+ - MedGemma extraction F1 >= 0.85
+ - Trial retrieval Recall@50 >= 0.75
+ - Trial ranking NDCG@10 >= 0.60
+ - Criterion decision accuracy >= 0.85
+ - Latency < 15s; cost < $0.50/session
+
+ ## Scope
+
+ - Disease: NSCLC only
+ - Data: Synthetic patients only (no real PHI)
+ - Timeline: 3-month PoC
+
+ ## Dev Tools
+
+ - Use the Hugging Face CLI for model deployment
+ - Use uv, ruff, and Astral ty
+ - Use ripgrep
+
+ ## Commit Atomically
app/__init__.py ADDED
File without changes
app/tests/__init__.py ADDED
File without changes
docs/TrialPath AI technical design.md ADDED
@@ -0,0 +1,487 @@
+ Below is a compact but deepened tech design doc that applies your three constraints:
+
+ 1. Reuse existing ClinicalTrials MCPs.
+ 2. Make Parlant workflows map tightly onto real clinical screening.
+ 3. Lay out a general patient plan (using synthetic data) that feels like a real-world journey.
+
+ No code; just user flow, data contracts, and architecture.
+
+ ---
+
+ ## **1. Scope & Positioning**
+
+ **PoC Goal (2-week sprint, YAGNI):**
+ A working, demoable *patient-centric* trial-matching copilot that:
+
+ * Takes **synthetic NSCLC patients** (documents + minimal metadata).
+ * Uses **MedGemma 4B multimodal** to understand those artifacts.
+ * Uses **Gemini 3 Pro + Parlant** to orchestrate **patient-to-trials matching** via an **off-the-shelf ClinicalTrials MCP server**.
+ * Produces an **eligibility ledger + gap analysis** aligned with real clinical screening workflows (prescreen → validation), not "toy" UX.
+
+ We explicitly **don't** build our own trial MCP, our own search stack, or multi-service infra. Everything runs in a thin orchestrator + UI process.
+
+ ---
+
+ ## **2. Real-World Screening Workflow Mapping**
+
+ Evidence from clinical practice and trial-matching research converges on a two-stage flow:[appliedclinicaltrialsonline+4](https://www.appliedclinicaltrialsonline.com/view/clinical-trial-matching-solutions-understanding-the-landscape)
+
+ 1. **Prescreening**
+    * Quick eligibility judgment on a *minimal dataset*: diagnosis, stage, functional status (ECOG), basic labs, key comorbidities.
+    * Usually: oncologist + coordinator + minimal EHR context.
+    * Goal: "Is this patient worth deeper chart review for any trials here?"
+ 2. **Validation (Full Match / Chart Review)**
+    * Detailed comparison of the **full record** against the **full inclusion/exclusion criteria**, often 40–60 criteria per trial.
+    * Typically done by a coordinator/CRA with investigator sign-off.
+    * Goal: for a *specific trial*, decide: *eligible / excluded / unclear → needs further tests*.
+
+ Our PoC should simulate this **two-stage workflow**:
+
+ * **Stage 1 = "Patient-First Prescreen"** → shortlist trials via MCP + Gemini using the MedGemma-extracted "minimal dataset".
+ * **Stage 2 = "Trial-Specific Validation"** → trial-by-trial, criterion-by-criterion ledger using MedGemma evidence.
+
+ Parlant Journeys become the *explicit codification* of these two stages + transitions.
+
+ ---
+
+ ## **3. High-Level Architecture (YAGNI, Reusing MCP)**
+
+ ## **3.1 Components**
+
+ **1) UI & Orchestrator (single process)**
+
+ * Streamlit/FastAPI-style app (the exact stack is secondary) that:
+   * Hosts the chat/stepper UI.
+   * Embeds **Parlant** and maintains session state.
+   * Calls external tools (Gemini API, MedGemma HF endpoint, ClinicalTrials MCP).
+
+ **2) Parlant Agent + Journey**
+
+ * Single Parlant agent, e.g. `patient_trial_copilot`.
+ * One **Journey** with explicit stages mirroring the real-world workflow:
+   * `INGEST` → `PRESCREEN` → `VALIDATE_TRIALS` → `GAP_FOLLOWUP` → `SUMMARY`.
+ * Parlant rules enforce:
+   * When to call which tool.
+   * When to move from prescreen to validation.
+   * When to ask the patient (synthetic persona) for more documents.
+
+ **3) MedGemma 4B Multimodal Service (HF endpoint)**
+
+ * Input: PDF(s) + optional images.
+ * Output: structured **PatientProfile** + **evidence spans** (doc/page/region references).
+ * Used twice:
+   * Once for **prescreen dataset** extraction.
+   * Once for **criterion-level validation** (patient vs trial snippets).
+
+ **4) Gemini 3 Pro (LLM Planner & Re-ranker)**
+
+ * Uses Google AI / Vertex Gemini 3 Pro for:
+   * Generating query parameters for the ClinicalTrials MCP from the PatientProfile.
+   * Interpreting MCP results and producing a ranked **TrialCandidate** list.
+   * Orchestrating criterion slicing and gap reasoning.
+ * Strategy: keep Gemini in **tools + structured outputs** mode; no direct free-form "actions".
+
+ **5) ClinicalTrials MCP Server (Existing)**
+
+ * Choose an existing **ClinicalTrials MCP server** rather than hand-rolling one: e.g., one of the open-source MCP servers wrapping the ClinicalTrials.gov REST API v2.[github+3](https://github.com/JackKuo666/ClinicalTrials-MCP-Server)
+ * Must support at least:
+   * `search_trials(parameters)` → list of (NCT ID, title, conditions, locations, status, phase, eligibility text).
+   * `get_trial(nct_id)` → full record including inclusion/exclusion criteria.
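The two required tool calls can be exercised against a stub before any real server is wired up. A minimal sketch, assuming nothing about a particular MCP implementation: `StubClinicalTrialsMCP` and `TrialRecord` are illustrative names, and a real client would make MCP tool calls over a transport rather than dictionary lookups.

```python
# Illustrative stub of the minimal tool surface we require from an
# existing ClinicalTrials MCP server (search_trials + get_trial).
from dataclasses import dataclass, field

@dataclass
class TrialRecord:
    nct_id: str
    title: str
    conditions: list
    status: str
    phase: str
    eligibility_text: dict = field(default_factory=dict)

class StubClinicalTrialsMCP:
    """In-memory stand-in; a real MCP client would RPC to a server."""
    def __init__(self, records):
        self._db = {r.nct_id: r for r in records}

    def search_trials(self, parameters):
        # Supports only the filters this design needs: condition + status.
        cond = parameters.get("condition", "").lower()
        wanted = set(parameters.get("recruitment_status", []))
        return [
            {"nct_id": r.nct_id, "title": r.title,
             "status": r.status, "phase": r.phase}
            for r in self._db.values()
            if any(cond in c.lower() for c in r.conditions)
            and (not wanted or r.status in wanted)
        ]

    def get_trial(self, nct_id):
        return self._db[nct_id]

mcp = StubClinicalTrialsMCP([
    TrialRecord("NCT01234567", "Phase 3 Study of Osimertinib in EGFR+ NSCLC",
                ["NSCLC"], "Recruiting", "Phase 3",
                {"inclusion": "Histologically confirmed NSCLC ...",
                 "exclusion": "..."}),
])
hits = mcp.search_trials({"condition": "NSCLC",
                          "recruitment_status": ["Recruiting"]})
```

Swapping the stub for a real server should change only the constructor, which is the point of depending on the MCP tool surface rather than a specific implementation.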
+
+ ## **3.2 Why Reuse MCP is Critical**
+
+ * **Time**: The ClinicalTrials.gov v2 API is detailed and somewhat finicky (paging, filters, field lists). Existing MCPs already encode those details + JSON schemas.[nlm.nih+1](https://www.nlm.nih.gov/pubs/techbull/ma24/ma24_clinicaltrials_api.html)
+ * **Alignment with agentic ecosystems**: These MCP servers are already shaped as "tools" for LLMs. We just plug Parlant/Gemini on top.
+ * **YAGNI**: A custom MCP or RAG index for trials is a post-PoC optimization.
+
+ ---
+
+ ## **4. Data Contracts (Core JSON Schemas)**
+
+ We keep contracts minimal but explicit, so we can test each piece in isolation.
+
+ ## **4.1 PatientProfile (v1)**
+
+ Output of MedGemma's **prescreen extraction**; updated as new docs arrive:
+
+ ```json
+ {
+   "patient_id": "string",
+   "source_docs": [
+     { "doc_id": "string", "type": "clinic_letter|pathology|lab|imaging", "meta": {} }
+   ],
+   "demographics": {
+     "age": 52,
+     "sex": "female"
+   },
+   "diagnosis": {
+     "primary_condition": "Non-Small Cell Lung Cancer",
+     "histology": "adenocarcinoma",
+     "stage": "IVa",
+     "diagnosis_date": "2025-11-15"
+   },
+   "performance_status": {
+     "scale": "ECOG",
+     "value": 1,
+     "evidence": [{ "doc_id": "clinic_1", "page": 2, "span_id": "s_17" }]
+   },
+   "biomarkers": [
+     {
+       "name": "EGFR",
+       "result": "Exon 19 deletion",
+       "date": "2026-01-10",
+       "evidence": [{ "doc_id": "path_egfr", "page": 1, "span_id": "s_3" }]
+     }
+   ],
+   "key_labs": [
+     {
+       "name": "ANC",
+       "value": 1.8,
+       "unit": "10^9/L",
+       "date": "2026-01-28",
+       "evidence": [{ "doc_id": "labs_jan", "page": 1, "span_id": "tbl_anc" }]
+     }
+   ],
+   "treatments": [
+     {
+       "drug_name": "Pembrolizumab",
+       "start_date": "2024-06-01",
+       "end_date": "2024-11-30",
+       "line": 1,
+       "evidence": [{ "doc_id": "clinic_2", "page": 3, "span_id": "s_45" }]
+     }
+   ],
+   "comorbidities": [
+     {
+       "name": "CKD",
+       "grade": "Stage 3",
+       "evidence": [{ "doc_id": "clinic_1", "page": 2, "span_id": "s_20" }]
+     }
+   ],
+   "imaging_summary": [
+     {
+       "modality": "MRI brain",
+       "date": "2026-01-20",
+       "finding": "Stable 3mm left frontal lesion, no enhancement",
+       "interpretation": "likely inactive scar",
+       "certainty": "low|medium|high",
+       "evidence": [{ "doc_id": "mri_report", "page": 1, "span_id": "s_9" }]
+     }
+   ],
+   "unknowns": [
+     { "field": "EGFR", "reason": "No clear mention", "importance": "high" }
+   ]
+ }
+ ```
+
+ Notes:
+
+ * `unknowns` is **explicit**, enabling Parlant to decide what to ask for in `GAP_FOLLOWUP`.
+ * The `evidence` structure enables the later criterion-level ledger to reference the same spans.
+ * This is **not** a fully normalized EHR; it's what's needed for prescreening.[pmc.ncbi.nlm.nih+1](https://pmc.ncbi.nlm.nih.gov/articles/PMC11612666/)
+
+ ## **4.2 SearchAnchors (v1)**
+
+ Intermediate structure Gemini produces from the PatientProfile to drive the MCP search:
+
+ ```json
+ {
+   "condition": "Non-Small Cell Lung Cancer",
+   "subtype": "adenocarcinoma",
+   "biomarkers": ["EGFR exon 19 deletion"],
+   "stage": "IV",
+   "geography": {
+     "country": "DE",
+     "max_distance_km": 200
+   },
+   "age": 52,
+   "performance_status_max": 1,
+   "trial_filters": {
+     "recruitment_status": ["Recruiting", "Not yet recruiting"],
+     "phase": ["Phase 2", "Phase 3"]
+   },
+   "relaxation_order": [
+     "phase",
+     "distance",
+     "biomarker_strictness"
+   ]
+ }
+ ```
+
+ This mirrors the patient-centric matching literature: patient characteristics + geography + site status.[nature+1](https://www.nature.com/articles/s41467-024-53081-z)
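The `relaxation_order` field deserves a concrete reading: when a search comes back empty, the orchestrator loosens one dimension at a time, in the order Gemini proposed. A minimal sketch, assuming illustrative relaxation rules per dimension (the contract only fixes the field names, not these rules):

```python
# Applying `relaxation_order` to a SearchAnchors dict, one step at a time.
import copy

def relax_anchors(anchors: dict, steps: int) -> dict:
    """Return a relaxed copy of anchors after the first `steps` relaxations."""
    relaxed = copy.deepcopy(anchors)
    for dim in anchors.get("relaxation_order", [])[:steps]:
        if dim == "phase":
            relaxed["trial_filters"]["phase"] = []          # accept any phase
        elif dim == "distance":
            relaxed["geography"]["max_distance_km"] *= 2    # widen the radius
        elif dim == "biomarker_strictness":
            relaxed["biomarkers"] = []                      # drop biomarker filter
    return relaxed

anchors = {
    "biomarkers": ["EGFR exon 19 deletion"],
    "geography": {"country": "DE", "max_distance_km": 200},
    "trial_filters": {"phase": ["Phase 2", "Phase 3"]},
    "relaxation_order": ["phase", "distance", "biomarker_strictness"],
}
step2 = relax_anchors(anchors, 2)   # relaxes phase, then distance
```

Returning a copy keeps the original anchors intact, so the UI can always show the user which constraints were loosened and in what order.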
+
+ ## **4.3 TrialCandidate (v1)**
+
+ Returned by the ClinicalTrials MCP search and lightly normalized:
+
+ ```json
+ {
+   "nct_id": "NCT01234567",
+   "title": "Phase 3 Study of Osimertinib in EGFR+ NSCLC",
+   "conditions": ["NSCLC"],
+   "phase": "Phase 3",
+   "status": "Recruiting",
+   "locations": [
+     { "country": "DE", "city": "Berlin" },
+     { "country": "DE", "city": "Hamburg" }
+   ],
+   "age_range": { "min": 18, "max": 75 },
+   "fingerprint_text": "short concatenation of title + key inclusion/exclusion + keywords",
+   "eligibility_text": {
+     "inclusion": "raw inclusion criteria text ...",
+     "exclusion": "raw exclusion criteria text ..."
+   }
+ }
+ ```
+
+ `fingerprint_text` is purposely short and designed for Gemini reranking; the full eligibility text goes to MedGemma for criterion analysis.
+
+ ## **4.4 EligibilityLedger (v1)**
+
+ Final artifact per trial, shown to the "clinician" or patient:
+
+ ```json
+ {
+   "patient_id": "P001",
+   "nct_id": "NCT01234567",
+   "overall_assessment": "likely_eligible|likely_ineligible|uncertain",
+   "criteria": [
+     {
+       "criterion_id": "inc_1",
+       "type": "inclusion",
+       "text": "Histologically confirmed NSCLC, stage IIIB/IV",
+       "decision": "met|not_met|unknown",
+       "patient_evidence": [{ "doc_id": "clinic_1", "page": 1, "span_id": "s_12" }],
+       "trial_evidence": [{ "field": "eligibility_text.inclusion", "offset_start": 0, "offset_end": 80 }]
+     },
+     {
+       "criterion_id": "exc_3",
+       "type": "exclusion",
+       "text": "No prior treatment with immune checkpoint inhibitors",
+       "decision": "not_met",
+       "patient_evidence": [{ "doc_id": "clinic_2", "page": 3, "span_id": "s_45" }],
+       "trial_evidence": [{ "field": "eligibility_text.exclusion", "offset_start": 211, "offset_end": 280 }]
+     }
+   ],
+   "gaps": [
+     {
+       "description": "Requires brain MRI within 28 days; last MRI is 45 days old",
+       "recommended_action": "Repeat brain MRI",
+       "clinical_importance": "high"
+     }
+   ]
+ }
+ ```
+
+ This mirrors TrialGPT's criterion-level output (explanation + evidence locations + decision), but tuned to our multimodal extraction and PoC constraints.[nature](https://www.nature.com/articles/s41467-024-53081-z)
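The contract leaves the rollup from criterion decisions to `overall_assessment` unspecified. One plausible deterministic reading, stated here as an assumption rather than the design's rule: decisions are normalized so that "met" always means the patient satisfies the criterion as written (in the sample ledger, "not_met" on the exclusion phrased "No prior treatment with immune checkpoint inhibitors" means the patient fails it).

```python
# Hypothetical rollup: any failed criterion -> likely_ineligible;
# otherwise any unknown -> uncertain; otherwise likely_eligible.
def overall_assessment(criteria: list[dict]) -> str:
    if any(c["decision"] == "not_met" for c in criteria):
        return "likely_ineligible"
    if any(c["decision"] == "unknown" for c in criteria):
        return "uncertain"
    return "likely_eligible"

# The two criteria from the sample ledger above.
sample = [
    {"criterion_id": "inc_1", "type": "inclusion", "decision": "met"},
    {"criterion_id": "exc_3", "type": "exclusion", "decision": "not_met"},
]
```

Keeping this rollup in plain code (rather than asking an LLM) makes the overall label auditable: every red or yellow light traces back to a specific criterion row.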
+
+ ---
+
+ ## **5. Parlant Workflow Design (Aligned with Real Clinical Work)**
+
+ We design a **single Parlant Journey** that approximates the real-world job of a trial coordinator/oncologist team, but in a patient-centric context.[pmc.ncbi.nlm.nih+3](https://pmc.ncbi.nlm.nih.gov/articles/PMC6685132/)
+
+ ## **5.1 Journey States**
+
+ **States:**
+
+ 1. `INGEST` (Document Collection)
+ 2. `PRESCREEN` (Patient-Level Trial Shortlist)
+ 3. `VALIDATE_TRIALS` (Trial-Level Eligibility Ledger)
+ 4. `GAP_FOLLOWUP` (Patient Data Completion Loop)
+ 5. `SUMMARY` (Shareable Packet & Next Steps)
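The five states and the transitions described in the subsections that follow can be written down as a plain transition table, independent of Parlant's actual API (which would encode these as journey states plus rules). The event names here are illustrative, not Parlant identifiers:

```python
# The journey as a plain (state, event) -> next-state table.
from enum import Enum, auto

class State(Enum):
    INGEST = auto()
    PRESCREEN = auto()
    VALIDATE_TRIALS = auto()
    GAP_FOLLOWUP = auto()
    SUMMARY = auto()

TRANSITIONS = {
    (State.INGEST, "minimal_dataset_ready"): State.PRESCREEN,
    (State.INGEST, "missing_core_docs"): State.GAP_FOLLOWUP,
    (State.PRESCREEN, "candidates_found"): State.VALIDATE_TRIALS,
    (State.PRESCREEN, "zero_candidates"): State.GAP_FOLLOWUP,
    (State.VALIDATE_TRIALS, "some_viable"): State.SUMMARY,
    (State.VALIDATE_TRIALS, "all_ineligible"): State.GAP_FOLLOWUP,
    (State.GAP_FOLLOWUP, "new_documents"): State.INGEST,
    (State.GAP_FOLLOWUP, "no_more_data"): State.SUMMARY,
}

def step(state: State, event: str) -> State:
    # Unknown events leave the journey where it is.
    return TRANSITIONS.get((state, event), state)
```

Having the table in one place makes the gap-driven loop (`GAP_FOLLOWUP` → `INGEST` → re-run) visible at a glance, which is harder to see when the rules are scattered across guideline definitions.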
+
+ ## **State 1 — INGEST**
+
+ **Role in the real world:** The patient (or a referrer) provides records; a coordinator checks whether there is enough to do a prescreen.[trialchoices+2](https://www.trialchoices.org/post/what-to-expect-during-the-clinical-trial-screening-process)
+
+ **Inputs:**
+
+ * Uploaded PDFs/images (synthetic in the PoC).
+ * Lightweight metadata (age, sex, location) from a user form.
+
+ **Actions:**
+
+ * Parlant calls MedGemma with multimodal input (images + text) to generate `PatientProfile.v1`.
+ * The Parlant agent summarises back to the patient:
+   * What it understood ("You have stage IV NSCLC, ECOG 1, EGFR unknown").
+   * What it is missing ("I did not find EGFR mutation status or a recent brain MRI").
+
+ **Transitions:**
+
+ * If the **minimal prescreen dataset is present** (diagnosis + stage + ECOG + rough labs): → `PRESCREEN`.
+ * Else: stay in `INGEST` but trigger `GAP_FOLLOWUP`-style prompts ("Can you upload a pathology report or discharge summary?").
+
+ ## **State 2 — PRESCREEN**
+
+ **Role in the real world:** Pre-filter to "worth reviewing" trials based on limited data.[pmc.ncbi.nlm.nih+1](https://pmc.ncbi.nlm.nih.gov/articles/PMC11612666/)
+
+ **Inputs:**
+
+ * `PatientProfile.v1`.
+
+ **Actions:**
+
+ * Gemini converts the `PatientProfile` → `SearchAnchors.v1`.
+ * Parlant calls the **existing ClinicalTrials MCP**, mapping `SearchAnchors` to the MCP's parameters:
+   * Condition keywords
+   * Recruitment status
+   * Phase filters
+   * Geography
+ * Trials are returned as a `TrialCandidate` list.
+ * Gemini reranks them using `fingerprint_text` + `PatientProfile` to produce a shortlist (e.g., top 20).
+ * Parlant communicates to the user:
+   * "Based on your profile, I found 23 potentially relevant NSCLC trials; I'll now check each more carefully."
+
+ **Transitions:**
+
+ * If **0 trials** → `GAP_FOLLOWUP` (relax criteria and/or widen geography).
+ * If **>0 trials** → `VALIDATE_TRIALS`.
+
+ This maps to the patient-centric matching described in the applied literature: single patient → candidate trials, then deeper evaluation.[trec-cds+2](https://www.trec-cds.org/2021.html)
+
+ ## **State 3 — VALIDATE_TRIALS**
+
+ **Role in the real world:** Detailed chart review against full eligibility criteria.[pmc.ncbi.nlm.nih+1](https://pmc.ncbi.nlm.nih.gov/articles/PMC6685132/)
+
+ **Inputs:**
+
+ * Shortlisted `TrialCandidate` objects (e.g., top 10–20).
+
+ **Actions:**
+
+ For each trial in the shortlist:
+
+ 1. Gemini slices the inclusion/exclusion text into atomic criteria (each with an ID and text).
+ 2. For each criterion:
+    * Parlant calls **MedGemma** with:
+      * The `PatientProfile` + selected patient evidence snippets (and, where available, the underlying images).
+      * The criterion text snippet.
+    * MedGemma outputs:
+      * `decision: met/not_met/unknown`.
+      * `patient_evidence` span references (doc/page/span_id).
+ 3. Parlant aggregates per-trial results into `EligibilityLedger.v1`.
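Step 1 above is an LLM call in the design, but it is worth having a deterministic baseline slicer for tests and as a fallback. A naive sketch that just splits the raw eligibility text on line boundaries; the function name and ID scheme (`inc_1`, `exc_1`, matching the ledger examples) are illustrative:

```python
# Deterministic fallback for criterion slicing: split on bullet/newline
# boundaries and assign inc_*/exc_* IDs matching the ledger convention.
import re

def slice_criteria(eligibility_text: dict) -> list[dict]:
    criteria = []
    for ctype, prefix in (("inclusion", "inc"), ("exclusion", "exc")):
        raw = eligibility_text.get(ctype, "")
        parts = [p.strip(" -*\t") for p in re.split(r"\n+", raw)]
        for i, text in enumerate(p for p in parts if p):
            criteria.append({
                "criterion_id": f"{prefix}_{i + 1}",
                "type": ctype,
                "text": text,
            })
    return criteria

sliced = slice_criteria({
    "inclusion": "- Histologically confirmed NSCLC, stage IIIB/IV\n- ECOG 0-1",
    "exclusion": "- Active CNS metastases",
})
```

Real ClinicalTrials.gov eligibility text is messier than this (nested sub-bullets, multi-line criteria), which is exactly why the design hands the job to Gemini; the baseline mainly pins down the output shape the rest of the pipeline expects.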
+
+ **Outputs:**
+
+ * A ranked list of trials with:
+   * A traffic-light label (green/yellow/red) for overall eligibility (+ explanation).
+   * Criterion-level breakdowns and evidence pointers.
+
+ **Transitions:**
+
+ * If **no trial has any green/yellow** (all clearly ineligible):
+   * → `GAP_FOLLOWUP`, to explore whether missing data (e.g., outdated labs) could change this.
+ * Else:
+   * Offer `SUMMARY` while keeping `GAP_FOLLOWUP` open.
+
+ ## **State 4 — GAP_FOLLOWUP**
+
+ **Role in the real world:** Additional tests/data to confirm eligibility (e.g., labs, imaging).[pfizerclinicaltrials+2](https://www.pfizerclinicaltrials.com/about/steps-to-join)
+
+ **Inputs:**
+
+ * `PatientProfile.unknowns` + `EligibilityLedger.gaps`.
+
+ **Actions:**
+
+ * Gemini synthesizes the **minimal actionable set** of missing data:
+   * E.g., "The most promising trials require: (1) current EGFR mutation status, (2) a brain MRI < 28 days old."
+ * Parlant:
+   * Poses this to the patient in simple language.
+   * For the PoC, the user (you, or a script) uploads new synthetic documents representing those tests.
+ * On a new upload, we go back through `INGEST` → update the `PatientProfile` → fast-path directly to `PRESCREEN`/`VALIDATE_TRIALS`.
+
+ **Transitions:**
+
+ * On new docs → `INGEST` (update and re-run).
+ * If the user declines, or no additional data is possible → `SUMMARY` with a clear explanation ("Here's why the current trials don't fit").
+
+ ## **State 5 — SUMMARY**
+
+ **Role in the real world:** The coordinator/oncologist summarises findings, shares options, and discusses next steps.[pfizerclinicaltrials+2](https://www.pfizerclinicaltrials.com/about/steps-to-join)
+
+ **Inputs:**
+
+ * The final `PatientProfile`.
+ * The set of `EligibilityLedger` objects for the top trials.
+ * The list of `gaps`.
+
+ **Actions:**
+
+ * Generate:
+   * A **patient-friendly summary**: a 3–5 bullet explanation of the matches.
+   * A **clinician packet**: the aggregated ledger and evidence pointers, referencing doc IDs and trial NCT IDs.
+ * For the PoC: show in the UI + downloadable JSON/Markdown.
+
+ **Transitions:**
+
+ * End of Journey.
+
+ ---
+
+ ## **6. General Patient Plan (Synthetic Data Flow)**
+
+ We simulate realistic but synthetic patients, and run them through exactly the above journey.
+
+ ## **6.1 Synthetic Patient Generation & Formats**
+
+ **Source:**
+
+ * TREC Clinical Trials Track 2021/2022 patient topics (free-text vignettes) as the ground truth for "what the patient's story should convey".[trec-cds+3](https://www.trec-cds.org/2022.html)
+ * Synthea or custom scripts to generate structured NSCLC trajectories consistent with those vignettes (for additional fields we want).
+
+ **Artifacts per patient:**
+
+ 1. **Clinic letter PDF**
+    * Plain text + embedded logo; maybe 1–2 key tables (comorbidities, meds).
+ 2. **Biomarker/pathology PDF**
+    * EGFR/ALK/PD-L1 etc., with a small table or scanned-like image.
+ 3. **Lab report PDF**
+    * Hematology and chemistry values, with dates.
+ 4. **Imaging report PDF** (+ optional illustrative image)
+    * Brain MRI/CT narrative with lesion description; maybe a low-res "snapshot" image.
+
+ Each artifact is saved with metadata mapping to the underlying TREC topic (so we can label what the "true" conditions/stage/biomarkers are).
+
+ ## **6.2 Patient Journey (Narrative)**
+
+ For each synthetic patient "Anna":
+
+ 1. **Pre-visit (INGEST)**
+    * Anna (or a proxy) uploads her documents to the copilot.
+    * MedGemma extracts a `PatientProfile`.
+    * Parlant confirms: "You have stage IV NSCLC with ECOG 1 and prior pembrolizumab; I don't see your EGFR mutation test yet."
+ 2. **Prescreen (PRESCREEN)**
+    * Using `SearchAnchors`, trials are fetched via the ClinicalTrials MCP.
+    * The system returns, e.g., 30 candidates; after reranking, the top 10 are selected for validation.
+ 3. **Trial Validation (VALIDATE_TRIALS)**
+    * For each of the top 10, the eligibility ledger is computed.
+    * The system identifies, say, 3 trials with many green criteria but a few unknowns (e.g., a recent brain MRI).
+ 4. **Gap-Driven Iteration (GAP_FOLLOWUP)**
+    * Copilot: "You likely qualify for trial NCT01234567 if you have a brain MRI within the last 28 days. Your last MRI was 45 days ago. If your doctor orders a new MRI and the report shows no active brain metastases, you may qualify. For this PoC, you can upload a 'new MRI report' file to simulate this."
+    * A new synthetic PDF is uploaded; the `PatientProfile` is updated.
+ 5. **Re-match & Summary (PRESCREEN → VALIDATE_TRIALS → SUMMARY)**
+    * The system re-runs with the updated `PatientProfile`.
+    * Now 3 trials are "likely eligible", with red flags on only non-critical criteria.
+    * The copilot generates:
+      * A patient summary: "Here are three trials that look promising for your situation, and why."
+      * A clinician packet: ledger + evidence pointers that mimic a coordinator's notes.
+
+ This general patient plan is consistent across synthetic cases but parameterized by each TREC topic (e.g., biomarker variant, comorbidity pattern).
+
+ ---
+
+ ## **7. How This Plan Fixes Earlier Gaps**
+
+ 1. **No custom trial search stack**
+    * We explicitly plug into existing ClinicalTrials MCPs built for LLM agents, aligning with your "don't reinvent the wheel" constraint and drastically lowering infra risk in 2 weeks.[github+2](https://github.com/cyanheads/clinicaltrialsgov-mcp-server)
+ 2. **Parlant used as a real workflow engine, not just a wrapper**
+    * States mirror the prescreen vs validation vs gap-closure stages described in empirical screening studies and trial-matching frameworks.[appliedclinicaltrialsonline+3](https://www.appliedclinicaltrialsonline.com/view/clinical-trial-matching-solutions-understanding-the-landscape)
+    * Parlant becomes the place where you encode "when do we ask a human for more information, vs when do we refine a query, vs when do we stop?"
+ 3. **Patient plan grounded in real-world processes**
+    * The synthetic patient journey isn't just "upload docs → list trials."
+    * It follows actual clinical workflows: minimal dataset, prescreen, chart review, additional tests, and finally discussion/summary.[trialchoices+3](https://www.trialchoices.org/post/what-to-expect-during-the-clinical-trial-screening-process)
+ 4. **Minimal, testable contracts**
+    * PatientProfile, SearchAnchors, TrialCandidate, and EligibilityLedger together give you:
+      * Places to measure MedGemma extraction F1.
+      * Places to plug in TREC qrels (TrialCandidate → NDCG@10).[arxiv+2](https://arxiv.org/pdf/2202.07858.pdf)
+    * They're small enough to implement quickly but rich enough to survive PoC → MVP.
+
+ Source: [https://www.perplexity.ai/search/simulate-as-an-experienced-cto-i6TIXOP9TX.rqA97awuc1Q?sm=d#3](https://www.perplexity.ai/search/simulate-as-an-experienced-cto-i6TIXOP9TX.rqA97awuc1Q?sm=d#3)
docs/Trialpath PRD.md ADDED
@@ -0,0 +1,246 @@
1
+ # HAI-DEF Pitch: MedGemma Match – Patient Trial Copilot
2
+
3
+ **PoC Goal:** Demonstrate MedGemma + Gemini 3 Pro + Parlant agentic architecture for patient-facing clinical trial matching with **explainable eligibility reasoning** and **iterative gap-filling**.
4
+
5
+ ---
6
+
7
+ ## 1. Problem & Unmet Need
8
+
9
+ ### The Challenge
10
+ - **Low trial participation:** <5% of adult cancer patients enroll in clinical trials despite potential eligibility
11
+ - **Complex eligibility criteria:** Free-text criteria mix demographics, biomarkers, labs, imaging findings, and treatment history
12
+ - **Patient barrier:** Patients receive PDFs/reports but have no way to understand which trials fit their situation
13
+ - **Manual screening burden:** Clinicians spend hours per patient manually reviewing eligibility; automated tools show mixed real-world performance
14
+
15
+ ### Why AI? Why Now?
16
+ - Eligibility criteria require synthesis across multiple document types (pathology, labs, imaging, treatment history)—impossible with keyword search alone
17
+ - Recent LLM-based matching systems (TrialGPT, PRISM) show promise but lack patient-centric design and multimodal medical understanding
18
+ - HAI-DEF open-weight health models enable privacy-preserving deployment with medical domain expertise
19
+
20
+ ---
21
+
22
+ ## 2. Solution: MedGemma as Clinical Understanding Engine
23
+
24
+ ### Core Concept
25
+ **"Agentic Search + Multimodal Extraction"** replacing traditional vector-RAG approaches.
26
+
27
+ **Architecture:**
28
+ - **MedGemma (HAI-DEF):** Extracts structured clinical facts from messy PDFs/reports + understands medical imaging contexts
29
+ - **Gemini 3 Pro:** Orchestrates agentic search through ClinicalTrials.gov API with iterative query refinement
30
+ - **Parlant:** Enforces state machine (search → filter → verify) and prevents parameter hallucination
31
+ - **ClinicalTrials MCP:** Structured API wrapper for trials data (no vector DB needed)
32
+
33
+ ### Why MedGemma is Central (Not Replaceable)
34
+ 1. **Multimodal medical reasoning:** Designed for radiology reports, pathology, labs—where generic LLMs are weaker
35
+ 2. **Domain-aligned extraction:** Medical entity recognition with units, dates, and clinical context preservation
36
+ 3. **Open weights:** Enables VPC deployment for future PHI handling (vs closed-weight alternatives)
37
+ 4. **Health-safety guardrails:** Model card emphasizes validation/adaptation patterns we follow
38
+
39
+ ---
40
+
41
+ ## 3. User Journey (Patient-Centric)
42
+
43
+ ### Target User (PoC Persona)
44
+ **"Anna"** – 52-year-old NSCLC patient in Berlin with PDFs from her oncologist but no trial navigation support.
45
+
46
+ ### Journey Flow
47
+ 1. **Upload Documents** → Clinic letter, pathology report, lab results (synthetic PDFs in PoC)
48
+ 2. **MedGemma Extraction** → System builds "My Clinical Profile (draft)": Stage IVa, EGFR status unknown, ECOG 1
49
+ 3. **Agentic Search** → Gemini queries ClinicalTrials.gov via MCP:
50
+ - Initial: `condition=NSCLC, location=DE, status=RECRUITING, keywords=EGFR` → 47 results
51
+ - Refines: Adds `phase=PHASE3` → 12 results
52
+ - Reads summaries, filters to 5 relevant trials
53
+ 4. **Eligibility Analysis** → For each trial, MedGemma evaluates criteria against extracted facts
54
+ 5. **Gap Identification** → System highlights: *"You'd likely qualify IF you had EGFR mutation test"*
55
+ 6. **Iteration** → Anna uploads biomarker report → System re-matches → 3 new trials appear
56
+ 7. **Share with Doctor** → Generate clinician packet with evidence-linked eligibility ledger
57
+
58
+ ### Key Differentiator: The "Gap Analysis"
59
+ - We don't just say "No Match"
60
+ - We say: **"You would match NCT12345 IF you had: recent brain MRI showing no active CNS disease"**
61
+ - This transforms "rejection" into "actionable next steps"
62
+
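The gap analysis above reduces to a transformation over per-criterion verdicts. A minimal sketch (the `CriterionResult` shape and message wording are illustrative assumptions, not the real data contract):

```python
from dataclasses import dataclass

@dataclass
class CriterionResult:
    text: str    # criterion text from the trial protocol
    status: str  # "MET" | "NOT_MET" | "UNKNOWN"

def gap_message(nct_id: str, results: list) -> str:
    """Turn per-criterion verdicts into an actionable next-step message."""
    blockers = [r.text for r in results if r.status == "NOT_MET"]
    unknowns = [r.text for r in results if r.status == "UNKNOWN"]
    if blockers:
        return f"Not eligible for {nct_id}: {'; '.join(blockers)}"
    if unknowns:
        return f"You would match {nct_id} IF you had: {'; '.join(unknowns)}"
    return f"Likely eligible for {nct_id}"

verdicts = [
    CriterionResult("Stage IIIB-IV NSCLC", "MET"),
    CriterionResult("recent brain MRI showing no active CNS disease", "UNKNOWN"),
]
message = gap_message("NCT12345", verdicts)
# → "You would match NCT12345 IF you had: recent brain MRI showing no active CNS disease"
```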
63
+ ---
64
+
65
+ ## 4. Technical Innovation: Smart Agentic Search (No Vector DB)
66
+
67
+ ### Traditional Approach (What We're *Not* Doing)
68
+ ```
69
+ Patient text → Embeddings → Vector similarity search →
70
+ Retrieve top-K trials → LLM re-ranks
71
+ ```
72
+ **Problem:** Vector search is "dumb" about structured constraints (Phase, Location, Status) and negations.
73
+
74
+ ### Our Approach: Iterative Query Refinement
75
+ ```
76
+ MedGemma extracts "Search Anchors" (Condition, Biomarkers, Location) →
77
+ Gemini formulates API query with filters →
78
+ ClinicalTrials MCP returns results →
79
+ Too many (>50)? → Parlant enforces refinement (add phase/keywords)
80
+ Too few (0)? → Parlant enforces relaxation (remove city filter)
81
+ Right size (10-30)? → Gemini reads summaries in 2M context window →
82
+ Shortlist 5 NCT IDs → Deep eligibility verification with MedGemma
83
+ ```
84
+
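The refinement loop above can be sketched in a few lines of Python. This is illustrative only: `fake_search` stands in for the MCP call, and the tighten/relax moves are simplified placeholders for the Parlant-enforced rules:

```python
def iterative_search(search_trials, params: dict, max_steps: int = 5) -> list:
    """Tighten or relax query filters until the result set is a workable size."""
    results = []
    for _ in range(max_steps):
        results = search_trials(params)
        if len(results) > 50 and "phase" not in params:
            params["phase"] = "PHASE3"   # too many → add a filter
        elif len(results) == 0 and "city" in params:
            params.pop("city")           # too few → relax a filter
        else:
            break                        # workable size: read summaries next
    return results

# Stub standing in for the MCP search tool: adding PHASE3 narrows 60 hits to 12.
def fake_search(params):
    return [f"NCT{i:08d}" for i in range(12 if "phase" in params else 60)]

shortlist = iterative_search(fake_search, {"condition": "NSCLC", "location": "DE"})
# → 12 candidate NCT IDs, ready for Gemini to read and shortlist
```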
85
+ **Why This is Better:**
86
+ - **Precision:** Leverages native API filters (Phase, Status, Location) that vectors can't handle
87
+ - **Transparency:** Every search step is logged and explainable ("I searched X, got Y results, refined to Z")
88
+ - **Feasibility:** No vector DB infrastructure; uses live API
89
+ - **Showcases Gemini reasoning:** Demonstrates multi-step planning vs one-shot retrieval
90
+
91
+ ---
92
+
93
+ ## 5. MedGemma Showcase Moments (HAI-DEF "Fullest Potential")
94
+
95
+ ### Use Case 1: Temporal Lab Extraction
96
+ **Challenge:** Criterion requires "ANC ≥ 1.5 × 10⁹/L within 14 days of enrollment"
97
+ - **MedGemma extracts:** Value=1.8, Units=10⁹/L, Date=2026-01-28, DocID=labs_jan.pdf
98
+ - **System verifies:** Current date Feb 4 → 7 days ago → ✓ MEETS criterion
99
+ - **Evidence link:** User can click to see exact lab table and date
100
+
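The date arithmetic behind this verification step is straightforward; a minimal sketch (parameter names are illustrative):

```python
from datetime import date

def meets_anc_criterion(value: float, lab_date: date, today: date,
                        threshold: float = 1.5, window_days: int = 14) -> bool:
    """Check 'ANC >= 1.5 x 10^9/L within 14 days' against an extracted lab fact."""
    age_days = (today - lab_date).days
    return value >= threshold and 0 <= age_days <= window_days

# Extracted fact: Value=1.8, Date=2026-01-28; current date 2026-02-04 → 7 days old
meets = meets_anc_criterion(1.8, date(2026, 1, 28), date(2026, 2, 4))  # → True
```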
101
+ ### Use Case 2: Multimodal Imaging Context
102
+ **Challenge:** Criterion requires "No active CNS metastases"
103
+ - **MedGemma reads:** Brain MRI report text: *"Stable 3mm left frontal lesion, no enhancement, likely scarring from prior SRS"*
104
+ - **System interprets:** "Stable" + "no enhancement" + "scarring" → Likely inactive → Flags as ⚠️ UNKNOWN (requires clinician confirmation)
105
+ - **Evidence link:** Highlights report section for doctor review
106
+
107
+ ### Use Case 3: Treatment Line Reconstruction
108
+ **Challenge:** Criterion excludes "Prior immune checkpoint inhibitor therapy"
109
+ - **MedGemma reconstructs:** From medication list and notes → Patient received Pembrolizumab 2024-06 to 2024-11
110
+ - **System verifies:** → ✗ EXCLUDED
111
+ - **Evidence link:** Shows medication timeline with dates and sources
112
+
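Once the timeline is reconstructed, the exclusion check itself is a lookup against a known checkpoint-inhibitor list. A sketch (the drug set is abridged; a real system would use a curated drug ontology):

```python
# Abridged, illustrative list of immune checkpoint inhibitors.
CHECKPOINT_INHIBITORS = {"pembrolizumab", "nivolumab", "atezolizumab", "durvalumab"}

def prior_ici_exposure(medications: list) -> list:
    """Return any prior immune checkpoint inhibitors found in the med timeline."""
    return [m["name"] for m in medications
            if m["name"].lower() in CHECKPOINT_INHIBITORS]

timeline = [
    {"name": "Pembrolizumab", "start": "2024-06", "end": "2024-11"},
    {"name": "Carboplatin", "start": "2024-06", "end": "2024-09"},
]
hits = prior_ici_exposure(timeline)  # → ["Pembrolizumab"] → criterion EXCLUDED
```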
113
+ ---
114
+
115
+ ## 6. PoC Scope & Data Strategy
116
+
117
+ ### In Scope (3-Month PoC)
118
+ - **Disease:** NSCLC only (complex biomarkers, high trial volume)
119
+ - **Data:** Synthetic patients only (no real PHI)
120
+ - **Deliverables:**
121
+ - Working web prototype (video demo)
122
+ - Experimental validation on TREC benchmarks
123
+ - Technical write-up + public code repo
124
+
125
+ ### Data Sources
126
+ **Patients (Synthetic):**
127
+ - Structured ground truth: Synthea FHIR (500 NSCLC patients)
128
+ - Unstructured artifacts: LLM-generated clinic letters + lab PDFs with controlled noise (abbreviations, OCR errors, missing values)
129
+
130
+ **Trials (Real):**
131
+ - ClinicalTrials.gov live API via MCP wrapper
132
+ - Focus on NSCLC recruiting trials in Europe + US
133
+
134
+ **Benchmarking:**
135
+ - TREC Clinical Trials Track 2021/2022 (75 patient topics + judged relevance)
136
+ - Custom criterion-extraction test set (labeled synthetic reports)
137
+
138
+ ---
139
+
140
+ ## 7. Success Metrics & Evaluation Plan
141
+
142
+ ### Model Performance
143
+ | Metric | Target | Baseline | Method |
144
+ |--------|--------|----------|--------|
145
+ | **MedGemma Extraction F1** | ≥0.85 | Gemini-only: 0.65-0.75 | Field-level (stage, ECOG, biomarkers, labs) on labeled synthetic reports |
146
+ | **Trial Retrieval Recall@50** | ≥0.75 | BM25: ~0.60 | TREC 2021 patient topics |
147
+ | **Trial Ranking NDCG@10** | ≥0.60 | Non-LLM baseline: ~0.45 | TREC judged relevance |
148
+ | **Criterion Decision Accuracy** | ≥0.85 | Rule-based: ~0.70 | Per-criterion classification on synthetic patient-trial pairs |
149
+
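Field-level extraction F1 can be scored by comparing (field, value) pairs between the extracted and gold profiles. A hedged sketch of one way to do this (real evaluation would normalize values and handle partial matches):

```python
def field_f1(gold: dict, pred: dict) -> float:
    """Micro F1 over extracted (field, value) pairs; None/missing counts as absent."""
    gold_pairs = {(k, v) for k, v in gold.items() if v is not None}
    pred_pairs = {(k, v) for k, v in pred.items() if v is not None}
    if not gold_pairs or not pred_pairs:
        return 0.0
    tp = len(gold_pairs & pred_pairs)  # exact (field, value) agreements
    if tp == 0:
        return 0.0
    precision = tp / len(pred_pairs)
    recall = tp / len(gold_pairs)
    return 2 * precision * recall / (precision + recall)

gold = {"stage": "IVa", "ecog": 1, "egfr": "positive"}
pred = {"stage": "IVa", "ecog": 1, "egfr": None}   # missed the EGFR field
score = field_f1(gold, pred)  # precision 1.0, recall 2/3 → F1 = 0.8
```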
150
+ ### Product Quality
151
+ - **Latency:** <15s from upload to first match results
152
+ - **Explainability:** 100% of "met/not met" decisions must include evidence pointer (trial text + patient doc ID)
153
+ - **Cost:** <$0.50 per patient session (token + GPU usage)
154
+
155
+ ### UX Validation (Small Study)
156
+ - Task completion: Can lay users identify ≥1 plausible trial from shortlist?
157
+ - Explanation clarity: SUS-style usability score ≥70
158
+ - Reading level: B1/8th-grade equivalent (Flesch-Kincaid)
159
+
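The reading-level target can be checked programmatically. A rough sketch using the standard Flesch-Kincaid grade formula with a naive vowel-group syllable counter (real use would rely on a library such as `textstat`):

```python
import re

def _syllables(word: str) -> int:
    """Crude estimate: count vowel groups, minimum one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllable_count = sum(_syllables(w) for w in words)
    return 0.39 * len(words) / sentences + 11.8 * syllable_count / len(words) - 15.59

grade = fk_grade("You may qualify for this trial. Ask your doctor about an EGFR test.")
```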
160
+ ---
161
+
162
+ ## 8. Impact Potential
163
+
164
+ ### If PoC Succeeds (Quantified)
165
+ **Near-term (PoC phase):**
166
+ - Demonstrate 15-25% relative improvement in ranking quality (NDCG) vs non-LLM baselines on TREC benchmarks
167
+ - Show multimodal extraction advantage: MedGemma F1 ≥0.10 higher than Gemini-only on medical fields
168
+
169
+ **Post-PoC (Real-world projection):**
170
+ - **Patient impact:** Literature suggests automated tools can surface 20-30% more eligible trials than manual search, and NSCLC patients often face 50+ active trials but typically learn about only 2-3 from their oncologist
171
+ - **Clinician impact:** Trial coordinators report spending 2-4 hours per patient on manual screening; if our tool pre-screens with 85% sensitivity, it could reduce manual verification by ~60%
172
+ - **Trial enrollment:** Even a 10% increase in eligible patient identification could improve trial recruitment timelines (major pharma pain point)
173
+
174
+ ---
175
+
176
+ ## 9. Risks & Mitigations
177
+
178
+ | Risk | Mitigation |
179
+ |------|-----------|
180
+ | **Synthetic data too clean** | Add controlled noise to PDFs (OCR errors, abbreviations); validate against TREC which uses realistic synthetic cases |
181
+ | **MedGemma hallucination on edge cases** | Implement evidence-pointer system (every decision must cite doc ID + span); flag low-confidence as "unknown" not "met" |
182
+ | **API rate limits** | Cache trial protocols; batch requests during search refinement |
183
+ | **Regulatory misunderstanding** | Explicit "information only, not medical advice" framing throughout UI; follow MedGemma model card guidance on validation/adaptation |
184
+
185
+ ---
186
+
187
+ ## 10. Deliverables for HAI-DEF Submission
188
+
189
+ ### Video Demo (~5-7 min)
190
+ - Patient persona introduction
191
+ - Upload → extraction visualization (showing MedGemma in action)
192
+ - Agentic search loop (showing query refinement)
193
+ - Match results with traffic-light eligibility cards
194
+ - Gap-filling iteration (upload biomarker → new matches)
195
+ - "Share with doctor" packet generation
196
+
197
+ ### Technical Write-up
198
+ 1. Problem + why HAI-DEF models
199
+ 2. Architecture diagram (Parlant journey + MedGemma + Gemini + MCP)
200
+ 3. Data generation pipeline
201
+ 4. Experiments: extraction, retrieval, ranking (tables + ablations)
202
+ 5. Limitations + path to real PHI deployment
203
+
204
+ ### Code Repository
205
+ - `data/generate_synthetic_patients.py`
206
+ - `data/generate_noisy_pdfs.py`
207
+ - `matching/medgemma_extractor.py`
208
+ - `matching/agentic_search.py` (Parlant + Gemini + MCP)
209
+ - `evaluation/run_trec_benchmark.py`
210
+ - Clear README with one-command reproducibility
211
+
212
+ ---
213
+
214
+ ## 11. Why This Wins HAI-DEF
215
+
216
+ ### Effective Use of Models (20%)
217
+ ✓ MedGemma as primary clinical understanding engine (extraction + multimodal)
218
+ ✓ Concrete demos showing where non-HAI-DEF models fail (extraction accuracy gaps)
219
+ ✓ Plan for task-specific evaluation showing measurable improvement
220
+
221
+ ### Problem Domain (15%)
222
+ ✓ Clear unmet need (low trial enrollment, manual screening burden)
223
+ ✓ Patient-centric storytelling ("Anna's journey")
224
+ ✓ Evidence-based magnitude (enrollment stats, screening time data)
225
+
226
+ ### Impact Potential (15%)
227
+ ✓ Quantified near-term (benchmark improvements) and long-term (enrollment lift) impact
228
+ ✓ Clear calculation logic grounded in literature
229
+
230
+ ### Product Feasibility (20%)
231
+ ✓ Detailed technical architecture (agentic search innovation)
232
+ ✓ Realistic synthetic data strategy
233
+ ✓ Concrete evaluation plan with baselines
234
+ ✓ Deployment considerations (latency, cost, safety)
235
+
236
+ ### Execution & Communication (30%)
237
+ ✓ Cohesive narrative across video + write-up + code
238
+ ✓ Reproducible experiments
239
+ ✓ Clear explanation of design choices
240
+ ✓ Professional polish (evidence pointers, explanations, UX details)
241
+
242
+ ---
243
+
244
+ **Timeline:** 3 months to PoC demo ready for HAI-DEF submission.
245
+
246
+ **Team needs:** 1 ML engineer (MedGemma fine-tuning + evaluation), 1 full-stack engineer (web app + Parlant orchestration), 1 CPO (coordination + submission materials).
docs/tdd-guide-backend-service.md ADDED
The diff for this file is too large to render. See raw diff
 
docs/tdd-guide-data-evaluation.md ADDED
@@ -0,0 +1,2384 @@
1
+ # TrialPath Data & Evaluation Pipeline TDD Implementation Guide
2
+
3
+ > Based on in-depth research into DeepWiki, the official TREC documentation, and the ir-measures/ir_datasets libraries
4
+
5
+ ---
6
+
7
+ ## 1. Pipeline Architecture Overview
8
+
9
+ ### 1.1 Data Flow Diagram
10
+
11
+ ```
12
+ ┌─────────────────────────────────────────────────────────────────┐
13
+ │ Data & Evaluation Pipeline │
14
+ ├─────────────────────────────────────────────────────────────────┤
15
+ │ │
16
+ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────────┐ │
17
+ │ │ Synthea │───▶│ FHIR Bundle │───▶│ PatientProfile │ │
18
+ │ │ (Java CLI) │ │ (JSON) │ │ (JSON Schema) │ │
19
+ │ └──────────────┘ └──────────────┘ └────────┬─────────┘ │
20
+ │ │ │
21
+ │ ┌──────────────┐ ┌──────────────┐ ▼ │
22
+ │ │ LLM Letter │───▶│ ReportLab │───▶ Noisy Clinical PDFs │
23
+ │ │ Generator │ │ + Augraphy │ (Letters/Labs/Path) │
24
+ │ └──────────────┘ └──────────────┘ │
25
+ │ │
26
+ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────────┐ │
27
+ │ │ MedGemma │───▶│ Extracted │───▶│ F1 Evaluator │ │
28
+ │ │ Extractor │ │ Profile │ │ (scikit-learn) │ │
29
+ │ └──────────────┘ └──────────────┘ └──────────────────┘ │
30
+ │ │
31
+ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────────┐ │
32
+ │ │ TREC Topics │───▶│ TrialPath │───▶│ TREC Evaluator │ │
33
+ │ │ (ir_datasets)│ │ Matching │ │ (ir-measures) │ │
34
+ │ └──────────────┘ └──────────────┘ └──────────────────┘ │
35
+ │ │
36
+ └─────────────────────────────────────────────────────────────────┘
37
+ ```
38
+
39
+ ### 1.2 Module Relationships
40
+
41
+ | Module | Input | Output | Dependencies |
42
+ |------|------|------|------|
43
+ | `data/generate_synthetic_patients.py` | Synthea FHIR Bundles | `PatientProfile` JSON + Ground Truth | Synthea CLI, FHIR R4 |
44
+ | `data/generate_noisy_pdfs.py` | `PatientProfile` JSON | Noisy clinical PDFs | ReportLab, Augraphy |
45
+ | `evaluation/run_trec_benchmark.py` | TREC Topics + TrialPath Run | Recall@50, NDCG@10, P@10 | ir_datasets, ir-measures |
46
+ | `evaluation/extraction_eval.py` | Extracted vs Ground Truth Profiles | Field-level F1 | scikit-learn |
47
+ | `evaluation/criterion_eval.py` | EligibilityLedger vs Gold Standard | Criterion Accuracy | scikit-learn |
48
+ | `evaluation/latency_cost_tracker.py` | API call logs | Latency/Cost reports | time, logging |
49
+
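A minimal sketch of what `evaluation/latency_cost_tracker.py` could look like (the per-1K-token price constants are placeholders, not real provider pricing):

```python
import time
from contextlib import contextmanager

# Placeholder per-1K-token prices; real rates must come from the providers.
PRICE_PER_1K_TOKENS = {"gemini": 0.005, "medgemma": 0.002}

class SessionTracker:
    """Accumulates per-step latency and token cost for one patient session."""
    def __init__(self):
        self.latencies = {}
        self.cost_usd = 0.0

    @contextmanager
    def timed(self, step: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.latencies[step] = time.perf_counter() - start

    def add_tokens(self, model: str, tokens: int):
        self.cost_usd += PRICE_PER_1K_TOKENS[model] * tokens / 1000

tracker = SessionTracker()
with tracker.timed("extraction"):
    time.sleep(0.01)                    # stand-in for a model call
tracker.add_tokens("gemini", 12_000)    # cost_usd → 0.06
```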
50
+ ### 1.3 Directory Structure
51
+
52
+ ```
53
+ data/
54
+ ├── generate_synthetic_patients.py # Synthea FHIR → PatientProfile
55
+ ├── generate_noisy_pdfs.py # PatientProfile → Clinical PDFs
56
+ ├── synthea_config/
57
+ │   ├── synthea.properties           # Synthea configuration
58
+ │ └── modules/
59
+ │       └── lung_cancer_extended.json  # Extended NSCLC module (with biomarkers)
60
+ ├── templates/
61
+ │   ├── clinical_letter.py           # Clinical letter template
62
+ │   ├── pathology_report.py          # Pathology report template
63
+ │   ├── lab_report.py                # Lab report template
64
+ │   └── imaging_report.py            # Imaging report template
65
+ ├── noise/
66
+ │   └── noise_injector.py            # Noise injection engine
67
+ └── output/
68
+     ├── fhir/                        # Raw Synthea FHIR output
69
+     ├── profiles/                    # Converted PatientProfile JSON
70
+     ├── pdfs/                        # Generated clinical PDFs
71
+     └── ground_truth/                # Labeled ground-truth data
72
+
73
+ evaluation/
74
+ ├── run_trec_benchmark.py        # TREC retrieval evaluation
75
+ ├── extraction_eval.py           # MedGemma extraction F1
76
+ ├── criterion_eval.py            # Criterion Decision Accuracy
77
+ ├── latency_cost_tracker.py      # Latency and cost tracking
78
+ ├── trec_data/
79
+ │ ├── topics2021.xml # TREC 2021 topics
80
+ │ ├── qrels2021.txt # TREC 2021 relevance judgments
81
+ │ └── topics2022.xml # TREC 2022 topics
82
+ └── reports/                     # Evaluation report output
83
+
84
+ tests/
85
+ ├── test_synthea_data.py         # Synthea data validation
86
+ ├── test_pdf_generation.py       # PDF generation correctness
87
+ ├── test_noise_injection.py      # Noise injection effects
88
+ ├── test_trec_evaluation.py      # TREC evaluation computation
89
+ ├── test_extraction_f1.py        # F1 computation tests
90
+ ├── test_latency_cost.py         # Latency/cost tests
91
+ └── test_e2e_pipeline.py         # End-to-end pipeline test
92
+ ```
93
+
94
+ ---
95
+
96
+ ## 2. Synthea Synthetic Patient Generation Guide
97
+
98
+ ### 2.1 Synthea Overview
99
+
100
+ Synthea is an open-source synthetic patient simulator developed by MITRE and implemented in Java. It simulates disease trajectories through JSON state-machine modules and exports standard FHIR R4 Bundles.
101
+
102
+ **Key features (source: DeepWiki, synthetichealth/synthea):**
103
+ - Module-based disease simulation: each disease is defined as a JSON state machine
104
+ - Supports FHIR R4/STU3/DSTU2 export
105
+ - Built-in `lung_cancer.json` module with an 85% NSCLC / 15% SCLC split
106
+ - Supports Stage I-IV staging plus chemotherapy/radiation treatment pathways
107
+ - **Does not include NSCLC-specific biomarkers (EGFR, ALK, PD-L1, KRAS, ROS1); a custom extension is required**
108
+
109
+ ### 2.2 Installation and Configuration
110
+
111
+ **System requirements:**
112
+ - Java JDK 11 or later (LTS 11 or 17 recommended)
113
+
114
+ **Installation option A: use the prebuilt JAR (recommended for data generation)**
115
+ ```bash
116
+ # Download the latest release JAR
117
+ # available from https://github.com/synthetichealth/synthea/releases
118
+ wget https://github.com/synthetichealth/synthea/releases/download/master-branch-latest/synthea-with-dependencies.jar
119
+
120
+ # Verify the installation
121
+ java -jar synthea-with-dependencies.jar --help
122
+ ```
123
+
124
+ **Installation option B: build from source (required when customizing modules)**
125
+ ```bash
126
+ git clone https://github.com/synthetichealth/synthea.git
127
+ cd synthea
128
+ ./gradlew build check test
129
+ ```
130
+
131
+ ### 2.3 NSCLC Module Configuration
132
+
133
+ #### 2.3.1 Analysis of the Existing lung_cancer Module
134
+
135
+ Source: DeepWiki analysis of the `lung_cancer.json` module in `synthetichealth/synthea`:
136
+
137
+ - **Entry criteria**: ages 45-65, entered probabilistically
138
+ - **Diagnostic pathway**: symptoms (cough, hemoptysis, shortness of breath) → chest X-ray → chest CT → biopsy/cytology
139
+ - **Subtyping**: 85% NSCLC, 15% SCLC
140
+ - **Staging**: Stage I-IV, driven by `lung_cancer_nondiagnosis_counter`
141
+ - **Treatment**: NSCLC receives Cisplatin + Paclitaxel → radiation
142
+
143
+ #### 2.3.2 Custom NSCLC Biomarker Extension Module
144
+
145
+ Because the stock module lacks biomarkers such as EGFR/ALK/PD-L1, an extension submodule must be created.
146
+
147
+ **File: `data/synthea_config/modules/lung_cancer_biomarkers.json`**
148
+
149
+ Based on DeepWiki research into Synthea module state types, the available state types include:
150
+ - `Initial`: module entry point
151
+ - `Terminal`: module exit point
152
+ - `Observation`: records a clinical observation value (used for biomarkers)
153
+ - `SetAttribute`: sets a patient attribute
154
+ - `Guard`: conditional gate
155
+ - `Simple`: simple transition state
156
+ - `Encounter`: encounter state
157
+
158
+ Example structure of a biomarker Observation state:
159
+ ```json
160
+ {
161
+ "name": "NSCLC Biomarker Panel",
162
+ "states": {
163
+ "Initial": {
164
+ "type": "Initial",
165
+ "conditional_transition": [
166
+ {
167
+ "condition": {
168
+ "condition_type": "Attribute",
169
+ "attribute": "Lung Cancer Type",
170
+ "operator": "==",
171
+ "value": "NSCLC"
172
+ },
173
+ "transition": "EGFR_Test_Encounter"
174
+ },
175
+ {
176
+ "transition": "Terminal"
177
+ }
178
+ ]
179
+ },
180
+ "EGFR_Test_Encounter": {
181
+ "type": "Encounter",
182
+ "encounter_class": "ambulatory",
183
+ "codes": [
184
+ {
185
+ "system": "SNOMED-CT",
186
+ "code": "185349003",
187
+ "display": "Encounter for check up"
188
+ }
189
+ ],
190
+ "direct_transition": "EGFR_Mutation_Status"
191
+ },
192
+ "EGFR_Mutation_Status": {
193
+ "type": "Observation",
194
+ "category": "laboratory",
195
+ "codes": [
196
+ {
197
+ "system": "LOINC",
198
+ "code": "41103-3",
199
+ "display": "EGFR gene mutations found"
200
+ }
201
+ ],
202
+ "distributed_transition": [
203
+ {
204
+ "distribution": 0.15,
205
+ "transition": "EGFR_Positive"
206
+ },
207
+ {
208
+ "distribution": 0.85,
209
+ "transition": "EGFR_Negative"
210
+ }
211
+ ]
212
+ },
213
+ "EGFR_Positive": {
214
+ "type": "SetAttribute",
215
+ "attribute": "egfr_status",
216
+ "value": "positive",
217
+ "direct_transition": "ALK_Rearrangement_Status"
218
+ },
219
+ "EGFR_Negative": {
220
+ "type": "SetAttribute",
221
+ "attribute": "egfr_status",
222
+ "value": "negative",
223
+ "direct_transition": "ALK_Rearrangement_Status"
224
+ },
225
+ "ALK_Rearrangement_Status": {
226
+ "type": "Observation",
227
+ "category": "laboratory",
228
+ "codes": [
229
+ {
230
+ "system": "LOINC",
231
+ "code": "46264-8",
232
+ "display": "ALK gene rearrangement"
233
+ }
234
+ ],
235
+ "distributed_transition": [
236
+ {
237
+ "distribution": 0.05,
238
+ "transition": "ALK_Positive"
239
+ },
240
+ {
241
+ "distribution": 0.95,
242
+ "transition": "ALK_Negative"
243
+ }
244
+ ]
245
+ },
246
+ "ALK_Positive": {
247
+ "type": "SetAttribute",
248
+ "attribute": "alk_status",
249
+ "value": "positive",
250
+ "direct_transition": "PDL1_Expression"
251
+ },
252
+ "ALK_Negative": {
253
+ "type": "SetAttribute",
254
+ "attribute": "alk_status",
255
+ "value": "negative",
256
+ "direct_transition": "PDL1_Expression"
257
+ },
258
+ "PDL1_Expression": {
259
+ "type": "Observation",
260
+ "category": "laboratory",
261
+ "codes": [
262
+ {
263
+ "system": "LOINC",
264
+ "code": "85147-0",
265
+ "display": "PD-L1 by immune stain"
266
+ }
267
+ ],
268
+ "distributed_transition": [
269
+ {
270
+ "distribution": 0.30,
271
+ "transition": "PDL1_High"
272
+ },
273
+ {
274
+ "distribution": 0.35,
275
+ "transition": "PDL1_Low"
276
+ },
277
+ {
278
+ "distribution": 0.35,
279
+ "transition": "PDL1_Negative"
280
+ }
281
+ ]
282
+ },
283
+ "PDL1_High": {
284
+ "type": "SetAttribute",
285
+ "attribute": "pdl1_tps",
286
+ "value": ">=50%",
287
+ "direct_transition": "KRAS_Mutation_Status"
288
+ },
289
+ "PDL1_Low": {
290
+ "type": "SetAttribute",
291
+ "attribute": "pdl1_tps",
292
+ "value": "1-49%",
293
+ "direct_transition": "KRAS_Mutation_Status"
294
+ },
295
+ "PDL1_Negative": {
296
+ "type": "SetAttribute",
297
+ "attribute": "pdl1_tps",
298
+ "value": "<1%",
299
+ "direct_transition": "KRAS_Mutation_Status"
300
+ },
301
+ "KRAS_Mutation_Status": {
302
+ "type": "Observation",
303
+ "category": "laboratory",
304
+ "codes": [
305
+ {
306
+ "system": "LOINC",
307
+ "code": "21717-3",
308
+ "display": "KRAS gene mutations found"
309
+ }
310
+ ],
311
+ "distributed_transition": [
312
+ {
313
+ "distribution": 0.25,
314
+ "transition": "KRAS_Positive"
315
+ },
316
+ {
317
+ "distribution": 0.75,
318
+ "transition": "KRAS_Negative"
319
+ }
320
+ ]
321
+ },
322
+ "KRAS_Positive": {
323
+ "type": "SetAttribute",
324
+ "attribute": "kras_status",
325
+ "value": "positive",
326
+ "direct_transition": "Terminal"
327
+ },
328
+ "KRAS_Negative": {
329
+ "type": "SetAttribute",
330
+ "attribute": "kras_status",
331
+ "value": "negative",
332
+ "direct_transition": "Terminal"
333
+ },
334
+ "Terminal": {
335
+ "type": "Terminal"
336
+ }
337
+ }
338
+ }
339
+ ```
340
+
341
+ **Biomarker prevalence distribution (based on the NSCLC literature):**
342
+
343
+ | Biomarker | Positive rate | LOINC Code | Notes |
344
+ |-----------|--------|------------|------|
345
+ | EGFR mutation | ~15% | 41103-3 | Higher in never-smoking Asian women |
346
+ | ALK rearrangement | ~5% | 46264-8 | More common in young never-smokers |
347
+ | PD-L1 TPS>=50% | ~30% | 85147-0 | Eligibility threshold for immunotherapy |
348
+ | KRAS G12C | ~13% | 21717-3 | Targeted by Sotorasib |
349
+ | ROS1 fusion | ~1-2% | 46265-5 | Targeted by Crizotinib |
350
+
351
+ ### 2.4 Batch Generation Command
352
+
353
+ ```bash
354
+ # Generate 500 NSCLC patients with a fixed seed for reproducibility
355
+ java -jar synthea-with-dependencies.jar \
356
+ -p 500 \
357
+ -s 42 \
358
+ -m lung_cancer \
359
+ --exporter.fhir.export=true \
360
+ --exporter.fhir_stu3.export=false \
361
+ --exporter.fhir_dstu2.export=false \
362
+ --exporter.ccda.export=false \
363
+ --exporter.csv.export=false \
364
+ --exporter.hospital.fhir.export=false \
365
+ --exporter.practitioner.fhir.export=false \
366
+ --exporter.pretty_print=true \
367
+ Massachusetts
368
+
369
+ # Parameter notes:
370
+ # -p 500          : generate 500 patients
371
+ # -s 42           : random seed (reproducible)
372
+ # -m lung_cancer  : run only the lung_cancer module
373
+ # --exporter.fhir.export=true : enable FHIR R4 export
374
+ # Massachusetts   : generation region
375
+ ```
376
+
377
+ **Output location:** one JSON file per patient under `./output/fhir/`.
378
+
379
+ ### 2.5 FHIR Bundle Output Format
380
+
381
+ Source: DeepWiki analysis of the FHIR export system in `synthetichealth/synthea`.
382
+
383
+ **Top-level structure:**
384
+ ```json
385
+ {
386
+ "resourceType": "Bundle",
387
+ "type": "transaction",
388
+ "entry": [
389
+ {
390
+ "fullUrl": "urn:uuid:patient-uuid-here",
391
+ "resource": { "resourceType": "Patient", ... },
392
+ "request": { "method": "POST", "url": "Patient" }
393
+ },
394
+ {
395
+ "fullUrl": "urn:uuid:condition-uuid-here",
396
+ "resource": { "resourceType": "Condition", ... },
397
+ "request": { "method": "POST", "url": "Condition" }
398
+ }
399
+ ]
400
+ }
401
+ ```
402
+
403
+ **FHIR resource types generated by Synthea (confirmed via DeepWiki):**
404
+ - `Patient`: basic patient demographics
405
+ - `Condition`: diagnoses (e.g., NSCLC)
406
+ - `Observation`: lab results and vital signs
407
+ - `MedicationRequest`: medication orders
408
+ - `Procedure`: surgeries and interventions
409
+ - `DiagnosticReport`: diagnostic reports
410
+ - `DocumentReference`: clinical documents (requires US Core IG to be enabled)
411
+ - `Encounter`: encounter records
412
+ - `AllergyIntolerance`: allergy history
413
+ - `Immunization`: immunizations
414
+ - `CarePlan`: care plans
415
+ - `ImagingStudy`: imaging studies
416
+
417
+ ### 2.6 Mapping FHIR Resources to PatientProfile
418
+
419
+ ```python
420
+ # Mapping logic in data/generate_synthetic_patients.py
421
+
422
+ FHIR_TO_PATIENT_PROFILE_MAP = {
423
+ # Patient Resource → demographics
424
+ "Patient.name": "demographics.name",
425
+ "Patient.gender": "demographics.sex",
426
+ "Patient.birthDate": "demographics.date_of_birth",
427
+ "Patient.address.state": "demographics.state",
428
+
429
+ # Condition Resource → diagnosis
430
+ "Condition[code=SNOMED:254637007]": "diagnosis.primary", # NSCLC
431
+ "Condition.stage.summary": "diagnosis.stage",
432
+ "Condition.bodySite": "diagnosis.histology",
433
+
434
+ # Observation Resources → biomarkers
435
+ "Observation[code=LOINC:41103-3]": "biomarkers.egfr",
436
+ "Observation[code=LOINC:46264-8]": "biomarkers.alk",
437
+ "Observation[code=LOINC:85147-0]": "biomarkers.pdl1_tps",
438
+ "Observation[code=LOINC:21717-3]": "biomarkers.kras",
439
+
440
+ # Observation Resources → labs
441
+ "Observation[category=laboratory]": "labs[]",
442
+
443
+ # MedicationRequest → prior_treatments
444
+ "MedicationRequest.medicationCodeableConcept": "treatments[].medication",
445
+
446
+ # Procedure → prior_treatments
447
+ "Procedure.code": "treatments[].procedure",
448
+ }
449
+ ```
450
+
451
+ **Conversion function pattern:**
452
+ ```python
453
+ import json
454
+ from pathlib import Path
455
+ from dataclasses import dataclass, field, asdict
456
+ from typing import Optional
457
+
458
+ @dataclass
459
+ class Demographics:
460
+ name: str = ""
461
+ sex: str = ""
462
+ date_of_birth: str = ""
463
+ age: int = 0
464
+ state: str = ""
465
+
466
+ @dataclass
467
+ class Diagnosis:
468
+ primary: str = ""
469
+ stage: str = ""
470
+ histology: str = ""
471
+ diagnosis_date: str = ""
472
+
473
+ @dataclass
474
+ class Biomarkers:
475
+ egfr: Optional[str] = None
476
+ alk: Optional[str] = None
477
+ pdl1_tps: Optional[str] = None
478
+ kras: Optional[str] = None
479
+ ros1: Optional[str] = None
480
+
481
+ @dataclass
482
+ class LabResult:
483
+ name: str = ""
484
+ value: float = 0.0
485
+ unit: str = ""
486
+ date: str = ""
487
+ loinc_code: str = ""
488
+
489
+ @dataclass
490
+ class Treatment:
491
+ name: str = ""
492
+ type: str = "" # "medication" | "procedure" | "radiation"
493
+ start_date: str = ""
494
+ end_date: Optional[str] = None
495
+
496
+ @dataclass
497
+ class PatientProfile:
498
+ patient_id: str = ""
499
+ demographics: Demographics = field(default_factory=Demographics)
500
+ diagnosis: Diagnosis = field(default_factory=Diagnosis)
501
+ biomarkers: Biomarkers = field(default_factory=Biomarkers)
502
+ labs: list[LabResult] = field(default_factory=list)
503
+ treatments: list[Treatment] = field(default_factory=list)
504
+ unknowns: list[str] = field(default_factory=list)
505
+ evidence_spans: list[dict] = field(default_factory=list)
506
+
507
+
508
+ def parse_fhir_bundle(fhir_path: Path) -> PatientProfile:
509
+ """Parse a Synthea FHIR Bundle JSON into PatientProfile."""
510
+ with open(fhir_path) as f:
511
+ bundle = json.load(f)
512
+
513
+ profile = PatientProfile()
514
+ entries = bundle.get("entry", [])
515
+
516
+ for entry in entries:
517
+ resource = entry.get("resource", {})
518
+ resource_type = resource.get("resourceType")
519
+
520
+ if resource_type == "Patient":
521
+ _parse_patient(resource, profile)
522
+ elif resource_type == "Condition":
523
+ _parse_condition(resource, profile)
524
+ elif resource_type == "Observation":
525
+ _parse_observation(resource, profile)
526
+ elif resource_type == "MedicationRequest":
527
+ _parse_medication(resource, profile)
528
+ elif resource_type == "Procedure":
529
+ _parse_procedure(resource, profile)
530
+
531
+ return profile
532
+
533
+
534
+ def _parse_patient(resource: dict, profile: PatientProfile):
535
+ """Extract demographics from Patient resource."""
536
+ names = resource.get("name", [{}])
537
+ if names:
538
+ given = " ".join(names[0].get("given", []))
539
+ family = names[0].get("family", "")
540
+ profile.demographics.name = f"{given} {family}".strip()
541
+
542
+ profile.demographics.sex = resource.get("gender", "")
543
+ profile.demographics.date_of_birth = resource.get("birthDate", "")
544
+ profile.patient_id = resource.get("id", "")
545
+
546
+ addresses = resource.get("address", [{}])
547
+ if addresses:
548
+ profile.demographics.state = addresses[0].get("state", "")
549
+
550
+
551
+ def _parse_condition(resource: dict, profile: PatientProfile):
552
+ """Extract diagnosis from Condition resource."""
553
+ code = resource.get("code", {})
554
+ codings = code.get("coding", [])
555
+ for coding in codings:
556
+ # SNOMED codes for lung cancer
557
+ if coding.get("code") in ["254637007", "254632001"]:
558
+ profile.diagnosis.primary = coding.get("display", "")
559
+ onset = resource.get("onsetDateTime", "")
560
+ profile.diagnosis.diagnosis_date = onset
561
+ # Extract stage if available
562
+ stage_info = resource.get("stage", [])
563
+ if stage_info:
564
+ summary = stage_info[0].get("summary", {})
565
+ stage_codings = summary.get("coding", [])
566
+ if stage_codings:
567
+ profile.diagnosis.stage = stage_codings[0].get("display", "")
568
+
569
+
570
+ def _parse_observation(resource: dict, profile: PatientProfile):
571
+ """Extract labs and biomarkers from Observation resource."""
572
+ code = resource.get("code", {})
573
+ codings = code.get("coding", [])
574
+ category_list = resource.get("category", [])
575
+ is_lab = any(
576
+ cat_coding.get("code") == "laboratory"
577
+ for cat in category_list
578
+ for cat_coding in cat.get("coding", [])
579
+ )
580
+
581
+ for coding in codings:
582
+ loinc = coding.get("code", "")
583
+ display = coding.get("display", "")
584
+
585
+ # Biomarker mappings
586
+ biomarker_map = {
587
+ "41103-3": "egfr",
588
+ "46264-8": "alk",
589
+ "85147-0": "pdl1_tps",
590
+ "21717-3": "kras",
591
+ "46265-5": "ros1",
592
+ }
593
+
594
+ if loinc in biomarker_map:
595
+ value_cc = resource.get("valueCodeableConcept", {})
596
+ value_codings = value_cc.get("coding", [])
597
+ value_str = value_codings[0].get("display", "") if value_codings else ""
598
+ setattr(profile.biomarkers, biomarker_map[loinc], value_str)
599
+ elif is_lab:
600
+ value_qty = resource.get("valueQuantity", {})
601
+ lab = LabResult(
602
+ name=display,
603
+ value=value_qty.get("value", 0.0),
604
+ unit=value_qty.get("unit", ""),
605
+ date=resource.get("effectiveDateTime", ""),
606
+ loinc_code=loinc,
607
+ )
608
+ profile.labs.append(lab)
609
+ ```
610
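A bundle-level driver can dispatch each entry to the `_parse_*` helpers above by `resourceType`. A minimal self-contained sketch of that dispatch pattern (the `parse_bundle` name and the stub handlers are illustrative, not part of the extractor):

```python
# Hypothetical driver: dispatch FHIR Bundle entries by resourceType.
# The handler table mirrors _parse_patient/_parse_condition/_parse_observation;
# the stubs here just count resource types so the sketch runs standalone.
from collections import Counter


def parse_bundle(bundle: dict, handlers: dict) -> None:
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        handler = handlers.get(resource.get("resourceType"))
        if handler:
            handler(resource)


seen = Counter()
handlers = {
    "Patient": lambda r: seen.update(["Patient"]),
    "Condition": lambda r: seen.update(["Condition"]),
    "Observation": lambda r: seen.update(["Observation"]),
}

bundle = {"entry": [
    {"resource": {"resourceType": "Patient", "id": "p1"}},
    {"resource": {"resourceType": "Observation", "code": {}}},
    {"resource": {"resourceType": "MedicationRequest"}},  # unhandled -> skipped
]}
parse_bundle(bundle, handlers)
print(dict(seen))  # {'Patient': 1, 'Observation': 1}
```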

---

## 3. Synthetic PDF Generation Pipeline

### 3.1 Overview

Goal: convert a `PatientProfile` into realistic clinical-document PDFs, injecting controlled noise to simulate real-world OCR conditions.

**Tech stack:**
- **ReportLab** (`pip install reportlab`) — PDF generation engine supporting `SimpleDocTemplate`, `Table`, `Paragraph`, and other Platypus flowables
- **Augraphy** (`pip install augraphy`) — document-image degradation pipeline simulating print, fax, and scan noise
- **Pillow** (`pip install Pillow`) — image processing
- **pdf2image** (`pip install pdf2image`) — PDF-to-image conversion (pages are rasterized for noise injection, then converted back to PDF)
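The end-to-end flow is: render a clean PDF with ReportLab, rasterize it with pdf2image, degrade the page images with Augraphy, then reassemble a noisy PDF. A runnable sketch of that staging, with stub functions standing in for the library calls so it executes without any of them installed:

```python
# Stubbed document-synthesis stages; each stub stands in for a library call
# (ReportLab render, pdf2image rasterize, Augraphy degrade, PDF reassembly).
def render_pdf(profile):
    return {"profile": profile, "stage": "pdf"}

def rasterize(pdf):
    return {**pdf, "stage": "image"}

def degrade(image):
    return {**image, "stage": "degraded"}

def reassemble(image):
    return {**image, "stage": "noisy_pdf"}


def synth_document(profile: dict) -> dict:
    """PatientProfile -> clean PDF -> page images -> degraded images -> noisy PDF."""
    return reassemble(degrade(rasterize(render_pdf(profile))))


doc = synth_document({"patient_id": "synth-001"})
print(doc["stage"])  # noisy_pdf
```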

### 3.2 Clinical Letter Template

```python
# data/templates/clinical_letter.py
from reportlab.lib.pagesizes import letter
from reportlab.lib.units import inch
from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
from reportlab.platypus import (
    SimpleDocTemplate, Paragraph, Spacer, Table, TableStyle
)
from reportlab.lib import colors


def generate_clinical_letter(profile: dict, output_path: str):
    """Generate a clinical letter PDF from PatientProfile."""
    doc = SimpleDocTemplate(output_path, pagesize=letter,
                            topMargin=1*inch, bottomMargin=1*inch)
    styles = getSampleStyleSheet()
    story = []

    # Header
    header_style = ParagraphStyle(
        'Header', parent=styles['Heading1'], fontSize=14,
        spaceAfter=6
    )
    story.append(Paragraph("Clinical Summary Letter", header_style))
    story.append(Spacer(1, 12))

    # Patient Info
    info_data = [
        ["Patient Name:", profile["demographics"]["name"]],
        ["Date of Birth:", profile["demographics"]["date_of_birth"]],
        ["Sex:", profile["demographics"]["sex"]],
        ["MRN:", profile["patient_id"]],
    ]
    info_table = Table(info_data, colWidths=[2*inch, 4*inch])
    info_table.setStyle(TableStyle([
        ('FONTNAME', (0, 0), (0, -1), 'Helvetica-Bold'),
        ('FONTNAME', (1, 0), (1, -1), 'Helvetica'),
        ('FONTSIZE', (0, 0), (-1, -1), 10),
        ('VALIGN', (0, 0), (-1, -1), 'TOP'),
    ]))
    story.append(info_table)
    story.append(Spacer(1, 18))

    # Diagnosis Section
    story.append(Paragraph("Diagnosis", styles['Heading2']))
    dx = profile.get("diagnosis", {})
    dx_text = (
        f"Primary: {dx.get('primary', 'Unknown')}. "
        f"Stage: {dx.get('stage', 'Unknown')}. "
        f"Histology: {dx.get('histology', 'Unknown')}. "
        f"Diagnosed: {dx.get('diagnosis_date', 'Unknown')}."
    )
    story.append(Paragraph(dx_text, styles['Normal']))
    story.append(Spacer(1, 12))

    # Biomarkers Section
    story.append(Paragraph("Molecular Testing", styles['Heading2']))
    bm = profile.get("biomarkers", {})
    bm_data = [["Biomarker", "Result"]]
    for marker, value in bm.items():
        if value is not None:
            bm_data.append([marker.upper(), str(value)])
    if len(bm_data) > 1:
        bm_table = Table(bm_data, colWidths=[2.5*inch, 3.5*inch])
        bm_table.setStyle(TableStyle([
            ('BACKGROUND', (0, 0), (-1, 0), colors.lightgrey),
            ('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'),
            ('GRID', (0, 0), (-1, -1), 0.5, colors.grey),
            ('FONTSIZE', (0, 0), (-1, -1), 10),
        ]))
        story.append(bm_table)
    story.append(Spacer(1, 12))

    # Treatment History
    story.append(Paragraph("Treatment History", styles['Heading2']))
    treatments = profile.get("treatments", [])
    for tx in treatments:
        tx_text = f"- {tx['name']} ({tx['type']}): {tx.get('start_date', '')}"
        story.append(Paragraph(tx_text, styles['Normal']))

    doc.build(story)
```
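The templates read a plain-dict profile. A minimal illustrative profile showing the keys `generate_clinical_letter` expects (all values are made up):

```python
# Minimal illustrative profile dict consumed by the templates above.
sample_profile = {
    "patient_id": "synth-001",
    "demographics": {"name": "John Smith", "date_of_birth": "1962-03-14",
                     "sex": "male"},
    "diagnosis": {"primary": "Non-small cell lung cancer", "stage": "Stage IIIA",
                  "histology": "adenocarcinoma", "diagnosis_date": "2023-06-01"},
    "biomarkers": {"egfr": "Exon 19 deletion", "alk": None, "pdl1_tps": "60%"},
    "treatments": [{"name": "Carboplatin/Pemetrexed", "type": "chemotherapy",
                    "start_date": "2023-07-01"}],
    "labs": [],
}

# Only non-None biomarkers are rendered in the molecular-testing table:
rendered = [k.upper() for k, v in sample_profile["biomarkers"].items()
            if v is not None]
print(rendered)  # ['EGFR', 'PDL1_TPS']
```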

### 3.3 Pathology Report Template

```python
# data/templates/pathology_report.py
from reportlab.lib.pagesizes import letter
from reportlab.lib.units import inch
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, Table


def generate_pathology_report(profile: dict, output_path: str):
    """Generate a pathology report PDF."""
    doc = SimpleDocTemplate(output_path, pagesize=letter)
    styles = getSampleStyleSheet()
    story = []

    story.append(Paragraph("SURGICAL PATHOLOGY REPORT", styles['Title']))
    story.append(Spacer(1, 12))

    # Specimen Info
    spec_data = [
        ["Specimen:", "Right lung, upper lobe, wedge resection"],
        ["Procedure:", "CT-guided needle biopsy"],
        ["Date:", profile["diagnosis"]["diagnosis_date"]],
    ]
    spec_table = Table(spec_data, colWidths=[2*inch, 4*inch])
    story.append(spec_table)
    story.append(Spacer(1, 12))

    # Final Diagnosis
    story.append(Paragraph("FINAL DIAGNOSIS", styles['Heading2']))
    story.append(Paragraph(
        f"Non-small cell lung carcinoma, {profile['diagnosis'].get('histology', 'adenocarcinoma')}, "
        f"{profile['diagnosis'].get('stage', 'Stage IIIA')}",
        styles['Normal']
    ))

    # Biomarker Results
    story.append(Spacer(1, 12))
    story.append(Paragraph("MOLECULAR/IMMUNOHISTOCHEMISTRY", styles['Heading2']))
    bm = profile.get("biomarkers", {})
    results = []
    if bm.get("egfr"):
        results.append(f"EGFR mutation analysis: {bm['egfr']}")
    if bm.get("alk"):
        results.append(f"ALK rearrangement (FISH): {bm['alk']}")
    if bm.get("pdl1_tps"):
        results.append(f"PD-L1 (22C3, TPS): {bm['pdl1_tps']}")
    if bm.get("kras"):
        results.append(f"KRAS mutation analysis: {bm['kras']}")
    for r in results:
        story.append(Paragraph(r, styles['Normal']))

    doc.build(story)
```

### 3.4 Lab Report Template

```python
# data/templates/lab_report.py
from reportlab.lib import colors
from reportlab.lib.pagesizes import letter
from reportlab.lib.units import inch
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, Table, TableStyle


def generate_lab_report(profile: dict, output_path: str):
    """Generate a laboratory report PDF with CBC, CMP, etc."""
    doc = SimpleDocTemplate(output_path, pagesize=letter)
    styles = getSampleStyleSheet()
    story = []

    story.append(Paragraph("LABORATORY REPORT", styles['Title']))
    story.append(Spacer(1, 12))

    # Lab Results Table
    lab_data = [["Test", "Result", "Unit", "Reference Range", "Date"]]
    for lab in profile.get("labs", []):
        lab_data.append([
            lab["name"], str(lab["value"]), lab["unit"],
            "",  # Reference range (can be added)
            lab["date"][:10] if lab["date"] else ""
        ])

    if len(lab_data) > 1:
        lab_table = Table(lab_data, colWidths=[2*inch, 1*inch, 0.8*inch, 1.2*inch, 1*inch])
        lab_table.setStyle(TableStyle([
            ('BACKGROUND', (0, 0), (-1, 0), colors.HexColor('#003366')),
            ('TEXTCOLOR', (0, 0), (-1, 0), colors.white),
            ('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'),
            ('GRID', (0, 0), (-1, -1), 0.5, colors.grey),
            ('FONTSIZE', (0, 0), (-1, -1), 9),
            ('ROWBACKGROUNDS', (0, 1), (-1, -1), [colors.white, colors.HexColor('#f0f0f0')]),
        ]))
        story.append(lab_table)

    doc.build(story)
```

### 3.5 Noise Injection Strategy

```python
# data/noise/noise_injector.py
import random
import re

from PIL import Image

# Augraphy pipeline configuration (optional dependency)
try:
    from augraphy import (
        AugraphyPipeline, InkBleed, Letterpress, LowInkPeriodicLines,
        DirtyDrum, SubtleNoise, Jpeg, Brightness
    )
    AUGRAPHY_AVAILABLE = True
except ImportError:
    AUGRAPHY_AVAILABLE = False


class NoiseInjector:
    """Controlled noise-injection engine simulating real-world document degradation."""

    # Common OCR confusion mappings (single- and multi-character)
    OCR_ERROR_MAP = {
        "0": ["O", "o", "Q"],
        "1": ["l", "I", "|"],
        "5": ["S", "s"],
        "8": ["B"],
        "O": ["0", "Q"],
        "l": ["1", "I", "|"],
        "rn": ["m"],
        "cl": ["d"],
        "vv": ["w"],
    }

    # Medical abbreviation substitutions
    ABBREVIATION_MAP = {
        "non-small cell lung cancer": ["NSCLC", "non-small cell ca", "NSCC"],
        "adenocarcinoma": ["adeno", "adenoca", "adeno ca"],
        "squamous cell carcinoma": ["SCC", "squamous ca", "sq cell ca"],
        "Eastern Cooperative Oncology Group": ["ECOG"],
        "performance status": ["PS", "perf status"],
        "milligrams per deciliter": ["mg/dL", "mg/dl"],
        "computed tomography": ["CT", "cat scan"],
    }

    # Noise level configuration
    NOISE_LEVELS = {
        "clean": {"ocr_rate": 0.0, "abbrev_rate": 0.0, "missing_rate": 0.0},
        "mild": {"ocr_rate": 0.02, "abbrev_rate": 0.1, "missing_rate": 0.05},
        "moderate": {"ocr_rate": 0.05, "abbrev_rate": 0.2, "missing_rate": 0.1},
        "severe": {"ocr_rate": 0.10, "abbrev_rate": 0.3, "missing_rate": 0.2},
    }

    def __init__(self, noise_level: str = "mild", seed: int = 42):
        self.config = self.NOISE_LEVELS[noise_level]
        self.rng = random.Random(seed)

    def inject_text_noise(self, text: str) -> tuple[str, list[dict]]:
        """Inject OCR errors and abbreviations into text.

        Returns (noisy_text, list_of_injected_noise_records).
        """
        noise_records = []
        chars = list(text)

        # OCR substitutions; try two-character confusions (e.g. "rn" -> "m")
        # before single characters so the multi-char map entries can fire
        i = 0
        while i < len(chars):
            if self.rng.random() < self.config["ocr_rate"]:
                pair = "".join(chars[i:i + 2])
                key = pair if pair in self.OCR_ERROR_MAP else chars[i]
                if key in self.OCR_ERROR_MAP:
                    replacement = self.rng.choice(self.OCR_ERROR_MAP[key])
                    chars[i:i + len(key)] = list(replacement)
                    noise_records.append({
                        "type": "ocr_error",
                        "position": i,
                        "original": key,
                        "replacement": replacement,
                    })
            i += 1

        noisy_text = "".join(chars)

        # Abbreviation substitutions (case-insensitive match)
        for full_form, abbreviations in self.ABBREVIATION_MAP.items():
            if full_form.lower() in noisy_text.lower() and self.rng.random() < self.config["abbrev_rate"]:
                abbrev = self.rng.choice(abbreviations)
                noisy_text = re.sub(
                    re.escape(full_form), abbrev, noisy_text, count=1, flags=re.IGNORECASE
                )
                noise_records.append({
                    "type": "abbreviation",
                    "original": full_form,
                    "replacement": abbrev,
                })

        return noisy_text, noise_records

    def inject_missing_values(self, profile: dict) -> tuple[dict, list[str]]:
        """Randomly remove fields from profile to simulate missing data.

        Returns (modified_profile, list_of_removed_fields).
        """
        removed = []
        removable_fields = [
            ("biomarkers", "egfr"),
            ("biomarkers", "alk"),
            ("biomarkers", "pdl1_tps"),
            ("biomarkers", "kras"),
            ("biomarkers", "ros1"),
            ("diagnosis", "stage"),
            ("diagnosis", "histology"),
        ]

        for section, field_name in removable_fields:
            if self.rng.random() < self.config["missing_rate"]:
                if section in profile and field_name in profile[section]:
                    profile[section][field_name] = None
                    removed.append(f"{section}.{field_name}")

        return profile, removed

    def degrade_image(self, image: Image.Image) -> Image.Image:
        """Apply Augraphy degradation pipeline to document image."""
        if not AUGRAPHY_AVAILABLE:
            return image

        import numpy as np
        img_array = np.array(image)

        pipeline = AugraphyPipeline(
            ink_phase=[
                InkBleed(p=0.5),
                Letterpress(p=0.3),
                LowInkPeriodicLines(p=0.3),
            ],
            paper_phase=[
                SubtleNoise(p=0.5),
            ],
            post_phase=[
                DirtyDrum(p=0.3),
                Brightness(p=0.5),
                Jpeg(p=0.5),
            ],
        )

        degraded = pipeline(img_array)
        return Image.fromarray(degraded)
```
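Because the injector draws from a seeded `random.Random`, noise is reproducible across runs, and the returned records double as ground truth for extraction scoring. A small self-contained illustration of the seeded-substitution idea (simplified, not the class above):

```python
import random

# Tiny subset of an OCR confusion map, for illustration only.
OCR_ERROR_MAP = {"0": ["O"], "1": ["l", "I"]}


def inject(text: str, rate: float, seed: int):
    """Substitute characters at the given rate; return text plus noise records."""
    rng = random.Random(seed)
    out, records = [], []
    for i, ch in enumerate(text):
        if ch in OCR_ERROR_MAP and rng.random() < rate:
            rep = rng.choice(OCR_ERROR_MAP[ch])
            records.append((i, ch, rep))
            out.append(rep)
        else:
            out.append(ch)
    return "".join(out), records


# Same seed -> identical noise, which keeps evaluation sets stable.
a = inject("WBC 10.1", rate=0.5, seed=7)
b = inject("WBC 10.1", rate=0.5, seed=7)
assert a == b
```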

---

## 4. TREC Benchmark Evaluation Guide

### 4.1 Dataset Overview

**TREC Clinical Trials Track 2021:**
- Source: NIST Text REtrieval Conference
- Topics (queries): 75 synthetic patient descriptions (5-10 sentence admission notes)
- Document collection: 376,000+ clinical trials (April 2021 snapshot of ClinicalTrials.gov)
- Qrels: 35,832 relevance judgments
- Relevance labels: 0 = not relevant, 1 = excluded, 2 = eligible

**TREC Clinical Trials Track 2022:**
- Topics: 50 synthetic patient descriptions
- Uses the same document collection snapshot

### 4.2 Data Formats

#### Topics XML format
```xml
<topics task="2021 TREC Clinical Trials">
  <topic number="1">
    A 62-year-old male presents with a 3-month history of
    progressive dyspnea and a 20-pound weight loss. He has
    a 40 pack-year smoking history. CT chest reveals a 4.5cm
    right upper lobe mass with mediastinal lymphadenopathy.
    Biopsy confirms non-small cell lung cancer, adenocarcinoma.
    EGFR mutation testing is positive for exon 19 deletion.
    PD-L1 TPS is 60%. ECOG performance status is 1.
  </topic>
  <topic number="2">
    ...
  </topic>
</topics>
```

#### Qrels format (tab-separated)
```
topic_id 0 doc_id relevance
1 0 NCT00760162 2
1 0 NCT01234567 1
1 0 NCT09876543 0
```
- Column 1: topic number
- Column 2: fixed value 0 (iteration)
- Column 3: NCT document ID
- Column 4: relevance (0 = not relevant, 1 = excluded, 2 = eligible)

#### Run submission format
```
TOPIC_NO Q0 NCT_ID RANK SCORE RUN_NAME
1 Q0 NCT00760162 1 0.9999 trialpath-v1
1 Q0 NCT01234567 2 0.9998 trialpath-v1
```
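Run files are easy to break with an extra column or a missing `Q0`, so a quick stdlib sanity check before submission is worthwhile (a hypothetical helper, not part of any TREC tooling):

```python
def validate_run_line(line: str) -> bool:
    """Check one TREC run line: TOPIC_NO Q0 NCT_ID RANK SCORE RUN_NAME."""
    parts = line.split()
    if len(parts) != 6 or parts[1] != "Q0":
        return False
    try:
        int(parts[3])    # rank must be an integer
        float(parts[4])  # score must be numeric
    except ValueError:
        return False
    return True


print(validate_run_line("1 Q0 NCT00760162 1 0.9999 trialpath-v1"))  # True
print(validate_run_line("1 NCT00760162 1 0.9999 trialpath-v1"))     # False
```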

### 4.3 Loading Data with ir_datasets

```python
# evaluation/run_trec_benchmark.py
import ir_datasets


def load_trec_2021():
    """Load TREC CT 2021 topics and qrels via ir_datasets."""
    dataset = ir_datasets.load("clinicaltrials/2021/trec-ct-2021")

    # Load topics (GenericQuery: query_id, text)
    topics = {}
    for query in dataset.queries_iter():
        topics[query.query_id] = query.text

    # Load qrels (TrecQrel: query_id, doc_id, relevance, iteration)
    qrels = {}
    for qrel in dataset.qrels_iter():
        if qrel.query_id not in qrels:
            qrels[qrel.query_id] = {}
        qrels[qrel.query_id][qrel.doc_id] = qrel.relevance

    return topics, qrels


def load_trec_2022():
    """Load TREC CT 2022 topics and qrels."""
    dataset = ir_datasets.load("clinicaltrials/2021/trec-ct-2022")

    topics = {q.query_id: q.text for q in dataset.queries_iter()}
    qrels = {}
    for qrel in dataset.qrels_iter():
        if qrel.query_id not in qrels:
            qrels[qrel.query_id] = {}
        qrels[qrel.query_id][qrel.doc_id] = qrel.relevance

    return topics, qrels


def load_trial_documents():
    """Load the clinical trial documents from ir_datasets."""
    dataset = ir_datasets.load("clinicaltrials/2021")
    # ClinicalTrialsDoc: doc_id, title, condition, summary,
    # detailed_description, eligibility
    docs = {}
    for doc in dataset.docs_iter():
        docs[doc.doc_id] = {
            "title": doc.title,
            "condition": doc.condition,
            "summary": doc.summary,
            "detailed_description": doc.detailed_description,
            "eligibility": doc.eligibility,
        }
    return docs
```

### 4.4 Mapping TrialPath Output to TREC Run Format

```python
def convert_trialpath_to_trec_run(
    results: dict[str, list[dict]],
    run_name: str = "trialpath-v1"
) -> str:
    """Convert TrialPath matching results to TREC run format.

    Args:
        results: {topic_id: [{"nct_id": str, "score": float}, ...]}
        run_name: Run identifier

    Returns:
        TREC-format run string
    """
    lines = []
    for topic_id, candidates in results.items():
        sorted_candidates = sorted(candidates, key=lambda x: x["score"], reverse=True)
        for rank, candidate in enumerate(sorted_candidates[:1000], 1):
            lines.append(
                f"{topic_id} Q0 {candidate['nct_id']} {rank} "
                f"{candidate['score']:.6f} {run_name}"
            )
    return "\n".join(lines)


def save_trec_run(run_str: str, output_path: str):
    """Save TREC run to file."""
    with open(output_path, 'w') as f:
        f.write(run_str)
```

### 4.5 Computing Evaluation Metrics with ir-measures

```python
# evaluation/run_trec_benchmark.py (continued)
import ir_measures
from ir_measures import nDCG, P, Recall, AP, RR


def evaluate_trec_run(
    qrels_path: str,
    run_path: str,
) -> dict:
    """Evaluate a TREC run using ir-measures.

    Target metrics:
    - Recall@50 >= 0.75
    - NDCG@10 >= 0.60
    - P@10 (informational)
    """
    qrels = list(ir_measures.read_trec_qrels(qrels_path))
    run = list(ir_measures.read_trec_run(run_path))

    # Define target measures
    measures = [
        nDCG@10,     # Target >= 0.60
        Recall@50,   # Target >= 0.75
        P@10,        # Precision at 10
        AP,          # Mean Average Precision
        RR,          # Reciprocal Rank
        nDCG@20,     # Additional depth
        Recall@100,  # Extended recall
    ]

    # Aggregate metrics
    aggregate = ir_measures.calc_aggregate(measures, qrels, run)

    # Per-query metrics
    per_query = {}
    for metric in ir_measures.iter_calc(measures, qrels, run):
        qid = metric.query_id
        if qid not in per_query:
            per_query[qid] = {}
        per_query[qid][str(metric.measure)] = metric.value

    return {
        "aggregate": {str(k): v for k, v in aggregate.items()},
        "per_query": per_query,
        "pass_fail": {
            "ndcg@10": aggregate.get(nDCG@10, 0) >= 0.60,
            "recall@50": aggregate.get(Recall@50, 0) >= 0.75,
        }
    }


def evaluate_with_eligibility_levels(
    qrels_path: str,
    run_path: str,
) -> dict:
    """Evaluate with TREC CT graded relevance (0=NR, 1=Excluded, 2=Eligible).

    Uses rel=2 for strict eligible-only evaluation.
    """
    qrels = list(ir_measures.read_trec_qrels(qrels_path))
    run = list(ir_measures.read_trec_run(run_path))

    # Standard evaluation (relevance >= 1)
    standard_measures = [nDCG@10, Recall@50, P@10]
    standard = ir_measures.calc_aggregate(standard_measures, qrels, run)

    # Strict evaluation (only eligible = relevance 2)
    strict_measures = [
        AP(rel=2),
        P(rel=2)@10,
        Recall(rel=2)@50,
    ]
    strict = ir_measures.calc_aggregate(strict_measures, qrels, run)

    return {
        "standard": {str(k): v for k, v in standard.items()},
        "strict_eligible_only": {str(k): v for k, v in strict.items()},
    }
```

### 4.6 Alternative In-Memory qrels/run Format with ir-measures

```python
def evaluate_from_dicts(
    qrels_dict: dict[str, dict[str, int]],
    run_dict: dict[str, list[tuple[str, float]]],
) -> dict:
    """Evaluate using Python dict format (no files needed).

    Args:
        qrels_dict: {query_id: {doc_id: relevance}}
        run_dict: {query_id: [(doc_id, score), ...]}
    """
    # Convert to ir-measures format
    qrels = [
        ir_measures.Qrel(qid, did, rel)
        for qid, docs in qrels_dict.items()
        for did, rel in docs.items()
    ]
    run = [
        ir_measures.ScoredDoc(qid, did, score)
        for qid, docs in run_dict.items()
        for did, score in docs
    ]

    measures = [nDCG@10, Recall@50, P@10, AP]
    aggregate = ir_measures.calc_aggregate(measures, qrels, run)
    return {str(k): v for k, v in aggregate.items()}
```

---

## 5. MedGemma Extraction Evaluation

### 5.1 Annotated Dataset Design

```python
# evaluation/extraction_eval.py
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnnotatedField:
    """A single annotated field with ground truth and extraction result."""
    field_name: str               # e.g., "biomarkers.egfr"
    ground_truth: Optional[str]   # From Synthea profile (gold standard)
    extracted: Optional[str]      # From MedGemma extraction
    evidence_span: Optional[str]  # Text span in source document
    source_page: Optional[int]    # Page number in PDF


@dataclass
class ExtractionAnnotation:
    """Complete annotation for one patient's extraction."""
    patient_id: str
    fields: list[AnnotatedField]
    noise_level: str    # "clean", "mild", "moderate", "severe"
    document_type: str  # "clinical_letter", "pathology_report", etc.
```

**Annotated dataset structure:**
```json
{
  "patient_id": "synth-001",
  "noise_level": "mild",
  "document_type": "clinical_letter",
  "fields": [
    {
      "field_name": "demographics.name",
      "ground_truth": "John Smith",
      "extracted": "John Smith",
      "correct": true
    },
    {
      "field_name": "diagnosis.stage",
      "ground_truth": "Stage IIIA",
      "extracted": "Stage 3A",
      "correct": true,
      "note": "Equivalent representation"
    },
    {
      "field_name": "biomarkers.egfr",
      "ground_truth": "Exon 19 deletion",
      "extracted": "EGFR positive",
      "correct": false,
      "note": "Partial extraction - missing specific mutation"
    }
  ]
}
```

### 5.2 Field-Level F1 Computation

```python
# evaluation/extraction_eval.py
from sklearn.metrics import (
    f1_score, precision_score, recall_score, classification_report
)
import numpy as np


# All extractable fields
EXTRACTION_FIELDS = [
    "demographics.name",
    "demographics.sex",
    "demographics.date_of_birth",
    "demographics.age",
    "diagnosis.primary",
    "diagnosis.stage",
    "diagnosis.histology",
    "biomarkers.egfr",
    "biomarkers.alk",
    "biomarkers.pdl1_tps",
    "biomarkers.kras",
    "biomarkers.ros1",
    "labs.wbc",
    "labs.hemoglobin",
    "labs.platelets",
    "labs.creatinine",
    "labs.alt",
    "labs.ast",
    "treatments.current_regimen",
    "performance_status.ecog",
]


def compute_field_level_f1(
    annotations: list[dict],
) -> dict:
    """Compute field-level F1, precision, recall.

    For each field:
    - TP: ground_truth exists AND extracted matches
    - FP: extracted exists BUT ground_truth is None or mismatch
    - FN: ground_truth exists BUT extracted is None or mismatch

    Args:
        annotations: List of patient annotation dicts

    Returns:
        Per-field and aggregate metrics
    """
    field_metrics = {}

    for field_name in EXTRACTION_FIELDS:
        y_true = []  # 1 if field has ground truth value
        y_pred = []  # 1 if field was correctly extracted

        for ann in annotations:
            fields = {f["field_name"]: f for f in ann["fields"]}
            if field_name in fields:
                f = fields[field_name]
                has_gt = f["ground_truth"] is not None
                is_correct = f.get("correct", False)

                y_true.append(1 if has_gt else 0)
                y_pred.append(1 if is_correct else 0)

        if len(y_true) > 0:
            precision = precision_score(y_true, y_pred, zero_division=0)
            recall = recall_score(y_true, y_pred, zero_division=0)
            f1 = f1_score(y_true, y_pred, zero_division=0)
            field_metrics[field_name] = {
                "precision": round(precision, 4),
                "recall": round(recall, 4),
                "f1": round(f1, 4),
                "support": sum(y_true),
            }

    # Aggregate metrics
    all_y_true = []
    all_y_pred = []
    for ann in annotations:
        for f in ann["fields"]:
            has_gt = f["ground_truth"] is not None
            is_correct = f.get("correct", False)
            all_y_true.append(1 if has_gt else 0)
            all_y_pred.append(1 if is_correct else 0)

    micro_f1 = f1_score(all_y_true, all_y_pred, zero_division=0)
    macro_f1 = np.mean([m["f1"] for m in field_metrics.values()])

    return {
        "per_field": field_metrics,
        "micro_f1": round(micro_f1, 4),
        "macro_f1": round(macro_f1, 4),
        "total_fields": len(all_y_true),
        "pass": micro_f1 >= 0.85,  # Target: F1 >= 0.85
    }


def compute_extraction_report(annotations: list[dict]) -> str:
    """Generate a scikit-learn classification_report style output."""
    all_y_true = []
    all_y_pred = []

    for field_name in EXTRACTION_FIELDS:
        for ann in annotations:
            fields = {f["field_name"]: f for f in ann["fields"]}
            if field_name in fields:
                f = fields[field_name]
                has_gt = f["ground_truth"] is not None
                is_correct = f.get("correct", False)
                all_y_true.append(1 if has_gt else 0)
                all_y_pred.append(1 if is_correct else 0)

    return classification_report(
        all_y_true, all_y_pred,
        target_names=["absent", "present/correct"],
        digits=4,
    )


def compare_with_baseline(
    medgemma_annotations: list[dict],
    gemini_only_annotations: list[dict],
) -> dict:
    """Compare MedGemma extraction vs Gemini-only baseline."""
    medgemma_metrics = compute_field_level_f1(medgemma_annotations)
    gemini_metrics = compute_field_level_f1(gemini_only_annotations)

    comparison = {}
    for field_name in EXTRACTION_FIELDS:
        mg = medgemma_metrics["per_field"].get(field_name, {})
        gm = gemini_metrics["per_field"].get(field_name, {})
        comparison[field_name] = {
            "medgemma_f1": mg.get("f1", 0),
            "gemini_f1": gm.get("f1", 0),
            "delta": round(mg.get("f1", 0) - gm.get("f1", 0), 4),
        }

    return {
        "per_field_comparison": comparison,
        "medgemma_overall_f1": medgemma_metrics["micro_f1"],
        "gemini_overall_f1": gemini_metrics["micro_f1"],
        "improvement": round(
            medgemma_metrics["micro_f1"] - gemini_metrics["micro_f1"], 4
        ),
    }
```
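The code above distinguishes micro-F1 (pooling every field decision) from macro-F1 (averaging per-field F1, so rare biomarker fields count as much as common demographics). A toy contrast with hand-picked (tp, fp, fn) counts, in pure Python:

```python
def f1(tp, fp, fn):
    """Standard F1 from raw counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0


# Field A: common and well extracted; field B: rare and poorly extracted.
fields = {"A": (90, 5, 5), "B": (1, 4, 5)}

# Micro pools all counts; macro averages per-field scores.
micro = f1(*map(sum, zip(*fields.values())))
macro = sum(f1(*c) for c in fields.values()) / len(fields)
print(round(micro, 3), round(macro, 3))  # 0.905 0.565
```

The gap shows why both are reported: micro-F1 hides a collapse on rare fields, macro-F1 surfaces it.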

### 5.3 Impact of Noise Level on Extraction Performance

```python
def analyze_noise_impact(annotations: list[dict]) -> dict:
    """Analyze how noise level affects extraction F1."""
    by_noise = {}
    for ann in annotations:
        level = ann["noise_level"]
        if level not in by_noise:
            by_noise[level] = []
        by_noise[level].append(ann)

    results = {}
    for level, level_anns in by_noise.items():
        metrics = compute_field_level_f1(level_anns)
        results[level] = {
            "micro_f1": metrics["micro_f1"],
            "macro_f1": metrics["macro_f1"],
            "n_patients": len(level_anns),
        }

    return results
```

---

## 6. End-to-End Evaluation Pipeline

### 6.1 Criterion Decision Accuracy

```python
# evaluation/criterion_eval.py

def compute_criterion_accuracy(
    predictions: list[dict],
    ground_truth: list[dict],
) -> dict:
    """Compute criterion-level decision accuracy.

    Each prediction/ground_truth entry:
        {
            "patient_id": str,
            "trial_id": str,
            "criteria": [
                {"criterion_id": str, "decision": "met"|"not_met"|"unknown",
                 "evidence": str}
            ]
        }

    Target: >= 0.85
    """
    total = 0
    correct = 0
    by_decision_type = {"met": {"tp": 0, "total": 0},
                        "not_met": {"tp": 0, "total": 0},
                        "unknown": {"tp": 0, "total": 0}}

    for pred, gt in zip(predictions, ground_truth):
        assert pred["patient_id"] == gt["patient_id"]
        assert pred["trial_id"] == gt["trial_id"]

        gt_map = {c["criterion_id"]: c["decision"] for c in gt["criteria"]}

        for criterion in pred["criteria"]:
            cid = criterion["criterion_id"]
            if cid in gt_map:
                total += 1
                gt_decision = gt_map[cid]
                pred_decision = criterion["decision"]
                by_decision_type[gt_decision]["total"] += 1
                if pred_decision == gt_decision:
                    correct += 1
                    by_decision_type[gt_decision]["tp"] += 1

    accuracy = correct / total if total > 0 else 0.0

    return {
        "overall_accuracy": round(accuracy, 4),
        "total_criteria": total,
        "correct": correct,
        "pass": accuracy >= 0.85,
        "by_decision_type": {
            k: {
                "accuracy": round(v["tp"] / v["total"], 4) if v["total"] > 0 else 0,
                "support": v["total"],
            }
            for k, v in by_decision_type.items()
        },
    }
```

### 6.2 Latency and Cost Benchmarking

```python
# evaluation/latency_cost_tracker.py
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class APICallRecord:
    """Record of a single API call."""
    service: str    # "medgemma", "gemini", "clinicaltrials_mcp"
    operation: str  # "extract", "search", "evaluate_criterion"
    latency_ms: float
    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0
    timestamp: str = ""


@dataclass
class SessionMetrics:
    """Aggregate metrics for a patient matching session."""
    patient_id: str
    total_latency_ms: float = 0.0
    total_cost_usd: float = 0.0
    api_calls: list[APICallRecord] = field(default_factory=list)

    @property
    def total_latency_s(self) -> float:
        return self.total_latency_ms / 1000.0

    @property
    def pass_latency(self) -> bool:
        """Target: < 15s per session."""
        return self.total_latency_s < 15.0

    @property
    def pass_cost(self) -> bool:
        """Target: < $0.50 per session."""
        return self.total_cost_usd < 0.50


class LatencyCostTracker:
    """Track latency and cost across API calls."""

    # Pricing per 1M tokens (approximate)
    PRICING = {
        "medgemma": {"input": 0.0, "output": 0.0},            # Self-hosted
        "gemini": {"input": 1.25, "output": 5.00},            # Gemini Pro
        "clinicaltrials_mcp": {"input": 0.0, "output": 0.0},  # Free API
    }

    def __init__(self):
        self.sessions: list[SessionMetrics] = []
        self._current_session: Optional[SessionMetrics] = None

    def start_session(self, patient_id: str):
        self._current_session = SessionMetrics(patient_id=patient_id)

    def end_session(self) -> SessionMetrics:
        session = self._current_session
        if session:
            session.total_latency_ms = sum(c.latency_ms for c in session.api_calls)
            session.total_cost_usd = sum(c.cost_usd for c in session.api_calls)
            self.sessions.append(session)
        self._current_session = None
        return session

    @contextmanager
    def track_call(self, service: str, operation: str):
        """Context manager to track an API call."""
        start = time.monotonic()
        record = APICallRecord(service=service, operation=operation, latency_ms=0)
        try:
            yield record
        finally:
            record.latency_ms = (time.monotonic() - start) * 1000
            # Compute cost from the token counts recorded by the caller
            pricing = self.PRICING.get(service, {"input": 0, "output": 0})
            record.cost_usd = (
                record.input_tokens * pricing["input"] / 1_000_000
                + record.output_tokens * pricing["output"] / 1_000_000
            )
            if self._current_session:
                self._current_session.api_calls.append(record)

    def summary(self) -> dict:
        """Generate aggregate summary across all sessions."""
        if not self.sessions:
            return {}

        latencies = [s.total_latency_s for s in self.sessions]
        costs = [s.total_cost_usd for s in self.sessions]

        return {
            "n_sessions": len(self.sessions),
            "latency": {
                "mean_s": round(sum(latencies) / len(latencies), 2),
                "p50_s": round(sorted(latencies)[len(latencies) // 2], 2),
                "p95_s": round(sorted(latencies)[int(len(latencies) * 0.95)], 2),
                "max_s": round(max(latencies), 2),
                "pass_rate": round(
                    sum(1 for s in self.sessions if s.pass_latency) / len(self.sessions), 4
                ),
            },
            "cost": {
                "mean_usd": round(sum(costs) / len(costs), 4),
                "total_usd": round(sum(costs), 4),
                "max_usd": round(max(costs), 4),
                "pass_rate": round(
                    sum(1 for s in self.sessions if s.pass_cost) / len(self.sessions), 4
                ),
            },
            "targets": {
                "latency_pass": all(s.pass_latency for s in self.sessions),
                "cost_pass": all(s.pass_cost for s in self.sessions),
            },
        }
```
+
1636
+ ---
1637
+
1638
+ ## 7. TDD Test Cases
1639
+
1640
+ ### 7.1 Synthea Data Validation Tests
1641
+
1642
+ ```python
1643
+ # tests/test_synthea_data.py
1644
+ import pytest
+ import json
+ from pathlib import Path
+
+ from data.generate_synthetic_patients import parse_fhir_bundle
1647
+
1648
+ # Expected FHIR resource types
1649
+ REQUIRED_RESOURCE_TYPES = {"Patient", "Condition", "Observation", "Encounter"}
1650
+
1651
+
1652
+ class TestSyntheaDataValidation:
1653
+ """Validate Synthea FHIR output for TrialPath requirements."""
1654
+
1655
+ def test_fhir_bundle_is_valid_json(self, fhir_file):
1656
+ """Bundle must be valid JSON."""
1657
+ with open(fhir_file) as f:
1658
+ data = json.load(f)
1659
+ assert data["resourceType"] == "Bundle"
1660
+ assert "entry" in data
1661
+
1662
+ def test_bundle_contains_required_resources(self, fhir_file):
1663
+ """Bundle must contain Patient, Condition, Observation, Encounter."""
1664
+ with open(fhir_file) as f:
1665
+ bundle = json.load(f)
1666
+ resource_types = {
1667
+ e["resource"]["resourceType"] for e in bundle["entry"]
1668
+ }
1669
+ for rt in REQUIRED_RESOURCE_TYPES:
1670
+ assert rt in resource_types, f"Missing {rt} resource"
1671
+
1672
+ def test_patient_has_demographics(self, fhir_file):
1673
+ """Patient resource must have name, gender, birthDate."""
1674
+ with open(fhir_file) as f:
1675
+ bundle = json.load(f)
1676
+ patients = [
1677
+ e["resource"] for e in bundle["entry"]
1678
+ if e["resource"]["resourceType"] == "Patient"
1679
+ ]
1680
+ assert len(patients) == 1
1681
+ patient = patients[0]
1682
+ assert "name" in patient
1683
+ assert "gender" in patient
1684
+ assert "birthDate" in patient
1685
+
1686
+ def test_lung_cancer_condition_present(self, fhir_file):
1687
+ """At least one Condition must be NSCLC or lung cancer."""
1688
+ with open(fhir_file) as f:
1689
+ bundle = json.load(f)
1690
+ conditions = [
1691
+ e["resource"] for e in bundle["entry"]
1692
+ if e["resource"]["resourceType"] == "Condition"
1693
+ ]
1694
+ lung_cancer_codes = {"254637007", "254632001", "162573006"}
1695
+ has_lung_cancer = False
1696
+ for cond in conditions:
1697
+ codings = cond.get("code", {}).get("coding", [])
1698
+ for c in codings:
1699
+ if c.get("code") in lung_cancer_codes:
1700
+ has_lung_cancer = True
1701
+ assert has_lung_cancer, "No lung cancer Condition found"
1702
+
1703
+ def test_patient_profile_conversion(self, fhir_file):
1704
+ """FHIR Bundle must convert to valid PatientProfile."""
1705
+ profile = parse_fhir_bundle(Path(fhir_file))
1706
+ assert profile.patient_id != ""
1707
+ assert profile.demographics.name != ""
1708
+ assert profile.demographics.sex in ("male", "female")
1709
+ assert profile.diagnosis.primary != ""
1710
+
1711
+ def test_batch_generation_produces_500_patients(self, output_dir):
1712
+ """Batch generation must produce at least 500 FHIR files."""
1713
+ fhir_files = list(Path(output_dir).glob("*.json"))
1714
+ assert len(fhir_files) >= 500
1715
+
1716
+ def test_nsclc_ratio(self, all_profiles):
1717
+ """~85% of lung cancer patients should be NSCLC."""
1718
+ nsclc_count = sum(
1719
+ 1 for p in all_profiles
1720
+ if "non-small cell" in p.diagnosis.primary.lower()
1721
+ or "nsclc" in p.diagnosis.primary.lower()
1722
+ )
1723
+ ratio = nsclc_count / len(all_profiles)
1724
+ assert 0.70 <= ratio <= 0.95, f"NSCLC ratio {ratio} outside expected range"
1725
+ ```
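The tests above rely on shared fixtures (`fhir_file`, `output_dir`, `all_profiles`) that are not defined in this guide. A minimal `conftest.py` sketch of what they could look like — the output directory path and session scoping here are assumptions, not part of the spec:

```python
# conftest.py — illustrative fixtures for the Synthea validation tests.
# SYNTHEA_OUTPUT_DIR is an assumed path; adjust to the actual batch output.
from pathlib import Path

import pytest

SYNTHEA_OUTPUT_DIR = Path("data/synthea_output/fhir")


@pytest.fixture(params=sorted(SYNTHEA_OUTPUT_DIR.glob("*.json")))
def fhir_file(request):
    """Parametrize each test over every generated FHIR bundle."""
    return str(request.param)


@pytest.fixture
def output_dir():
    """Directory containing the batch-generated FHIR files."""
    return str(SYNTHEA_OUTPUT_DIR)


@pytest.fixture(scope="session")
def all_profiles():
    """PatientProfiles parsed from every bundle (used by the ratio test)."""
    from data.generate_synthetic_patients import parse_fhir_bundle

    return [parse_fhir_bundle(p) for p in sorted(SYNTHEA_OUTPUT_DIR.glob("*.json"))]
```

Parametrizing `fhir_file` over the glob means every per-bundle test runs once per patient, which keeps the tests independent of how many patients the batch produced.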
1726
+
1727
+ ### 7.2 PDF Generation Correctness Tests
1728
+
1729
+ ```python
1730
+ # tests/test_pdf_generation.py
1731
+ import pytest
1732
+ from pathlib import Path
1733
+ from data.templates.clinical_letter import generate_clinical_letter
1734
+ from data.templates.pathology_report import generate_pathology_report
1735
+ from data.templates.lab_report import generate_lab_report
1736
+
1737
+
1738
+ class TestPDFGeneration:
1739
+ """Test that PDF generation produces valid documents."""
1740
+
1741
+ SAMPLE_PROFILE = {
1742
+ "patient_id": "test-001",
1743
+ "demographics": {
1744
+ "name": "Jane Doe",
1745
+ "sex": "female",
1746
+ "date_of_birth": "1960-05-15",
1747
+ },
1748
+ "diagnosis": {
1749
+ "primary": "Non-small cell lung cancer, adenocarcinoma",
1750
+ "stage": "Stage IIIA",
1751
+ "histology": "adenocarcinoma",
1752
+ "diagnosis_date": "2024-01-15",
1753
+ },
1754
+ "biomarkers": {
1755
+ "egfr": "Exon 19 deletion",
1756
+ "alk": "Negative",
1757
+ "pdl1_tps": "60%",
1758
+ "kras": None,
1759
+ },
1760
+ "labs": [
1761
+ {"name": "WBC", "value": 7.2, "unit": "10*3/uL", "date": "2024-01-10", "loinc_code": "6690-2"},
1762
+ {"name": "Hemoglobin", "value": 12.5, "unit": "g/dL", "date": "2024-01-10", "loinc_code": "718-7"},
1763
+ ],
1764
+ "treatments": [
1765
+ {"name": "Cisplatin", "type": "medication", "start_date": "2024-02-01"},
1766
+ ],
1767
+ }
1768
+
1769
+ def test_clinical_letter_generates_pdf(self, tmp_path):
1770
+ """Clinical letter must generate a non-empty PDF file."""
1771
+ output = tmp_path / "letter.pdf"
1772
+ generate_clinical_letter(self.SAMPLE_PROFILE, str(output))
1773
+ assert output.exists()
1774
+ assert output.stat().st_size > 0
1775
+
1776
+ def test_pathology_report_generates_pdf(self, tmp_path):
1777
+ """Pathology report must generate a non-empty PDF file."""
1778
+ output = tmp_path / "pathology.pdf"
1779
+ generate_pathology_report(self.SAMPLE_PROFILE, str(output))
1780
+ assert output.exists()
1781
+ assert output.stat().st_size > 0
1782
+
1783
+ def test_lab_report_generates_pdf(self, tmp_path):
1784
+ """Lab report must generate a non-empty PDF file."""
1785
+ output = tmp_path / "lab.pdf"
1786
+ generate_lab_report(self.SAMPLE_PROFILE, str(output))
1787
+ assert output.exists()
1788
+ assert output.stat().st_size > 0
1789
+
1790
+ def test_pdf_contains_patient_name(self, tmp_path):
1791
+ """Generated PDF must contain patient name (OCR-verifiable)."""
1792
+ output = tmp_path / "letter.pdf"
1793
+ generate_clinical_letter(self.SAMPLE_PROFILE, str(output))
1794
+ # Read PDF text (using pdfplumber or PyPDF2)
1795
+ import pdfplumber
1796
+ with pdfplumber.open(str(output)) as pdf:
1797
+ text = ""
1798
+ for page in pdf.pages:
1799
+ text += page.extract_text() or ""
1800
+ assert "Jane Doe" in text
1801
+
1802
+ def test_pdf_contains_biomarkers(self, tmp_path):
1803
+ """Generated PDF must contain biomarker results."""
1804
+ output = tmp_path / "pathology.pdf"
1805
+ generate_pathology_report(self.SAMPLE_PROFILE, str(output))
1806
+ import pdfplumber
1807
+ with pdfplumber.open(str(output)) as pdf:
1808
+ text = ""
1809
+ for page in pdf.pages:
1810
+ text += page.extract_text() or ""
1811
+ assert "EGFR" in text
1812
+ assert "Exon 19" in text or "positive" in text.lower()
1813
+
1814
+ def test_missing_biomarker_handled_gracefully(self, tmp_path):
1815
+ """PDF generation should not crash when biomarkers are None."""
1816
+ profile = self.SAMPLE_PROFILE.copy()
1817
+ profile["biomarkers"] = {
1818
+ "egfr": None, "alk": None, "pdl1_tps": None, "kras": None
1819
+ }
1820
+ output = tmp_path / "letter.pdf"
1821
+ generate_clinical_letter(profile, str(output))
1822
+ assert output.exists()
1823
+ ```
1824
+
1825
+ ### 7.3 Noise Injection Validation Tests
1826
+
1827
+ ```python
1828
+ # tests/test_noise_injection.py
1829
+ import pytest
1830
+ from data.noise.noise_injector import NoiseInjector
1831
+
1832
+
1833
+ class TestNoiseInjection:
1834
+ """Test noise injection produces expected results."""
1835
+
1836
+ def test_clean_noise_no_changes(self):
1837
+ """Clean level should produce no changes."""
1838
+ injector = NoiseInjector(noise_level="clean", seed=42)
1839
+ text = "Patient has EGFR mutation positive"
1840
+ noisy, records = injector.inject_text_noise(text)
1841
+ assert noisy == text
1842
+ assert len(records) == 0
1843
+
1844
+ def test_mild_noise_produces_some_changes(self):
1845
+ """Mild noise should produce some but limited changes."""
1846
+ injector = NoiseInjector(noise_level="mild", seed=42)
1847
+ # Use longer text to increase chance of noise
1848
+ text = "The patient is a 65 year old male with stage IIIA " * 10
1849
+ noisy, records = injector.inject_text_noise(text)
1850
+ # A fixed change count would be brittle (mild noise is probabilistic),
+ # so verify the returned pair is well-formed instead.
+ assert isinstance(noisy, str)
+ assert isinstance(records, list)
1852
+
1853
+ def test_severe_noise_produces_many_changes(self):
1854
+ """Severe noise should produce noticeable changes."""
1855
+ injector = NoiseInjector(noise_level="severe", seed=42)
1856
+ text = "The 50 year old patient has stage 1 NSCLC " * 20
1857
+ noisy, records = injector.inject_text_noise(text)
1858
+ assert noisy != text # Should differ from original
1859
+ assert len(records) > 0
1860
+
1861
+ def test_ocr_error_types_are_valid(self):
1862
+ """OCR errors should only substitute known character pairs."""
1863
+ injector = NoiseInjector(noise_level="severe", seed=42)
1864
+ text = "0123456789 OIBS" * 10
1865
+ _, records = injector.inject_text_noise(text)
1866
+ for r in records:
1867
+ if r["type"] == "ocr_error":
1868
+ assert r["original"] in NoiseInjector.OCR_ERROR_MAP
1869
+ assert r["replacement"] in NoiseInjector.OCR_ERROR_MAP[r["original"]]
1870
+
1871
+ def test_missing_value_injection(self):
1872
+ """Missing value injection should remove some fields."""
1873
+ injector = NoiseInjector(noise_level="moderate", seed=42)
1874
+ profile = {
1875
+ "biomarkers": {"egfr": "positive", "alk": "negative",
1876
+ "pdl1_tps": "60%", "kras": "negative", "ros1": "negative"},
1877
+ "diagnosis": {"stage": "IIIA", "histology": "adenocarcinoma"},
1878
+ }
1879
+ modified, removed = injector.inject_missing_values(profile)
1880
+ # At 10% rate with 7 fields, expect 0-3 removals
1881
+ assert len(removed) <= 7
1882
+ for field_path in removed:
1883
+ section, field_name = field_path.split(".")
1884
+ assert modified[section][field_name] is None
1885
+
1886
+ def test_noise_is_deterministic_with_seed(self):
1887
+ """Same seed should produce identical results."""
1888
+ text = "Patient has stage IIIA non-small cell lung cancer"
1889
+ inj1 = NoiseInjector(noise_level="moderate", seed=123)
1890
+ inj2 = NoiseInjector(noise_level="moderate", seed=123)
1891
+ noisy1, _ = inj1.inject_text_noise(text)
1892
+ noisy2, _ = inj2.inject_text_noise(text)
1893
+ assert noisy1 == noisy2
1894
+
1895
+ def test_different_seeds_produce_different_results(self):
1896
+ """Different seeds should generally produce different noise."""
1897
+ text = "The 50 year old patient has 10 biomarker tests 0 1 5 8" * 20
1898
+ inj1 = NoiseInjector(noise_level="severe", seed=1)
1899
+ inj2 = NoiseInjector(noise_level="severe", seed=999)
1900
+ noisy1, _ = inj1.inject_text_noise(text)
1901
+ noisy2, _ = inj2.inject_text_noise(text)
1902
+ # With severe noise on long text, different seeds should differ
1903
+ assert noisy1 != noisy2
1904
+ ```
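The determinism tests above only work if the injector seeds its own private RNG rather than the global `random` module. A toy sketch of that pattern — this is an illustration, not the project's `NoiseInjector`; the error map and 5% substitution rate are made-up values:

```python
import random


class SeededOCRNoiser:
    """Toy character-substitution noiser showing the seeded-RNG pattern
    the determinism tests rely on (illustrative, not the real class)."""

    OCR_ERROR_MAP = {"0": ["O"], "1": ["l", "I"], "5": ["S"], "8": ["B"]}

    def __init__(self, seed: int, rate: float = 0.05):
        self._rng = random.Random(seed)  # private RNG -> reproducible runs
        self._rate = rate

    def inject(self, text: str):
        out, records = [], []
        for i, ch in enumerate(text):
            subs = self.OCR_ERROR_MAP.get(ch)
            if subs and self._rng.random() < self._rate:
                repl = self._rng.choice(subs)
                records.append({"pos": i, "original": ch, "replacement": repl})
                out.append(repl)
            else:
                out.append(ch)
        return "".join(out), records


# Same seed -> byte-identical output, as test_noise_is_deterministic_with_seed expects
a, _ = SeededOCRNoiser(seed=123).inject("0123456789 " * 50)
b, _ = SeededOCRNoiser(seed=123).inject("0123456789 " * 50)
assert a == b
```

Using `random.Random(seed)` instead of the module-level functions also keeps the injector from interfering with any other randomized code running in the same process.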
1905
+
1906
+ ### 7.4 TREC Evaluation Metric Tests
1907
+
1908
+ ```python
1909
+ # tests/test_trec_evaluation.py
1910
+ import pytest
+ import ir_measures
+ from ir_measures import nDCG, Recall, P, AP
+
+ # Assumed module path for the run-format helper exercised below
+ from evaluation.trec_eval import convert_trialpath_to_trec_run
1913
+
1914
+
1915
+ class TestTRECEvaluation:
1916
+ """Test TREC evaluation metric computation."""
1917
+
1918
+ @pytest.fixture
1919
+ def sample_qrels(self):
1920
+ """Sample qrels with known ground truth."""
1921
+ return [
1922
+ ir_measures.Qrel("q1", "d1", 2), # eligible
1923
+ ir_measures.Qrel("q1", "d2", 1), # excluded
1924
+ ir_measures.Qrel("q1", "d3", 0), # not relevant
1925
+ ir_measures.Qrel("q1", "d4", 2), # eligible
1926
+ ir_measures.Qrel("q1", "d5", 0), # not relevant
1927
+ ]
1928
+
1929
+ @pytest.fixture
1930
+ def perfect_run(self):
1931
+ """Run that ranks all relevant docs at top."""
1932
+ return [
1933
+ ir_measures.ScoredDoc("q1", "d1", 1.0),
1934
+ ir_measures.ScoredDoc("q1", "d4", 0.9),
1935
+ ir_measures.ScoredDoc("q1", "d2", 0.8),
1936
+ ir_measures.ScoredDoc("q1", "d3", 0.1),
1937
+ ir_measures.ScoredDoc("q1", "d5", 0.05),
1938
+ ]
1939
+
1940
+ @pytest.fixture
1941
+ def worst_run(self):
1942
+ """Run that ranks relevant docs at bottom."""
1943
+ return [
1944
+ ir_measures.ScoredDoc("q1", "d3", 1.0),
1945
+ ir_measures.ScoredDoc("q1", "d5", 0.9),
1946
+ ir_measures.ScoredDoc("q1", "d2", 0.5),
1947
+ ir_measures.ScoredDoc("q1", "d4", 0.2),
1948
+ ir_measures.ScoredDoc("q1", "d1", 0.1),
1949
+ ]
1950
+
1951
+ def test_perfect_ndcg_at_10(self, sample_qrels, perfect_run):
1952
+ """Perfect ranking should yield NDCG@10 = 1.0."""
1953
+ result = ir_measures.calc_aggregate([nDCG@10], sample_qrels, perfect_run)
1954
+ assert result[nDCG@10] == pytest.approx(1.0, abs=0.01)
1955
+
1956
+ def test_worst_ndcg_lower(self, sample_qrels, perfect_run, worst_run):
1957
+ """Worst ranking should yield lower NDCG than perfect."""
1958
+ perfect = ir_measures.calc_aggregate([nDCG@10], sample_qrels, perfect_run)
1959
+ worst = ir_measures.calc_aggregate([nDCG@10], sample_qrels, worst_run)
1960
+ assert worst[nDCG@10] < perfect[nDCG@10]
1961
+
1962
+ def test_recall_at_50_perfect(self, sample_qrels, perfect_run):
1963
+ """Perfect run should retrieve all relevant docs."""
1964
+ result = ir_measures.calc_aggregate([Recall@50], sample_qrels, perfect_run)
1965
+ assert result[Recall@50] == pytest.approx(1.0, abs=0.01)
1966
+
1967
+ def test_empty_run_yields_zero(self, sample_qrels):
1968
+ """Empty run should yield 0 for all metrics."""
1969
+ empty_run = []
1970
+ result = ir_measures.calc_aggregate(
1971
+ [nDCG@10, Recall@50, P@10], sample_qrels, empty_run
1972
+ )
1973
+ assert result[nDCG@10] == 0.0
1974
+ assert result[Recall@50] == 0.0
1975
+ assert result[P@10] == 0.0
1976
+
1977
+ def test_per_query_results(self, sample_qrels, perfect_run):
1978
+ """Per-query results should return one entry per query."""
1979
+ results = list(ir_measures.iter_calc(
1980
+ [nDCG@10], sample_qrels, perfect_run
1981
+ ))
1982
+ assert len(results) == 1 # Only q1
1983
+ assert results[0].query_id == "q1"
1984
+
1985
+ def test_trec_run_format_conversion(self):
1986
+ """Test TrialPath results to TREC format conversion."""
1987
+ results = {
1988
+ "1": [
1989
+ {"nct_id": "NCT001", "score": 0.95},
1990
+ {"nct_id": "NCT002", "score": 0.80},
1991
+ ]
1992
+ }
1993
+ run_str = convert_trialpath_to_trec_run(results, "test-run")
1994
+ lines = run_str.strip().split("\n")
1995
+ assert len(lines) == 2
1996
+ assert "NCT001" in lines[0]
1997
+ assert "1" == lines[0].split()[3] # rank 1
1998
+ assert "2" == lines[1].split()[3] # rank 2
1999
+
2000
+ def test_graded_relevance_evaluation(self, sample_qrels, perfect_run):
2001
+ """Test strict eligible-only evaluation (rel=2)."""
2002
+ strict = ir_measures.calc_aggregate(
2003
+ [AP(rel=2)], sample_qrels, perfect_run
2004
+ )
2005
+ assert strict[AP(rel=2)] > 0.0
2006
+
2007
+ def test_qrels_dict_format(self):
2008
+ """Test evaluation from dict format."""
2009
+ qrels = {"q1": {"d1": 2, "d2": 1, "d3": 0}}
2010
+ run = [
2011
+ ir_measures.ScoredDoc("q1", "d1", 1.0),
2012
+ ir_measures.ScoredDoc("q1", "d2", 0.5),
2013
+ ir_measures.ScoredDoc("q1", "d3", 0.1),
2014
+ ]
2015
+ result = ir_measures.calc_aggregate([nDCG@10], qrels, run)
2016
+ assert nDCG@10 in result
2017
+ ```
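`test_trec_run_format_conversion` pins down the converter's contract: standard six-column TREC run lines (`qid Q0 docid rank score tag`), ranked by descending score. One possible stdlib-only implementation satisfying that contract (a sketch, not the project's actual helper):

```python
def convert_trialpath_to_trec_run(results: dict, run_tag: str) -> str:
    """Render {query_id: [{"nct_id", "score"}, ...]} as TREC run lines:
    'qid Q0 docid rank score tag', ranked by descending score."""
    lines = []
    for qid, docs in results.items():
        ranked = sorted(docs, key=lambda d: d["score"], reverse=True)
        for rank, doc in enumerate(ranked, start=1):
            lines.append(
                f"{qid} Q0 {doc['nct_id']} {rank} {doc['score']:.4f} {run_tag}"
            )
    return "\n".join(lines)


run = convert_trialpath_to_trec_run(
    {"1": [{"nct_id": "NCT001", "score": 0.95}, {"nct_id": "NCT002", "score": 0.80}]},
    "test-run",
)
```

Sorting inside the converter (rather than trusting input order) guarantees rank and score never disagree, which `trec_eval`-style tooling would otherwise flag.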
2018
+
2019
+ ### 7.5 F1 Computation Tests
2020
+
2021
+ ```python
2022
+ # tests/test_extraction_f1.py
2023
+ import pytest
2024
+ from evaluation.extraction_eval import compute_field_level_f1
2025
+
2026
+
2027
+ class TestExtractionF1:
2028
+ """Test F1 computation for field-level extraction."""
2029
+
2030
+ def test_perfect_extraction(self):
2031
+ """All fields correctly extracted should yield F1=1.0."""
2032
+ annotations = [{
2033
+ "patient_id": "p1",
2034
+ "noise_level": "clean",
2035
+ "document_type": "clinical_letter",
2036
+ "fields": [
2037
+ {"field_name": "demographics.name", "ground_truth": "John", "extracted": "John", "correct": True},
2038
+ {"field_name": "demographics.sex", "ground_truth": "male", "extracted": "male", "correct": True},
2039
+ {"field_name": "diagnosis.primary", "ground_truth": "NSCLC", "extracted": "NSCLC", "correct": True},
2040
+ {"field_name": "biomarkers.egfr", "ground_truth": "positive", "extracted": "positive", "correct": True},
2041
+ ]
2042
+ }]
2043
+ result = compute_field_level_f1(annotations)
2044
+ assert result["micro_f1"] == 1.0
2045
+ assert result["pass"] is True
2046
+
2047
+ def test_zero_extraction(self):
2048
+ """No correct extractions should yield F1=0."""
2049
+ annotations = [{
2050
+ "patient_id": "p1",
2051
+ "noise_level": "clean",
2052
+ "document_type": "clinical_letter",
2053
+ "fields": [
2054
+ {"field_name": "demographics.name", "ground_truth": "John", "extracted": "Jane", "correct": False},
2055
+ {"field_name": "diagnosis.primary", "ground_truth": "NSCLC", "extracted": None, "correct": False},
2056
+ ]
2057
+ }]
2058
+ result = compute_field_level_f1(annotations)
2059
+ assert result["micro_f1"] == 0.0
2060
+ assert result["pass"] is False
2061
+
2062
+ def test_partial_extraction(self):
2063
+ """Partial extraction should yield 0 < F1 < 1."""
2064
+ annotations = [{
2065
+ "patient_id": "p1",
2066
+ "noise_level": "mild",
2067
+ "document_type": "clinical_letter",
2068
+ "fields": [
2069
+ {"field_name": "demographics.name", "ground_truth": "John", "extracted": "John", "correct": True},
2070
+ {"field_name": "diagnosis.primary", "ground_truth": "NSCLC", "extracted": "lung ca", "correct": False},
2071
+ {"field_name": "biomarkers.egfr", "ground_truth": "positive", "extracted": "positive", "correct": True},
2072
+ {"field_name": "biomarkers.alk", "ground_truth": "negative", "extracted": None, "correct": False},
2073
+ ]
2074
+ }]
2075
+ result = compute_field_level_f1(annotations)
2076
+ assert 0.0 < result["micro_f1"] < 1.0
2077
+
2078
+ def test_f1_threshold_boundary(self):
2079
+ """F1 exactly at 0.85 should pass."""
2080
+ # Create annotations that produce exactly 0.85 F1
2081
+ fields = []
2082
+ for i in range(85):
2083
+ fields.append({"field_name": f"field_{i}", "ground_truth": "val", "extracted": "val", "correct": True})
2084
+ for i in range(15):
2085
+ fields.append({"field_name": f"field_miss_{i}", "ground_truth": "val", "extracted": None, "correct": False})
2086
+
2087
+ annotations = [{"patient_id": "p1", "noise_level": "clean",
2088
+ "document_type": "test", "fields": fields}]
2089
+ result = compute_field_level_f1(annotations)
2090
+ # With 85/100 correct, F1 should be ~0.85
2091
+ assert result["pass"] is True
2092
+
2093
+ def test_empty_annotations(self):
2094
+ """Empty annotations should not crash."""
2095
+ result = compute_field_level_f1([])
2096
+ assert result["micro_f1"] == 0.0
2097
+
2098
+ def test_none_ground_truth_not_counted(self):
2099
+ """Fields with None ground truth should be handled."""
2100
+ annotations = [{
2101
+ "patient_id": "p1",
2102
+ "noise_level": "clean",
2103
+ "document_type": "test",
2104
+ "fields": [
2105
+ {"field_name": "biomarkers.ros1", "ground_truth": None,
2106
+ "extracted": None, "correct": False},
2107
+ ]
2108
+ }]
2109
+ result = compute_field_level_f1(annotations)
2110
+ # Should not crash, though metrics may be 0
2111
+ assert "micro_f1" in result
2112
+ ```
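Together these tests fix the contract of `compute_field_level_f1`: micro-averaged F1 over all annotated fields, with a pass threshold of 0.85. A stdlib-only sketch that satisfies them — the exact precision/recall bookkeeping (gold = fields with a ground truth, predicted = fields with a non-None extraction) is an assumption about the real implementation:

```python
def compute_field_level_f1(annotations: list, threshold: float = 0.85) -> dict:
    """Micro precision/recall/F1 over all annotated fields.
    A field counts toward recall if it has a ground truth, toward
    precision if something was extracted, and as a true positive
    when its `correct` flag is set."""
    tp = n_pred = n_gold = 0
    for ann in annotations:
        for f in ann["fields"]:
            if f["ground_truth"] is not None:
                n_gold += 1
            if f["extracted"] is not None:
                n_pred += 1
            if f["correct"]:
                tp += 1
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "micro_precision": round(precision, 4),
        "micro_recall": round(recall, 4),
        "micro_f1": round(f1, 4),
        "pass": f1 >= threshold,
    }
```

Note that under this bookkeeping the boundary test (85 correct, 15 missed extractions) yields precision 1.0 and recall 0.85, so micro-F1 lands above 0.85 and the `pass` assertion holds.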
2113
+
2114
+ ### 7.6 End-to-End Pipeline Tests
2115
+
2116
+ ```python
2117
+ # tests/test_e2e_pipeline.py
2118
+ import pytest
2119
+ from pathlib import Path
2120
+
2121
+
2122
+ class TestE2EPipeline:
2123
+ """End-to-end tests for the complete data & evaluation pipeline."""
2124
+
2125
+ def test_fhir_to_profile_to_pdf_roundtrip(self, sample_fhir_file, tmp_path):
2126
+ """FHIR → PatientProfile → PDF should complete without error."""
2127
+ from data.generate_synthetic_patients import parse_fhir_bundle
2128
+ from data.templates.clinical_letter import generate_clinical_letter
2129
+ from dataclasses import asdict
2130
+
2131
+ # Step 1: Parse FHIR
2132
+ profile = parse_fhir_bundle(Path(sample_fhir_file))
2133
+ assert profile.patient_id != ""
2134
+
2135
+ # Step 2: Generate PDF
2136
+ pdf_path = tmp_path / "test_roundtrip.pdf"
2137
+ generate_clinical_letter(asdict(profile), str(pdf_path))
2138
+ assert pdf_path.exists()
2139
+ assert pdf_path.stat().st_size > 1000 # Reasonable PDF size
2140
+
2141
+ def test_noisy_pdf_pipeline(self, sample_profile, tmp_path):
2142
+ """Profile → Noisy PDF should inject noise and produce valid PDF."""
2143
+ import copy
+
+ from data.templates.clinical_letter import generate_clinical_letter
+ from data.noise.noise_injector import NoiseInjector
+
+ injector = NoiseInjector(noise_level="moderate", seed=42)
+
+ # Inject text noise into profile fields for PDF rendering.
+ # Deep-copy first: we mutate the nested diagnosis dict, and a
+ # shallow .copy() would leak the change into the shared fixture.
+ profile = copy.deepcopy(sample_profile)
2150
+ dx_text = profile["diagnosis"]["primary"]
2151
+ noisy_dx, records = injector.inject_text_noise(dx_text)
2152
+ profile["diagnosis"]["primary"] = noisy_dx
2153
+
2154
+ pdf_path = tmp_path / "noisy.pdf"
2155
+ generate_clinical_letter(profile, str(pdf_path))
2156
+ assert pdf_path.exists()
2157
+
2158
+ def test_trec_evaluation_pipeline(self, tmp_path):
2159
+ """Complete TREC evaluation from dicts should produce metrics."""
2160
+ import ir_measures
2161
+ from ir_measures import nDCG, Recall, P
2162
+
2163
+ qrels = [
2164
+ ir_measures.Qrel("1", "NCT001", 2),
2165
+ ir_measures.Qrel("1", "NCT002", 1),
2166
+ ir_measures.Qrel("1", "NCT003", 0),
2167
+ ]
2168
+ run = [
2169
+ ir_measures.ScoredDoc("1", "NCT001", 0.9),
2170
+ ir_measures.ScoredDoc("1", "NCT002", 0.5),
2171
+ ir_measures.ScoredDoc("1", "NCT003", 0.1),
2172
+ ]
2173
+
2174
+ result = ir_measures.calc_aggregate(
2175
+ [nDCG@10, Recall@50, P@10], qrels, run
2176
+ )
2177
+ assert nDCG@10 in result
2178
+ assert Recall@50 in result
2179
+ assert result[nDCG@10] > 0
2180
+
2181
+ def test_latency_tracker_integration(self):
2182
+ """Latency tracker should record and summarize calls."""
2183
+ import time
2184
+ from evaluation.latency_cost_tracker import LatencyCostTracker
2185
+
2186
+ tracker = LatencyCostTracker()
2187
+ tracker.start_session("test-patient")
2188
+
2189
+ with tracker.track_call("gemini", "search_anchors") as record:
2190
+ time.sleep(0.01) # Simulate API call
2191
+ record.input_tokens = 500
2192
+ record.output_tokens = 200
2193
+
2194
+ session = tracker.end_session()
2195
+ assert session.total_latency_ms > 0
2196
+ assert len(session.api_calls) == 1
2197
+
2198
+ summary = tracker.summary()
2199
+ assert summary["n_sessions"] == 1
2200
+ assert summary["latency"]["mean_s"] > 0
2201
+ ```
2202
+
2203
+ ---
2204
+
2205
+ ## 8. Appendix
2206
+
2207
+ ### 8.1 Data Format Specifications
2208
+
2209
+ #### PatientProfile v1 JSON Schema
2210
+ ```json
2211
+ {
2212
+ "$schema": "http://json-schema.org/draft-07/schema#",
2213
+ "type": "object",
2214
+ "required": ["patient_id", "demographics", "diagnosis"],
2215
+ "properties": {
2216
+ "patient_id": {"type": "string"},
2217
+ "demographics": {
2218
+ "type": "object",
2219
+ "properties": {
2220
+ "name": {"type": "string"},
2221
+ "sex": {"type": "string", "enum": ["male", "female"]},
2222
+ "date_of_birth": {"type": "string", "format": "date"},
2223
+ "age": {"type": "integer"},
2224
+ "state": {"type": "string"}
2225
+ }
2226
+ },
2227
+ "diagnosis": {
2228
+ "type": "object",
2229
+ "properties": {
2230
+ "primary": {"type": "string"},
2231
+ "stage": {"type": ["string", "null"]},
2232
+ "histology": {"type": ["string", "null"]},
2233
+ "diagnosis_date": {"type": "string", "format": "date"}
2234
+ }
2235
+ },
2236
+ "biomarkers": {
2237
+ "type": "object",
2238
+ "properties": {
2239
+ "egfr": {"type": ["string", "null"]},
2240
+ "alk": {"type": ["string", "null"]},
2241
+ "pdl1_tps": {"type": ["string", "null"]},
2242
+ "kras": {"type": ["string", "null"]},
2243
+ "ros1": {"type": ["string", "null"]}
2244
+ }
2245
+ },
2246
+ "labs": {
2247
+ "type": "array",
2248
+ "items": {
2249
+ "type": "object",
2250
+ "properties": {
2251
+ "name": {"type": "string"},
2252
+ "value": {"type": "number"},
2253
+ "unit": {"type": "string"},
2254
+ "date": {"type": "string"},
2255
+ "loinc_code": {"type": "string"}
2256
+ }
2257
+ }
2258
+ },
2259
+ "treatments": {
2260
+ "type": "array",
2261
+ "items": {
2262
+ "type": "object",
2263
+ "properties": {
2264
+ "name": {"type": "string"},
2265
+ "type": {"type": "string", "enum": ["medication", "procedure", "radiation"]},
2266
+ "start_date": {"type": "string"},
2267
+ "end_date": {"type": ["string", "null"]}
2268
+ }
2269
+ }
2270
+ },
2271
+ "unknowns": {"type": "array", "items": {"type": "string"}},
2272
+ "evidence_spans": {"type": "array"}
2273
+ }
2274
+ }
2275
+ ```
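The schema above can be exercised directly in tests. A minimal stdlib check of the required top-level fields and the `sex` enum — full draft-07 validation would instead use the `jsonschema` package; the helper name here is illustrative:

```python
REQUIRED_TOP_LEVEL = ("patient_id", "demographics", "diagnosis")
SEX_VALUES = ("male", "female")


def check_patient_profile(profile: dict) -> list:
    """Return violations of the PatientProfile v1 schema (required
    top-level fields + sex enum only; not full draft-07 validation)."""
    errors = [
        f"missing required field: {k}"
        for k in REQUIRED_TOP_LEVEL
        if k not in profile
    ]
    sex = profile.get("demographics", {}).get("sex")
    if sex is not None and sex not in SEX_VALUES:
        errors.append(f"demographics.sex must be one of {SEX_VALUES}, got {sex!r}")
    return errors


ok = {"patient_id": "p1", "demographics": {"sex": "female"}, "diagnosis": {"primary": "NSCLC"}}
assert check_patient_profile(ok) == []
```

Returning a list of violations (rather than raising on the first) makes the check easy to assert against in pytest and to surface wholesale in the UI.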
2276
+
2277
+ ### 8.2 Tool API Reference
2278
+
2279
+ #### ir_datasets
2280
+
2281
+ | API | Description | Return type |
+ |-----|-------------|-------------|
+ | `ir_datasets.load("clinicaltrials/2021/trec-ct-2021")` | Load the TREC CT 2021 dataset | Dataset |
+ | `dataset.queries_iter()` | Iterate over topics | GenericQuery(query_id, text) |
+ | `dataset.qrels_iter()` | Iterate over qrels | TrecQrel(query_id, doc_id, relevance, iteration) |
+ | `dataset.docs_iter()` | Iterate over documents | ClinicalTrialsDoc(doc_id, title, condition, summary, detailed_description, eligibility) |
2287
+
2288
+ **Dataset IDs:**
2289
+ - `clinicaltrials/2021/trec-ct-2021` — 75 queries, 35,832 qrels
2290
+ - `clinicaltrials/2021/trec-ct-2022` — 50 queries
2291
+ - `clinicaltrials/2021` — 376K documents (base collection)
2292
+
2293
+ #### ir-measures
2294
+
2295
+ | API | Description |
+ |-----|-------------|
+ | `ir_measures.calc_aggregate(measures, qrels, run)` | Compute aggregate metrics |
+ | `ir_measures.iter_calc(measures, qrels, run)` | Iterate per-query metrics |
+ | `ir_measures.read_trec_qrels(path)` | Read a TREC qrels file |
+ | `ir_measures.read_trec_run(path)` | Read a TREC run file |
+ | `ir_measures.Qrel(qid, did, rel)` | Create a qrel record |
+ | `ir_measures.ScoredDoc(qid, did, score)` | Create a scored-document record |
2303
+
2304
+ **Measure objects:**
2305
+ - `nDCG@10` — Normalized DCG at cutoff 10
2306
+ - `Recall@50` — Recall at cutoff 50
2307
+ - `P@10` — Precision at cutoff 10
2308
+ - `AP` — Average Precision
2309
+ - `AP(rel=2)` — AP with minimum relevance 2
2310
+ - `RR` — Reciprocal Rank
2311
+
2312
+ #### scikit-learn Evaluation
2313
+
2314
+ | API | Description |
+ |-----|-------------|
+ | `f1_score(y_true, y_pred, average=None)` | Per-class F1 |
+ | `f1_score(y_true, y_pred, average='micro')` | Global micro F1 |
+ | `f1_score(y_true, y_pred, average='macro')` | Unweighted mean of per-class F1 |
+ | `precision_score(y_true, y_pred)` | Precision |
+ | `recall_score(y_true, y_pred)` | Recall |
+ | `classification_report(y_true, y_pred)` | Full classification report |
+ | `confusion_matrix(y_true, y_pred)` | Confusion matrix |
2323
+
2324
+ #### Synthea CLI
2325
+
2326
+ | Flag | Description | Example |
+ |------|-------------|---------|
+ | `-p N` | Generate N patients | `-p 500` |
+ | `-s SEED` | Random seed | `-s 42` |
+ | `-m MODULE` | Run a specific disease module | `-m lung_cancer` |
+ | `STATE` | State to generate patients in | `Massachusetts` |
+ | `--exporter.fhir.export` | Enable FHIR R4 export | `=true` |
+ | `--exporter.pretty_print` | Pretty-print JSON output | `=true` |
2334
+
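Put together, a batch run combining the flags above might look like the following; the working directory and exact module name are illustrative assumptions:

```shell
# Generate 500 seeded lung-cancer patients in Massachusetts with FHIR R4 export
./run_synthea -p 500 -s 42 -m lung_cancer \
  --exporter.fhir.export=true \
  --exporter.pretty_print=true \
  Massachusetts
```

Fixing `-s 42` makes the cohort reproducible, which the deterministic-data tests in Section 7.1 depend on.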
2335
+ #### ReportLab Core API
2336
+
2337
+ | Component | Description |
+ |-----------|-------------|
+ | `SimpleDocTemplate(path, pagesize=letter)` | Create a document template |
+ | `Paragraph(text, style)` | Paragraph flowable |
+ | `Table(data, colWidths)` | Table flowable |
+ | `TableStyle(commands)` | Table styling commands |
+ | `Spacer(width, height)` | Spacing flowable |
+ | `getSampleStyleSheet()` | Get the default stylesheet |
2345
+
2346
+ #### Augraphy Degradation Pipeline
2347
+
2348
+ | Component | Description |
+ |-----------|-------------|
+ | `AugraphyPipeline(ink_phase, paper_phase, post_phase)` | Full degradation pipeline |
+ | `InkBleed(p=0.5)` | Ink bleed effect |
+ | `Letterpress(p=0.3)` | Letterpress effect |
+ | `LowInkPeriodicLines(p=0.3)` | Low-ink periodic lines |
+ | `DirtyDrum(p=0.3)` | Dirty drum effect |
+ | `SubtleNoise(p=0.5)` | Subtle noise |
+ | `Jpeg(p=0.5)` | JPEG compression artifacts |
+ | `Brightness(p=0.5)` | Brightness variation |
2358
+
2359
+ ### 8.3 Python Dependency List
2360
+
2361
+ ```
2362
+ # requirements-data-eval.txt
2363
+ ir-datasets>=0.5.6
2364
+ ir-measures>=0.3.1
2365
+ reportlab>=4.0
2366
+ augraphy>=8.0
2367
+ Pillow>=10.0
2368
+ pdfplumber>=0.10
2369
+ scikit-learn>=1.3
2370
+ numpy>=1.24
2371
+ pandas>=2.0
2372
+ pdf2image>=1.16
2373
+ ```
2374
+
2375
+ ### 8.4 Success Metrics Quick Reference
2376
+
2377
+ | Metric | Target | Evaluation tool | Data source |
+ |--------|--------|-----------------|-------------|
+ | MedGemma Extraction F1 | >= 0.85 | scikit-learn `f1_score` | Synthetic patients + ground truth |
+ | Trial Retrieval Recall@50 | >= 0.75 | ir-measures `Recall@50` | TREC CT 2021/2022 |
+ | Trial Ranking NDCG@10 | >= 0.60 | ir-measures `nDCG@10` | TREC CT 2021/2022 |
+ | Criterion Decision Accuracy | >= 0.85 | Custom accuracy | Annotated EligibilityLedger |
+ | Latency | < 15s | `LatencyCostTracker` | API call timing |
+ | Cost | < $0.50/session | `LatencyCostTracker` | Token counting |
docs/tdd-guide-ux-frontend.md ADDED
@@ -0,0 +1,1524 @@
# TrialPath UX & Frontend TDD-Ready Implementation Guide

> Generated from DeepWiki research on `streamlit/streamlit` and `emcie-co/parlant`, supplemented by official Parlant documentation (`parlant.io`).
>
> **Architecture Decisions:**
> - Parlant runs as an **independent service** (REST API mode); the frontend communicates via `ParlantClient` (httpx)
> - Doctor packet export: **JSON + Markdown** (no PDF generation in PoC)
> - MedGemma: **HF Inference Endpoint** (cloud, no local GPU)

---

## 1. Architecture Overview

### 1.1 File Structure

```
app/
  app.py                   # Entrypoint: st.navigation, shared sidebar, Parlant client init
  pages/
    1_upload.py            # INGEST state: document upload + extraction trigger
    2_profile_review.py    # PRESCREEN state: PatientProfile review + edit
    3_trial_matching.py    # VALIDATE_TRIALS state: trial search + eligibility cards
    4_gap_analysis.py      # GAP_FOLLOWUP state: gap analysis + iterative refinement
    5_summary.py           # SUMMARY state: final report + doctor packet export
  components/
    file_uploader.py       # Multi-file PDF uploader component
    profile_card.py        # PatientProfile display/edit component
    trial_card.py          # Traffic-light eligibility card component
    gap_card.py            # Gap analysis action card component
    progress_tracker.py    # Journey state progress indicator
    chat_panel.py          # Parlant message panel (send/receive)
    search_process.py      # Search refinement step-by-step visualization
    disclaimer_banner.py   # Medical disclaimer banner (always visible)
  services/
    parlant_client.py      # Parlant REST API wrapper (sessions, events, agents)
    state_manager.py       # Session state orchestration
  tests/
    test_upload_page.py
    test_profile_review_page.py
    test_trial_matching_page.py
    test_gap_analysis_page.py
    test_summary_page.py
    test_components.py
    test_parlant_client.py
    test_state_manager.py
```

### 1.2 Module Dependency Graph

```
app.py
  -> pages/*                     (via st.navigation)
  -> services/parlant_client.py  (Parlant REST API)
  -> services/state_manager.py   (session state orchestration)

pages/*
  -> components/*                (UI building blocks)
  -> services/parlant_client.py
  -> services/state_manager.py

components/*
  -> st.session_state            (read/write)

services/parlant_client.py
  -> parlant-client SDK or httpx (REST calls to Parlant server)

services/state_manager.py
  -> st.session_state
  -> services/parlant_client.py
```

### 1.3 Key Dependencies

| Package             | Purpose                                    |
|---------------------|--------------------------------------------|
| `streamlit>=1.40`   | Frontend framework, multipage app, AppTest |
| `parlant-client`    | Python SDK for Parlant REST API            |
| `httpx`             | Async HTTP client (fallback for Parlant)   |
| `pytest`            | Test runner                                |

---

## 2. Streamlit Framework Guide

### 2.1 Multipage App with `st.navigation`

TrialPath uses the modern `st.navigation` API (not the legacy `pages/` auto-discovery) for explicit page control tied to Journey states.

**Pattern: Entrypoint with state-aware navigation**

```python
# app.py
import streamlit as st
from services.state_manager import get_current_journey_state

st.set_page_config(page_title="TrialPath", page_icon=":material/medical_services:", layout="wide")

# Define pages mapped to Journey states
pages = {
    "Patient Journey": [
        st.Page("pages/1_upload.py", title="Upload Documents", icon=":material/upload_file:"),
        st.Page("pages/2_profile_review.py", title="Review Profile", icon=":material/person:"),
        st.Page("pages/3_trial_matching.py", title="Trial Matching", icon=":material/search:"),
        st.Page("pages/4_gap_analysis.py", title="Gap Analysis", icon=":material/analytics:"),
        st.Page("pages/5_summary.py", title="Summary & Export", icon=":material/summarize:"),
    ]
}

current_page = st.navigation(pages)

# Shared sidebar: progress tracker
with st.sidebar:
    st.markdown("### Journey Progress")
    state = get_current_journey_state()
    # Render progress indicator based on current Parlant Journey state

current_page.run()
```

**Key API details (from DeepWiki):**
- `st.navigation(pages, position="sidebar")` returns the current `StreamlitPage`; you must call `.run()` on it.
- `st.switch_page("pages/2_profile_review.py")` for programmatic navigation (stops current page execution).
- `st.page_link(page, label, icon)` for clickable navigation links.
- Pages organized as a dict render as sections in the sidebar nav.

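Since `st.switch_page` takes a page path, keeping the Journey-state-to-page mapping in one pure function makes the navigation logic trivially testable. A minimal sketch (the helper name `page_for_state` is hypothetical; the file names follow the structure in Section 1.1):

```python
# Hypothetical helper: map a Parlant Journey state to its Streamlit page path,
# so pages can call st.switch_page(page_for_state(new_state)) after a transition.
STATE_TO_PAGE = {
    "INGEST": "pages/1_upload.py",
    "PRESCREEN": "pages/2_profile_review.py",
    "VALIDATE_TRIALS": "pages/3_trial_matching.py",
    "GAP_FOLLOWUP": "pages/4_gap_analysis.py",
    "SUMMARY": "pages/5_summary.py",
}

def page_for_state(state: str) -> str:
    """Return the page path for a Journey state (defaults to the upload page)."""
    return STATE_TO_PAGE.get(state, "pages/1_upload.py")
```

Falling back to the upload page for unknown states is a deliberate choice: a stale or corrupted session restarts the journey rather than crashing navigation.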
### 2.2 File Upload (`st.file_uploader`)

**Pattern: Multi-file PDF upload with validation**

```python
# components/file_uploader.py
import streamlit as st
from typing import List

def render_file_uploader() -> List:
    """Render multi-file uploader for clinical documents."""
    uploaded_files = st.file_uploader(
        "Upload clinical documents (PDF)",
        type=["pdf", "png", "jpg", "jpeg"],
        accept_multiple_files=True,
        key="clinical_docs_uploader",
        help="Upload clinic letters, pathology reports, lab results",
    )

    if uploaded_files:
        st.success(f"{len(uploaded_files)} file(s) uploaded")
        for f in uploaded_files:
            st.caption(f"{f.name} ({f.size / 1024:.1f} KB)")

    return uploaded_files or []
```

**Key API details (from DeepWiki):**
- `accept_multiple_files=True` returns `List[UploadedFile]`.
- `UploadedFile` extends `io.BytesIO` -- it can be passed directly to PDF parsers.
- Default size limit: 200 MB per file (configurable via `server.maxUploadSize` in `config.toml`).
- The `type` parameter is best-effort filtering, not a security guarantee.
- Files are held in memory after upload.
- Additive selection: clicking browse again adds files, it does not replace them.

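Because `UploadedFile` subclasses `io.BytesIO`, any code that consumes a binary stream works unchanged, and tests can substitute a plain `BytesIO`. A minimal sketch (the `read_header` helper is hypothetical, standing in for handing the stream to a real PDF parser):

```python
import io

def read_header(stream: io.BytesIO, n: int = 5) -> bytes:
    """Read the first n bytes of an uploaded document, e.g. to sniff '%PDF-'."""
    stream.seek(0)   # the stream may have been read on an earlier rerun
    header = stream.read(n)
    stream.seek(0)   # rewind so a downstream parser sees the full stream
    return header

# In tests, a plain BytesIO stands in for Streamlit's UploadedFile:
fake_upload = io.BytesIO(b"%PDF-1.7 ...rest of file...")
assert read_header(fake_upload) == b"%PDF-"
```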
### 2.3 Session State Management

**Pattern: Centralized state initialization**

```python
# services/state_manager.py
import streamlit as st

JOURNEY_STATES = ["INGEST", "PRESCREEN", "VALIDATE_TRIALS", "GAP_FOLLOWUP", "SUMMARY"]

def init_session_state():
    """Initialize all session state variables with defaults."""
    defaults = {
        "journey_state": "INGEST",
        "parlant_session_id": None,
        "parlant_agent_id": None,
        "patient_profile": None,   # PatientProfile dict
        "uploaded_files": [],
        "search_anchors": None,    # SearchAnchors dict
        "trial_candidates": [],    # List[TrialCandidate]
        "eligibility_ledger": [],  # List[EligibilityLedger]
        "last_event_offset": 0,    # For Parlant long-polling
    }
    for key, default_value in defaults.items():
        if key not in st.session_state:
            st.session_state[key] = default_value

def get_current_journey_state() -> str:
    return st.session_state.get("journey_state", "INGEST")

def advance_journey(target_state: str):
    """Advance the Journey to target_state; only forward transitions are applied."""
    current_idx = JOURNEY_STATES.index(st.session_state.journey_state)
    target_idx = JOURNEY_STATES.index(target_state)  # raises ValueError for unknown states
    if target_idx > current_idx:
        st.session_state.journey_state = target_state
```

**Key API details (from DeepWiki):**
- `st.session_state` is a `SessionStateProxy` wrapping a thread-safe `SafeSessionState`.
- Internally it is a three-layer dict: `_old_state` (previous run), `_new_session_state` (user-set), `_new_widget_state` (widget values).
- Widget-bound state cannot be modified after the widget is instantiated in the same run (raises `StreamlitAPIException`).
- A widget's `key` parameter maps to `st.session_state[key]` for read access.
- Values must be pickle-serializable.

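The forward-only guard in `advance_journey` is pure list arithmetic, so it can be unit-tested without a Streamlit runtime. A minimal sketch of the same rule as a standalone predicate (the `can_advance` name is an assumption, not part of the module above):

```python
JOURNEY_STATES = ["INGEST", "PRESCREEN", "VALIDATE_TRIALS", "GAP_FOLLOWUP", "SUMMARY"]

def can_advance(current: str, target: str) -> bool:
    """True only for strictly forward transitions along the journey."""
    if current not in JOURNEY_STATES or target not in JOURNEY_STATES:
        return False  # unknown states never advance the journey
    return JOURNEY_STATES.index(target) > JOURNEY_STATES.index(current)
```

Note that forward *jumps* (e.g. INGEST straight to SUMMARY) pass this check; if each state must be visited in order, tighten the comparison to `target_idx == current_idx + 1`.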
### 2.4 Real-Time Progress Feedback

**Pattern: AI inference progress with `st.status`**

```python
# Usage in pages/1_upload.py
def run_extraction(uploaded_files):
    """Run MedGemma extraction with real-time status feedback."""
    with st.status("Extracting clinical data from documents...", expanded=True) as status:
        st.write("Reading uploaded documents...")
        # Step 1: Send files to MedGemma
        st.write("Running AI extraction (MedGemma 4B)...")
        # Step 2: Poll for results
        st.write("Building patient profile...")
        # Step 3: Parse results into PatientProfile
        status.update(label="Extraction complete!", state="complete")
```

**Pattern: Streaming LLM output with `st.write_stream`**

```python
def stream_gap_analysis(generator):
    """Stream Gemini gap analysis output with typewriter effect."""
    st.write_stream(generator)
```

**Pattern: Auto-refreshing fragment for Parlant events**

```python
@st.fragment(run_every=3)  # Poll every 3 seconds
def parlant_event_listener():
    """Fragment that polls Parlant for new events without full page rerun."""
    from services.parlant_client import poll_events

    new_events = poll_events(
        st.session_state.parlant_session_id,
        st.session_state.last_event_offset,
    )
    if new_events:
        for event in new_events:
            if event["kind"] == "message" and event["source"] == "ai_agent":
                st.chat_message("assistant").write(event["message"])
            elif event["kind"] == "status":
                st.caption(f"Agent status: {event['data']}")
        st.session_state.last_event_offset = new_events[-1]["offset"] + 1
```

**Key API details (from DeepWiki):**
- `st.status(label, expanded, state)` -- context manager, auto-completes. States: `"running"`, `"complete"`, `"error"`.
- `st.spinner(text, show_time=True)` -- simple loading indicator.
- `st.progress(value, text)` -- accepts an int 0-100 or a float 0.0-1.0.
- `st.toast(body, icon, duration)` -- transient notification, top-right.
- `st.write_stream(generator)` -- typewriter effect for strings, `st.write` for other types. Supports OpenAI `ChatCompletionChunk` and LangChain `AIMessageChunk`.
- `@st.fragment(run_every=N)` -- partial rerun every N seconds, isolated from the full app rerun.
- `st.rerun(scope="fragment")` -- rerun only the enclosing fragment.

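`st.write_stream` accepts any generator of string chunks, so a complete Parlant message can be re-chunked client-side to get the typewriter effect even though the REST API delivers it in one piece. A minimal sketch (chunking by words is an arbitrary choice for illustration):

```python
from typing import Iterator

def as_stream(text: str) -> Iterator[str]:
    """Yield a message word-by-word so st.write_stream can animate it."""
    words = text.split(" ")
    for i, word in enumerate(words):
        # re-insert the space that split() removed, except before the first word
        yield word if i == 0 else " " + word

# st.write_stream(as_stream(event["message"])) would render this incrementally;
# joining the chunks reproduces the original text exactly:
assert "".join(as_stream("Found 3 matching trials")) == "Found 3 matching trials"
```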
### 2.5 Layout System (from DeepWiki `streamlit/streamlit`)

**Layout primitives for TrialPath UI:**

| Primitive | Purpose in TrialPath | Key Params |
|-----------|---------------------|------------|
| `st.columns(spec)` | Trial card grid, profile fields side-by-side | `spec` (int or list of ratios), `gap`, `vertical_alignment` |
| `st.tabs(labels)` | Switching between trial categories (Eligible/Borderline/Not Eligible) | Returns list of containers |
| `st.expander(label)` | Collapsible criterion detail, evidence citations | `expanded` (bool), `icon` |
| `st.container(height, border)` | Scrollable trial list, chat panel | `height` (int px), `horizontal` (bool) |
| `st.empty()` | Dynamic status updates, replacing content | Single-element, replaceable |

**Layout composition pattern for trial cards:**

```python
# Trial matching page layout
tabs = st.tabs(["Eligible", "Borderline", "Not Eligible", "Unknown"])

with tabs[0]:  # Eligible trials
    for trial in eligible_trials:
        with st.expander(f"{trial['nct_id']} - {trial['title']}", expanded=False):
            cols = st.columns([0.7, 0.3])
            with cols[0]:
                st.markdown(f"**Phase**: {trial['phase']}")
                st.markdown(f"**Sponsor**: {trial['sponsor']}")
            with cols[1]:
                # Traffic-light summary
                met = sum(1 for c in trial['criteria'] if c['status'] == 'MET')
                total = len(trial['criteria'])
                st.metric("Criteria Met", f"{met}/{total}")

            # Criterion-level detail
            for criterion in trial['criteria']:
                col1, col2 = st.columns([0.8, 0.2])
                with col1:
                    st.write(criterion['description'])
                with col2:
                    color_map = {"MET": "green", "NOT_MET": "red", "BORDERLINE": "orange", "UNKNOWN": "grey"}
                    st.markdown(f":{color_map[criterion['status']]}[{criterion['status']}]")
```

**Responsive behavior:**
- `st.columns` stacks vertically at viewport widths <= 640px.
- Use `width="stretch"` for elements that should fill the available space.
- Avoid nesting columns more than once.
- Scrolling containers: avoid heights > 500px for mobile.

### 2.6 Caching System (from DeepWiki `streamlit/streamlit`)

**Two caching decorators:**

| Decorator | Returns | Serialization | Use Case |
|-----------|---------|---------------|----------|
| `@st.cache_data` | Copy of cached value | Requires pickle | Data transformations, API responses, search results |
| `@st.cache_resource` | Shared instance (singleton) | No pickle needed | ParlantClient instance, HTTP clients, model objects |

**TrialPath caching patterns:**

```python
import os

@st.cache_resource
def get_parlant_client() -> ParlantClient:
    """Singleton Parlant client shared across all sessions."""
    return ParlantClient(base_url=os.environ.get("PARLANT_URL", "http://localhost:8000"))

@st.cache_data(ttl=300)  # 5-minute TTL
def search_trials(query_params: dict) -> list:
    """Cache trial search results to avoid redundant MCP calls."""
    client = get_parlant_client()
    # ... perform search
    return results
```

**Key details:**
- Cache key = hash of (function source code + arguments).
- `ttl` (time-to-live): auto-expires entries. Use it for API results that may change.
- `max_entries`: limits cache size.
- `hash_funcs`: custom hash for unhashable args.
- Prefix an arg with `_` to exclude it from the hash (e.g., `_client`).
- `@st.cache_resource` objects are shared across ALL sessions/threads -- they must be thread-safe.
- Do NOT call interactive widgets inside cached functions (triggers a warning).
- Cache is invalidated on: argument change, source code change, TTL expiry, `max_entries` overflow, explicit `.clear()`.

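The `_`-prefix rule matters because a live client handle is neither hashable nor stable across runs; excluding it keeps the cache key deterministic. A stdlib-only sketch of that keying rule (this mimics the behavior for illustration only -- it is not Streamlit's actual implementation):

```python
import functools

def cache_skipping_underscore_args(func):
    """Memoize on keyword args, ignoring names that start with '_' (mirroring st.cache_data)."""
    store = {}

    @functools.wraps(func)
    def wrapper(**kwargs):
        # Build the cache key only from non-underscore kwargs.
        key = tuple(sorted((k, v) for k, v in kwargs.items() if not k.startswith("_")))
        if key not in store:
            store[key] = func(**kwargs)
        return store[key]

    wrapper.cache = store  # exposed for inspection
    return wrapper

@cache_skipping_underscore_args
def search(_client=None, query=""):
    return f"results for {query}"

search(_client=object(), query="NSCLC")  # computed
search(_client=object(), query="NSCLC")  # different client object, same key -> cache hit
assert len(search.cache) == 1
```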
### 2.7 Global Disclaimer Banner (PRD Section 9)

Every page must display a medical disclaimer. Implement it as a shared component called from `app.py` before navigation.

**Pattern: Global disclaimer in entrypoint**

```python
# app/app.py (add before st.navigation)
from components.disclaimer_banner import render_disclaimer

# Always render the disclaimer at the top of every page
render_disclaimer()

nav = st.navigation(pages)
nav.run()
```

**Component: disclaimer_banner.py**

```python
# app/components/disclaimer_banner.py
import streamlit as st

DISCLAIMER_TEXT = (
    "This tool provides information for educational purposes only and does not "
    "constitute medical advice. Always consult your healthcare provider before "
    "making decisions about clinical trial participation."
)

def render_disclaimer():
    """Render the medical disclaimer banner. Must appear on every page."""
    st.info(DISCLAIMER_TEXT, icon="ℹ️")
```

---

## 3. Parlant Frontend Integration Guide

### 3.1 Architecture: Asynchronous Event-Driven Model

Parlant uses an **asynchronous, event-driven** conversation model -- NOT traditional request-reply. Both the customer and the AI agent can post events to a session at any time.

**Core concepts:**
- **Session** = timeline of all events (messages, status updates, tool calls, custom events)
- **Event** = timestamped item with `offset`, `kind`, `source`, `trace_id`
- **Long-polling** = client polls for new events with `min_offset` and a `wait_for_data` timeout

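The offset bookkeeping that drives long-polling is simple enough to isolate: after each poll, the next `min_offset` is one past the newest event seen, and an empty poll (e.g. a long-poll timeout) leaves it unchanged. A minimal sketch of that accounting (`next_min_offset` is a hypothetical helper; event dicts are shaped as in Section 3.4):

```python
def next_min_offset(events: list[dict], current: int) -> int:
    """Compute the min_offset for the next poll from the events just received."""
    if not events:
        return current  # nothing new arrived before the long-poll timed out
    return events[-1]["offset"] + 1

polled = [
    {"offset": 4, "kind": "status", "source": "ai_agent"},
    {"offset": 5, "kind": "message", "source": "ai_agent"},
]
assert next_min_offset(polled, current=4) == 6
assert next_min_offset([], current=6) == 6
```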
### 3.2 REST API Endpoints

| Method | Path                                     | Purpose                                |
|--------|------------------------------------------|----------------------------------------|
| POST   | `/agents`                                | Create agent                           |
| POST   | `/sessions`                              | Create session (agent + customer)      |
| GET    | `/sessions`                              | List sessions (filter by agent/customer, paginated) |
| POST   | `/sessions/{id}/events`                  | Send message/event                     |
| GET    | `/sessions/{id}/events`                  | List/poll events (long-polling)        |
| PATCH  | `/sessions/{id}/events/{event_id}`       | Update event metadata                  |

**Create Event request schema** (`EventCreationParamsDTO`):
- `kind`: `"message"` | `"custom"` | `"status"`
- `source`: `"customer"` | `"human_agent"` | `"customer_ui"`
- `message`: string (for message events)
- `data`: dict (for custom/status events)
- `metadata`: dict (optional)

**List Events query params:**
- `min_offset`: int -- only return events after this offset
- `wait_for_data`: int (seconds) -- long-poll timeout; returns `504` if no new events
- `source`, `correlation_id`, `trace_id`, `kinds`: optional filters

### 3.3 Parlant Client Service

```python
# services/parlant_client.py
import httpx
from typing import Optional

PARLANT_BASE_URL = "http://localhost:8000"

class ParlantClient:
    """Synchronous wrapper around the Parlant REST API for Streamlit."""

    def __init__(self, base_url: str = PARLANT_BASE_URL):
        self.base_url = base_url
        self.http = httpx.Client(base_url=base_url, timeout=65.0)  # > long-poll timeout

    def create_agent(self, name: str, description: str = "") -> dict:
        resp = self.http.post("/agents", json={"name": name, "description": description})
        resp.raise_for_status()
        return resp.json()

    def create_session(self, agent_id: str, customer_id: Optional[str] = None) -> dict:
        payload = {"agent_id": agent_id}
        if customer_id:
            payload["customer_id"] = customer_id
        resp = self.http.post("/sessions", json=payload)
        resp.raise_for_status()
        return resp.json()

    def send_message(self, session_id: str, message: str) -> dict:
        resp = self.http.post(
            f"/sessions/{session_id}/events",
            json={"kind": "message", "source": "customer", "message": message},
        )
        resp.raise_for_status()
        return resp.json()

    def send_custom_event(self, session_id: str, event_type: str, data: dict) -> dict:
        """Send a custom event (e.g., journey state change, file upload notification)."""
        resp = self.http.post(
            f"/sessions/{session_id}/events",
            json={"kind": "custom", "source": "customer_ui", "data": {"type": event_type, **data}},
        )
        resp.raise_for_status()
        return resp.json()

    def poll_events(self, session_id: str, min_offset: int = 0, wait_seconds: int = 60) -> list:
        resp = self.http.get(
            f"/sessions/{session_id}/events",
            params={"min_offset": min_offset, "wait_for_data": wait_seconds},
        )
        if resp.status_code == 504:
            # Long-poll timed out with no new events (see Section 3.2) -- not an error.
            return []
        resp.raise_for_status()
        return resp.json()
```


### 3.4 Event Types Reference

| Kind      | Source(s)                      | Description                          |
|-----------|--------------------------------|--------------------------------------|
| message   | customer, ai_agent             | Text message from participant        |
| status    | ai_agent                       | Agent state: acknowledged, processing, typing, ready, error, cancelled |
| tool      | ai_agent                       | Tool call result (MedGemma, MCP)     |
| custom    | customer_ui, system            | App-defined (journey state, uploads) |

### 3.5 Journey State Synchronization

Map Parlant events to TrialPath Journey states:

```python
# services/state_manager.py (continued)

JOURNEY_CUSTOM_EVENTS = {
    "extraction_complete": "PRESCREEN",
    "profile_confirmed": "VALIDATE_TRIALS",
    "trials_evaluated": "GAP_FOLLOWUP",
    "gaps_resolved": "SUMMARY",
}

def handle_parlant_event(event: dict):
    """Process an incoming Parlant event and update the Journey state if needed."""
    if event["kind"] == "custom" and event.get("data", {}).get("type") in JOURNEY_CUSTOM_EVENTS:
        new_state = JOURNEY_CUSTOM_EVENTS[event["data"]["type"]]
        advance_journey(new_state)
    elif event["kind"] == "status" and event.get("data") == "error":
        st.session_state["last_error"] = event.get("message", "Unknown error")
```

### 3.6 Parlant Journey System (from DeepWiki `emcie-co/parlant`)

Parlant's Journey System defines structured multi-step interaction flows. This is the core mechanism for implementing TrialPath's 5-state patient workflow.

**Journey state types:**
- **Chat State** -- the agent converses with the customer, guided by the state's `action`. Can stay for multiple turns.
- **Tool State** -- the agent calls an external tool; the result is loaded into context. Must be followed by a chat state.
- **Fork State** -- the agent evaluates conditions and branches the flow.

**TrialPath Journey definition pattern:**

```python
import parlant.sdk as p

async def create_trialpath_journey(agent: p.Agent):
    journey = await agent.create_journey(
        title="Clinical Trial Matching",
        conditions=["The patient wants to find matching clinical trials"],
        description="Guide NSCLC patients through clinical trial matching: "
                    "document upload, profile extraction, trial search, "
                    "eligibility analysis, and gap identification.",
    )

    # INGEST: Upload and extract
    t1 = await journey.initial_state.transition_to(
        chat_state="Ask patient to upload clinical documents (clinic letters, pathology reports, lab results)"
    )

    # Tool state: Run MedGemma extraction
    t2a = await t1.target.transition_to(
        condition="Documents uploaded",
        tool_state=extract_patient_profile,  # MedGemma tool
    )
    # PRESCREEN: Review extracted profile
    t2b = await t2a.target.transition_to(
        chat_state="Present extracted PatientProfile for review and confirmation"
    )

    # Tool state: Search trials via MCP
    t3a = await t2b.target.transition_to(
        condition="Profile confirmed",
        tool_state=search_clinical_trials,  # ClinicalTrials MCP tool
    )
    # VALIDATE_TRIALS: Show results with eligibility
    t3b = await t3a.target.transition_to(
        chat_state="Present trial matches with criterion-level eligibility assessment"
    )

    # GAP_FOLLOWUP: Identify gaps and suggest actions
    t4 = await t3b.target.transition_to(
        condition="Trials evaluated",
        chat_state="Analyze eligibility gaps and suggest next steps "
                   "(additional tests, document uploads)",
    )

    # Loop back if new documents are uploaded
    await t4.target.transition_to(
        condition="New documents uploaded for gap resolution",
        state=t2a.target,  # Back to extraction
    )

    # SUMMARY: Final report
    t5 = await t4.target.transition_to(
        condition="Gaps resolved or patient ready for summary",
        chat_state="Generate summary report and doctor packet",
    )
```

**Key details (from DeepWiki):**
- Journeys are activated by `conditions` (observational guidelines matched by `GuidelineMatcher`).
- Transitions can be **direct** (always taken) or **conditional** (only taken if the condition is met).
- A transition can target an existing state (for loops, e.g., the gap-resolution cycle).
- `END_JOURNEY` is a special terminal state.
- Journeys dynamically manage the LLM context to include only the guidelines relevant to each state.

### 3.7 Parlant Guideline System (from DeepWiki `emcie-co/parlant`)

Guidelines define behavioral rules for agents. Two types:

| Type | Has Action? | Purpose |
|------|-------------|---------|
| Observational | No | Track conditions, activate journeys |
| Actionable | Yes | Drive agent behavior when condition is met |

**Journey-scoped vs Global guidelines:**
- **Global** guidelines apply across all conversations.
- **Journey-scoped** guidelines are only active when their parent journey is active. Created via `journey.create_guideline()`.

**TrialPath guideline examples:**

```python
# Global guideline: always cite evidence
await agent.create_guideline(
    condition="the agent makes a clinical assessment",
    action="cite the source document, page number, and relevant text span",
)

# Journey-scoped: only active during VALIDATE_TRIALS
await journey.create_guideline(
    condition="a criterion cannot be evaluated due to missing data",
    action="mark it as UNKNOWN and add it to the gap list with the specific data needed",
)
```

**Matching pipeline** (from DeepWiki): GuidelineMatcher uses LLM-based evaluation with multiple batch types (observational, actionable, low-criticality, disambiguation, journey-node-selection) to determine which guidelines apply to the current conversation context.

### 3.8 Parlant Tool Integration (from DeepWiki `emcie-co/parlant`)

Parlant supports four tool service types: `local`, `sdk`/plugin, `openapi`, and `mcp`.

**TrialPath will use:**
- **SDK/plugin tools** for MedGemma extraction
- **MCP tools** for ClinicalTrials.gov search

**Tool definition with the `@p.tool` decorator:**

```python
@p.tool
async def extract_patient_profile(
    context: p.ToolContext,
    document_urls: list[str],
) -> p.ToolResult:
    """Extract patient clinical profile from uploaded documents using MedGemma 4B.

    Args:
        document_urls: List of URLs/paths to uploaded clinical documents.
    """
    # Call the MedGemma endpoint
    profile = await call_medgemma(document_urls)
    return p.ToolResult(
        data=profile,
        metadata={"source": "MedGemma 4B", "doc_count": len(document_urls)},
    )
```

**Tool execution flow** (from DeepWiki):
1. GuidelineMatcher identifies tools associated with matched guidelines
2. ToolCaller resolves tool parameters from the ServiceRegistry
3. ToolCallBatcher groups tools for efficient LLM inference
4. The LLM infers tool arguments from the conversation context
5. `ToolService.call_tool()` executes and returns a `ToolResult`
6. ToolEventGenerator emits a ToolEvent to the session

**ToolResult structure:**
- `data` -- visible to the agent for further processing
- `metadata` -- frontend-only info (not used by the agent)
- `control` -- processing options: `mode` (auto/manual), `lifespan` (response/session)

### 3.9 Parlant NLP Provider: Gemini (from DeepWiki `emcie-co/parlant`)

Parlant natively supports Google Gemini, which aligns with TrialPath's planned use of Gemini 3 Pro.

**Configuration:**

```bash
# Install with Gemini support
pip install parlant[gemini]

# Set the API key
export GEMINI_API_KEY="your-api-key"

# Start the server with the Gemini backend
parlant-server --gemini
```

**Supported providers** (from DeepWiki): OpenAI, Anthropic, Azure, AWS Bedrock, Google Gemini, Vertex AI, Together.ai, LiteLLM, Cerebras, DeepSeek, Ollama, Mistral, and more.

**Vertex AI alternative** -- for production, you can use `pip install parlant[vertex]` with `VERTEX_AI_MODEL=gemini-2.5-pro`.

### 3.10 AlphaEngine Processing Pipeline (from DeepWiki `emcie-co/parlant`)

This is the complete flow from customer message to agent response. It is critical for understanding latency and where UI feedback belongs.

**Step-by-step pipeline:**

```
1. EVENT CREATION
   Customer sends message -> POST /sessions/{id}/events
   -> SessionModule creates the event, dispatches background processing

2. CONTEXT LOADING
   AlphaEngine.process() loads:
   - Session history (interaction events)
   - Agent identity + description
   - Customer info
   - Context variables (per-customer/per-tag/global)
   -> Assembled into an EngineContext

3. PREPARATION LOOP (while not prepared_to_respond)
   a. GUIDELINE MATCHING
      GuidelineMatcher evaluates guidelines against the conversation context
      - Observational guidelines (track conditions)
      - Actionable guidelines (drive behavior)
      - Journey-node guidelines (determine next journey step)
      Uses the LLM to score relevance -> GuidelineMatch objects

   b. TOOL CALLING (if guidelines require tools)
      ToolCaller resolves + executes tools
      - ToolCallBatcher groups them for efficient LLM inference
      - The LLM infers arguments from context
      - ToolService.call_tool() executes
      - ToolEventGenerator emits a ToolEvent to the session
      -> Tool results may trigger re-evaluation of guidelines

4. PREAMBLE GENERATION (optional)
   Quick acknowledgment for perceived responsiveness
   -> Emitted as an early status event ("acknowledged" / "processing")

5. MESSAGE COMPOSITION
   Based on the agent's CompositionMode:
   - FLUID: MessageGenerator builds the prompt, generates via SchematicGenerator
     -> Revision loop with temperature-based retries
   - CANNED_STRICT: only uses predefined templates
   - CANNED_COMPOSITED: mimics the style of canned responses
   - CANNED_FLUID: prefers canned but falls back to fluid

6. EVENT EMISSION
   Generated message -> emitted as a message event
   A "ready" status event signals completion
```

**UI feedback mapping for TrialPath:**

| Pipeline Step | Parlant Status Event | UI Feedback |
|---------------|---------------------|-------------|
| Event created | `acknowledged` | "Message received" indicator |
| Context loading | `processing` | `st.status("Analyzing your request...")` |
| Tool calling | `tool` events | `st.status("Searching ClinicalTrials.gov...")` |
| Message generation | `typing` | Typing indicator animation |
| Complete | `ready` | Display agent response |
| Error | `error` | `st.error()` with retry option |

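The status-to-feedback table reduces to a lookup the chat panel can apply to each incoming status event. A minimal sketch (the `feedback_label` helper and its label strings are illustrative, not part of Parlant):

```python
STATUS_FEEDBACK = {
    "acknowledged": "Message received",
    "processing": "Analyzing your request...",
    "typing": "Agent is typing...",
    "ready": None,  # the message event follows; no interim label needed
    "error": "Something went wrong -- please retry",
}

def feedback_label(status: str) -> "str | None":
    """Map a Parlant status event to the interim label the UI should show."""
    # Fall through to a generic label for statuses not in the table (e.g. "cancelled").
    return STATUS_FEEDBACK.get(status, f"Agent status: {status}")

assert feedback_label("processing") == "Analyzing your request..."
assert feedback_label("ready") is None
assert feedback_label("cancelled") == "Agent status: cancelled"
```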
731
+ ### 3.11 Context Variables (from DeepWiki `emcie-co/parlant`)
732
+
733
+ Context variables store dynamic data that agents can reference during conversations. Essential for TrialPath to maintain patient profile state across the journey.
734
+
735
+ **Variable scoping (priority order):**
736
+ 1. Customer-specific values (per patient)
737
+ 2. Tag-specific values (e.g., per disease type)
738
+ 3. Global defaults

**TrialPath context variable examples:**

```python
# Create a context variable for patient data
patient_profile_var = await client.context_variables.create(
    name="patient_profile",
    description="Current patient clinical profile extracted from documents",
)

# Set a per-customer value
await client.context_variables.set_value(
    variable_id=patient_profile_var.id,
    key=customer_id,  # per-patient
    value=patient_profile_dict,
)

# Auto-refresh a variable via a tool (with freshness rules)
trial_results_var = await client.context_variables.create(
    name="matching_trials",
    description="Current list of matching clinical trials",
    tool_id=search_trials_tool_id,
    freshness_rules="*/10 * * * *",  # refresh every 10 minutes
)
```

**Key details:**
- Values are JSON-serializable.
- Included in PromptBuilder's `add_context_variables` section for LLM context.
- Can be auto-refreshed via associated tools plus cron-based `freshness_rules`.
- `ContextVariableStore.GLOBAL_KEY` holds default values.

### 3.12 MCP Tool Service Details (from DeepWiki `emcie-co/parlant`)

Parlant has native MCP support via `MCPToolClient`. This is how TrialPath connects to ClinicalTrials.gov.

**Registration:**

```
# Via REST API
PUT /services/clinicaltrials_mcp
{
  "kind": "mcp",
  "mcp": {
    "url": "http://localhost:8080"
  }
}
```

```bash
# Via CLI
parlant service create \
  --name clinicaltrials_mcp \
  --kind mcp \
  --url http://localhost:8080
```

**MCPToolClient internals:**
- Connects via `StreamableHttpTransport` to the MCP server's `/mcp` endpoint.
- `list_tools()` discovers available tools from the MCP server.
- `mcp_tool_to_parlant_tool()` converts MCP tool schemas to Parlant's `Tool` objects.
- Type mapping: `string`, `integer`, `number`, `boolean`, `date`, `datetime`, `uuid`, `array`, `enum`.
- `call_tool()` invokes the MCP tool, extracts text content from the result, and wraps it in a `ToolResult`.
- Default MCP port: `8181` (the registration examples above override this with `8080`).

**Integration with Guideline System:**

```python
# Associate an MCP tool with a guideline
search_guideline = await agent.create_guideline(
    condition="the patient profile has been confirmed and trial search is needed",
    action="search ClinicalTrials.gov for matching NSCLC trials using the patient's biomarkers and staging",
    tools=[clinicaltrials_search_tool],  # MCP tool reference
)
```

### 3.13 Prompt Construction (from DeepWiki `emcie-co/parlant`)

Understanding how Parlant builds LLM prompts is essential for designing effective guidelines and journey states.

**PromptBuilder sections (in order):**

| Section | Content | TrialPath Relevance |
|---------|---------|---------------------|
| General Instructions | Task description, role | Define clinical trial matching context |
| Agent Identity | Agent name + description | "patient_trial_copilot" identity |
| Customer Identity | Customer name, session ID | Patient identifier |
| Context Variables | Dynamic data (JSON) | PatientProfile, SearchAnchors, prior results |
| Glossary | Domain terms | NSCLC, ECOG, biomarker definitions |
| Capabilities | What the agent can do | Tool descriptions (MedGemma, MCP) |
| Interaction History | Conversation events | Full chat history with tool results |
| Guidelines | Matched condition/action pairs | Active behavioral rules for current state |
| Journey State | Current position in journey | Which step in the INGEST -> SUMMARY flow |
| Few-shot Examples | Desired output format | Example eligibility assessments |
| Staged Tool Events | Pending/completed tool results | MedGemma extraction results, MCP search results |

**Context window management:**
- GuidelineMatcher selectively loads only relevant guidelines and journeys.
- Journey-scoped guidelines are included only when the journey is active.
- Prevents context bloat by pruning low-probability journey guidelines.

### 3.14 Parlant Testing Framework (from DeepWiki `emcie-co/parlant`)

Parlant provides a dedicated testing framework with NLP-based assertions (LLM-as-a-Judge).

**Key test utilities:**

| Class | Purpose |
|-------|---------|
| `Suite` | Test runner, manages server connection and scenarios |
| `Session` | Test session context manager |
| `Response` | Agent response with `.should()` assertion |
| `InteractionBuilder` | Build conversation history for preloading |
| `CustomerMessage` / `AgentMessage` | Step types for conversation construction |

**TrialPath test examples:**

```python
from parlant.testing import Suite, InteractionBuilder
from parlant.testing.steps import AgentMessage, CustomerMessage

suite = Suite(
    server_url="http://localhost:8800",
    agent_id="patient_trial_copilot",
)

@suite.scenario
async def test_extraction_journey_step():
    """Test that the agent asks for documents in the INGEST state."""
    async with suite.session() as session:
        response = await session.send("I want to find clinical trials for my lung cancer")
        await response.should("ask the patient to upload clinical documents")
        await response.should("mention accepted file types like PDF or images")

@suite.scenario
async def test_gap_analysis_identifies_missing_data():
    """Test that gap analysis identifies unknown biomarkers."""
    async with suite.session() as session:
        # Preload history simulating completed extraction + matching
        history = (
            InteractionBuilder()
            .step(CustomerMessage("Here are my medical documents"))
            .step(AgentMessage("I've extracted your profile. You have NSCLC Stage IIIB, "
                               "EGFR positive, but KRAS status is unknown."))
            .step(CustomerMessage("What trials am I eligible for?"))
            .step(AgentMessage("I found 5 trials. For NCT04000005, KRAS status is required "
                               "but missing from your records."))
            .build()
        )
        await session.add_events(history)

        response = await session.send("What should I do about the missing KRAS test?")
        await response.should("suggest getting a KRAS mutation test")
        await response.should("explain which trials require KRAS status")

@suite.scenario
async def test_multi_turn_journey_flow():
    """Test the complete journey flow with unfold()."""
    async with suite.session() as session:
        await session.unfold([
            CustomerMessage("I have NSCLC and want to find trials"),
            AgentMessage(
                text="I'd be happy to help. Please upload your clinical documents.",
                should="ask for document upload",
            ),
            CustomerMessage("I've uploaded my pathology report"),
            AgentMessage(
                text="I've extracted your profile...",
                should=["confirm profile extraction", "present key findings"],
            ),
            CustomerMessage("That looks correct, please search for trials"),
            AgentMessage(
                text="I found 8 matching trials...",
                should=["present trial matches", "include eligibility assessment"],
            ),
        ])
```

**Running tests:**
```bash
parlant-test tests/           # Run all test files
parlant-test tests/ -k gap    # Filter by pattern
parlant-test tests/ -n 4      # Run in parallel
```

### 3.15 Canned Response System (from DeepWiki `emcie-co/parlant`)

Canned responses provide consistent, template-based messaging -- useful for TrialPath's structured outputs.

**CompositionMode options:**

| Mode | Behavior | TrialPath Use |
|------|----------|---------------|
| `FLUID` | Free-form LLM generation | General conversation, gap explanations |
| `CANNED_STRICT` | Only predefined templates | Disclaimer text, safety warnings |
| `CANNED_COMPOSITED` | Mimics canned style | Eligibility summaries |
| `CANNED_FLUID` | Prefers canned, falls back to fluid | Standard responses with flexibility |

**Journey-state-scoped canned responses:**

```python
# Canned response only active during the SUMMARY state
summary_template = await journey.create_canned_response(
    value="Based on your clinical profile, you match {{match_count}} trials. "
          "{{eligible_count}} are likely eligible, {{borderline_count}} are borderline, "
          "and {{gap_count}} have unresolved gaps. "
          "See the attached doctor packet for full details.",
    fields=["match_count", "eligible_count", "borderline_count", "gap_count"],
)
```

**Template features:**
- Jinja2 syntax for dynamic fields (e.g., `{{std.customer.name}}`).
- Fields auto-populated from tool results and context variables.
- Relevance-scored matching via LLM when multiple templates exist.
- `signals` and `metadata` for additional template categorization.
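The field substitution can be pictured with a minimal stand-in. Real Parlant templates use Jinja2 (which also supports dotted paths like `std.customer.name`); this regex version handles only simple `{{field}}` placeholders and is purely illustrative:

```python
import re


def fill_template(template: str, fields: dict) -> str:
    """Toy stand-in for Jinja2: substitute simple {{field}} placeholders.

    Unknown fields are left untouched so missing data is visible in review.
    """
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: str(fields.get(m.group(1), m.group(0))),
        template,
    )


summary = fill_template(
    "You match {{match_count}} trials; {{eligible_count}} are likely eligible.",
    {"match_count": 8, "eligible_count": 3},
)
```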

---

## 4. UI Component Design per Journey State

### 4.1 INGEST State -- Upload Page

```
+------------------------------------------+
| [i] This tool is for information only... |
| [Sidebar: Journey Progress]              |
|                                          |
| Upload Clinical Documents                |
| +---------------------------------+      |
| | Drag & drop or browse           |      |
| | Accepted: PDF, PNG, JPG         |      |
| +---------------------------------+      |
|                                          |
| Uploaded Files:                          |
| - clinic_letter.pdf (245 KB)     [x]     |
| - pathology_report.pdf (1.2 MB)  [x]     |
| - lab_results.png (890 KB)       [x]     |
|                                          |
| [Start Extraction]                       |
|                                          |
| st.status: "Extracting clinical data..." |
|   - Reading documents...                 |
|   - Running MedGemma 4B...               |
|   - Building patient profile...          |
+------------------------------------------+
```

**Key components:** `file_uploader`, `progress_tracker`

### 4.2 PRESCREEN State -- Profile Review Page

```
+------------------------------------------+
| [i] This tool is for information only... |
| [Sidebar: Journey Progress]              |
|                                          |
| Patient Clinical Profile                 |
| +--------------------------------------+ |
| | Demographics: Female, 62, ECOG 1     | |
| | Diagnosis: NSCLC Stage IIIB          | |
| | Histology: Adenocarcinoma            | |
| | Biomarkers:                          | |
| |   EGFR: Positive (exon 19 del)       | |
| |   ALK: Negative                      | |
| |   PD-L1: 45%                         | |
| | Prior Treatment:                     | |
| |   Carboplatin+Pemetrexed (2 cycles)  | |
| | Unknowns:                            | |
| |   [!] KRAS status not found          | |
| |   [!] Brain MRI not available        | |
| +--------------------------------------+ |
|                                          |
| [Edit Profile] [Confirm & Search Trials] |
|                                          |
| Searching ClinicalTrials.gov...          |
|   Step 1: Initial query -> 47 results    |
|   Refining: adding Phase 3 filter...     |
|   Step 2: Refined query -> 12 results    |
|   Shortlisting top candidates...         |
+------------------------------------------+
```

**Key components:** `profile_card`, `search_process`, `progress_tracker`
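The refine/shortlist loop shown in the search panel can be sketched as a simple policy. The function and its thresholds are illustrative assumptions (the planner's actual decision is LLM-driven):

```python
def next_search_action(found: int, max_results: int = 50) -> str:
    """Decide the next step in the iterative trial-search loop.

    Illustrative rule: zero hits -> relax filters, too many hits ->
    tighten them, otherwise shortlist for detailed criterion review.
    """
    if found == 0:
        return "relax"      # e.g. drop the Phase 3 filter
    if found > max_results:
        return "refine"     # e.g. add a phase or location filter
    return "shortlist"
```

Logging each `(query, found, action, reason)` tuple is what feeds the `search_process` component above.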

### 4.3 VALIDATE_TRIALS State -- Trial Matching Page

```
+------------------------------------------+
| [i] This tool is for information only... |
| [Sidebar: Journey Progress]              |
|                                          |
| Matching Trials (8 found)                |
|                                          |
| Search Process:                          |
|   Step 1: NSCLC + Stage IV + DE -> 47    |
|     -> Refined: added Phase 3            |
|   Step 2: + Phase 3 -> 12 results        |
|     -> Shortlisted: reading summaries    |
|   Step 3: 5 trials selected for review   |
| [Show/Hide Search Details]               |
|                                          |
| +--------------------------------------+ |
| | NCT04000001 - KEYNOTE-999            | |
| | Pembrolizumab + Chemo for NSCLC      | |
| | Overall: LIKELY ELIGIBLE             | |
| |                                      | |
| | Criteria:                            | |
| | [G] NSCLC confirmed                  | |
| | [G] ECOG 0-1                         | |
| | [Y] PD-L1 >= 50% (yours: 45%)        | |
| | [R] No prior immunotherapy           | |
| | [?] Brain mets (unknown)             | |
| +--------------------------------------+ |
| | NCT04000002 - ...                    | |
| +--------------------------------------+ |
|                                          |
| [G]=Met  [Y]=Borderline  [R]=Not Met     |
| [?]=Unknown/Needs Info                   |
+------------------------------------------+
```

**Key components:** `trial_card` (traffic-light display), `search_process`, `progress_tracker`
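The per-trial rollup from criterion lights to an overall label could work as below. The aggregation rule here is an assumption for illustration; the real assessment is produced by the criterion-evaluation step, not by this helper:

```python
def overall_eligibility(criteria: list[dict]) -> str:
    """Roll per-criterion statuses up to one trial-level label.

    Assumed rule: any NOT_MET dominates; otherwise unknown or
    borderline criteria downgrade the label step by step.
    """
    statuses = {c["status"] for c in criteria}
    if "NOT_MET" in statuses:
        return "NOT ELIGIBLE"
    if "UNKNOWN" in statuses:
        return "NEEDS INFO"
    if "BORDERLINE" in statuses:
        return "BORDERLINE"
    return "LIKELY ELIGIBLE"
```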

### 4.4 GAP_FOLLOWUP State -- Gap Analysis Page

```
+------------------------------------------+
| [i] This tool is for information only... |
| [Sidebar: Journey Progress]              |
|                                          |
| Gap Analysis & Next Steps                |
|                                          |
| +--------------------------------------+ |
| | GAP: Brain MRI results needed        | |
| | Impact: Would resolve [?] criteria   | |
| |   for NCT04000001, NCT04000003       | |
| | Action: Upload brain MRI report      | |
| | [Upload Document]                    | |
| +--------------------------------------+ |
| | GAP: KRAS mutation status            | |
| | Impact: Required for NCT04000005     | |
| | Action: Request test from oncologist | |
| +--------------------------------------+ |
|                                          |
| [Re-run Matching with New Data]          |
| [Proceed to Summary]                     |
+------------------------------------------+
```

**Key components:** `gap_card`, `file_uploader` (for additional docs), `progress_tracker`

### 4.5 SUMMARY State -- Summary & Export Page

```
+------------------------------------------+
| [i] This tool is for information only... |
| [Sidebar: Journey Progress]              |
|                                          |
| Clinical Trial Matching Summary          |
|                                          |
| Eligible Trials:    3                    |
| Borderline Trials:  2                    |
| Not Eligible:       3                    |
| Unresolved Gaps:    1                    |
|                                          |
| [Download Doctor Packet (JSON/Markdown)] |
| [Start New Session]                      |
|                                          |
| Chat with AI Copilot:                    |
| +--------------------------------------+ |
| | AI: Based on your profile...         | |
| | You: What about trial NCT...?        | |
| | AI: That trial requires...           | |
| +--------------------------------------+ |
| | [Type a message...]           [Send] | |
| +--------------------------------------+ |
+------------------------------------------+
```

**Key components:** `chat_panel`, `progress_tracker`

---

## 5. TDD Test Cases

### 5.1 Upload Page Tests

| Test Case | Input | Expected Output | Boundary |
|-----------|-------|-----------------|----------|
| No files uploaded | Empty uploader | "Start Extraction" button disabled | N/A |
| Single PDF upload | 1 PDF file | File listed, extraction button enabled | N/A |
| Multiple files | 3 PDF + 1 PNG | All 4 files listed with sizes | N/A |
| Invalid file type | 1 .docx file | File rejected, error message shown | File type filter |
| Large file | 250 MB PDF | Error or warning per `maxUploadSize` | Size limit |
| Extraction triggered | Click "Start Extraction" | `st.status` shows running, Parlant event sent | N/A |
| Extraction completes | MedGemma returns profile | Journey advances to PRESCREEN, profile in session_state | State transition |
| Extraction fails | MedGemma error | `st.status` shows error state, retry option | Error handling |

### 5.2 Profile Review Page Tests

| Test Case | Input | Expected Output | Boundary |
|-----------|-------|-----------------|----------|
| Profile display | PatientProfile in session_state | All fields rendered correctly | N/A |
| Unknown fields highlighted | Profile with unknowns list | Unknowns shown with warning icon | N/A |
| Edit profile | Click Edit, modify ECOG | session_state updated, confirmation shown | N/A |
| Confirm profile | Click "Confirm & Search" | Journey advances to VALIDATE_TRIALS | State transition |
| Empty profile | No profile in session_state | Redirect to Upload page | Guard clause |
| Biomarker display | Complex biomarker data | All biomarkers with values and methods | Data richness |

### 5.3 Trial Matching Page Tests

| Test Case | Input | Expected Output | Boundary |
|-----------|-------|-----------------|----------|
| Trials loading | Matching in progress | `st.spinner` or `st.status` shown | N/A |
| Trials displayed | 8 TrialCandidates | 8 trial cards with traffic-light criteria | N/A |
| Green criterion | Criterion met with evidence | Green indicator, evidence citation | N/A |
| Yellow criterion | Borderline match | Yellow indicator, explanation | N/A |
| Red criterion | Criterion not met | Red indicator, specific reason | N/A |
| Unknown criterion | Missing data | Question mark, linked to gap | N/A |
| Zero trials | No matches found | Informative message, suggest broadening | Empty state |
| Many trials | 50+ results | Pagination or scroll, performance ok | Scale |
| Search process displayed | SearchLog with 3 steps | 3 step entries shown with query params and result counts | N/A |
| Refinement visible | >50 initial results refined to 12 | Shows refinement action and reason | Iterative loop |
| Relaxation visible | 0 initial results relaxed to 5 | Shows relaxation action and reason | Iterative loop |

### 5.4 Gap Analysis Page Tests

| Test Case | Input | Expected Output | Boundary |
|-----------|-------|-----------------|----------|
| Gaps identified | 3 gaps in ledger | 3 gap cards with actions | N/A |
| Upload resolves gap | Upload brain MRI report | Gap card updates, re-match option | Iterative flow |
| No gaps | All criteria resolved | Message: "No gaps", proceed to summary | Happy path |
| Gap impacts multiple trials | 1 gap affects 3 trials | Gap card lists all 3 affected trials | Cross-reference |
| Re-run matching | Click re-run after upload | New extraction + matching cycle | Loop back |

### 5.5 Summary Page Tests

| Test Case | Input | Expected Output | Boundary |
|-----------|-------|-----------------|----------|
| Summary statistics | Complete ledger | Correct counts per category | N/A |
| Download doctor packet | Click download | JSON + Markdown files downloadable via `st.download_button` | N/A |
| Chat interaction | Send message | Message appears, agent responds | N/A |
| New session | Click "Start New" | State cleared, redirect to Upload | State reset |
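The "correct counts per category" row can be backed by a small pure helper that the summary page and its test share. The ledger shape and label strings follow this document's mockups; the function itself is an illustrative assumption:

```python
from collections import Counter


def summarize_ledger(ledger: list[dict]) -> dict:
    """Count trials per eligibility category for the summary page.

    Each ledger entry is assumed to carry an "overall" label matching
    the traffic-light rollup used on the trial matching page.
    """
    counts = Counter(entry["overall"] for entry in ledger)
    return {
        "eligible": counts["LIKELY ELIGIBLE"],
        "borderline": counts["BORDERLINE"],
        "not_eligible": counts["NOT ELIGIBLE"],
        "gaps": counts["NEEDS INFO"],
    }
```

Keeping the count logic out of the Streamlit layer makes the "Summary statistics" test a plain assertion instead of a widget scrape.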

### 5.6 Disclaimer Tests

| Test Case | Input | Expected Output | Boundary |
|-----------|-------|-----------------|----------|
| Disclaimer on upload page | Navigate to Upload | Info banner with disclaimer text visible | N/A |
| Disclaimer on profile page | Navigate to Profile Review | Info banner with disclaimer text visible | N/A |
| Disclaimer on matching page | Navigate to Trial Matching | Info banner with disclaimer text visible | N/A |
| Disclaimer on gap page | Navigate to Gap Analysis | Info banner with disclaimer text visible | N/A |
| Disclaimer on summary page | Navigate to Summary | Info banner with disclaimer text visible | N/A |
| Disclaimer text content | Any page | Contains "information only" and "not medical advice" | Exact wording |

---

## 6. Streamlit AppTest Testing Strategy

### 6.1 Test Setup Pattern

```python
# tests/test_upload_page.py
import pytest
from streamlit.testing.v1 import AppTest

@pytest.fixture
def upload_app():
    """Create an AppTest instance for the upload page."""
    at = AppTest.from_file("pages/1_upload.py")
    # Initialize required session state
    at.session_state["journey_state"] = "INGEST"
    at.session_state["parlant_session_id"] = "test-session-123"
    at.session_state["uploaded_files"] = []
    return at.run()

def test_initial_state(upload_app):
    """Upload page shows the uploader and raises no exceptions."""
    at = upload_app
    # Check the file uploader exists
    assert len(at.file_uploader) > 0
    # Check no error state
    assert len(at.exception) == 0

def test_extraction_button_disabled_without_files(upload_app):
    """Extraction button should be disabled when no files are uploaded."""
    at = upload_app
    # Button should exist, but extraction should not proceed without files
    assert at.button[0].disabled or at.session_state.get("uploaded_files") == []
```

### 6.2 Widget Interaction Patterns

```python
def test_text_input_profile_edit():
    """Test editing patient profile fields via text input."""
    at = AppTest.from_file("pages/2_profile_review.py")
    at.session_state["journey_state"] = "PRESCREEN"
    at.session_state["patient_profile"] = {
        "demographics": {"age": 62, "sex": "Female"},
        "diagnosis": {"stage": "IIIB", "histology": "Adenocarcinoma"},
    }
    at = at.run()

    # Simulate editing a field
    if len(at.text_input) > 0:
        at.text_input[0].input("IIIA").run()
        # Assert profile updated in session state

def test_button_click_advances_journey():
    """Clicking the confirm button advances the journey to the next state."""
    at = AppTest.from_file("pages/2_profile_review.py")
    at.session_state["journey_state"] = "PRESCREEN"
    at.session_state["patient_profile"] = {"demographics": {"age": 62}}
    at = at.run()

    # Find and click the confirm button
    confirm_buttons = [b for b in at.button if "Confirm" in str(b.label)]
    if confirm_buttons:
        confirm_buttons[0].click()
        at = at.run()
        assert at.session_state["journey_state"] == "VALIDATE_TRIALS"
```

### 6.3 Page Navigation Test

```python
def test_guard_redirect_without_profile():
    """Profile review page redirects to upload if no profile exists."""
    at = AppTest.from_file("pages/2_profile_review.py")
    at.session_state["journey_state"] = "PRESCREEN"
    at.session_state["patient_profile"] = None  # No profile
    at = at.run()

    # Should show a warning or error, not crash
    assert len(at.exception) == 0
    # Could check for a warning message
    warnings = [m for m in at.warning if "upload" in str(m.value).lower()]
    assert len(warnings) > 0 or at.session_state["journey_state"] == "INGEST"
```

### 6.4 Session State Test

```python
def test_session_state_initialization():
    """All session state keys should be initialized on first run."""
    at = AppTest.from_file("app.py").run()

    required_keys = [
        "journey_state", "parlant_session_id", "patient_profile",
        "uploaded_files", "trial_candidates", "eligibility_ledger",
    ]
    for key in required_keys:
        assert key in at.session_state, f"Missing session state key: {key}"

def test_session_state_persists_across_reruns():
    """Session state values persist across multiple reruns."""
    at = AppTest.from_file("app.py").run()
    at.session_state["journey_state"] = "PRESCREEN"
    at = at.run()
    assert at.session_state["journey_state"] == "PRESCREEN"
```

### 6.5 Component Rendering Tests

```python
def test_trial_card_traffic_light_rendering():
    """Trial card displays correct traffic-light colors for criteria."""
    at = AppTest.from_file("pages/3_trial_matching.py")
    at.session_state["journey_state"] = "VALIDATE_TRIALS"
    at.session_state["trial_candidates"] = [
        {
            "nct_id": "NCT04000001",
            "title": "Test Trial",
            "criteria_results": [
                {"criterion": "NSCLC", "status": "MET", "evidence": "pathology report p.1"},
                {"criterion": "ECOG 0-1", "status": "MET", "evidence": "clinic letter"},
                {"criterion": "No prior IO", "status": "NOT_MET", "evidence": "treatment history"},
                {"criterion": "Brain mets", "status": "UNKNOWN", "evidence": None},
            ],
        }
    ]
    at = at.run()

    # Check that trial card content is rendered
    assert len(at.exception) == 0
    # Check for presence of the trial ID in rendered markdown
    markdown_texts = [str(m.value) for m in at.markdown]
    assert any("NCT04000001" in text for text in markdown_texts)
```

### 6.6 Error Handling Tests

```python
def test_parlant_connection_error_handling():
    """App should handle Parlant server unavailability gracefully."""
    at = AppTest.from_file("app.py")
    at.session_state["parlant_session_id"] = None  # Simulate no connection
    at = at.run()

    # Should not crash
    assert len(at.exception) == 0

def test_extraction_error_shows_retry():
    """When extraction fails, the user sees an error status and a retry option."""
    at = AppTest.from_file("pages/1_upload.py")
    at.session_state["journey_state"] = "INGEST"
    at.session_state["extraction_error"] = "MedGemma timeout"
    at = at.run()

    # Should show an error message
    assert len(at.exception) == 0
    error_msgs = [str(e.value) for e in at.error]
    assert len(error_msgs) > 0 or at.session_state.get("extraction_error") is not None
```

### 6.7 Search Process Component Tests

```python
# tests/test_components.py (addition)

class TestSearchProcessComponent:
    """Test the search process visualization component."""

    def test_renders_search_steps(self):
        """Search process should display all refinement steps."""
        at = AppTest.from_file("app/components/search_process.py")
        at.session_state["search_log"] = {
            "steps": [
                {"step": 1, "query": {"condition": "NSCLC", "location": "DE"},
                 "found": 47, "action": "refine",
                 "reason": "Too many results, adding phase filter"},
                {"step": 2, "query": {"condition": "NSCLC", "location": "DE", "phase": "Phase 3"},
                 "found": 12, "action": "shortlist",
                 "reason": "Right size for detailed review"},
            ],
            "final_shortlist_nct_ids": ["NCT001", "NCT002", "NCT003", "NCT004", "NCT005"],
        }
        at = at.run()
        # Verify the steps are displayed
        assert "47" in at.text[0].value  # First step result count
        assert "12" in at.text[1].value  # Second step result count
        assert "Phase 3" in at.text[0].value or "Phase 3" in at.text[1].value

    def test_empty_search_log(self):
        """Should handle a missing search log gracefully."""
        at = AppTest.from_file("app/components/search_process.py")
        at = at.run()
        # Should not crash; shows a placeholder
        assert len(at.exception) == 0

    def test_collapsible_details(self):
        """Search details should sit in an expander for a clean UI."""
        at = AppTest.from_file("app/components/search_process.py")
        at.session_state["search_log"] = {
            "steps": [{"step": 1, "query": {}, "found": 10, "action": "shortlist", "reason": "OK"}],
        }
        at = at.run()
        # Verify an expander exists for search details
        assert len(at.expander) >= 1
```

### 6.8 Disclaimer Component Tests

```python
# tests/test_components.py (addition)

class TestDisclaimerBanner:
    """Test that the medical disclaimer banner appears correctly."""

    def test_disclaimer_renders(self):
        """Disclaimer banner should render on every page."""
        at = AppTest.from_file("app/components/disclaimer_banner.py")
        at = at.run()
        assert len(at.info) >= 1
        assert "information" in at.info[0].value.lower()
        assert "medical advice" in at.info[0].value.lower()

    def test_disclaimer_in_upload_page(self):
        """Upload page should include the disclaimer."""
        at = AppTest.from_file("app/pages/1_upload.py")
        at = at.run()
        info_texts = [i.value.lower() for i in at.info]
        assert any("information" in t and "medical" in t for t in info_texts)
```

### 6.9 AppTest Limitations

- `AppTest` does not support testing `st.file_uploader` file content directly (mock at the service layer instead).
- Not yet compatible with `st.navigation`/`st.Page` multipage apps (test individual pages via `from_file`).
- No browser rendering -- tests run headless, in pure Python.
- Must call `.run()` after every interaction to see updated state.

---

## 7. Appendix: API Reference

### 7.1 Streamlit Key APIs

| API | Purpose | Notes |
|-----|---------|-------|
| `st.navigation(pages, position)` | Define multipage app | Returns current page; must call `.run()` |
| `st.Page(page, title, icon, url_path)` | Define a page | `page` = filepath or callable |
| `st.switch_page(page)` | Programmatic navigation | Stops current page execution |
| `st.page_link(page, label, icon)` | Clickable nav link | Non-blocking |
| `st.file_uploader(label, type, accept_multiple_files, key)` | File upload widget | Returns `UploadedFile` (extends `BytesIO`) |
| `st.session_state` | Persistent key-value store | Survives reruns, per-session |
| `st.status(label, expanded, state)` | Collapsible status container | Context manager, auto-completes |
| `st.spinner(text, show_time)` | Loading spinner | Context manager |
| `st.progress(value, text)` | Progress bar | 0-100 int or 0.0-1.0 float |
| `st.toast(body, icon, duration)` | Transient notification | Top-right corner |
| `st.write_stream(generator)` | Streaming text output | Typewriter effect for strings |
| `@st.fragment(run_every=N)` | Partial rerun decorator | Isolated from full app rerun |
| `st.rerun(scope)` | Trigger rerun | `"app"` or `"fragment"` |
| `st.chat_message(name)` | Chat bubble | `"user"`, `"assistant"`, or custom |
| `st.chat_input(placeholder)` | Chat text input | Fixed at bottom of container |
| `AppTest.from_file(path)` | Create test instance | `.run()` to execute |
| `AppTest.from_string(code)` | Test from string | Quick inline tests |
| `at.button[i].click()` | Simulate button click | Chain with `.run()` |
| `at.text_input[i].input(val)` | Simulate text entry | Chain with `.run()` |
| `at.slider[i].set_value(val)` | Set slider value | Chain with `.run()` |
1458
+ ### 7.2 Parlant Key APIs (from DeepWiki `emcie-co/parlant`)
1459
+
1460
+ **REST Endpoints:**
1461
+
1462
+ | Endpoint | Method | Purpose | Key Params |
1463
+ |----------|--------|---------|------------|
1464
+ | `/agents` | POST | Create agent | `name`, `description` |
1465
+ | `/sessions` | POST | Create session | `agent_id`, `customer_id` (optional), `title`, `metadata` |
1466
+ | `/sessions` | GET | List sessions | `agent_id`, `customer_id`, `limit`, `cursor`, `sort` |
1467
+ | `/sessions/{id}/events` | POST | Send event | `kind`, `source`, `message`/`data`, `metadata`; query: `moderation` |
1468
+ | `/sessions/{id}/events` | GET | Poll events | `min_offset`, `wait_for_data`, `source`, `correlation_id`, `trace_id`, `kinds` |
1469
+ | `/sessions/{id}/events/{eid}` | PATCH | Update event | metadata updates only |
1470
+
1471
+ **Event kinds:** `message`, `status`, `tool`, `custom`
1472
+
1473
+ **Event sources:** `customer`, `customer_ui`, `ai_agent`, `human_agent`, `human_agent_on_behalf_of_ai_agent`, `system`
1474
+
1475
+ **Status event states:** `acknowledged`, `processing`, `typing`, `ready`, `error`, `cancelled`
1476
+
1477
+ **Long-polling behavior:** `wait_for_data` > 0 blocks until new events or timeout; returns `504` on timeout.
1478
+
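The long-polling contract above can be sketched as a small client loop. The only behavior taken from the table is the `min_offset`/`wait_for_data` parameters and the `504`-on-timeout response; the function shape and the injected `fetch` callable are assumptions made so the loop can be exercised without a running Parlant server (a real caller would wrap `httpx.get` on `GET /sessions/{id}/events`):

```python
from collections.abc import Callable


def poll_events(
    fetch: Callable[[dict], tuple[int, list[dict]]],
    min_offset: int = 0,
    wait_for_data: int = 30,
    max_rounds: int = 1,
) -> tuple[int, list[dict]]:
    """Collect new events, advancing min_offset past each one.

    `fetch(params)` returns (status_code, events). A 504 means the server's
    long-poll timed out with nothing new, so the loop just tries again.
    """
    collected: list[dict] = []
    for _ in range(max_rounds):
        status, events = fetch({"min_offset": min_offset, "wait_for_data": wait_for_data})
        if status == 504:  # long-poll timeout: no new events yet
            continue
        for ev in events:
            collected.append(ev)
            # Next poll should start one past the newest event we have seen.
            min_offset = max(min_offset, ev["offset"] + 1)
    return min_offset, collected
```

Treating `504` as "nothing new" rather than an error is the key detail; a naive client that raises on it will flap constantly.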
+ **SDK APIs:**
+
+ | SDK Method | Purpose |
+ |------------|---------|
+ | `agent.create_journey(title, conditions, description)` | Create Journey with state machine |
+ | `journey.initial_state.transition_to(chat_state=..., tool_state=..., condition=...)` | Define state transitions |
+ | `agent.create_guideline(condition, action, tools=[...])` | Create global guideline |
+ | `journey.create_guideline(condition, action, tools=[...])` | Create journey-scoped guideline |
+ | `p.Server(session_store="local"/"mongodb://...")` | Configure session persistence |
+
+ **Tool decorator:** `@p.tool` auto-extracts name, description, parameters from function signature.
+
+ **NLP backend:** `parlant-server --gemini` (requires `GEMINI_API_KEY` and `pip install parlant[gemini]`).
+
+ **Client SDK:** `parlant-client` (Python), TypeScript client, or direct REST.
+
+ **Storage options:** in-memory (default/testing), local JSON, MongoDB (production).
+
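The SDK surface above can be sketched as a setup function for the `patient_trial_copilot` journey. The method names come from the table; the `await` call style, keyword names, and condition/action strings are assumptions, so treat this as a sketch rather than verified Parlant usage. Taking `agent` as a parameter keeps the wiring testable with a stub:

```python
async def build_copilot(agent):
    """Wire the patient_trial_copilot journey and its guidelines (sketch)."""
    journey = await agent.create_journey(
        title="patient_trial_copilot",
        conditions=["the customer asks about clinical trial eligibility"],
        description="INGEST -> PRESCREEN -> VALIDATE_TRIALS -> GAP_FOLLOWUP -> SUMMARY",
    )
    # Journey-scoped guideline: only active while this journey runs.
    await journey.create_guideline(
        condition="the customer uploads a medical document",
        action="extract a PatientProfile before moving to PRESCREEN",
    )
    # Global guideline: applies across all of this agent's journeys.
    await agent.create_guideline(
        condition="the customer asks for medical advice",
        action="clarify that TrialPath matches trials and does not give medical advice",
    )
    return journey
```

Because `agent` is injected, the function runs against an `AsyncMock` in tests without a Parlant server.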
+ ### 7.3 Integration Pattern: Streamlit + Parlant
+
+ ```
+ User Action (Streamlit UI)
+ -> st.session_state update
+ -> ParlantClient.send_message() or send_custom_event()
+ -> Parlant Server processes (async)
+ -> @st.fragment polls ParlantClient.poll_events()
+ -> New events update st.session_state
+ -> UI rerenders with new data
+ ```
+
+ This polling loop runs via `@st.fragment(run_every=3)` to avoid blocking the main app thread, providing near-real-time updates without full page reruns.
+
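The "new events update `st.session_state`" step is worth keeping as a pure function so the merge logic is testable outside Streamlit. A minimal sketch, where the state keys (`events`, `min_offset`) and the `ParlantClient` wrapper are illustrative, not a fixed schema:

```python
def apply_events(state: dict, new_events: list[dict]) -> dict:
    """Append polled Parlant events and advance the next poll offset."""
    state.setdefault("events", [])
    state.setdefault("min_offset", 0)
    for ev in new_events:
        state["events"].append(ev)
        # Advance past the newest event so the next poll skips what we have.
        state["min_offset"] = max(state["min_offset"], ev.get("offset", -1) + 1)
    return state


# Inside the polling fragment (assumes streamlit and a client wrapper exist):
#   @st.fragment(run_every=3)
#   def poll():
#       events = client.poll_events(min_offset=st.session_state["min_offset"])
#       apply_events(st.session_state, events)
```

Since `st.session_state` behaves like a dict, the same function works on a plain dict in tests.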
+ ---
+
+ ## References
+
+ - Streamlit source: DeepWiki analysis of `streamlit/streamlit`
+ - Parlant source: DeepWiki analysis of `emcie-co/parlant`
+ - Parlant official docs: https://www.parlant.io/docs/
+ - Parlant Sessions: https://www.parlant.io/docs/concepts/sessions/
+ - Parlant Conversation API: https://www.parlant.io/docs/engine-internals/conversation-api/
+ - Parlant GitHub: https://github.com/emcie-co/parlant
+ - Parlant Journey System: DeepWiki `emcie-co/parlant` section 5.2
+ - Parlant Guideline System: DeepWiki `emcie-co/parlant` section 5.1
+ - Parlant Tool Integration: DeepWiki `emcie-co/parlant` section 6
+ - Parlant NLP Providers: DeepWiki `emcie-co/parlant` section 10.1
pyproject.toml ADDED
@@ -0,0 +1,29 @@
+ [project]
+ name = "trialpath"
+ version = "0.1.0"
+ description = "AI-powered clinical trial matching for NSCLC patients"
+ requires-python = ">=3.11"
+ dependencies = [
+ "pydantic>=2.0",
+ "httpx>=0.27",
+ "streamlit>=1.40",
+ "pytest>=8.0",
+ "pytest-asyncio>=0.24",
+ ]
+
+ [project.optional-dependencies]
+ dev = [
+ "ruff>=0.8",
+ "pytest-cov>=6.0",
+ ]
+
+ [tool.ruff]
+ line-length = 100
+ target-version = "py311"
+
+ [tool.ruff.lint]
+ select = ["E", "F", "I", "W"]
+
+ [tool.pytest.ini_options]
+ testpaths = ["trialpath/tests", "app/tests"]
+ asyncio_mode = "auto"
trialpath/__init__.py ADDED
File without changes
trialpath/agent/__init__.py ADDED
File without changes
trialpath/models/__init__.py ADDED
File without changes
trialpath/services/__init__.py ADDED
File without changes
trialpath/tests/__init__.py ADDED
File without changes