Commit 1abff4e

chore: initialize project skeleton with pyproject.toml

- Add pyproject.toml with core deps (pydantic, httpx, streamlit, pytest)
- Empty package structure: trialpath/ (models, services, agent) and app/ (pages, components, services)
- Configure ruff and pytest

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- CLAUDE.md +82 -0
- app/__init__.py +0 -0
- app/tests/__init__.py +0 -0
- docs/TrialPath AI technical design.md +487 -0
- docs/Trialpath PRD.md +246 -0
- docs/tdd-guide-backend-service.md +0 -0
- docs/tdd-guide-data-evaluation.md +2384 -0
- docs/tdd-guide-ux-frontend.md +1524 -0
- pyproject.toml +29 -0
- trialpath/__init__.py +0 -0
- trialpath/agent/__init__.py +0 -0
- trialpath/models/__init__.py +0 -0
- trialpath/services/__init__.py +0 -0
- trialpath/tests/__init__.py +0 -0
CLAUDE.md (ADDED, +82 lines)
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

TrialPath is an AI-powered clinical trial matching system for NSCLC (Non-Small Cell Lung Cancer) patients. It is currently in the **pre-implementation design phase** — only design documents exist, no source code yet.

**Core idea:** Help patients understand which clinical trials they may qualify for; transform "rejection" into "actionable next steps" via gap analysis.

## Design Documents

- `Trialpath PRD.md` — Product requirements, success metrics, HAI-DEF submission plan
- `TrialPath AI technical design.md` — Technical architecture, data contracts, Parlant workflow design

## Architecture (5 Components)

1. **UI & Orchestrator** — Streamlit/FastAPI app embedding the Parlant engine
2. **Parlant Agent + Journey** — Single agent (`patient_trial_copilot`) with 5 states: `INGEST` → `PRESCREEN` → `VALIDATE_TRIALS` → `GAP_FOLLOWUP` → `SUMMARY`
3. **MedGemma 4B** (HF endpoint) — Multimodal extraction from PDFs/images → `PatientProfile` + evidence spans
4. **Gemini 3 Pro** — LLM planner: generates `SearchAnchors` from the profile, reranks trials, orchestrates criterion evaluation
5. **ClinicalTrials MCP Server** (existing, not custom) — Wraps the ClinicalTrials.gov REST API v2

## Key Design Decisions

- **No vector DB / RAG** — Uses agentic search via the ClinicalTrials.gov API with iterative query refinement
- **Reuse existing MCP** — Don't build custom trial search; use off-the-shelf ClinicalTrials MCP servers
- **Two-stage clinical screening** — Mirrors real-world practice: prescreen (minimal dataset) → validation (full criterion-by-criterion review)
- **Evidence-linked** — Every decision must cite a source doc/page/span
- **Gap analysis as core differentiator** — "You'd qualify IF you had X" rather than just "No match"

## Data Contracts (JSON Schemas)

Four core contracts are defined in the tech design doc (section 4):

- **PatientProfile v1** — MedGemma output with demographics, diagnosis, biomarkers, labs, treatments, unknowns
- **SearchAnchors v1** — Gemini-generated query params for MCP search
- **TrialCandidate v1** — Normalized MCP search results
- **EligibilityLedger v1** — Per-trial criterion-level assessment with evidence pointers and gaps
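A hedged sketch of how two of these contracts might be declared with pydantic (a core dep in this commit). Field subset and names are taken from the tech design doc's examples, not a finalized schema:

```python
from typing import Literal

from pydantic import BaseModel


class Evidence(BaseModel):
    """Pointer from a decision back to its source document span."""
    doc_id: str
    page: int
    span_id: str


class CriterionAssessment(BaseModel):
    """One row of the EligibilityLedger: a single inclusion/exclusion criterion."""
    criterion_id: str
    type: Literal["inclusion", "exclusion"]
    text: str
    decision: Literal["met", "not_met", "unknown"]
    patient_evidence: list[Evidence] = []


row = CriterionAssessment(
    criterion_id="inc_1",
    type="inclusion",
    text="Histologically confirmed NSCLC, stage IIIB/IV",
    decision="met",
    patient_evidence=[Evidence(doc_id="clinic_1", page=1, span_id="s_12")],
)
```

Pydantic rejects any `decision` outside the three-value enum, which keeps downstream aggregation code simple.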
## Planned Code Structure

From the PRD deliverables section:

```
data/generate_synthetic_patients.py
data/generate_noisy_pdfs.py
matching/medgemma_extractor.py
matching/agentic_search.py  # Parlant + Gemini + MCP
evaluation/run_trec_benchmark.py
```

## Planned Tech Stack

- Python (Streamlit or FastAPI)
- Google Gemini 3 Pro (orchestration)
- MedGemma 4B via Hugging Face endpoint (multimodal extraction)
- Parlant (agentic workflow engine)
- Synthea FHIR (synthetic patient generation)
- TREC Clinical Trials Track 2021/2022 (benchmarking)

## Success Targets

- MedGemma extraction F1 >= 0.85
- Trial retrieval Recall@50 >= 0.75
- Trial ranking NDCG@10 >= 0.60
- Criterion decision accuracy >= 0.85
- Latency < 15 s; cost < $0.50 per session

## Scope

- Disease: NSCLC only
- Data: synthetic patients only (no real PHI)
- Timeline: 3-month PoC

## Dev tools

- Use the Hugging Face CLI for model deployment
- Use uv (packaging), ruff (linting), and astral ty (type checking)
- Use ripgrep for code search

## Commit atomically
app/__init__.py (ADDED, empty)

app/tests/__init__.py (ADDED, empty)

docs/TrialPath AI technical design.md (ADDED, +487 lines)
Below is a compact but deepened tech design doc that applies your three constraints:

1. Reuse existing ClinicalTrials MCPs.
2. Make Parlant workflows map tightly onto real clinical screening.
3. Lay out a general patient plan (using synthetic data) that feels like a real-world journey.

No code; just user flow, data contracts, and architecture.

---

## **1. Scope & Positioning**

**PoC Goal (2-week sprint, YAGNI):**
A working, demoable *patient-centric* trial-matching copilot that:

* Takes **synthetic NSCLC patients** (documents + minimal metadata).
* Uses **MedGemma 4B multimodal** to understand those artifacts.
* Uses **Gemini 3 Pro + Parlant** to orchestrate **patient-to-trials matching** via an **off-the-shelf ClinicalTrials MCP server**.
* Produces an **eligibility ledger + gap analysis** aligned with real clinical screening workflows (prescreen → validation), not "toy" UX.

We explicitly **don't** build our own trial MCP, our own search stack, or multi-service infra. Everything runs in a thin orchestrator + UI process.

---

## **2. Real-World Screening Workflow Mapping**

Evidence from clinical practice and trial-matching research converges on a two-stage flow:[appliedclinicaltrialsonline+4](https://www.appliedclinicaltrialsonline.com/view/clinical-trial-matching-solutions-understanding-the-landscape)

1. **Prescreening**
   * Quick eligibility judgment on a *minimal dataset*: diagnosis, stage, functional status (ECOG), basic labs, key comorbidities.
   * Usually: oncologist + coordinator + minimal EHR context.
   * Goal: "Is this patient worth deeper chart review for any trials here?"
2. **Validation (Full Match / Chart Review)**
   * Detailed comparison of the **full record** vs **full inclusion/exclusion criteria**, often 40–60 criteria per trial.
   * Typically done by a coordinator/CRA with investigator sign-off.
   * Goal: for a *specific trial*, decide: *eligible / excluded / unclear → needs further tests*.

Our PoC should simulate this **two-stage workflow**:

* **Stage 1 = "Patient-First Prescreen"** → shortlist trials via MCP + Gemini using the MedGemma-extracted "minimal dataset".
* **Stage 2 = "Trial-Specific Validation"** → trial-by-trial, criterion-by-criterion ledger using MedGemma evidence.

Parlant Journeys become the *explicit codification* of these two stages + transitions.

---

## **3. High-Level Architecture (YAGNI, Reusing MCP)**

## **3.1 Components**

**1) UI & Orchestrator (single process)**

* Streamlit/FastAPI-style app (exact stack is secondary) that:
  * Hosts the chat/stepper UI.
  * Embeds **Parlant** and maintains session state.
  * Calls external tools (Gemini API, MedGemma HF endpoint, ClinicalTrials MCP).

**2) Parlant Agent + Journey**

* Single Parlant agent, e.g. `patient_trial_copilot`.
* One **Journey** with explicit stages mirroring the real-world workflow:
  * `INGEST` → `PRESCREEN` → `VALIDATE_TRIALS` → `GAP_FOLLOWUP` → `SUMMARY`.
* Parlant rules enforce:
  * When to call which tool.
  * When to move from prescreen to validation.
  * When to ask the patient (synthetic persona) for more documents.

**3) MedGemma 4B Multimodal Service (HF endpoint)**

* Input: PDF(s) + optional images.
* Output: structured **PatientProfile** + **evidence spans** (doc/page/region references).
* Used twice:
  * Once for **prescreen dataset** extraction.
  * Once for **criterion-level validation** (patient vs trial snippets).

**4) Gemini 3 Pro (LLM Planner & Re-ranker)**

* Uses Google AI / Vertex Gemini 3 Pro for:
  * Generating query parameters for the ClinicalTrials MCP from the PatientProfile.
  * Interpreting MCP results & producing a ranked **TrialCandidate** list.
  * Orchestrating criterion slicing and gap reasoning.
* Strategy: keep Gemini in **tools + structured outputs** mode; no direct free-form "actions".

**5) ClinicalTrials MCP Server (Existing)**

* Choose an existing **ClinicalTrials MCP server** rather than hand-rolling one: e.g. one of the open-source MCP servers wrapping the ClinicalTrials.gov REST API v2.[github+3](https://github.com/JackKuo666/ClinicalTrials-MCP-Server)
* Must support at least:
  * `search_trials(parameters)` → list of (NCT ID, title, conditions, locations, status, phase, eligibility text).
  * `get_trial(nct_id)` → full record including inclusion/exclusion criteria.
## **3.2 Why Reusing an MCP is Critical**

* **Time**: The ClinicalTrials.gov v2 API is detailed and somewhat finicky (paging, filters, field lists). Existing MCPs already encode those details + JSON schemas.[nlm.nih+1](https://www.nlm.nih.gov/pubs/techbull/ma24/ma24_clinicaltrials_api.html)
* **Alignment with agentic ecosystems**: These MCP servers are already shaped as "tools" for LLMs. We just plug Parlant/Gemini on top.
* **YAGNI**: A custom MCP or RAG index for trials is a post-PoC optimization.

---

## **4. Data Contracts (Core JSON Schemas)**

We keep contracts minimal but explicit, so we can test each piece in isolation.

## **4.1 PatientProfile (v1)**

Output of MedGemma's **prescreen extraction**; updated as new docs arrive:

```json
{
  "patient_id": "string",
  "source_docs": [
    { "doc_id": "string", "type": "clinic_letter|pathology|lab|imaging", "meta": {} }
  ],
  "demographics": {
    "age": 52,
    "sex": "female"
  },
  "diagnosis": {
    "primary_condition": "Non-Small Cell Lung Cancer",
    "histology": "adenocarcinoma",
    "stage": "IVa",
    "diagnosis_date": "2025-11-15"
  },
  "performance_status": {
    "scale": "ECOG",
    "value": 1,
    "evidence": [{ "doc_id": "clinic_1", "page": 2, "span_id": "s_17" }]
  },
  "biomarkers": [
    {
      "name": "EGFR",
      "result": "Exon 19 deletion",
      "date": "2026-01-10",
      "evidence": [{ "doc_id": "path_egfr", "page": 1, "span_id": "s_3" }]
    }
  ],
  "key_labs": [
    {
      "name": "ANC",
      "value": 1.8,
      "unit": "10^9/L",
      "date": "2026-01-28",
      "evidence": [{ "doc_id": "labs_jan", "page": 1, "span_id": "tbl_anc" }]
    }
  ],
  "treatments": [
    {
      "drug_name": "Pembrolizumab",
      "start_date": "2024-06-01",
      "end_date": "2024-11-30",
      "line": 1,
      "evidence": [{ "doc_id": "clinic_2", "page": 3, "span_id": "s_45" }]
    }
  ],
  "comorbidities": [
    {
      "name": "CKD",
      "grade": "Stage 3",
      "evidence": [{ "doc_id": "clinic_1", "page": 2, "span_id": "s_20" }]
    }
  ],
  "imaging_summary": [
    {
      "modality": "MRI brain",
      "date": "2026-01-20",
      "finding": "Stable 3mm left frontal lesion, no enhancement",
      "interpretation": "likely inactive scar",
      "certainty": "low|medium|high",
      "evidence": [{ "doc_id": "mri_report", "page": 1, "span_id": "s_9" }]
    }
  ],
  "unknowns": [
    { "field": "EGFR", "reason": "No clear mention", "importance": "high" }
  ]
}
```

Notes:

* `unknowns` is **explicit**, enabling Parlant to decide what to ask for in `GAP_FOLLOWUP`.
* The `evidence` structure enables the later criterion-level ledger to reference the same spans.
* This is **not** a fully normalized EHR; it's what's needed for prescreening.[pmc.ncbi.nlm.nih+1](https://pmc.ncbi.nlm.nih.gov/articles/PMC11612666/)
## **4.2 SearchAnchors (v1)**

Intermediate structure Gemini produces from the PatientProfile to drive the MCP search:

```json
{
  "condition": "Non-Small Cell Lung Cancer",
  "subtype": "adenocarcinoma",
  "biomarkers": ["EGFR exon 19 deletion"],
  "stage": "IV",
  "geography": {
    "country": "DE",
    "max_distance_km": 200
  },
  "age": 52,
  "performance_status_max": 1,
  "trial_filters": {
    "recruitment_status": ["Recruiting", "Not yet recruiting"],
    "phase": ["Phase 2", "Phase 3"]
  },
  "relaxation_order": [
    "phase",
    "distance",
    "biomarker_strictness"
  ]
}
```

This mirrors the patient-centric matching literature: patient characteristics + geography + site status.[nature+1](https://www.nature.com/articles/s41467-024-53081-z)
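One way `relaxation_order` could drive iterative query widening when a search returns zero trials. This is a sketch; the concrete widening step for each key is an assumption, not specified by the contract:

```python
def relax(anchors: dict) -> dict:
    """Apply the first remaining relaxation step and return widened anchors."""
    anchors = dict(anchors)  # shallow copy; never mutate the caller's anchors
    order = list(anchors.get("relaxation_order", []))
    if not order:
        return anchors  # nothing left to relax
    step = order.pop(0)
    if step == "phase":
        # Assumed widening: drop the phase filter entirely (any phase).
        anchors["trial_filters"] = {**anchors["trial_filters"], "phase": []}
    elif step == "distance":
        # Assumed widening: double the travel radius.
        geo = dict(anchors["geography"])
        geo["max_distance_km"] *= 2
        anchors["geography"] = geo
    elif step == "biomarker_strictness":
        # Assumed widening: match on condition/stage only.
        anchors["biomarkers"] = []
    anchors["relaxation_order"] = order
    return anchors


anchors = {
    "biomarkers": ["EGFR exon 19 deletion"],
    "geography": {"country": "DE", "max_distance_km": 200},
    "trial_filters": {"phase": ["Phase 2", "Phase 3"]},
    "relaxation_order": ["phase", "distance", "biomarker_strictness"],
}
widened = relax(anchors)  # first step drops the phase filter
```

Because each call consumes one step, the orchestrator can loop `search → relax` until trials appear or the order is exhausted, then hand off to `GAP_FOLLOWUP`.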
## **4.3 TrialCandidate (v1)**

Returned by the ClinicalTrials MCP search and lightly normalized:

```json
{
  "nct_id": "NCT01234567",
  "title": "Phase 3 Study of Osimertinib in EGFR+ NSCLC",
  "conditions": ["NSCLC"],
  "phase": "Phase 3",
  "status": "Recruiting",
  "locations": [
    { "country": "DE", "city": "Berlin" },
    { "country": "DE", "city": "Hamburg" }
  ],
  "age_range": { "min": 18, "max": 75 },
  "fingerprint_text": "short concatenation of title + key inclusion/exclusion + keywords",
  "eligibility_text": {
    "inclusion": "raw inclusion criteria text ...",
    "exclusion": "raw exclusion criteria text ..."
  }
}
```

`fingerprint_text` is purposely short and designed for Gemini reranking; the full eligibility text goes to MedGemma for criterion analysis.

## **4.4 EligibilityLedger (v1)**

Final artifact per trial, shown to the "clinician" or patient:

```json
{
  "patient_id": "P001",
  "nct_id": "NCT01234567",
  "overall_assessment": "likely_eligible|likely_ineligible|uncertain",
  "criteria": [
    {
      "criterion_id": "inc_1",
      "type": "inclusion",
      "text": "Histologically confirmed NSCLC, stage IIIB/IV",
      "decision": "met|not_met|unknown",
      "patient_evidence": [{ "doc_id": "clinic_1", "page": 1, "span_id": "s_12" }],
      "trial_evidence": [{ "field": "eligibility_text.inclusion", "offset_start": 0, "offset_end": 80 }]
    },
    {
      "criterion_id": "exc_3",
      "type": "exclusion",
      "text": "No prior treatment with immune checkpoint inhibitors",
      "decision": "not_met",
      "patient_evidence": [{ "doc_id": "clinic_2", "page": 3, "span_id": "s_45" }],
      "trial_evidence": [{ "field": "eligibility_text.exclusion", "offset_start": 211, "offset_end": 280 }]
    }
  ],
  "gaps": [
    {
      "description": "Requires brain MRI within 28 days; last MRI is 45 days old",
      "recommended_action": "Repeat brain MRI",
      "clinical_importance": "high"
    }
  ]
}
```

This mirrors TrialGPT's criterion-level output (explanation + evidence locations + decision) but is tuned to our multimodal extraction and PoC constraints.[nature](https://www.nature.com/articles/s41467-024-53081-z)
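The mapping from criterion-level decisions to `overall_assessment` is not pinned down by the contract; one plausible aggregation rule (an assumption, not the spec) is:

```python
def overall_assessment(criteria: list[dict]) -> str:
    """Aggregate criterion decisions into a trial-level label.

    Assumed rule: any failed criterion -> likely_ineligible; otherwise any
    unknown -> uncertain; otherwise likely_eligible. Note the polarity: a
    "not_met" EXCLUSION criterion is good news (the exclusion does not apply),
    so "failed" means an inclusion criterion not met, or an exclusion met.
    """
    failed = any(
        (c["type"] == "inclusion" and c["decision"] == "not_met")
        or (c["type"] == "exclusion" and c["decision"] == "met")
        for c in criteria
    )
    if failed:
        return "likely_ineligible"
    if any(c["decision"] == "unknown" for c in criteria):
        return "uncertain"
    return "likely_eligible"


criteria = [
    {"type": "inclusion", "decision": "met"},
    {"type": "exclusion", "decision": "not_met"},  # exclusion does not apply
]
# overall_assessment(criteria) -> "likely_eligible"
```

Getting the exclusion polarity right is exactly the kind of detail the ledger's explicit `type` + `decision` fields exist to make testable.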
---

## **5. Parlant Workflow Design (Aligned with Real Clinical Work)**

We design a **single Parlant Journey** that approximates the real-world job of a trial coordinator/oncologist team, but in a patient-centric context.[pmc.ncbi.nlm.nih+3](https://pmc.ncbi.nlm.nih.gov/articles/PMC6685132/)

## **5.1 Journey States**

**States:**

1. `INGEST` (Document Collection)
2. `PRESCREEN` (Patient-Level Trial Shortlist)
3. `VALIDATE_TRIALS` (Trial-Level Eligibility Ledger)
4. `GAP_FOLLOWUP` (Patient Data Completion Loop)
5. `SUMMARY` (Shareable Packet & Next Steps)
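The five states and their allowed transitions can be sketched as a plain state machine (illustrative only; in the real app Parlant's Journey primitives would encode this, and the exact transition set is our reading of the state descriptions):

```python
from enum import Enum


class State(Enum):
    INGEST = "ingest"
    PRESCREEN = "prescreen"
    VALIDATE_TRIALS = "validate_trials"
    GAP_FOLLOWUP = "gap_followup"
    SUMMARY = "summary"


# Allowed transitions, assumed from the journey description.
TRANSITIONS = {
    State.INGEST: {State.INGEST, State.PRESCREEN},       # loop until minimal dataset present
    State.PRESCREEN: {State.VALIDATE_TRIALS, State.GAP_FOLLOWUP},
    State.VALIDATE_TRIALS: {State.GAP_FOLLOWUP, State.SUMMARY},
    State.GAP_FOLLOWUP: {State.INGEST, State.SUMMARY},   # new docs re-enter INGEST
    State.SUMMARY: set(),                                # terminal
}


def can_move(src: State, dst: State) -> bool:
    """Check whether a transition is legal in the journey."""
    return dst in TRANSITIONS[src]
```

Making the transition table explicit gives the PoC something to unit-test before any LLM is in the loop.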
## **State 1 — INGEST**

**Role in real world:** Patient (or referrer) provides records; a coordinator checks whether there is enough to do a prescreen.[trialchoices+2](https://www.trialchoices.org/post/what-to-expect-during-the-clinical-trial-screening-process)

**Inputs:**

* Uploaded PDFs/images (synthetic in the PoC).
* Lightweight metadata (age, sex, location) from a user form.

**Actions:**

* Parlant calls MedGemma with multimodal input (images + text) to generate `PatientProfile.v1`.
* The Parlant agent summarises back to the patient:
  * What it understood ("You have stage IV NSCLC, ECOG 1, EGFR unknown").
  * What it is missing ("I did not find EGFR mutation status or a recent brain MRI").

**Transitions:**

* If the **minimal prescreen dataset is present** (diagnosis + stage + ECOG + rough labs): → `PRESCREEN`.
* Else: stays in `INGEST` but triggers `GAP_FOLLOWUP`-style prompts ("Can you upload a pathology report or discharge summary?").

## **State 2 — PRESCREEN**

**Role in real world:** Pre-filter to "worth reviewing" trials based on limited data.[pmc.ncbi.nlm.nih+1](https://pmc.ncbi.nlm.nih.gov/articles/PMC11612666/)

**Inputs:**

* `PatientProfile.v1`.

**Actions:**

* Gemini converts `PatientProfile` → `SearchAnchors.v1`.
* Parlant calls the **existing ClinicalTrials MCP**, mapping `SearchAnchors` to the MCP's parameters:
  * Condition keywords
  * Recruitment status
  * Phase filters
  * Geography
* Trials are returned as a `TrialCandidate` list.
* Gemini reranks them using `fingerprint_text` + `PatientProfile` to produce a shortlist (e.g., top 20).
* Parlant communicates to the user:
  * "Based on your profile, I found 23 potentially relevant NSCLC trials; I'll now check each more carefully."

**Transitions:**

* If **0 trials** → `GAP_FOLLOWUP` (relax criteria and/or widen geography).
* If **>0 trials** → `VALIDATE_TRIALS`.

This maps to the patient-centric matching described in the applied literature: single patient → candidate trials, then deeper evaluation.[trec-cds+2](https://www.trec-cds.org/2021.html)

## **State 3 — VALIDATE_TRIALS**

**Role in real world:** Detailed chart review vs full eligibility criteria.[pmc.ncbi.nlm.nih+1](https://pmc.ncbi.nlm.nih.gov/articles/PMC6685132/)

**Inputs:**

* Shortlisted `TrialCandidate` list (e.g., top 10–20).

**Actions:**

For each trial in the shortlist:

1. Gemini slices the inclusion/exclusion text into atomic criteria (each with an ID and text).
2. For each criterion:
   * Parlant calls **MedGemma** with:
     * `PatientProfile` + selected patient evidence snippets (and, where available, underlying images).
     * The criterion text snippet.
   * MedGemma outputs:
     * `decision: met/not_met/unknown`.
     * `patient_evidence` span references (doc/page/span_id).
3. Parlant aggregates the per-trial results into `EligibilityLedger.v1`.

**Outputs:**

* A ranked list of trials with:
  * Traffic-light label (green/yellow/red) for overall eligibility (+ explanation).
  * Criterion-level breakdowns & evidence pointers.

**Transitions:**

* If **no trial has any green/yellow** (all clearly ineligible):
  * → `GAP_FOLLOWUP` to explore whether missing data (e.g., outdated labs) could change this.
* Else:
  * Offer `SUMMARY` while keeping `GAP_FOLLOWUP` open.

## **State 4 — GAP_FOLLOWUP**

**Role in real world:** Additional tests/data to confirm eligibility (e.g., labs, imaging).[pfizerclinicaltrials+2](https://www.pfizerclinicaltrials.com/about/steps-to-join)

**Inputs:**

* `PatientProfile.unknowns` + `EligibilityLedger.gaps`.

**Actions:**

* Gemini synthesizes the **minimal actionable set** of missing data:
  * E.g., "The most promising trials require: (1) current EGFR mutation status, (2) a brain MRI < 28 days old."
* Parlant:
  * Poses this to the patient in simple language.
  * For the PoC, the user (you, or a script) uploads new synthetic documents representing those tests.
* On new upload, we go back through `INGEST` → update `PatientProfile` → fast-path directly to `PRESCREEN`/`VALIDATE_TRIALS`.

**Transitions:**

* On new docs → `INGEST` (update and re-run).
* If the user declines or no additional data is possible → `SUMMARY` with a clear explanation ("Here's why the current trials don't fit").

## **State 5 — SUMMARY**

**Role in real world:** Coordinator/oncologist summarises findings, shares options, and discusses next steps.[pfizerclinicaltrials+2](https://www.pfizerclinicaltrials.com/about/steps-to-join)

**Inputs:**

* Final `PatientProfile`.
* Set of `EligibilityLedger` objects for the top trials.
* List of `gaps`.

**Actions:**

* Generate:
  * **Patient-friendly summary**: 3–5 bullet explanation of matches.
  * **Clinician packet**: aggregated ledger and evidence pointers, referencing doc IDs and trial NCT IDs.
* For the PoC: show in the UI + downloadable JSON/Markdown.

**Transitions:**

* End of Journey.

---

## **6. General Patient Plan (Synthetic Data Flow)**

We simulate realistic but synthetic patients and run them through exactly the journey above.

## **6.1 Synthetic Patient Generation & Formats**

**Source:**

* TREC Clinical Trials Track 2021/2022 patient topics (free-text vignettes) as the ground truth for "what the patient's story should convey".[trec-cds+3](https://www.trec-cds.org/2022.html)
* Synthea or custom scripts to generate structured NSCLC trajectories consistent with those vignettes (for the additional fields we want).

**Artifacts per patient:**

1. **Clinic letter PDF**
   * Plain text + embedded logo; maybe 1–2 key tables (comorbidities, meds).
2. **Biomarker/pathology PDF**
   * EGFR/ALK/PD-L1 etc., with a small table or scanned-like image.
3. **Lab report PDF**
   * Hematology and chemistry values, with dates.
4. **Imaging report PDF** (+ optional illustrative image)
   * Brain MRI/CT narrative with lesion description; maybe a low-res "snapshot" image.

Each artifact is saved with metadata mapping it to the underlying TREC topic (so we can label what the "true" conditions/stage/biomarkers are).
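A stdlib-only sketch of what the generator could emit per patient before PDF rendering. The file layout, field names, and `write_patient_bundle` helper are assumptions for illustration, not the planned script's API:

```python
import json
from pathlib import Path


def write_patient_bundle(root: Path, patient_id: str, topic_id: str) -> Path:
    """Write one synthetic patient's artifacts plus ground-truth metadata."""
    pdir = root / patient_id
    pdir.mkdir(parents=True, exist_ok=True)
    # Narrative artifacts (would be rendered to noisy PDFs in the real pipeline).
    (pdir / "clinic_letter.txt").write_text(
        "Dx: stage IV NSCLC (adenocarcinoma). ECOG 1. Prior pembrolizumab.\n"
    )
    (pdir / "labs.txt").write_text("ANC 1.8 x10^9/L (2026-01-28)\n")
    # Ground-truth labels mapped back to the TREC topic, for evaluation later.
    meta = {
        "patient_id": patient_id,
        "trec_topic": topic_id,
        "truth": {"stage": "IV", "histology": "adenocarcinoma", "ecog": 1},
    }
    (pdir / "meta.json").write_text(json.dumps(meta, indent=2))
    return pdir


bundle = write_patient_bundle(Path("data/synthetic"), "P001", "trec2022_topic_7")
```

Keeping the ground truth in a sidecar `meta.json` is what lets the benchmark score MedGemma extraction F1 against known labels.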
## **6.2 Patient Journey (Narrative)**

For each synthetic patient "Anna":

1. **Pre-visit (INGEST)**
   * Anna (or a proxy) uploads her documents to the copilot.
   * MedGemma extracts a `PatientProfile`.
   * Parlant confirms: "You have stage IV NSCLC with ECOG 1 and prior pembrolizumab; I don't see your EGFR mutation test yet."
2. **Prescreen (PRESCREEN)**
   * Using `SearchAnchors`, trials are fetched via the ClinicalTrials MCP.
   * The system returns, e.g., 30 candidates; after reranking, the top 10 are selected for validation.
3. **Trial Validation (VALIDATE_TRIALS)**
   * For each of the top 10, the eligibility ledger is computed.
   * The system identifies, say, 3 trials with many green criteria but a few unknowns (e.g., recent brain MRI).
4. **Gap-Driven Iteration (GAP_FOLLOWUP)**
   * Copilot: "You likely qualify for trial NCT01234567 if you have a brain MRI within the last 28 days. Your last MRI is 45 days old. If your doctor orders a new MRI and the report shows no active brain metastases, you may qualify. For this PoC, you can upload a 'new MRI report' file to simulate this."
   * A new synthetic PDF is uploaded; `PatientProfile` is updated.
5. **Re-match & Summary (PRESCREEN → VALIDATE_TRIALS → SUMMARY)**
   * The system re-runs with the updated `PatientProfile`.
   * Now 3 trials are "likely eligible", with red flags on only non-critical criteria.
   * The copilot generates:
     * Patient summary: "Here are three trials that look promising for your situation, and why."
     * Clinician packet: ledger + evidence pointers that mimic a coordinator's notes.

This general patient plan is consistent across synthetic cases but parameterized by each TREC topic (e.g., biomarker variant, comorbidity pattern).

---

## **7. How This Plan Fixes Earlier Gaps**

1. **No custom trial search stack**
   * We explicitly plug into existing ClinicalTrials MCPs built for LLM agents, aligning with your "don't reinvent the wheel" constraint and drastically lowering infra risk over 2 weeks.[github+2](https://github.com/cyanheads/clinicaltrialsgov-mcp-server)
2. **Parlant used as a real workflow engine, not just a wrapper**
   * States mirror prescreen vs validation vs gap-closure as described in empirical screening studies and trial-matching frameworks.[appliedclinicaltrialsonline+3](https://www.appliedclinicaltrialsonline.com/view/clinical-trial-matching-solutions-understanding-the-landscape)
   * Parlant becomes the place where you encode "when do we ask a human for more information vs when do we refine a query vs when do we stop?"
3. **Patient plan grounded in real-world processes**
   * The synthetic patient journey isn't just "upload docs → list trials."
   * It follows actual clinical workflows: minimal dataset, prescreen, chart review, additional tests, and finally discussion/summary.[trialchoices+3](https://www.trialchoices.org/post/what-to-expect-during-the-clinical-trial-screening-process)
4. **Minimal, testable contracts**
   * PatientProfile, SearchAnchors, TrialCandidate, and EligibilityLedger together give you:
     * Places to measure MedGemma extraction F1.
     * Places to plug in TREC qrels (TrialCandidate → NDCG@10).[arxiv+2](https://arxiv.org/pdf/2202.07858.pdf)
   * They're small enough to implement quickly but rich enough to survive PoC → MVP.

Source: [https://www.perplexity.ai/search/simulate-as-an-experienced-cto-i6TIXOP9TX.rqA97awuc1Q?sm=d#3](https://www.perplexity.ai/search/simulate-as-an-experienced-cto-i6TIXOP9TX.rqA97awuc1Q?sm=d#3)
docs/Trialpath PRD.md
ADDED

@@ -0,0 +1,246 @@
# HAI-DEF Pitch: MedGemma Match – Patient Trial Copilot

**PoC Goal:** Demonstrate the MedGemma + Gemini 3 Pro + Parlant agentic architecture for patient-facing clinical trial matching with **explainable eligibility reasoning** and **iterative gap-filling**.

---

## 1. Problem & Unmet Need

### The Challenge
- **Low trial participation:** <5% of adult cancer patients enroll in clinical trials despite potential eligibility
- **Complex eligibility criteria:** Free-text criteria mix demographics, biomarkers, labs, imaging findings, and treatment history
- **Patient barrier:** Patients receive PDFs/reports but have no way to understand which trials fit their situation
- **Manual screening burden:** Clinicians spend hours per patient manually reviewing eligibility; automated tools show mixed real-world performance

### Why AI? Why Now?
- Eligibility criteria require synthesis across multiple document types (pathology, labs, imaging, treatment history)—impossible with keyword search alone
- Recent LLM-based matching systems (TrialGPT, PRISM) show promise but lack patient-centric design and multimodal medical understanding
- HAI-DEF open-weight health models enable privacy-preserving deployment with medical domain expertise

---

## 2. Solution: MedGemma as Clinical Understanding Engine

### Core Concept
**"Agentic Search + Multimodal Extraction"** replacing traditional vector-RAG approaches.

**Architecture:**
- **MedGemma (HAI-DEF):** Extracts structured clinical facts from messy PDFs/reports + understands medical imaging contexts
- **Gemini 3 Pro:** Orchestrates agentic search through the ClinicalTrials.gov API with iterative query refinement
- **Parlant:** Enforces the state machine (search → filter → verify) and prevents parameter hallucination
- **ClinicalTrials MCP:** Structured API wrapper for trials data (no vector DB needed)
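The guardrail role described for Parlant (search → filter → verify) amounts to a transition table over agent states. A minimal stdlib sketch of that idea, not Parlant's actual API:

```python
from enum import Enum, auto

class AgentState(Enum):
    SEARCH = auto()
    FILTER = auto()
    VERIFY = auto()

# Legal transitions the orchestrator may take; anything else is rejected.
# This is the hallucination guardrail: the LLM cannot jump to VERIFY
# without passing through FILTER.
TRANSITIONS = {
    AgentState.SEARCH: {AgentState.SEARCH, AgentState.FILTER},  # refine or move on
    AgentState.FILTER: {AgentState.SEARCH, AgentState.VERIFY},  # relax query or verify
    AgentState.VERIFY: set(),                                   # terminal in this sketch
}

def step(current: AgentState, proposed: AgentState) -> AgentState:
    if proposed not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {proposed.name}")
    return proposed
```

In the real system Parlant would carry richer state (the GAP_FOLLOWUP loop from the design doc) plus conversational guidelines, but the enforcement principle is the same.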
### Why MedGemma is Central (Not Replaceable)
1. **Multimodal medical reasoning:** Designed for radiology reports, pathology, and labs—where generic LLMs are weaker
2. **Domain-aligned extraction:** Medical entity recognition with units, dates, and clinical context preservation
3. **Open weights:** Enables VPC deployment for future PHI handling (vs. closed-weight alternatives)
4. **Health-safety guardrails:** The model card emphasizes validation/adaptation patterns we follow

---

## 3. User Journey (Patient-Centric)

### Target User (PoC Persona)
**"Anna"** – a 52-year-old NSCLC patient in Berlin with PDFs from her oncologist but no trial navigation support.

### Journey Flow
1. **Upload Documents** → Clinic letter, pathology report, lab results (synthetic PDFs in the PoC)
2. **MedGemma Extraction** → System builds "My Clinical Profile (draft)": Stage IVa, EGFR status unknown, ECOG 1
3. **Agentic Search** → Gemini queries ClinicalTrials.gov via MCP:
   - Initial: `condition=NSCLC, location=DE, status=RECRUITING, keywords=EGFR` → 47 results
   - Refines: adds `phase=PHASE3` → 12 results
   - Reads summaries, filters to 5 relevant trials
4. **Eligibility Analysis** → For each trial, MedGemma evaluates criteria against the extracted facts
5. **Gap Identification** → System highlights: *"You'd likely qualify IF you had an EGFR mutation test"*
6. **Iteration** → Anna uploads a biomarker report → System re-matches → 3 new trials appear
7. **Share with Doctor** → Generate a clinician packet with an evidence-linked eligibility ledger

### Key Differentiator: The "Gap Analysis"
- We don't just say "No Match"
- We say: **"You would match NCT12345 IF you had: a recent brain MRI showing no active CNS disease"**
- This transforms "rejection" into "actionable next steps"

---

## 4. Technical Innovation: Smart Agentic Search (No Vector DB)

### Traditional Approach (What We're *Not* Doing)
```
Patient text → Embeddings → Vector similarity search →
Retrieve top-K trials → LLM re-ranks
```
**Problem:** Vector search is "dumb" about structured constraints (Phase, Location, Status) and negations.

### Our Approach: Iterative Query Refinement
```
MedGemma extracts "Search Anchors" (Condition, Biomarkers, Location) →
Gemini formulates API query with filters →
ClinicalTrials MCP returns results →
Too many (>50)? → Parlant enforces refinement (add phase/keywords)
Too few (0)? → Parlant enforces relaxation (remove city filter)
Right size (10-30)? → Gemini reads summaries in 2M context window →
Shortlist 5 NCT IDs → Deep eligibility verification with MedGemma
```
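The refinement loop above can be sketched as a small control function. `run_query` stands in for the ClinicalTrials MCP call and the parameter names are placeholders; the >50 / 0 thresholds come from the pipeline description, and "anything in between" is simplified to "accept":

```python
def refine_search(run_query, anchors: dict, max_rounds: int = 5):
    """Tighten or loosen a trial query until the result set is small enough to review.

    run_query(params) -> list of candidate trials (mocked in tests).
    """
    params = dict(anchors)
    refinements = ["phase", "keywords"]   # filters to add when too broad
    relaxations = ["city", "phase"]       # filters to drop when too narrow
    results = []
    for _ in range(max_rounds):
        results = run_query(params)
        if len(results) > 50 and refinements:
            params[refinements.pop(0)] = "ADD"  # tighten (placeholder value)
        elif len(results) == 0 and any(k in params for k in relaxations):
            params.pop(next(k for k in relaxations if k in params))  # loosen
        else:
            break  # acceptable size: hand off to the 2M-context reranker
    return results, params
```

Every branch taken here is loggable, which is where the "I searched X, got Y results, refined to Z" transparency claim below comes from.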
**Why This is Better:**
- **Precision:** Leverages native API filters (Phase, Status, Location) that vectors can't handle
- **Transparency:** Every search step is logged and explainable ("I searched X, got Y results, refined to Z")
- **Feasibility:** No vector-DB infrastructure; uses the live API
- **Showcases Gemini reasoning:** Demonstrates multi-step planning vs. one-shot retrieval

---

## 5. MedGemma Showcase Moments (HAI-DEF "Fullest Potential")

### Use Case 1: Temporal Lab Extraction
**Challenge:** Criterion requires "ANC ≥ 1.5 × 10⁹/L within 14 days of enrollment"
- **MedGemma extracts:** Value=1.8, Units=10⁹/L, Date=2026-01-28, DocID=labs_jan.pdf
- **System verifies:** Current date Feb 4 → 7 days ago → ✓ MEETS criterion
- **Evidence link:** User can click to see the exact lab table and date
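The deterministic half of Use Case 1 (MedGemma extracts; plain code verifies) is a threshold-plus-recency check. A minimal sketch; function names are illustrative:

```python
from datetime import date

def within_window(measured: date, today: date, max_days: int = 14) -> bool:
    """True if the lab value is recent enough for the criterion's window."""
    age_days = (today - measured).days
    return 0 <= age_days <= max_days

def meets_anc_criterion(value: float, measured: date, today: date,
                        threshold: float = 1.5, max_days: int = 14) -> bool:
    # Both the numeric threshold and the recency window must hold;
    # units are assumed already normalized to 10^9/L by the extractor.
    return value >= threshold and within_window(measured, today, max_days)
```

Separating extraction (the model's job) from verification (dates and arithmetic) is what keeps this decision auditable via the evidence link.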
### Use Case 2: Multimodal Imaging Context
**Challenge:** Criterion requires "No active CNS metastases"
- **MedGemma reads:** Brain MRI report text: *"Stable 3mm left frontal lesion, no enhancement, likely scarring from prior SRS"*
- **System interprets:** "Stable" + "no enhancement" + "scarring" → likely inactive → flags as ⚠️ UNKNOWN (requires clinician confirmation)
- **Evidence link:** Highlights the report section for doctor review

### Use Case 3: Treatment Line Reconstruction
**Challenge:** Criterion excludes "Prior immune checkpoint inhibitor therapy"
- **MedGemma reconstructs:** From the medication list and notes → Patient received pembrolizumab 2024-06 to 2024-11
- **System verifies:** → ✗ EXCLUDED
- **Evidence link:** Shows the medication timeline with dates and sources
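Once the treatment line is reconstructed, the exclusion check in Use Case 3 is a set lookup. The drug table here is a hand-written illustration; a real system would resolve drug classes through a terminology service (e.g. RxNorm/ATC), not a hard-coded set:

```python
# Hypothetical lookup table of immune checkpoint inhibitors (illustrative only)
CHECKPOINT_INHIBITORS = {"pembrolizumab", "nivolumab", "atezolizumab", "durvalumab"}

def violates_prior_ici_exclusion(medication_history: list[str]) -> bool:
    """True if any reconstructed medication is an immune checkpoint inhibitor."""
    return any(med.lower() in CHECKPOINT_INHIBITORS for med in medication_history)
```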
---

## 6. PoC Scope & Data Strategy

### In Scope (3-Month PoC)
- **Disease:** NSCLC only (complex biomarkers, high trial volume)
- **Data:** Synthetic patients only (no real PHI)
- **Deliverables:**
  - Working web prototype (video demo)
  - Experimental validation on TREC benchmarks
  - Technical write-up + public code repo

### Data Sources
**Patients (Synthetic):**
- Structured ground truth: Synthea FHIR (500 NSCLC patients)
- Unstructured artifacts: LLM-generated clinic letters + lab PDFs with controlled noise (abbreviations, OCR errors, missing values)

**Trials (Real):**
- ClinicalTrials.gov live API via the MCP wrapper
- Focus on recruiting NSCLC trials in Europe + the US

**Benchmarking:**
- TREC Clinical Trials Track 2021/2022 (75 patient topics + judged relevance)
- Custom criterion-extraction test set (labeled synthetic reports)

---
## 7. Success Metrics & Evaluation Plan

### Model Performance
| Metric | Target | Baseline | Method |
|--------|--------|----------|--------|
| **MedGemma Extraction F1** | ≥0.85 | Gemini-only: 0.65-0.75 | Field-level (stage, ECOG, biomarkers, labs) on labeled synthetic reports |
| **Trial Retrieval Recall@50** | ≥0.75 | BM25: ~0.60 | TREC 2021 patient topics |
| **Trial Ranking NDCG@10** | ≥0.60 | Non-LLM baseline: ~0.45 | TREC judged relevance |
| **Criterion Decision Accuracy** | ≥0.85 | Rule-based: ~0.70 | Per-criterion classification on synthetic patient-trial pairs |
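"Field-level F1" in the first row can be scored as micro precision/recall over (field, value) pairs per document. A minimal sketch under that assumption; the exact matching/normalization rules would live in the evaluation harness:

```python
def field_level_prf(gold: dict[str, str], predicted: dict[str, str]):
    """Micro P/R/F1 over (field, value) pairs for one document.

    A prediction is a true positive only when both the field name and the
    case-normalized value match the gold annotation.
    """
    gold_pairs = {(k, v.lower()) for k, v in gold.items()}
    pred_pairs = {(k, v.lower()) for k, v in predicted.items()}
    tp = len(gold_pairs & pred_pairs)
    precision = tp / len(pred_pairs) if pred_pairs else 0.0
    recall = tp / len(gold_pairs) if gold_pairs else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Aggregating `tp`/counts across the labeled synthetic reports (rather than averaging per-document F1) would give the corpus-level micro-F1 the table targets.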
### Product Quality
- **Latency:** <15 s from upload to first match results
- **Explainability:** 100% of "met/not met" decisions must include an evidence pointer (trial text + patient doc ID)
- **Cost:** <$0.50 per patient session (token + GPU usage)

### UX Validation (Small Study)
- Task completion: Can lay users identify ≥1 plausible trial from the shortlist?
- Explanation clarity: SUS-style usability score ≥70
- Reading level: B1/8th-grade equivalent (Flesch-Kincaid)

---
## 8. Impact Potential

### If PoC Succeeds (Quantified)
**Near-term (PoC phase):**
- Demonstrate a 15-25% relative improvement in ranking quality (NDCG) vs. non-LLM baselines on TREC benchmarks
- Show the multimodal extraction advantage: MedGemma F1 ≥0.10 higher than Gemini-only on medical fields

**Post-PoC (Real-world projection):**
- **Patient impact:** Literature suggests automated tools can surface 20-30% more eligible trials than manual search, and NSCLC patients often face 50+ active trials while learning about only 2-3 from their oncologist
- **Clinician impact:** Trial coordinators report spending 2-4 hours per patient on manual screening; if our tool pre-screens with 85% sensitivity, it could reduce manual verification by ~60%
- **Trial enrollment:** Even a 10% increase in eligible-patient identification could shorten trial recruitment timelines (a major pharma pain point)

---
## 9. Risks & Mitigations

| Risk | Mitigation |
|------|-----------|
| **Synthetic data too clean** | Add controlled noise to PDFs (OCR errors, abbreviations); validate against TREC, which uses realistic synthetic cases |
| **MedGemma hallucination on edge cases** | Implement an evidence-pointer system (every decision must cite doc ID + span); flag low confidence as "unknown", not "met" |
| **API rate limits** | Cache trial protocols; batch requests during search refinement |
| **Regulatory misunderstanding** | Explicit "information only, not medical advice" framing throughout the UI; follow MedGemma model card guidance on validation/adaptation |

---
## 10. Deliverables for HAI-DEF Submission

### Video Demo (~5-7 min)
- Patient persona introduction
- Upload → extraction visualization (showing MedGemma in action)
- Agentic search loop (showing query refinement)
- Match results with traffic-light eligibility cards
- Gap-filling iteration (upload biomarker → new matches)
- "Share with doctor" packet generation

### Technical Write-up
1. Problem + why HAI-DEF models
2. Architecture diagram (Parlant journey + MedGemma + Gemini + MCP)
3. Data generation pipeline
4. Experiments: extraction, retrieval, ranking (tables + ablations)
5. Limitations + path to real PHI deployment

### Code Repository
- `data/generate_synthetic_patients.py`
- `data/generate_noisy_pdfs.py`
- `matching/medgemma_extractor.py`
- `matching/agentic_search.py` (Parlant + Gemini + MCP)
- `evaluation/run_trec_benchmark.py`
- Clear README with one-command reproducibility

---
## 11. Why This Wins HAI-DEF

### Effective Use of Models (20%)
✓ MedGemma as the primary clinical understanding engine (extraction + multimodal)
✓ Concrete demos showing where non-HAI-DEF models fall short (extraction accuracy gaps)
✓ A plan for task-specific evaluation showing measurable improvement

### Problem Domain (15%)
✓ Clear unmet need (low trial enrollment, manual screening burden)
✓ Patient-centric storytelling ("Anna's journey")
✓ Evidence-based magnitude (enrollment stats, screening time data)

### Impact Potential (15%)
✓ Quantified near-term (benchmark improvements) and long-term (enrollment lift) impact
✓ Clear calculation logic grounded in the literature

### Product Feasibility (20%)
✓ Detailed technical architecture (agentic search innovation)
✓ Realistic synthetic data strategy
✓ Concrete evaluation plan with baselines
✓ Deployment considerations (latency, cost, safety)

### Execution & Communication (30%)
✓ Cohesive narrative across video + write-up + code
✓ Reproducible experiments
✓ Clear explanation of design choices
✓ Professional polish (evidence pointers, explanations, UX details)

---

**Timeline:** 3 months to a PoC demo ready for HAI-DEF submission.

**Team needs:** 1 ML engineer (MedGemma fine-tuning + evaluation), 1 full-stack engineer (web app + Parlant orchestration), 1 CPO (coordination + submission materials).
docs/tdd-guide-backend-service.md
ADDED
The diff for this file is too large to render.
See raw diff

docs/tdd-guide-data-evaluation.md
ADDED
@@ -0,0 +1,2384 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
# TrialPath Data & Evaluation Pipeline: TDD Implementation Guide

> Based on in-depth research of DeepWiki, the official TREC documentation, and the ir-measures / ir_datasets libraries

---

## 1. Pipeline Architecture Overview

### 1.1 Data Flow Diagram

```
┌──────────────────────────────────────────────────────────────────┐
│                    Data & Evaluation Pipeline                    │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────────┐    │
│  │   Synthea    │───▶│ FHIR Bundle  │───▶│  PatientProfile  │    │
│  │  (Java CLI)  │    │    (JSON)    │    │  (JSON Schema)   │    │
│  └──────────────┘    └──────────────┘    └────────┬─────────┘    │
│                                                   │              │
│  ┌──────────────┐    ┌──────────────┐             ▼              │
│  │  LLM Letter  │───▶│  ReportLab   │───▶ Noisy Clinical PDFs    │
│  │  Generator   │    │  + Augraphy  │     (Letters/Labs/Path)    │
│  └──────────────┘    └──────────────┘                            │
│                                                                  │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────────┐    │
│  │   MedGemma   │───▶│  Extracted   │───▶│   F1 Evaluator   │    │
│  │  Extractor   │    │   Profile    │    │  (scikit-learn)  │    │
│  └──────────────┘    └──────────────┘    └──────────────────┘    │
│                                                                  │
│  ┌──────────────┐    ┌──────────────┐    ┌──────────────────┐    │
│  │ TREC Topics  │───▶│  TrialPath   │───▶│  TREC Evaluator  │    │
│  │ (ir_datasets)│    │   Matching   │    │  (ir-measures)   │    │
│  └──────────────┘    └──────────────┘    └──────────────────┘    │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
```

### 1.2 Module Relationships

| Module | Input | Output | Dependencies |
|------|------|------|------|
| `data/generate_synthetic_patients.py` | Synthea FHIR Bundles | `PatientProfile` JSON + ground truth | Synthea CLI, FHIR R4 |
| `data/generate_noisy_pdfs.py` | `PatientProfile` JSON | Clinical PDFs (with noise) | ReportLab, Augraphy |
| `evaluation/run_trec_benchmark.py` | TREC topics + TrialPath run | Recall@50, NDCG@10, P@10 | ir_datasets, ir-measures |
| `evaluation/extraction_eval.py` | Extracted vs. ground-truth profiles | Field-level F1 | scikit-learn |
| `evaluation/criterion_eval.py` | EligibilityLedger vs. gold standard | Criterion accuracy | scikit-learn |
| `evaluation/latency_cost_tracker.py` | API call logs | Latency/cost reports | time, logging |
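
The field-level F1 that `evaluation/extraction_eval.py` reports can also be computed without scikit-learn for quick sanity checks. A minimal sketch (the helper name `field_f1` is illustrative, not the module's actual API) that counts a field as a true positive only when both profiles carry the same non-null value:

```python
def field_f1(extracted: dict, truth: dict) -> float:
    """Micro-F1 over flat profile fields: a field is a true positive only
    when it appears in both profiles with an identical non-null value."""
    tp = sum(1 for k, v in extracted.items() if v is not None and truth.get(k) == v)
    fp = sum(1 for k, v in extracted.items() if v is not None and truth.get(k) != v)
    fn = sum(1 for k, v in truth.items() if v is not None and extracted.get(k) != v)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

truth = {"egfr": "positive", "alk": "negative", "stage": "IIIA"}
extracted = {"egfr": "positive", "alk": "positive", "stage": "IIIA"}
print(round(field_f1(extracted, truth), 3))  # 0.667
```

The real evaluator additionally has to normalize value formats (e.g. `">=50%"` vs. `"high"`) before comparing; exact string equality is the simplifying assumption here.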

### 1.3 Directory Layout

```
data/
├── generate_synthetic_patients.py   # Synthea FHIR → PatientProfile
├── generate_noisy_pdfs.py           # PatientProfile → clinical PDFs
├── synthea_config/
│   ├── synthea.properties           # Synthea configuration
│   └── modules/
│       └── lung_cancer_extended.json # Extended NSCLC module (with biomarkers)
├── templates/
│   ├── clinical_letter.py           # Clinical letter template
│   ├── pathology_report.py          # Pathology report template
│   ├── lab_report.py                # Lab report template
│   └── imaging_report.py            # Imaging report template
├── noise/
│   └── noise_injector.py            # Noise injection engine
└── output/
    ├── fhir/                        # Raw Synthea FHIR output
    ├── profiles/                    # Converted PatientProfile JSON
    ├── pdfs/                        # Generated clinical PDFs
    └── ground_truth/                # Annotated ground-truth data

evaluation/
├── run_trec_benchmark.py            # TREC retrieval evaluation
├── extraction_eval.py               # MedGemma extraction F1
├── criterion_eval.py                # Criterion decision accuracy
├── latency_cost_tracker.py          # Latency and cost tracking
├── trec_data/
│   ├── topics2021.xml               # TREC 2021 topics
│   ├── qrels2021.txt                # TREC 2021 relevance judgments
│   └── topics2022.xml               # TREC 2022 topics
└── reports/                         # Evaluation report output

tests/
├── test_synthea_data.py             # Synthea data validation
├── test_pdf_generation.py           # PDF generation correctness
├── test_noise_injection.py          # Noise injection effects
├── test_trec_evaluation.py          # TREC evaluation computation
├── test_extraction_f1.py            # F1 computation tests
├── test_latency_cost.py             # Latency/cost tests
└── test_e2e_pipeline.py             # End-to-end pipeline test
```

---

## 2. Synthea Synthetic Patient Generation Guide

### 2.1 Synthea Overview

Synthea is an open-source synthetic patient simulator developed by MITRE and implemented in Java. It simulates disease trajectories through JSON state-machine modules and exports standard FHIR R4 Bundles.

**Key features (source: DeepWiki, synthetichealth/synthea):**
- Module-based disease simulation: each disease is defined as a JSON state machine
- Supports FHIR R4/STU3/DSTU2 export
- Ships with a built-in `lung_cancer.json` module using an 85% NSCLC / 15% SCLC split
- Supports Stage I–IV staging and chemotherapy/radiation treatment paths
- **Does not include NSCLC-specific biomarkers (EGFR, ALK, PD-L1, KRAS, ROS1) — a custom extension is required**

| 109 |
+
### 2.2 安装和配置
|
| 110 |
+
|
| 111 |
+
**系统要求:**
|
| 112 |
+
- Java JDK 11 或更高版本(推荐 LTS 11 或 17)
|
| 113 |
+
|
| 114 |
+
**安装方式 A:直接使用 JAR(推荐用于数据生成)**
|
| 115 |
+
```bash
|
| 116 |
+
# 下载最新 release JAR
|
| 117 |
+
# 从 https://github.com/synthetichealth/synthea/releases 获取
|
| 118 |
+
wget https://github.com/synthetichealth/synthea/releases/download/master-branch-latest/synthea-with-dependencies.jar
|
| 119 |
+
|
| 120 |
+
# 验证安装
|
| 121 |
+
java -jar synthea-with-dependencies.jar --help
|
| 122 |
+
```
|
| 123 |
+
|
| 124 |
+
**安装方式 B:从源码构建(需要自定义模块时使用)**
|
| 125 |
+
```bash
|
| 126 |
+
git clone https://github.com/synthetichealth/synthea.git
|
| 127 |
+
cd synthea
|
| 128 |
+
./gradlew build check test
|
| 129 |
+
```
|
| 130 |
+
|
### 2.3 NSCLC Module Configuration

#### 2.3.1 Analysis of the existing lung_cancer module

Source: DeepWiki analysis of the `lung_cancer.json` module in `synthetichealth/synthea`:

- **Entry condition**: ages 45–65, probability-based
- **Diagnostic workflow**: symptoms (cough, hemoptysis, shortness of breath) → chest X-ray → chest CT → biopsy/cytology
- **Subtype split**: 85% NSCLC, 15% SCLC
- **Staging**: Stage I–IV, driven by `lung_cancer_nondiagnosis_counter`
- **Treatment**: NSCLC receives Cisplatin + Paclitaxel → radiation

#### 2.3.2 Custom NSCLC biomarker extension module

Because the stock module contains no EGFR/ALK/PD-L1 or other biomarkers, an extension submodule must be created.

**File: `data/synthea_config/modules/lung_cancer_biomarkers.json`**

Per the DeepWiki research on Synthea module state types, the available state types include:
- `Initial` — module entry point
- `Terminal` — module exit point
- `Observation` — records a clinical observation value (used for biomarkers)
- `SetAttribute` — sets a patient attribute
- `Guard` — conditional gate
- `Simple` — plain transition state
- `Encounter` — clinical encounter state

Example structure for the biomarker observation states:
```json
{
  "name": "NSCLC Biomarker Panel",
  "states": {
    "Initial": {
      "type": "Initial",
      "conditional_transition": [
        {
          "condition": {
            "condition_type": "Attribute",
            "attribute": "Lung Cancer Type",
            "operator": "==",
            "value": "NSCLC"
          },
          "transition": "EGFR_Test_Encounter"
        },
        {
          "transition": "Terminal"
        }
      ]
    },
    "EGFR_Test_Encounter": {
      "type": "Encounter",
      "encounter_class": "ambulatory",
      "codes": [
        {
          "system": "SNOMED-CT",
          "code": "185349003",
          "display": "Encounter for check up"
        }
      ],
      "direct_transition": "EGFR_Mutation_Status"
    },
    "EGFR_Mutation_Status": {
      "type": "Observation",
      "category": "laboratory",
      "codes": [
        {
          "system": "LOINC",
          "code": "41103-3",
          "display": "EGFR gene mutations found"
        }
      ],
      "distributed_transition": [
        {
          "distribution": 0.15,
          "transition": "EGFR_Positive"
        },
        {
          "distribution": 0.85,
          "transition": "EGFR_Negative"
        }
      ]
    },
    "EGFR_Positive": {
      "type": "SetAttribute",
      "attribute": "egfr_status",
      "value": "positive",
      "direct_transition": "ALK_Rearrangement_Status"
    },
    "EGFR_Negative": {
      "type": "SetAttribute",
      "attribute": "egfr_status",
      "value": "negative",
      "direct_transition": "ALK_Rearrangement_Status"
    },
    "ALK_Rearrangement_Status": {
      "type": "Observation",
      "category": "laboratory",
      "codes": [
        {
          "system": "LOINC",
          "code": "46264-8",
          "display": "ALK gene rearrangement"
        }
      ],
      "distributed_transition": [
        {
          "distribution": 0.05,
          "transition": "ALK_Positive"
        },
        {
          "distribution": 0.95,
          "transition": "ALK_Negative"
        }
      ]
    },
    "ALK_Positive": {
      "type": "SetAttribute",
      "attribute": "alk_status",
      "value": "positive",
      "direct_transition": "PDL1_Expression"
    },
    "ALK_Negative": {
      "type": "SetAttribute",
      "attribute": "alk_status",
      "value": "negative",
      "direct_transition": "PDL1_Expression"
    },
    "PDL1_Expression": {
      "type": "Observation",
      "category": "laboratory",
      "codes": [
        {
          "system": "LOINC",
          "code": "85147-0",
          "display": "PD-L1 by immune stain"
        }
      ],
      "distributed_transition": [
        {
          "distribution": 0.30,
          "transition": "PDL1_High"
        },
        {
          "distribution": 0.35,
          "transition": "PDL1_Low"
        },
        {
          "distribution": 0.35,
          "transition": "PDL1_Negative"
        }
      ]
    },
    "PDL1_High": {
      "type": "SetAttribute",
      "attribute": "pdl1_tps",
      "value": ">=50%",
      "direct_transition": "KRAS_Mutation_Status"
    },
    "PDL1_Low": {
      "type": "SetAttribute",
      "attribute": "pdl1_tps",
      "value": "1-49%",
      "direct_transition": "KRAS_Mutation_Status"
    },
    "PDL1_Negative": {
      "type": "SetAttribute",
      "attribute": "pdl1_tps",
      "value": "<1%",
      "direct_transition": "KRAS_Mutation_Status"
    },
    "KRAS_Mutation_Status": {
      "type": "Observation",
      "category": "laboratory",
      "codes": [
        {
          "system": "LOINC",
          "code": "21717-3",
          "display": "KRAS gene mutations found"
        }
      ],
      "distributed_transition": [
        {
          "distribution": 0.25,
          "transition": "KRAS_Positive"
        },
        {
          "distribution": 0.75,
          "transition": "KRAS_Negative"
        }
      ]
    },
    "KRAS_Positive": {
      "type": "SetAttribute",
      "attribute": "kras_status",
      "value": "positive",
      "direct_transition": "Terminal"
    },
    "KRAS_Negative": {
      "type": "SetAttribute",
      "attribute": "kras_status",
      "value": "negative",
      "direct_transition": "Terminal"
    },
    "Terminal": {
      "type": "Terminal"
    }
  }
}
```
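
Before dropping a module like this into `data/synthea_config/modules/`, it is worth checking that every transition target names a defined state; a dangling transition would only surface once the module runs. A small, hypothetical validator sketch (not part of Synthea itself):

```python
def undefined_transitions(module: dict) -> set[str]:
    """Return transition targets that do not correspond to any defined state."""
    states = module["states"]
    targets = set()
    for state in states.values():
        if "direct_transition" in state:
            targets.add(state["direct_transition"])
        branches = state.get("distributed_transition", []) + state.get("conditional_transition", [])
        for branch in branches:
            if "transition" in branch:
                targets.add(branch["transition"])
    return targets - set(states)

# Deliberately broken fixture: "EGFR_Positive" is referenced but never defined.
module = {
    "states": {
        "Initial": {"type": "Initial", "direct_transition": "EGFR_Mutation_Status"},
        "EGFR_Mutation_Status": {
            "type": "Observation",
            "distributed_transition": [
                {"distribution": 0.15, "transition": "EGFR_Positive"},
                {"distribution": 0.85, "transition": "Terminal"},
            ],
        },
        "Terminal": {"type": "Terminal"},
    }
}
print(undefined_transitions(module))  # {'EGFR_Positive'}
```

This covers only the three transition kinds used in the module above; Synthea supports further transition types (e.g. `complex_transition`) that a full validator would also have to walk.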

**Biomarker prevalence distribution (based on the NSCLC literature):**

| Biomarker | Positive rate | LOINC code | Notes |
|-----------|--------|------------|------|
| EGFR mutation | ~15% | 41103-3 | Higher in never-smoking Asian women |
| ALK rearrangement | ~5% | 46264-8 | More common in young never-smokers |
| PD-L1 TPS ≥50% | ~30% | 85147-0 | Eligibility threshold for immunotherapy |
| KRAS G12C | ~13% | 21717-3 | Targeted by sotorasib |
| ROS1 fusion | ~1–2% | 46265-5 | Targeted by crizotinib |

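
For quick test fixtures that skip Synthea entirely, the same prevalence numbers can be sampled in plain Python. A sketch (the 25% KRAS rate mirrors the module's overall `distributed_transition` above rather than the G12C-specific figure; `sample_biomarkers` is an illustrative name):

```python
import random

# Prevalence assumptions taken from the table / module above.
BIOMARKER_PREVALENCE = {
    "egfr": 0.15,
    "alk": 0.05,
    "pdl1_tps_high": 0.30,
    "kras": 0.25,  # overall KRAS mutation rate used by the module
}

def sample_biomarkers(seed: int) -> dict:
    """Draw positive/negative statuses matching the module's distributions."""
    rng = random.Random(seed)
    return {
        name: "positive" if rng.random() < p else "negative"
        for name, p in BIOMARKER_PREVALENCE.items()
    }

# Over many seeded draws the empirical rate converges toward the configured one.
statuses = [sample_biomarkers(seed) for seed in range(1000)]
egfr_rate = sum(s["egfr"] == "positive" for s in statuses) / len(statuses)
print(f"EGFR positive rate over 1000 draws: {egfr_rate:.2f}")
```

Seeding each draw keeps fixtures reproducible, matching the `-s 42` convention used for Synthea runs below.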
### 2.4 Batch Generation Command

```bash
# Generate 500 NSCLC patients with a fixed seed for reproducibility
java -jar synthea-with-dependencies.jar \
  -p 500 \
  -s 42 \
  -m lung_cancer \
  --exporter.fhir.export=true \
  --exporter.fhir_stu3.export=false \
  --exporter.fhir_dstu2.export=false \
  --exporter.ccda.export=false \
  --exporter.csv.export=false \
  --exporter.hospital.fhir.export=false \
  --exporter.practitioner.fhir.export=false \
  --exporter.pretty_print=true \
  Massachusetts

# Parameter reference:
# -p 500                       : generate 500 patients
# -s 42                        : random seed (reproducible)
# -m lung_cancer               : run only the lung_cancer module
# --exporter.fhir.export=true  : enable FHIR R4 export
# Massachusetts                : generation region
```

**Output location:** one JSON file per patient under `./output/fhir/`.

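
The same invocation can be driven from Python, for example from a pytest fixture. A sketch assuming the JAR sits in the working directory (`synthea_command` is an illustrative helper, not an existing API):

```python
def synthea_command(population: int = 500, seed: int = 42,
                    module: str = "lung_cancer",
                    state: str = "Massachusetts") -> list[str]:
    """Build the Synthea CLI invocation shown in the shell example above."""
    return [
        "java", "-jar", "synthea-with-dependencies.jar",
        "-p", str(population),
        "-s", str(seed),
        "-m", module,
        "--exporter.fhir.export=true",
        "--exporter.csv.export=false",
        state,
    ]

cmd = synthea_command()
print(" ".join(cmd))
# To actually run it (needs `import subprocess` and the JAR on disk):
# subprocess.run(cmd, check=True)
```

Building the argument list separately from executing it keeps the command testable without Java installed.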
### 2.5 FHIR Bundle Output Format

Source: DeepWiki analysis of the FHIR export system in `synthetichealth/synthea`.

**Top-level structure:**
```json
{
  "resourceType": "Bundle",
  "type": "transaction",
  "entry": [
    {
      "fullUrl": "urn:uuid:patient-uuid-here",
      "resource": { "resourceType": "Patient", ... },
      "request": { "method": "POST", "url": "Patient" }
    },
    {
      "fullUrl": "urn:uuid:condition-uuid-here",
      "resource": { "resourceType": "Condition", ... },
      "request": { "method": "POST", "url": "Condition" }
    }
  ]
}
```

**FHIR resource types generated by Synthea (confirmed via DeepWiki):**
- `Patient` — patient demographics
- `Condition` — diagnoses (e.g. NSCLC)
- `Observation` — lab tests and vital signs
- `MedicationRequest` — medication orders
- `Procedure` — surgeries and procedures
- `DiagnosticReport` — diagnostic reports
- `DocumentReference` — clinical documents (requires the US Core IG to be enabled)
- `Encounter` — visit records
- `AllergyIntolerance` — allergy history
- `Immunization` — immunizations
- `CarePlan` — care plans
- `ImagingStudy` — imaging studies

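
A quick way to check which of these resource types a generated bundle actually contains is to tally `resourceType` across entries — useful in `tests/test_synthea_data.py`-style validation. A sketch with a hypothetical miniature bundle:

```python
from collections import Counter

def resource_type_counts(bundle: dict) -> Counter:
    """Tally resourceType values across all entries in a FHIR Bundle."""
    return Counter(
        entry.get("resource", {}).get("resourceType", "Unknown")
        for entry in bundle.get("entry", [])
    )

bundle = {
    "resourceType": "Bundle",
    "type": "transaction",
    "entry": [
        {"resource": {"resourceType": "Patient"}},
        {"resource": {"resourceType": "Condition"}},
        {"resource": {"resourceType": "Observation"}},
        {"resource": {"resourceType": "Observation"}},
    ],
}
print(resource_type_counts(bundle))
# Counter({'Observation': 2, 'Patient': 1, 'Condition': 1})
```

A validation test can then assert, for instance, that every lung-cancer bundle has exactly one `Patient` and at least one `Condition`.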

### 2.6 Mapping FHIR Resources to PatientProfile

```python
# Mapping logic in data/generate_synthetic_patients.py

FHIR_TO_PATIENT_PROFILE_MAP = {
    # Patient Resource → demographics
    "Patient.name": "demographics.name",
    "Patient.gender": "demographics.sex",
    "Patient.birthDate": "demographics.date_of_birth",
    "Patient.address.state": "demographics.state",

    # Condition Resource → diagnosis
    "Condition[code=SNOMED:254637007]": "diagnosis.primary",  # NSCLC
    "Condition.stage.summary": "diagnosis.stage",
    "Condition.bodySite": "diagnosis.histology",

    # Observation Resources → biomarkers
    "Observation[code=LOINC:41103-3]": "biomarkers.egfr",
    "Observation[code=LOINC:46264-8]": "biomarkers.alk",
    "Observation[code=LOINC:85147-0]": "biomarkers.pdl1_tps",
    "Observation[code=LOINC:21717-3]": "biomarkers.kras",

    # Observation Resources → labs
    "Observation[category=laboratory]": "labs[]",

    # MedicationRequest → prior_treatments
    "MedicationRequest.medicationCodeableConcept": "treatments[].medication",

    # Procedure → prior_treatments
    "Procedure.code": "treatments[].procedure",
}
```

**Conversion function pattern:**
```python
import json
from pathlib import Path
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class Demographics:
    name: str = ""
    sex: str = ""
    date_of_birth: str = ""
    age: int = 0
    state: str = ""


@dataclass
class Diagnosis:
    primary: str = ""
    stage: str = ""
    histology: str = ""
    diagnosis_date: str = ""


@dataclass
class Biomarkers:
    egfr: Optional[str] = None
    alk: Optional[str] = None
    pdl1_tps: Optional[str] = None
    kras: Optional[str] = None
    ros1: Optional[str] = None


@dataclass
class LabResult:
    name: str = ""
    value: float = 0.0
    unit: str = ""
    date: str = ""
    loinc_code: str = ""


@dataclass
class Treatment:
    name: str = ""
    type: str = ""  # "medication" | "procedure" | "radiation"
    start_date: str = ""
    end_date: Optional[str] = None


@dataclass
class PatientProfile:
    patient_id: str = ""
    demographics: Demographics = field(default_factory=Demographics)
    diagnosis: Diagnosis = field(default_factory=Diagnosis)
    biomarkers: Biomarkers = field(default_factory=Biomarkers)
    labs: list[LabResult] = field(default_factory=list)
    treatments: list[Treatment] = field(default_factory=list)
    unknowns: list[str] = field(default_factory=list)
    evidence_spans: list[dict] = field(default_factory=list)


def parse_fhir_bundle(fhir_path: Path) -> PatientProfile:
    """Parse a Synthea FHIR Bundle JSON into PatientProfile."""
    with open(fhir_path) as f:
        bundle = json.load(f)

    profile = PatientProfile()
    entries = bundle.get("entry", [])

    for entry in entries:
        resource = entry.get("resource", {})
        resource_type = resource.get("resourceType")

        if resource_type == "Patient":
            _parse_patient(resource, profile)
        elif resource_type == "Condition":
            _parse_condition(resource, profile)
        elif resource_type == "Observation":
            _parse_observation(resource, profile)
        elif resource_type == "MedicationRequest":
            _parse_medication(resource, profile)
        elif resource_type == "Procedure":
            _parse_procedure(resource, profile)

    return profile


def _parse_patient(resource: dict, profile: PatientProfile):
    """Extract demographics from Patient resource."""
    names = resource.get("name", [{}])
    if names:
        given = " ".join(names[0].get("given", []))
        family = names[0].get("family", "")
        profile.demographics.name = f"{given} {family}".strip()

    profile.demographics.sex = resource.get("gender", "")
    profile.demographics.date_of_birth = resource.get("birthDate", "")
    profile.patient_id = resource.get("id", "")

    addresses = resource.get("address", [{}])
    if addresses:
        profile.demographics.state = addresses[0].get("state", "")


def _parse_condition(resource: dict, profile: PatientProfile):
    """Extract diagnosis from Condition resource."""
    code = resource.get("code", {})
    codings = code.get("coding", [])
    for coding in codings:
        # SNOMED codes for lung cancer
        if coding.get("code") in ["254637007", "254632001"]:
            profile.diagnosis.primary = coding.get("display", "")
            onset = resource.get("onsetDateTime", "")
            profile.diagnosis.diagnosis_date = onset
            # Extract stage if available
            stage_info = resource.get("stage", [])
            if stage_info:
                summary = stage_info[0].get("summary", {})
                stage_codings = summary.get("coding", [])
                if stage_codings:
                    profile.diagnosis.stage = stage_codings[0].get("display", "")


def _parse_observation(resource: dict, profile: PatientProfile):
    """Extract labs and biomarkers from Observation resource."""
    code = resource.get("code", {})
    codings = code.get("coding", [])
    category_list = resource.get("category", [])
    is_lab = any(
        cat_coding.get("code") == "laboratory"
        for cat in category_list
        for cat_coding in cat.get("coding", [])
    )

    for coding in codings:
        loinc = coding.get("code", "")
        display = coding.get("display", "")

        # Biomarker mappings
        biomarker_map = {
            "41103-3": "egfr",
            "46264-8": "alk",
            "85147-0": "pdl1_tps",
            "21717-3": "kras",
            "46265-5": "ros1",
        }

        if loinc in biomarker_map:
            value_cc = resource.get("valueCodeableConcept", {})
            value_codings = value_cc.get("coding", [])
            value_str = value_codings[0].get("display", "") if value_codings else ""
            setattr(profile.biomarkers, biomarker_map[loinc], value_str)
        elif is_lab:
            value_qty = resource.get("valueQuantity", {})
            lab = LabResult(
                name=display,
                value=value_qty.get("value", 0.0),
                unit=value_qty.get("unit", ""),
                date=resource.get("effectiveDateTime", ""),
                loinc_code=loinc,
            )
            profile.labs.append(lab)

# (_parse_medication and _parse_procedure follow the same pattern; not shown here.)
```

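
Parsed profiles are then persisted under `data/output/profiles/` as JSON; `dataclasses.asdict` handles the dataclass-to-dict conversion (which is why `asdict` appears in the imports above). A self-contained sketch using a stripped-down stand-in for `PatientProfile`:

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class MiniProfile:  # stripped-down stand-in for PatientProfile, for illustration
    patient_id: str = ""
    biomarkers: dict = field(default_factory=dict)
    unknowns: list = field(default_factory=list)

profile = MiniProfile(patient_id="p-001", biomarkers={"egfr": "positive"})

# asdict recurses into nested dataclasses, so the same call works for the
# full PatientProfile with its Demographics/Diagnosis/... fields.
payload = json.dumps(asdict(profile), indent=2, sort_keys=True)
restored = json.loads(payload)
print(restored["biomarkers"]["egfr"])  # positive
```

Sorting keys and using a fixed indent keeps the serialized ground-truth files stable across runs, which makes git diffs of regenerated data readable.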
---

## 3. Synthetic PDF Generation Pipeline

### 3.1 Overview

Goal: convert `PatientProfile` into realistic clinical document PDFs, injecting controlled noise to simulate real-world OCR conditions.

**Tech stack:**
- **ReportLab** (`pip install reportlab`) — PDF generation engine with Platypus flowables such as `SimpleDocTemplate`, `Table`, and `Paragraph`
- **Augraphy** (`pip install augraphy`) — document-image degradation pipeline simulating printing, fax, and scan noise
- **Pillow** (`pip install Pillow`) — image processing
- **pdf2image** (`pip install pdf2image`) — PDF-to-image conversion (for converting back to PDF after noise injection)

### 3.2 Clinical Letter Template

```python
# data/templates/clinical_letter.py
from reportlab.lib.pagesizes import letter
from reportlab.lib.units import inch
from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
from reportlab.platypus import (
    SimpleDocTemplate, Paragraph, Spacer, Table, TableStyle
)
from reportlab.lib import colors


def generate_clinical_letter(profile: dict, output_path: str):
    """Generate a clinical letter PDF from PatientProfile."""
    doc = SimpleDocTemplate(output_path, pagesize=letter,
                            topMargin=1*inch, bottomMargin=1*inch)
    styles = getSampleStyleSheet()
    story = []

    # Header
    header_style = ParagraphStyle(
        'Header', parent=styles['Heading1'], fontSize=14,
        spaceAfter=6
    )
    story.append(Paragraph("Clinical Summary Letter", header_style))
    story.append(Spacer(1, 12))

    # Patient Info
    info_data = [
        ["Patient Name:", profile["demographics"]["name"]],
        ["Date of Birth:", profile["demographics"]["date_of_birth"]],
        ["Sex:", profile["demographics"]["sex"]],
        ["MRN:", profile["patient_id"]],
    ]
    info_table = Table(info_data, colWidths=[2*inch, 4*inch])
    info_table.setStyle(TableStyle([
        ('FONTNAME', (0, 0), (0, -1), 'Helvetica-Bold'),
        ('FONTNAME', (1, 0), (1, -1), 'Helvetica'),
        ('FONTSIZE', (0, 0), (-1, -1), 10),
        ('VALIGN', (0, 0), (-1, -1), 'TOP'),
    ]))
    story.append(info_table)
    story.append(Spacer(1, 18))

    # Diagnosis Section
    story.append(Paragraph("Diagnosis", styles['Heading2']))
    dx = profile.get("diagnosis", {})
    dx_text = (
        f"Primary: {dx.get('primary', 'Unknown')}. "
        f"Stage: {dx.get('stage', 'Unknown')}. "
        f"Histology: {dx.get('histology', 'Unknown')}. "
        f"Diagnosed: {dx.get('diagnosis_date', 'Unknown')}."
    )
    story.append(Paragraph(dx_text, styles['Normal']))
    story.append(Spacer(1, 12))

    # Biomarkers Section
    story.append(Paragraph("Molecular Testing", styles['Heading2']))
    bm = profile.get("biomarkers", {})
    bm_data = [["Biomarker", "Result"]]
    for marker, value in bm.items():
        if value is not None:
            bm_data.append([marker.upper(), str(value)])
    if len(bm_data) > 1:
        bm_table = Table(bm_data, colWidths=[2.5*inch, 3.5*inch])
        bm_table.setStyle(TableStyle([
            ('BACKGROUND', (0, 0), (-1, 0), colors.lightgrey),
            ('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'),
            ('GRID', (0, 0), (-1, -1), 0.5, colors.grey),
            ('FONTSIZE', (0, 0), (-1, -1), 10),
        ]))
        story.append(bm_table)
    story.append(Spacer(1, 12))

    # Treatment History
    story.append(Paragraph("Treatment History", styles['Heading2']))
    treatments = profile.get("treatments", [])
    for tx in treatments:
        tx_text = f"- {tx['name']} ({tx['type']}): {tx.get('start_date', '')}"
        story.append(Paragraph(tx_text, styles['Normal']))

    doc.build(story)
```

### 3.3 Pathology Report Template

```python
# data/templates/pathology_report.py
from reportlab.lib.pagesizes import letter
from reportlab.lib.units import inch
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, Table


def generate_pathology_report(profile: dict, output_path: str):
    """Generate a pathology report PDF."""
    doc = SimpleDocTemplate(output_path, pagesize=letter)
    styles = getSampleStyleSheet()
    story = []

    story.append(Paragraph("SURGICAL PATHOLOGY REPORT", styles['Title']))
    story.append(Spacer(1, 12))

    # Specimen Info
    spec_data = [
        ["Specimen:", "Right lung, upper lobe, wedge resection"],
        ["Procedure:", "CT-guided needle biopsy"],
        ["Date:", profile["diagnosis"]["diagnosis_date"]],
    ]
    spec_table = Table(spec_data, colWidths=[2*inch, 4*inch])
    story.append(spec_table)
    story.append(Spacer(1, 12))

    # Final Diagnosis
    story.append(Paragraph("FINAL DIAGNOSIS", styles['Heading2']))
    story.append(Paragraph(
        f"Non-small cell lung carcinoma, {profile['diagnosis'].get('histology', 'adenocarcinoma')}, "
        f"{profile['diagnosis'].get('stage', 'Stage IIIA')}",
        styles['Normal']
    ))

    # Biomarker Results
    story.append(Spacer(1, 12))
    story.append(Paragraph("MOLECULAR/IMMUNOHISTOCHEMISTRY", styles['Heading2']))
    bm = profile.get("biomarkers", {})
    results = []
    if bm.get("egfr"):
        results.append(f"EGFR mutation analysis: {bm['egfr']}")
    if bm.get("alk"):
        results.append(f"ALK rearrangement (FISH): {bm['alk']}")
    if bm.get("pdl1_tps"):
        results.append(f"PD-L1 (22C3, TPS): {bm['pdl1_tps']}")
    if bm.get("kras"):
        results.append(f"KRAS mutation analysis: {bm['kras']}")
    for r in results:
        story.append(Paragraph(r, styles['Normal']))

    doc.build(story)
```

### 3.4 Lab Report Template

```python
# data/templates/lab_report.py
from reportlab.lib import colors
from reportlab.lib.pagesizes import letter
from reportlab.lib.units import inch
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, Table, TableStyle


def generate_lab_report(profile: dict, output_path: str):
    """Generate a laboratory report PDF with CBC, CMP, etc."""
    doc = SimpleDocTemplate(output_path, pagesize=letter)
    styles = getSampleStyleSheet()
    story = []

    story.append(Paragraph("LABORATORY REPORT", styles['Title']))
    story.append(Spacer(1, 12))

    # Lab Results Table
    lab_data = [["Test", "Result", "Unit", "Reference Range", "Date"]]
    for lab in profile.get("labs", []):
        lab_data.append([
            lab["name"], str(lab["value"]), lab["unit"],
            "",  # Reference range (can be added)
            lab["date"][:10] if lab["date"] else ""
        ])

    if len(lab_data) > 1:
        lab_table = Table(lab_data, colWidths=[2*inch, 1*inch, 0.8*inch, 1.2*inch, 1*inch])
        lab_table.setStyle(TableStyle([
            ('BACKGROUND', (0, 0), (-1, 0), colors.HexColor('#003366')),
            ('TEXTCOLOR', (0, 0), (-1, 0), colors.white),
            ('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'),
            ('GRID', (0, 0), (-1, -1), 0.5, colors.grey),
            ('FONTSIZE', (0, 0), (-1, -1), 9),
            ('ROWBACKGROUNDS', (0, 1), (-1, -1), [colors.white, colors.HexColor('#f0f0f0')]),
        ]))
        story.append(lab_table)

    doc.build(story)
```

| 797 |
+
### 3.5 噪声注入策略
|
| 798 |
+
|
| 799 |
```python
# data/noise/noise_injector.py
import random
import re
from pathlib import Path
from PIL import Image

# Augraphy pipeline configuration
try:
    from augraphy import (
        AugraphyPipeline, InkBleed, Letterpress, LowInkPeriodicLines,
        DirtyDrum, SubtleNoise, Jpeg, Brightness, BleedThrough
    )
    AUGRAPHY_AVAILABLE = True
except ImportError:
    AUGRAPHY_AVAILABLE = False


class NoiseInjector:
    """Controlled noise-injection engine simulating real-world document degradation."""

    # Common OCR confusion mappings
    OCR_ERROR_MAP = {
        "0": ["O", "o", "Q"],
        "1": ["l", "I", "|"],
        "5": ["S", "s"],
        "8": ["B"],
        "O": ["0", "Q"],
        "l": ["1", "I", "|"],
        "rn": ["m"],
        "cl": ["d"],
        "vv": ["w"],
    }

    # Medical abbreviation substitutions
    ABBREVIATION_MAP = {
        "non-small cell lung cancer": ["NSCLC", "non-small cell ca", "NSCC"],
        "adenocarcinoma": ["adeno", "adenoca", "adeno ca"],
        "squamous cell carcinoma": ["SCC", "squamous ca", "sq cell ca"],
        "Eastern Cooperative Oncology Group": ["ECOG"],
        "performance status": ["PS", "perf status"],
        "milligrams per deciliter": ["mg/dL", "mg/dl"],
        "computed tomography": ["CT", "cat scan"],
    }

    # Noise level configurations
    NOISE_LEVELS = {
        "clean": {"ocr_rate": 0.0, "abbrev_rate": 0.0, "missing_rate": 0.0},
        "mild": {"ocr_rate": 0.02, "abbrev_rate": 0.1, "missing_rate": 0.05},
        "moderate": {"ocr_rate": 0.05, "abbrev_rate": 0.2, "missing_rate": 0.1},
        "severe": {"ocr_rate": 0.10, "abbrev_rate": 0.3, "missing_rate": 0.2},
    }

    def __init__(self, noise_level: str = "mild", seed: int = 42):
        self.config = self.NOISE_LEVELS[noise_level]
        self.rng = random.Random(seed)

    def inject_text_noise(self, text: str) -> tuple[str, list[dict]]:
        """Inject OCR errors and abbreviations into text.

        Returns (noisy_text, list_of_injected_noise_records).
        """
        noise_records = []
        chars = list(text)

        # OCR character substitutions
        i = 0
        while i < len(chars):
            if self.rng.random() < self.config["ocr_rate"]:
                original = chars[i]
                if original in self.OCR_ERROR_MAP:
                    replacement = self.rng.choice(self.OCR_ERROR_MAP[original])
                    chars[i] = replacement
                    noise_records.append({
                        "type": "ocr_error",
                        "position": i,
                        "original": original,
                        "replacement": replacement,
                    })
            i += 1

        noisy_text = "".join(chars)

        # Abbreviation substitutions
        for full_form, abbreviations in self.ABBREVIATION_MAP.items():
            if full_form in noisy_text.lower() and self.rng.random() < self.config["abbrev_rate"]:
                abbrev = self.rng.choice(abbreviations)
                noisy_text = re.sub(
                    re.escape(full_form), abbrev, noisy_text, count=1, flags=re.IGNORECASE
                )
                noise_records.append({
                    "type": "abbreviation",
                    "original": full_form,
                    "replacement": abbrev,
                })

        return noisy_text, noise_records

    def inject_missing_values(self, profile: dict) -> tuple[dict, list[str]]:
        """Randomly remove fields from profile to simulate missing data.

        Returns (modified_profile, list_of_removed_fields).
        """
        removed = []
        removable_fields = [
            ("biomarkers", "egfr"),
            ("biomarkers", "alk"),
            ("biomarkers", "pdl1_tps"),
            ("biomarkers", "kras"),
            ("biomarkers", "ros1"),
            ("diagnosis", "stage"),
            ("diagnosis", "histology"),
        ]

        for section, field_name in removable_fields:
            if self.rng.random() < self.config["missing_rate"]:
                if section in profile and field_name in profile[section]:
                    profile[section][field_name] = None
                    removed.append(f"{section}.{field_name}")

        return profile, removed

    def degrade_image(self, image: Image.Image) -> Image.Image:
        """Apply the Augraphy degradation pipeline to a document image."""
        if not AUGRAPHY_AVAILABLE:
            return image

        import numpy as np
        img_array = np.array(image)

        pipeline = AugraphyPipeline(
            ink_phase=[
                InkBleed(p=0.5),
                Letterpress(p=0.3),
                LowInkPeriodicLines(p=0.3),
            ],
            paper_phase=[
                SubtleNoise(p=0.5),
            ],
            post_phase=[
                DirtyDrum(p=0.3),
                Brightness(p=0.5),
                Jpeg(p=0.5),
            ],
        )

        degraded = pipeline(img_array)
        return Image.fromarray(degraded)
```
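
As a quick illustration of the seeded, level-controlled behaviour, here is a standalone sketch that reimplements only the OCR-substitution path (the function name and the map subset are illustrative, not part of the module above):

```python
import random

# Subset of the OCR confusion map above (illustrative)
OCR_ERRORS = {"0": ["O", "o", "Q"], "1": ["l", "I", "|"], "5": ["S", "s"]}

def inject_ocr_noise(text: str, rate: float, seed: int = 42) -> str:
    """Corrupt map-listed characters with probability `rate`; seeded for reproducibility."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        if ch in OCR_ERRORS and rng.random() < rate:
            out.append(rng.choice(OCR_ERRORS[ch]))
        else:
            out.append(ch)
    return "".join(out)

clean = "ECOG PS 1, PD-L1 TPS 50%"
print(inject_ocr_noise(clean, rate=0.10))  # a mildly corrupted variant
```

Because the RNG is seeded per call, the same input always degrades the same way, which keeps evaluation runs comparable.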

---

## 4. TREC Benchmark Evaluation Guide

### 4.1 Dataset Overview

**TREC Clinical Trials Track 2021:**
- Source: NIST Text REtrieval Conference (TREC)
- Topics (queries): 75 synthetic patient descriptions (5-10 sentence admission notes)
- Document collection: 376,000+ clinical trials (April 2021 ClinicalTrials.gov snapshot)
- Qrels: 35,832 relevance judgments
- Relevance labels: 0 = not relevant, 1 = excluded, 2 = eligible

**TREC Clinical Trials Track 2022:**
- Topics: 50 synthetic patient descriptions
- Uses the same document collection snapshot

### 4.2 Data Formats

#### Topics XML Format
```xml
<topics task="2021 TREC Clinical Trials">
  <topic number="1">
    A 62-year-old male presents with a 3-month history of
    progressive dyspnea and a 20-pound weight loss. He has
    a 40 pack-year smoking history. CT chest reveals a 4.5cm
    right upper lobe mass with mediastinal lymphadenopathy.
    Biopsy confirms non-small cell lung cancer, adenocarcinoma.
    EGFR mutation testing is positive for exon 19 deletion.
    PD-L1 TPS is 60%. ECOG performance status is 1.
  </topic>
  <topic number="2">
    ...
  </topic>
</topics>
```

#### Qrels Format (Tab-Separated)
```
topic_id 0 doc_id relevance
1 0 NCT00760162 2
1 0 NCT01234567 1
1 0 NCT09876543 0
```
- Column 1: topic number
- Column 2: fixed value 0 (iteration)
- Column 3: NCT document ID
- Column 4: relevance (0 = not relevant, 1 = excluded, 2 = eligible)

#### Run Submission Format
```
TOPIC_NO Q0 NCT_ID RANK SCORE RUN_NAME
1 Q0 NCT00760162 1 0.9999 trialpath-v1
1 Q0 NCT01234567 2 0.9998 trialpath-v1
```
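
Since the columns are plain whitespace-separated fields, a qrels line can be parsed with a simple split (sketch; the function name is illustrative):

```python
def parse_qrel_line(line: str) -> tuple[str, str, int]:
    """Split one qrels line into (topic_id, nct_id, relevance); the iteration column is ignored."""
    topic_id, _iteration, doc_id, relevance = line.split()
    return topic_id, doc_id, int(relevance)

print(parse_qrel_line("1 0 NCT00760162 2"))  # ('1', 'NCT00760162', 2)
```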

### 4.3 Loading Data with ir_datasets

```python
# evaluation/run_trec_benchmark.py
import ir_datasets


def load_trec_2021():
    """Load TREC CT 2021 topics and qrels via ir_datasets."""
    dataset = ir_datasets.load("clinicaltrials/2021/trec-ct-2021")

    # Load topics (GenericQuery: query_id, text)
    topics = {}
    for query in dataset.queries_iter():
        topics[query.query_id] = query.text

    # Load qrels (TrecQrel: query_id, doc_id, relevance, iteration)
    qrels = {}
    for qrel in dataset.qrels_iter():
        if qrel.query_id not in qrels:
            qrels[qrel.query_id] = {}
        qrels[qrel.query_id][qrel.doc_id] = qrel.relevance

    return topics, qrels


def load_trec_2022():
    """Load TREC CT 2022 topics and qrels (same 2021 document snapshot)."""
    dataset = ir_datasets.load("clinicaltrials/2021/trec-ct-2022")

    topics = {q.query_id: q.text for q in dataset.queries_iter()}
    qrels = {}
    for qrel in dataset.qrels_iter():
        if qrel.query_id not in qrels:
            qrels[qrel.query_id] = {}
        qrels[qrel.query_id][qrel.doc_id] = qrel.relevance

    return topics, qrels


def load_trial_documents():
    """Load the clinical trial documents from ir_datasets."""
    dataset = ir_datasets.load("clinicaltrials/2021")
    # ClinicalTrialsDoc: doc_id, title, condition, summary,
    # detailed_description, eligibility
    docs = {}
    for doc in dataset.docs_iter():
        docs[doc.doc_id] = {
            "title": doc.title,
            "condition": doc.condition,
            "summary": doc.summary,
            "detailed_description": doc.detailed_description,
            "eligibility": doc.eligibility,
        }
    return docs
```
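
The nested qrels dict these loaders return has the shape `{query_id: {doc_id: relevance}}`; the grouping step can be exercised on its own (standalone sketch):

```python
def group_qrels(triples: list[tuple[str, str, int]]) -> dict[str, dict[str, int]]:
    """Group (query_id, doc_id, relevance) triples into {query_id: {doc_id: relevance}}."""
    qrels: dict[str, dict[str, int]] = {}
    for qid, did, rel in triples:
        qrels.setdefault(qid, {})[did] = rel
    return qrels

qrels = group_qrels([
    ("1", "NCT00760162", 2),
    ("1", "NCT01234567", 1),
    ("2", "NCT09876543", 0),
])
print(qrels["1"])  # {'NCT00760162': 2, 'NCT01234567': 1}
```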

### 4.4 Mapping TrialPath Output to the TREC Run Format

```python
def convert_trialpath_to_trec_run(
    results: dict[str, list[dict]],
    run_name: str = "trialpath-v1",
) -> str:
    """Convert TrialPath matching results to TREC run format.

    Args:
        results: {topic_id: [{"nct_id": str, "score": float}, ...]}
        run_name: Run identifier

    Returns:
        TREC-format run string
    """
    lines = []
    for topic_id, candidates in results.items():
        sorted_candidates = sorted(candidates, key=lambda x: x["score"], reverse=True)
        for rank, candidate in enumerate(sorted_candidates[:1000], 1):
            lines.append(
                f"{topic_id} Q0 {candidate['nct_id']} {rank} "
                f"{candidate['score']:.6f} {run_name}"
            )
    return "\n".join(lines)


def save_trec_run(run_str: str, output_path: str):
    """Save a TREC run to file."""
    with open(output_path, "w") as f:
        f.write(run_str)
```
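
Each emitted line follows the six-column run format shown in section 4.2; the formatting step in isolation (helper name is illustrative):

```python
def format_run_line(topic_id: str, nct_id: str, rank: int, score: float,
                    run_name: str = "trialpath-v1") -> str:
    """Format one six-column TREC run line: TOPIC_NO Q0 NCT_ID RANK SCORE RUN_NAME."""
    return f"{topic_id} Q0 {nct_id} {rank} {score:.6f} {run_name}"

print(format_run_line("1", "NCT00760162", 1, 0.9999))
# 1 Q0 NCT00760162 1 0.999900 trialpath-v1
```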

### 4.5 Computing Evaluation Metrics with ir-measures

```python
# evaluation/run_trec_benchmark.py (continued)
import ir_measures
from ir_measures import nDCG, P, Recall, AP, RR


def evaluate_trec_run(
    qrels_path: str,
    run_path: str,
) -> dict:
    """Evaluate a TREC run using ir-measures.

    Target metrics:
    - Recall@50 >= 0.75
    - NDCG@10 >= 0.60
    - P@10 (informational)
    """
    qrels = list(ir_measures.read_trec_qrels(qrels_path))
    run = list(ir_measures.read_trec_run(run_path))

    # Define the target measures
    measures = [
        nDCG@10,     # Target >= 0.60
        Recall@50,   # Target >= 0.75
        P@10,        # Precision at 10
        AP,          # Mean Average Precision
        RR,          # Reciprocal Rank
        nDCG@20,     # Additional depth
        Recall@100,  # Extended recall
    ]

    # Aggregate metrics
    aggregate = ir_measures.calc_aggregate(measures, qrels, run)

    # Per-query metrics
    per_query = {}
    for metric in ir_measures.iter_calc(measures, qrels, run):
        qid = metric.query_id
        if qid not in per_query:
            per_query[qid] = {}
        per_query[qid][str(metric.measure)] = metric.value

    return {
        "aggregate": {str(k): v for k, v in aggregate.items()},
        "per_query": per_query,
        "pass_fail": {
            "ndcg@10": aggregate.get(nDCG@10, 0) >= 0.60,
            "recall@50": aggregate.get(Recall@50, 0) >= 0.75,
        },
    }


def evaluate_with_eligibility_levels(
    qrels_path: str,
    run_path: str,
) -> dict:
    """Evaluate with TREC CT graded relevance (0=NR, 1=Excluded, 2=Eligible).

    Uses rel=2 for strict eligible-only evaluation.
    """
    qrels = list(ir_measures.read_trec_qrels(qrels_path))
    run = list(ir_measures.read_trec_run(run_path))

    # Standard evaluation (relevance >= 1)
    standard_measures = [nDCG@10, Recall@50, P@10]
    standard = ir_measures.calc_aggregate(standard_measures, qrels, run)

    # Strict evaluation (only eligible = relevance 2)
    strict_measures = [
        AP(rel=2),
        P(rel=2)@10,
        Recall(rel=2)@50,
    ]
    strict = ir_measures.calc_aggregate(strict_measures, qrels, run)

    return {
        "standard": {str(k): v for k, v in standard.items()},
        "strict_eligible_only": {str(k): v for k, v in strict.items()},
    }
```

### 4.6 Alternative qrels/run Format Using Python Dicts with ir-measures

```python
def evaluate_from_dicts(
    qrels_dict: dict[str, dict[str, int]],
    run_dict: dict[str, list[tuple[str, float]]],
) -> dict:
    """Evaluate using Python dict format (no files needed).

    Args:
        qrels_dict: {query_id: {doc_id: relevance}}
        run_dict: {query_id: [(doc_id, score), ...]}
    """
    # Convert to ir-measures objects
    qrels = [
        ir_measures.Qrel(qid, did, rel)
        for qid, docs in qrels_dict.items()
        for did, rel in docs.items()
    ]
    run = [
        ir_measures.ScoredDoc(qid, did, score)
        for qid, docs in run_dict.items()
        for did, score in docs
    ]

    measures = [nDCG@10, Recall@50, P@10, AP]
    aggregate = ir_measures.calc_aggregate(measures, qrels, run)
    return {str(k): v for k, v in aggregate.items()}
```

---

## 5. MedGemma Extraction Evaluation

### 5.1 Annotated Dataset Design

```python
# evaluation/extraction_eval.py
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnnotatedField:
    """A single annotated field with ground truth and extraction result."""
    field_name: str               # e.g., "biomarkers.egfr"
    ground_truth: Optional[str]   # From Synthea profile (gold standard)
    extracted: Optional[str]      # From MedGemma extraction
    evidence_span: Optional[str]  # Text span in source document
    source_page: Optional[int]    # Page number in PDF


@dataclass
class ExtractionAnnotation:
    """Complete annotation for one patient's extraction."""
    patient_id: str
    fields: list[AnnotatedField]
    noise_level: str    # "clean", "mild", "moderate", "severe"
    document_type: str  # "clinical_letter", "pathology_report", etc.
```

**Annotated dataset structure:**
```json
{
  "patient_id": "synth-001",
  "noise_level": "mild",
  "document_type": "clinical_letter",
  "fields": [
    {
      "field_name": "demographics.name",
      "ground_truth": "John Smith",
      "extracted": "John Smith",
      "correct": true
    },
    {
      "field_name": "diagnosis.stage",
      "ground_truth": "Stage IIIA",
      "extracted": "Stage 3A",
      "correct": true,
      "note": "Equivalent representation"
    },
    {
      "field_name": "biomarkers.egfr",
      "ground_truth": "Exon 19 deletion",
      "extracted": "EGFR positive",
      "correct": false,
      "note": "Partial extraction - missing specific mutation"
    }
  ]
}
```

### 5.2 Field-Level F1 Computation

```python
# evaluation/extraction_eval.py
from sklearn.metrics import (
    f1_score, precision_score, recall_score,
    classification_report, confusion_matrix
)
import numpy as np


# All extractable fields
EXTRACTION_FIELDS = [
    "demographics.name",
    "demographics.sex",
    "demographics.date_of_birth",
    "demographics.age",
    "diagnosis.primary",
    "diagnosis.stage",
    "diagnosis.histology",
    "biomarkers.egfr",
    "biomarkers.alk",
    "biomarkers.pdl1_tps",
    "biomarkers.kras",
    "biomarkers.ros1",
    "labs.wbc",
    "labs.hemoglobin",
    "labs.platelets",
    "labs.creatinine",
    "labs.alt",
    "labs.ast",
    "treatments.current_regimen",
    "performance_status.ecog",
]


def compute_field_level_f1(
    annotations: list[dict],
) -> dict:
    """Compute field-level F1, precision, recall.

    For each field:
    - TP: ground_truth exists AND extracted matches
    - FP: extracted exists BUT ground_truth is None or mismatch
    - FN: ground_truth exists BUT extracted is None or mismatch

    Args:
        annotations: List of patient annotation dicts

    Returns:
        Per-field and aggregate metrics
    """
    field_metrics = {}

    for field_name in EXTRACTION_FIELDS:
        y_true = []  # 1 if field has a ground-truth value
        y_pred = []  # 1 if field was correctly extracted

        for ann in annotations:
            fields = {f["field_name"]: f for f in ann["fields"]}
            if field_name in fields:
                f = fields[field_name]
                has_gt = f["ground_truth"] is not None
                is_correct = f.get("correct", False)

                y_true.append(1 if has_gt else 0)
                y_pred.append(1 if is_correct else 0)

        if len(y_true) > 0:
            precision = precision_score(y_true, y_pred, zero_division=0)
            recall = recall_score(y_true, y_pred, zero_division=0)
            f1 = f1_score(y_true, y_pred, zero_division=0)
            field_metrics[field_name] = {
                "precision": round(precision, 4),
                "recall": round(recall, 4),
                "f1": round(f1, 4),
                "support": sum(y_true),
            }

    # Aggregate metrics
    all_y_true = []
    all_y_pred = []
    for ann in annotations:
        for f in ann["fields"]:
            has_gt = f["ground_truth"] is not None
            is_correct = f.get("correct", False)
            all_y_true.append(1 if has_gt else 0)
            all_y_pred.append(1 if is_correct else 0)

    micro_f1 = f1_score(all_y_true, all_y_pred, zero_division=0)
    macro_f1 = np.mean([m["f1"] for m in field_metrics.values()])

    return {
        "per_field": field_metrics,
        "micro_f1": round(micro_f1, 4),
        "macro_f1": round(macro_f1, 4),
        "total_fields": len(all_y_true),
        "pass": micro_f1 >= 0.85,  # Target: F1 >= 0.85
    }


def compute_extraction_report(annotations: list[dict]) -> str:
    """Generate a scikit-learn classification_report style output."""
    all_y_true = []
    all_y_pred = []

    for field_name in EXTRACTION_FIELDS:
        for ann in annotations:
            fields = {f["field_name"]: f for f in ann["fields"]}
            if field_name in fields:
                f = fields[field_name]
                has_gt = f["ground_truth"] is not None
                is_correct = f.get("correct", False)
                all_y_true.append(1 if has_gt else 0)
                all_y_pred.append(1 if is_correct else 0)

    return classification_report(
        all_y_true, all_y_pred,
        target_names=["absent", "present/correct"],
        digits=4,
    )


def compare_with_baseline(
    medgemma_annotations: list[dict],
    gemini_only_annotations: list[dict],
) -> dict:
    """Compare MedGemma extraction vs a Gemini-only baseline."""
    medgemma_metrics = compute_field_level_f1(medgemma_annotations)
    gemini_metrics = compute_field_level_f1(gemini_only_annotations)

    comparison = {}
    for field_name in EXTRACTION_FIELDS:
        mg = medgemma_metrics["per_field"].get(field_name, {})
        gm = gemini_metrics["per_field"].get(field_name, {})
        comparison[field_name] = {
            "medgemma_f1": mg.get("f1", 0),
            "gemini_f1": gm.get("f1", 0),
            "delta": round(mg.get("f1", 0) - gm.get("f1", 0), 4),
        }

    return {
        "per_field_comparison": comparison,
        "medgemma_overall_f1": medgemma_metrics["micro_f1"],
        "gemini_overall_f1": gemini_metrics["micro_f1"],
        "improvement": round(
            medgemma_metrics["micro_f1"] - gemini_metrics["micro_f1"], 4
        ),
    }
```
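
To make the TP/FP/FN convention above concrete, here is the micro-F1 arithmetic on a toy set of four field outcomes (standalone, stdlib only; `sklearn.metrics.f1_score` returns the same value for these inputs):

```python
# Toy outcomes for four fields of one patient:
# y_true = 1 when a ground-truth value exists, y_pred = 1 when extraction matched it.
y_true = [1, 1, 1, 0]  # three fields present, one genuinely absent
y_pred = [1, 1, 0, 0]  # two extracted correctly, one missed

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

f1 = 2 * tp / (2 * tp + fp + fn)  # 4 / 5
print(f"TP={tp} FP={fp} FN={fn} F1={f1}")  # TP=2 FP=0 FN=1 F1=0.8
```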

### 5.3 Impact of Noise Level on Extraction Performance

```python
def analyze_noise_impact(annotations: list[dict]) -> dict:
    """Analyze how noise level affects extraction F1."""
    by_noise = {}
    for ann in annotations:
        level = ann["noise_level"]
        if level not in by_noise:
            by_noise[level] = []
        by_noise[level].append(ann)

    results = {}
    for level, level_anns in by_noise.items():
        metrics = compute_field_level_f1(level_anns)
        results[level] = {
            "micro_f1": metrics["micro_f1"],
            "macro_f1": metrics["macro_f1"],
            "n_patients": len(level_anns),
        }

    return results
```

---

## 6. End-to-End Evaluation Pipeline

### 6.1 Criterion Decision Accuracy

```python
# evaluation/criterion_eval.py

def compute_criterion_accuracy(
    predictions: list[dict],
    ground_truth: list[dict],
) -> dict:
    """Compute criterion-level decision accuracy.

    Each prediction/ground_truth entry:
    {
        "patient_id": str,
        "trial_id": str,
        "criteria": [
            {"criterion_id": str, "decision": "met"|"not_met"|"unknown",
             "evidence": str}
        ]
    }

    Target: >= 0.85
    """
    total = 0
    correct = 0
    by_decision_type = {"met": {"tp": 0, "total": 0},
                        "not_met": {"tp": 0, "total": 0},
                        "unknown": {"tp": 0, "total": 0}}

    for pred, gt in zip(predictions, ground_truth):
        assert pred["patient_id"] == gt["patient_id"]
        assert pred["trial_id"] == gt["trial_id"]

        gt_map = {c["criterion_id"]: c["decision"] for c in gt["criteria"]}

        for criterion in pred["criteria"]:
            cid = criterion["criterion_id"]
            if cid in gt_map:
                total += 1
                gt_decision = gt_map[cid]
                pred_decision = criterion["decision"]
                by_decision_type[gt_decision]["total"] += 1
                if pred_decision == gt_decision:
                    correct += 1
                    by_decision_type[gt_decision]["tp"] += 1

    accuracy = correct / total if total > 0 else 0.0

    return {
        "overall_accuracy": round(accuracy, 4),
        "total_criteria": total,
        "correct": correct,
        "pass": accuracy >= 0.85,
        "by_decision_type": {
            k: {
                "accuracy": round(v["tp"] / v["total"], 4) if v["total"] > 0 else 0,
                "support": v["total"],
            }
            for k, v in by_decision_type.items()
        },
    }
```
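
The decision-matching loop reduces to comparing aligned criterion decisions. A toy (patient, trial) pair with illustrative criterion IDs:

```python
# Ground-truth vs. predicted decisions for four criteria of one (patient, trial) pair.
gt = {"c1": "met", "c2": "not_met", "c3": "unknown", "c4": "met"}
pred = {"c1": "met", "c2": "met", "c3": "unknown", "c4": "met"}

correct = sum(1 for cid, decision in gt.items() if pred[cid] == decision)
accuracy = correct / len(gt)

print(accuracy)          # 0.75
print(accuracy >= 0.85)  # False -> below the 0.85 target, this pair fails
```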

### 6.2 Latency Benchmarking

```python
# evaluation/latency_cost_tracker.py
import time
import json
from dataclasses import dataclass, field, asdict
from typing import Optional
from contextlib import contextmanager


@dataclass
class APICallRecord:
    """Record of a single API call."""
    service: str    # "medgemma", "gemini", "clinicaltrials_mcp"
    operation: str  # "extract", "search", "evaluate_criterion"
    latency_ms: float
    input_tokens: int = 0
    output_tokens: int = 0
    cost_usd: float = 0.0
    timestamp: str = ""


@dataclass
class SessionMetrics:
    """Aggregate metrics for a patient matching session."""
    patient_id: str
    total_latency_ms: float = 0.0
    total_cost_usd: float = 0.0
    api_calls: list[APICallRecord] = field(default_factory=list)

    @property
    def total_latency_s(self) -> float:
        return self.total_latency_ms / 1000.0

    @property
    def pass_latency(self) -> bool:
        """Target: < 15s per session."""
        return self.total_latency_s < 15.0

    @property
    def pass_cost(self) -> bool:
        """Target: < $0.50 per session."""
        return self.total_cost_usd < 0.50


class LatencyCostTracker:
    """Track latency and cost across API calls."""

    # Pricing per 1M tokens (approximate)
    PRICING = {
        "medgemma": {"input": 0.0, "output": 0.0},            # Self-hosted
        "gemini": {"input": 1.25, "output": 5.00},            # Gemini Pro
        "clinicaltrials_mcp": {"input": 0.0, "output": 0.0},  # Free API
    }

    def __init__(self):
        self.sessions: list[SessionMetrics] = []
        self._current_session: Optional[SessionMetrics] = None

    def start_session(self, patient_id: str):
        self._current_session = SessionMetrics(patient_id=patient_id)

    def end_session(self) -> SessionMetrics:
        session = self._current_session
        if session:
            session.total_latency_ms = sum(c.latency_ms for c in session.api_calls)
            session.total_cost_usd = sum(c.cost_usd for c in session.api_calls)
            self.sessions.append(session)
        self._current_session = None
        return session

    @contextmanager
    def track_call(self, service: str, operation: str):
        """Context manager to track an API call."""
        start = time.monotonic()
        record = APICallRecord(service=service, operation=operation, latency_ms=0)
        try:
            yield record
        finally:
            record.latency_ms = (time.monotonic() - start) * 1000
            # Compute cost
            pricing = self.PRICING.get(service, {"input": 0, "output": 0})
            record.cost_usd = (
                record.input_tokens * pricing["input"] / 1_000_000
                + record.output_tokens * pricing["output"] / 1_000_000
            )
            if self._current_session:
                self._current_session.api_calls.append(record)

    def summary(self) -> dict:
        """Generate an aggregate summary across all sessions."""
        if not self.sessions:
            return {}

        latencies = [s.total_latency_s for s in self.sessions]
        costs = [s.total_cost_usd for s in self.sessions]

        return {
            "n_sessions": len(self.sessions),
            "latency": {
                "mean_s": round(sum(latencies) / len(latencies), 2),
                "p50_s": round(sorted(latencies)[len(latencies) // 2], 2),
                "p95_s": round(sorted(latencies)[int(len(latencies) * 0.95)], 2),
                "max_s": round(max(latencies), 2),
                "pass_rate": round(
                    sum(1 for s in self.sessions if s.pass_latency) / len(self.sessions), 4
                ),
            },
            "cost": {
                "mean_usd": round(sum(costs) / len(costs), 4),
                "total_usd": round(sum(costs), 4),
                "max_usd": round(max(costs), 4),
                "pass_rate": round(
                    sum(1 for s in self.sessions if s.pass_cost) / len(self.sessions), 4
                ),
            },
            "targets": {
                "latency_pass": all(s.pass_latency for s in self.sessions),
                "cost_pass": all(s.pass_cost for s in self.sessions),
            },
        }
```
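
The per-call cost computed in `track_call` is just tokens times price-per-million. A standalone sketch using the Gemini prices assumed in `PRICING`:

```python
def call_cost_usd(input_tokens: int, output_tokens: int,
                  price_in: float = 1.25, price_out: float = 5.00) -> float:
    """Cost of one call at USD-per-1M-token prices (defaults: the 'gemini' row above)."""
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# 20k prompt tokens + 2k completion tokens: $0.025 + $0.010 = $0.035,
# comfortably under the $0.50-per-session budget.
print(call_cost_usd(20_000, 2_000))  # 0.035
```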

---

## 7. TDD Test Cases

### 7.1 Synthea Data Validation Tests

```python
# tests/test_synthea_data.py
import pytest
import json
from pathlib import Path

# parse_fhir_bundle (FHIR Bundle -> PatientProfile) is provided by the TrialPath
# services package; the exact import path depends on the implementation.

# Expected FHIR resource types
REQUIRED_RESOURCE_TYPES = {"Patient", "Condition", "Observation", "Encounter"}


class TestSyntheaDataValidation:
    """Validate Synthea FHIR output for TrialPath requirements."""

    def test_fhir_bundle_is_valid_json(self, fhir_file):
        """Bundle must be valid JSON."""
        with open(fhir_file) as f:
            data = json.load(f)
        assert data["resourceType"] == "Bundle"
        assert "entry" in data

    def test_bundle_contains_required_resources(self, fhir_file):
        """Bundle must contain Patient, Condition, Observation, Encounter."""
        with open(fhir_file) as f:
            bundle = json.load(f)
        resource_types = {
            e["resource"]["resourceType"] for e in bundle["entry"]
        }
        for rt in REQUIRED_RESOURCE_TYPES:
            assert rt in resource_types, f"Missing {rt} resource"

    def test_patient_has_demographics(self, fhir_file):
        """Patient resource must have name, gender, birthDate."""
        with open(fhir_file) as f:
            bundle = json.load(f)
        patients = [
            e["resource"] for e in bundle["entry"]
            if e["resource"]["resourceType"] == "Patient"
        ]
        assert len(patients) == 1
        patient = patients[0]
        assert "name" in patient
        assert "gender" in patient
        assert "birthDate" in patient

    def test_lung_cancer_condition_present(self, fhir_file):
        """At least one Condition must be NSCLC or lung cancer."""
        with open(fhir_file) as f:
            bundle = json.load(f)
        conditions = [
            e["resource"] for e in bundle["entry"]
            if e["resource"]["resourceType"] == "Condition"
        ]
        lung_cancer_codes = {"254637007", "254632001", "162573006"}  # SNOMED CT
        has_lung_cancer = False
        for cond in conditions:
            codings = cond.get("code", {}).get("coding", [])
            for c in codings:
                if c.get("code") in lung_cancer_codes:
                    has_lung_cancer = True
        assert has_lung_cancer, "No lung cancer Condition found"

    def test_patient_profile_conversion(self, fhir_file):
        """FHIR Bundle must convert to a valid PatientProfile."""
        profile = parse_fhir_bundle(Path(fhir_file))
        assert profile.patient_id != ""
        assert profile.demographics.name != ""
        assert profile.demographics.sex in ("male", "female")
        assert profile.diagnosis.primary != ""

    def test_batch_generation_produces_500_patients(self, output_dir):
        """Batch generation must produce at least 500 FHIR files."""
        fhir_files = list(Path(output_dir).glob("*.json"))
        assert len(fhir_files) >= 500

    def test_nsclc_ratio(self, all_profiles):
        """~85% of lung cancer patients should be NSCLC."""
        nsclc_count = sum(
            1 for p in all_profiles
            if "non-small cell" in p.diagnosis.primary.lower()
            or "nsclc" in p.diagnosis.primary.lower()
        )
        ratio = nsclc_count / len(all_profiles)
        assert 0.70 <= ratio <= 0.95, f"NSCLC ratio {ratio} outside expected range"
```
|
| 1726 |
+
|
| 1727 |
+
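The fixtures above lean on the same Bundle traversal that `parse_fhir_bundle` has to perform: walk `entry`, filter by `resourceType`, then read fields off the matched resource. A minimal stdlib sketch of the demographics step (the helper name `extract_demographics` is illustrative, not the project's API):

```python
def extract_demographics(bundle: dict) -> dict:
    """Pull name, gender, and birthDate from the single Patient resource in a FHIR Bundle."""
    patients = [
        e["resource"] for e in bundle.get("entry", [])
        if e["resource"]["resourceType"] == "Patient"
    ]
    if len(patients) != 1:
        raise ValueError(f"expected exactly one Patient resource, found {len(patients)}")
    patient = patients[0]
    # FHIR HumanName: "given" is a list of strings, "family" a single string
    name = patient["name"][0]
    full_name = " ".join(name.get("given", []) + [name.get("family", "")]).strip()
    return {
        "name": full_name,
        "sex": patient.get("gender"),
        "date_of_birth": patient.get("birthDate"),
    }
```

Synthea appends numeric suffixes to generated names (e.g. `Jane629`), so the real parser may also want to strip trailing digits before rendering names into documents.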
### 7.2 PDF Generation Correctness Tests

```python
# tests/test_pdf_generation.py
import pdfplumber
import pytest

from data.templates.clinical_letter import generate_clinical_letter
from data.templates.lab_report import generate_lab_report
from data.templates.pathology_report import generate_pathology_report


def pdf_text(path: str) -> str:
    """Extract all text from a PDF for content assertions."""
    with pdfplumber.open(path) as pdf:
        return "".join(page.extract_text() or "" for page in pdf.pages)


class TestPDFGeneration:
    """Test that PDF generation produces valid documents."""

    SAMPLE_PROFILE = {
        "patient_id": "test-001",
        "demographics": {
            "name": "Jane Doe",
            "sex": "female",
            "date_of_birth": "1960-05-15",
        },
        "diagnosis": {
            "primary": "Non-small cell lung cancer, adenocarcinoma",
            "stage": "Stage IIIA",
            "histology": "adenocarcinoma",
            "diagnosis_date": "2024-01-15",
        },
        "biomarkers": {
            "egfr": "Exon 19 deletion",
            "alk": "Negative",
            "pdl1_tps": "60%",
            "kras": None,
        },
        "labs": [
            {"name": "WBC", "value": 7.2, "unit": "10*3/uL", "date": "2024-01-10", "loinc_code": "6690-2"},
            {"name": "Hemoglobin", "value": 12.5, "unit": "g/dL", "date": "2024-01-10", "loinc_code": "718-7"},
        ],
        "treatments": [
            {"name": "Cisplatin", "type": "medication", "start_date": "2024-02-01"},
        ],
    }

    def test_clinical_letter_generates_pdf(self, tmp_path):
        """Clinical letter must generate a non-empty PDF file."""
        output = tmp_path / "letter.pdf"
        generate_clinical_letter(self.SAMPLE_PROFILE, str(output))
        assert output.exists()
        assert output.stat().st_size > 0

    def test_pathology_report_generates_pdf(self, tmp_path):
        """Pathology report must generate a non-empty PDF file."""
        output = tmp_path / "pathology.pdf"
        generate_pathology_report(self.SAMPLE_PROFILE, str(output))
        assert output.exists()
        assert output.stat().st_size > 0

    def test_lab_report_generates_pdf(self, tmp_path):
        """Lab report must generate a non-empty PDF file."""
        output = tmp_path / "lab.pdf"
        generate_lab_report(self.SAMPLE_PROFILE, str(output))
        assert output.exists()
        assert output.stat().st_size > 0

    def test_pdf_contains_patient_name(self, tmp_path):
        """Generated PDF must contain the patient name (OCR-verifiable)."""
        output = tmp_path / "letter.pdf"
        generate_clinical_letter(self.SAMPLE_PROFILE, str(output))
        assert "Jane Doe" in pdf_text(str(output))

    def test_pdf_contains_biomarkers(self, tmp_path):
        """Generated PDF must contain biomarker results."""
        output = tmp_path / "pathology.pdf"
        generate_pathology_report(self.SAMPLE_PROFILE, str(output))
        text = pdf_text(str(output))
        assert "EGFR" in text
        assert "Exon 19" in text or "positive" in text.lower()

    def test_missing_biomarker_handled_gracefully(self, tmp_path):
        """PDF generation should not crash when biomarkers are None."""
        profile = self.SAMPLE_PROFILE.copy()
        profile["biomarkers"] = {
            "egfr": None, "alk": None, "pdl1_tps": None, "kras": None
        }
        output = tmp_path / "letter.pdf"
        generate_clinical_letter(profile, str(output))
        assert output.exists()
```

### 7.3 Noise Injection Validation Tests

```python
# tests/test_noise_injection.py
import pytest

from data.noise.noise_injector import NoiseInjector


class TestNoiseInjection:
    """Test that noise injection produces expected results."""

    def test_clean_noise_no_changes(self):
        """Clean level should produce no changes."""
        injector = NoiseInjector(noise_level="clean", seed=42)
        text = "Patient has EGFR mutation positive"
        noisy, records = injector.inject_text_noise(text)
        assert noisy == text
        assert len(records) == 0

    def test_mild_noise_produces_some_changes(self):
        """Mild noise should produce some but limited changes."""
        injector = NoiseInjector(noise_level="mild", seed=42)
        # Use longer text to increase the chance of noise
        text = "The patient is a 65 year old male with stage IIIA " * 10
        noisy, records = injector.inject_text_noise(text)
        # May or may not produce changes depending on the seed; must not crash
        assert len(records) >= 0

    def test_severe_noise_produces_many_changes(self):
        """Severe noise should produce noticeable changes."""
        injector = NoiseInjector(noise_level="severe", seed=42)
        text = "The 50 year old patient has stage 1 NSCLC " * 20
        noisy, records = injector.inject_text_noise(text)
        assert noisy != text  # Should differ from the original
        assert len(records) > 0

    def test_ocr_error_types_are_valid(self):
        """OCR errors should only substitute known character pairs."""
        injector = NoiseInjector(noise_level="severe", seed=42)
        text = "0123456789 OIBS" * 10
        _, records = injector.inject_text_noise(text)
        for r in records:
            if r["type"] == "ocr_error":
                assert r["original"] in NoiseInjector.OCR_ERROR_MAP
                assert r["replacement"] in NoiseInjector.OCR_ERROR_MAP[r["original"]]

    def test_missing_value_injection(self):
        """Missing value injection should remove some fields."""
        injector = NoiseInjector(noise_level="moderate", seed=42)
        profile = {
            "biomarkers": {"egfr": "positive", "alk": "negative",
                           "pdl1_tps": "60%", "kras": "negative", "ros1": "negative"},
            "diagnosis": {"stage": "IIIA", "histology": "adenocarcinoma"},
        }
        modified, removed = injector.inject_missing_values(profile)
        # At a 10% rate with 7 fields, expect roughly 0-3 removals
        assert len(removed) <= 7
        for field_path in removed:
            section, field_name = field_path.split(".")
            assert modified[section][field_name] is None

    def test_noise_is_deterministic_with_seed(self):
        """Same seed should produce identical results."""
        text = "Patient has stage IIIA non-small cell lung cancer"
        inj1 = NoiseInjector(noise_level="moderate", seed=123)
        inj2 = NoiseInjector(noise_level="moderate", seed=123)
        noisy1, _ = inj1.inject_text_noise(text)
        noisy2, _ = inj2.inject_text_noise(text)
        assert noisy1 == noisy2

    def test_different_seeds_produce_different_results(self):
        """Different seeds should generally produce different noise."""
        text = "The 50 year old patient has 10 biomarker tests 0 1 5 8" * 20
        inj1 = NoiseInjector(noise_level="severe", seed=1)
        inj2 = NoiseInjector(noise_level="severe", seed=999)
        noisy1, _ = inj1.inject_text_noise(text)
        noisy2, _ = inj2.inject_text_noise(text)
        # With severe noise on long text, different seeds should differ
        assert noisy1 != noisy2
```

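The seeded, map-driven OCR substitution these tests rely on can be sketched in a few lines. The `OCR_ERROR_MAP` contents and the function name below are illustrative only; the project's `NoiseInjector` defines its own map and record schema.

```python
import random

# Illustrative OCR confusion pairs; the real NoiseInjector map may differ.
OCR_ERROR_MAP = {
    "0": ["O"], "1": ["l", "I"], "5": ["S"], "8": ["B"],
    "O": ["0"], "S": ["5"], "B": ["8"],
}


def inject_ocr_noise(text: str, rate: float, seed: int):
    """Replace characters with OCR look-alikes at the given rate, deterministically.

    Returns the noisy text plus a record per substitution, mirroring the
    (noisy, records) shape the tests above expect.
    """
    rng = random.Random(seed)  # seeded RNG makes the noise reproducible
    out, records = [], []
    for i, ch in enumerate(text):
        if ch in OCR_ERROR_MAP and rng.random() < rate:
            repl = rng.choice(OCR_ERROR_MAP[ch])
            records.append({"type": "ocr_error", "position": i,
                            "original": ch, "replacement": repl})
            out.append(repl)
        else:
            out.append(ch)
    return "".join(out), records
```

Because every random draw goes through one `random.Random(seed)` instance, the same seed reproduces the same corruption, which is what makes ground-truth annotation of noisy documents feasible.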
### 7.4 TREC Evaluation Metric Tests

```python
# tests/test_trec_evaluation.py
import pytest

import ir_measures
from ir_measures import AP, P, Recall, nDCG

# Adjust this import to wherever the run-format helper lives in the project
from evaluation.trec_eval import convert_trialpath_to_trec_run


class TestTRECEvaluation:
    """Test TREC evaluation metric computation."""

    @pytest.fixture
    def sample_qrels(self):
        """Sample qrels with known ground truth."""
        return [
            ir_measures.Qrel("q1", "d1", 2),  # eligible
            ir_measures.Qrel("q1", "d2", 1),  # excluded
            ir_measures.Qrel("q1", "d3", 0),  # not relevant
            ir_measures.Qrel("q1", "d4", 2),  # eligible
            ir_measures.Qrel("q1", "d5", 0),  # not relevant
        ]

    @pytest.fixture
    def perfect_run(self):
        """Run that ranks all relevant docs at the top."""
        return [
            ir_measures.ScoredDoc("q1", "d1", 1.0),
            ir_measures.ScoredDoc("q1", "d4", 0.9),
            ir_measures.ScoredDoc("q1", "d2", 0.8),
            ir_measures.ScoredDoc("q1", "d3", 0.1),
            ir_measures.ScoredDoc("q1", "d5", 0.05),
        ]

    @pytest.fixture
    def worst_run(self):
        """Run that ranks relevant docs at the bottom."""
        return [
            ir_measures.ScoredDoc("q1", "d3", 1.0),
            ir_measures.ScoredDoc("q1", "d5", 0.9),
            ir_measures.ScoredDoc("q1", "d2", 0.5),
            ir_measures.ScoredDoc("q1", "d4", 0.2),
            ir_measures.ScoredDoc("q1", "d1", 0.1),
        ]

    def test_perfect_ndcg_at_10(self, sample_qrels, perfect_run):
        """Perfect ranking should yield NDCG@10 = 1.0."""
        result = ir_measures.calc_aggregate([nDCG@10], sample_qrels, perfect_run)
        assert result[nDCG@10] == pytest.approx(1.0, abs=0.01)

    def test_worst_ndcg_lower(self, sample_qrels, perfect_run, worst_run):
        """Worst ranking should yield lower NDCG than perfect."""
        perfect = ir_measures.calc_aggregate([nDCG@10], sample_qrels, perfect_run)
        worst = ir_measures.calc_aggregate([nDCG@10], sample_qrels, worst_run)
        assert worst[nDCG@10] < perfect[nDCG@10]

    def test_recall_at_50_perfect(self, sample_qrels, perfect_run):
        """Perfect run should retrieve all relevant docs."""
        result = ir_measures.calc_aggregate([Recall@50], sample_qrels, perfect_run)
        assert result[Recall@50] == pytest.approx(1.0, abs=0.01)

    def test_empty_run_yields_zero(self, sample_qrels):
        """Empty run should yield 0 for all metrics."""
        empty_run = []
        result = ir_measures.calc_aggregate(
            [nDCG@10, Recall@50, P@10], sample_qrels, empty_run
        )
        assert result[nDCG@10] == 0.0
        assert result[Recall@50] == 0.0
        assert result[P@10] == 0.0

    def test_per_query_results(self, sample_qrels, perfect_run):
        """Per-query results should return one entry per query."""
        results = list(ir_measures.iter_calc(
            [nDCG@10], sample_qrels, perfect_run
        ))
        assert len(results) == 1  # Only q1
        assert results[0].query_id == "q1"

    def test_trec_run_format_conversion(self):
        """Test TrialPath results to TREC format conversion."""
        results = {
            "1": [
                {"nct_id": "NCT001", "score": 0.95},
                {"nct_id": "NCT002", "score": 0.80},
            ]
        }
        run_str = convert_trialpath_to_trec_run(results, "test-run")
        lines = run_str.strip().split("\n")
        assert len(lines) == 2
        assert "NCT001" in lines[0]
        assert "1" == lines[0].split()[3]  # rank 1
        assert "2" == lines[1].split()[3]  # rank 2

    def test_graded_relevance_evaluation(self, sample_qrels, perfect_run):
        """Test strict eligible-only evaluation (rel=2)."""
        strict = ir_measures.calc_aggregate(
            [AP(rel=2)], sample_qrels, perfect_run
        )
        assert strict[AP(rel=2)] > 0.0

    def test_qrels_dict_format(self):
        """Test evaluation from dict format."""
        qrels = {"q1": {"d1": 2, "d2": 1, "d3": 0}}
        run = [
            ir_measures.ScoredDoc("q1", "d1", 1.0),
            ir_measures.ScoredDoc("q1", "d2", 0.5),
            ir_measures.ScoredDoc("q1", "d3", 0.1),
        ]
        result = ir_measures.calc_aggregate([nDCG@10], qrels, run)
        assert nDCG@10 in result
```

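The `convert_trialpath_to_trec_run` helper exercised above is not shown in this guide. A minimal sketch consistent with the six-column TREC run format the test checks (query id, the literal `Q0`, document id, rank, score, run name) might look like:

```python
def convert_trialpath_to_trec_run(results: dict, run_name: str) -> str:
    """Convert {query_id: [{"nct_id": ..., "score": ...}, ...]} to TREC run format.

    Each output line is: <query_id> Q0 <doc_id> <rank> <score> <run_name>,
    with documents ranked by descending score within each query.
    """
    lines = []
    for qid, docs in results.items():
        ranked = sorted(docs, key=lambda d: d["score"], reverse=True)
        for rank, doc in enumerate(ranked, start=1):
            lines.append(f"{qid} Q0 {doc['nct_id']} {rank} {doc['score']} {run_name}")
    return "\n".join(lines)
```

A run string built this way can be fed to `ir_measures.read_trec_run` after writing it to disk, or the `ScoredDoc` objects can be constructed directly and passed to `calc_aggregate` without the intermediate file.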
### 7.5 F1 Computation Tests

```python
# tests/test_extraction_f1.py
import pytest

from evaluation.extraction_eval import compute_field_level_f1


class TestExtractionF1:
    """Test F1 computation for field-level extraction."""

    def test_perfect_extraction(self):
        """All fields correctly extracted should yield F1 = 1.0."""
        annotations = [{
            "patient_id": "p1",
            "noise_level": "clean",
            "document_type": "clinical_letter",
            "fields": [
                {"field_name": "demographics.name", "ground_truth": "John", "extracted": "John", "correct": True},
                {"field_name": "demographics.sex", "ground_truth": "male", "extracted": "male", "correct": True},
                {"field_name": "diagnosis.primary", "ground_truth": "NSCLC", "extracted": "NSCLC", "correct": True},
                {"field_name": "biomarkers.egfr", "ground_truth": "positive", "extracted": "positive", "correct": True},
            ]
        }]
        result = compute_field_level_f1(annotations)
        assert result["micro_f1"] == 1.0
        assert result["pass"] is True

    def test_zero_extraction(self):
        """No correct extractions should yield F1 = 0."""
        annotations = [{
            "patient_id": "p1",
            "noise_level": "clean",
            "document_type": "clinical_letter",
            "fields": [
                {"field_name": "demographics.name", "ground_truth": "John", "extracted": "Jane", "correct": False},
                {"field_name": "diagnosis.primary", "ground_truth": "NSCLC", "extracted": None, "correct": False},
            ]
        }]
        result = compute_field_level_f1(annotations)
        assert result["micro_f1"] == 0.0
        assert result["pass"] is False

    def test_partial_extraction(self):
        """Partial extraction should yield 0 < F1 < 1."""
        annotations = [{
            "patient_id": "p1",
            "noise_level": "mild",
            "document_type": "clinical_letter",
            "fields": [
                {"field_name": "demographics.name", "ground_truth": "John", "extracted": "John", "correct": True},
                {"field_name": "diagnosis.primary", "ground_truth": "NSCLC", "extracted": "lung ca", "correct": False},
                {"field_name": "biomarkers.egfr", "ground_truth": "positive", "extracted": "positive", "correct": True},
                {"field_name": "biomarkers.alk", "ground_truth": "negative", "extracted": None, "correct": False},
            ]
        }]
        result = compute_field_level_f1(annotations)
        assert 0.0 < result["micro_f1"] < 1.0

    def test_f1_threshold_boundary(self):
        """F1 at or above the 0.85 threshold should pass."""
        # 85 correct extractions and 15 misses out of 100 fields
        fields = []
        for i in range(85):
            fields.append({"field_name": f"field_{i}", "ground_truth": "val", "extracted": "val", "correct": True})
        for i in range(15):
            fields.append({"field_name": f"field_miss_{i}", "ground_truth": "val", "extracted": None, "correct": False})

        annotations = [{"patient_id": "p1", "noise_level": "clean",
                        "document_type": "test", "fields": fields}]
        result = compute_field_level_f1(annotations)
        # With 85/100 correct and no false positives, F1 clears the threshold
        assert result["pass"] is True

    def test_empty_annotations(self):
        """Empty annotations should not crash."""
        result = compute_field_level_f1([])
        assert result["micro_f1"] == 0.0

    def test_none_ground_truth_not_counted(self):
        """Fields with None ground truth should be handled."""
        annotations = [{
            "patient_id": "p1",
            "noise_level": "clean",
            "document_type": "test",
            "fields": [
                {"field_name": "biomarkers.ros1", "ground_truth": None,
                 "extracted": None, "correct": False},
            ]
        }]
        result = compute_field_level_f1(annotations)
        # Should not crash, though metrics may be 0
        assert "micro_f1" in result
```

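`compute_field_level_f1` itself is defined elsewhere in the project. One possible implementation consistent with the tests above is sketched here; the exact counting rules (what counts as a false positive versus a miss) are an assumption, labeled in the docstring:

```python
def compute_field_level_f1(annotations: list, threshold: float = 0.85) -> dict:
    """Micro-averaged F1 over annotated fields across all documents.

    Assumed counting rules: a correct extraction is a true positive; an
    incorrect non-null extraction counts as both a false positive and a
    false negative; a missed field (extracted is None while ground truth
    exists) is a false negative only; fields with None ground truth are
    not scored.
    """
    tp = fp = fn = 0
    for doc in annotations:
        for field in doc["fields"]:
            if field["ground_truth"] is None:
                continue  # field absent from ground truth: not scored
            if field["correct"]:
                tp += 1
            else:
                fn += 1
                if field["extracted"] is not None:
                    fp += 1
    denom = 2 * tp + fp + fn
    micro_f1 = (2 * tp / denom) if denom else 0.0
    return {"micro_f1": micro_f1, "pass": micro_f1 >= threshold}
```

Under these rules the 85-correct/15-missed boundary case gives precision 1.0 and recall 0.85, so micro-F1 is about 0.92 rather than exactly 0.85, which is why the boundary test asserts only that the threshold is cleared.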
### 7.6 End-to-End Pipeline Tests

```python
# tests/test_e2e_pipeline.py
import pytest
from pathlib import Path


class TestE2EPipeline:
    """End-to-end tests for the complete data & evaluation pipeline."""

    def test_fhir_to_profile_to_pdf_roundtrip(self, sample_fhir_file, tmp_path):
        """FHIR → PatientProfile → PDF should complete without error."""
        from data.generate_synthetic_patients import parse_fhir_bundle
        from data.templates.clinical_letter import generate_clinical_letter
        from dataclasses import asdict

        # Step 1: Parse FHIR
        profile = parse_fhir_bundle(Path(sample_fhir_file))
        assert profile.patient_id != ""

        # Step 2: Generate PDF
        pdf_path = tmp_path / "test_roundtrip.pdf"
        generate_clinical_letter(asdict(profile), str(pdf_path))
        assert pdf_path.exists()
        assert pdf_path.stat().st_size > 1000  # Reasonable PDF size

    def test_noisy_pdf_pipeline(self, sample_profile, tmp_path):
        """Profile → noisy PDF should inject noise and produce a valid PDF."""
        from data.templates.clinical_letter import generate_clinical_letter
        from data.noise.noise_injector import NoiseInjector

        injector = NoiseInjector(noise_level="moderate", seed=42)

        # Inject text noise into profile fields before PDF rendering
        profile = sample_profile.copy()
        dx_text = profile["diagnosis"]["primary"]
        noisy_dx, records = injector.inject_text_noise(dx_text)
        profile["diagnosis"]["primary"] = noisy_dx

        pdf_path = tmp_path / "noisy.pdf"
        generate_clinical_letter(profile, str(pdf_path))
        assert pdf_path.exists()

    def test_trec_evaluation_pipeline(self, tmp_path):
        """Complete TREC evaluation from dicts should produce metrics."""
        import ir_measures
        from ir_measures import nDCG, Recall, P

        qrels = [
            ir_measures.Qrel("1", "NCT001", 2),
            ir_measures.Qrel("1", "NCT002", 1),
            ir_measures.Qrel("1", "NCT003", 0),
        ]
        run = [
            ir_measures.ScoredDoc("1", "NCT001", 0.9),
            ir_measures.ScoredDoc("1", "NCT002", 0.5),
            ir_measures.ScoredDoc("1", "NCT003", 0.1),
        ]

        result = ir_measures.calc_aggregate(
            [nDCG@10, Recall@50, P@10], qrels, run
        )
        assert nDCG@10 in result
        assert Recall@50 in result
        assert result[nDCG@10] > 0

    def test_latency_tracker_integration(self):
        """Latency tracker should record and summarize calls."""
        import time
        from evaluation.latency_cost_tracker import LatencyCostTracker

        tracker = LatencyCostTracker()
        tracker.start_session("test-patient")

        with tracker.track_call("gemini", "search_anchors") as record:
            time.sleep(0.01)  # Simulate an API call
            record.input_tokens = 500
            record.output_tokens = 200

        session = tracker.end_session()
        assert session.total_latency_ms > 0
        assert len(session.api_calls) == 1

        summary = tracker.summary()
        assert summary["n_sessions"] == 1
        assert summary["latency"]["mean_s"] > 0
```

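The `LatencyCostTracker` interface the last test exercises (session lifecycle plus a context manager that times each call) can be satisfied by a small sketch like the one below. Class and field names follow the test; everything else is an assumption, not the project's actual implementation:

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class CallRecord:
    provider: str
    operation: str
    latency_ms: float = 0.0
    input_tokens: int = 0
    output_tokens: int = 0


@dataclass
class Session:
    patient_id: str
    api_calls: list = field(default_factory=list)

    @property
    def total_latency_ms(self) -> float:
        return sum(c.latency_ms for c in self.api_calls)


class LatencyCostTracker:
    """Record per-call latency and token counts, grouped into sessions."""

    def __init__(self):
        self.sessions = []
        self._current = None

    def start_session(self, patient_id: str) -> None:
        self._current = Session(patient_id)

    @contextmanager
    def track_call(self, provider: str, operation: str):
        record = CallRecord(provider, operation)
        start = time.perf_counter()
        try:
            yield record  # caller fills in token counts
        finally:
            record.latency_ms = (time.perf_counter() - start) * 1000
            self._current.api_calls.append(record)

    def end_session(self) -> Session:
        session, self._current = self._current, None
        self.sessions.append(session)
        return session

    def summary(self) -> dict:
        latencies_s = [s.total_latency_ms / 1000 for s in self.sessions]
        mean_s = sum(latencies_s) / len(latencies_s) if latencies_s else 0.0
        return {"n_sessions": len(self.sessions), "latency": {"mean_s": mean_s}}
```

Cost estimation would extend `summary()` with per-provider token pricing; the tests above only require latency and call counts, so pricing is left out of the sketch.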
---

## 8. Appendix

### 8.1 Data Format Specifications

#### PatientProfile v1 JSON Schema
```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "required": ["patient_id", "demographics", "diagnosis"],
  "properties": {
    "patient_id": {"type": "string"},
    "demographics": {
      "type": "object",
      "properties": {
        "name": {"type": "string"},
        "sex": {"type": "string", "enum": ["male", "female"]},
        "date_of_birth": {"type": "string", "format": "date"},
        "age": {"type": "integer"},
        "state": {"type": "string"}
      }
    },
    "diagnosis": {
      "type": "object",
      "properties": {
        "primary": {"type": "string"},
        "stage": {"type": ["string", "null"]},
        "histology": {"type": ["string", "null"]},
        "diagnosis_date": {"type": "string", "format": "date"}
      }
    },
    "biomarkers": {
      "type": "object",
      "properties": {
        "egfr": {"type": ["string", "null"]},
        "alk": {"type": ["string", "null"]},
        "pdl1_tps": {"type": ["string", "null"]},
        "kras": {"type": ["string", "null"]},
        "ros1": {"type": ["string", "null"]}
      }
    },
    "labs": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": {"type": "string"},
          "value": {"type": "number"},
          "unit": {"type": "string"},
          "date": {"type": "string"},
          "loinc_code": {"type": "string"}
        }
      }
    },
    "treatments": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": {"type": "string"},
          "type": {"type": "string", "enum": ["medication", "procedure", "radiation"]},
          "start_date": {"type": "string"},
          "end_date": {"type": ["string", "null"]}
        }
      }
    },
    "unknowns": {"type": "array", "items": {"type": "string"}},
    "evidence_spans": {"type": "array"}
  }
}
```

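For full validation against this schema the `jsonschema` package's `validate(instance, schema)` is the standard route. A dependency-free spot check of the required keys and the `sex` enum, useful in generation scripts, can be sketched as (the helper name is illustrative):

```python
REQUIRED_TOP_LEVEL = ("patient_id", "demographics", "diagnosis")


def spot_check_profile(profile: dict) -> list:
    """Return a list of human-readable schema violations (empty list = looks valid)."""
    errors = []
    for key in REQUIRED_TOP_LEVEL:
        if key not in profile:
            errors.append(f"missing required key: {key}")
    sex = profile.get("demographics", {}).get("sex")
    if sex is not None and sex not in ("male", "female"):
        errors.append(f"demographics.sex not in enum: {sex!r}")
    if not isinstance(profile.get("labs", []), list):
        errors.append("labs must be an array")
    return errors
```

This catches the failures that most often break downstream PDF templates (missing sections, out-of-enum values) without pulling in a validator dependency; it is not a substitute for JSON Schema validation of the nested structures.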
### 8.2 Tool API Reference

#### ir_datasets

| API | Description | Return type |
|-----|-------------|-------------|
| `ir_datasets.load("clinicaltrials/2021/trec-ct-2021")` | Load the TREC CT 2021 dataset | Dataset |
| `dataset.queries_iter()` | Iterate over topics | GenericQuery(query_id, text) |
| `dataset.qrels_iter()` | Iterate over qrels | TrecQrel(query_id, doc_id, relevance, iteration) |
| `dataset.docs_iter()` | Iterate over documents | ClinicalTrialsDoc(doc_id, title, condition, summary, detailed_description, eligibility) |

**Dataset IDs:**
- `clinicaltrials/2021/trec-ct-2021` — 75 queries, 35,832 qrels
- `clinicaltrials/2021/trec-ct-2022` — 50 queries
- `clinicaltrials/2021` — 376K documents (base collection)

#### ir-measures

| API | Description |
|-----|-------------|
| `ir_measures.calc_aggregate(measures, qrels, run)` | Compute aggregate metrics |
| `ir_measures.iter_calc(measures, qrels, run)` | Iterate per-query metrics |
| `ir_measures.read_trec_qrels(path)` | Read a TREC qrels file |
| `ir_measures.read_trec_run(path)` | Read a TREC run file |
| `ir_measures.Qrel(qid, did, rel)` | Create a qrel record |
| `ir_measures.ScoredDoc(qid, did, score)` | Create a scored-document record |

**Measure objects:**
- `nDCG@10` — Normalized DCG at cutoff 10
- `Recall@50` — Recall at cutoff 50
- `P@10` — Precision at cutoff 10
- `AP` — Average Precision
- `AP(rel=2)` — AP with minimum relevance 2
- `RR` — Reciprocal Rank

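For intuition about what `nDCG@10` measures, here is one common formulation (exponential gain 2^rel − 1 with a log2 rank discount) in a few lines; `ir_measures`' exact gain function may differ, so this is illustrative only and real evaluation should go through `ir_measures`:

```python
import math


def dcg(relevances, k):
    """Discounted cumulative gain with exponential gain (2^rel - 1)."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg(relevances, k=10):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0
```

The "perfect run" from section 7.4 puts relevances in the order [2, 2, 1, 0, 0], which is already the ideal ordering, so its nDCG is exactly 1.0 regardless of the gain function chosen.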
#### scikit-learn Evaluation

| API | Description |
|-----|-------------|
| `f1_score(y_true, y_pred, average=None)` | Per-class F1 |
| `f1_score(y_true, y_pred, average='micro')` | Global micro F1 |
| `f1_score(y_true, y_pred, average='macro')` | Unweighted mean of per-class F1 |
| `precision_score(y_true, y_pred)` | Precision |
| `recall_score(y_true, y_pred)` | Recall |
| `classification_report(y_true, y_pred)` | Full classification report |
| `confusion_matrix(y_true, y_pred)` | Confusion matrix |

#### Synthea CLI

| Parameter | Description | Example |
|-----------|-------------|---------|
| `-p N` | Generate N patients | `-p 500` |
| `-s SEED` | Random seed | `-s 42` |
| `-m MODULE` | Select a disease module | `-m lung_cancer` |
| `STATE` | Target state | `Massachusetts` |
| `--exporter.fhir.export` | Enable FHIR R4 export | `=true` |
| `--exporter.pretty_print` | Pretty-print JSON output | `=true` |

#### ReportLab Core API

| Component | Description |
|-----------|-------------|
| `SimpleDocTemplate(path, pagesize=letter)` | Create a document template |
| `Paragraph(text, style)` | Paragraph flowable |
| `Table(data, colWidths)` | Table flowable |
| `TableStyle(commands)` | Table style |
| `Spacer(width, height)` | Spacing flowable |
| `getSampleStyleSheet()` | Get the default stylesheet |

#### Augraphy Degradation Pipeline

| Component | Description |
|-----------|-------------|
| `AugraphyPipeline(ink_phase, paper_phase, post_phase)` | Full degradation pipeline |
| `InkBleed(p=0.5)` | Ink bleed effect |
| `Letterpress(p=0.3)` | Letterpress effect |
| `LowInkPeriodicLines(p=0.3)` | Low-ink periodic lines |
| `DirtyDrum(p=0.3)` | Dirty drum effect |
| `SubtleNoise(p=0.5)` | Subtle noise |
| `Jpeg(p=0.5)` | JPEG compression artifacts |
| `Brightness(p=0.5)` | Brightness variation |

### 8.3 Python Dependency List

```
# requirements-data-eval.txt
ir-datasets>=0.5.6
ir-measures>=0.3.1
reportlab>=4.0
augraphy>=8.0
Pillow>=10.0
pdfplumber>=0.10
scikit-learn>=1.3
numpy>=1.24
pandas>=2.0
pdf2image>=1.16
```

### 8.4 Success Metrics Quick Reference

| Metric | Target | Evaluation tool | Data source |
|--------|--------|-----------------|-------------|
| MedGemma Extraction F1 | >= 0.85 | scikit-learn `f1_score` | Synthetic patients + ground truth |
| Trial Retrieval Recall@50 | >= 0.75 | ir-measures `Recall@50` | TREC CT 2021/2022 |
| Trial Ranking NDCG@10 | >= 0.60 | ir-measures `nDCG@10` | TREC CT 2021/2022 |
| Criterion Decision Accuracy | >= 0.85 | Custom accuracy | Annotated EligibilityLedger |
| Latency | < 15s | `LatencyCostTracker` | API call timing |
| Cost | < $0.50/session | `LatencyCostTracker` | Token counting |
docs/tdd-guide-ux-frontend.md
ADDED
@@ -0,0 +1,1524 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
# TrialPath UX & Frontend TDD-Ready Implementation Guide

> Generated from DeepWiki research on `streamlit/streamlit` and `emcie-co/parlant`, supplemented by official Parlant documentation (`parlant.io`).
>
> **Architecture Decisions:**
> - Parlant runs as an **independent service** (REST API mode); the frontend communicates with it via `ParlantClient` (httpx)
> - Doctor packet export: **JSON + Markdown** (no PDF generation in the PoC)
> - MedGemma: **HF Inference Endpoint** (cloud, no local GPU)

---
## 1. Architecture Overview

### 1.1 File Structure

```
app/
  app.py                   # Entrypoint: st.navigation, shared sidebar, Parlant client init
  pages/
    1_upload.py            # INGEST state: document upload + extraction trigger
    2_profile_review.py    # PRESCREEN state: PatientProfile review + edit
    3_trial_matching.py    # VALIDATE_TRIALS state: trial search + eligibility cards
    4_gap_analysis.py      # GAP_FOLLOWUP state: gap analysis + iterative refinement
    5_summary.py           # SUMMARY state: final report + doctor packet export
  components/
    file_uploader.py       # Multi-file PDF uploader component
    profile_card.py        # PatientProfile display/edit component
    trial_card.py          # Traffic-light eligibility card component
    gap_card.py            # Gap analysis action card component
    progress_tracker.py    # Journey state progress indicator
    chat_panel.py          # Parlant message panel (send/receive)
    search_process.py      # Search refinement step-by-step visualization
    disclaimer_banner.py   # Medical disclaimer banner (always visible)
  services/
    parlant_client.py      # Parlant REST API wrapper (sessions, events, agents)
    state_manager.py       # Session state orchestration
  tests/
    test_upload_page.py
    test_profile_review_page.py
    test_trial_matching_page.py
    test_gap_analysis_page.py
    test_summary_page.py
    test_components.py
    test_parlant_client.py
    test_state_manager.py
```
### 1.2 Module Dependency Graph

```
app.py
  -> pages/*                     (via st.navigation)
  -> services/parlant_client.py  (Parlant REST API)
  -> services/state_manager.py   (session state orchestration)

pages/*
  -> components/*                (UI building blocks)
  -> services/parlant_client.py
  -> services/state_manager.py

components/*
  -> st.session_state            (read/write)

services/parlant_client.py
  -> parlant-client SDK or httpx (REST calls to Parlant server)

services/state_manager.py
  -> st.session_state
  -> services/parlant_client.py
```
### 1.3 Key Dependencies

| Package | Purpose |
|---------------------|--------------------------------------------|
| `streamlit>=1.40` | Frontend framework, multipage app, AppTest |
| `parlant-client` | Python SDK for Parlant REST API |
| `httpx` | Async HTTP client (fallback for Parlant) |
| `pytest` | Test runner |

---
## 2. Streamlit Framework Guide

### 2.1 Multipage App with `st.navigation`

TrialPath uses the modern `st.navigation` API (not the legacy `pages/` auto-discovery) for explicit page control tied to Journey states.

**Pattern: Entrypoint with state-aware navigation**

```python
# app.py
import streamlit as st
from services.state_manager import get_current_journey_state

st.set_page_config(page_title="TrialPath", page_icon=":material/medical_services:", layout="wide")

# Define pages mapped to Journey states
pages = {
    "Patient Journey": [
        st.Page("pages/1_upload.py", title="Upload Documents", icon=":material/upload_file:"),
        st.Page("pages/2_profile_review.py", title="Review Profile", icon=":material/person:"),
        st.Page("pages/3_trial_matching.py", title="Trial Matching", icon=":material/search:"),
        st.Page("pages/4_gap_analysis.py", title="Gap Analysis", icon=":material/analytics:"),
        st.Page("pages/5_summary.py", title="Summary & Export", icon=":material/summarize:"),
    ]
}

current_page = st.navigation(pages)

# Shared sidebar: progress tracker
with st.sidebar:
    st.markdown("### Journey Progress")
    state = get_current_journey_state()
    # Render progress indicator based on current Parlant Journey state

current_page.run()
```
**Key API details (from DeepWiki):**
- `st.navigation(pages, position="sidebar")` returns the current `StreamlitPage`; the caller must invoke `.run()`.
- `st.switch_page("pages/2_profile_review.py")` for programmatic navigation (stops current page execution).
- `st.page_link(page, label, icon)` for clickable navigation links.
- Pages organized as a dict become sections in the sidebar nav.
### 2.2 File Upload (`st.file_uploader`)

**Pattern: Multi-file PDF upload with validation**

```python
# components/file_uploader.py
import streamlit as st
from typing import List

def render_file_uploader() -> List:
    """Render multi-file uploader for clinical documents."""
    uploaded_files = st.file_uploader(
        "Upload clinical documents (PDF)",
        type=["pdf", "png", "jpg", "jpeg"],
        accept_multiple_files=True,
        key="clinical_docs_uploader",
        help="Upload clinic letters, pathology reports, lab results",
    )

    if uploaded_files:
        st.success(f"{len(uploaded_files)} file(s) uploaded")
        for f in uploaded_files:
            st.caption(f"{f.name} ({f.size / 1024:.1f} KB)")

    return uploaded_files or []
```
**Key API details (from DeepWiki):**
- `accept_multiple_files=True` returns `List[UploadedFile]`.
- `UploadedFile` extends `io.BytesIO` -- it can be passed directly to PDF parsers.
- Default size limit: 200 MB per file (configurable via `server.maxUploadSize` in `config.toml` or the per-widget `max_upload_size` param).
- The `type` parameter is best-effort filtering, not a security guarantee.
- Files are held in memory after upload.
- Additive selection: clicking browse again adds files, it does not replace them.
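Because the `type` filter is best-effort only, content-level validation is still worth doing before handing files to a parser. A minimal sketch, with a plain `io.BytesIO` standing in for `UploadedFile` (which subclasses it); the `%PDF-` magic-byte check is an illustrative heuristic, not something the guide prescribes:

```python
import io

PDF_MAGIC = b"%PDF-"

def looks_like_pdf(file_obj: io.BytesIO) -> bool:
    """Heuristic content check: real PDFs start with the %PDF- magic bytes."""
    pos = file_obj.tell()
    header = file_obj.read(5)
    file_obj.seek(pos)  # restore position so downstream parsers see the full stream
    return header == PDF_MAGIC

# UploadedFile subclasses io.BytesIO, so the same call works on real uploads.
print(looks_like_pdf(io.BytesIO(b"%PDF-1.7 ...")))  # True
print(looks_like_pdf(io.BytesIO(b"GIF89a...")))     # False
```

Rejecting files that fail this check (with a `st.error` message) gives a friendlier failure than a parser exception deep in the extraction step.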
### 2.3 Session State Management

**Pattern: Centralized state initialization**

```python
# services/state_manager.py
import streamlit as st

JOURNEY_STATES = ["INGEST", "PRESCREEN", "VALIDATE_TRIALS", "GAP_FOLLOWUP", "SUMMARY"]

def init_session_state():
    """Initialize all session state variables with defaults."""
    defaults = {
        "journey_state": "INGEST",
        "parlant_session_id": None,
        "parlant_agent_id": None,
        "patient_profile": None,   # PatientProfile dict
        "uploaded_files": [],
        "search_anchors": None,    # SearchAnchors dict
        "trial_candidates": [],    # List[TrialCandidate]
        "eligibility_ledger": [],  # List[EligibilityLedger]
        "last_event_offset": 0,    # For Parlant long-polling
    }
    for key, default_value in defaults.items():
        if key not in st.session_state:
            st.session_state[key] = default_value

def get_current_journey_state() -> str:
    return st.session_state.get("journey_state", "INGEST")

def advance_journey(target_state: str):
    """Advance Journey to target state with validation."""
    current_idx = JOURNEY_STATES.index(st.session_state.journey_state)
    target_idx = JOURNEY_STATES.index(target_state)
    if target_idx > current_idx:
        st.session_state.journey_state = target_state
```
**Key API details (from DeepWiki):**
- `st.session_state` is a `SessionStateProxy` wrapping a thread-safe `SafeSessionState`.
- Internal three-layer dict: `_old_state` (previous run), `_new_session_state` (user-set), `_new_widget_state` (widget values).
- Widget-bound state cannot be modified after widget instantiation in the same run (raises `StreamlitAPIException`).
- A widget's `key` parameter maps to `st.session_state[key]` for read access.
- Values must be pickle-serializable.
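The init-if-missing and forward-only-advance idioms in `state_manager.py` are plain dict logic, so they can be exercised (and unit-tested) without Streamlit at all. A sketch substituting an ordinary `dict` for `st.session_state`:

```python
JOURNEY_STATES = ["INGEST", "PRESCREEN", "VALIDATE_TRIALS", "GAP_FOLLOWUP", "SUMMARY"]

def advance_journey(state: dict, target_state: str) -> None:
    """Forward-only transition guard, mirroring state_manager.advance_journey."""
    current_idx = JOURNEY_STATES.index(state["journey_state"])
    target_idx = JOURNEY_STATES.index(target_state)
    if target_idx > current_idx:  # backward or same-state requests are ignored
        state["journey_state"] = target_state

session = {"journey_state": "INGEST"}
advance_journey(session, "PRESCREEN")  # advances
advance_journey(session, "INGEST")     # ignored: backward transition
print(session["journey_state"])        # PRESCREEN
```

Keeping the guard as pure dict logic is what makes `test_state_manager.py` straightforward to write.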
### 2.4 Real-Time Progress Feedback

**Pattern: AI inference progress with `st.status`**

```python
# Usage in pages/1_upload.py
def run_extraction(uploaded_files):
    """Run MedGemma extraction with real-time status feedback."""
    with st.status("Extracting clinical data from documents...", expanded=True) as status:
        st.write("Reading uploaded documents...")
        # Step 1: Send files to MedGemma
        st.write("Running AI extraction (MedGemma 4B)...")
        # Step 2: Poll for results
        st.write("Building patient profile...")
        # Step 3: Parse results into PatientProfile
        status.update(label="Extraction complete!", state="complete")
```

**Pattern: Streaming LLM output with `st.write_stream`**

```python
def stream_gap_analysis(generator):
    """Stream Gemini gap analysis output with typewriter effect."""
    st.write_stream(generator)
```

**Pattern: Auto-refreshing fragment for Parlant events**

```python
@st.fragment(run_every=3)  # Poll every 3 seconds
def parlant_event_listener():
    """Fragment that polls Parlant for new events without a full page rerun."""
    from services.parlant_client import poll_events
    new_events = poll_events(
        st.session_state.parlant_session_id,
        st.session_state.last_event_offset,
    )
    if new_events:
        for event in new_events:
            if event["kind"] == "message" and event["source"] == "ai_agent":
                st.chat_message("assistant").write(event["message"])
            elif event["kind"] == "status":
                st.caption(f"Agent status: {event['data']}")
        st.session_state.last_event_offset = new_events[-1]["offset"] + 1
```
**Key API details (from DeepWiki):**
- `st.status(label, expanded, state)` -- context manager, auto-completes. States: `"running"`, `"complete"`, `"error"`.
- `st.spinner(text, show_time=True)` -- simple loading indicator.
- `st.progress(value, text)` -- 0-100 int or 0.0-1.0 float.
- `st.toast(body, icon, duration)` -- transient notification, top-right.
- `st.write_stream(generator)` -- typewriter effect for strings, `st.write` for other types. Supports OpenAI `ChatCompletionChunk` and LangChain `AIMessageChunk`.
- `@st.fragment(run_every=N)` -- partial rerun every N seconds, isolated from the full app rerun.
- `st.rerun(scope="fragment")` -- rerun only the enclosing fragment.
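Since `st.write_stream` consumes any iterator of string chunks, the generator passed to `stream_gap_analysis` can be a plain Python generator. A minimal sketch (the fixed-size chunking here is illustrative; a real Gemini stream would yield model-provided chunks):

```python
from typing import Iterator

def gap_analysis_chunks(text: str, size: int = 8) -> Iterator[str]:
    """Yield the analysis text in small chunks, as st.write_stream expects."""
    for i in range(0, len(text), size):
        yield text[i:i + size]

# st.write_stream(gap_analysis_chunks(analysis_text)) renders a typewriter effect;
# joining the chunks reproduces the original text exactly:
print("".join(gap_analysis_chunks("EGFR test result is missing from the profile.")))
```

This also gives the test suite a deterministic stand-in for the LLM stream.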
### 2.5 Layout System (from DeepWiki `streamlit/streamlit`)

**Layout primitives for TrialPath UI:**

| Primitive | Purpose in TrialPath | Key Params |
|-----------|---------------------|------------|
| `st.columns(spec)` | Trial card grid, profile fields side-by-side | `spec` (int or list of ratios), `gap`, `vertical_alignment` |
| `st.tabs(labels)` | Switching between trial categories (Eligible/Borderline/Not Eligible) | Returns list of containers |
| `st.expander(label)` | Collapsible criterion detail, evidence citations | `expanded` (bool), `icon` |
| `st.container(height, border)` | Scrollable trial list, chat panel | `height` (int px), `horizontal` (bool) |
| `st.empty()` | Dynamic status updates, replacing content | Single-element, replaceable |

**Layout composition pattern for trial cards:**

```python
# Trial matching page layout
tabs = st.tabs(["Eligible", "Borderline", "Not Eligible", "Unknown"])

with tabs[0]:  # Eligible trials
    for trial in eligible_trials:
        with st.expander(f"{trial['nct_id']} - {trial['title']}", expanded=False):
            cols = st.columns([0.7, 0.3])
            with cols[0]:
                st.markdown(f"**Phase**: {trial['phase']}")
                st.markdown(f"**Sponsor**: {trial['sponsor']}")
            with cols[1]:
                # Traffic light summary
                met = sum(1 for c in trial['criteria'] if c['status'] == 'MET')
                total = len(trial['criteria'])
                st.metric("Criteria Met", f"{met}/{total}")

            # Criterion-level detail
            for criterion in trial['criteria']:
                col1, col2 = st.columns([0.8, 0.2])
                with col1:
                    st.write(criterion['description'])
                with col2:
                    color_map = {"MET": "green", "NOT_MET": "red", "BORDERLINE": "orange", "UNKNOWN": "grey"}
                    st.markdown(f":{color_map[criterion['status']]}[{criterion['status']}]")
```

**Responsive behavior:**
- `st.columns` stacks vertically at viewport width <= 640px.
- Use `width="stretch"` for elements to fill available space.
- Avoid nesting columns more than once.
- Scrolling containers: avoid heights > 500px for mobile.
### 2.6 Caching System (from DeepWiki `streamlit/streamlit`)

**Two caching decorators:**

| Decorator | Returns | Serialization | Use Case |
|-----------|---------|---------------|----------|
| `@st.cache_data` | Copy of cached value | Requires pickle | Data transformations, API responses, search results |
| `@st.cache_resource` | Shared instance (singleton) | No pickle needed | ParlantClient instance, HTTP clients, model objects |

**TrialPath caching patterns:**

```python
import os

@st.cache_resource
def get_parlant_client() -> ParlantClient:
    """Singleton Parlant client shared across all sessions."""
    return ParlantClient(base_url=os.environ.get("PARLANT_URL", "http://localhost:8000"))

@st.cache_data(ttl=300)  # 5-minute TTL
def search_trials(query_params: dict) -> list:
    """Cache trial search results to avoid redundant MCP calls."""
    client = get_parlant_client()
    # ... perform search
    return results
```

**Key details:**
- Cache key = hash of (function source code + arguments).
- `ttl` (time-to-live): auto-expire entries. Use for API results that may change.
- `max_entries`: limit cache size.
- `hash_funcs`: custom hash for unhashable args.
- Prefix an arg with `_` to exclude it from the hash (e.g., `_client`).
- `@st.cache_resource` objects are shared across ALL sessions/threads -- they must be thread-safe.
- Do NOT call interactive widgets inside cached functions (triggers a warning).
- Cache invalidated on: argument change, source code change, TTL expiry, `max_entries` overflow, explicit `.clear()`.
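The `_`-prefix exclusion rule can be illustrated with a toy cache-key builder. This mimics the rule's effect (underscore-prefixed args do not participate in the key), not Streamlit's actual hashing internals:

```python
def cache_key(func_name: str, **kwargs) -> tuple:
    """Toy cache key: skip underscore-prefixed args, as @st.cache_data does."""
    hashed = {k: v for k, v in kwargs.items() if not k.startswith("_")}
    return (func_name, tuple(sorted(hashed.items())))

# Two calls differing only in the unhashable `_client` share one cache entry:
k1 = cache_key("search_trials", _client=object(), condition="NSCLC")
k2 = cache_key("search_trials", _client=object(), condition="NSCLC")
print(k1 == k2)  # True
```

This is why `search_trials` can accept the shared client without breaking caching: pass it as `_client` and only the query parameters determine cache hits.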
### 2.7 Global Disclaimer Banner (PRD Section 9)

Every page must display a medical disclaimer. Implement it as a shared component called from `app.py` before navigation.

**Pattern: Global disclaimer in entrypoint**

```python
# app/app.py (add before st.navigation)
from components.disclaimer_banner import render_disclaimer

# Always render disclaimer at top of every page
render_disclaimer()

nav = st.navigation(pages)
nav.run()
```

**Component: disclaimer_banner.py**

```python
# app/components/disclaimer_banner.py
import streamlit as st

DISCLAIMER_TEXT = (
    "This tool provides information for educational purposes only and does not "
    "constitute medical advice. Always consult your healthcare provider before "
    "making decisions about clinical trial participation."
)

def render_disclaimer():
    """Render medical disclaimer banner. Must appear on every page."""
    st.info(DISCLAIMER_TEXT, icon="ℹ️")
```

---
## 3. Parlant Frontend Integration Guide

### 3.1 Architecture: Asynchronous Event-Driven Model

Parlant uses an **asynchronous, event-driven** conversation model -- NOT traditional request-reply. Both the customer and the AI agent can post events to a session at any time.

**Core concepts:**
- **Session** = timeline of all events (messages, status updates, tool calls, custom events)
- **Event** = timestamped item with `offset`, `kind`, `source`, `trace_id`
- **Long-polling** = client polls for new events with `min_offset` and a `wait_for_data` timeout
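A session timeline is simply an ordered list of such events. A hypothetical message event, with field names taken from the concepts above and values invented purely for illustration:

```python
# Hypothetical Parlant event as it might appear when listing session events.
event = {
    "offset": 7,           # position in the session timeline
    "kind": "message",     # message | status | tool | custom
    "source": "ai_agent",  # which participant posted the event
    "trace_id": "tr-123",  # correlates events from one processing run
    "message": "I found 4 trials that may match your profile.",
}

# After handling an event, the client resumes polling from the next offset:
next_min_offset = event["offset"] + 1
print(next_min_offset)  # 8
```

The offset arithmetic is what makes the model resumable: the frontend can reconnect at any time and replay only what it has not yet seen.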
### 3.2 REST API Endpoints

| Method | Path | Purpose |
|--------|------------------------------------------|----------------------------------------|
| POST | `/agents` | Create agent |
| POST | `/sessions` | Create session (agent + customer) |
| GET | `/sessions` | List sessions (filter by agent/customer, paginated) |
| POST | `/sessions/{id}/events` | Send message/event |
| GET | `/sessions/{id}/events` | List/poll events (long-polling) |
| PATCH | `/sessions/{id}/events/{event_id}` | Update event metadata |

**Create Event request schema** (`EventCreationParamsDTO`):
- `kind`: `"message"` | `"custom"` | `"status"`
- `source`: `"customer"` | `"human_agent"` | `"customer_ui"`
- `message`: string (for message events)
- `data`: dict (for custom/status events)
- `metadata`: dict (optional)

**List Events query params:**
- `min_offset`: int -- only return events after this offset
- `wait_for_data`: int (seconds) -- long-poll timeout; returns `504` if no new events
- `source`, `correlation_id`, `trace_id`, `kinds`: optional filters
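The long-poll contract (advance `min_offset` past the last event seen; treat a `504` timeout as "no new events yet") can be sketched with an injected fetch function, so the loop is testable without a running server. Here `fetch` is a hypothetical stand-in for the `GET /sessions/{id}/events` call above:

```python
from typing import Callable, List, Tuple

def poll_once(fetch: Callable[[int], List[dict]], min_offset: int) -> Tuple[List[dict], int]:
    """One long-poll iteration: fetch events after min_offset, compute next offset.

    `fetch` wraps GET /sessions/{id}/events?min_offset=...&wait_for_data=...;
    it should return [] when the server answers 504 (long-poll timeout).
    """
    events = fetch(min_offset)
    if not events:
        return [], min_offset  # nothing new; retry later with the same offset
    return events, events[-1]["offset"] + 1

# Fake fetch standing in for the REST call:
timeline = [{"offset": 0, "kind": "message"}, {"offset": 1, "kind": "status"}]
fake_fetch = lambda off: [e for e in timeline if e["offset"] >= off]

events, next_off = poll_once(fake_fetch, 0)
print(len(events), next_off)  # 2 2
```

The same dependency-injection shape is convenient in `test_parlant_client.py`, where the HTTP layer is mocked.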
### 3.3 Parlant Client Service

```python
# services/parlant_client.py
import httpx
from typing import Optional

PARLANT_BASE_URL = "http://localhost:8000"

class ParlantClient:
    """Synchronous wrapper around the Parlant REST API for Streamlit."""

    def __init__(self, base_url: str = PARLANT_BASE_URL):
        self.base_url = base_url
        self.http = httpx.Client(base_url=base_url, timeout=65.0)  # > long-poll timeout

    def create_agent(self, name: str, description: str = "") -> dict:
        resp = self.http.post("/agents", json={"name": name, "description": description})
        resp.raise_for_status()
        return resp.json()

    def create_session(self, agent_id: str, customer_id: Optional[str] = None) -> dict:
        payload = {"agent_id": agent_id}
        if customer_id:
            payload["customer_id"] = customer_id
        resp = self.http.post("/sessions", json=payload)
        resp.raise_for_status()
        return resp.json()

    def send_message(self, session_id: str, message: str) -> dict:
        resp = self.http.post(
            f"/sessions/{session_id}/events",
            json={"kind": "message", "source": "customer", "message": message},
        )
        resp.raise_for_status()
        return resp.json()

    def send_custom_event(self, session_id: str, event_type: str, data: dict) -> dict:
        """Send custom event (e.g., journey state change, file upload notification)."""
        resp = self.http.post(
            f"/sessions/{session_id}/events",
            json={"kind": "custom", "source": "customer_ui", "data": {"type": event_type, **data}},
        )
        resp.raise_for_status()
        return resp.json()

    def poll_events(self, session_id: str, min_offset: int = 0, wait_seconds: int = 60) -> list:
        resp = self.http.get(
            f"/sessions/{session_id}/events",
            params={"min_offset": min_offset, "wait_for_data": wait_seconds},
        )
        resp.raise_for_status()
        return resp.json()
```
### 3.4 Event Types Reference

| Kind | Source(s) | Description |
|-----------|--------------------------------|--------------------------------------|
| message | customer, ai_agent | Text message from participant |
| status | ai_agent | Agent state: acknowledged, processing, typing, ready, error, cancelled |
| tool | ai_agent | Tool call result (MedGemma, MCP) |
| custom | customer_ui, system | App-defined (journey state, uploads) |
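For the chat panel, only AI-agent text messages from this table get rendered as assistant bubbles. A small Streamlit-free helper sketch (the function name is ours, not part of the Parlant API):

```python
def assistant_messages(events: list) -> list:
    """Pick out AI-agent text messages from a raw Parlant event list."""
    return [
        e["message"]
        for e in events
        if e.get("kind") == "message" and e.get("source") == "ai_agent"
    ]

events = [
    {"kind": "status", "source": "ai_agent", "data": "typing"},
    {"kind": "message", "source": "customer", "message": "Any trials for me?"},
    {"kind": "message", "source": "ai_agent", "message": "Reviewing your profile now."},
]
print(assistant_messages(events))  # ['Reviewing your profile now.']
```

Factoring the filter out of the fragment keeps `test_components.py` free of Streamlit runtime dependencies.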
### 3.5 Journey State Synchronization

Map Parlant events to TrialPath Journey states:

```python
# services/state_manager.py (continued)

JOURNEY_CUSTOM_EVENTS = {
    "extraction_complete": "PRESCREEN",
    "profile_confirmed": "VALIDATE_TRIALS",
    "trials_evaluated": "GAP_FOLLOWUP",
    "gaps_resolved": "SUMMARY",
}

def handle_parlant_event(event: dict):
    """Process incoming Parlant event and update Journey state if needed."""
    if event["kind"] == "custom" and event.get("data", {}).get("type") in JOURNEY_CUSTOM_EVENTS:
        new_state = JOURNEY_CUSTOM_EVENTS[event["data"]["type"]]
        advance_journey(new_state)
    elif event["kind"] == "status" and event.get("data") == "error":
        st.session_state["last_error"] = event.get("message", "Unknown error")
```
| 500 |
+
### 3.6 Parlant Journey System (from DeepWiki `emcie-co/parlant`)

Parlant's Journey System defines structured multi-step interaction flows. This is the core mechanism for implementing TrialPath's 5-state patient workflow.

**Journey state types:**
- **Chat State** -- the agent converses with the customer, guided by the state's `action`. Can stay for multiple turns.
- **Tool State** -- the agent calls an external tool and the result is loaded into context. Must be followed by a chat state.
- **Fork State** -- the agent evaluates conditions and branches the flow.

**TrialPath Journey definition pattern:**

```python
import parlant as p

async def create_trialpath_journey(agent: p.Agent):
    journey = await agent.create_journey(
        title="Clinical Trial Matching",
        conditions=["The patient wants to find matching clinical trials"],
        description="Guide NSCLC patients through clinical trial matching: "
                    "document upload, profile extraction, trial search, "
                    "eligibility analysis, and gap identification.",
    )

    # INGEST: Upload and extract
    t1 = await journey.initial_state.transition_to(
        chat_state="Ask patient to upload clinical documents (clinic letters, pathology reports, lab results)"
    )

    # Tool state: Run MedGemma extraction
    t2a = await t1.target.transition_to(
        condition="Documents uploaded",
        tool_state=extract_patient_profile,  # MedGemma tool
    )
    # PRESCREEN: Review extracted profile
    t2b = await t2a.target.transition_to(
        chat_state="Present extracted PatientProfile for review and confirmation"
    )

    # Tool state: Search trials via MCP
    t3a = await t2b.target.transition_to(
        condition="Profile confirmed",
        tool_state=search_clinical_trials,  # ClinicalTrials MCP tool
    )
    # VALIDATE_TRIALS: Show results with eligibility
    t3b = await t3a.target.transition_to(
        chat_state="Present trial matches with criterion-level eligibility assessment"
    )

    # GAP_FOLLOWUP: Identify gaps and suggest actions
    t4 = await t3b.target.transition_to(
        condition="Trials evaluated",
        chat_state="Analyze eligibility gaps and suggest next steps "
                   "(additional tests, document uploads)",
    )

    # Loop back if new documents uploaded
    await t4.target.transition_to(
        condition="New documents uploaded for gap resolution",
        state=t2a.target,  # Back to extraction
    )

    # SUMMARY: Final report
    t5 = await t4.target.transition_to(
        condition="Gaps resolved or patient ready for summary",
        chat_state="Generate summary report and doctor packet",
    )
```

**Key details (from DeepWiki):**
- Journeys are activated by `conditions` (observational guidelines matched by `GuidelineMatcher`).
- Transitions can be **direct** (always taken) or **conditional** (only taken if the condition is met).
- Transitions can target existing states, enabling loops (e.g., the gap-resolution cycle).
- `END_JOURNEY` is a special terminal state.
- Journeys dynamically manage LLM context to include only relevant guidelines at each state.

### 3.7 Parlant Guideline System (from DeepWiki `emcie-co/parlant`)

Guidelines define behavioral rules for agents. Two types:

| Type | Has Action? | Purpose |
|------|-------------|---------|
| Observational | No | Track conditions, activate journeys |
| Actionable | Yes | Drive agent behavior when condition is met |

**Journey-scoped vs Global guidelines:**
- **Global** guidelines apply across all conversations.
- **Journey-scoped** guidelines are only active when their parent journey is active. Created via `journey.create_guideline()`.

**TrialPath guideline examples:**

```python
# Global guideline: always cite evidence
await agent.create_guideline(
    condition="the agent makes a clinical assessment",
    action="cite the source document, page number, and relevant text span",
)

# Journey-scoped: only during VALIDATE_TRIALS
await journey.create_guideline(
    condition="a criterion cannot be evaluated due to missing data",
    action="mark it as UNKNOWN and add to the gap list with the specific data needed",
)
```

**Matching pipeline** (from DeepWiki): GuidelineMatcher uses LLM-based evaluation with multiple batch types (observational, actionable, low-criticality, disambiguation, journey-node-selection) to determine which guidelines apply to the current conversation context.

### 3.8 Parlant Tool Integration (from DeepWiki `emcie-co/parlant`)

Parlant supports four tool service types: `local`, `sdk`/plugin, `openapi`, and `mcp`.

**TrialPath will use:**
- **SDK/Plugin tools** for MedGemma extraction
- **MCP tools** for ClinicalTrials.gov search

**Tool definition with `@p.tool` decorator:**

```python
@p.tool
async def extract_patient_profile(
    context: p.ToolContext,
    document_urls: list[str],
) -> p.ToolResult:
    """Extract patient clinical profile from uploaded documents using MedGemma 4B.

    Args:
        document_urls: List of URLs/paths to uploaded clinical documents.
    """
    # Call MedGemma endpoint
    profile = await call_medgemma(document_urls)
    return p.ToolResult(
        data=profile,
        metadata={"source": "MedGemma 4B", "doc_count": len(document_urls)},
    )
```

**Tool execution flow** (from DeepWiki):
1. GuidelineMatch identifies tools associated with matched guidelines
2. ToolCaller resolves tool parameters from ServiceRegistry
3. ToolCallBatcher groups tools for efficient LLM inference
4. LLM infers tool arguments from conversation context
5. ToolService.call_tool() executes and returns ToolResult
6. ToolEventGenerator emits ToolEvent to session

**ToolResult structure:**
- `data` -- visible to the agent for further processing
- `metadata` -- frontend-only info (not used by the agent)
- `control` -- processing options: `mode` (auto/manual), `lifespan` (response/session)

### 3.9 Parlant NLP Provider: Gemini (from DeepWiki `emcie-co/parlant`)

Parlant natively supports Google Gemini, which aligns with TrialPath's planned use of Gemini 3 Pro.

**Configuration:**
```bash
# Install with Gemini support
pip install parlant[gemini]

# Set API key
export GEMINI_API_KEY="your-api-key"

# Start server with Gemini backend
parlant-server --gemini
```

**Supported providers** (from DeepWiki): OpenAI, Anthropic, Azure, AWS Bedrock, Google Gemini, Vertex AI, Together.ai, LiteLLM, Cerebras, DeepSeek, Ollama, Mistral, and more.

**Vertex AI alternative** -- for production, can use `pip install parlant[vertex]` with `VERTEX_AI_MODEL=gemini-2.5-pro`.

### 3.10 AlphaEngine Processing Pipeline (from DeepWiki `emcie-co/parlant`)

This is the complete flow from customer message to agent response. It is critical for understanding latency and UI feedback points.

**Step-by-step pipeline:**

```
1. EVENT CREATION
   Customer sends message -> POST /sessions/{id}/events
   -> SessionModule creates event, dispatches background processing

2. CONTEXT LOADING
   AlphaEngine.process() loads:
   - Session history (interaction events)
   - Agent identity + description
   - Customer info
   - Context variables (per-customer/per-tag/global)
   -> Assembled into EngineContext

3. PREPARATION LOOP (while not prepared_to_respond)
   a. GUIDELINE MATCHING
      GuidelineMatcher evaluates guidelines against conversation context
      - Observational guidelines (track conditions)
      - Actionable guidelines (drive behavior)
      - Journey-node guidelines (determine next journey step)
      Uses LLM to score relevance -> GuidelineMatch objects

   b. TOOL CALLING (if guidelines require tools)
      ToolCaller resolves + executes tools
      - ToolCallBatcher groups for efficient LLM inference
      - LLM infers arguments from context
      - ToolService.call_tool() executes
      - ToolEventGenerator emits ToolEvent to session
      -> Tool results may trigger re-evaluation of guidelines

4. PREAMBLE GENERATION (optional)
   Quick acknowledgment for perceived responsiveness
   -> Emitted as early status event ("acknowledged" / "processing")

5. MESSAGE COMPOSITION
   Based on agent's CompositionMode:
   - FLUID: MessageGenerator builds prompt, generates via SchematicGenerator
     -> Revision loop with temperature-based retries
   - CANNED_STRICT: Only uses predefined templates
   - CANNED_COMPOSITED: Mimics style of canned responses
   - CANNED_FLUID: Prefers canned but falls back to fluid

6. EVENT EMISSION
   Generated message -> emitted as message event
   "ready" status event signals completion
```

**UI feedback mapping for TrialPath:**

| Pipeline Step | Parlant Status Event | UI Feedback |
|---------------|----------------------|-------------|
| Event created | `acknowledged` | "Message received" indicator |
| Context loading | `processing` | `st.status("Analyzing your request...")` |
| Tool calling | `tool` events | `st.status("Searching ClinicalTrials.gov...")` |
| Message generation | `typing` | Typing indicator animation |
| Complete | `ready` | Display agent response |
| Error | `error` | `st.error()` with retry option |

### 3.11 Context Variables (from DeepWiki `emcie-co/parlant`)

Context variables store dynamic data that agents can reference during conversations. They are essential for TrialPath to maintain patient profile state across the journey.

**Variable scoping (priority order):**
1. Customer-specific values (per patient)
2. Tag-specific values (e.g., per disease type)
3. Global defaults

**TrialPath context variable examples:**

```python
# Create a context variable for patient data
patient_profile_var = await client.context_variables.create(
    name="patient_profile",
    description="Current patient clinical profile extracted from documents",
)

# Set a per-customer value
await client.context_variables.set_value(
    variable_id=patient_profile_var.id,
    key=customer_id,  # Per-patient
    value=patient_profile_dict,
)

# Auto-refresh a variable via an associated tool (with freshness rules)
trial_results_var = await client.context_variables.create(
    name="matching_trials",
    description="Current list of matching clinical trials",
    tool_id=search_trials_tool_id,
    freshness_rules="*/10 * * * *",  # Refresh every 10 minutes
)
```

**Key details:**
- Values are JSON-serializable.
- Included in PromptBuilder's `add_context_variables` section for LLM context.
- Can be auto-refreshed via associated tools + cron-based `freshness_rules`.
- `ContextVariableStore.GLOBAL_KEY` holds default values.

### 3.12 MCP Tool Service Details (from DeepWiki `emcie-co/parlant`)

Parlant has native MCP support via `MCPToolClient`. This is how TrialPath connects to ClinicalTrials.gov.

**Registration:**

```http
# Via REST API
PUT /services/clinicaltrials_mcp
{
  "kind": "mcp",
  "mcp": {
    "url": "http://localhost:8080"
  }
}
```

```bash
# Via CLI
parlant service create \
  --name clinicaltrials_mcp \
  --kind mcp \
  --url http://localhost:8080
```

**MCPToolClient internals:**
- Connects via `StreamableHttpTransport` to the MCP server's `/mcp` endpoint.
- `list_tools()` discovers available tools from the MCP server.
- `mcp_tool_to_parlant_tool()` converts MCP tool schemas to Parlant's `Tool` objects.
- Type mapping: `string`, `integer`, `number`, `boolean`, `date`, `datetime`, `uuid`, `array`, `enum`.
- `call_tool()` invokes the MCP tool, extracts text content from the result, and wraps it in a `ToolResult`.
- Default MCP port: `8181`.

**Integration with Guideline System:**

```python
# Associate an MCP tool with a guideline
search_guideline = await agent.create_guideline(
    condition="the patient profile has been confirmed and trial search is needed",
    action="search ClinicalTrials.gov for matching NSCLC trials using the patient's biomarkers and staging",
    tools=[clinicaltrials_search_tool],  # MCP tool reference
)
```

### 3.13 Prompt Construction (from DeepWiki `emcie-co/parlant`)

Understanding how Parlant builds LLM prompts is essential for designing effective guidelines and journey states.

**PromptBuilder sections (in order):**

| Section | Content | TrialPath Relevance |
|---------|---------|---------------------|
| General Instructions | Task description, role | Define clinical trial matching context |
| Agent Identity | Agent name + description | "patient_trial_copilot" identity |
| Customer Identity | Customer name, session ID | Patient identifier |
| Context Variables | Dynamic data (JSON) | PatientProfile, SearchAnchors, prior results |
| Glossary | Domain terms | NSCLC, ECOG, biomarker definitions |
| Capabilities | What the agent can do | Tool descriptions (MedGemma, MCP) |
| Interaction History | Conversation events | Full chat history with tool results |
| Guidelines | Matched condition/action pairs | Active behavioral rules for current state |
| Journey State | Current position in journey | Which step in the INGEST->SUMMARY flow |
| Few-shot Examples | Desired output format | Example eligibility assessments |
| Staged Tool Events | Pending/completed tool results | MedGemma extraction results, MCP search results |

**Context window management:**
- GuidelineMatcher selectively loads only relevant guidelines and journeys.
- Journey-scoped guidelines are only included when their journey is active.
- Prevents context bloat by pruning low-probability journey guidelines.

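The section ordering can be illustrated with a tiny assembly sketch. To be clear, this is NOT Parlant's actual PromptBuilder API; the names `SECTION_ORDER` and `assemble_prompt` are hypothetical, and the point is only that sections are concatenated in a fixed order with absent sections skipped.

```python
# Hypothetical illustration of the ordered prompt assembly described above.
# Not Parlant's real PromptBuilder API -- names here are assumptions.
SECTION_ORDER = [
    "general_instructions",
    "agent_identity",
    "customer_identity",
    "context_variables",
    "glossary",
    "capabilities",
    "interaction_history",
    "guidelines",
    "journey_state",
    "few_shot_examples",
    "staged_tool_events",
]

def assemble_prompt(sections: dict[str, str]) -> str:
    """Concatenate the provided sections in the documented order,
    skipping sections that are absent (e.g. no active journey)."""
    parts = [sections[name] for name in SECTION_ORDER if name in sections]
    return "\n\n".join(parts)
```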
### 3.14 Parlant Testing Framework (from DeepWiki `emcie-co/parlant`)

Parlant provides a dedicated testing framework with NLP-based assertions (LLM-as-a-Judge).

**Key test utilities:**

| Class | Purpose |
|-------|---------|
| `Suite` | Test runner, manages server connection and scenarios |
| `Session` | Test session context manager |
| `Response` | Agent response with `.should()` assertion |
| `InteractionBuilder` | Build conversation history for preloading |
| `CustomerMessage` / `AgentMessage` | Step types for conversation construction |

**TrialPath test examples:**

```python
from parlant.testing import Suite, InteractionBuilder
from parlant.testing.steps import AgentMessage, CustomerMessage

suite = Suite(
    server_url="http://localhost:8800",
    agent_id="patient_trial_copilot",
)

@suite.scenario
async def test_extraction_journey_step():
    """Test that the agent asks for documents in the INGEST state."""
    async with suite.session() as session:
        response = await session.send("I want to find clinical trials for my lung cancer")
        await response.should("ask the patient to upload clinical documents")
        await response.should("mention accepted file types like PDF or images")

@suite.scenario
async def test_gap_analysis_identifies_missing_data():
    """Test that gap analysis identifies unknown biomarkers."""
    async with suite.session() as session:
        # Preload history simulating completed extraction + matching
        history = (
            InteractionBuilder()
            .step(CustomerMessage("Here are my medical documents"))
            .step(AgentMessage("I've extracted your profile. You have NSCLC Stage IIIB, "
                               "EGFR positive, but KRAS status is unknown."))
            .step(CustomerMessage("What trials am I eligible for?"))
            .step(AgentMessage("I found 5 trials. For NCT04000005, KRAS status is required "
                               "but missing from your records."))
            .build()
        )
        await session.add_events(history)

        response = await session.send("What should I do about the missing KRAS test?")
        await response.should("suggest getting a KRAS mutation test")
        await response.should("explain which trials require KRAS status")

@suite.scenario
async def test_multi_turn_journey_flow():
    """Test the complete journey flow with unfold()."""
    async with suite.session() as session:
        await session.unfold([
            CustomerMessage("I have NSCLC and want to find trials"),
            AgentMessage(
                text="I'd be happy to help. Please upload your clinical documents.",
                should="ask for document upload",
            ),
            CustomerMessage("I've uploaded my pathology report"),
            AgentMessage(
                text="I've extracted your profile...",
                should=["confirm profile extraction", "present key findings"],
            ),
            CustomerMessage("That looks correct, please search for trials"),
            AgentMessage(
                text="I found 8 matching trials...",
                should=["present trial matches", "include eligibility assessment"],
            ),
        ])
```

**Running tests:**
```bash
parlant-test tests/          # Run all test files
parlant-test tests/ -k gap   # Filter by pattern
parlant-test tests/ -n 4     # Run in parallel
```

### 3.15 Canned Response System (from DeepWiki `emcie-co/parlant`)

Canned responses provide consistent, template-based messaging. They are useful for TrialPath's structured outputs.

**CompositionMode options:**

| Mode | Behavior | TrialPath Use |
|------|----------|---------------|
| `FLUID` | Free-form LLM generation | General conversation, gap explanations |
| `CANNED_STRICT` | Only predefined templates | Disclaimer text, safety warnings |
| `CANNED_COMPOSITED` | Mimics canned style | Eligibility summaries |
| `CANNED_FLUID` | Prefers canned, falls back to fluid | Standard responses with flexibility |

**Journey-state-scoped canned responses:**

```python
# Canned response only active during the SUMMARY state
summary_template = await journey.create_canned_response(
    value="Based on your clinical profile, you match {{match_count}} trials. "
          "{{eligible_count}} are likely eligible, {{borderline_count}} are borderline, "
          "and {{gap_count}} have unresolved gaps. "
          "See the attached doctor packet for full details.",
    fields=["match_count", "eligible_count", "borderline_count", "gap_count"],
)
```

**Template features:**
- Jinja2 syntax for dynamic fields (e.g., `{{std.customer.name}}`).
- Fields auto-populated from tool results and context variables.
- Relevance-scored matching via LLM when multiple templates exist.
- `signals` and `metadata` for additional template categorization.

---

## 4. UI Component Design per Journey State

### 4.1 INGEST State -- Upload Page

```
+------------------------------------------+
| [i] This tool is for information only... |
| [Sidebar: Journey Progress]              |
|                                          |
| Upload Clinical Documents                |
| +---------------------------------+      |
| | Drag & drop or browse           |      |
| | Accepted: PDF, PNG, JPG         |      |
| +---------------------------------+      |
|                                          |
| Uploaded Files:                          |
| - clinic_letter.pdf (245 KB) [x]         |
| - pathology_report.pdf (1.2 MB) [x]      |
| - lab_results.png (890 KB) [x]           |
|                                          |
| [Start Extraction]                       |
|                                          |
| st.status: "Extracting clinical data..." |
| - Reading documents...                   |
| - Running MedGemma 4B...                 |
| - Building patient profile...            |
+------------------------------------------+
```

**Key components:** `file_uploader`, `progress_tracker`

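The uploaded-file list in the mockup shows human-readable sizes ("245 KB", "1.2 MB"). A helper for that formatting could look like the sketch below; the function name is hypothetical, not an existing TrialPath component.

```python
# Sketch of a size formatter for the upload list; hypothetical helper name.
def format_file_size(num_bytes: int) -> str:
    """Format a byte count the way the upload list displays it
    (whole KB below 1 MB, one decimal place in MB above)."""
    if num_bytes < 1024 * 1024:
        return f"{num_bytes // 1024} KB"
    return f"{num_bytes / (1024 * 1024):.1f} MB"
```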
### 4.2 PRESCREEN State -- Profile Review Page

```
+------------------------------------------+
| [i] This tool is for information only... |
| [Sidebar: Journey Progress]              |
|                                          |
| Patient Clinical Profile                 |
| +--------------------------------------+ |
| | Demographics: Female, 62, ECOG 1     | |
| | Diagnosis: NSCLC Stage IIIB          | |
| | Histology: Adenocarcinoma            | |
| | Biomarkers:                          | |
| |   EGFR: Positive (exon 19 del)       | |
| |   ALK: Negative                      | |
| |   PD-L1: 45%                         | |
| | Prior Treatment:                     | |
| |   Carboplatin+Pemetrexed (2 cycles)  | |
| | Unknowns:                            | |
| |   [!] KRAS status not found          | |
| |   [!] Brain MRI not available        | |
| +--------------------------------------+ |
|                                          |
| [Edit Profile] [Confirm & Search Trials] |
|                                          |
| Searching ClinicalTrials.gov...          |
| Step 1: Initial query -> 47 results      |
|   Refining: adding Phase 3 filter...     |
| Step 2: Refined query -> 12 results      |
|   Shortlisting top candidates...         |
+------------------------------------------+
```

**Key components:** `profile_card`, `search_process`, `progress_tracker`

### 4.3 VALIDATE_TRIALS State -- Trial Matching Page

```
+------------------------------------------+
| [i] This tool is for information only... |
| [Sidebar: Journey Progress]              |
|                                          |
| Matching Trials (8 found)                |
|                                          |
| Search Process:                          |
| Step 1: NSCLC + Stage IV + DE -> 47      |
|   -> Refined: added Phase 3              |
| Step 2: + Phase 3 -> 12 results          |
|   -> Shortlisted: reading summaries      |
| Step 3: 5 trials selected for review     |
| [Show/Hide Search Details]               |
|                                          |
| +--------------------------------------+ |
| | NCT04000001 - KEYNOTE-999            | |
| | Pembrolizumab + Chemo for NSCLC      | |
| | Overall: LIKELY ELIGIBLE             | |
| |                                      | |
| | Criteria:                            | |
| | [G] NSCLC confirmed                  | |
| | [G] ECOG 0-1                         | |
| | [Y] PD-L1 >= 50% (yours: 45%)        | |
| | [R] No prior immunotherapy           | |
| | [?] Brain mets (unknown)             | |
| +--------------------------------------+ |
| | NCT04000002 - ...                    | |
| +--------------------------------------+ |
|                                          |
| [G]=Met [Y]=Borderline [R]=Not Met       |
| [?]=Unknown/Needs Info                   |
+------------------------------------------+
```

**Key components:** `trial_card` (traffic-light display), `search_process`, `progress_tracker`

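The traffic-light legend maps naturally to a small enum plus a render helper. This is a sketch under stated assumptions: `CriterionStatus` and `criterion_badge` are illustrative names, not the final `trialpath` models.

```python
# Sketch of the traffic-light legend behind trial_card; names are assumptions.
from enum import Enum

class CriterionStatus(Enum):
    MET = "G"         # green: criterion met, evidence cited
    BORDERLINE = "Y"  # yellow: borderline match
    NOT_MET = "R"     # red: criterion not met
    UNKNOWN = "?"     # missing data -> feeds the gap list

def criterion_badge(status: CriterionStatus, text: str) -> str:
    """Render one criterion line as in the mockup, e.g. '[G] NSCLC confirmed'."""
    return f"[{status.value}] {text}"
```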
### 4.4 GAP_FOLLOWUP State -- Gap Analysis Page

```
+------------------------------------------+
| [i] This tool is for information only... |
| [Sidebar: Journey Progress]              |
|                                          |
| Gap Analysis & Next Steps                |
|                                          |
| +--------------------------------------+ |
| | GAP: Brain MRI results needed        | |
| | Impact: Would resolve [?] criteria   | |
| |   for NCT04000001, NCT04000003       | |
| | Action: Upload brain MRI report      | |
| | [Upload Document]                    | |
| +--------------------------------------+ |
| | GAP: KRAS mutation status            | |
| | Impact: Required for NCT04000005     | |
| | Action: Request test from oncologist | |
| +--------------------------------------+ |
|                                          |
| [Re-run Matching with New Data]          |
| [Proceed to Summary]                     |
+------------------------------------------+
```

**Key components:** `gap_card`, `file_uploader` (for additional docs), `progress_tracker`

### 4.5 SUMMARY State -- Summary & Export Page

```
+------------------------------------------+
| [i] This tool is for information only... |
| [Sidebar: Journey Progress]              |
|                                          |
| Clinical Trial Matching Summary          |
|                                          |
| Eligible Trials: 3                       |
| Borderline Trials: 2                     |
| Not Eligible: 3                          |
| Unresolved Gaps: 1                       |
|                                          |
| [Download Doctor Packet (JSON/Markdown)] |
| [Start New Session]                      |
|                                          |
| Chat with AI Copilot:                    |
| +--------------------------------------+ |
| | AI: Based on your profile...         | |
| | You: What about trial NCT...?        | |
| | AI: That trial requires...           | |
| +--------------------------------------+ |
| | [Type a message...]          [Send]  | |
| +--------------------------------------+ |
+------------------------------------------+
```

**Key components:** `chat_panel`, `progress_tracker`

---

## 5. TDD Test Cases

### 5.1 Upload Page Tests

| Test Case | Input | Expected Output | Boundary |
|-----------|-------|-----------------|----------|
| No files uploaded | Empty uploader | "Start Extraction" button disabled | N/A |
| Single PDF upload | 1 PDF file | File listed, extraction button enabled | N/A |
| Multiple files | 3 PDF + 1 PNG | All 4 files listed with sizes | N/A |
| Invalid file type | 1 .docx file | File rejected, error message shown | File type filter |
| Large file | 250 MB PDF | Error or warning per `maxUploadSize` | Size limit |
| Extraction triggered | Click "Start Extraction" | `st.status` shows running, Parlant event sent | N/A |
| Extraction completes | MedGemma returns profile | Journey advances to PRESCREEN, profile in session_state | State transition |
| Extraction fails | MedGemma error | `st.status` shows error state, retry option | Error handling |

### 5.2 Profile Review Page Tests

| Test Case | Input | Expected Output | Boundary |
|-----------|-------|-----------------|----------|
| Profile display | PatientProfile in session_state | All fields rendered correctly | N/A |
| Unknown fields highlighted | Profile with unknowns list | Unknowns shown with warning icon | N/A |
| Edit profile | Click Edit, modify ECOG | session_state updated, confirmation shown | N/A |
| Confirm profile | Click "Confirm & Search" | Journey advances to VALIDATE_TRIALS | State transition |
| Empty profile | No profile in session_state | Redirect to Upload page | Guard clause |
| Biomarker display | Complex biomarker data | All biomarkers with values and methods | Data richness |

### 5.3 Trial Matching Page Tests

| Test Case | Input | Expected Output | Boundary |
|-----------|-------|-----------------|----------|
| Trials loading | Matching in progress | `st.spinner` or `st.status` shown | N/A |
| Trials displayed | 8 TrialCandidates | 8 trial cards with traffic-light criteria | N/A |
| Green criterion | Criterion met with evidence | Green indicator, evidence citation | N/A |
| Yellow criterion | Borderline match | Yellow indicator, explanation | N/A |
| Red criterion | Criterion not met | Red indicator, specific reason | N/A |
| Unknown criterion | Missing data | Question mark, linked to gap | N/A |
| Zero trials | No matches found | Informative message, suggest broadening | Empty state |
| Many trials | 50+ results | Pagination or scroll, performance ok | Scale |
| Search process displayed | SearchLog with 3 steps | 3 step entries shown with query params and result counts | N/A |
| Refinement visible | >50 initial results refined to 12 | Shows refinement action and reason | Iterative loop |
| Relaxation visible | 0 initial results relaxed to 5 | Shows relaxation action and reason | Iterative loop |

### 5.4 Gap Analysis Page Tests

| Test Case | Input | Expected Output | Boundary |
|-----------|-------|-----------------|----------|
| Gaps identified | 3 gaps in ledger | 3 gap cards with actions | N/A |
| Upload resolves gap | Upload brain MRI report | Gap card updates, re-match option | Iterative flow |
| No gaps | All criteria resolved | Message: "No gaps", proceed to summary | Happy path |
| Gap impacts multiple trials | 1 gap affects 3 trials | Gap card lists all 3 affected trials | Cross-reference |
| Re-run matching | Click re-run after upload | New extraction + matching cycle | Loop back |

|
| 1177 |
+
| Test Case | Input | Expected Output | Boundary |
|
| 1178 |
+
|-----------|-------|-----------------|----------|
|
| 1179 |
+
| Summary statistics | Complete ledger | Correct counts per category | N/A |
|
| 1180 |
+
| Download doctor packet | Click download | JSON + Markdown files downloadable via st.download_button | N/A |
|
| 1181 |
+
| Chat interaction | Send message | Message appears, agent responds | N/A |
|
| 1182 |
+
| New session | Click "Start New" | State cleared, redirect to Upload | State reset |
|
| 1183 |
+
|
| 1184 |
+
### 5.6 Disclaimer Tests

| Test Case | Input | Expected Output | Boundary |
|-----------|-------|-----------------|----------|
| Disclaimer on upload page | Navigate to Upload | Info banner with disclaimer text visible | N/A |
| Disclaimer on profile page | Navigate to Profile Review | Info banner with disclaimer text visible | N/A |
| Disclaimer on matching page | Navigate to Trial Matching | Info banner with disclaimer text visible | N/A |
| Disclaimer on gap page | Navigate to Gap Analysis | Info banner with disclaimer text visible | N/A |
| Disclaimer on summary page | Navigate to Summary | Info banner with disclaimer text visible | N/A |
| Disclaimer text content | Any page | Contains "information only" and "not medical advice" | Exact wording |

---

## 6. Streamlit AppTest Testing Strategy

### 6.1 Test Setup Pattern

```python
# tests/test_upload_page.py
import pytest
from streamlit.testing.v1 import AppTest

@pytest.fixture
def upload_app():
    """Create AppTest instance for upload page."""
    at = AppTest.from_file("pages/1_upload.py")
    # Initialize required session state
    at.session_state["journey_state"] = "INGEST"
    at.session_state["parlant_session_id"] = "test-session-123"
    at.session_state["uploaded_files"] = []
    return at.run()

def test_initial_state(upload_app):
    """Upload page shows uploader and disabled extraction button."""
    at = upload_app
    # Check file uploader exists
    assert len(at.file_uploader) > 0
    # Check no error state
    assert len(at.exception) == 0

def test_extraction_button_disabled_without_files(upload_app):
    """Extraction button should be disabled when no files are uploaded."""
    at = upload_app
    # Button should exist, but extraction must not proceed without files
    assert at.button[0].disabled or at.session_state.get("uploaded_files") == []
```

### 6.2 Widget Interaction Patterns

```python
def test_text_input_profile_edit():
    """Test editing patient profile fields via text input."""
    at = AppTest.from_file("pages/2_profile_review.py")
    at.session_state["journey_state"] = "PRESCREEN"
    at.session_state["patient_profile"] = {
        "demographics": {"age": 62, "sex": "Female"},
        "diagnosis": {"stage": "IIIB", "histology": "Adenocarcinoma"},
    }
    at = at.run()

    # Simulate editing a field
    if len(at.text_input) > 0:
        at.text_input[0].input("IIIA").run()
        # Assert the widget reflects the edit; the page is responsible
        # for syncing it back into session state
        assert at.text_input[0].value == "IIIA"

def test_button_click_advances_journey():
    """Clicking confirm button advances journey to next state."""
    at = AppTest.from_file("pages/2_profile_review.py")
    at.session_state["journey_state"] = "PRESCREEN"
    at.session_state["patient_profile"] = {"demographics": {"age": 62}}
    at = at.run()

    # Find and click confirm button
    confirm_buttons = [b for b in at.button if "Confirm" in str(b.label)]
    if confirm_buttons:
        confirm_buttons[0].click()
        at = at.run()
        assert at.session_state["journey_state"] == "VALIDATE_TRIALS"
```

### 6.3 Page Navigation Test

```python
def test_guard_redirect_without_profile():
    """Profile review page redirects to upload if no profile exists."""
    at = AppTest.from_file("pages/2_profile_review.py")
    at.session_state["journey_state"] = "PRESCREEN"
    at.session_state["patient_profile"] = None  # No profile
    at = at.run()

    # Should show warning or error, not crash
    assert len(at.exception) == 0
    # Could check for warning message
    warnings = [m for m in at.warning if "upload" in str(m.value).lower()]
    assert len(warnings) > 0 or at.session_state["journey_state"] == "INGEST"
```

### 6.4 Session State Test

```python
def test_session_state_initialization():
    """All session state keys should be initialized on first run."""
    at = AppTest.from_file("app.py").run()

    required_keys = [
        "journey_state", "parlant_session_id", "patient_profile",
        "uploaded_files", "trial_candidates", "eligibility_ledger",
    ]
    for key in required_keys:
        assert key in at.session_state, f"Missing session state key: {key}"

def test_session_state_persists_across_reruns():
    """Session state values persist across multiple reruns."""
    at = AppTest.from_file("app.py").run()
    at.session_state["journey_state"] = "PRESCREEN"
    at = at.run()
    assert at.session_state["journey_state"] == "PRESCREEN"
```

### 6.5 Component Rendering Tests

```python
def test_trial_card_traffic_light_rendering():
    """Trial card displays correct traffic light colors for criteria."""
    at = AppTest.from_file("pages/3_trial_matching.py")
    at.session_state["journey_state"] = "VALIDATE_TRIALS"
    at.session_state["trial_candidates"] = [
        {
            "nct_id": "NCT04000001",
            "title": "Test Trial",
            "criteria_results": [
                {"criterion": "NSCLC", "status": "MET", "evidence": "pathology report p.1"},
                {"criterion": "ECOG 0-1", "status": "MET", "evidence": "clinic letter"},
                {"criterion": "No prior IO", "status": "NOT_MET", "evidence": "treatment history"},
                {"criterion": "Brain mets", "status": "UNKNOWN", "evidence": None},
            ],
        }
    ]
    at = at.run()

    # Check that trial card content is rendered
    assert len(at.exception) == 0
    # Check for presence of trial ID in rendered markdown
    markdown_texts = [str(m.value) for m in at.markdown]
    assert any("NCT04000001" in text for text in markdown_texts)
```

### 6.6 Error Handling Tests

```python
def test_parlant_connection_error_handling():
    """App should handle Parlant server unavailability gracefully."""
    at = AppTest.from_file("app.py")
    at.session_state["parlant_session_id"] = None  # Simulate no connection
    at = at.run()

    # Should not crash
    assert len(at.exception) == 0

def test_extraction_error_shows_retry():
    """When extraction fails, user sees error status and retry option."""
    at = AppTest.from_file("pages/1_upload.py")
    at.session_state["journey_state"] = "INGEST"
    at.session_state["extraction_error"] = "MedGemma timeout"
    at = at.run()

    # Should show error message
    assert len(at.exception) == 0
    error_msgs = [str(e.value) for e in at.error]
    assert len(error_msgs) > 0 or at.session_state.get("extraction_error") is not None
```

### 6.7 Search Process Component Tests

```python
# tests/test_components.py (addition)

class TestSearchProcessComponent:
    """Test search process visualization component."""

    def test_renders_search_steps(self):
        """Search process should display all refinement steps."""
        at = AppTest.from_file("app/components/search_process.py")
        at.session_state["search_log"] = {
            "steps": [
                {"step": 1, "query": {"condition": "NSCLC", "location": "DE"}, "found": 47, "action": "refine", "reason": "Too many results, adding phase filter"},
                {"step": 2, "query": {"condition": "NSCLC", "location": "DE", "phase": "Phase 3"}, "found": 12, "action": "shortlist", "reason": "Right size for detailed review"},
            ],
            "final_shortlist_nct_ids": ["NCT001", "NCT002", "NCT003", "NCT004", "NCT005"],
        }
        at.run()
        # Verify steps are displayed
        assert "47" in at.text[0].value  # First step result count
        assert "12" in at.text[1].value  # Second step result count
        assert "Phase 3" in at.text[0].value or "Phase 3" in at.text[1].value

    def test_empty_search_log(self):
        """Should handle missing search log gracefully."""
        at = AppTest.from_file("app/components/search_process.py")
        at.run()
        # Should not crash, show placeholder
        assert not at.exception

    def test_collapsible_details(self):
        """Search details should be in an expander for clean UI."""
        at = AppTest.from_file("app/components/search_process.py")
        at.session_state["search_log"] = {
            "steps": [{"step": 1, "query": {}, "found": 10, "action": "shortlist", "reason": "OK"}],
        }
        at.run()
        # Verify expander exists for search details
        assert len(at.expander) >= 1
```

### 6.8 Disclaimer Component Tests

```python
# tests/test_components.py (addition)

class TestDisclaimerBanner:
    """Test medical disclaimer banner appears correctly."""

    def test_disclaimer_renders(self):
        """Disclaimer banner should render on every page."""
        at = AppTest.from_file("app/components/disclaimer_banner.py")
        at.run()
        assert len(at.info) >= 1
        assert "information" in at.info[0].value.lower()
        assert "medical advice" in at.info[0].value.lower()

    def test_disclaimer_in_upload_page(self):
        """Upload page should include disclaimer."""
        at = AppTest.from_file("app/pages/1_upload.py")
        at.run()
        info_texts = [i.value.lower() for i in at.info]
        assert any("information" in t and "medical" in t for t in info_texts)
```

### 6.9 AppTest Limitations

- `AppTest` does not support testing `st.file_uploader` file content directly (mock at the service layer instead).
- Not yet compatible with `st.navigation`/`st.Page` multipage apps (test individual pages via `from_file`).
- No browser rendering: tests run headless, in pure Python.
- Must call `.run()` after every interaction to see updated state.
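
Because of the first limitation, upload behavior is best verified by patching the extraction call rather than pushing bytes through the widget. The sketch below is illustrative only: `extract_profile` and `run_upload_flow` are hypothetical stand-ins for whatever service function the upload page ends up calling, not names from this repo.

```python
from unittest.mock import patch

# Hypothetical service-layer function the upload page is assumed to call.
def extract_profile(file_bytes: bytes) -> dict:
    raise NotImplementedError("calls MedGemma in production")

# Stand-in for the page logic that AppTest would otherwise exercise.
def run_upload_flow(file_bytes: bytes) -> dict:
    profile = extract_profile(file_bytes)
    return {"patient_profile": profile, "journey_state": "PRESCREEN"}

def test_upload_flow_with_mocked_extraction():
    fake_profile = {"demographics": {"age": 62}}
    # Patch at the service boundary instead of driving st.file_uploader.
    with patch(f"{__name__}.extract_profile", return_value=fake_profile):
        state = run_upload_flow(b"%PDF-1.4 fake report")
    assert state["patient_profile"] == fake_profile
    assert state["journey_state"] == "PRESCREEN"
```

The same pattern applies inside an AppTest run: patch the service module the page imports, then assert on `at.session_state` after `.run()`.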

---

## 7. Appendix: API Reference

### 7.1 Streamlit Key APIs

| API | Purpose | Notes |
|-----|---------|-------|
| `st.navigation(pages, position)` | Define multipage app | Returns current page, must call `.run()` |
| `st.Page(page, title, icon, url_path)` | Define a page | `page` = filepath or callable |
| `st.switch_page(page)` | Programmatic navigation | Stops current page execution |
| `st.page_link(page, label, icon)` | Clickable nav link | Non-blocking |
| `st.file_uploader(label, type, accept_multiple_files, key)` | File upload widget | Returns `UploadedFile` (extends `BytesIO`) |
| `st.session_state` | Persistent key-value store | Survives reruns, per-session |
| `st.status(label, expanded, state)` | Collapsible status container | Context manager, auto-completes |
| `st.spinner(text, show_time)` | Loading spinner | Context manager |
| `st.progress(value, text)` | Progress bar | 0-100 int or 0.0-1.0 float |
| `st.toast(body, icon, duration)` | Transient notification | Top-right corner |
| `st.write_stream(generator)` | Streaming text output | Typewriter effect for strings |
| `@st.fragment(run_every=N)` | Partial rerun decorator | Isolated from full app rerun |
| `st.rerun(scope)` | Trigger rerun | `"app"` or `"fragment"` |
| `st.chat_message(name)` | Chat bubble | `"user"`, `"assistant"`, or custom |
| `st.chat_input(placeholder)` | Chat text input | Fixed at bottom of container |
| `AppTest.from_file(path)` | Create test instance | `.run()` to execute |
| `AppTest.from_string(code)` | Test from string | Quick inline tests |
| `at.button[i].click()` | Simulate button click | Chain with `.run()` |
| `at.text_input[i].input(val)` | Simulate text entry | Chain with `.run()` |
| `at.slider[i].set_value(val)` | Set slider value | Chain with `.run()` |

### 7.2 Parlant Key APIs (from DeepWiki `emcie-co/parlant`)

**REST Endpoints:**

| Endpoint | Method | Purpose | Key Params |
|----------|--------|---------|------------|
| `/agents` | POST | Create agent | `name`, `description` |
| `/sessions` | POST | Create session | `agent_id`, `customer_id` (optional), `title`, `metadata` |
| `/sessions` | GET | List sessions | `agent_id`, `customer_id`, `limit`, `cursor`, `sort` |
| `/sessions/{id}/events` | POST | Send event | `kind`, `source`, `message`/`data`, `metadata`; query: `moderation` |
| `/sessions/{id}/events` | GET | Poll events | `min_offset`, `wait_for_data`, `source`, `correlation_id`, `trace_id`, `kinds` |
| `/sessions/{id}/events/{eid}` | PATCH | Update event | Metadata updates only |


**Event kinds:** `message`, `status`, `tool`, `custom`

**Event sources:** `customer`, `customer_ui`, `ai_agent`, `human_agent`, `human_agent_on_behalf_of_ai_agent`, `system`

**Status event states:** `acknowledged`, `processing`, `typing`, `ready`, `error`, `cancelled`

**Long-polling behavior:** `wait_for_data` > 0 blocks until new events or timeout; returns `504` on timeout.
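
The offset-cursor pattern behind `min_offset` polling can be sketched without a live server. Here `fetch_events` is a fake in-memory transport standing in for `GET /sessions/{id}/events`; a real client would issue that HTTP request with the same query params and treat a `504` as "no new events yet".

```python
# Fake transport standing in for GET /sessions/{id}/events.
_EVENTS = [
    {"offset": 0, "kind": "status", "data": {"status": "processing"}},
    {"offset": 1, "kind": "message", "data": {"message": "Found 12 trials"}},
]

def fetch_events(session_id: str, min_offset: int, wait_for_data: int = 0) -> list[dict]:
    # Real code: HTTP GET with these query params; 504 means timed out, retry.
    return [e for e in _EVENTS if e["offset"] >= min_offset]

def poll_session(session_id: str, max_polls: int = 3) -> list[dict]:
    """Collect events using the min_offset cursor pattern."""
    collected: list[dict] = []
    next_offset = 0
    for _ in range(max_polls):
        batch = fetch_events(session_id, min_offset=next_offset, wait_for_data=30)
        if not batch:
            break  # real code: loop again after a long-poll timeout
        collected.extend(batch)
        # Advance the cursor past the newest event we have seen,
        # so already-processed events are never re-delivered.
        next_offset = batch[-1]["offset"] + 1
    return collected
```

The invariant to preserve is that the cursor only moves forward, which makes polling idempotent across Streamlit reruns.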

**SDK APIs:**

| SDK Method | Purpose |
|------------|---------|
| `agent.create_journey(title, conditions, description)` | Create Journey with state machine |
| `journey.initial_state.transition_to(chat_state=..., tool_state=..., condition=...)` | Define state transitions |
| `agent.create_guideline(condition, action, tools=[...])` | Create global guideline |
| `journey.create_guideline(condition, action, tools=[...])` | Create journey-scoped guideline |
| `p.Server(session_store="local"/"mongodb://...")` | Configure session persistence |

**Tool decorator:** `@p.tool` auto-extracts name, description, parameters from function signature.
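
The signature-driven mechanism can be illustrated with plain `inspect`. This is not Parlant's implementation, just a minimal sketch of how a decorator can derive a tool schema from a function's name, docstring, and annotations; `search_trials` is a made-up example tool.

```python
import inspect

def tool(fn):
    """Attach a schema derived from the function signature (sketch only)."""
    sig = inspect.signature(fn)
    fn.tool_schema = {
        "name": fn.__name__,
        "description": (fn.__doc__ or "").strip(),
        "parameters": {
            # Map each parameter name to its annotation's type name.
            name: getattr(param.annotation, "__name__", str(param.annotation))
            for name, param in sig.parameters.items()
        },
    }
    return fn

@tool
def search_trials(condition: str, max_results: int) -> list:
    """Search ClinicalTrials.gov for recruiting studies."""
    return []
```

The payoff is that tool authors write ordinary typed functions and the framework gets a machine-readable description for free.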

**NLP backend:** `parlant-server --gemini` (requires `GEMINI_API_KEY` and `pip install parlant[gemini]`).

**Client SDK:** `parlant-client` (Python), TypeScript client, or direct REST.

**Storage options:** in-memory (default/testing), local JSON, MongoDB (production).

### 7.3 Integration Pattern: Streamlit + Parlant

```
User Action (Streamlit UI)
  -> st.session_state update
  -> ParlantClient.send_message() or send_custom_event()
  -> Parlant Server processes (async)
  -> @st.fragment polls ParlantClient.poll_events()
  -> New events update st.session_state
  -> UI rerenders with new data
```

This polling loop runs via `@st.fragment(run_every=3)` to avoid blocking the main app thread, providing near-real-time updates without full page reruns.
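
The "new events update `st.session_state`" step can be sketched framework-free. The routing rules below are illustrative assumptions, not the final page logic; the session-state keys are the ones used throughout this guide.

```python
def apply_events(session_state: dict, events: list[dict]) -> dict:
    """Merge polled Parlant events into the session-state dict (sketch).

    Assumed routing: agent messages append to a chat history,
    custom events carrying trial candidates overwrite the current list.
    """
    for event in events:
        if event["kind"] == "message" and event["source"] == "ai_agent":
            session_state.setdefault("chat_history", []).append(event["data"]["message"])
        elif event["kind"] == "custom" and "trial_candidates" in event.get("data", {}):
            session_state["trial_candidates"] = event["data"]["trial_candidates"]
    return session_state

state = {"journey_state": "VALIDATE_TRIALS"}
apply_events(state, [
    {"kind": "message", "source": "ai_agent", "data": {"message": "Matching done"}},
    {"kind": "custom", "source": "ai_agent", "data": {"trial_candidates": [{"nct_id": "NCT001"}]}},
])
```

Keeping this merge pure (dict in, dict out) lets it be unit-tested without Streamlit, while the fragment only handles polling and rerendering.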

---

## References

- Streamlit source: DeepWiki analysis of `streamlit/streamlit`
- Parlant source: DeepWiki analysis of `emcie-co/parlant`
- Parlant official docs: https://www.parlant.io/docs/
- Parlant Sessions: https://www.parlant.io/docs/concepts/sessions/
- Parlant Conversation API: https://www.parlant.io/docs/engine-internals/conversation-api/
- Parlant GitHub: https://github.com/emcie-co/parlant
- Parlant Journey System: DeepWiki `emcie-co/parlant` section 5.2
- Parlant Guideline System: DeepWiki `emcie-co/parlant` section 5.1
- Parlant Tool Integration: DeepWiki `emcie-co/parlant` section 6
- Parlant NLP Providers: DeepWiki `emcie-co/parlant` section 10.1
pyproject.toml
ADDED

@@ -0,0 +1,29 @@

```toml
[project]
name = "trialpath"
version = "0.1.0"
description = "AI-powered clinical trial matching for NSCLC patients"
requires-python = ">=3.11"
dependencies = [
    "pydantic>=2.0",
    "httpx>=0.27",
    "streamlit>=1.40",
    "pytest>=8.0",
    "pytest-asyncio>=0.24",
]

[project.optional-dependencies]
dev = [
    "ruff>=0.8",
    "pytest-cov>=6.0",
]

[tool.ruff]
line-length = 100
target-version = "py311"

[tool.ruff.lint]
select = ["E", "F", "I", "W"]

[tool.pytest.ini_options]
testpaths = ["trialpath/tests", "app/tests"]
asyncio_mode = "auto"
```
trialpath/__init__.py
ADDED
File without changes

trialpath/agent/__init__.py
ADDED
File without changes

trialpath/models/__init__.py
ADDED
File without changes

trialpath/services/__init__.py
ADDED
File without changes

trialpath/tests/__init__.py
ADDED
File without changes