# HAI-DEF Pitch: MedGemma Match – Patient Trial Copilot
**PoC Goal:** Demonstrate MedGemma + Gemini 3 Pro + Parlant agentic architecture for patient-facing clinical trial matching with **explainable eligibility reasoning** and **iterative gap-filling**.
---
## 1. Problem & Unmet Need
### The Challenge
- **Low trial participation:** <5% of adult cancer patients enroll in clinical trials despite potential eligibility
- **Complex eligibility criteria:** Free-text criteria mix demographics, biomarkers, labs, imaging findings, and treatment history
- **Patient barrier:** Patients receive PDFs/reports but have no way to understand which trials fit their situation
- **Manual screening burden:** Clinicians spend hours per patient manually reviewing eligibility; automated tools show mixed real-world performance
### Why AI? Why Now?
- Eligibility criteria require synthesis across multiple document types (pathology, labs, imaging, treatment history), which is impossible with keyword search alone
- Recent LLM-based matching systems (TrialGPT, PRISM) show promise but lack patient-centric design and multimodal medical understanding
- HAI-DEF open-weight health models enable privacy-preserving deployment with medical domain expertise
---
## 2. Solution: MedGemma as Clinical Understanding Engine
### Core Concept
**"Agentic Search + Multimodal Extraction"** replacing traditional vector-RAG approaches.
**Architecture:**
- **MedGemma (HAI-DEF):** Extracts structured clinical facts from messy PDFs/reports + understands medical imaging contexts
- **Gemini 3 Pro:** Orchestrates agentic search through ClinicalTrials.gov API with iterative query refinement
- **Parlant:** Enforces state machine (search → filter → verify) and prevents parameter hallucination
- **ClinicalTrials MCP:** Structured API wrapper for trials data (no vector DB needed)
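The search → filter → verify journey that Parlant enforces can be sketched as a minimal state machine. This is an illustrative sketch only, not Parlant's actual API; the `Stage` enum and `advance` function are hypothetical names:

```python
from enum import Enum, auto

class Stage(Enum):
    SEARCH = auto()
    FILTER = auto()
    VERIFY = auto()
    DONE = auto()

# Allowed transitions: the agent may never skip ahead (e.g. verify
# eligibility before a search has produced candidate trials).
TRANSITIONS = {
    Stage.SEARCH: {Stage.SEARCH, Stage.FILTER},  # refinement loops stay in SEARCH
    Stage.FILTER: {Stage.SEARCH, Stage.VERIFY},  # too few results -> back to SEARCH
    Stage.VERIFY: {Stage.DONE},
}

def advance(current: Stage, requested: Stage) -> Stage:
    """Reject any transition the journey does not allow."""
    if requested not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current.name} -> {requested.name}")
    return requested
```

The point of the guardrail is that a hallucinated "jump straight to verification" tool call fails loudly instead of silently producing an unverified match.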
### Why MedGemma is Central (Not Replaceable)
1. **Multimodal medical reasoning:** Designed for radiology reports, pathology, labs, where generic LLMs are weaker
2. **Domain-aligned extraction:** Medical entity recognition with units, dates, and clinical context preservation
3. **Open weights:** Enables VPC deployment for future PHI handling (vs closed-weight alternatives)
4. **Health-safety guardrails:** Model card emphasizes validation/adaptation patterns we follow
---
## 3. User Journey (Patient-Centric)
### Target User (PoC Persona)
**"Anna"** – 52-year-old NSCLC patient in Berlin with PDFs from her oncologist but no trial navigation support.
### Journey Flow
1. **Upload Documents** → Clinic letter, pathology report, lab results (synthetic PDFs in PoC)
2. **MedGemma Extraction** → System builds "My Clinical Profile (draft)": Stage IVA, EGFR status unknown, ECOG 1
3. **Agentic Search** → Gemini queries ClinicalTrials.gov via MCP:
   - Initial: `condition=NSCLC, location=DE, status=RECRUITING, keywords=EGFR` → 47 results
   - Refines: Adds `phase=PHASE3` → 12 results
   - Reads summaries, filters to 5 relevant trials
4. **Eligibility Analysis** → For each trial, MedGemma evaluates criteria against extracted facts
5. **Gap Identification** → System highlights: *"You'd likely qualify IF you had an EGFR mutation test"*
6. **Iteration** → Anna uploads biomarker report → System re-matches → 3 new trials appear
7. **Share with Doctor** → Generate clinician packet with evidence-linked eligibility ledger
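Step 3 amounts to translating extracted anchors into API parameters. A minimal sketch of that translation, with no network call: the base URL matches the public ClinicalTrials.gov v2 API, but the individual `query.*`/`filter.*` parameter names here (especially `filter.phase`) are assumptions that should be checked against the current API documentation:

```python
from typing import Optional

# Public ClinicalTrials.gov v2 endpoint; parameter names below are
# illustrative and should be verified against the live API docs.
BASE_URL = "https://clinicaltrials.gov/api/v2/studies"

def build_query(condition: str, country: str,
                keywords: Optional[str] = None,
                phase: Optional[str] = None) -> dict:
    params = {
        "query.cond": condition,
        "query.locn": country,
        "filter.overallStatus": "RECRUITING",
        "pageSize": 50,
    }
    if keywords:
        params["query.term"] = keywords
    if phase:
        # Refinement step: narrow by phase when the first pass is too broad.
        params["filter.phase"] = phase  # hypothetical parameter name
    return params

# The journey above: a broad initial query, then the phase-narrowed refinement.
first = build_query("NSCLC", "Germany", keywords="EGFR")
refined = build_query("NSCLC", "Germany", keywords="EGFR", phase="PHASE3")
```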
### Key Differentiator: The "Gap Analysis"
- We don't just say "No Match"
- We say: **"You would match NCT12345 IF you had: a recent brain MRI showing no active CNS disease"**
- This transforms "rejection" into "actionable next steps"
---
## 4. Technical Innovation: Smart Agentic Search (No Vector DB)
### Traditional Approach (What We're *Not* Doing)
```
Patient text → Embeddings → Vector similarity search →
Retrieve top-K trials → LLM re-ranks
```
**Problem:** Vector search is "dumb" about structured constraints (Phase, Location, Status) and negations.
### Our Approach: Iterative Query Refinement
```
MedGemma extracts "Search Anchors" (Condition, Biomarkers, Location) →
Gemini formulates API query with filters →
ClinicalTrials MCP returns results →
Too many (>50)? → Parlant enforces refinement (add phase/keywords)
Too few (0)? → Parlant enforces relaxation (remove city filter)
Right size (10-30)? → Gemini reads summaries in 2M context window →
Shortlist 5 NCT IDs → Deep eligibility verification with MedGemma
```
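The resizing loop above can be sketched in a few lines. This is a minimal sketch, not the production orchestration; `run_query`, `refine`, and `relax` are placeholders for the real MCP/agent tool calls:

```python
MAX_ROUNDS = 5  # bound the loop so refinement can never spin forever

def find_shortlist(query, run_query, refine, relax):
    """Iteratively resize the result set before deep verification.

    `run_query` executes the query against the trials API and returns a
    list of trial summaries; `refine` tightens the query (add phase or
    keywords) and `relax` loosens it (drop a location filter). All three
    are placeholders for the real agent tools.
    """
    results = []
    for _ in range(MAX_ROUNDS):
        results = run_query(query)
        if len(results) > 50:      # too many: tighten the query
            query = refine(query)
        elif len(results) == 0:    # too few: loosen the query
            query = relax(query)
        else:                      # right-sized: hand off for summary reading
            return results
    return results  # give up after MAX_ROUNDS and return the last attempt
```

Because every round is an explicit function call, each step ("I searched X, got Y results, refined to Z") can be logged verbatim for the transparency claim above.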
**Why This is Better:**
- **Precision:** Leverages native API filters (Phase, Status, Location) that vectors can't handle
- **Transparency:** Every search step is logged and explainable ("I searched X, got Y results, refined to Z")
- **Feasibility:** No vector DB infrastructure; uses live API
- **Showcases Gemini reasoning:** Demonstrates multi-step planning vs one-shot retrieval
---
## 5. MedGemma Showcase Moments (HAI-DEF "Fullest Potential")
### Use Case 1: Temporal Lab Extraction
**Challenge:** Criterion requires "ANC ≥ 1.5 × 10⁹/L within 14 days of enrollment"
- **MedGemma extracts:** Value=1.8, Units=10⁹/L, Date=2026-01-28, DocID=labs_jan.pdf
- **System verifies:** Current date Feb 4 → 7 days ago → ✓ MEETS criterion
- **Evidence link:** User can click to see exact lab table and date
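The deterministic half of this check (MedGemma extracts, plain code verifies) can be sketched as follows; `check_recency_criterion` is a hypothetical helper name, and the tri-state return mirrors the system's "unknown, never silently met" policy:

```python
from datetime import date

def check_recency_criterion(value: float, threshold: float,
                            measured: date, today: date,
                            window_days: int = 14) -> str:
    """Verify a lab criterion such as 'ANC >= 1.5 x 10^9/L within 14 days'.

    Returns 'MET', 'NOT_MET', or 'UNKNOWN'. A stale measurement is UNKNOWN,
    not MET: the value may have changed since it was drawn.
    """
    age_days = (today - measured).days
    if age_days < 0 or age_days > window_days:
        return "UNKNOWN"  # measurement outside the required window
    return "MET" if value >= threshold else "NOT_MET"

# The Use Case 1 example: ANC 1.8 drawn 2026-01-28, checked 2026-02-04.
status = check_recency_criterion(1.8, 1.5, date(2026, 1, 28), date(2026, 2, 4))
# status == "MET": 7 days old (within 14), value above threshold
```

Keeping the date arithmetic outside the model also means the same extraction stays valid as time passes; only the cheap verification step reruns.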
### Use Case 2: Multimodal Imaging Context
**Challenge:** Criterion requires "No active CNS metastases"
- **MedGemma reads:** Brain MRI report text: *"Stable 3mm left frontal lesion, no enhancement, likely scarring from prior SRS"*
- **System interprets:** "Stable" + "no enhancement" + "scarring" → Likely inactive → Flags as ⚠️ UNKNOWN (requires clinician confirmation)
- **Evidence link:** Highlights report section for doctor review
### Use Case 3: Treatment Line Reconstruction
**Challenge:** Criterion excludes "Prior immune checkpoint inhibitor therapy"
- **MedGemma reconstructs:** From medication list and notes → Patient received Pembrolizumab 2024-06 to 2024-11
- **System verifies:** Prior checkpoint inhibitor confirmed → ✗ EXCLUDED
- **Evidence link:** Shows medication timeline with dates and sources
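All three use cases share one output shape: a per-criterion decision that must carry an evidence pointer, and that degrades to UNKNOWN rather than guessing. A sketch of that record, with illustrative field names (the real schema may differ):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CriterionDecision:
    nct_id: str                    # trial identifier, e.g. "NCT12345678"
    criterion: str                 # verbatim criterion text from the protocol
    status: str                    # "MET" | "NOT_MET" | "UNKNOWN"
    doc_id: Optional[str]          # patient document the evidence came from
    evidence_span: Optional[str]   # quoted snippet supporting the decision

    def __post_init__(self):
        # Guardrail: a definite decision without evidence is downgraded to
        # UNKNOWN, so the UI can never render "met/not met" without a
        # clickable pointer back to the source document.
        if self.status != "UNKNOWN" and not (self.doc_id and self.evidence_span):
            self.status = "UNKNOWN"
```

Enforcing the evidence requirement in the data type, rather than in prompt instructions, is what makes the 100%-explainability target in Section 7 checkable.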
---
## 6. PoC Scope & Data Strategy
### In Scope (3-Month PoC)
- **Disease:** NSCLC only (complex biomarkers, high trial volume)
- **Data:** Synthetic patients only (no real PHI)
- **Deliverables:**
- Working web prototype (video demo)
- Experimental validation on TREC benchmarks
- Technical write-up + public code repo
### Data Sources
**Patients (Synthetic):**
- Structured ground truth: Synthea FHIR (500 NSCLC patients)
- Unstructured artifacts: LLM-generated clinic letters + lab PDFs with controlled noise (abbreviations, OCR errors, missing values)
**Trials (Real):**
- ClinicalTrials.gov live API via MCP wrapper
- Focus on NSCLC recruiting trials in Europe + US
**Benchmarking:**
- TREC Clinical Trials Track 2021/2022 (75 patient topics + judged relevance)
- Custom criterion-extraction test set (labeled synthetic reports)
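The "controlled noise" step can be as simple as deterministic abbreviation substitution plus seeded character-level OCR corruption. A sketch, where the substitution tables are small illustrative examples rather than the full pipeline:

```python
import random

# Clinical abbreviations and common OCR confusions (illustrative subsets).
ABBREVIATIONS = {"carcinoma": "ca.", "metastases": "mets", "patient": "pt"}
OCR_CONFUSIONS = {"0": "O", "1": "l", "5": "S"}

def add_noise(text: str, ocr_rate: float = 0.05, seed: int = 0) -> str:
    """Degrade clean synthetic text toward realistic scanned-report quality."""
    rng = random.Random(seed)  # seeded so each synthetic PDF is reproducible
    for full, abbrev in ABBREVIATIONS.items():
        text = text.replace(full, abbrev)
    chars = [
        OCR_CONFUSIONS[c] if c in OCR_CONFUSIONS and rng.random() < ocr_rate else c
        for c in text
    ]
    return "".join(chars)
```

Seeding the noise means every corrupted document has a clean ground-truth twin, which is what makes the field-level extraction F1 in Section 7 measurable.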
---
## 7. Success Metrics & Evaluation Plan
### Model Performance
| Metric | Target | Baseline | Method |
|--------|--------|----------|--------|
| **MedGemma Extraction F1** | ≥0.85 | Gemini-only: 0.65-0.75 | Field-level (stage, ECOG, biomarkers, labs) on labeled synthetic reports |
| **Trial Retrieval Recall@50** | ≥0.75 | BM25: ~0.60 | TREC 2021 patient topics |
| **Trial Ranking NDCG@10** | ≥0.60 | Non-LLM baseline: ~0.45 | TREC judged relevance |
| **Criterion Decision Accuracy** | ≥0.85 | Rule-based: ~0.70 | Per-criterion classification on synthetic patient-trial pairs |
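The two retrieval metrics can be computed directly from graded TREC judgments. A minimal sketch using the standard DCG formula with linear gains (some evaluations use exponential gains, 2^g - 1; either works as long as system and ideal rankings use the same formula):

```python
import math

def dcg_at_k(gains, k):
    """Discounted cumulative gain for a ranked list of relevance grades."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(gains, k=10):
    """NDCG@k: DCG of the system ranking over DCG of the ideal ranking."""
    ideal_dcg = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0

def recall_at_k(retrieved_ids, relevant_ids, k=50):
    """Fraction of the relevant trials that appear in the top-k results."""
    return len(set(retrieved_ids[:k]) & set(relevant_ids)) / max(len(relevant_ids), 1)
```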
### Product Quality
- **Latency:** <15s from upload to first match results
- **Explainability:** 100% of "met/not met" decisions must include evidence pointer (trial text + patient doc ID)
- **Cost:** <$0.50 per patient session (token + GPU usage)
### UX Validation (Small Study)
- Task completion: Can lay users identify ≥1 plausible trial from shortlist?
- Explanation clarity: SUS-style usability score ≥70
- Reading level: B1/8th-grade equivalent (Flesch-Kincaid)
---
## 8. Impact Potential
### If PoC Succeeds (Quantified)
**Near-term (PoC phase):**
- Demonstrate 15-25% relative improvement in ranking quality (NDCG) vs non-LLM baselines on TREC benchmarks
- Show multimodal extraction advantage: MedGemma F1 ≥0.10 higher than Gemini-only on medical fields
**Post-PoC (Real-world projection):**
- **Patient impact:** Literature suggests automated tools can surface 20-30% more eligible trials than manual search; NSCLC patients often face 50+ active trials yet typically hear about only 2-3 from their oncologist, so a patient-facing copilot could substantially widen that funnel
- **Clinician impact:** Trial coordinators report spending 2-4 hours per patient on manual screening; if the tool pre-screens with 85% sensitivity, it could reduce manual verification effort by roughly 60%
- **Trial enrollment:** Even a 10% increase in eligible patient identification could improve trial recruitment timelines (major pharma pain point)
---
## 9. Risks & Mitigations
| Risk | Mitigation |
|------|-----------|
| **Synthetic data too clean** | Add controlled noise to PDFs (OCR errors, abbreviations); validate against TREC which uses realistic synthetic cases |
| **MedGemma hallucination on edge cases** | Implement evidence-pointer system (every decision must cite doc ID + span); flag low-confidence as "unknown" not "met" |
| **API rate limits** | Cache trial protocols; batch requests during search refinement |
| **Regulatory misunderstanding** | Explicit "information only, not medical advice" framing throughout UI; follow MedGemma model card guidance on validation/adaptation |
---
## 10. Deliverables for HAI-DEF Submission
### Video Demo (~5-7 min)
- Patient persona introduction
- Upload → extraction visualization (showing MedGemma in action)
- Agentic search loop (showing query refinement)
- Match results with traffic-light eligibility cards
- Gap-filling iteration (upload biomarker → new matches)
- "Share with doctor" packet generation
### Technical Write-up
1. Problem + why HAI-DEF models
2. Architecture diagram (Parlant journey + MedGemma + Gemini + MCP)
3. Data generation pipeline
4. Experiments: extraction, retrieval, ranking (tables + ablations)
5. Limitations + path to real PHI deployment
### Code Repository
- `data/generate_synthetic_patients.py`
- `data/generate_noisy_pdfs.py`
- `matching/medgemma_extractor.py`
- `matching/agentic_search.py` (Parlant + Gemini + MCP)
- `evaluation/run_trec_benchmark.py`
- Clear README with one-command reproducibility
---
## 11. Why This Wins HAI-DEF
### Effective Use of Models (20%)
✓ MedGemma as primary clinical understanding engine (extraction + multimodal)
✓ Concrete demos showing where non-HAI-DEF models fail (extraction accuracy gaps)
✓ Plan for task-specific evaluation showing measurable improvement
### Problem Domain (15%)
✓ Clear unmet need (low trial enrollment, manual screening burden)
✓ Patient-centric storytelling ("Anna's journey")
✓ Evidence-based magnitude (enrollment stats, screening time data)
### Impact Potential (15%)
✓ Quantified near-term (benchmark improvements) and long-term (enrollment lift) impact
✓ Clear calculation logic grounded in literature
### Product Feasibility (20%)
✓ Detailed technical architecture (agentic search innovation)
✓ Realistic synthetic data strategy
✓ Concrete evaluation plan with baselines
✓ Deployment considerations (latency, cost, safety)
### Execution & Communication (30%)
✓ Cohesive narrative across video + write-up + code
✓ Reproducible experiments
✓ Clear explanation of design choices
✓ Professional polish (evidence pointers, explanations, UX details)
---
**Timeline:** 3 months to PoC demo ready for HAI-DEF submission.
**Team needs:** 1 ML engineer (MedGemma fine-tuning + evaluation), 1 full-stack engineer (web app + Parlant orchestration), 1 CPO (coordination + submission materials).