# HAI-DEF Pitch: MedGemma Match – Patient Trial Copilot
**PoC Goal:** Demonstrate MedGemma + Gemini 3 Pro + Parlant agentic architecture for patient-facing clinical trial matching with **explainable eligibility reasoning** and **iterative gap-filling**.
---
## 1. Problem & Unmet Need
### The Challenge
- **Low trial participation:** <5% of adult cancer patients enroll in clinical trials despite potential eligibility
- **Complex eligibility criteria:** Free-text criteria mix demographics, biomarkers, labs, imaging findings, and treatment history
- **Patient barrier:** Patients receive PDFs/reports but have no way to understand which trials fit their situation
- **Manual screening burden:** Clinicians spend hours per patient manually reviewing eligibility; automated tools show mixed real-world performance
### Why AI? Why Now?
- Eligibility criteria require synthesis across multiple document types (pathology, labs, imaging, treatment history), which is impossible with keyword search alone
- Recent LLM-based matching systems (TrialGPT, PRISM) show promise but lack patient-centric design and multimodal medical understanding
- HAI-DEF open-weight health models enable privacy-preserving deployment with medical domain expertise
---
## 2. Solution: MedGemma as Clinical Understanding Engine
### Core Concept
**"Agentic Search + Multimodal Extraction"** replacing traditional vector-RAG approaches.
**Architecture:**
- **MedGemma (HAI-DEF):** Extracts structured clinical facts from messy PDFs/reports + understands medical imaging contexts
- **Gemini 3 Pro:** Orchestrates agentic search through ClinicalTrials.gov API with iterative query refinement
- **Parlant:** Enforces state machine (search → filter → verify) and prevents parameter hallucination
- **ClinicalTrials MCP:** Structured API wrapper for trials data (no vector DB needed)
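The search → filter → verify journey that Parlant enforces can be sketched as a minimal state machine. This is an illustrative sketch only, not Parlant's actual API; the `Stage` enum and `advance` function are hypothetical names:

```python
from enum import Enum, auto

class Stage(Enum):
    SEARCH = auto()
    FILTER = auto()
    VERIFY = auto()
    DONE = auto()

# Allowed transitions: the agent may never skip ahead (e.g. verify
# eligibility before a search has produced candidate trials).
TRANSITIONS = {
    Stage.SEARCH: {Stage.SEARCH, Stage.FILTER},  # refinement loops stay in SEARCH
    Stage.FILTER: {Stage.SEARCH, Stage.VERIFY},  # too few results -> back to SEARCH
    Stage.VERIFY: {Stage.DONE},
}

def advance(current: Stage, requested: Stage) -> Stage:
    """Reject any transition the journey does not allow."""
    if requested not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current.name} -> {requested.name}")
    return requested
```

The point of the guardrail is that a hallucinated "jump straight to verification" tool call fails loudly instead of silently producing an unverified match.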
### Why MedGemma is Central (Not Replaceable)
1. **Multimodal medical reasoning:** Designed for radiology reports, pathology, labs, where generic LLMs are weaker
2. **Domain-aligned extraction:** Medical entity recognition with units, dates, and clinical context preservation
3. **Open weights:** Enables VPC deployment for future PHI handling (vs closed-weight alternatives)
4. **Health-safety guardrails:** Model card emphasizes validation/adaptation patterns we follow
---
## 3. User Journey (Patient-Centric)
### Target User (PoC Persona)
**"Anna"** – 52-year-old NSCLC patient in Berlin with PDFs from her oncologist but no trial navigation support.
### Journey Flow
1. **Upload Documents** → Clinic letter, pathology report, lab results (synthetic PDFs in PoC)
2. **MedGemma Extraction** → System builds "My Clinical Profile (draft)": Stage IVA, EGFR status unknown, ECOG 1
3. **Agentic Search** → Gemini queries ClinicalTrials.gov via MCP:
   - Initial: `condition=NSCLC, location=DE, status=RECRUITING, keywords=EGFR` → 47 results
   - Refines: Adds `phase=PHASE3` → 12 results
   - Reads summaries, filters to 5 relevant trials
4. **Eligibility Analysis** → For each trial, MedGemma evaluates criteria against extracted facts
5. **Gap Identification** → System highlights: *"You'd likely qualify IF you had an EGFR mutation test"*
6. **Iteration** → Anna uploads biomarker report → System re-matches → 3 new trials appear
7. **Share with Doctor** → Generate clinician packet with evidence-linked eligibility ledger
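Step 3 amounts to translating extracted anchors into API parameters. A minimal sketch of that translation, with no network call: the base URL matches the public ClinicalTrials.gov v2 API, but the individual `query.*`/`filter.*` parameter names here (especially `filter.phase`) are assumptions that should be checked against the current API documentation:

```python
from typing import Optional

# Public ClinicalTrials.gov v2 endpoint; parameter names below are
# illustrative and should be verified against the live API docs.
BASE_URL = "https://clinicaltrials.gov/api/v2/studies"

def build_query(condition: str, country: str,
                keywords: Optional[str] = None,
                phase: Optional[str] = None) -> dict:
    params = {
        "query.cond": condition,
        "query.locn": country,
        "filter.overallStatus": "RECRUITING",
        "pageSize": 50,
    }
    if keywords:
        params["query.term"] = keywords
    if phase:
        # Refinement step: narrow by phase when the first pass is too broad.
        params["filter.phase"] = phase  # hypothetical parameter name
    return params

# The journey above: a broad initial query, then the phase-narrowed refinement.
first = build_query("NSCLC", "Germany", keywords="EGFR")
refined = build_query("NSCLC", "Germany", keywords="EGFR", phase="PHASE3")
```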
### Key Differentiator: The "Gap Analysis"
- We don't just say "No Match"
- We say: **"You would match NCT12345 IF you had: a recent brain MRI showing no active CNS disease"**
- This transforms "rejection" into "actionable next steps"
---
## 4. Technical Innovation: Smart Agentic Search (No Vector DB)
### Traditional Approach (What We're *Not* Doing)
```
Patient text → Embeddings → Vector similarity search →
Retrieve top-K trials → LLM re-ranks
```
**Problem:** Vector search is "dumb" about structured constraints (Phase, Location, Status) and negations.
### Our Approach: Iterative Query Refinement
```
MedGemma extracts "Search Anchors" (Condition, Biomarkers, Location) →
Gemini formulates API query with filters →
ClinicalTrials MCP returns results →
Too many (>50)? → Parlant enforces refinement (add phase/keywords)
Too few (0)? → Parlant enforces relaxation (remove city filter)
Right size (10-30)? → Gemini reads summaries in 2M context window →
Shortlist 5 NCT IDs → Deep eligibility verification with MedGemma
```
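The resizing loop above can be sketched in a few lines. This is a minimal sketch, not the production orchestration; `run_query`, `refine`, and `relax` are placeholders for the real MCP/agent tool calls:

```python
MAX_ROUNDS = 5  # bound the loop so refinement can never spin forever

def find_shortlist(query, run_query, refine, relax):
    """Iteratively resize the result set before deep verification.

    `run_query` executes the query against the trials API and returns a
    list of trial summaries; `refine` tightens the query (add phase or
    keywords) and `relax` loosens it (drop a location filter). All three
    are placeholders for the real agent tools.
    """
    results = []
    for _ in range(MAX_ROUNDS):
        results = run_query(query)
        if len(results) > 50:      # too many: tighten the query
            query = refine(query)
        elif len(results) == 0:    # too few: loosen the query
            query = relax(query)
        else:                      # right-sized: hand off for summary reading
            return results
    return results  # give up after MAX_ROUNDS and return the last attempt
```

Because every round is an explicit function call, each step ("I searched X, got Y results, refined to Z") can be logged verbatim for the transparency claim above.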
**Why This is Better:**
- **Precision:** Leverages native API filters (Phase, Status, Location) that vectors can't handle
- **Transparency:** Every search step is logged and explainable ("I searched X, got Y results, refined to Z")
- **Feasibility:** No vector DB infrastructure; uses live API
- **Showcases Gemini reasoning:** Demonstrates multi-step planning vs one-shot retrieval
---
## 5. MedGemma Showcase Moments (HAI-DEF "Fullest Potential")
### Use Case 1: Temporal Lab Extraction
**Challenge:** Criterion requires "ANC ≥ 1.5 × 10⁹/L within 14 days of enrollment"
- **MedGemma extracts:** Value=1.8, Units=10⁹/L, Date=2026-01-28, DocID=labs_jan.pdf
- **System verifies:** Current date Feb 4 → 7 days ago → ✓ MEETS criterion
- **Evidence link:** User can click to see exact lab table and date
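The deterministic half of this check (MedGemma extracts, plain code verifies) can be sketched as follows; `check_recency_criterion` is a hypothetical helper name, and the tri-state return mirrors the system's "unknown, never silently met" policy:

```python
from datetime import date

def check_recency_criterion(value: float, threshold: float,
                            measured: date, today: date,
                            window_days: int = 14) -> str:
    """Verify a lab criterion such as 'ANC >= 1.5 x 10^9/L within 14 days'.

    Returns 'MET', 'NOT_MET', or 'UNKNOWN'. A stale measurement is UNKNOWN,
    not MET: the value may have changed since it was drawn.
    """
    age_days = (today - measured).days
    if age_days < 0 or age_days > window_days:
        return "UNKNOWN"  # measurement outside the required window
    return "MET" if value >= threshold else "NOT_MET"

# The Use Case 1 example: ANC 1.8 drawn 2026-01-28, checked 2026-02-04.
status = check_recency_criterion(1.8, 1.5, date(2026, 1, 28), date(2026, 2, 4))
# status == "MET": 7 days old (within 14), value above threshold
```

Keeping the date arithmetic outside the model also means the same extraction stays valid as time passes; only the cheap verification step reruns.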
### Use Case 2: Multimodal Imaging Context
**Challenge:** Criterion requires "No active CNS metastases"
- **MedGemma reads:** Brain MRI report text: *"Stable 3mm left frontal lesion, no enhancement, likely scarring from prior SRS"*
- **System interprets:** "Stable" + "no enhancement" + "scarring" → Likely inactive → Flags as ⚠️ UNKNOWN (requires clinician confirmation)
- **Evidence link:** Highlights report section for doctor review
### Use Case 3: Treatment Line Reconstruction
**Challenge:** Criterion excludes "Prior immune checkpoint inhibitor therapy"
- **MedGemma reconstructs:** From medication list and notes → Patient received Pembrolizumab 2024-06 to 2024-11
- **System verifies:** Prior checkpoint inhibitor confirmed → ✗ EXCLUDED
- **Evidence link:** Shows medication timeline with dates and sources
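All three use cases share one output shape: a per-criterion decision that must carry an evidence pointer, and that degrades to UNKNOWN rather than guessing. A sketch of that record, with illustrative field names (the real schema may differ):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CriterionDecision:
    nct_id: str                    # trial identifier, e.g. "NCT12345678"
    criterion: str                 # verbatim criterion text from the protocol
    status: str                    # "MET" | "NOT_MET" | "UNKNOWN"
    doc_id: Optional[str]          # patient document the evidence came from
    evidence_span: Optional[str]   # quoted snippet supporting the decision

    def __post_init__(self):
        # Guardrail: a definite decision without evidence is downgraded to
        # UNKNOWN, so the UI can never render "met/not met" without a
        # clickable pointer back to the source document.
        if self.status != "UNKNOWN" and not (self.doc_id and self.evidence_span):
            self.status = "UNKNOWN"
```

Enforcing the evidence requirement in the data type, rather than in prompt instructions, is what makes the 100%-explainability target in Section 7 checkable.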
---
## 6. PoC Scope & Data Strategy
### In Scope (3-Month PoC)
- **Disease:** NSCLC only (complex biomarkers, high trial volume)
- **Data:** Synthetic patients only (no real PHI)
- **Deliverables:**
- Working web prototype (video demo)
- Experimental validation on TREC benchmarks
- Technical write-up + public code repo
### Data Sources
**Patients (Synthetic):**
- Structured ground truth: Synthea FHIR (500 NSCLC patients)
- Unstructured artifacts: LLM-generated clinic letters + lab PDFs with controlled noise (abbreviations, OCR errors, missing values)
**Trials (Real):**
- ClinicalTrials.gov live API via MCP wrapper
- Focus on NSCLC recruiting trials in Europe + US
**Benchmarking:**
- TREC Clinical Trials Track 2021/2022 (75 patient topics + judged relevance)
- Custom criterion-extraction test set (labeled synthetic reports)
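The "controlled noise" step can be as simple as deterministic abbreviation substitution plus seeded character-level OCR corruption. A sketch, where the substitution tables are small illustrative examples rather than the full pipeline:

```python
import random

# Clinical abbreviations and common OCR confusions (illustrative subsets).
ABBREVIATIONS = {"carcinoma": "ca.", "metastases": "mets", "patient": "pt"}
OCR_CONFUSIONS = {"0": "O", "1": "l", "5": "S"}

def add_noise(text: str, ocr_rate: float = 0.05, seed: int = 0) -> str:
    """Degrade clean synthetic text toward realistic scanned-report quality."""
    rng = random.Random(seed)  # seeded so each synthetic PDF is reproducible
    for full, abbrev in ABBREVIATIONS.items():
        text = text.replace(full, abbrev)
    chars = [
        OCR_CONFUSIONS[c] if c in OCR_CONFUSIONS and rng.random() < ocr_rate else c
        for c in text
    ]
    return "".join(chars)
```

Seeding the noise means every corrupted document has a clean ground-truth twin, which is what makes the field-level extraction F1 in Section 7 measurable.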
---
## 7. Success Metrics & Evaluation Plan
### Model Performance
| Metric | Target | Baseline | Method |
|--------|--------|----------|--------|
| **MedGemma Extraction F1** | ≥0.85 | Gemini-only: 0.65-0.75 | Field-level (stage, ECOG, biomarkers, labs) on labeled synthetic reports |
| **Trial Retrieval Recall@50** | ≥0.75 | BM25: ~0.60 | TREC 2021 patient topics |
| **Trial Ranking NDCG@10** | ≥0.60 | Non-LLM baseline: ~0.45 | TREC judged relevance |
| **Criterion Decision Accuracy** | ≥0.85 | Rule-based: ~0.70 | Per-criterion classification on synthetic patient-trial pairs |
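The two retrieval metrics can be computed directly from graded TREC judgments. A minimal sketch using the standard DCG formula with linear gains (some evaluations use exponential gains, 2^g - 1; either works as long as system and ideal rankings use the same formula):

```python
import math

def dcg_at_k(gains, k):
    """Discounted cumulative gain for a ranked list of relevance grades."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(gains, k=10):
    """NDCG@k: DCG of the system ranking over DCG of the ideal ranking."""
    ideal_dcg = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0

def recall_at_k(retrieved_ids, relevant_ids, k=50):
    """Fraction of the relevant trials that appear in the top-k results."""
    return len(set(retrieved_ids[:k]) & set(relevant_ids)) / max(len(relevant_ids), 1)
```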
### Product Quality
- **Latency:** <15s from upload to first match results
- **Explainability:** 100% of "met/not met" decisions must include evidence pointer (trial text + patient doc ID)
- **Cost:** <$0.50 per patient session (token + GPU usage)
### UX Validation (Small Study)
- Task completion: Can lay users identify ≥1 plausible trial from shortlist?
- Explanation clarity: SUS-style usability score ≥70
- Reading level: B1/8th-grade equivalent (Flesch-Kincaid)
---
## 8. Impact Potential
### If PoC Succeeds (Quantified)
**Near-term (PoC phase):**
- Demonstrate 15-25% relative improvement in ranking quality (NDCG) vs non-LLM baselines on TREC benchmarks
- Show multimodal extraction advantage: MedGemma F1 ≥0.10 higher than Gemini-only on medical fields
**Post-PoC (Real-world projection):**
- **Patient impact:** Literature suggests automated tools can surface 20-30% more eligible trials than manual search; NSCLC patients often face 50+ active trials yet typically hear about only 2-3 from their oncologist, so a patient-facing copilot could substantially widen that funnel
- **Clinician impact:** Trial coordinators report spending 2-4 hours per patient on manual screening; if the tool pre-screens with 85% sensitivity, it could reduce manual verification effort by roughly 60%
- **Trial enrollment:** Even a 10% increase in eligible patient identification could improve trial recruitment timelines (major pharma pain point)
---
## 9. Risks & Mitigations
| Risk | Mitigation |
|------|-----------|
| **Synthetic data too clean** | Add controlled noise to PDFs (OCR errors, abbreviations); validate against TREC which uses realistic synthetic cases |
| **MedGemma hallucination on edge cases** | Implement evidence-pointer system (every decision must cite doc ID + span); flag low-confidence as "unknown" not "met" |
| **API rate limits** | Cache trial protocols; batch requests during search refinement |
| **Regulatory misunderstanding** | Explicit "information only, not medical advice" framing throughout UI; follow MedGemma model card guidance on validation/adaptation |
---
## 10. Deliverables for HAI-DEF Submission
### Video Demo (~5-7 min)
- Patient persona introduction
- Upload → extraction visualization (showing MedGemma in action)
- Agentic search loop (showing query refinement)
- Match results with traffic-light eligibility cards
- Gap-filling iteration (upload biomarker → new matches)
- "Share with doctor" packet generation
### Technical Write-up
1. Problem + why HAI-DEF models
2. Architecture diagram (Parlant journey + MedGemma + Gemini + MCP)
3. Data generation pipeline
4. Experiments: extraction, retrieval, ranking (tables + ablations)
5. Limitations + path to real PHI deployment
### Code Repository
- `data/generate_synthetic_patients.py`
- `data/generate_noisy_pdfs.py`
- `matching/medgemma_extractor.py`
- `matching/agentic_search.py` (Parlant + Gemini + MCP)
- `evaluation/run_trec_benchmark.py`
- Clear README with one-command reproducibility
---
## 11. Why This Wins HAI-DEF
### Effective Use of Models (20%)
✓ MedGemma as primary clinical understanding engine (extraction + multimodal)
✓ Concrete demos showing where non-HAI-DEF models fail (extraction accuracy gaps)
✓ Plan for task-specific evaluation showing measurable improvement
### Problem Domain (15%)
✓ Clear unmet need (low trial enrollment, manual screening burden)
✓ Patient-centric storytelling ("Anna's journey")
✓ Evidence-based magnitude (enrollment stats, screening time data)
### Impact Potential (15%)
✓ Quantified near-term (benchmark improvements) and long-term (enrollment lift) impact
✓ Clear calculation logic grounded in literature
### Product Feasibility (20%)
✓ Detailed technical architecture (agentic search innovation)
✓ Realistic synthetic data strategy
✓ Concrete evaluation plan with baselines
✓ Deployment considerations (latency, cost, safety)
### Execution & Communication (30%)
✓ Cohesive narrative across video + write-up + code
✓ Reproducible experiments
✓ Clear explanation of design choices
✓ Professional polish (evidence pointers, explanations, UX details)
---
**Timeline:** 3 months to PoC demo ready for HAI-DEF submission.
**Team needs:** 1 ML engineer (MedGemma fine-tuning + evaluation), 1 full-stack engineer (web app + Parlant orchestration), 1 CPO (coordination + submission materials).