
HAI-DEF Pitch: MedGemma Match – Patient Trial Copilot

PoC Goal: Demonstrate MedGemma + Gemini 3 Pro + Parlant agentic architecture for patient-facing clinical trial matching with explainable eligibility reasoning and iterative gap-filling.


1. Problem & Unmet Need

The Challenge

  • Low trial participation: <5% of adult cancer patients enroll in clinical trials despite potential eligibility
  • Complex eligibility criteria: Free-text criteria mix demographics, biomarkers, labs, imaging findings, and treatment history
  • Patient barrier: Patients receive PDFs/reports but have no way to understand which trials fit their situation
  • Manual screening burden: Clinicians spend hours per patient manually reviewing eligibility; automated tools show mixed real-world performance

Why AI? Why Now?

  • Eligibility criteria require synthesis across multiple document types (pathology, labs, imaging, treatment history), which is impossible with keyword search alone
  • Recent LLM-based matching systems (TrialGPT, PRISM) show promise but lack patient-centric design and multimodal medical understanding
  • HAI-DEF open-weight health models enable privacy-preserving deployment with medical domain expertise

2. Solution: MedGemma as Clinical Understanding Engine

Core Concept

"Agentic Search + Multimodal Extraction" replaces traditional vector-RAG approaches.

Architecture:

  • MedGemma (HAI-DEF): Extracts structured clinical facts from messy PDFs/reports + understands medical imaging contexts
  • Gemini 3 Pro: Orchestrates agentic search through ClinicalTrials.gov API with iterative query refinement
  • Parlant: Enforces a state machine (search → filter → verify) and prevents parameter hallucination
  • ClinicalTrials MCP: Structured API wrapper for trials data (no vector DB needed)
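The search → filter → verify state machine that Parlant enforces could be sketched as follows; the `Stage` enum and transition table are illustrative assumptions, not Parlant's actual API:

```python
from enum import Enum, auto

class Stage(Enum):
    SEARCH = auto()
    FILTER = auto()
    VERIFY = auto()
    DONE = auto()

# Allowed transitions; anything else is rejected, which is how the
# orchestrator prevents the LLM from skipping the verification step.
TRANSITIONS = {
    Stage.SEARCH: {Stage.SEARCH, Stage.FILTER},  # may re-search (refine/relax)
    Stage.FILTER: {Stage.SEARCH, Stage.VERIFY},  # may fall back to search
    Stage.VERIFY: {Stage.DONE},
}

def advance(current: Stage, requested: Stage) -> Stage:
    """Return the next stage, refusing illegal jumps (e.g. SEARCH -> DONE)."""
    if requested not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current.name} -> {requested.name}")
    return requested
```

The point of making transitions explicit is that the model cannot hallucinate its way past eligibility verification: `advance(Stage.SEARCH, Stage.DONE)` raises instead of silently succeeding.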

Why MedGemma is Central (Not Replaceable)

  1. Multimodal medical reasoning: Designed for radiology reports, pathology, and labs, areas where generic LLMs are weaker
  2. Domain-aligned extraction: Medical entity recognition with units, dates, and clinical context preservation
  3. Open weights: Enables VPC deployment for future PHI handling (vs closed-weight alternatives)
  4. Health-safety guardrails: Model card emphasizes validation/adaptation patterns we follow

3. User Journey (Patient-Centric)

Target User (PoC Persona)

"Anna" – 52-year-old NSCLC patient in Berlin with PDFs from her oncologist but no trial navigation support.

Journey Flow

  1. Upload Documents → Clinic letter, pathology report, lab results (synthetic PDFs in the PoC)
  2. MedGemma Extraction → System builds "My Clinical Profile (draft)": Stage IVa, EGFR status unknown, ECOG 1
  3. Agentic Search → Gemini queries ClinicalTrials.gov via MCP:
    • Initial: condition=NSCLC, location=DE, status=RECRUITING, keywords=EGFR → 47 results
    • Refines: adds phase=PHASE3 → 12 results
    • Reads summaries, filters to 5 relevant trials
  4. Eligibility Analysis → For each trial, MedGemma evaluates criteria against extracted facts
  5. Gap Identification → System highlights: "You'd likely qualify IF you had an EGFR mutation test"
  6. Iteration → Anna uploads a biomarker report → system re-matches → 3 new trials appear
  7. Share with Doctor → Generate a clinician packet with an evidence-linked eligibility ledger

Key Differentiator: The "Gap Analysis"

  • We don't just say "No Match"
  • We say: "You would match NCT12345 IF you had: recent brain MRI showing no active CNS disease"
  • This transforms "rejection" into "actionable next steps"

4. Technical Innovation: Smart Agentic Search (No Vector DB)

Traditional Approach (What We're Not Doing)

Patient text → Embeddings → Vector similarity search →
Retrieve top-K trials → LLM re-ranks

Problem: Vector search is "dumb" about structured constraints (Phase, Location, Status) and negations.

Our Approach: Iterative Query Refinement

MedGemma extracts "Search Anchors" (Condition, Biomarkers, Location) →
Gemini formulates API query with filters →
ClinicalTrials MCP returns results →
Too many (>50)? → Parlant enforces refinement (add phase/keywords)
Too few (0)? → Parlant enforces relaxation (remove city filter)
Right size (10-30)? → Gemini reads summaries in 2M context window →
Shortlist 5 NCT IDs → Deep eligibility verification with MedGemma
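The refinement loop above can be sketched in Python. The thresholds follow the flow above; the `phase`/`city` knobs and the `run_query` callable are illustrative stand-ins for the MCP call, not actual ClinicalTrials.gov parameter names:

```python
def refine_search(query: dict, run_query, max_steps: int = 5) -> dict:
    """Iteratively adjust a trial-search query until the result count
    lands in a reviewable band, with a bounded step budget (which
    Parlant would enforce in the real system)."""
    for _ in range(max_steps):
        n = run_query(query)
        if n > 30:                        # too many results: tighten
            if "phase" not in query:
                query["phase"] = "PHASE3"
            else:
                break                     # nothing left to tighten
        elif n == 0:                      # zero results: relax
            if "city" in query:
                del query["city"]         # drop the narrowest filter first
            else:
                break                     # nothing left to relax
        else:
            break                         # reviewable shortlist: stop
    return query
```

Every iteration is a loggable step ("searched with X, got N results, added phase filter"), which is what makes the search transparent rather than a one-shot retrieval.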

Why This is Better:

  • Precision: Leverages native API filters (Phase, Status, Location) that vectors can't handle
  • Transparency: Every search step is logged and explainable ("I searched X, got Y results, refined to Z")
  • Feasibility: No vector DB infrastructure; uses live API
  • Showcases Gemini reasoning: Demonstrates multi-step planning vs one-shot retrieval

5. MedGemma Showcase Moments (HAI-DEF "Fullest Potential")

Use Case 1: Temporal Lab Extraction

Challenge: Criterion requires "ANC ≥ 1.5 × 10⁹/L within 14 days of enrollment"

  • MedGemma extracts: Value=1.8, Units=10⁹/L, Date=2026-01-28, DocID=labs_jan.pdf
  • System verifies: Current date Feb 4 → 7 days ago → ✓ MEETS criterion
  • Evidence link: User can click to see exact lab table and date
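Once MedGemma has extracted the value and date, the verification step reduces to date arithmetic. A minimal sketch, with the three-way MET/NOT_MET/UNKNOWN verdict mirroring the ledger states used elsewhere in this doc:

```python
from datetime import date

def check_lab_window(value: float, threshold: float,
                     observed: date, today: date,
                     window_days: int = 14) -> str:
    """Check a lab criterion like 'ANC >= 1.5 within 14 days'.
    A stale (or future-dated) result yields UNKNOWN rather than a
    guess, so the gap analysis can ask for a fresh draw."""
    age = (today - observed).days
    if age < 0 or age > window_days:
        return "UNKNOWN"
    return "MET" if value >= threshold else "NOT_MET"
```

With the example above: a value of 1.8 observed 2026-01-28, checked on 2026-02-04, is 7 days old and above threshold, so the criterion is MET.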

Use Case 2: Multimodal Imaging Context

Challenge: Criterion requires "No active CNS metastases"

  • MedGemma reads: Brain MRI report text: "Stable 3mm left frontal lesion, no enhancement, likely scarring from prior SRS"
  • System interprets: "Stable" + "no enhancement" + "scarring" → likely inactive → flags as ⚠️ UNKNOWN (requires clinician confirmation)
  • Evidence link: Highlights report section for doctor review
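The conservative "flag as UNKNOWN" policy could look like this; the marker vocabularies are illustrative assumptions, not a validated clinical lexicon:

```python
# Toy marker sets; a real system would use a curated radiology lexicon.
ACTIVE_MARKERS = {"new lesion", "enhancing", "progression", "edema"}
INACTIVE_MARKERS = {"stable", "no enhancement", "scarring"}

def classify_cns_status(report_text: str) -> str:
    """Conservatively map an MRI report to a criterion verdict.
    Textual evidence of inactivity is never auto-promoted to MET;
    it stays UNKNOWN so a clinician must confirm."""
    text = report_text.lower()
    if any(marker in text for marker in ACTIVE_MARKERS):
        return "NOT_MET"   # active disease: clear exclusion
    return "UNKNOWN"       # likely-inactive or ambiguous: flag for review
```

The asymmetry is deliberate: a textual hint of active disease can exclude, but a textual hint of inactivity only ever produces a flag for clinician review.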

Use Case 3: Treatment Line Reconstruction

Challenge: Criterion excludes "Prior immune checkpoint inhibitor therapy"

  • MedGemma reconstructs: From medication list and notes → patient received Pembrolizumab 2024-06 to 2024-11
  • System verifies: prior checkpoint inhibitor therapy confirmed → ✗ EXCLUDED
  • Evidence link: Shows medication timeline with dates and sources
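Once the timeline is reconstructed, the exclusion check is a lookup over it. A sketch; the drug set is a hypothetical hard-coded list (a real system would query a maintained drug ontology such as RxNorm), and the record layout is assumed:

```python
# Hypothetical checkpoint-inhibitor list for illustration only.
CHECKPOINT_INHIBITORS = {"pembrolizumab", "nivolumab", "atezolizumab", "durvalumab"}

def check_prior_ici(medications: list[dict]):
    """Evaluate 'no prior immune checkpoint inhibitor therapy'.
    Returns (verdict, evidence) where evidence is the matching
    medication record, used as the ledger's evidence pointer."""
    for med in medications:
        if med["drug"].lower() in CHECKPOINT_INHIBITORS:
            return "EXCLUDED", med
    return "MET", None
```

Returning the matching record (rather than just a boolean) is what lets the UI render "shows medication timeline with dates and sources" for every decision.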

6. PoC Scope & Data Strategy

In Scope (3-Month PoC)

  • Disease: NSCLC only (complex biomarkers, high trial volume)
  • Data: Synthetic patients only (no real PHI)
  • Deliverables:
    • Working web prototype (video demo)
    • Experimental validation on TREC benchmarks
    • Technical write-up + public code repo

Data Sources

Patients (Synthetic):

  • Structured ground truth: Synthea FHIR (500 NSCLC patients)
  • Unstructured artifacts: LLM-generated clinic letters + lab PDFs with controlled noise (abbreviations, OCR errors, missing values)

Trials (Real):

  • ClinicalTrials.gov live API via MCP wrapper
  • Focus on NSCLC recruiting trials in Europe + US

Benchmarking:

  • TREC Clinical Trials Track 2021/2022 (75 patient topics + judged relevance)
  • Custom criterion-extraction test set (labeled synthetic reports)
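The "controlled noise" for synthetic artifacts could be injected along these lines; the abbreviation and OCR-confusion tables are toy examples, not the project's actual corruption lists:

```python
import random

# Illustrative corruption tables; real OCR confusions and clinical
# abbreviations would come from curated lists.
ABBREV = {"carcinoma": "ca.", "metastases": "mets", "hemoglobin": "Hgb"}
OCR_SWAPS = {"0": "O", "1": "l", "5": "S"}

def add_noise(text: str, ocr_rate: float = 0.05, seed: int = 0) -> str:
    """Inject controlled noise into synthetic report text: clinical
    abbreviations plus character-level OCR-style substitutions.
    Seeded so each corrupted document is reproducible."""
    rng = random.Random(seed)
    for full, short in ABBREV.items():
        text = text.replace(full, short)
    out = []
    for ch in text:
        if ch in OCR_SWAPS and rng.random() < ocr_rate:
            out.append(OCR_SWAPS[ch])
        else:
            out.append(ch)
    return "".join(out)
```

Seeding per document keeps the ground truth aligned with the noisy artifact, so extraction F1 can be scored against the clean Synthea record.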

7. Success Metrics & Evaluation Plan

Model Performance

| Metric | Target | Baseline | Method |
| --- | --- | --- | --- |
| MedGemma Extraction F1 | ≥0.85 | Gemini-only: 0.65-0.75 | Field-level (stage, ECOG, biomarkers, labs) on labeled synthetic reports |
| Trial Retrieval Recall@50 | ≥0.75 | BM25: ~0.60 | TREC 2021 patient topics |
| Trial Ranking NDCG@10 | ≥0.60 | Non-LLM baseline: ~0.45 | TREC judged relevance |
| Criterion Decision Accuracy | ≥0.85 | Rule-based: ~0.70 | Per-criterion classification on synthetic patient-trial pairs |
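The field-level extraction F1 can be scored as micro-F1 over (field, value) pairs; a minimal sketch under the assumption of exact-match scoring (the real harness would also need normalization for units and dates):

```python
def field_f1(predicted: dict, gold: dict) -> float:
    """Micro-F1 over extracted clinical fields (stage, ECOG, biomarkers...).
    A field counts as a true positive only if both name and value match."""
    pred, true = set(predicted.items()), set(gold.items())
    tp = len(pred & true)
    if tp == 0:
        return 0.0  # also covers the empty-extraction case
    precision = tp / len(pred)
    recall = tp / len(true)
    return 2 * precision * recall / (precision + recall)
```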

Product Quality

  • Latency: <15s from upload to first match results
  • Explainability: 100% of "met/not met" decisions must include evidence pointer (trial text + patient doc ID)
  • Cost: <$0.50 per patient session (token + GPU usage)

UX Validation (Small Study)

  • Task completion: Can lay users identify ≥1 plausible trial from the shortlist?
  • Explanation clarity: SUS-style usability score ≥70
  • Reading level: B1/8th-grade equivalent (Flesch-Kincaid)

8. Impact Potential

If PoC Succeeds (Quantified)

Near-term (PoC phase):

  • Demonstrate 15-25% relative improvement in ranking quality (NDCG) vs non-LLM baselines on TREC benchmarks
  • Show multimodal extraction advantage: MedGemma F1 ≥0.10 higher than Gemini-only on medical fields

Post-PoC (Real-world projection):

  • Patient impact: Literature suggests automated tools can surface 20-30% more eligible trials than manual search; NSCLC patients often face 50+ active trials yet typically hear about only 2-3 from their oncologist, so even partial coverage gains are meaningful
  • Clinician impact: Trial coordinators report spending 2-4 hours per patient on manual screening; if our tool pre-screens with 85% sensitivity, it could reduce manual verification effort by ~60%
  • Trial enrollment: Even a 10% increase in eligible patient identification could improve trial recruitment timelines (major pharma pain point)

9. Risks & Mitigations

| Risk | Mitigation |
| --- | --- |
| Synthetic data too clean | Add controlled noise to PDFs (OCR errors, abbreviations); validate against TREC, which uses realistic synthetic cases |
| MedGemma hallucination on edge cases | Evidence-pointer system (every decision must cite doc ID + span); flag low-confidence results as "unknown", not "met" |
| API rate limits | Cache trial protocols; batch requests during search refinement |
| Regulatory misunderstanding | Explicit "information only, not medical advice" framing throughout the UI; follow MedGemma model card guidance on validation/adaptation |
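The protocol-caching mitigation could be as small as a TTL cache keyed by NCT ID. A sketch; `fetch` stands in for the real MCP/API call:

```python
import time

class TrialCache:
    """Tiny TTL cache for trial protocol fetches, so repeated lookups
    during search refinement do not hit ClinicalTrials.gov rate limits."""

    def __init__(self, fetch, ttl_seconds: float = 3600.0):
        self.fetch = fetch            # callable: nct_id -> protocol
        self.ttl = ttl_seconds
        self._store = {}              # nct_id -> (expires_at, protocol)

    def get(self, nct_id: str):
        entry = self._store.get(nct_id)
        if entry and entry[0] > time.time():
            return entry[1]           # fresh cache hit: no network call
        protocol = self.fetch(nct_id) # miss or expired: refetch and store
        self._store[nct_id] = (time.time() + self.ttl, protocol)
        return protocol
```

An hour-long TTL is a reasonable default here since trial protocols change far less often than a single matching session lasts.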

10. Deliverables for HAI-DEF Submission

Video Demo (~5-7 min)

  • Patient persona introduction
  • Upload → extraction visualization (showing MedGemma in action)
  • Agentic search loop (showing query refinement)
  • Match results with traffic-light eligibility cards
  • Gap-filling iteration (upload biomarker → new matches)
  • "Share with doctor" packet generation

Technical Write-up

  1. Problem + why HAI-DEF models
  2. Architecture diagram (Parlant journey + MedGemma + Gemini + MCP)
  3. Data generation pipeline
  4. Experiments: extraction, retrieval, ranking (tables + ablations)
  5. Limitations + path to real PHI deployment

Code Repository

  • data/generate_synthetic_patients.py
  • data/generate_noisy_pdfs.py
  • matching/medgemma_extractor.py
  • matching/agentic_search.py (Parlant + Gemini + MCP)
  • evaluation/run_trec_benchmark.py
  • Clear README with one-command reproducibility

11. Why This Wins HAI-DEF

Effective Use of Models (20%)

✓ MedGemma as primary clinical understanding engine (extraction + multimodal)
✓ Concrete demos showing where non-HAI-DEF models fail (extraction accuracy gaps)
✓ Plan for task-specific evaluation showing measurable improvement

Problem Domain (15%)

✓ Clear unmet need (low trial enrollment, manual screening burden)
✓ Patient-centric storytelling ("Anna's journey")
✓ Evidence-based magnitude (enrollment stats, screening time data)

Impact Potential (15%)

✓ Quantified near-term (benchmark improvements) and long-term (enrollment lift) impact
✓ Clear calculation logic grounded in literature

Product Feasibility (20%)

✓ Detailed technical architecture (agentic search innovation)
✓ Realistic synthetic data strategy
✓ Concrete evaluation plan with baselines
✓ Deployment considerations (latency, cost, safety)

Execution & Communication (30%)

✓ Cohesive narrative across video + write-up + code
✓ Reproducible experiments
✓ Clear explanation of design choices
✓ Professional polish (evidence pointers, explanations, UX details)


Timeline: 3 months to PoC demo ready for HAI-DEF submission.

Team needs: 1 ML engineer (MedGemma fine-tuning + evaluation), 1 full-stack engineer (web app + Parlant orchestration), 1 CPO (coordination + submission materials).