
HAI-DEF Pitch: MedGemma Match – Patient Trial Copilot

PoC Goal: Demonstrate MedGemma + Gemini 3 Pro + Parlant agentic architecture for patient-facing clinical trial matching with explainable eligibility reasoning and iterative gap-filling.


1. Problem & Unmet Need

The Challenge

  • Low trial participation: <5% of adult cancer patients enroll in clinical trials despite potential eligibility
  • Complex eligibility criteria: Free-text criteria mix demographics, biomarkers, labs, imaging findings, and treatment history
  • Patient barrier: Patients receive PDFs/reports but have no way to understand which trials fit their situation
  • Manual screening burden: Clinicians spend hours per patient manually reviewing eligibility; automated tools show mixed real-world performance

Why AI? Why Now?

  • Eligibility criteria require synthesis across multiple document types (pathology, labs, imaging, treatment history), which is impossible with keyword search alone
  • Recent LLM-based matching systems (TrialGPT, PRISM) show promise but lack patient-centric design and multimodal medical understanding
  • HAI-DEF open-weight health models enable privacy-preserving deployment with medical domain expertise

2. Solution: MedGemma as Clinical Understanding Engine

Core Concept

"Agentic Search + Multimodal Extraction" replaces traditional vector-RAG approaches.

Architecture:

  • MedGemma (HAI-DEF): Extracts structured clinical facts from messy PDFs/reports + understands medical imaging contexts
  • Gemini 3 Pro: Orchestrates agentic search through ClinicalTrials.gov API with iterative query refinement
  • Parlant: Enforces a state machine (search → filter → verify) and prevents parameter hallucination
  • ClinicalTrials MCP: Structured API wrapper for trials data (no vector DB needed)
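The search → filter → verify state machine that Parlant enforces could be sketched as follows; the `Stage` enum and transition table are illustrative assumptions, not Parlant's actual API:

```python
from enum import Enum, auto

class Stage(Enum):
    SEARCH = auto()
    FILTER = auto()
    VERIFY = auto()
    DONE = auto()

# Allowed transitions; anything else is rejected, which is how the
# orchestrator prevents the LLM from skipping the verification step.
TRANSITIONS = {
    Stage.SEARCH: {Stage.SEARCH, Stage.FILTER},  # may re-search (refine/relax)
    Stage.FILTER: {Stage.SEARCH, Stage.VERIFY},  # may fall back to search
    Stage.VERIFY: {Stage.DONE},
}

def advance(current: Stage, requested: Stage) -> Stage:
    """Return the next stage, refusing illegal jumps (e.g. SEARCH -> DONE)."""
    if requested not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current.name} -> {requested.name}")
    return requested
```

The point of making transitions explicit is that the model cannot hallucinate its way past eligibility verification: `advance(Stage.SEARCH, Stage.DONE)` raises instead of silently succeeding.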

Why MedGemma is Central (Not Replaceable)

  1. Multimodal medical reasoning: Designed for radiology reports, pathology, and labs, areas where generic LLMs are weaker
  2. Domain-aligned extraction: Medical entity recognition with units, dates, and clinical context preservation
  3. Open weights: Enables VPC deployment for future PHI handling (vs closed-weight alternatives)
  4. Health-safety guardrails: Model card emphasizes validation/adaptation patterns we follow

3. User Journey (Patient-Centric)

Target User (PoC Persona)

"Anna" – 52-year-old NSCLC patient in Berlin with PDFs from her oncologist but no trial navigation support.

Journey Flow

  1. Upload Documents → Clinic letter, pathology report, lab results (synthetic PDFs in the PoC)
  2. MedGemma Extraction → System builds "My Clinical Profile (draft)": Stage IVa, EGFR status unknown, ECOG 1
  3. Agentic Search → Gemini queries ClinicalTrials.gov via MCP:
    • Initial: condition=NSCLC, location=DE, status=RECRUITING, keywords=EGFR → 47 results
    • Refines: adds phase=PHASE3 → 12 results
    • Reads summaries, filters to 5 relevant trials
  4. Eligibility Analysis → For each trial, MedGemma evaluates criteria against extracted facts
  5. Gap Identification → System highlights: "You'd likely qualify IF you had an EGFR mutation test"
  6. Iteration → Anna uploads a biomarker report → system re-matches → 3 new trials appear
  7. Share with Doctor → Generate a clinician packet with an evidence-linked eligibility ledger

Key Differentiator: The "Gap Analysis"

  • We don't just say "No Match"
  • We say: "You would match NCT12345 IF you had: recent brain MRI showing no active CNS disease"
  • This transforms "rejection" into "actionable next steps"

4. Technical Innovation: Smart Agentic Search (No Vector DB)

Traditional Approach (What We're Not Doing)

Patient text → Embeddings → Vector similarity search →
Retrieve top-K trials → LLM re-ranks

Problem: Vector search is "dumb" about structured constraints (Phase, Location, Status) and negations.

Our Approach: Iterative Query Refinement

MedGemma extracts "Search Anchors" (Condition, Biomarkers, Location) →
Gemini formulates API query with filters →
ClinicalTrials MCP returns results →
Too many (>50)? → Parlant enforces refinement (add phase/keywords)
Too few (0)? → Parlant enforces relaxation (remove city filter)
Right size (10-30)? → Gemini reads summaries in 2M context window →
Shortlist 5 NCT IDs → Deep eligibility verification with MedGemma
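The refinement loop above can be sketched in Python. The thresholds follow the flow above; the `phase`/`city` knobs and the `run_query` callable are illustrative stand-ins for the MCP call, not actual ClinicalTrials.gov parameter names:

```python
def refine_search(query: dict, run_query, max_steps: int = 5) -> dict:
    """Iteratively adjust a trial-search query until the result count
    lands in a reviewable band, with a bounded step budget (which
    Parlant would enforce in the real system)."""
    for _ in range(max_steps):
        n = run_query(query)
        if n > 30:                        # too many results: tighten
            if "phase" not in query:
                query["phase"] = "PHASE3"
            else:
                break                     # nothing left to tighten
        elif n == 0:                      # zero results: relax
            if "city" in query:
                del query["city"]         # drop the narrowest filter first
            else:
                break                     # nothing left to relax
        else:
            break                         # reviewable shortlist: stop
    return query
```

Every iteration is a loggable step ("searched with X, got N results, added phase filter"), which is what makes the search transparent rather than a one-shot retrieval.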

Why This is Better:

  • Precision: Leverages native API filters (Phase, Status, Location) that vectors can't handle
  • Transparency: Every search step is logged and explainable ("I searched X, got Y results, refined to Z")
  • Feasibility: No vector DB infrastructure; uses live API
  • Showcases Gemini reasoning: Demonstrates multi-step planning vs one-shot retrieval

5. MedGemma Showcase Moments (HAI-DEF "Fullest Potential")

Use Case 1: Temporal Lab Extraction

Challenge: Criterion requires "ANC ≥ 1.5 × 10⁹/L within 14 days of enrollment"

  • MedGemma extracts: Value=1.8, Units=10⁹/L, Date=2026-01-28, DocID=labs_jan.pdf
  • System verifies: Current date Feb 4 → 7 days ago → ✓ MEETS criterion
  • Evidence link: User can click to see exact lab table and date
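Once MedGemma has extracted the value and date, the verification step reduces to date arithmetic. A minimal sketch, with the three-way MET/NOT_MET/UNKNOWN verdict mirroring the ledger states used elsewhere in this doc:

```python
from datetime import date

def check_lab_window(value: float, threshold: float,
                     observed: date, today: date,
                     window_days: int = 14) -> str:
    """Check a lab criterion like 'ANC >= 1.5 within 14 days'.
    A stale (or future-dated) result yields UNKNOWN rather than a
    guess, so the gap analysis can ask for a fresh draw."""
    age = (today - observed).days
    if age < 0 or age > window_days:
        return "UNKNOWN"
    return "MET" if value >= threshold else "NOT_MET"
```

With the example above: a value of 1.8 observed 2026-01-28, checked on 2026-02-04, is 7 days old and above threshold, so the criterion is MET.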

Use Case 2: Multimodal Imaging Context

Challenge: Criterion requires "No active CNS metastases"

  • MedGemma reads: Brain MRI report text: "Stable 3mm left frontal lesion, no enhancement, likely scarring from prior SRS"
  • System interprets: "Stable" + "no enhancement" + "scarring" → likely inactive → flags as ⚠️ UNKNOWN (requires clinician confirmation)
  • Evidence link: Highlights report section for doctor review
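The conservative "flag as UNKNOWN" policy could look like this; the marker vocabularies are illustrative assumptions, not a validated clinical lexicon:

```python
# Toy marker sets; a real system would use a curated radiology lexicon.
ACTIVE_MARKERS = {"new lesion", "enhancing", "progression", "edema"}
INACTIVE_MARKERS = {"stable", "no enhancement", "scarring"}

def classify_cns_status(report_text: str) -> str:
    """Conservatively map an MRI report to a criterion verdict.
    Textual evidence of inactivity is never auto-promoted to MET;
    it stays UNKNOWN so a clinician must confirm."""
    text = report_text.lower()
    if any(marker in text for marker in ACTIVE_MARKERS):
        return "NOT_MET"   # active disease: clear exclusion
    return "UNKNOWN"       # likely-inactive or ambiguous: flag for review
```

The asymmetry is deliberate: a textual hint of active disease can exclude, but a textual hint of inactivity only ever produces a flag for clinician review.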

Use Case 3: Treatment Line Reconstruction

Challenge: Criterion excludes "Prior immune checkpoint inhibitor therapy"

  • MedGemma reconstructs: From medication list and notes → patient received Pembrolizumab 2024-06 to 2024-11
  • System verifies: prior checkpoint inhibitor therapy confirmed → ✗ EXCLUDED
  • Evidence link: Shows medication timeline with dates and sources
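Once the timeline is reconstructed, the exclusion check is a lookup over it. A sketch; the drug set is a hypothetical hard-coded list (a real system would query a maintained drug ontology such as RxNorm), and the record layout is assumed:

```python
# Hypothetical checkpoint-inhibitor list for illustration only.
CHECKPOINT_INHIBITORS = {"pembrolizumab", "nivolumab", "atezolizumab", "durvalumab"}

def check_prior_ici(medications: list[dict]):
    """Evaluate 'no prior immune checkpoint inhibitor therapy'.
    Returns (verdict, evidence) where evidence is the matching
    medication record, used as the ledger's evidence pointer."""
    for med in medications:
        if med["drug"].lower() in CHECKPOINT_INHIBITORS:
            return "EXCLUDED", med
    return "MET", None
```

Returning the matching record (rather than just a boolean) is what lets the UI render "shows medication timeline with dates and sources" for every decision.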

6. PoC Scope & Data Strategy

In Scope (3-Month PoC)

  • Disease: NSCLC only (complex biomarkers, high trial volume)
  • Data: Synthetic patients only (no real PHI)
  • Deliverables:
    • Working web prototype (video demo)
    • Experimental validation on TREC benchmarks
    • Technical write-up + public code repo

Data Sources

Patients (Synthetic):

  • Structured ground truth: Synthea FHIR (500 NSCLC patients)
  • Unstructured artifacts: LLM-generated clinic letters + lab PDFs with controlled noise (abbreviations, OCR errors, missing values)

Trials (Real):

  • ClinicalTrials.gov live API via MCP wrapper
  • Focus on NSCLC recruiting trials in Europe + US

Benchmarking:

  • TREC Clinical Trials Track 2021/2022 (75 patient topics + judged relevance)
  • Custom criterion-extraction test set (labeled synthetic reports)
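The "controlled noise" for synthetic artifacts could be injected along these lines; the abbreviation and OCR-confusion tables are toy examples, not the project's actual corruption lists:

```python
import random

# Illustrative corruption tables; real OCR confusions and clinical
# abbreviations would come from curated lists.
ABBREV = {"carcinoma": "ca.", "metastases": "mets", "hemoglobin": "Hgb"}
OCR_SWAPS = {"0": "O", "1": "l", "5": "S"}

def add_noise(text: str, ocr_rate: float = 0.05, seed: int = 0) -> str:
    """Inject controlled noise into synthetic report text: clinical
    abbreviations plus character-level OCR-style substitutions.
    Seeded so each corrupted document is reproducible."""
    rng = random.Random(seed)
    for full, short in ABBREV.items():
        text = text.replace(full, short)
    out = []
    for ch in text:
        if ch in OCR_SWAPS and rng.random() < ocr_rate:
            out.append(OCR_SWAPS[ch])
        else:
            out.append(ch)
    return "".join(out)
```

Seeding per document keeps the ground truth aligned with the noisy artifact, so extraction F1 can be scored against the clean Synthea record.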

7. Success Metrics & Evaluation Plan

Model Performance

| Metric | Target | Baseline | Method |
| --- | --- | --- | --- |
| MedGemma Extraction F1 | ≥0.85 | Gemini-only: 0.65-0.75 | Field-level (stage, ECOG, biomarkers, labs) on labeled synthetic reports |
| Trial Retrieval Recall@50 | ≥0.75 | BM25: ~0.60 | TREC 2021 patient topics |
| Trial Ranking NDCG@10 | ≥0.60 | Non-LLM baseline: ~0.45 | TREC judged relevance |
| Criterion Decision Accuracy | ≥0.85 | Rule-based: ~0.70 | Per-criterion classification on synthetic patient-trial pairs |
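The field-level extraction F1 can be scored as micro-F1 over (field, value) pairs; a minimal sketch under the assumption of exact-match scoring (the real harness would also need normalization for units and dates):

```python
def field_f1(predicted: dict, gold: dict) -> float:
    """Micro-F1 over extracted clinical fields (stage, ECOG, biomarkers...).
    A field counts as a true positive only if both name and value match."""
    pred, true = set(predicted.items()), set(gold.items())
    tp = len(pred & true)
    if tp == 0:
        return 0.0  # also covers the empty-extraction case
    precision = tp / len(pred)
    recall = tp / len(true)
    return 2 * precision * recall / (precision + recall)
```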

Product Quality

  • Latency: <15s from upload to first match results
  • Explainability: 100% of "met/not met" decisions must include evidence pointer (trial text + patient doc ID)
  • Cost: <$0.50 per patient session (token + GPU usage)

UX Validation (Small Study)

  • Task completion: Can lay users identify ≥1 plausible trial from the shortlist?
  • Explanation clarity: SUS-style usability score ≥70
  • Reading level: B1/8th-grade equivalent (Flesch-Kincaid)

8. Impact Potential

If PoC Succeeds (Quantified)

Near-term (PoC phase):

  • Demonstrate 15-25% relative improvement in ranking quality (NDCG) vs non-LLM baselines on TREC benchmarks
  • Show multimodal extraction advantage: MedGemma F1 ≥0.10 higher than Gemini-only on medical fields

Post-PoC (Real-world projection):

  • Patient impact: Literature suggests automated tools can surface 20-30% more eligible trials than manual search; NSCLC patients often face 50+ active trials yet typically hear about only 2-3 from their oncologist, so even partial coverage gains are meaningful
  • Clinician impact: Trial coordinators report spending 2-4 hours per patient on manual screening; if our tool pre-screens with 85% sensitivity, it could reduce manual verification effort by ~60%
  • Trial enrollment: Even a 10% increase in eligible patient identification could improve trial recruitment timelines (major pharma pain point)

9. Risks & Mitigations

| Risk | Mitigation |
| --- | --- |
| Synthetic data too clean | Add controlled noise to PDFs (OCR errors, abbreviations); validate against TREC, which uses realistic synthetic cases |
| MedGemma hallucination on edge cases | Evidence-pointer system (every decision must cite doc ID + span); flag low-confidence results as "unknown", not "met" |
| API rate limits | Cache trial protocols; batch requests during search refinement |
| Regulatory misunderstanding | Explicit "information only, not medical advice" framing throughout the UI; follow MedGemma model card guidance on validation/adaptation |
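The protocol-caching mitigation could be as small as a TTL cache keyed by NCT ID. A sketch; `fetch` stands in for the real MCP/API call:

```python
import time

class TrialCache:
    """Tiny TTL cache for trial protocol fetches, so repeated lookups
    during search refinement do not hit ClinicalTrials.gov rate limits."""

    def __init__(self, fetch, ttl_seconds: float = 3600.0):
        self.fetch = fetch            # callable: nct_id -> protocol
        self.ttl = ttl_seconds
        self._store = {}              # nct_id -> (expires_at, protocol)

    def get(self, nct_id: str):
        entry = self._store.get(nct_id)
        if entry and entry[0] > time.time():
            return entry[1]           # fresh cache hit: no network call
        protocol = self.fetch(nct_id) # miss or expired: refetch and store
        self._store[nct_id] = (time.time() + self.ttl, protocol)
        return protocol
```

An hour-long TTL is a reasonable default here since trial protocols change far less often than a single matching session lasts.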

10. Deliverables for HAI-DEF Submission

Video Demo (~5-7 min)

  • Patient persona introduction
  • Upload → extraction visualization (showing MedGemma in action)
  • Agentic search loop (showing query refinement)
  • Match results with traffic-light eligibility cards
  • Gap-filling iteration (upload biomarker → new matches)
  • "Share with doctor" packet generation

Technical Write-up

  1. Problem + why HAI-DEF models
  2. Architecture diagram (Parlant journey + MedGemma + Gemini + MCP)
  3. Data generation pipeline
  4. Experiments: extraction, retrieval, ranking (tables + ablations)
  5. Limitations + path to real PHI deployment

Code Repository

  • data/generate_synthetic_patients.py
  • data/generate_noisy_pdfs.py
  • matching/medgemma_extractor.py
  • matching/agentic_search.py (Parlant + Gemini + MCP)
  • evaluation/run_trec_benchmark.py
  • Clear README with one-command reproducibility

11. Why This Wins HAI-DEF

Effective Use of Models (20%)

✓ MedGemma as primary clinical understanding engine (extraction + multimodal)
✓ Concrete demos showing where non-HAI-DEF models fail (extraction accuracy gaps)
✓ Plan for task-specific evaluation showing measurable improvement

Problem Domain (15%)

✓ Clear unmet need (low trial enrollment, manual screening burden)
✓ Patient-centric storytelling ("Anna's journey")
✓ Evidence-based magnitude (enrollment stats, screening time data)

Impact Potential (15%)

✓ Quantified near-term (benchmark improvements) and long-term (enrollment lift) impact
✓ Clear calculation logic grounded in literature

Product Feasibility (20%)

✓ Detailed technical architecture (agentic search innovation)
✓ Realistic synthetic data strategy
✓ Concrete evaluation plan with baselines
✓ Deployment considerations (latency, cost, safety)

Execution & Communication (30%)

✓ Cohesive narrative across video + write-up + code
✓ Reproducible experiments
✓ Clear explanation of design choices
✓ Professional polish (evidence pointers, explanations, UX details)


Timeline: 3 months to PoC demo ready for HAI-DEF submission.

Team needs: 1 ML engineer (MedGemma fine-tuning + evaluation), 1 full-stack engineer (web app + Parlant orchestration), 1 CPO (coordination + submission materials).