HAI-DEF Pitch: MedGemma Match – Patient Trial Copilot
PoC Goal: Demonstrate MedGemma + Gemini 3 Pro + Parlant agentic architecture for patient-facing clinical trial matching with explainable eligibility reasoning and iterative gap-filling.
1. Problem & Unmet Need
The Challenge
- Low trial participation: <5% of adult cancer patients enroll in clinical trials despite potential eligibility
- Complex eligibility criteria: Free-text criteria mix demographics, biomarkers, labs, imaging findings, and treatment history
- Patient barrier: Patients receive PDFs/reports but have no way to understand which trials fit their situation
- Manual screening burden: Clinicians spend hours per patient manually reviewing eligibility; automated tools show mixed real-world performance
Why AI? Why Now?
- Eligibility criteria require synthesis across multiple document types (pathology, labs, imaging, treatment history), which is impossible with keyword search alone
- Recent LLM-based matching systems (TrialGPT, PRISM) show promise but lack patient-centric design and multimodal medical understanding
- HAI-DEF open-weight health models enable privacy-preserving deployment with medical domain expertise
2. Solution: MedGemma as Clinical Understanding Engine
Core Concept
"Agentic Search + Multimodal Extraction" replacing traditional vector-RAG approaches.
Architecture:
- MedGemma (HAI-DEF): Extracts structured clinical facts from messy PDFs/reports + understands medical imaging contexts
- Gemini 3 Pro: Orchestrates agentic search through ClinicalTrials.gov API with iterative query refinement
- Parlant: Enforces state machine (search → filter → verify) and prevents parameter hallucination
- ClinicalTrials MCP: Structured API wrapper for trials data (no vector DB needed)
Why MedGemma is Central (Not Replaceable)
- Multimodal medical reasoning: Designed for radiology reports, pathology, labs, where generic LLMs are weaker
- Domain-aligned extraction: Medical entity recognition with units, dates, and clinical context preservation
- Open weights: Enables VPC deployment for future PHI handling (vs closed-weight alternatives)
- Health-safety guardrails: Model card emphasizes validation/adaptation patterns we follow
3. User Journey (Patient-Centric)
Target User (PoC Persona)
"Anna" β 52-year-old NSCLC patient in Berlin with PDFs from her oncologist but no trial navigation support.
Journey Flow
- Upload Documents → Clinic letter, pathology report, lab results (synthetic PDFs in PoC)
- MedGemma Extraction → System builds "My Clinical Profile (draft)": Stage IVa, EGFR status unknown, ECOG 1
- Agentic Search → Gemini queries ClinicalTrials.gov via MCP:
  - Initial: condition=NSCLC, location=DE, status=RECRUITING, keywords=EGFR → 47 results
  - Refines: adds phase=PHASE3 → 12 results
  - Reads summaries, filters to 5 relevant trials
- Eligibility Analysis → For each trial, MedGemma evaluates criteria against extracted facts
- Gap Identification → System highlights: "You'd likely qualify IF you had an EGFR mutation test"
- Iteration → Anna uploads biomarker report → System re-matches → 3 new trials appear
- Share with Doctor → Generate clinician packet with evidence-linked eligibility ledger
Key Differentiator: The "Gap Analysis"
- We don't just say "No Match"
- We say: "You would match NCT12345 IF you had: recent brain MRI showing no active CNS disease"
- This transforms "rejection" into "actionable next steps"
4. Technical Innovation: Smart Agentic Search (No Vector DB)
Traditional Approach (What We're Not Doing)
Patient text → Embeddings → Vector similarity search →
Retrieve top-K trials → LLM re-ranks
Problem: Vector search is "dumb" about structured constraints (Phase, Location, Status) and negations.
Our Approach: Iterative Query Refinement
MedGemma extracts "Search Anchors" (Condition, Biomarkers, Location) →
Gemini formulates API query with filters →
ClinicalTrials MCP returns results →
Too many (>50)? → Parlant enforces refinement (add phase/keywords)
Too few (0)? → Parlant enforces relaxation (remove city filter)
Right size (10-30)? → Gemini reads summaries in 2M context window →
Shortlist 5 NCT IDs → Deep eligibility verification with MedGemma
Why This is Better:
- Precision: Leverages native API filters (Phase, Status, Location) that vectors can't handle
- Transparency: Every search step is logged and explainable ("I searched X, got Y results, refined to Z")
- Feasibility: No vector DB infrastructure; uses live API
- Showcases Gemini reasoning: Demonstrates multi-step planning vs one-shot retrieval
5. MedGemma Showcase Moments (HAI-DEF "Fullest Potential")
Use Case 1: Temporal Lab Extraction
Challenge: Criterion requires "ANC ≥ 1.5 × 10⁹/L within 14 days of enrollment"
- MedGemma extracts: Value=1.8, Units=10⁹/L, Date=2026-01-28, DocID=labs_jan.pdf
- System verifies: Current date Feb 4 → drawn 7 days ago → ✅ MEETS criterion
- Evidence link: User can click to see exact lab table and date
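Once MedGemma has extracted the value and date, the verification step reduces to plain date arithmetic. A minimal sketch, with the thresholds taken from the criterion above and hypothetical function names:

```python
from datetime import date

def within_window(value_date: date, as_of: date, window_days: int) -> bool:
    """True if the lab draw falls inside the criterion's look-back window."""
    age_days = (as_of - value_date).days
    return 0 <= age_days <= window_days

def meets_anc_criterion(value: float, value_date: date, as_of: date) -> bool:
    """ANC >= 1.5 x 10^9/L within 14 days (thresholds from the use case)."""
    return value >= 1.5 and within_window(value_date, as_of, 14)

# The worked example: 1.8 on 2026-01-28, checked on 2026-02-04 (7 days later).
ok = meets_anc_criterion(1.8, date(2026, 1, 28), date(2026, 2, 4))  # True
```

Keeping this check deterministic (outside the model) is deliberate: the LLM extracts, ordinary code verifies, so the "met" decision is auditable.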
Use Case 2: Multimodal Imaging Context
Challenge: Criterion requires "No active CNS metastases"
- MedGemma reads: Brain MRI report text: "Stable 3mm left frontal lesion, no enhancement, likely scarring from prior SRS"
- System interprets: "Stable" + "no enhancement" + "scarring" → Likely inactive → Flags as ⚠️ UNKNOWN (requires clinician confirmation)
- Evidence link: Highlights report section for doctor review
Use Case 3: Treatment Line Reconstruction
Challenge: Criterion excludes "Prior immune checkpoint inhibitor therapy"
- MedGemma reconstructs: From medication list and notes → Patient received Pembrolizumab 2024-06 to 2024-11
- System verifies: exclusion criterion triggered → ❌ EXCLUDED
- Evidence link: Shows medication timeline with dates and sources
6. PoC Scope & Data Strategy
In Scope (3-Month PoC)
- Disease: NSCLC only (complex biomarkers, high trial volume)
- Data: Synthetic patients only (no real PHI)
- Deliverables:
- Working web prototype (video demo)
- Experimental validation on TREC benchmarks
- Technical write-up + public code repo
Data Sources
Patients (Synthetic):
- Structured ground truth: Synthea FHIR (500 NSCLC patients)
- Unstructured artifacts: LLM-generated clinic letters + lab PDFs with controlled noise (abbreviations, OCR errors, missing values)
Trials (Real):
- ClinicalTrials.gov live API via MCP wrapper
- Focus on NSCLC recruiting trials in Europe + US
Benchmarking:
- TREC Clinical Trials Track 2021/2022 (75 patient topics + judged relevance)
- Custom criterion-extraction test set (labeled synthetic reports)
7. Success Metrics & Evaluation Plan
Model Performance
| Metric | Target | Baseline | Method |
|---|---|---|---|
| MedGemma Extraction F1 | ≥0.85 | Gemini-only: 0.65-0.75 | Field-level (stage, ECOG, biomarkers, labs) on labeled synthetic reports |
| Trial Retrieval Recall@50 | ≥0.75 | BM25: ~0.60 | TREC 2021 patient topics |
| Trial Ranking NDCG@10 | ≥0.60 | Non-LLM baseline: ~0.45 | TREC judged relevance |
| Criterion Decision Accuracy | ≥0.85 | Rule-based: ~0.70 | Per-criterion classification on synthetic patient-trial pairs |
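NDCG@10 in the table is the standard graded-relevance metric; a self-contained reference implementation for the evaluation harness might look like this (TREC CT supplies graded relevance judgments per patient-trial pair):

```python
import math

def dcg(rels: list[float], k: int) -> float:
    """Discounted cumulative gain over the top-k graded relevances."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))

def ndcg_at_k(ranked_rels: list[float], k: int = 10) -> float:
    """DCG of the system ranking divided by DCG of the ideal ranking."""
    ideal = dcg(sorted(ranked_rels, reverse=True), k)
    return dcg(ranked_rels, k) / ideal if ideal else 0.0

# A perfectly ordered ranking scores 1.0; any inversion scores lower.
perfect = ndcg_at_k([3, 2, 1, 0])   # 1.0
inverted = ndcg_at_k([0, 3, 2])     # between 0 and 1
```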
Product Quality
- Latency: <15s from upload to first match results
- Explainability: 100% of "met/not met" decisions must include evidence pointer (trial text + patient doc ID)
- Cost: <$0.50 per patient session (token + GPU usage)
UX Validation (Small Study)
- Task completion: Can lay users identify ≥1 plausible trial from shortlist?
- Explanation clarity: SUS-style usability score ≥70
- Reading level: B1/8th-grade equivalent (Flesch-Kincaid)
8. Impact Potential
If PoC Succeeds (Quantified)
Near-term (PoC phase):
- Demonstrate 15-25% relative improvement in ranking quality (NDCG) vs non-LLM baselines on TREC benchmarks
- Show multimodal extraction advantage: MedGemma F1 ≥0.10 higher than Gemini-only on medical fields
Post-PoC (Real-world projection):
- Patient impact: Literature suggests automated tools can surface 20-30% more eligible trials than manual search; NSCLC patients often face 50+ active trials yet typically hear about only 2-3 from their oncologist, so even partial automation meaningfully widens each patient's option set
- Clinician impact: Trial coordinators report spending 2-4 hours per patient on manual screening; if our tool pre-screens with 85% sensitivity, it could cut manual verification effort by ~60%
- Trial enrollment: Even a 10% increase in eligible patient identification could improve trial recruitment timelines (major pharma pain point)
9. Risks & Mitigations
| Risk | Mitigation |
|---|---|
| Synthetic data too clean | Add controlled noise to PDFs (OCR errors, abbreviations); validate against TREC which uses realistic synthetic cases |
| MedGemma hallucination on edge cases | Implement evidence-pointer system (every decision must cite doc ID + span); flag low-confidence as "unknown" not "met" |
| API rate limits | Cache trial protocols; batch requests during search refinement |
| Regulatory misunderstanding | Explicit "information only, not medical advice" framing throughout UI; follow MedGemma model card guidance on validation/adaptation |
10. Deliverables for HAI-DEF Submission
Video Demo (~5-7 min)
- Patient persona introduction
- Upload β extraction visualization (showing MedGemma in action)
- Agentic search loop (showing query refinement)
- Match results with traffic-light eligibility cards
- Gap-filling iteration (upload biomarker → new matches)
- "Share with doctor" packet generation
Technical Write-up
- Problem + why HAI-DEF models
- Architecture diagram (Parlant journey + MedGemma + Gemini + MCP)
- Data generation pipeline
- Experiments: extraction, retrieval, ranking (tables + ablations)
- Limitations + path to real PHI deployment
Code Repository
- data/generate_synthetic_patients.py
- data/generate_noisy_pdfs.py
- matching/medgemma_extractor.py
- matching/agentic_search.py (Parlant + Gemini + MCP)
- evaluation/run_trec_benchmark.py
- Clear README with one-command reproducibility
11. Why This Wins HAI-DEF
Effective Use of Models (20%)
✓ MedGemma as primary clinical understanding engine (extraction + multimodal)
✓ Concrete demos showing where non-HAI-DEF models fail (extraction accuracy gaps)
✓ Plan for task-specific evaluation showing measurable improvement
Problem Domain (15%)
✓ Clear unmet need (low trial enrollment, manual screening burden)
✓ Patient-centric storytelling ("Anna's journey")
✓ Evidence-based magnitude (enrollment stats, screening time data)
Impact Potential (15%)
✓ Quantified near-term (benchmark improvements) and long-term (enrollment lift) impact
✓ Clear calculation logic grounded in literature
Product Feasibility (20%)
✓ Detailed technical architecture (agentic search innovation)
✓ Realistic synthetic data strategy
✓ Concrete evaluation plan with baselines
✓ Deployment considerations (latency, cost, safety)
Execution & Communication (30%)
✓ Cohesive narrative across video + write-up + code
✓ Reproducible experiments
✓ Clear explanation of design choices
✓ Professional polish (evidence pointers, explanations, UX details)
Timeline: 3 months to PoC demo ready for HAI-DEF submission.
Team needs: 1 ML engineer (MedGemma fine-tuning + evaluation), 1 full-stack engineer (web app + Parlant orchestration), 1 CPO (coordination + submission materials).