# HAI-DEF Pitch: MedGemma Match – Patient Trial Copilot

**PoC Goal:** Demonstrate a MedGemma + Gemini 3 Pro + Parlant agentic architecture for patient-facing clinical trial matching with **explainable eligibility reasoning** and **iterative gap-filling**.

---

## 1. Problem & Unmet Need

### The Challenge

- **Low trial participation:** <5% of adult cancer patients enroll in clinical trials despite potential eligibility
- **Complex eligibility criteria:** Free-text criteria mix demographics, biomarkers, labs, imaging findings, and treatment history
- **Patient barrier:** Patients receive PDFs/reports but have no way to understand which trials fit their situation
- **Manual screening burden:** Clinicians spend hours per patient manually reviewing eligibility; automated tools show mixed real-world performance

### Why AI? Why Now?

- Eligibility criteria require synthesis across multiple document types (pathology, labs, imaging, treatment history); keyword search alone cannot do this
- Recent LLM-based matching systems (TrialGPT, PRISM) show promise but lack patient-centric design and multimodal medical understanding
- HAI-DEF open-weight health models enable privacy-preserving deployment with medical domain expertise

---

## 2. Solution: MedGemma as Clinical Understanding Engine

### Core Concept

**"Agentic Search + Multimodal Extraction"** replacing traditional vector-RAG approaches.

**Architecture:**

- **MedGemma (HAI-DEF):** Extracts structured clinical facts from messy PDFs/reports and understands medical imaging contexts
- **Gemini 3 Pro:** Orchestrates agentic search through the ClinicalTrials.gov API with iterative query refinement
- **Parlant:** Enforces the state machine (search → filter → verify) and prevents parameter hallucination
- **ClinicalTrials MCP:** Structured API wrapper for trials data (no vector DB needed)

### Why MedGemma is Central (Not Replaceable)
1. **Multimodal medical reasoning:** Designed for radiology reports, pathology, and labs, where generic LLMs are weaker
2. **Domain-aligned extraction:** Medical entity recognition with units, dates, and clinical context preservation
3. **Open weights:** Enables VPC deployment for future PHI handling (vs closed-weight alternatives)
4. **Health-safety guardrails:** Model card emphasizes the validation/adaptation patterns we follow

---

## 3. User Journey (Patient-Centric)

### Target User (PoC Persona)

**"Anna"** – a 52-year-old NSCLC patient in Berlin with PDFs from her oncologist but no trial navigation support.

### Journey Flow

1. **Upload Documents** → Clinic letter, pathology report, lab results (synthetic PDFs in the PoC)
2. **MedGemma Extraction** → System builds "My Clinical Profile (draft)": Stage IVa, EGFR status unknown, ECOG 1
3. **Agentic Search** → Gemini queries ClinicalTrials.gov via MCP:
   - Initial: `condition=NSCLC, location=DE, status=RECRUITING, keywords=EGFR` → 47 results
   - Refines: adds `phase=PHASE3` → 12 results
   - Reads summaries, filters to 5 relevant trials
4. **Eligibility Analysis** → For each trial, MedGemma evaluates criteria against extracted facts
5. **Gap Identification** → System highlights: *"You'd likely qualify IF you had an EGFR mutation test"*
6. **Iteration** → Anna uploads a biomarker report → system re-matches → 3 new trials appear
7. **Share with Doctor** → Generate a clinician packet with an evidence-linked eligibility ledger

### Key Differentiator: The "Gap Analysis"

- We don't just say "No match"
- We say: **"You would match NCT12345 IF you had: a recent brain MRI showing no active CNS disease"**
- This transforms a rejection into actionable next steps

---
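The gap analysis above can be sketched as a small decision flow: each trial criterion resolves to met, not met, or unknown against the extracted patient profile, and the unknowns become the "IF you had X" next steps. A minimal illustrative sketch, with all names (`gap_analysis`, `evaluate_criterion`, the field names) hypothetical rather than taken from the actual codebase:

```python
# Sketch: three-valued criterion evaluation; UNKNOWN facts become
# actionable gaps instead of silent rejections. Illustrative only.

MET, NOT_MET, UNKNOWN = "met", "not_met", "unknown"

def evaluate_criterion(criterion: dict, profile: dict) -> str:
    """Compare one structured criterion against extracted patient facts."""
    value = profile.get(criterion["field"])   # e.g. profile["egfr_status"]
    if value is None:
        return UNKNOWN                        # never extracted -> fillable gap
    return MET if value in criterion["accepted"] else NOT_MET

def gap_analysis(trial: dict, profile: dict) -> dict:
    """Return an eligibility verdict plus the list of missing facts."""
    results = {c["label"]: evaluate_criterion(c, profile) for c in trial["criteria"]}
    gaps = [label for label, r in results.items() if r == UNKNOWN]
    if any(r == NOT_MET for r in results.values()):
        verdict = "excluded"
    elif gaps:
        verdict = f"would match IF: {', '.join(gaps)}"
    else:
        verdict = "likely eligible"
    return {"trial_id": trial["id"], "criteria": results, "verdict": verdict}

trial = {
    "id": "NCT12345",
    "criteria": [
        {"label": "EGFR mutation test", "field": "egfr_status", "accepted": ["positive"]},
        {"label": "ECOG 0-1", "field": "ecog", "accepted": [0, 1]},
    ],
}
anna = {"ecog": 1}                            # EGFR report not yet uploaded
print(gap_analysis(trial, anna)["verdict"])   # -> would match IF: EGFR mutation test
```

When Anna later uploads the biomarker report, `egfr_status` flips from missing to `"positive"` and the same function returns "likely eligible" — the re-matching step in the journey is just re-running this evaluation.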
## 4. Technical Innovation: Smart Agentic Search (No Vector DB)

### Traditional Approach (What We're *Not* Doing)

```
Patient text → Embeddings → Vector similarity search → Retrieve top-K trials → LLM re-ranks
```

**Problem:** Vector search is "dumb" about structured constraints (phase, location, status) and negations.

### Our Approach: Iterative Query Refinement

```
MedGemma extracts "Search Anchors" (condition, biomarkers, location)
  → Gemini formulates API query with filters
  → ClinicalTrials MCP returns results
      → Too many (>50)?    → Parlant enforces refinement (add phase/keywords)
      → Too few (0)?       → Parlant enforces relaxation (remove city filter)
      → Right size (10-30)? → Gemini reads summaries in its 2M context window
  → Shortlist 5 NCT IDs
  → Deep eligibility verification with MedGemma
```

**Why This is Better:**

- **Precision:** Leverages native API filters (phase, status, location) that vectors can't handle
- **Transparency:** Every search step is logged and explainable ("I searched X, got Y results, refined to Z")
- **Feasibility:** No vector DB infrastructure; uses the live API
- **Showcases Gemini reasoning:** Demonstrates multi-step planning vs one-shot retrieval

---
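The refinement loop above fits in a few lines of orchestration code. The thresholds (>50 too many, 0 too few) come from the diagram; everything else — the function names, the filter list, and the stubbed `query_trials` callable standing in for the ClinicalTrials MCP — is an illustrative assumption, not the actual implementation:

```python
# Sketch of the tighten-or-loosen search loop. query_trials is any
# callable taking a query dict and returning a list of trial records;
# here it is stubbed with a toy function for demonstration.

TOO_MANY, TOO_FEW = 50, 1
REFINEMENTS = ["phase", "keywords"]      # structured filters to add when too broad

def refine_search(query: dict, query_trials, max_rounds: int = 5) -> list:
    """Tighten or loosen the API query until the result set is workable."""
    results = []
    for _ in range(max_rounds):
        results = query_trials(query)
        if len(results) >= TOO_MANY and any(f not in query["filters"] for f in REFINEMENTS):
            # Too broad: add the next unused structured filter.
            query["filters"].append(next(f for f in REFINEMENTS if f not in query["filters"]))
        elif len(results) < TOO_FEW and query["filters"]:
            # Too narrow: drop the most recently added filter (e.g. city).
            query["filters"].pop()
        else:
            return results               # workable shortlist for Gemini to read
    return results

def fake_api(query: dict) -> list:
    """Toy stand-in: each extra filter halves a base count of 120 results."""
    return list(range(120 >> len(query["filters"])))

query = {"condition": "NSCLC", "filters": []}
results = refine_search(query, fake_api)
print(len(results), query["filters"])    # converges to a mid-sized result set
```

Because Parlant owns the loop, the model never invents filter values — it can only move between the states the diagram allows, and every round is logged for the transparency claim above.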
## 5. MedGemma Showcase Moments (HAI-DEF "Fullest Potential")

### Use Case 1: Temporal Lab Extraction

**Challenge:** Criterion requires "ANC ≥ 1.5 × 10⁹/L within 14 days of enrollment"

- **MedGemma extracts:** Value=1.8, Units=10⁹/L, Date=2026-01-28, DocID=labs_jan.pdf
- **System verifies:** Current date Feb 4 → lab drawn 7 days ago → ✓ MEETS criterion
- **Evidence link:** User can click to see the exact lab table and date

### Use Case 2: Multimodal Imaging Context

**Challenge:** Criterion requires "No active CNS metastases"

- **MedGemma reads:** Brain MRI report text: *"Stable 3mm left frontal lesion, no enhancement, likely scarring from prior SRS"*
- **System interprets:** "Stable" + "no enhancement" + "scarring" → likely inactive → flags as ⚠️ UNKNOWN (requires clinician confirmation)
- **Evidence link:** Highlights the report section for doctor review

### Use Case 3: Treatment Line Reconstruction

**Challenge:** Criterion excludes "Prior immune checkpoint inhibitor therapy"

- **MedGemma reconstructs:** From the medication list and notes → patient received Pembrolizumab 2024-06 to 2024-11
- **System verifies:** → ✗ EXCLUDED
- **Evidence link:** Shows the medication timeline with dates and sources

---
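Use Case 1 illustrates a design choice worth making explicit: MedGemma supplies the structured tuple (value, unit, date, source document), but the 14-day window itself is checked by deterministic date arithmetic, not by the LLM. A hedged sketch with illustrative field names:

```python
from datetime import date

# Sketch of the temporal verification step: plain date math decides the
# window, and every verdict carries an evidence pointer back to the
# source document. Field names are assumptions, not the real schema.

def check_lab_criterion(extracted: dict, threshold: float,
                        window_days: int, today: date) -> dict:
    """Return met/not-met plus an evidence pointer for the UI."""
    age_days = (today - extracted["date"]).days
    in_window = 0 <= age_days <= window_days
    meets_value = extracted["value"] >= threshold
    return {
        "met": in_window and meets_value,
        "reason": (f"value {extracted['value']} {extracted['unit']}, "
                   f"{age_days} days old (window {window_days}d)"),
        "evidence": extracted["doc_id"],     # click-through to the lab table
    }

anc = {"value": 1.8, "unit": "10^9/L",
       "date": date(2026, 1, 28), "doc_id": "labs_jan.pdf"}
print(check_lab_criterion(anc, threshold=1.5, window_days=14,
                          today=date(2026, 2, 4)))
```

Keeping the comparison outside the model means a hallucinated date or value can only propagate as far as the extraction step, where the evidence link lets a clinician catch it.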
## 6. PoC Scope & Data Strategy

### In Scope (3-Month PoC)

- **Disease:** NSCLC only (complex biomarkers, high trial volume)
- **Data:** Synthetic patients only (no real PHI)
- **Deliverables:**
  - Working web prototype (video demo)
  - Experimental validation on TREC benchmarks
  - Technical write-up + public code repo

### Data Sources

**Patients (Synthetic):**

- Structured ground truth: Synthea FHIR (500 NSCLC patients)
- Unstructured artifacts: LLM-generated clinic letters + lab PDFs with controlled noise (abbreviations, OCR errors, missing values)

**Trials (Real):**

- ClinicalTrials.gov live API via MCP wrapper
- Focus on NSCLC recruiting trials in Europe + US

**Benchmarking:**

- TREC Clinical Trials Track 2021/2022 (75 patient topics + judged relevance)
- Custom criterion-extraction test set (labeled synthetic reports)

---

## 7. Success Metrics & Evaluation Plan

### Model Performance

| Metric | Target | Baseline | Method |
|--------|--------|----------|--------|
| **MedGemma Extraction F1** | ≥0.85 | Gemini-only: 0.65-0.75 | Field-level (stage, ECOG, biomarkers, labs) on labeled synthetic reports |
| **Trial Retrieval Recall@50** | ≥0.75 | BM25: ~0.60 | TREC 2021 patient topics |
| **Trial Ranking NDCG@10** | ≥0.60 | Non-LLM baseline: ~0.45 | TREC judged relevance |
| **Criterion Decision Accuracy** | ≥0.85 | Rule-based: ~0.70 | Per-criterion classification on synthetic patient-trial pairs |

### Product Quality

- **Latency:** <15 s from upload to first match results
- **Explainability:** 100% of "met/not met" decisions must include an evidence pointer (trial text + patient doc ID)
- **Cost:** <$0.50 per patient session (token + GPU usage)

### UX Validation (Small Study)

- Task completion: Can lay users identify ≥1 plausible trial from the shortlist?
- Explanation clarity: SUS-style usability score ≥70
- Reading level: B1/8th-grade equivalent (Flesch-Kincaid)

---
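The "MedGemma Extraction F1" row deserves one concrete definition: it is a field-level, micro-averaged score, where each report yields (field, value) pairs and precision/recall are computed over exact pair matches against the labeled gold set. A minimal sketch of how such a scorer might look (illustrative, not the actual evaluation harness):

```python
# Sketch: micro-averaged field-level F1 over (field, value) pairs,
# aggregated across reports. Function and field names are assumptions.

def field_f1(predicted: list, gold: list) -> float:
    """Micro-F1 over exact (field, value) matches, across paired reports."""
    tp = fp = fn = 0
    for pred, ref in zip(predicted, gold):
        p, g = set(pred.items()), set(ref.items())
        tp += len(p & g)          # fields extracted with the correct value
        fp += len(p - g)          # wrong or spurious extractions
        fn += len(g - p)          # gold fields the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

pred = [{"stage": "IVa", "ecog": 1, "egfr": "unknown"}]
gold = [{"stage": "IVa", "ecog": 1, "egfr": "positive"}]
print(round(field_f1(pred, gold), 3))   # 2 of 3 fields correct on each side
```

Exact-match scoring is deliberately strict for the PoC; a production harness would likely add per-field normalization (unit conversion, date parsing) before comparison.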
## 8. Impact Potential

### If the PoC Succeeds (Quantified)

**Near-term (PoC phase):**

- Demonstrate a 15-25% relative improvement in ranking quality (NDCG) vs non-LLM baselines on TREC benchmarks
- Show the multimodal extraction advantage: MedGemma F1 ≥0.10 higher than Gemini-only on medical fields

**Post-PoC (real-world projection):**

- **Patient impact:** Literature suggests automated tools can surface 20-30% more eligible trials than manual search; NSCLC patients often face 50+ active trials yet typically learn about only 2-3 from their oncologist, so even partially closing that gap is meaningful
- **Clinician impact:** Trial coordinators report spending 2-4 hours per patient on manual screening; if our tool pre-screens with 85% sensitivity, it could reduce manual verification effort by ~60%
- **Trial enrollment:** Even a 10% increase in eligible-patient identification could shorten trial recruitment timelines (a major pharma pain point)

---

## 9. Risks & Mitigations

| Risk | Mitigation |
|------|-----------|
| **Synthetic data too clean** | Add controlled noise to PDFs (OCR errors, abbreviations); validate against TREC, which uses realistic synthetic cases |
| **MedGemma hallucination on edge cases** | Implement an evidence-pointer system (every decision must cite doc ID + span); flag low-confidence cases as "unknown", not "met" |
| **API rate limits** | Cache trial protocols; batch requests during search refinement |
| **Regulatory misunderstanding** | Explicit "information only, not medical advice" framing throughout the UI; follow MedGemma model card guidance on validation/adaptation |

---

## 10. Deliverables for HAI-DEF Submission

### Video Demo (~5-7 min)

- Patient persona introduction
- Upload → extraction visualization (showing MedGemma in action)
- Agentic search loop (showing query refinement)
- Match results with traffic-light eligibility cards
- Gap-filling iteration (upload biomarker → new matches)
- "Share with doctor" packet generation

### Technical Write-up

1. Problem + why HAI-DEF models
2. Architecture diagram (Parlant journey + MedGemma + Gemini + MCP)
3. Data generation pipeline
4. Experiments: extraction, retrieval, ranking (tables + ablations)
5. Limitations + path to real PHI deployment

### Code Repository

- `data/generate_synthetic_patients.py`
- `data/generate_noisy_pdfs.py`
- `matching/medgemma_extractor.py`
- `matching/agentic_search.py` (Parlant + Gemini + MCP)
- `evaluation/run_trec_benchmark.py`
- Clear README with one-command reproducibility

---

## 11. Why This Wins HAI-DEF

### Effective Use of Models (20%)

- ✓ MedGemma as the primary clinical understanding engine (extraction + multimodal)
- ✓ Concrete demos showing where non-HAI-DEF models fall short (extraction accuracy gaps)
- ✓ Plan for task-specific evaluation showing measurable improvement

### Problem Domain (15%)

- ✓ Clear unmet need (low trial enrollment, manual screening burden)
- ✓ Patient-centric storytelling ("Anna's journey")
- ✓ Evidence-based magnitude (enrollment stats, screening-time data)

### Impact Potential (15%)

- ✓ Quantified near-term (benchmark improvements) and long-term (enrollment lift) impact
- ✓ Clear calculation logic grounded in literature

### Product Feasibility (20%)

- ✓ Detailed technical architecture (agentic search innovation)
- ✓ Realistic synthetic data strategy
- ✓ Concrete evaluation plan with baselines
- ✓ Deployment considerations (latency, cost, safety)

### Execution & Communication (30%)

- ✓ Cohesive narrative across video + write-up + code
- ✓ Reproducible experiments
- ✓ Clear explanation of design choices
- ✓ Professional polish (evidence pointers, explanations, UX details)

---

**Timeline:** 3 months to a PoC demo ready for HAI-DEF submission.

**Team needs:** 1 ML engineer (MedGemma fine-tuning + evaluation), 1 full-stack engineer (web app + Parlant orchestration), 1 CPO (coordination + submission materials).