| # HAI-DEF Pitch: MedGemma Match - Patient Trial Copilot |
|
|
| **PoC Goal:** Demonstrate a MedGemma + Gemini 3 Pro + Parlant agentic architecture for patient-facing clinical trial matching with **explainable eligibility reasoning** and **iterative gap-filling**. |
|
|
| --- |
|
|
| ## 1. Problem & Unmet Need |
|
|
| ### The Challenge |
| - **Low trial participation:** <5% of adult cancer patients enroll in clinical trials despite potential eligibility |
| - **Complex eligibility criteria:** Free-text criteria mix demographics, biomarkers, labs, imaging findings, and treatment history |
| - **Patient barrier:** Patients receive PDFs/reports but have no way to understand which trials fit their situation |
| - **Manual screening burden:** Clinicians spend hours per patient manually reviewing eligibility; automated tools show mixed real-world performance |
|
|
| ### Why AI? Why Now? |
| - Eligibility criteria require synthesis across multiple document types (pathology, labs, imaging, treatment history), which is impossible with keyword search alone |
| - Recent LLM-based matching systems (TrialGPT, PRISM) show promise but lack patient-centric design and multimodal medical understanding |
| - HAI-DEF open-weight health models enable privacy-preserving deployment with medical domain expertise |
|
|
| --- |
|
|
| ## 2. Solution: MedGemma as Clinical Understanding Engine |
|
|
| ### Core Concept |
| **"Agentic Search + Multimodal Extraction"** replacing traditional vector-RAG approaches. |
|
|
| **Architecture:** |
| - **MedGemma (HAI-DEF):** Extracts structured clinical facts from messy PDFs/reports + understands medical imaging contexts |
| - **Gemini 3 Pro:** Orchestrates agentic search through ClinicalTrials.gov API with iterative query refinement |
| - **Parlant:** Enforces state machine (search → filter → verify) and prevents parameter hallucination |
| - **ClinicalTrials MCP:** Structured API wrapper for trials data (no vector DB needed) |
|
|
| ### Why MedGemma is Central (Not Replaceable) |
| 1. **Multimodal medical reasoning:** Designed for radiology reports, pathology, and labs, where generic LLMs are weaker |
| 2. **Domain-aligned extraction:** Medical entity recognition with units, dates, and clinical context preservation |
| 3. **Open weights:** Enables VPC deployment for future PHI handling (vs closed-weight alternatives) |
| 4. **Health-safety guardrails:** Model card emphasizes validation/adaptation patterns we follow |
|
|
| --- |
|
|
| ## 3. User Journey (Patient-Centric) |
|
|
| ### Target User (PoC Persona) |
| **"Anna":** 52-year-old NSCLC patient in Berlin with PDFs from her oncologist but no trial navigation support. |
|
|
| ### Journey Flow |
| 1. **Upload Documents** → Clinic letter, pathology report, lab results (synthetic PDFs in PoC) |
| 2. **MedGemma Extraction** → System builds "My Clinical Profile (draft)": Stage IVa, EGFR status unknown, ECOG 1 |
| 3. **Agentic Search** → Gemini queries ClinicalTrials.gov via MCP: |
| - Initial: `condition=NSCLC, location=DE, status=RECRUITING, keywords=EGFR` → 47 results |
| - Refines: Adds `phase=PHASE3` → 12 results |
| - Reads summaries, filters to 5 relevant trials |
| 4. **Eligibility Analysis** → For each trial, MedGemma evaluates criteria against extracted facts |
| 5. **Gap Identification** → System highlights: *"You'd likely qualify IF you had an EGFR mutation test"* |
| 6. **Iteration** → Anna uploads biomarker report → System re-matches → 3 new trials appear |
| 7. **Share with Doctor** → Generate clinician packet with evidence-linked eligibility ledger |
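The query refinement in step 3 can be sketched as an immutable parameter object. This is a minimal illustration only: the field names mirror the example query in the journey (`condition`, `location`, `status`, `keywords`, `phase`), and `TrialQuery` and `to_params` are hypothetical names, not the actual schema of the ClinicalTrials MCP wrapper.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class TrialQuery:
    """Hypothetical argument set for the MCP search tool (names are assumptions)."""
    condition: str
    location: str
    status: str = "RECRUITING"
    keywords: tuple[str, ...] = ()
    phase: Optional[str] = None

    def to_params(self) -> dict[str, str]:
        # Drop unset filters so the wrapper only sees what was actually chosen.
        params = {
            "condition": self.condition,
            "location": self.location,
            "status": self.status,
        }
        if self.keywords:
            params["keywords"] = " ".join(self.keywords)
        if self.phase is not None:
            params["phase"] = self.phase
        return params


initial = TrialQuery(condition="NSCLC", location="DE", keywords=("EGFR",))
refined = replace(initial, phase="PHASE3")  # the refinement step from the journey
```

Keeping each query immutable means every refinement produces a new object, which makes the search history easy to log and replay.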
|
|
| ### Key Differentiator: The "Gap Analysis" |
| - We don't just say "No Match" |
| - We say: **"You would match NCT12345 IF you had: recent brain MRI showing no active CNS disease"** |
| - This transforms "rejection" into "actionable next steps" |
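One way the gap statement could be assembled from per-criterion results is sketched below. The `CriterionResult` type, the three status strings, and the message wording for the non-gap cases are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CriterionResult:
    description: str   # what the trial requires, in patient-facing wording
    status: str        # "met", "not_met", or "unknown"


def gap_message(nct_id: str, results: list[CriterionResult]) -> str:
    """Turn per-criterion results into a match, a gap statement, or a plain no-match."""
    if any(r.status == "not_met" for r in results):
        return f"{nct_id}: not a match based on current documents."
    missing = [r.description for r in results if r.status == "unknown"]
    if not missing:
        return f"{nct_id}: all checked criteria appear to be met."
    return f"You would match {nct_id} IF you had: " + "; ".join(missing)
```

A gap is reported only when nothing is definitively unmet, so the "IF you had" wording is never shown for a trial the patient is already excluded from.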
|
|
| --- |
|
|
| ## 4. Technical Innovation: Smart Agentic Search (No Vector DB) |
|
|
| ### Traditional Approach (What We're *Not* Doing) |
| ``` |
| Patient text → Embeddings → Vector similarity search → |
| Retrieve top-K trials → LLM re-ranks |
| ``` |
| **Problem:** Vector search is "dumb" about structured constraints (Phase, Location, Status) and negations. |
|
|
| ### Our Approach: Iterative Query Refinement |
| ``` |
| MedGemma extracts "Search Anchors" (Condition, Biomarkers, Location) → |
| Gemini formulates API query with filters → |
| ClinicalTrials MCP returns results → |
| Too many (>50)? → Parlant enforces refinement (add phase/keywords) |
| Too few (0)? → Parlant enforces relaxation (remove city filter) |
| Right size (10-30)? → Gemini reads summaries in its long context window → |
| Shortlist 5 NCT IDs → Deep eligibility verification with MedGemma |
| ``` |
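The control flow above can be sketched in plain Python. This is not Parlant's API; `next_action` and `run_search` are hypothetical helpers, and `search` stands in for the MCP call by returning only a result count. The flow does not say what happens for counts of 1-9 or 31-50, so this sketch assumes any non-zero count up to 50 is readable.

```python
from typing import Callable


def next_action(count: int, too_many: int = 50) -> str:
    """Map a result count to the next step in the search loop."""
    if count == 0:
        return "relax"
    if count > too_many:
        return "refine"
    return "read"


def run_search(
    search: Callable[[dict], int],
    params: dict,
    refinements: list[dict],
    relaxations: list[str],
    max_steps: int = 5,
) -> tuple[dict, list[str]]:
    """Iterate until the result set is readable; return final params and a step log."""
    params = dict(params)
    refinements, relaxations = list(refinements), list(relaxations)
    log = []
    for _ in range(max_steps):
        count = search(params)
        action = next_action(count)
        log.append(f"searched {params} -> {count} results -> {action}")
        if action == "refine" and refinements:
            params.update(refinements.pop(0))     # e.g. add phase or keywords
        elif action == "relax" and relaxations:
            params.pop(relaxations.pop(0), None)  # e.g. drop the city filter
        else:
            break
    return params, log
```

The returned log is what would back the "I searched X, got Y results, refined to Z" explanation, and `max_steps` bounds the loop so a query that never settles cannot run indefinitely.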
|
|
| **Why This is Better:** |
| - **Precision:** Leverages native API filters (Phase, Status, Location) that vectors can't handle |
| - **Transparency:** Every search step is logged and explainable ("I searched X, got Y results, refined to Z") |
| - **Feasibility:** No vector DB infrastructure; uses live API |
| - **Showcases Gemini reasoning:** Demonstrates multi-step planning vs one-shot retrieval |
|
|
| --- |
|
|
| ## 5. MedGemma Showcase Moments (HAI-DEF "Fullest Potential") |
|
|
| ### Use Case 1: Temporal Lab Extraction |
| **Challenge:** Criterion requires "ANC ≥ 1.5 × 10⁹/L within 14 days of enrollment" |
| - **MedGemma extracts:** Value=1.8, Units=10⁹/L, Date=2026-01-28, DocID=labs_jan.pdf |
| - **System verifies:** Current date Feb 4 → 7 days ago → ✅ MEETS criterion |
| - **Evidence link:** User can click to see exact lab table and date |
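The verification step is a threshold check plus a date-window check. The sketch below assumes the current date is used as a stand-in for the enrollment date, as in the example; `LabValue` and `meets_lab_criterion` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class LabValue:
    name: str
    value: float
    unit: str
    measured_on: date
    doc_id: str


def meets_lab_criterion(lab: LabValue, minimum: float, window_days: int, today: date) -> bool:
    """True when the value reaches the threshold and was measured inside the window."""
    age_days = (today - lab.measured_on).days
    return lab.value >= minimum and 0 <= age_days <= window_days


anc = LabValue("ANC", 1.8, "10^9/L", date(2026, 1, 28), "labs_jan.pdf")
```

With `today=date(2026, 2, 4)` the measurement is 7 days old, so `meets_lab_criterion(anc, 1.5, 14, date(2026, 2, 4))` holds. A real implementation would also need to confirm the extracted unit matches the unit in the criterion before comparing values.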
| |
| ### Use Case 2: Multimodal Imaging Context |
| **Challenge:** Criterion requires "No active CNS metastases" |
| - **MedGemma reads:** Brain MRI report text: *"Stable 3mm left frontal lesion, no enhancement, likely scarring from prior SRS"* |
| - **System interprets:** "Stable" + "no enhancement" + "scarring" → Likely inactive → Flags as ⚠️ UNKNOWN (requires clinician confirmation) |
| - **Evidence link:** Highlights report section for doctor review |
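The conservative behavior in this use case, where a probably-favorable finding is still reported as unknown, can be expressed as a small decision rule. The confidence score, the 0.9 threshold, and the `needs_clinician` flag are illustrative assumptions, not outputs MedGemma is known to provide.

```python
from enum import Enum


class Decision(Enum):
    MET = "met"
    NOT_MET = "not_met"
    UNKNOWN = "unknown"


def decide(model_label: str, confidence: float, needs_clinician: bool,
           threshold: float = 0.9) -> Decision:
    """Downgrade to UNKNOWN unless the finding is both confident and self-sufficient."""
    if needs_clinician or confidence < threshold:
        return Decision.UNKNOWN
    return Decision.MET if model_label == "met" else Decision.NOT_MET
```

For the MRI example, `decide("met", 0.95, needs_clinician=True)` yields `Decision.UNKNOWN`: the interpretation leans toward inactive disease, but the criterion is one only a clinician should confirm.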
| |
| ### Use Case 3: Treatment Line Reconstruction |
| **Challenge:** Criterion excludes "Prior immune checkpoint inhibitor therapy" |
| - **MedGemma reconstructs:** From medication list and notes → Patient received Pembrolizumab 2024-06 to 2024-11 |
| - **System verifies:** → ❌ EXCLUDED |
| - **Evidence link:** Shows medication timeline with dates and sources |
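Once the timeline is reconstructed, the exclusion check reduces to a drug-class lookup. The sketch below uses a short, non-exhaustive list of checkpoint inhibitor generic names. The day-level dates, the carboplatin entry, and the source document name are made up for illustration, since the example gives only months.

```python
from dataclasses import dataclass
from datetime import date

# Illustrative, non-exhaustive list of immune checkpoint inhibitors (generic names).
CHECKPOINT_INHIBITORS = {
    "pembrolizumab", "nivolumab", "atezolizumab",
    "durvalumab", "ipilimumab", "cemiplimab",
}


@dataclass
class Treatment:
    drug: str
    start: date
    end: date
    source_doc: str


def prior_checkpoint_therapy(timeline: list[Treatment]) -> list[Treatment]:
    """Return the treatments that trigger the exclusion, kept as evidence."""
    return [t for t in timeline if t.drug.strip().lower() in CHECKPOINT_INHIBITORS]


timeline = [
    Treatment("Carboplatin", date(2024, 3, 1), date(2024, 5, 31), "clinic_letter.pdf"),
    Treatment("Pembrolizumab", date(2024, 6, 1), date(2024, 11, 30), "clinic_letter.pdf"),
]
hits = prior_checkpoint_therapy(timeline)
```

Returning the matching entries rather than a bare boolean keeps the dates and source document available for the evidence link. Brand names and misspellings would need normalization before this lookup.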
| |
| --- |
| |
| ## 6. PoC Scope & Data Strategy |
| |
| ### In Scope (3-Month PoC) |
| - **Disease:** NSCLC only (complex biomarkers, high trial volume) |
| - **Data:** Synthetic patients only (no real PHI) |
| - **Deliverables:** |
| - Working web prototype (video demo) |
| - Experimental validation on TREC benchmarks |
| - Technical write-up + public code repo |
| |
| ### Data Sources |
| **Patients (Synthetic):** |
| - Structured ground truth: Synthea FHIR (500 NSCLC patients) |
| - Unstructured artifacts: LLM-generated clinic letters + lab PDFs with controlled noise (abbreviations, OCR errors, missing values) |
| |
| **Trials (Real):** |
| - ClinicalTrials.gov live API via MCP wrapper |
| - Focus on NSCLC recruiting trials in Europe + US |
| |
| **Benchmarking:** |
| - TREC Clinical Trials Track 2021/2022 (75 patient topics in the 2021 track, with judged relevance) |
| - Custom criterion-extraction test set (labeled synthetic reports) |
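The "controlled noise" for unstructured artifacts could be injected at the text level before rendering to PDF. The sketch below is one possible approach; the substitution tables and the `add_noise` function are hypothetical and far smaller than a real pipeline would need.

```python
import random

# Hypothetical substitution tables; a real pipeline would use larger, curated ones.
ABBREVIATIONS = {"patient": "pt", "history": "hx", "treatment": "tx", "diagnosis": "dx"}
OCR_CONFUSIONS = {"0": "O", "1": "l", "5": "S"}


def add_noise(text: str, ocr_rate: float = 0.05, seed: int = 0) -> str:
    """Abbreviate common words, then corrupt a fraction of confusable characters."""
    rng = random.Random(seed)  # seeded so each noisy document is reproducible
    words = [ABBREVIATIONS.get(w.lower(), w) for w in text.split(" ")]
    chars = []
    for ch in " ".join(words):
        if ch in OCR_CONFUSIONS and rng.random() < ocr_rate:
            chars.append(OCR_CONFUSIONS[ch])
        else:
            chars.append(ch)
    return "".join(chars)
```

Seeding the generator per document keeps the noisy text reproducible, so extraction scores can be compared across runs against the same Synthea ground truth.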
| |
| --- |
| |
| ## 7. Success Metrics & Evaluation Plan |
| |
| ### Model Performance |
| | Metric | Target | Baseline | Method | |
| |--------|--------|----------|--------| |
| | **MedGemma Extraction F1** | ≥0.85 | Gemini-only: 0.65-0.75 | Field-level (stage, ECOG, biomarkers, labs) on labeled synthetic reports | |
| | **Trial Retrieval Recall@50** | ≥0.75 | BM25: ~0.60 | TREC 2021 patient topics | |
| | **Trial Ranking NDCG@10** | ≥0.60 | Non-LLM baseline: ~0.45 | TREC judged relevance | |
| | **Criterion Decision Accuracy** | ≥0.85 | Rule-based: ~0.70 | Per-criterion classification on synthetic patient-trial pairs | |
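For reference, the three headline metrics can be computed as sketched below. This assumes linear gains with a log2 discount for NDCG and exact-match scoring for extracted fields. Official TREC evaluation tooling may differ in details, so treat this as a sanity-check implementation rather than the scoring script.

```python
import math


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant trials that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def ndcg_at_k(ranked: list[str], gains: dict[str, int], k: int) -> float:
    """NDCG with linear gains and a log2 position discount."""
    def dcg(values: list[int]) -> float:
        return sum(g / math.log2(i + 2) for i, g in enumerate(values))

    actual = dcg([gains.get(doc, 0) for doc in ranked[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0


def field_f1(predicted: dict[str, str], gold: dict[str, str]) -> float:
    """Field-level F1: a field counts as correct only on an exact value match."""
    correct = sum(1 for k, v in predicted.items() if gold.get(k) == v)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)
```

As a quick check, extracting two of three gold fields correctly with no wrong fields gives precision 1.0 and recall 2/3, so F1 is 0.8, just under the 0.85 target.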
| |
| ### Product Quality |
| - **Latency:** <15s from upload to first match results |
| - **Explainability:** 100% of "met/not met" decisions must include evidence pointer (trial text + patient doc ID) |
| - **Cost:** <$0.50 per patient session (token + GPU usage) |
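The explainability requirement can be checked mechanically. The sketch below computes the share of met/not-met decisions that carry both trial text and a patient document ID. `EligibilityDecision` and its fields are assumed names for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EligibilityDecision:
    criterion_text: str            # quoted trial text
    status: str                    # "met", "not_met", or "unknown"
    patient_doc_id: Optional[str] = None


def evidence_coverage(decisions: list[EligibilityDecision]) -> float:
    """Share of met/not-met decisions that cite both trial text and a patient document."""
    decided = [d for d in decisions if d.status in ("met", "not_met")]
    if not decided:
        return 1.0
    cited = [d for d in decided if d.criterion_text.strip() and d.patient_doc_id]
    return len(cited) / len(decided)
```

A release gate could require `evidence_coverage(...) == 1.0` on the evaluation set. "Unknown" decisions are left out of the denominator because the requirement covers only met/not-met outcomes.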
| |
| ### UX Validation (Small Study) |
| - Task completion: Can lay users identify ≥1 plausible trial from shortlist? |
| - Explanation clarity: SUS-style usability score ≥70 |
| - Reading level: B1/8th-grade equivalent (Flesch-Kincaid) |
| |
| --- |
| |
| ## 8. Impact Potential |
| |
| ### If PoC Succeeds (Quantified) |
| **Near-term (PoC phase):** |
| - Demonstrate 15-25% relative improvement in ranking quality (NDCG) vs non-LLM baselines on TREC benchmarks |
| - Show multimodal extraction advantage: MedGemma F1 ≥0.10 higher than Gemini-only on medical fields |
| |
| **Post-PoC (Real-world projection):** |
| - **Patient impact:** Literature suggests automated tools can surface 20-30% more eligible trials than manual search. NSCLC patients often face 50+ active trials but learn about only 2-3 from their oncologist, so a patient-facing matcher could widen the set of options they get to consider |
| - **Clinician impact:** Trial coordinators report spending 2-4 hours per patient on manual screening; if our tool pre-screens with 85% sensitivity, it could reduce manual verification by ~60% |
| - **Trial enrollment:** Even a 10% increase in eligible patient identification could improve trial recruitment timelines (major pharma pain point) |
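The clinician-impact figure translates into hours as follows. This is back-of-the-envelope arithmetic on the numbers stated above (2-4 hours, ~60% reduction), not a measured result, and the helper name is hypothetical.

```python
def hours_after_prescreen(hours_low: float, hours_high: float, reduction: float) -> tuple[float, float]:
    """Remaining manual screening time per patient after a fractional reduction."""
    keep = 1.0 - reduction
    return hours_low * keep, hours_high * keep


low, high = hours_after_prescreen(2.0, 4.0, 0.60)
print(f"{low:.1f}-{high:.1f} hours per patient")  # 0.8-1.6 hours per patient
```

Under these assumptions, manual verification would drop from 2-4 hours to roughly 0.8-1.6 hours per patient.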
| |
| --- |
| |
| ## 9. Risks & Mitigations |
| |
| | Risk | Mitigation | |
| |------|-----------| |
| | **Synthetic data too clean** | Add controlled noise to PDFs (OCR errors, abbreviations); validate against TREC which uses realistic synthetic cases | |
| | **MedGemma hallucination on edge cases** | Implement evidence-pointer system (every decision must cite doc ID + span); flag low-confidence as "unknown" not "met" | |
| | **API rate limits** | Cache trial protocols; batch requests during search refinement | |
| | **Regulatory misunderstanding** | Explicit "information only, not medical advice" framing throughout UI; follow MedGemma model card guidance on validation/adaptation | |
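The caching mitigation for API rate limits could be as simple as an in-memory store keyed by NCT ID with a time-to-live. In the sketch below, `TrialCache` is a hypothetical helper and `fetch` stands in for whatever function the MCP wrapper exposes to retrieve one trial record.

```python
import time
from typing import Callable


class TrialCache:
    """In-memory cache keyed by NCT ID with a time-to-live, to avoid repeat API calls."""

    def __init__(self, fetch: Callable[[str], dict], ttl_seconds: float = 3600.0):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._store: dict[str, tuple[float, dict]] = {}

    def get(self, nct_id: str) -> dict:
        now = time.monotonic()
        cached = self._store.get(nct_id)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        record = self._fetch(nct_id)  # only reached on a miss or an expired entry
        self._store[nct_id] = (now, record)
        return record
```

The time-to-live matters because recruiting status can change; a one-hour default is an arbitrary placeholder that would need tuning against how fresh the trial data must be.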
| |
| --- |
| |
| ## 10. Deliverables for HAI-DEF Submission |
| |
| ### Video Demo (~5-7 min) |
| - Patient persona introduction |
| - Upload → extraction visualization (showing MedGemma in action) |
| - Agentic search loop (showing query refinement) |
| - Match results with traffic-light eligibility cards |
| - Gap-filling iteration (upload biomarker → new matches) |
| - "Share with doctor" packet generation |
| |
| ### Technical Write-up |
| 1. Problem + why HAI-DEF models |
| 2. Architecture diagram (Parlant journey + MedGemma + Gemini + MCP) |
| 3. Data generation pipeline |
| 4. Experiments: extraction, retrieval, ranking (tables + ablations) |
| 5. Limitations + path to real PHI deployment |
| |
| ### Code Repository |
| - `data/generate_synthetic_patients.py` |
| - `data/generate_noisy_pdfs.py` |
| - `matching/medgemma_extractor.py` |
| - `matching/agentic_search.py` (Parlant + Gemini + MCP) |
| - `evaluation/run_trec_benchmark.py` |
| - Clear README with one-command reproducibility |
|
|
| --- |
|
|
| ## 11. Why This Wins HAI-DEF |
|
|
| ### Effective Use of Models (20%) |
| ✅ MedGemma as primary clinical understanding engine (extraction + multimodal) |
| ✅ Concrete demos showing where non-HAI-DEF models fail (extraction accuracy gaps) |
| ✅ Plan for task-specific evaluation showing measurable improvement |
|
|
| ### Problem Domain (15%) |
| ✅ Clear unmet need (low trial enrollment, manual screening burden) |
| ✅ Patient-centric storytelling ("Anna's journey") |
| ✅ Evidence-based magnitude (enrollment stats, screening time data) |
|
|
| ### Impact Potential (15%) |
| ✅ Quantified near-term (benchmark improvements) and long-term (enrollment lift) impact |
| ✅ Clear calculation logic grounded in literature |
|
|
| ### Product Feasibility (20%) |
| ✅ Detailed technical architecture (agentic search innovation) |
| ✅ Realistic synthetic data strategy |
| ✅ Concrete evaluation plan with baselines |
| ✅ Deployment considerations (latency, cost, safety) |
|
|
| ### Execution & Communication (30%) |
| ✅ Cohesive narrative across video + write-up + code |
| ✅ Reproducible experiments |
| ✅ Clear explanation of design choices |
| ✅ Professional polish (evidence pointers, explanations, UX details) |
|
|
| --- |
|
|
| **Timeline:** 3 months to PoC demo ready for HAI-DEF submission. |
|
|
| **Team needs:** 1 ML engineer (MedGemma fine-tuning + evaluation), 1 full-stack engineer (web app + Parlant orchestration), 1 CPO (coordination + submission materials). |
|
|