# HAI-DEF Pitch: MedGemma Match – Patient Trial Copilot

**PoC Goal:** Demonstrate a MedGemma + Gemini 3 Pro + Parlant agentic architecture for patient-facing clinical trial matching with **explainable eligibility reasoning** and **iterative gap-filling**.

---

## 1. Problem & Unmet Need

### The Challenge

- **Low trial participation:** <5% of adult cancer patients enroll in clinical trials despite potential eligibility
- **Complex eligibility criteria:** Free-text criteria mix demographics, biomarkers, labs, imaging findings, and treatment history
- **Patient barrier:** Patients receive PDFs/reports but have no way to understand which trials fit their situation
- **Manual screening burden:** Clinicians spend hours per patient manually reviewing eligibility; automated tools show mixed real-world performance

### Why AI? Why Now?

- Eligibility criteria require synthesis across multiple document types (pathology, labs, imaging, treatment history); keyword search alone cannot do this
- Recent LLM-based matching systems (TrialGPT, PRISM) show promise but lack patient-centric design and multimodal medical understanding
- HAI-DEF open-weight health models enable privacy-preserving deployment with medical domain expertise

---

## 2. Solution: MedGemma as Clinical Understanding Engine

### Core Concept

**"Agentic Search + Multimodal Extraction"** replacing traditional vector-RAG approaches.

**Architecture:**

- **MedGemma (HAI-DEF):** Extracts structured clinical facts from messy PDFs/reports and understands medical imaging contexts
- **Gemini 3 Pro:** Orchestrates agentic search through the ClinicalTrials.gov API with iterative query refinement
- **Parlant:** Enforces the state machine (search → filter → verify) and prevents parameter hallucination
- **ClinicalTrials MCP:** Structured API wrapper for trials data (no vector DB needed)

### Why MedGemma is Central (Not Replaceable)
1. **Multimodal medical reasoning:** Designed for radiology reports, pathology, and labs, where generic LLMs are weaker
2. **Domain-aligned extraction:** Medical entity recognition with units, dates, and clinical context preservation
3. **Open weights:** Enables VPC deployment for future PHI handling (vs closed-weight alternatives)
4. **Health-safety guardrails:** Model card emphasizes the validation/adaptation patterns we follow

---

## 3. User Journey (Patient-Centric)

### Target User (PoC Persona)

**"Anna"** – a 52-year-old NSCLC patient in Berlin with PDFs from her oncologist but no trial navigation support.

### Journey Flow

1. **Upload Documents** → Clinic letter, pathology report, lab results (synthetic PDFs in the PoC)
2. **MedGemma Extraction** → System builds "My Clinical Profile (draft)": Stage IVa, EGFR status unknown, ECOG 1
3. **Agentic Search** → Gemini queries ClinicalTrials.gov via MCP:
   - Initial: `condition=NSCLC, location=DE, status=RECRUITING, keywords=EGFR` → 47 results
   - Refines: adds `phase=PHASE3` → 12 results
   - Reads summaries, filters to 5 relevant trials
4. **Eligibility Analysis** → For each trial, MedGemma evaluates criteria against extracted facts
5. **Gap Identification** → System highlights: *"You'd likely qualify IF you had an EGFR mutation test"*
6. **Iteration** → Anna uploads a biomarker report → system re-matches → 3 new trials appear
7. **Share with Doctor** → Generate a clinician packet with an evidence-linked eligibility ledger

### Key Differentiator: The "Gap Analysis"

- We don't just say "No match"
- We say: **"You would match NCT12345 IF you had: a recent brain MRI showing no active CNS disease"**
- This transforms a rejection into actionable next steps

---
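The gap analysis above can be sketched as a small decision flow: each trial criterion resolves to met, not met, or unknown against the extracted patient profile, and the unknowns become the "IF you had X" next steps. A minimal illustrative sketch, with all names (`gap_analysis`, `evaluate_criterion`, the field names) hypothetical rather than taken from the actual codebase:

```python
# Sketch: three-valued criterion evaluation; UNKNOWN facts become
# actionable gaps instead of silent rejections. Illustrative only.

MET, NOT_MET, UNKNOWN = "met", "not_met", "unknown"

def evaluate_criterion(criterion: dict, profile: dict) -> str:
    """Compare one structured criterion against extracted patient facts."""
    value = profile.get(criterion["field"])   # e.g. profile["egfr_status"]
    if value is None:
        return UNKNOWN                        # never extracted -> fillable gap
    return MET if value in criterion["accepted"] else NOT_MET

def gap_analysis(trial: dict, profile: dict) -> dict:
    """Return an eligibility verdict plus the list of missing facts."""
    results = {c["label"]: evaluate_criterion(c, profile) for c in trial["criteria"]}
    gaps = [label for label, r in results.items() if r == UNKNOWN]
    if any(r == NOT_MET for r in results.values()):
        verdict = "excluded"
    elif gaps:
        verdict = f"would match IF: {', '.join(gaps)}"
    else:
        verdict = "likely eligible"
    return {"trial_id": trial["id"], "criteria": results, "verdict": verdict}

trial = {
    "id": "NCT12345",
    "criteria": [
        {"label": "EGFR mutation test", "field": "egfr_status", "accepted": ["positive"]},
        {"label": "ECOG 0-1", "field": "ecog", "accepted": [0, 1]},
    ],
}
anna = {"ecog": 1}                            # EGFR report not yet uploaded
print(gap_analysis(trial, anna)["verdict"])   # -> would match IF: EGFR mutation test
```

When Anna later uploads the biomarker report, `egfr_status` flips from missing to `"positive"` and the same function returns "likely eligible" — the re-matching step in the journey is just re-running this evaluation.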
## 4. Technical Innovation: Smart Agentic Search (No Vector DB)

### Traditional Approach (What We're *Not* Doing)

```
Patient text → Embeddings → Vector similarity search → Retrieve top-K trials → LLM re-ranks
```

**Problem:** Vector search is "dumb" about structured constraints (phase, location, status) and negations.

### Our Approach: Iterative Query Refinement

```
MedGemma extracts "Search Anchors" (condition, biomarkers, location)
  → Gemini formulates API query with filters
  → ClinicalTrials MCP returns results
      → Too many (>50)?    → Parlant enforces refinement (add phase/keywords)
      → Too few (0)?       → Parlant enforces relaxation (remove city filter)
      → Right size (10-30)? → Gemini reads summaries in its 2M context window
  → Shortlist 5 NCT IDs
  → Deep eligibility verification with MedGemma
```

**Why This is Better:**

- **Precision:** Leverages native API filters (phase, status, location) that vectors can't handle
- **Transparency:** Every search step is logged and explainable ("I searched X, got Y results, refined to Z")
- **Feasibility:** No vector DB infrastructure; uses the live API
- **Showcases Gemini reasoning:** Demonstrates multi-step planning vs one-shot retrieval

---
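The refinement loop above fits in a few lines of orchestration code. The thresholds (>50 too many, 0 too few) come from the diagram; everything else — the function names, the filter list, and the stubbed `query_trials` callable standing in for the ClinicalTrials MCP — is an illustrative assumption, not the actual implementation:

```python
# Sketch of the tighten-or-loosen search loop. query_trials is any
# callable taking a query dict and returning a list of trial records;
# here it is stubbed with a toy function for demonstration.

TOO_MANY, TOO_FEW = 50, 1
REFINEMENTS = ["phase", "keywords"]      # structured filters to add when too broad

def refine_search(query: dict, query_trials, max_rounds: int = 5) -> list:
    """Tighten or loosen the API query until the result set is workable."""
    results = []
    for _ in range(max_rounds):
        results = query_trials(query)
        if len(results) >= TOO_MANY and any(f not in query["filters"] for f in REFINEMENTS):
            # Too broad: add the next unused structured filter.
            query["filters"].append(next(f for f in REFINEMENTS if f not in query["filters"]))
        elif len(results) < TOO_FEW and query["filters"]:
            # Too narrow: drop the most recently added filter (e.g. city).
            query["filters"].pop()
        else:
            return results               # workable shortlist for Gemini to read
    return results

def fake_api(query: dict) -> list:
    """Toy stand-in: each extra filter halves a base count of 120 results."""
    return list(range(120 >> len(query["filters"])))

query = {"condition": "NSCLC", "filters": []}
results = refine_search(query, fake_api)
print(len(results), query["filters"])    # converges to a mid-sized result set
```

Because Parlant owns the loop, the model never invents filter values — it can only move between the states the diagram allows, and every round is logged for the transparency claim above.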
## 5. MedGemma Showcase Moments (HAI-DEF "Fullest Potential")

### Use Case 1: Temporal Lab Extraction

**Challenge:** Criterion requires "ANC ≥ 1.5 × 10⁹/L within 14 days of enrollment"

- **MedGemma extracts:** Value=1.8, Units=10⁹/L, Date=2026-01-28, DocID=labs_jan.pdf
- **System verifies:** Current date Feb 4 → lab drawn 7 days ago → ✓ MEETS criterion
- **Evidence link:** User can click to see the exact lab table and date

### Use Case 2: Multimodal Imaging Context

**Challenge:** Criterion requires "No active CNS metastases"

- **MedGemma reads:** Brain MRI report text: *"Stable 3mm left frontal lesion, no enhancement, likely scarring from prior SRS"*
- **System interprets:** "Stable" + "no enhancement" + "scarring" → likely inactive → flags as ⚠️ UNKNOWN (requires clinician confirmation)
- **Evidence link:** Highlights the report section for doctor review

### Use Case 3: Treatment Line Reconstruction

**Challenge:** Criterion excludes "Prior immune checkpoint inhibitor therapy"

- **MedGemma reconstructs:** From the medication list and notes → patient received Pembrolizumab 2024-06 to 2024-11
- **System verifies:** → ✗ EXCLUDED
- **Evidence link:** Shows the medication timeline with dates and sources

---
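Use Case 1 illustrates a design choice worth making explicit: MedGemma supplies the structured tuple (value, unit, date, source document), but the 14-day window itself is checked by deterministic date arithmetic, not by the LLM. A hedged sketch with illustrative field names:

```python
from datetime import date

# Sketch of the temporal verification step: plain date math decides the
# window, and every verdict carries an evidence pointer back to the
# source document. Field names are assumptions, not the real schema.

def check_lab_criterion(extracted: dict, threshold: float,
                        window_days: int, today: date) -> dict:
    """Return met/not-met plus an evidence pointer for the UI."""
    age_days = (today - extracted["date"]).days
    in_window = 0 <= age_days <= window_days
    meets_value = extracted["value"] >= threshold
    return {
        "met": in_window and meets_value,
        "reason": (f"value {extracted['value']} {extracted['unit']}, "
                   f"{age_days} days old (window {window_days}d)"),
        "evidence": extracted["doc_id"],     # click-through to the lab table
    }

anc = {"value": 1.8, "unit": "10^9/L",
       "date": date(2026, 1, 28), "doc_id": "labs_jan.pdf"}
print(check_lab_criterion(anc, threshold=1.5, window_days=14,
                          today=date(2026, 2, 4)))
```

Keeping the comparison outside the model means a hallucinated date or value can only propagate as far as the extraction step, where the evidence link lets a clinician catch it.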
## 6. PoC Scope & Data Strategy

### In Scope (3-Month PoC)

- **Disease:** NSCLC only (complex biomarkers, high trial volume)
- **Data:** Synthetic patients only (no real PHI)
- **Deliverables:**
  - Working web prototype (video demo)
  - Experimental validation on TREC benchmarks
  - Technical write-up + public code repo

### Data Sources

**Patients (Synthetic):**

- Structured ground truth: Synthea FHIR (500 NSCLC patients)
- Unstructured artifacts: LLM-generated clinic letters + lab PDFs with controlled noise (abbreviations, OCR errors, missing values)

**Trials (Real):**

- ClinicalTrials.gov live API via MCP wrapper
- Focus on NSCLC recruiting trials in Europe + US

**Benchmarking:**

- TREC Clinical Trials Track 2021/2022 (75 patient topics + judged relevance)
- Custom criterion-extraction test set (labeled synthetic reports)

---

## 7. Success Metrics & Evaluation Plan

### Model Performance

| Metric | Target | Baseline | Method |
|--------|--------|----------|--------|
| **MedGemma Extraction F1** | ≥0.85 | Gemini-only: 0.65-0.75 | Field-level (stage, ECOG, biomarkers, labs) on labeled synthetic reports |
| **Trial Retrieval Recall@50** | ≥0.75 | BM25: ~0.60 | TREC 2021 patient topics |
| **Trial Ranking NDCG@10** | ≥0.60 | Non-LLM baseline: ~0.45 | TREC judged relevance |
| **Criterion Decision Accuracy** | ≥0.85 | Rule-based: ~0.70 | Per-criterion classification on synthetic patient-trial pairs |

### Product Quality

- **Latency:** <15 s from upload to first match results
- **Explainability:** 100% of "met/not met" decisions must include an evidence pointer (trial text + patient doc ID)
- **Cost:** <$0.50 per patient session (token + GPU usage)

### UX Validation (Small Study)

- Task completion: Can lay users identify ≥1 plausible trial from the shortlist?
- Explanation clarity: SUS-style usability score ≥70
- Reading level: B1/8th-grade equivalent (Flesch-Kincaid)

---
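The "MedGemma Extraction F1" row deserves one concrete definition: it is a field-level, micro-averaged score, where each report yields (field, value) pairs and precision/recall are computed over exact pair matches against the labeled gold set. A minimal sketch of how such a scorer might look (illustrative, not the actual evaluation harness):

```python
# Sketch: micro-averaged field-level F1 over (field, value) pairs,
# aggregated across reports. Function and field names are assumptions.

def field_f1(predicted: list, gold: list) -> float:
    """Micro-F1 over exact (field, value) matches, across paired reports."""
    tp = fp = fn = 0
    for pred, ref in zip(predicted, gold):
        p, g = set(pred.items()), set(ref.items())
        tp += len(p & g)          # fields extracted with the correct value
        fp += len(p - g)          # wrong or spurious extractions
        fn += len(g - p)          # gold fields the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

pred = [{"stage": "IVa", "ecog": 1, "egfr": "unknown"}]
gold = [{"stage": "IVa", "ecog": 1, "egfr": "positive"}]
print(round(field_f1(pred, gold), 3))   # 2 of 3 fields correct on each side
```

Exact-match scoring is deliberately strict for the PoC; a production harness would likely add per-field normalization (unit conversion, date parsing) before comparison.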
## 8. Impact Potential

### If the PoC Succeeds (Quantified)

**Near-term (PoC phase):**

- Demonstrate a 15-25% relative improvement in ranking quality (NDCG) vs non-LLM baselines on TREC benchmarks
- Show the multimodal extraction advantage: MedGemma F1 ≥0.10 higher than Gemini-only on medical fields

**Post-PoC (real-world projection):**

- **Patient impact:** Literature suggests automated tools can surface 20-30% more eligible trials than manual search; NSCLC patients often face 50+ active trials yet typically learn about only 2-3 from their oncologist, so even partially closing that gap is meaningful
- **Clinician impact:** Trial coordinators report spending 2-4 hours per patient on manual screening; if our tool pre-screens with 85% sensitivity, it could reduce manual verification effort by ~60%
- **Trial enrollment:** Even a 10% increase in eligible-patient identification could shorten trial recruitment timelines (a major pharma pain point)

---

## 9. Risks & Mitigations

| Risk | Mitigation |
|------|-----------|
| **Synthetic data too clean** | Add controlled noise to PDFs (OCR errors, abbreviations); validate against TREC, which uses realistic synthetic cases |
| **MedGemma hallucination on edge cases** | Implement an evidence-pointer system (every decision must cite doc ID + span); flag low-confidence cases as "unknown", not "met" |
| **API rate limits** | Cache trial protocols; batch requests during search refinement |
| **Regulatory misunderstanding** | Explicit "information only, not medical advice" framing throughout the UI; follow MedGemma model card guidance on validation/adaptation |

---

## 10. Deliverables for HAI-DEF Submission

### Video Demo (~5-7 min)

- Patient persona introduction
- Upload → extraction visualization (showing MedGemma in action)
- Agentic search loop (showing query refinement)
- Match results with traffic-light eligibility cards
- Gap-filling iteration (upload biomarker → new matches)
- "Share with doctor" packet generation

### Technical Write-up

1. Problem + why HAI-DEF models
2. Architecture diagram (Parlant journey + MedGemma + Gemini + MCP)
3. Data generation pipeline
4. Experiments: extraction, retrieval, ranking (tables + ablations)
5. Limitations + path to real PHI deployment

### Code Repository

- `data/generate_synthetic_patients.py`
- `data/generate_noisy_pdfs.py`
- `matching/medgemma_extractor.py`
- `matching/agentic_search.py` (Parlant + Gemini + MCP)
- `evaluation/run_trec_benchmark.py`
- Clear README with one-command reproducibility

---

## 11. Why This Wins HAI-DEF

### Effective Use of Models (20%)

- ✓ MedGemma as the primary clinical understanding engine (extraction + multimodal)
- ✓ Concrete demos showing where non-HAI-DEF models fall short (extraction accuracy gaps)
- ✓ Plan for task-specific evaluation showing measurable improvement

### Problem Domain (15%)

- ✓ Clear unmet need (low trial enrollment, manual screening burden)
- ✓ Patient-centric storytelling ("Anna's journey")
- ✓ Evidence-based magnitude (enrollment stats, screening-time data)

### Impact Potential (15%)

- ✓ Quantified near-term (benchmark improvements) and long-term (enrollment lift) impact
- ✓ Clear calculation logic grounded in literature

### Product Feasibility (20%)

- ✓ Detailed technical architecture (agentic search innovation)
- ✓ Realistic synthetic data strategy
- ✓ Concrete evaluation plan with baselines
- ✓ Deployment considerations (latency, cost, safety)

### Execution & Communication (30%)

- ✓ Cohesive narrative across video + write-up + code
- ✓ Reproducible experiments
- ✓ Clear explanation of design choices
- ✓ Professional polish (evidence pointers, explanations, UX details)

---

**Timeline:** 3 months to a PoC demo ready for HAI-DEF submission.

**Team needs:** 1 ML engineer (MedGemma fine-tuning + evaluation), 1 full-stack engineer (web app + Parlant orchestration), 1 CPO (coordination + submission materials).