Tracer β€” MedGemma Fine-tuned for Diagnostic Hypothesis Extraction

A fine-tuned LoRA adapter for MedGemma 1.5 4B-it that extracts structured diagnostic hypotheses from clinical notes.

What it does

Given a physician's clinical note, the model extracts 6 structured fields:

  • Primary Hypothesis β€” most likely diagnosis
  • Differential Diagnoses β€” alternative diagnoses to consider
  • Key Supporting Evidence β€” clinical findings supporting the hypothesis
  • Urgency Level β€” high / medium / low
  • Tests Ordered β€” diagnostic workup
  • Clinical Reasoning β€” explanation of clinical logic
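Because the model emits these fields as labeled lines (see the prompt under Usage), they can be pulled back out with simple pattern matching. A minimal sketch of such a parser; the function name is illustrative and assumes each field value fits on one line:

```python
import re

FIELDS = [
    "PRIMARY HYPOTHESIS",
    "DIFFERENTIAL DIAGNOSES",
    "KEY SUPPORTING EVIDENCE",
    "URGENCY LEVEL",
    "TESTS ORDERED",
    "CLINICAL REASONING",
]

def parse_extraction(text):
    """Map each of the 6 field names to its value, or None if absent."""
    result = {}
    for field in FIELDS:
        m = re.search(rf"{field}:\s*(.+)", text)
        result[field] = m.group(1).strip() if m else None
    return result
```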

Training

  • Base model: google/medgemma-1.5-4b-it
  • Method: LoRA fine-tuning (r=16, alpha=32)
  • Dataset: 421 high-quality clinical note examples across 20+ medical specialties
  • Validation loss: 0.866 (19.4% improvement over base)
  • Platform: Kaggle T4 GPU
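The card only states r=16 and alpha=32; a peft configuration consistent with those values might look like the sketch below. The dropout and target modules are assumptions, not confirmed by the card:

```python
from peft import LoraConfig

# r and lora_alpha come from the card; everything else is an assumption.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,  # assumed, not stated on the card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)
```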

Performance

Evaluated against 7 comparable open-source models (all ≀7B, zero-shot):

  • 100% valid structure rate (β‰₯5/6 fields present)
  • 90% urgency accuracy
  • Outperforms Base MedGemma 4B zero-shot on structured output reliability

Model                          Size  Valid Structure  Urgency Accuracy
Tracer (fine-tuned MedGemma)   4B    100%             90%
Base MedGemma 4B (zero-shot)   4B    50%              90%
DeepSeek-R1 7B (zero-shot)     7B    100%             90%
Gemma 2 2B (zero-shot)         2B    100%             60%
BioMistral 7B (zero-shot)      7B    0%               0%
Meditron 7B (zero-shot)        7B    0%               0%
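The two reported metrics are straightforward to compute once each model output has been parsed into a dict of the six field names (with None for missing fields). A self-contained sketch, with illustrative function names:

```python
def valid_structure(parsed):
    """An output counts as 'valid structure' when >=5 of the 6 fields are present."""
    return sum(v is not None for v in parsed.values()) >= 5

def urgency_accuracy(parsed_outputs, gold_labels):
    """Fraction of outputs whose URGENCY LEVEL matches the gold label."""
    correct = sum(
        (p.get("URGENCY LEVEL") or "").lower() == g.lower()
        for p, g in zip(parsed_outputs, gold_labels)
    )
    return correct / len(gold_labels)
```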

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load the base model, then attach the fine-tuned LoRA adapter on top.
base = AutoModelForCausalLM.from_pretrained(
    "google/medgemma-1.5-4b-it",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("google/medgemma-1.5-4b-it")
model = PeftModel.from_pretrained(base, "adishofwhat/tracer-medgemma-v2")
model.eval()

note = "Your clinical note text here..."
prompt = (
    "<start_of_turn>user\n"
    "Extract diagnostic information from this clinical note.\n\n"
    f"Clinical Note:\n{note}\n\n"
    "Output ONLY these 6 fields:\n"
    "PRIMARY HYPOTHESIS: [main diagnosis]\n"
    "DIFFERENTIAL DIAGNOSES: [comma-separated alternatives]\n"
    "KEY SUPPORTING EVIDENCE: [comma-separated findings]\n"
    "URGENCY LEVEL: [high/medium/low]\n"
    "TESTS ORDERED: [comma-separated tests]\n"
    "CLINICAL REASONING: [brief explanation]\n"
    "<end_of_turn>\n<start_of_turn>model\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=600, do_sample=False)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Part of

Tracer β€” MedGemma Impact Challenge submission. Addresses ambulatory diagnostic errors (40,000–80,000 deaths/year in the US). Tracer uses a 4-agent pipeline to extract diagnostic hypotheses, monitor test results, and alert physicians to open diagnostic loops before patients fall through the cracks.
