Tracer β€” MedGemma Fine-tuned for Diagnostic Hypothesis Extraction

A fine-tuned LoRA adapter for MedGemma 1.5 4B-it that extracts structured diagnostic hypotheses from clinical notes.

What it does

Given a physician's clinical note, the model extracts 6 structured fields:

  • Primary Hypothesis β€” most likely diagnosis
  • Differential Diagnoses β€” alternative diagnoses to consider
  • Key Supporting Evidence β€” clinical findings supporting the hypothesis
  • Urgency Level β€” high / medium / low
  • Tests Ordered β€” diagnostic workup
  • Clinical Reasoning β€” explanation of clinical logic
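Because the model emits these fields as labeled lines (see the prompt under Usage), they can be pulled back out with simple pattern matching. A minimal sketch of such a parser; the function name is illustrative and assumes each field value fits on one line:

```python
import re

FIELDS = [
    "PRIMARY HYPOTHESIS",
    "DIFFERENTIAL DIAGNOSES",
    "KEY SUPPORTING EVIDENCE",
    "URGENCY LEVEL",
    "TESTS ORDERED",
    "CLINICAL REASONING",
]

def parse_extraction(text):
    """Map each of the 6 field names to its value, or None if absent."""
    result = {}
    for field in FIELDS:
        m = re.search(rf"{field}:\s*(.+)", text)
        result[field] = m.group(1).strip() if m else None
    return result
```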

Training

  • Base model: google/medgemma-1.5-4b-it
  • Method: LoRA fine-tuning (r=16, alpha=32)
  • Dataset: 421 high-quality clinical note examples across 20+ medical specialties
  • Validation loss: 0.866 (19.4% improvement over base)
  • Platform: Kaggle T4 GPU
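The card only states r=16 and alpha=32; a peft configuration consistent with those values might look like the sketch below. The dropout and target modules are assumptions, not confirmed by the card:

```python
from peft import LoraConfig

# r and lora_alpha come from the card; everything else is an assumption.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,  # assumed, not stated on the card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)
```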

Performance

Evaluated against 7 comparable open-source models (all ≀7B, zero-shot):

  • 100% valid structure rate (β‰₯5/6 fields present)
  • 90% urgency accuracy
  • Outperforms Base MedGemma 4B zero-shot on structured output reliability

Model                          Size  Valid Structure  Urgency Accuracy
Tracer (fine-tuned MedGemma)   4B    100%             90%
Base MedGemma 4B (zero-shot)   4B    50%              90%
DeepSeek-R1 7B (zero-shot)     7B    100%             90%
Gemma 2 2B (zero-shot)         2B    100%             60%
BioMistral 7B (zero-shot)      7B    0%               0%
Meditron 7B (zero-shot)        7B    0%               0%
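The two reported metrics are straightforward to compute once each model output has been parsed into a dict of the six field names (with None for missing fields). A self-contained sketch, with illustrative function names:

```python
def valid_structure(parsed):
    """An output counts as 'valid structure' when >=5 of the 6 fields are present."""
    return sum(v is not None for v in parsed.values()) >= 5

def urgency_accuracy(parsed_outputs, gold_labels):
    """Fraction of outputs whose URGENCY LEVEL matches the gold label."""
    correct = sum(
        (p.get("URGENCY LEVEL") or "").lower() == g.lower()
        for p, g in zip(parsed_outputs, gold_labels)
    )
    return correct / len(gold_labels)
```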

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load the base model, then attach the fine-tuned LoRA adapter on top.
base = AutoModelForCausalLM.from_pretrained(
    "google/medgemma-1.5-4b-it",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("google/medgemma-1.5-4b-it")
model = PeftModel.from_pretrained(base, "adishofwhat/tracer-medgemma-v2")
model.eval()

note = "Your clinical note text here..."
prompt = (
    "<start_of_turn>user\n"
    "Extract diagnostic information from this clinical note.\n\n"
    f"Clinical Note:\n{note}\n\n"
    "Output ONLY these 6 fields:\n"
    "PRIMARY HYPOTHESIS: [main diagnosis]\n"
    "DIFFERENTIAL DIAGNOSES: [comma-separated alternatives]\n"
    "KEY SUPPORTING EVIDENCE: [comma-separated findings]\n"
    "URGENCY LEVEL: [high/medium/low]\n"
    "TESTS ORDERED: [comma-separated tests]\n"
    "CLINICAL REASONING: [brief explanation]\n"
    "<end_of_turn>\n<start_of_turn>model\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=600, do_sample=False)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Part of

Tracer β€” MedGemma Impact Challenge submission. Addresses ambulatory diagnostic errors (40,000–80,000 deaths/year in the US). Tracer uses a 4-agent pipeline to extract diagnostic hypotheses, monitor test results, and alert physicians to open diagnostic loops before patients fall through the cracks.
