# Tracer: MedGemma Fine-tuned for Diagnostic Hypothesis Extraction

Fine-tuned LoRA adapter for MedGemma 1.5 4B-it, trained to extract structured diagnostic hypotheses from clinical notes.
## What it does
Given a physician's clinical note, the adapter extracts six structured fields:
- Primary Hypothesis – most likely diagnosis
- Differential Diagnoses – alternative diagnoses to consider
- Key Supporting Evidence – clinical findings supporting the hypothesis
- Urgency Level – high / medium / low
- Tests Ordered – diagnostic workup
- Clinical Reasoning – explanation of the clinical logic
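For downstream use, the six fields map naturally onto a small record type. A minimal sketch — the `DiagnosticHypothesis` class is illustrative and not part of the released adapter:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DiagnosticHypothesis:
    """Illustrative container for the six extracted fields."""
    primary_hypothesis: str
    differential_diagnoses: List[str]
    key_supporting_evidence: List[str]
    urgency_level: str  # expected values: "high", "medium", "low"
    tests_ordered: List[str]
    clinical_reasoning: str


# Hypothetical example values, for illustration only
hyp = DiagnosticHypothesis(
    primary_hypothesis="Community-acquired pneumonia",
    differential_diagnoses=["Acute bronchitis", "Pulmonary embolism"],
    key_supporting_evidence=["Fever", "Productive cough", "Focal crackles"],
    urgency_level="medium",
    tests_ordered=["Chest X-ray", "CBC"],
    clinical_reasoning="Findings fit a typical CAP presentation.",
)
```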
## Training
- Base model: google/medgemma-1.5-4b-it
- Method: LoRA fine-tuning (r=16, alpha=32)
- Dataset: 421 high-quality clinical note examples across 20+ medical specialties
- Validation loss: 0.866 (19.4% improvement over base)
- Platform: Kaggle T4 GPU
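With r=16 and alpha=32, the LoRA update is scaled by alpha/r = 2 before being added to the frozen base weights. A quick check of that relationship (target modules are not listed on this card, so only the reported hyperparameters appear here):

```python
# LoRA hyperparameters as reported above; the adapter learns low-rank
# matrices A and B and applies W' = W + (alpha / r) * B @ A.
lora_r = 16
lora_alpha = 32
scaling = lora_alpha / lora_r  # multiplier on the low-rank update
print(scaling)  # -> 2.0
```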
## Performance
Evaluated against 7 comparable open-source models (all ≤7B, zero-shot):
- 100% valid structure rate (≥5/6 fields present)
- 90% urgency accuracy
- Outperforms Base MedGemma 4B zero-shot on structured output reliability
| Model | Size | Valid Structure | Urgency Accuracy |
|---|---|---|---|
| Tracer (fine-tuned MedGemma) | 4B | 100% | 90% |
| Base MedGemma 4B (zero-shot) | 4B | 50% | 90% |
| DeepSeek-R1 7B (zero-shot) | 7B | 100% | 90% |
| Gemma 2 2B (zero-shot) | 2B | 100% | 60% |
| BioMistral 7B (zero-shot) | 7B | 0% | 0% |
| Meditron 7B (zero-shot) | 7B | 0% | 0% |
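The valid-structure metric can be reproduced by counting labeled fields in the raw output. A minimal sketch of that check, assuming an output counts as valid when at least 5 of the 6 field labels are present:

```python
REQUIRED_FIELDS = [
    "PRIMARY HYPOTHESIS",
    "DIFFERENTIAL DIAGNOSES",
    "KEY SUPPORTING EVIDENCE",
    "URGENCY LEVEL",
    "TESTS ORDERED",
    "CLINICAL REASONING",
]


def has_valid_structure(output: str, min_present: int = 5) -> bool:
    """Return True when at least `min_present` of the 6 field labels appear."""
    present = sum(1 for name in REQUIRED_FIELDS if f"{name}:" in output)
    return present >= min_present


# A complete output passes; a one-field output does not.
good = "\n".join(f"{name}: ..." for name in REQUIRED_FIELDS)
bad = "PRIMARY HYPOTHESIS: pneumonia"
```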
## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load the base model, then apply the Tracer LoRA adapter on top
base = AutoModelForCausalLM.from_pretrained(
    "google/medgemma-1.5-4b-it",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("google/medgemma-1.5-4b-it")
model = PeftModel.from_pretrained(base, "adishofwhat/tracer-medgemma-v2")
model.eval()

note = "Your clinical note text here..."
prompt = (
    "<start_of_turn>user\n"
    "Extract diagnostic information from this clinical note.\n\n"
    f"Clinical Note:\n{note}\n\n"
    "Output ONLY these 6 fields:\n"
    "PRIMARY HYPOTHESIS: [main diagnosis]\n"
    "DIFFERENTIAL DIAGNOSES: [comma-separated alternatives]\n"
    "KEY SUPPORTING EVIDENCE: [comma-separated findings]\n"
    "URGENCY LEVEL: [high/medium/low]\n"
    "TESTS ORDERED: [comma-separated tests]\n"
    "CLINICAL REASONING: [brief explanation]"
    "<end_of_turn>\n<start_of_turn>model\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=600, do_sample=False)
# Keep only the model's turn of the decoded conversation
print(tokenizer.decode(out[0], skip_special_tokens=True).split("model\n")[-1])
```
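The generated text can then be split back into the six fields. A regex-based parser sketch — not shipped with the adapter, shown here as one reasonable approach:

```python
import re

FIELD_NAMES = [
    "PRIMARY HYPOTHESIS",
    "DIFFERENTIAL DIAGNOSES",
    "KEY SUPPORTING EVIDENCE",
    "URGENCY LEVEL",
    "TESTS ORDERED",
    "CLINICAL REASONING",
]


def parse_output(text: str) -> dict:
    """Map each 'FIELD: value' line of the model output to a dict entry."""
    parsed = {}
    for name in FIELD_NAMES:
        match = re.search(rf"{re.escape(name)}:\s*(.+)", text)
        if match:
            parsed[name] = match.group(1).strip()
    return parsed


# Hypothetical model output fragment, for illustration
sample = (
    "PRIMARY HYPOTHESIS: Community-acquired pneumonia\n"
    "URGENCY LEVEL: medium\n"
)
print(parse_output(sample)["URGENCY LEVEL"])  # -> medium
```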
## Part of

Tracer – a MedGemma Impact Challenge submission. It addresses ambulatory diagnostic errors, which cause an estimated 40,000–80,000 deaths per year in the US. Tracer uses a 4-agent pipeline to extract diagnostic hypotheses, monitor test results, and alert physicians to open diagnostic loops before patients fall through the cracks.