MedGemma 4B - DIPG Safety v2 (GRPO-Trained)

This model is MedGemma 4B IT fine-tuned with GRPO (Group Relative Policy Optimization) on the DIPG Safety Gym benchmark.

Training Details

  • Base Model: MedGemma 4B IT
  • Method: GRPO with LoRA (rank=64, alpha=64)
  • Training Steps: 100
  • Focus: Medical safety — hallucination reduction, evidence-grounded responses
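GRPO replaces PPO's learned value baseline with a group-relative one: for each prompt, several completions are sampled, scored by a reward function, and each completion's advantage is its reward standardized against the group's mean and standard deviation. A minimal sketch of that advantage computation (the rewards below are illustrative, not values from this training run):

```python
import math

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each reward against its sampling group.

    For G completions of the same prompt, completion i gets advantage
    (r_i - mean(r)) / (std(r) + eps) -- no learned value network needed.
    """
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled answers scored by a safety reward function (illustrative):
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Completions scoring above the group mean get positive advantage and are reinforced; the group average itself acts as the baseline, which is what makes GRPO comparatively cheap to run on top of LoRA adapters.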

Usage

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("surfiniaburger/medgemma-4b-dipg-safety-v2")
model = AutoModelForCausalLM.from_pretrained(
    "surfiniaburger/medgemma-4b-dipg-safety-v2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# The base model is instruction-tuned, so use the chat template.
messages = [{"role": "user", "content": "What imaging findings are typical of DIPG?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))

SafeClaw Project

Part of the DIPG Safety Gym ecosystem.
