MedGemma 4B - DIPG Safety v2 (GRPO-Trained)

This model is MedGemma 4B IT fine-tuned with GRPO (Group Relative Policy Optimization) on the DIPG Safety Gym benchmark.

Training Details

  • Base Model: MedGemma 4B IT
  • Method: GRPO with LoRA (rank=64, alpha=64)
  • Training Steps: 100
  • Focus: Medical safety — hallucination reduction, evidence-grounded responses
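GRPO replaces PPO's learned value baseline with a group-relative one: for each prompt, several completions are sampled, scored by a reward function, and each completion's advantage is its reward standardized against the group's mean and standard deviation. A minimal sketch of that advantage computation (the rewards below are illustrative, not values from this training run):

```python
import math

def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each reward against its sampling group.

    For G completions of the same prompt, completion i gets advantage
    (r_i - mean(r)) / (std(r) + eps) -- no learned value network needed.
    """
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled answers scored by a safety reward function (illustrative):
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Completions scoring above the group mean get positive advantage and are reinforced; the group average itself acts as the baseline, which is what makes GRPO comparatively cheap to run on top of LoRA adapters.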

Usage

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("surfiniaburger/medgemma-4b-dipg-safety-v2")
model = AutoModelForCausalLM.from_pretrained(
    "surfiniaburger/medgemma-4b-dipg-safety-v2",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# The base model is instruction-tuned, so use the chat template.
messages = [{"role": "user", "content": "What imaging findings are typical of DIPG?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))

SafeClaw Project

Part of the DIPG Safety Gym ecosystem.
