
license: apache-2.0
library_name: diffusers
pipeline_tag: text-to-image

SafeDiffusion-R1: Online Reward Steering for Safe Diffusion Post-Training

SafeDiffusion-R1 is a safety post-training framework for Stable Diffusion based on Group Relative Policy Optimization (GRPO). It uses a closed-form, CLIP-based steering reward to bake safety priors directly into the UNet weights, eliminating the need for separately trained safety classifiers or inference-time interventions.
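The following is a minimal sketch of the group-relative advantage step that gives GRPO its name. Several images are sampled for the same prompt, and each one is scored. Each score is then normalized against the mean and standard deviation of its own group, so no learned value function is needed. The reward here is a **hypothetical stand-in**: an image embedding's cosine similarity to a safe anchor minus its similarity to an unsafe concept. The actual closed-form CLIP steering reward is defined in the paper. The names `steering_reward` and `group_relative_advantages`, and the toy 2-D embeddings, are illustrative only.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def steering_reward(img_emb, safe_anchor, unsafe_anchor) -> float:
    """HYPOTHETICAL reward: pull toward a safe anchor, push away from an unsafe concept.

    Stand-in for the paper's closed-form CLIP steering reward; in practice the
    embeddings would come from a CLIP image/text encoder.
    """
    return cosine(img_emb, safe_anchor) - cosine(img_emb, unsafe_anchor)


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantages: (r - group mean) / (group population std + eps)."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


# Toy 2-D "embeddings" for a group of four samples generated from one prompt.
safe = [1.0, 0.0]
unsafe = [0.0, 1.0]
group = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9], [0.7, 0.3]]

rewards = [steering_reward(e, safe, unsafe) for e in group]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```

Samples that land closer to the safe anchor than their siblings receive positive advantages, and samples that land closer to the unsafe concept receive negative ones. Within each group, the advantages sum to roughly zero.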

Project Page | GitHub | Paper

Model Variants

Each variant is released as a full Diffusers pipeline in its own subfolder:

| Subfolder | Description |
|---|---|
| `scaled` | Main paper checkpoint and the default choice; best balance of safety and utility. |
| `compact` | Optimized for the lowest attack success rate (ASR) under MMA-Diffusion attacks (adversarial robustness). |
| `empty-positive` | Ablation variant trained without safe anchors. |

Sample Usage

You can load and run any variant with the diffusers library. Because each variant is stored in its own subfolder, we recommend using `snapshot_download` from `huggingface_hub` to download only the variant you need, and then loading the pipeline from that local path.

```python
from huggingface_hub import snapshot_download
from diffusers import StableDiffusionPipeline
import os, torch

# Download the variant you want (e.g., "scaled")
local_root = snapshot_download(
    "ItsMaxNorm/SafeDiffusion-R1",
    allow_patterns="scaled/*",           # or "compact/*" / "empty-positive/*"
)

# Load the pipeline
pipe = StableDiffusionPipeline.from_pretrained(
    os.path.join(local_root, "scaled"),
    torch_dtype=torch.float16,
).to("cuda")

# Generate an image
prompt = "a photo of a cat sleeping on a couch"
img = pipe(prompt).images[0]
img.save("out.png")
```
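All variants share the same layout, so switching between them only changes the subfolder name. The helper below wraps the download and load steps into a single call and rejects unknown variant names before any download starts. It is a convenience sketch, not part of the released code. `load_variant`, `allow_pattern_for`, and `VARIANTS` are hypothetical names. The `torch`/`diffusers` imports are deferred so that the name check can run on a machine without a GPU.

```python
import os

REPO_ID = "ItsMaxNorm/SafeDiffusion-R1"
# Subfolders listed in the Model Variants table.
VARIANTS = ("scaled", "compact", "empty-positive")


def allow_pattern_for(variant: str) -> str:
    """Return the snapshot_download pattern for one variant subfolder."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {VARIANTS}")
    return f"{variant}/*"


def load_variant(variant: str = "scaled", device: str = "cuda"):
    """Download only the requested subfolder and load it as a Diffusers pipeline."""
    # Deferred imports: only needed when actually loading weights.
    import torch
    from diffusers import StableDiffusionPipeline
    from huggingface_hub import snapshot_download

    local_root = snapshot_download(REPO_ID, allow_patterns=allow_pattern_for(variant))
    pipe = StableDiffusionPipeline.from_pretrained(
        os.path.join(local_root, variant),
        torch_dtype=torch.float16,  # float16 assumes a CUDA device, as in the example above
    )
    return pipe.to(device)
```

For example, `pipe = load_variant("compact")` returns a pipeline you can call exactly like the one in the example above, as in `pipe(prompt).images[0]`.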

Citation

```bibtex
@misc{kumar2026safediffusionr1,
      title={SafeDiffusion-R1: Online Reward Steering for Safe Diffusion Post-Training},
      author={Komal Kumar and Ankan Deria and Abhishek Basu and Fahad Shamshad and Hisham Cholakkal and Karthik Nandakumar},
      year={2026},
      eprint={2605.18719},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2605.18719},
}
```