---
library_name: diffusers
pipeline_tag: unconditional-image-generation
license: mit
tags:
- diffusers
- rae
- rae-dit
- diffusion-transformer
- imagenet-256
- arxiv:2510.11690
---

# RAE-DiT-S ep14 Diffusers conversion

This is a Diffusers-format conversion of the public RAE Stage-2 ImageNet-256 checkpoint `DiTDH-S_ep14`, bundled with the public Stage-1 RAE `nyu-visionx/RAE-dinov2-wReg-base-ViTXL-n08`.

It is intended as a lightweight test artifact for the Diffusers RAE-DiT PR: https://github.com/huggingface/diffusers/pull/13231

## Source assets

- Stage-1 RAE: `nyu-visionx/RAE-dinov2-wReg-base-ViTXL-n08`
- Stage-2 upstream weights: `nyu-visionx/RAE-collections`, file `DiTs/Dinov2/wReg_base/ImageNet256/DiTDH-S_ep14/stage2_model.pt`
- Upstream code/configs: https://github.com/bytetriper/RAE, config `configs/stage2/training/ImageNet256/DiTDH-S_DINOv2-B.yaml`

## Usage

Until PR #13231 is merged, install Diffusers from the PR branch first:

```bash
pip install git+https://github.com/plugyawn/diffusers.git@rae-dit-training
```

Then run:

```python
import torch
from diffusers import RAEDiTPipeline

repo_id = "plugyawn/rae-dit-s-ep14-diffusers"

# Load the converted pipeline in bfloat16 and move it to the GPU.
pipe = RAEDiTPipeline.from_pretrained(repo_id, torch_dtype=torch.bfloat16).to("cuda")

# Seed the generator so the sample is reproducible.
generator = torch.Generator(device="cuda").manual_seed(0)

# Sample one image for ImageNet-1k class id 207.
image = pipe(
    class_labels=207,
    num_inference_steps=25,
    guidance_scale=1.0,
    generator=generator,
).images[0]

image.save("rae_dit_class207.png")
```

`class_labels` are ImageNet-1k class ids.

## Validation

The conversion was validated against the upstream implementation on an A100. With matched initial latent noise, class label, and schedule, the converted model matched upstream with approximately `max_abs_error=1.10e-5` on transformer outputs and `max_abs_error=6.46e-5` on a fixed-seed 25-step decoded sample.
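
The `max_abs_error` figures in the Validation section read as the largest elementwise absolute difference between an upstream output and the converted output. The snippet below is a minimal sketch of that metric, not the script used for the validation. The `max_abs_error` helper and the synthetic arrays are illustrative assumptions; a real comparison would use outputs from both implementations run with the same noise, class label, and schedule.

```python
import numpy as np


def max_abs_error(reference: np.ndarray, candidate: np.ndarray) -> float:
    """Return the largest elementwise absolute difference between two arrays."""
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    # Subtract in float64 so the comparison itself adds no extra rounding error.
    diff = reference.astype(np.float64) - candidate.astype(np.float64)
    return float(np.abs(diff).max())


# Synthetic stand-ins for the upstream and converted outputs (not real model data).
rng = np.random.default_rng(0)
upstream_out = rng.standard_normal((1, 4, 8, 8)).astype(np.float32)
converted_out = upstream_out.copy()
converted_out[0, 0, 0, 0] += np.float32(1e-5)  # inject one small, known discrepancy

error = max_abs_error(upstream_out, converted_out)
print(f"max_abs_error={error:.2e}")
```

With PyTorch tensors, the same comparison could be made after converting each output with `.float().cpu().numpy()`.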