---
library_name: diffusers
pipeline_tag: unconditional-image-generation
license: mit
tags:
- diffusers
- rae
- rae-dit
- diffusion-transformer
- imagenet-256
- arxiv:2510.11690
---
# RAE-DiT-S ep14 Diffusers conversion
This is a Diffusers-format conversion of the public RAE Stage-2 ImageNet-256 checkpoint `DiTDH-S_ep14`, bundled with the public Stage-1 RAE `nyu-visionx/RAE-dinov2-wReg-base-ViTXL-n08`.
It is intended as a lightweight test artifact for the Diffusers RAE-DiT PR: https://github.com/huggingface/diffusers/pull/13231
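Diffusers-format repositories declare their components in a top-level `model_index.json`, where each component entry is a `[library, class_name]` pair. The sketch below is one way to see which modules this conversion bundles. It assumes `huggingface_hub` is installed. `list_components` and `fetch_model_index` are hypothetical helpers, and because the component names in this repo are not stated here, the code reports whatever the file declares.

```python
import json


def list_components(model_index: dict) -> dict:
    """Return {component_name: [library, class_name]} from a parsed model_index.json.

    Keys starting with "_" (e.g. "_class_name", "_diffusers_version") are metadata,
    and non-list values are plain config flags, so both are skipped.
    """
    return {
        name: spec
        for name, spec in model_index.items()
        if not name.startswith("_") and isinstance(spec, list)
    }


def fetch_model_index(repo_id: str) -> dict:
    # Imported here so list_components() works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(repo_id=repo_id, filename="model_index.json")
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For example, `list_components(fetch_model_index("plugyawn/rae-dit-s-ep14-diffusers"))` should list the Stage-1 and Stage-2 modules under the names the conversion script chose.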
## Source assets
- Stage-1 RAE: `nyu-visionx/RAE-dinov2-wReg-base-ViTXL-n08`
- Stage-2 upstream weights: `nyu-visionx/RAE-collections`, file `DiTs/Dinov2/wReg_base/ImageNet256/DiTDH-S_ep14/stage2_model.pt`
- Upstream code/configs: https://github.com/bytetriper/RAE, config `configs/stage2/training/ImageNet256/DiTDH-S_DINOv2-B.yaml`
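To inspect the upstream Stage-2 weights listed above, the sketch below downloads `stage2_model.pt` with `huggingface_hub.hf_hub_download` and flattens it into parameter names and shapes. It assumes `nyu-visionx/RAE-collections` is a model repository (pass `repo_type=` otherwise). The checkpoint's internal layout, for example whether weights sit under a model or EMA sub-dict, is not documented here, so the hypothetical helper `summarize_checkpoint` simply walks nested dicts.

```python
import torch


def summarize_checkpoint(obj, prefix: str = "") -> dict:
    """Flatten a (possibly nested) checkpoint into {dotted_key: shape}.

    Tensors are recorded, dicts are walked recursively, and anything else
    (ints, lists, strings) is ignored.
    """
    out = {}
    if isinstance(obj, torch.Tensor):
        out[prefix] = tuple(obj.shape)
    elif isinstance(obj, dict):
        for key, value in obj.items():
            child = f"{prefix}.{key}" if prefix else str(key)
            out.update(summarize_checkpoint(value, child))
    return out


def load_upstream_checkpoint():
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(
        repo_id="nyu-visionx/RAE-collections",
        filename="DiTs/Dinov2/wReg_base/ImageNet256/DiTDH-S_ep14/stage2_model.pt",
    )
    # weights_only=True refuses arbitrary pickled objects. Drop it only if you trust
    # the file and loading fails because it stores non-tensor Python objects.
    return torch.load(path, map_location="cpu", weights_only=True)
```

Running `summarize_checkpoint(load_upstream_checkpoint())` gives a name-to-shape map that can be compared against the converted transformer's `state_dict()`.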
## Usage
Until PR #13231 is merged, install Diffusers from the PR branch first:
```bash
pip install git+https://github.com/plugyawn/diffusers.git@rae-dit-training
```
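A quick way to confirm that the installed Diffusers actually includes the PR's pipeline is to look up `RAEDiTPipeline` on the package. This is a small sketch. `has_rae_dit_pipeline` is a hypothetical helper, and it only checks that the name is exported, not that the pipeline runs. It is expected to return `False` on a Diffusers build that does not include the changes from PR #13231.

```python
import importlib
import importlib.util


def has_rae_dit_pipeline() -> bool:
    """Return True if the installed diffusers package exports RAEDiTPipeline."""
    if importlib.util.find_spec("diffusers") is None:
        return False
    try:
        diffusers = importlib.import_module("diffusers")
        return hasattr(diffusers, "RAEDiTPipeline")
    except Exception:
        # diffusers resolves names lazily; a missing optional dependency can raise
        # here, which we treat as "not available".
        return False


if __name__ == "__main__":
    print("RAEDiTPipeline available:", has_rae_dit_pipeline())
```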
Then run:
```python
import torch
from diffusers import RAEDiTPipeline

repo_id = "plugyawn/rae-dit-s-ep14-diffusers"
# Load the converted pipeline in bfloat16 and move it to the GPU.
pipe = RAEDiTPipeline.from_pretrained(repo_id, torch_dtype=torch.bfloat16).to("cuda")

# Seed the generator so the sampled noise, and therefore the image, is reproducible.
generator = torch.Generator(device="cuda").manual_seed(0)
image = pipe(
    class_labels=207,  # ImageNet-1k class id
    num_inference_steps=25,
    guidance_scale=1.0,
    generator=generator,
).images[0]
image.save("rae_dit_class207.png")
```
`class_labels` is an ImageNet-1k class id (0 to 999); for example, 207 is "golden retriever".
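To sample several classes, the sketch below wraps the same call shown above in a loop. It re-seeds a generator per class so each image is reproducible on its own, and, as an assumption of this sketch rather than a documented requirement, falls back to CPU with float32 when no GPU is present. It passes only the arguments used in the example. Whether `class_labels` also accepts a list for batched sampling is not stated here, so the loop sticks to single ids. `generate_per_class` and `main` are hypothetical names.

```python
import torch


def generate_per_class(pipe, class_ids, seed=0, num_inference_steps=25,
                       guidance_scale=1.0, device="cuda"):
    """Generate one image per ImageNet-1k class id and return {class_id: image}."""
    images = {}
    for class_id in class_ids:
        if not 0 <= class_id < 1000:
            raise ValueError(f"ImageNet-1k class ids are 0..999, got {class_id}")
        # Fresh generator per class: each image depends only on (seed, class_id),
        # not on its position in the list.
        generator = torch.Generator(device=device).manual_seed(seed)
        images[class_id] = pipe(
            class_labels=class_id,
            num_inference_steps=num_inference_steps,
            guidance_scale=guidance_scale,
            generator=generator,
        ).images[0]
    return images


def main():
    from diffusers import RAEDiTPipeline

    use_cuda = torch.cuda.is_available()
    device = "cuda" if use_cuda else "cpu"
    # bfloat16 on GPU as in the example; float32 on CPU, where bf16 can be slow.
    dtype = torch.bfloat16 if use_cuda else torch.float32
    pipe = RAEDiTPipeline.from_pretrained(
        "plugyawn/rae-dit-s-ep14-diffusers", torch_dtype=dtype
    ).to(device)
    for class_id, image in generate_per_class(pipe, [207, 360, 387], device=device).items():
        image.save(f"rae_dit_class{class_id}.png")
```

Calling `main()` writes one PNG per class id. Keeping `pipe` as a parameter also lets the loop be exercised with a stub object in tests.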
## Validation
The conversion was validated against the upstream implementation on an A100. With matched initial latent noise, class label, and schedule, the converted model matched upstream to within `max_abs_error≈1.10e-5` on transformer outputs and `max_abs_error≈6.46e-5` on a fixed-seed, 25-step decoded sample.
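The validation above comes down to two steps: give both implementations the same initial noise, then take the largest elementwise absolute difference of their outputs. Below is a minimal sketch of those two pieces, assuming PyTorch tensors on both sides. `max_abs_error` and `matched_noise` are illustrative names, not functions from the upstream repo or from Diffusers, and the shape in the demo is arbitrary.

```python
import torch


def max_abs_error(a: torch.Tensor, b: torch.Tensor) -> float:
    """Largest elementwise |a - b|, computed in float32 on CPU."""
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {tuple(a.shape)} vs {tuple(b.shape)}")
    # .float() upcasts bf16/fp16 outputs so the difference is not rounded away.
    return (a.detach().float().cpu() - b.detach().float().cpu()).abs().max().item()


def matched_noise(shape, seed: int = 0) -> torch.Tensor:
    """Draw initial latent noise once, on CPU, so both implementations start identically.

    CPU and CUDA generators produce different streams for the same seed, so the
    noise is drawn on CPU and should be moved to the target device afterwards.
    """
    generator = torch.Generator(device="cpu").manual_seed(seed)
    return torch.randn(shape, generator=generator)


if __name__ == "__main__":
    # Illustrative shape only; substitute the converted model's real latent shape.
    noise = matched_noise((1, 4, 8, 8), seed=0)
    # In a real check, `ours` and `theirs` would be the converted and upstream outputs
    # for this same noise, class label, and schedule; here we just perturb the noise.
    ours, theirs = noise, noise + 1e-5
    print(f"max_abs_error={max_abs_error(ours, theirs):.2e}")
```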