---
library_name: diffusers
pipeline_tag: unconditional-image-generation
license: mit
tags:
- diffusers
- rae
- rae-dit
- diffusion-transformer
- imagenet-256
- arxiv:2510.11690
---

# RAE-DiT-S ep14 Diffusers conversion

This is a Diffusers-format conversion of the public RAE Stage-2 ImageNet-256 checkpoint `DiTDH-S_ep14`, bundled with the public Stage-1 RAE `nyu-visionx/RAE-dinov2-wReg-base-ViTXL-n08`.

It is intended as a lightweight test artifact for the Diffusers RAE-DiT PR: https://github.com/huggingface/diffusers/pull/13231

## Source assets

- Stage-1 RAE: `nyu-visionx/RAE-dinov2-wReg-base-ViTXL-n08`
- Stage-2 upstream weights: `nyu-visionx/RAE-collections`, file `DiTs/Dinov2/wReg_base/ImageNet256/DiTDH-S_ep14/stage2_model.pt`
- Upstream code/configs: https://github.com/bytetriper/RAE, config `configs/stage2/training/ImageNet256/DiTDH-S_DINOv2-B.yaml`
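
To compare against the original weights, the upstream Stage-2 checkpoint listed above can be fetched with `huggingface_hub`. The following is a minimal sketch, not part of the conversion tooling: the constant names and the `fetch_stage2_checkpoint` helper are hypothetical, and it assumes `nyu-visionx/RAE-collections` is a model-type repository (the `hf_hub_download` default).

```python
# Coordinates copied from the "Source assets" list.
STAGE1_REPO = "nyu-visionx/RAE-dinov2-wReg-base-ViTXL-n08"
STAGE2_REPO = "nyu-visionx/RAE-collections"
STAGE2_FILE = "DiTs/Dinov2/wReg_base/ImageNet256/DiTDH-S_ep14/stage2_model.pt"


def fetch_stage2_checkpoint() -> str:
    """Download the upstream Stage-2 checkpoint and return its local cache path.

    Hypothetical helper for illustration only.
    """
    # Imported lazily so the constants stay usable without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    # Assumption: the file lives in a model-type repo, which is the default repo_type.
    return hf_hub_download(repo_id=STAGE2_REPO, filename=STAGE2_FILE)


# Example (requires network access):
# local_path = fetch_stage2_checkpoint()
# print(local_path)
```

The returned path points into the local Hugging Face cache, so repeated calls reuse the downloaded file.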

## Usage

Until PR #13231 is merged, install Diffusers from the PR branch first:

```bash
pip install git+https://github.com/plugyawn/diffusers.git@rae-dit-training
```

Then run:

```python
import torch
from diffusers import RAEDiTPipeline

repo_id = "plugyawn/rae-dit-s-ep14-diffusers"

# Load the converted pipeline in bfloat16 and move it to the GPU.
pipe = RAEDiTPipeline.from_pretrained(repo_id, torch_dtype=torch.bfloat16).to("cuda")

# Fix the seed so the sample is reproducible.
generator = torch.Generator(device="cuda").manual_seed(0)

image = pipe(
    class_labels=207,  # ImageNet-1k class id
    num_inference_steps=25,
    guidance_scale=1.0,
    generator=generator,
).images[0]
image.save("rae_dit_class207.png")
```

`class_labels` takes ImageNet-1k class ids (integers from 0 to 999).
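
To sample several classes with the same settings, the call above can be wrapped in a small loop. This is a sketch under stated assumptions: `sample_classes` and its `make_generator` argument are hypothetical names, not part of Diffusers, and the pipeline is only assumed to accept the keyword arguments shown in the usage example.

```python
from typing import Any, Callable, Dict, Iterable


def sample_classes(
    pipe: Callable[..., Any],
    class_ids: Iterable[int],
    make_generator: Callable[[int], Any],
    seed: int = 0,
    num_inference_steps: int = 25,
    guidance_scale: float = 1.0,
) -> Dict[int, Any]:
    """Generate one image per ImageNet-1k class id (hypothetical helper)."""
    images: Dict[int, Any] = {}
    for class_id in class_ids:
        if not 0 <= class_id <= 999:
            raise ValueError(f"class id {class_id} is outside the ImageNet-1k range 0..999")
        # A fresh generator per class keeps each sample reproducible on its own.
        generator = make_generator(seed)
        result = pipe(
            class_labels=class_id,
            num_inference_steps=num_inference_steps,
            guidance_scale=guidance_scale,
            generator=generator,
        )
        images[class_id] = result.images[0]
    return images


# Example, reusing `pipe` from the usage snippet:
# import torch
# images = sample_classes(
#     pipe, [207, 208], lambda s: torch.Generator(device="cuda").manual_seed(s)
# )
# for class_id, image in images.items():
#     image.save(f"rae_dit_class{class_id}.png")
```

Passing the generator factory in, rather than building it inside the helper, keeps the device choice with the caller.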

## Validation

The conversion was validated against the upstream implementation on an A100. With matched initial latent noise, class label, and schedule, the converted model agreed with upstream to within approximately `max_abs_error=1.10e-5` on transformer outputs and `max_abs_error=6.46e-5` on a fixed-seed, 25-step decoded sample.
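
For reference, a maximum absolute error of this kind can be computed as the largest elementwise difference between two arrays of the same shape. The sketch below uses NumPy and is an illustration of the metric only, not the validation script used for this conversion; the `max_abs_error` function and the toy arrays are assumptions.

```python
import numpy as np


def max_abs_error(reference, candidate) -> float:
    """Return the largest elementwise absolute difference between two arrays."""
    # Compare in float64 so the comparison itself adds no rounding noise.
    ref = np.asarray(reference, dtype=np.float64)
    cand = np.asarray(candidate, dtype=np.float64)
    if ref.shape != cand.shape:
        raise ValueError(f"shape mismatch: {ref.shape} vs {cand.shape}")
    return float(np.max(np.abs(ref - cand)))


# Toy example with made-up values (not real model outputs).
reference = np.array([0.0, 1.0, 2.0])
candidate = reference + np.array([0.0, 1e-5, -2e-5])
print(f"max_abs_error={max_abs_error(reference, candidate):.2e}")
```

When the inputs are PyTorch tensors in bfloat16, convert them first (for example with `.float().cpu().numpy()`), since NumPy has no native bfloat16 dtype.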