---
library_name: diffusers
pipeline_tag: unconditional-image-generation
license: mit
tags:
- diffusers
- rae
- rae-dit
- diffusion-transformer
- imagenet-256
- arxiv:2510.11690
---

# RAE-DiT-S ep14 Diffusers conversion

This is a Diffusers-format conversion of the public RAE Stage-2 ImageNet-256 checkpoint `DiTDH-S_ep14`, bundled with the public Stage-1 RAE `nyu-visionx/RAE-dinov2-wReg-base-ViTXL-n08`.

It is intended as a lightweight test artifact for the Diffusers RAE-DiT pull request: https://github.com/huggingface/diffusers/pull/13231

## Source assets

- Stage-1 RAE: `nyu-visionx/RAE-dinov2-wReg-base-ViTXL-n08`
- Stage-2 upstream weights: `nyu-visionx/RAE-collections`, file `DiTs/Dinov2/wReg_base/ImageNet256/DiTDH-S_ep14/stage2_model.pt`
- Upstream code and configs: https://github.com/bytetriper/RAE, config `configs/stage2/training/ImageNet256/DiTDH-S_DINOv2-B.yaml`
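The upstream Stage-2 checkpoint listed above can be fetched with `huggingface_hub.hf_hub_download`, for example to re-run the conversion or the comparison against upstream. This is a minimal sketch, assuming `nyu-visionx/RAE-collections` is a model repo (so the default `repo_type` applies); the function name `fetch_upstream_stage2` and its injectable `download` argument are hypothetical. The internal layout of `stage2_model.pt` is not documented in this card, so the sketch stops at the local file path.

```python
from typing import Callable, Optional

# Repo id and in-repo path copied from the "Source assets" list.
UPSTREAM_REPO_ID = "nyu-visionx/RAE-collections"
UPSTREAM_STAGE2_FILE = "DiTs/Dinov2/wReg_base/ImageNet256/DiTDH-S_ep14/stage2_model.pt"


def fetch_upstream_stage2(download: Optional[Callable[..., str]] = None) -> str:
    """Download the upstream Stage-2 checkpoint and return its local path.

    Hypothetical helper. `download` defaults to huggingface_hub.hf_hub_download
    and can be replaced (e.g. in tests) to avoid network access.
    """
    if download is None:
        # Imported lazily so the module loads even without huggingface_hub.
        from huggingface_hub import hf_hub_download

        download = hf_hub_download
    # Assumption: a model repo, so the default repo_type is used.
    return download(repo_id=UPSTREAM_REPO_ID, filename=UPSTREAM_STAGE2_FILE)
```

Injecting `download` keeps the path and repo id in one place while letting the function run offline; in normal use, call `fetch_upstream_stage2()` with no arguments.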

## Usage

Until PR #13231 is merged, install Diffusers from the PR branch first:

```bash
pip install git+https://github.com/plugyawn/diffusers.git@rae-dit-training
```
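To confirm that the PR build, rather than a release without the new class, is the Diffusers on your path, a quick check can help. This is a hypothetical helper (`has_rae_dit_pipeline` is not part of Diffusers); it only checks that the installed `diffusers` package exposes the `RAEDiTPipeline` name used in the usage example.

```python
import importlib


def has_rae_dit_pipeline() -> bool:
    """Return True if the installed diffusers exposes RAEDiTPipeline (hypothetical helper)."""
    try:
        diffusers = importlib.import_module("diffusers")
        # diffusers resolves top-level names lazily, so this lookup may itself
        # import submodules and fail if optional dependencies are missing.
        getattr(diffusers, "RAEDiTPipeline")
    except Exception:
        # Covers: diffusers not installed, a build without the class,
        # or a lazy import that fails.
        return False
    return True
```

For example, `if not has_rae_dit_pipeline(): raise SystemExit("Install Diffusers from the PR branch first.")`.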

Then run:

```python
import torch
from diffusers import RAEDiTPipeline

repo_id = "plugyawn/rae-dit-s-ep14-diffusers"
pipe = RAEDiTPipeline.from_pretrained(repo_id, torch_dtype=torch.bfloat16).to("cuda")

generator = torch.Generator(device="cuda").manual_seed(0)
image = pipe(
    class_labels=207,
    num_inference_steps=25,
    guidance_scale=1.0,
    generator=generator,
).images[0]
image.save("rae_dit_class207.png")
```

`class_labels` takes ImageNet-1k class indices (0 to 999); for example, 207 is "golden retriever".
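To compare several samples for one class, a fresh `torch.Generator` per seed keeps each image reproducible on its own, independent of how many images were generated before it. The sketch below assumes the pipeline call accepts the same keyword arguments as the usage example and returns an object with an `.images` list; `generate_for_seeds` and its 0 to 999 range check are illustrative, not part of the pipeline's API.

```python
import torch

NUM_IMAGENET_CLASSES = 1000  # ImageNet-1k


def generate_for_seeds(
    pipe,
    class_label: int,
    seeds,
    device: str = "cuda",
    num_inference_steps: int = 25,
    guidance_scale: float = 1.0,
):
    """Generate one image per seed for a single ImageNet-1k class (hypothetical helper)."""
    if not 0 <= class_label < NUM_IMAGENET_CLASSES:
        raise ValueError(f"class_label must be in [0, {NUM_IMAGENET_CLASSES}), got {class_label}")
    images = []
    for seed in seeds:
        # One generator per seed, so image i depends only on seeds[i].
        generator = torch.Generator(device=device).manual_seed(seed)
        output = pipe(
            class_labels=class_label,
            num_inference_steps=num_inference_steps,
            guidance_scale=guidance_scale,
            generator=generator,
        )
        images.append(output.images[0])
    return images
```

With the pipeline loaded as above, `images = generate_for_seeds(pipe, 207, seeds=[0, 1, 2, 3])` returns four images that can each be saved with `image.save(...)`.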

## Validation

The conversion was validated against the upstream implementation on an A100 GPU. With matched initial latent noise, class label, and sampling schedule, the converted model matched upstream to within approximately `max_abs_error=1.10e-5` on transformer outputs and `max_abs_error=6.46e-5` on a fixed-seed, 25-step decoded sample.
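A comparison of this kind can be sketched with two small helpers: one draws a single seeded noise tensor to hand to both implementations, and one reports the largest elementwise absolute difference. This is a sketch under assumptions: the helper names `matched_noise` and `max_abs_error` are hypothetical, the latent shape is not stated in this card and must be supplied by the caller, and the reported metric is assumed to be a plain maximum of absolute differences.

```python
import torch


def matched_noise(shape, seed: int = 0, device: str = "cpu", dtype=torch.float32) -> torch.Tensor:
    """Draw one initial latent tensor from a seeded generator (hypothetical helper)."""
    generator = torch.Generator(device=device).manual_seed(seed)
    return torch.randn(shape, generator=generator, device=device, dtype=dtype)


def max_abs_error(reference: torch.Tensor, candidate: torch.Tensor) -> float:
    """Largest elementwise |reference - candidate|, computed in float32 on CPU."""
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {tuple(reference.shape)} vs {tuple(candidate.shape)}")
    # Upcast so bfloat16 outputs are compared without extra rounding in the subtraction.
    diff = reference.detach().float().cpu() - candidate.detach().float().cpu()
    return diff.abs().max().item()
```

Pass `noise.clone()` to each implementation so that neither side can modify the shared tensor in place before the other uses it.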