---
library_name: diffusers
pipeline_tag: unconditional-image-generation
tags:
- diffusers
- sit
- image-generation
- class-conditional
- imagenet
license: mit
inference: true
---

# SiT-diffusers

Diffusers-ready checkpoints for **Scalable Interpolant Transformers (SiT)**, converted for local/offline use.

This repository is a collection containing the following models:

- `SiT-S-2-256-diffusers`
- `SiT-B-2-256-diffusers`
- `SiT-L-2-256-diffusers`
- `SiT-XL-2-256-diffusers`
- `SiT-XL-2-512-diffusers`

Each subfolder is a self-contained Diffusers model repo with:

- `pipeline.py`
- `transformer/transformer_sit.py`
- `scheduler/scheduling_flow_match_sit.py`
- `transformer/diffusion_pytorch_model.safetensors`
- `vae/diffusion_pytorch_model.safetensors`

## Model Paths

Use paths relative to this root README:

| Model | Resolution | Local path |
|---|---:|---|
| SiT-S/2 | 256x256 | `./SiT-S-2-256-diffusers` |
| SiT-B/2 | 256x256 | `./SiT-B-2-256-diffusers` |
| SiT-L/2 | 256x256 | `./SiT-L-2-256-diffusers` |
| SiT-XL/2 | 256x256 | `./SiT-XL-2-256-diffusers` |
| SiT-XL/2 | 512x512 | `./SiT-XL-2-512-diffusers` |
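After downloading, a short script can sanity-check that each local path actually contains the expected files. This is an illustrative sketch only; the `check_repo` helper is not part of the release:

```python
from pathlib import Path

# Files every converted subfolder is expected to contain (per the layout above)
EXPECTED_FILES = [
    "pipeline.py",
    "transformer/transformer_sit.py",
    "scheduler/scheduling_flow_match_sit.py",
    "transformer/diffusion_pytorch_model.safetensors",
    "vae/diffusion_pytorch_model.safetensors",
]

MODEL_PATHS = [
    "./SiT-S-2-256-diffusers",
    "./SiT-B-2-256-diffusers",
    "./SiT-L-2-256-diffusers",
    "./SiT-XL-2-256-diffusers",
    "./SiT-XL-2-512-diffusers",
]

def check_repo(repo_dir):
    """Return the expected files that are missing from one checkpoint folder."""
    root = Path(repo_dir)
    return [f for f in EXPECTED_FILES if not (root / f).is_file()]

if __name__ == "__main__":
    for path in MODEL_PATHS:
        missing = check_repo(path)
        print(f"{path}: {'ok' if not missing else 'missing: ' + ', '.join(missing)}")
```

Run it from the root of this repo; any incomplete download is reported per subfolder.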

## Inference Demo (Diffusers)

### 1) Load a local subfolder checkpoint

```python
import torch
from diffusers import DiffusionPipeline

model_path = "./SiT-XL-2-512-diffusers"  # change to any path in the table above
device = "cuda" if torch.cuda.is_available() else "cpu"

pipe = DiffusionPipeline.from_pretrained(
    model_path,
    trust_remote_code=True,
).to(device)

generator = torch.Generator(device=device).manual_seed(0)

# ImageNet class example: 207 = golden retriever
result = pipe(
    class_labels=207,
    height=512,
    width=512,
    num_inference_steps=250,  # official SiT comparisons commonly use 250 steps
    guidance_scale=4.0,
    generator=generator,
)

image = result.images[0]
image.save("sit_xl_512_demo.png")
```
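The seeded `generator` above is what makes the demo reproducible: identically seeded generators draw identical initial noise. This can be checked without loading any weights (the latent shape below, 4 channels at 64x64 for a 512px image with an 8x VAE downsampling factor, is only illustrative):

```python
import torch

def seeded_noise(seed: int, shape=(1, 4, 64, 64), device: str = "cpu"):
    """Draw initial latent noise from an explicitly seeded generator."""
    g = torch.Generator(device=device).manual_seed(seed)
    return torch.randn(shape, generator=g, device=device)

same_a = seeded_noise(0)
same_b = seeded_noise(0)
other = seeded_noise(1)

print(torch.equal(same_a, same_b))  # True: same seed, same noise
print(torch.equal(same_a, other))   # False: different seed
```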

### 2) Quick variant switch (256 models)

Reusing `device` and `generator` from the snippet above:

```python
model_path = "./SiT-S-2-256-diffusers"
# model_path = "./SiT-B-2-256-diffusers"
# model_path = "./SiT-L-2-256-diffusers"
# model_path = "./SiT-XL-2-256-diffusers"

pipe = DiffusionPipeline.from_pretrained(model_path, trust_remote_code=True).to(device)
image = pipe(
    class_labels=207,
    height=256,
    width=256,
    num_inference_steps=250,
    guidance_scale=4.0,
    generator=generator,
).images[0]
image.save("sit_256_demo.png")
```
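To compare variants side by side, the manual switch above can become a loop. The sketch below only builds a run plan (model path, native resolution, output filename); `build_run_plan` and the extra class ids are illustrative, and the commented lines mark where the actual pipeline calls from the demo would go:

```python
from itertools import product

VARIANTS = {  # local path -> native resolution
    "./SiT-S-2-256-diffusers": 256,
    "./SiT-B-2-256-diffusers": 256,
    "./SiT-L-2-256-diffusers": 256,
    "./SiT-XL-2-256-diffusers": 256,
    "./SiT-XL-2-512-diffusers": 512,
}
CLASS_LABELS = [207, 360, 88]  # example ImageNet-1k ids (207 = golden retriever)

def build_run_plan(variants, class_labels):
    """Cross every model variant with every class label."""
    plan = []
    for (path, res), label in product(variants.items(), class_labels):
        name = path.lstrip("./").lower()
        plan.append({
            "model_path": path,
            "height": res,
            "width": res,
            "class_label": label,
            "out_file": f"{name}_class{label}.png",
        })
    return plan

plan = build_run_plan(VARIANTS, CLASS_LABELS)
print(len(plan))  # 5 variants x 3 classes = 15 runs
for run in plan:
    # pipe = DiffusionPipeline.from_pretrained(run["model_path"], trust_remote_code=True).to(device)
    # pipe(class_labels=run["class_label"], height=run["height"], width=run["width"],
    #      num_inference_steps=250, guidance_scale=4.0).images[0].save(run["out_file"])
    pass
```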

## FID Reference (from Official SiT Results)

The table below summarizes widely cited FID numbers from the official SiT materials for class-conditional ImageNet generation.

| Model / setting | Resolution | FID-50K (lower is better) |
|---|---:|---:|
| SiT-S (400K steps) | 256x256 | 57.6 |
| SiT-B (400K steps) | 256x256 | 33.5 |
| SiT-L (400K steps) | 256x256 | 17.2 |
| SiT-XL (400K steps) | 256x256 | 8.6 |
| SiT-XL (cfg=1.5, ODE) | 256x256 | 2.15 |
| SiT-XL (cfg=1.5, SDE, `w(t)=sigma_t`) | 256x256 | 2.06 |
| SiT-XL (sample showcase) | 512x512 | not reported in the same benchmark table |

> Note: FID depends on the training recipe, sampler choice (ODE/SDE), guidance scale, and evaluation protocol. Treat this table as a pointer to the official SiT reports, not as a reproducibility guarantee for every conversion/export.

## Source and Paper

- Official SiT code: [willisma/SiT](https://github.com/willisma/SiT)
- Project page: [scalable-interpolant.github.io](https://scalable-interpolant.github.io/)
- Paper (arXiv): [SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers](https://arxiv.org/abs/2401.08740)

## Citation

If you use SiT in your work, please cite:

```bibtex
@inproceedings{ma2024sit,
  title={SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers},
  author={Ma, Nanye and Goldstein, Mark and Albergo, Michael S. and Boffi, Nicholas M. and Vanden-Eijnden, Eric and Xie, Saining},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2024}
}
```