---
license: apache-2.0
library_name: diffusers
pipeline_tag: unconditional-image-generation
tags:
- diffusers
- fit
- image-generation
- class-conditional
- imagenet
inference: true
widget:
- output:
    url: FiTv1-XL-2-256/demo.png
language:
- en
---

# FiT-diffusers

Diffusers-ready checkpoints for Flexible Vision Transformer (FiT) and FiTv2, converted from [`InfImagine/FiT`](https://huggingface.co/InfImagine/FiT) / [`InfImagine/FiTv2`](https://huggingface.co/InfImagine/FiTv2).

> Redistribution notice: weights and configs in this repo are re-packaged from the official InfImagine releases. Original work: [FiT (arXiv:2402.12376)](https://arxiv.org/pdf/2402.12376.pdf), [FiTv2 (arXiv:2410.13925)](https://arxiv.org/pdf/2410.13925). License: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0).

This repo is derived from the development bundle in [Visual-Generative-Foundation-Model-Collection](https://github.com/Bili-Sakura/Visual-Generative-Foundation-Model-Collection). Inference needs only:

- This model repo (`BiliSakura/FiT-diffusers`)
- PyPI `diffusers`, `torch`, `safetensors`
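
You do not have to clone the whole repo to run one checkpoint. The sketch below fetches a single variant subfolder (names as in the checkpoint table) with `huggingface_hub.snapshot_download`, which is installed as a dependency of `diffusers`. `allow_patterns` and `local_dir` are real `snapshot_download` parameters; `variant_patterns` and `download_variant` are hypothetical helpers written for this example, not part of the repo.

```python
from pathlib import Path

REPO_ID = "BiliSakura/FiT-diffusers"
# Variant subfolders listed in the checkpoint table of this README.
VARIANTS = {
    "FiTv1-XL-2-256",
    "FiTv2-XL-2-256",
    "FiTv2-3B-2-256",
    "FiTv2-XL-2-512",
    "FiTv2-3B-2-512",
}


def variant_patterns(variant: str) -> list[str]:
    """Hypothetical helper: glob pattern selecting every file under one variant folder."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {sorted(VARIANTS)}")
    # snapshot_download matches with fnmatch, where "*" also spans "/" (nested files).
    return [f"{variant}/*"]


def download_variant(variant: str, local_dir: str = ".") -> Path:
    """Hypothetical helper: download only `variant` and return the path to its subfolder."""
    # Imported lazily so variant_patterns() works without network or huggingface_hub.
    from huggingface_hub import snapshot_download

    root = snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=variant_patterns(variant),
        local_dir=local_dir,
    )
    return Path(root) / variant
```

With the default `local_dir="."`, `download_variant("FiTv2-XL-2-256")` should leave the files in `./FiTv2-XL-2-256`, the path the inference example loads with `local_files_only=True`.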

Each variant subfolder is a self-contained Diffusers model repo with:

- `model_index.json` (includes ImageNet `id2label`)
- `pipeline.py` (`FiTPipeline` for FiTv1, `FiTv2Pipeline` for FiTv2)
- `transformer/fit_transformer_2d.py` and weights
- `scheduler/scheduler_config.json`
- `vae/diffusion_pytorch_model.safetensors` (`stabilityai/sd-vae-ft-ema`)
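
Because `model_index.json` carries the ImageNet `id2label` map, you can check a class name before passing it as `class_labels`. The exact JSON layout is not documented here, so this sketch assumes a top-level `id2label` object mapping string ids (`"207"`) to label strings, where one label may list comma-separated synonyms as in common ImageNet label files. `load_id2label` and `resolve_class_id` are hypothetical helpers.

```python
import json
from pathlib import Path


def load_id2label(variant_dir: str) -> dict[int, str]:
    """Read id2label from a variant's model_index.json (assumed to be a top-level key)."""
    index = json.loads((Path(variant_dir) / "model_index.json").read_text())
    # JSON object keys are strings; convert them to integer class ids.
    return {int(k): v for k, v in index["id2label"].items()}


def resolve_class_id(id2label: dict[int, str], name: str) -> int:
    """Return the class id whose label, or one of its comma-separated synonyms, equals `name`."""
    wanted = name.strip().lower()
    for class_id, label in id2label.items():
        synonyms = [s.strip().lower() for s in label.split(",")]
        if wanted in synonyms:
            return class_id
    raise KeyError(f"no ImageNet class named {name!r}")
```

For the demo below, `resolve_class_id(load_id2label("./FiTv1-XL-2-256"), "golden retriever")` should return 207 if the assumed layout holds.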

## Demo

![FiTv1-XL-2-256 demo](FiTv1-XL-2-256/demo.png)

Class-conditional sample (ImageNet class 207, golden retriever), `FiTv1-XL/2` at 256×256, 250 steps, CFG 1.5, seed 42.

## Available checkpoints

| Checkpoint | Path | Resolution | Sampler | Steps | CFG | FID (native resolution) |
| --- | --- | --- | --- | --- | --- | --- |
| FiTv1-XL/2 | [`FiTv1-XL-2-256/`](FiTv1-XL-2-256/) | 256×256 | improved diffusion (DDPM respaced) | 250 | 1.5 | 4.21 |
| FiTv2-XL/2 | [`FiTv2-XL-2-256/`](FiTv2-XL-2-256/) | 256×256 | flow matching (velocity ODE) | 250 | 1.5 | 2.26 |
| FiTv2-3B/2 | [`FiTv2-3B-2-256/`](FiTv2-3B-2-256/) | 256×256 | flow matching (velocity ODE) | 250 | 1.5 | 2.15 |
| FiTv2-HR-XL/2 | [`FiTv2-XL-2-512/`](FiTv2-XL-2-512/) | 512×512 | flow matching (velocity ODE) | 250 | 1.5 | 2.90 |
| FiTv2-HR-3B/2 | [`FiTv2-3B-2-512/`](FiTv2-3B-2-512/) | 512×512 | flow matching (velocity ODE) | 250 | 1.5 | 2.41 |

## Inference

```python
from pathlib import Path

import torch
from diffusers import DiffusionPipeline

# Local copy of one variant subfolder (not the repo root).
model_dir = Path("./FiTv2-XL-2-256")
pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    local_files_only=True,
    # Use the pipeline class shipped in the variant folder.
    custom_pipeline=str(model_dir / "pipeline.py"),
    # Allows loading the custom transformer code in transformer/fit_transformer_2d.py.
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda")

# Fixed seed for reproducible samples.
generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
    class_labels="golden retriever",  # ImageNet class name (class 207)
    height=256,
    width=256,
    num_inference_steps=250,
    guidance_scale=1.5,
    generator=generator,
).images[0]
image.save("demo.png")
```

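Load a variant subfolder (e.g. `./FiTv2-XL-2-256`), not the repo root: only the subfolders contain a `model_index.json`. For FiTv1, use `./FiTv1-XL-2-256` with the same call pattern; its `pipeline.py` provides `FiTPipeline` with a DDPM scheduler instead of `FiTv2Pipeline`.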
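
FiTv1 samples with the improved-diffusion (DDPM) sampler, respaced to 250 steps. The sketch below illustrates the usual respacing idea, mirroring `space_timesteps` / `SpacedDiffusion` in OpenAI's improved-diffusion: keep evenly spaced timesteps, then recompute betas so the cumulative product ᾱ at every kept step matches the original schedule. It assumes a linear beta schedule from 1e-4 to 0.02 over 1000 training steps (the DiT-style default); check `scheduler/scheduler_config.json` for the values this repo actually uses. All function names are hypothetical.

```python
import math


def linear_betas(num_train_steps: int = 1000, start: float = 1e-4, end: float = 0.02) -> list[float]:
    """Linear beta schedule (an assumption here; real values live in scheduler_config.json)."""
    return [start + (end - start) * i / (num_train_steps - 1) for i in range(num_train_steps)]


def evenly_spaced_steps(num_train_steps: int, num_sample_steps: int) -> list[int]:
    """Pick num_sample_steps roughly evenly spaced timesteps covering 0 .. num_train_steps - 1."""
    if not 2 <= num_sample_steps <= num_train_steps:
        raise ValueError("need 2 <= num_sample_steps <= num_train_steps")
    stride = (num_train_steps - 1) / (num_sample_steps - 1)
    return [round(i * stride) for i in range(num_sample_steps)]


def respace_betas(betas: list[float], keep: list[int]) -> list[float]:
    """New betas whose cumulative alpha_bar at each kept step matches the original schedule."""
    keep_set = set(keep)
    new_betas: list[float] = []
    alpha_bar = 1.0  # running product of (1 - beta) over the original schedule
    last_kept = 1.0  # alpha_bar at the previously kept timestep
    for t, beta in enumerate(betas):
        alpha_bar *= 1.0 - beta
        if t in keep_set:
            new_betas.append(1.0 - alpha_bar / last_kept)
            last_kept = alpha_bar
    return new_betas


def final_alpha_bar(betas: list[float]) -> float:
    """Cumulative product of (1 - beta); a sanity check that respacing preserved the endpoint."""
    return math.prod(1.0 - b for b in betas)
```

`respace_betas(linear_betas(), evenly_spaced_steps(1000, 250))` yields 250 betas whose `final_alpha_bar` matches that of the full 1000-step schedule, which is why the shorter sampler still starts from the same noise level.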

## Repo layout

```text
BiliSakura/FiT-diffusers/
├── README.md
├── FiTv1-XL-2-256/
├── FiTv2-XL-2-256/
├── FiTv2-3B-2-256/
├── FiTv2-XL-2-512/
└── FiTv2-3B-2-512/
    ├── README.md
    ├── model_index.json
    ├── pipeline.py
    ├── demo.png
    ├── transformer/
    │   ├── config.json
    │   ├── fit_transformer_2d.py
    │   └── diffusion_pytorch_model.safetensors
    ├── vae/
    │   ├── config.json
    │   └── diffusion_pytorch_model.safetensors
    └── scheduler/
        └── scheduler_config.json
```
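
Since `from_pretrained` must point at a complete variant subfolder, a quick pre-flight check can catch a partial download or a path to the repo root. A minimal sketch: `missing_files` is a hypothetical helper, and `EXPECTED` is transcribed from the layout above, leaving out `README.md` and `demo.png` on the assumption that inference does not read them.

```python
from pathlib import Path

# Files a variant subfolder should contain for inference (from the repo layout).
EXPECTED = (
    "model_index.json",
    "pipeline.py",
    "transformer/config.json",
    "transformer/fit_transformer_2d.py",
    "transformer/diffusion_pytorch_model.safetensors",
    "vae/config.json",
    "vae/diffusion_pytorch_model.safetensors",
    "scheduler/scheduler_config.json",
)


def missing_files(variant_dir: str) -> list[str]:
    """Return the expected relative paths that are not regular files under `variant_dir`."""
    root = Path(variant_dir)
    return [rel for rel in EXPECTED if not (root / rel).is_file()]
```

Calling `missing_files("./FiTv2-XL-2-256")` before loading should return an empty list; pointing it at the repo root instead reports every entry as missing.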