
	---
	license: apache-2.0
	library_name: diffusers
	pipeline_tag: unconditional-image-generation
	tags:
	- diffusers
	- image-generation
	- class-conditional
	- imagenet
	- dico
	- latent-diffusion
	- convnet
	widget:
	- text: golden retriever
	  output:
	    url: DiCo-XL-256/demo.png
	inference: true
	---

	# BiliSakura/DiCo-diffusers

	Self-contained DiCo checkpoints for Hugging Face diffusers. Each variant folder ships its own `pipeline.py`, component modules, and weights.

	Converted from [shallowdream204/DiCo](https://huggingface.co/shallowdream204/DiCo) using [DiCo-diffusers](https://github.com/Bili-Sakura/Visual-Generative-Foundation-Model-Collection/tree/main/libs/DiCo-diffusers).

	## Available checkpoints

	| Subfolder | Pipeline | Resolution | Source checkpoint | CFG | FID | IS | Params |
	| --- | --- | ---: | --- | ---: | ---: | ---: | ---: |
	| [`DiCo-S-256/`](DiCo-S-256/) | `DiCoPipeline` | 256×256 | `DiCo-S-400K-256x256.pt` | 1.0 | 49.97 | 31.38 | 33M |
	| [`DiCo-B-256/`](DiCo-B-256/) | `DiCoPipeline` | 256×256 | `DiCo-B-400K-256x256.pt` | 1.0 | 27.20 | 56.52 | 130M |
	| [`DiCo-L-256/`](DiCo-L-256/) | `DiCoPipeline` | 256×256 | `DiCo-L-400K-256x256.pt` | 1.0 | 13.66 | 91.37 | 464M |
	| [`DiCo-XL-256/`](DiCo-XL-256/) | `DiCoPipeline` | 256×256 | `DiCo-XL-3750K-256x256.pt` | 1.4 | 2.05 | 282.17 | 701M |

	DiCo denoises VAE latents (4 channels, 32×32 for 256×256 images) with a ConvNet U-Net and multi-scale adaLN conditioning. VAE: `stabilityai/sd-vae-ft-ema`. Scheduler: `DDIMScheduler` (1000 train steps, linear betas).
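
The latent size quoted above follows from the VAE's spatial downsampling. A minimal sketch of that arithmetic, assuming the usual 8× downsampling factor of the Stable Diffusion VAE family (the helper name `latent_shape` and its parameters are illustrative, not part of this repo):

```python
from typing import Tuple


def latent_shape(
    height: int,
    width: int,
    latent_channels: int = 4,   # stated in this README
    vae_scale_factor: int = 8,  # assumption: 8x downsampling, consistent with 256 -> 32
) -> Tuple[int, int, int]:
    """Return (channels, latent_height, latent_width) for a given image size."""
    if height % vae_scale_factor or width % vae_scale_factor:
        raise ValueError("height and width must be multiples of the VAE scale factor")
    return (latent_channels, height // vae_scale_factor, width // vae_scale_factor)


print(latent_shape(256, 256))  # (4, 32, 32)
```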

	## Repo layout

	```text
	BiliSakura/DiCo-diffusers/
	├── README.md
	├── demo_inference.py
	├── DiCo-S-256/
	├── DiCo-B-256/
	├── DiCo-L-256/
	└── DiCo-XL-256/
	    ├── pipeline.py
	    ├── model_index.json
	    ├── demo.png
	    ├── scheduler/scheduler_config.json
	    ├── transformer/
	    └── vae/
	```

	Each variant is self-contained. The `scheduler/` folder holds only a config for the built-in `DDIMScheduler` shipped with the PyPI `diffusers` package.
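
For intuition about what a scheduler config with 1000 training steps and linear betas encodes, the noise schedule can be sketched in plain Python. The `beta_start` and `beta_end` values below are the `diffusers` defaults and are an assumption here; the values actually used are whatever `scheduler/scheduler_config.json` specifies.

```python
from typing import List


def linear_betas(
    num_train_timesteps: int = 1000,  # stated in this README
    beta_start: float = 1e-4,         # assumption: diffusers default
    beta_end: float = 0.02,           # assumption: diffusers default
) -> List[float]:
    """Linearly spaced betas from beta_start to beta_end (inclusive)."""
    if num_train_timesteps < 2:
        raise ValueError("need at least two timesteps")
    step = (beta_end - beta_start) / (num_train_timesteps - 1)
    return [beta_start + i * step for i in range(num_train_timesteps)]


def alphas_cumprod(betas: List[float]) -> List[float]:
    """Cumulative product of (1 - beta_t): the fraction of signal kept at step t."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


betas = linear_betas()
signal = alphas_cumprod(betas)
print(len(betas), signal[0], signal[-1])
```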

	## ImageNet class labels

	`id2label` is embedded in each variant's `model_index.json` (DiT-style).

	- `pipe.id2label` — inspect id → English label correspondence
	- `pipe.labels` — reverse map (English synonym → id)
	- `pipe.get_label_ids("golden retriever")`
	- `pipe(class_labels="golden retriever", ...)` — string labels resolved automatically
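
The relationship between the two maps can be sketched as follows. This is an illustration only: it assumes the DiT convention that each `id2label` entry is a comma-separated list of synonyms, and the standalone helpers `build_label_map` and `get_label_ids` are hypothetical stand-ins for whatever `pipeline.py` actually implements.

```python
from typing import Dict, List, Union

# Tiny excerpt for illustration; the real id2label covers all ImageNet classes.
ID2LABEL: Dict[int, str] = {
    0: "tench, Tinca tinca",
    1: "goldfish, Carassius auratus",
    207: "golden retriever",
}


def build_label_map(id2label: Dict[int, str]) -> Dict[str, int]:
    """Reverse map: every comma-separated synonym points to its class id."""
    labels: Dict[str, int] = {}
    for class_id, names in id2label.items():
        for name in names.split(","):
            labels[name.strip()] = int(class_id)
    return labels


def get_label_ids(label: Union[str, List[str]], labels: Dict[str, int]) -> List[int]:
    """Resolve one label string or a list of them to class ids."""
    names = [label] if isinstance(label, str) else list(label)
    unknown = [name for name in names if name not in labels]
    if unknown:
        raise ValueError(f"unknown label(s): {unknown}")
    return [labels[name] for name in names]


labels = build_label_map(ID2LABEL)
print(get_label_ids("golden retriever", labels))  # [207]
```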

	## Demo

	![DiCo-XL-256 demo](DiCo-XL-256/demo.png)

	Class 207 — golden retriever, 256×256, 250 steps, `guidance_scale=1.4`.

	```bash
	python demo_inference.py
	python demo_inference.py --variant s # DiCo-S-256, CFG 1.0
	```

	## Load from a local clone

	### ImageNet 256×256 (`DiCo-XL-256`)

	```python
	from pathlib import Path

	import torch
	from diffusers import DiffusionPipeline

	# Resolve the variant folder inside the local clone.
	model_dir = Path("./DiCo-XL-256").resolve()
	pipe = DiffusionPipeline.from_pretrained(
	    str(model_dir),
	    local_files_only=True,
	    custom_pipeline=str(model_dir / "pipeline.py"),  # pipeline code shipped with the variant
	    trust_remote_code=True,
	    torch_dtype=torch.bfloat16,
	)
	pipe.to("cuda")

	# Look up labels in both directions.
	print(pipe.id2label[207])
	print(pipe.get_label_ids("golden retriever"))

	# Fix the seed so the sample is reproducible.
	generator = torch.Generator(device="cuda").manual_seed(0)
	image = pipe(
	    class_labels="golden retriever",
	    height=256,
	    width=256,
	    num_inference_steps=250,
	    guidance_scale=1.4,
	    generator=generator,
	).images[0]
	image.save("demo.png")
	```

	## Recommended inference settings

	| Variant | Steps | CFG scale |
	| --- | ---: | ---: |
	| `DiCo-S-256` | 250 | 1.0 |
	| `DiCo-B-256` | 250 | 1.0 |
	| `DiCo-L-256` | 250 | 1.0 |
	| `DiCo-XL-256` | 250 | 1.4 |
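
If you switch between variants in a script, the table can be kept as a small lookup. This is a convenience sketch; `RECOMMENDED` and `recommended_kwargs` are hypothetical names, not something the repo provides.

```python
from typing import Dict, Union

# Values copied from the table above.
RECOMMENDED: Dict[str, Dict[str, Union[int, float]]] = {
    "DiCo-S-256": {"num_inference_steps": 250, "guidance_scale": 1.0},
    "DiCo-B-256": {"num_inference_steps": 250, "guidance_scale": 1.0},
    "DiCo-L-256": {"num_inference_steps": 250, "guidance_scale": 1.0},
    "DiCo-XL-256": {"num_inference_steps": 250, "guidance_scale": 1.4},
}


def recommended_kwargs(variant: str) -> Dict[str, Union[int, float]]:
    """Return a copy of the recommended sampling kwargs for a variant folder name."""
    try:
        return dict(RECOMMENDED[variant])
    except KeyError:
        raise ValueError(f"unknown variant {variant!r}; expected one of {sorted(RECOMMENDED)}")


print(recommended_kwargs("DiCo-XL-256"))
```

The returned dict can be unpacked into the pipeline call, for example `pipe(class_labels="golden retriever", **recommended_kwargs("DiCo-XL-256"))`.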

	Classifier-free guidance applies to the first 3 latent channels only (DiT reproducibility convention).
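
The channel-restricted guidance can be sketched as below. This is a rough illustration modelled on the DiT reference convention, written with plain Python lists rather than tensors so it runs without dependencies. The function name and data layout are assumptions, and the variant's `pipeline.py` remains the authoritative implementation.

```python
from typing import List

Channel = List[float]   # flattened values of one predicted-noise channel
Sample = List[Channel]  # all channels of one sample's prediction


def guide_first_channels(
    cond: Sample,
    uncond: Sample,
    guidance_scale: float,
    num_guided: int = 3,  # per this README: only the first 3 latent channels
) -> Sample:
    """Blend conditional and unconditional predictions on the first channels only.

    Guided channels use uncond + scale * (cond - uncond); the remaining
    channels are copied from the conditional prediction unchanged.
    """
    guided: Sample = []
    for index, (c_ch, u_ch) in enumerate(zip(cond, uncond)):
        if index < num_guided:
            guided.append([u + guidance_scale * (c - u) for c, u in zip(c_ch, u_ch)])
        else:
            guided.append(list(c_ch))
    return guided


cond = [[1.0], [1.0], [1.0], [1.0]]
uncond = [[0.0], [0.0], [0.0], [0.0]]
print(guide_first_channels(cond, uncond, guidance_scale=2.0))
```

With `guidance_scale=1.0` the blend reduces to the conditional prediction, which is why the S, B, and L variants effectively run without guidance.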

	## Conversion

	```bash
	cd libs/DiCo-diffusers

	python scripts/convert_dico_to_diffusers.py \
	  --checkpoint /path/to/DiCo-XL-3750K-256x256.pt \
	  --output /path/to/DiCo-XL-256 \
	  --model-type DiCo-XL \
	  --weights ema \
	  --safe-serialization \
	  --id2label ../../src/labels/id2label_en.json
	```

	## Citation

	```bibtex
	@inproceedings{ai2025dico,
	  title={DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling},
	  author={Yuang Ai and Qihang Fan and Xuefeng Hu and Zhenheng Yang and Ran He and Huaibo Huang},
	  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
	  year={2025},
	  url={https://openreview.net/forum?id=UnslcaZSnb}
	}
	```

	## License

	Weights are converted from checkpoints released under the [Apache 2.0 license](https://huggingface.co/shallowdream204/DiCo).