---
license: apache-2.0
library_name: diffusers
pipeline_tag: unconditional-image-generation
tags:
- diffusers
- image-generation
- class-conditional
- imagenet
- dico
- latent-diffusion
- convnet
widget:
- text: golden retriever
  output:
    url: DiCo-XL-256/demo.png
inference: true
---

# BiliSakura/DiCo-diffusers

Self-contained DiCo checkpoints for Hugging Face diffusers. Each variant folder ships its own `pipeline.py`, component modules, and weights.

Converted from [shallowdream204/DiCo](https://huggingface.co/shallowdream204/DiCo) using [DiCo-diffusers](https://github.com/Bili-Sakura/Visual-Generative-Foundation-Model-Collection/tree/main/libs/DiCo-diffusers).

## Available checkpoints

| Subfolder | Pipeline | Resolution | Source checkpoint | CFG | FID | IS | Params |
| --- | --- | ---: | --- | ---: | ---: | ---: | ---: |
| [`DiCo-S-256/`](DiCo-S-256/) | `DiCoPipeline` | 256×256 | `DiCo-S-400K-256x256.pt` | 1.0 | 49.97 | 31.38 | 33M |
| [`DiCo-B-256/`](DiCo-B-256/) | `DiCoPipeline` | 256×256 | `DiCo-B-400K-256x256.pt` | 1.0 | 27.20 | 56.52 | 130M |
| [`DiCo-L-256/`](DiCo-L-256/) | `DiCoPipeline` | 256×256 | `DiCo-L-400K-256x256.pt` | 1.0 | 13.66 | 91.37 | 464M |
| [`DiCo-XL-256/`](DiCo-XL-256/) | `DiCoPipeline` | 256×256 | `DiCo-XL-3750K-256x256.pt` | 1.4 | 2.05 | 282.17 | 701M |

DiCo denoises **VAE latents** (4 channels, 32×32 for 256×256 images) with a ConvNet U-Net and multi-scale adaLN conditioning. VAE: `stabilityai/sd-vae-ft-ema`. Scheduler: `DDIMScheduler` (1000 train steps, linear betas).
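
The latent size quoted above follows from the VAE's spatial downsampling. The sketch below works through that arithmetic; the factor of 8 is inferred from the 256 to 32 figures in the paragraph, and `latent_shape` is a hypothetical helper, not part of the pipeline.

```python
def latent_shape(height, width, latent_channels=4, vae_scale_factor=8, batch_size=1):
    """Return the (batch, channels, height, width) shape of the latent tensor.

    Assumes the VAE downsamples each spatial side by `vae_scale_factor`
    (8 here, inferred from 256x256 images mapping to 32x32 latents).
    """
    if height % vae_scale_factor or width % vae_scale_factor:
        raise ValueError("height and width must be multiples of the VAE scale factor")
    return (
        batch_size,
        latent_channels,
        height // vae_scale_factor,
        width // vae_scale_factor,
    )


print(latent_shape(256, 256))  # (1, 4, 32, 32)
```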

## Repo layout

```text
BiliSakura/DiCo-diffusers/
├── README.md
├── demo_inference.py
├── DiCo-S-256/
├── DiCo-B-256/
├── DiCo-L-256/
└── DiCo-XL-256/
    ├── pipeline.py
    ├── model_index.json
    ├── demo.png
    ├── scheduler/scheduler_config.json
    ├── transformer/
    └── vae/
```

Each variant is self-contained. The `scheduler/` folder uses the built-in `DDIMScheduler` from the PyPI `diffusers` package.
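
Before loading a variant from a local clone, it can help to confirm that the folder matches the layout above. This is a minimal sketch: `EXPECTED_ENTRIES` and `missing_entries` are hypothetical names, and the entry list is copied from the tree shown here rather than from any official manifest.

```python
from pathlib import Path

# Entries every variant folder should contain, per the layout above.
# demo.png is left out because it is not needed to load the pipeline.
EXPECTED_ENTRIES = (
    "pipeline.py",
    "model_index.json",
    "scheduler/scheduler_config.json",
    "transformer",
    "vae",
)


def missing_entries(variant_dir):
    """Return the expected files or folders that are absent from `variant_dir`."""
    root = Path(variant_dir)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]


# An empty list means the folder looks complete.
print(missing_entries("./DiCo-XL-256"))
```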

## ImageNet class labels

`id2label` is embedded in each variant's `model_index.json` (DiT-style).

- `pipe.id2label`: inspect the id → English label correspondence
- `pipe.labels`: reverse map (English synonym → id)
- `pipe.get_label_ids("golden retriever")`
- `pipe(class_labels="golden retriever", ...)`: string labels are resolved automatically
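
The reverse map can be pictured with a short sketch. It assumes the DiT-style convention in which each `id2label` value is a comma-separated list of synonyms. `build_label_map` is a hypothetical helper, and this `get_label_ids` is a standalone illustration rather than the pipeline's own method. The two-entry dictionary is a toy excerpt, not the full ImageNet table.

```python
def build_label_map(id2label):
    """Map every English synonym to its integer class id."""
    labels = {}
    for class_id, names in id2label.items():
        for name in names.split(","):
            labels[name.strip()] = int(class_id)
    return dict(sorted(labels.items()))


def get_label_ids(labels, queries):
    """Resolve one label string or a list of label strings to class ids."""
    if isinstance(queries, str):
        queries = [queries]
    unknown = [q for q in queries if q not in labels]
    if unknown:
        raise ValueError(f"unknown labels: {unknown}")
    return [labels[q] for q in queries]


# Toy excerpt; JSON object keys arrive as strings.
id2label = {"0": "tench, Tinca tinca", "207": "golden retriever"}
labels = build_label_map(id2label)

print(get_label_ids(labels, "golden retriever"))  # [207]
```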

## Demo



Class 207 (golden retriever), 256×256, 250 steps, `guidance_scale=1.4`.

```bash
python demo_inference.py
python demo_inference.py --variant s  # DiCo-S-256, CFG 1.0
```

## Load from a local clone

### ImageNet 256×256 (`DiCo-XL-256`)

```python
from pathlib import Path

import torch
from diffusers import DiffusionPipeline

# Path to one variant folder inside the local clone.
model_dir = Path("./DiCo-XL-256").resolve()

# Load the variant together with its bundled custom pipeline code.
pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    local_files_only=True,
    custom_pipeline=str(model_dir / "pipeline.py"),
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# Inspect the label maps.
print(pipe.id2label[207])
print(pipe.get_label_ids("golden retriever"))

# Fix the seed so sampling is reproducible.
generator = torch.Generator(device="cuda").manual_seed(0)
image = pipe(
    class_labels="golden retriever",
    height=256,
    width=256,
    num_inference_steps=250,
    guidance_scale=1.4,
    generator=generator,
).images[0]
image.save("demo.png")
```

## Recommended inference settings

| Variant | Steps | CFG scale |
| --- | ---: | ---: |
| `DiCo-S-256` | 250 | 1.0 |
| `DiCo-B-256` | 250 | 1.0 |
| `DiCo-L-256` | 250 | 1.0 |
| `DiCo-XL-256` | 250 | 1.4 |

Classifier-free guidance applies to the first 3 latent channels only (DiT reproducibility convention).
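
The channel-restricted guidance can be sketched with NumPy. This follows the convention used in the DiT reference sampler and is not a copy of this repo's `pipeline.py`. It assumes the batch stacks the conditional half first and the unconditional half second, and `guide_first_channels` is a hypothetical name.

```python
import numpy as np


def guide_first_channels(model_out, guidance_scale, guided_channels=3):
    """Apply classifier-free guidance to the first `guided_channels` only.

    model_out has shape (2 * B, C, H, W): rows [0:B] are conditional
    predictions, rows [B:2B] are unconditional (assumed ordering).
    """
    eps, rest = model_out[:, :guided_channels], model_out[:, guided_channels:]
    cond_eps, uncond_eps = np.split(eps, 2, axis=0)
    # Standard CFG mix: move from the unconditional prediction toward the conditional one.
    guided = uncond_eps + guidance_scale * (cond_eps - uncond_eps)
    # Both halves receive the same guided values so the batch stays in sync.
    eps = np.concatenate([guided, guided], axis=0)
    # Remaining channels pass through untouched.
    return np.concatenate([eps, rest], axis=1)


model_out = np.zeros((2, 8, 32, 32), dtype=np.float32)
print(guide_first_channels(model_out, guidance_scale=1.4).shape)  # (2, 8, 32, 32)
```

With `guidance_scale=1.0` the guided channels reduce to the conditional prediction, which matches the CFG 1.0 setting listed for the S, B, and L variants.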

## Conversion

```bash
cd libs/DiCo-diffusers
python scripts/convert_dico_to_diffusers.py \
  --checkpoint /path/to/DiCo-XL-3750K-256x256.pt \
  --output /path/to/DiCo-XL-256 \
  --model-type DiCo-XL \
  --weights ema \
  --safe-serialization \
  --id2label ../../src/labels/id2label_en.json
```

## Citation

```bibtex
@inproceedings{ai2025dico,
  title={DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling},
  author={Yuang Ai and Qihang Fan and Xuefeng Hu and Zhenheng Yang and Ran He and Huaibo Huang},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025},
  url={https://openreview.net/forum?id=UnslcaZSnb}
}
```

## License

Weights are converted from checkpoints released under the [Apache 2.0 license](https://huggingface.co/shallowdream204/DiCo).