---
license: apache-2.0
library_name: diffusers
pipeline_tag: unconditional-image-generation
tags:
- diffusers
- image-generation
- class-conditional
- imagenet
- dico
- latent-diffusion
- convnet
widget:
- text: golden retriever
  output:
    url: DiCo-XL-256/demo.png
inference: true
---

# BiliSakura/DiCo-diffusers

Self-contained DiCo checkpoints for Hugging Face diffusers. Each variant folder ships its own `pipeline.py`, component modules, and weights.

Converted from [shallowdream204/DiCo](https://huggingface.co/shallowdream204/DiCo) using [DiCo-diffusers](https://github.com/Bili-Sakura/Visual-Generative-Foundation-Model-Collection/tree/main/libs/DiCo-diffusers).

## Available checkpoints

| Subfolder | Pipeline | Resolution | Source checkpoint | CFG | FID | IS | Params |
| --- | --- | ---: | --- | ---: | ---: | ---: | ---: |
| [`DiCo-S-256/`](DiCo-S-256/) | `DiCoPipeline` | 256×256 | `DiCo-S-400K-256x256.pt` | 1.0 | 49.97 | 31.38 | 33M |
| [`DiCo-B-256/`](DiCo-B-256/) | `DiCoPipeline` | 256×256 | `DiCo-B-400K-256x256.pt` | 1.0 | 27.20 | 56.52 | 130M |
| [`DiCo-L-256/`](DiCo-L-256/) | `DiCoPipeline` | 256×256 | `DiCo-L-400K-256x256.pt` | 1.0 | 13.66 | 91.37 | 464M |
| [`DiCo-XL-256/`](DiCo-XL-256/) | `DiCoPipeline` | 256×256 | `DiCo-XL-3750K-256x256.pt` | 1.4 | 2.05 | 282.17 | 701M |

DiCo denoises **VAE latents** (4 channels, 32×32 for 256×256 images) with a ConvNet U-Net and multi-scale adaLN conditioning. VAE: `stabilityai/sd-vae-ft-ema`. Scheduler: `DDIMScheduler` (1000 train steps, linear betas).

## Repo layout

```text
BiliSakura/DiCo-diffusers/
├── README.md
├── demo_inference.py
├── DiCo-S-256/
├── DiCo-B-256/
├── DiCo-L-256/
└── DiCo-XL-256/
    ├── pipeline.py
    ├── model_index.json
    ├── demo.png
    ├── scheduler/scheduler_config.json
    ├── transformer/
    └── vae/
```

Each variant is self-contained. The `scheduler/` folder uses built-in `DDIMScheduler` from PyPI diffusers.

## ImageNet class labels

`id2label` is embedded in each variant's `model_index.json` (DiT-style).
- `pipe.id2label`: inspect id → English label correspondence
- `pipe.labels`: reverse map (English synonym → id)
- `pipe.get_label_ids("golden retriever")`
- `pipe(class_labels="golden retriever", ...)`: string labels resolved automatically

## Demo

![DiCo-XL-256 demo](DiCo-XL-256/demo.png)

Class 207 (golden retriever), 256×256, 250 steps, `guidance_scale=1.4`.

```bash
python demo_inference.py
python demo_inference.py --variant s  # DiCo-S-256, CFG 1.0
```

## Load from a local clone

### ImageNet 256×256 (`DiCo-XL-256`)

```python
from pathlib import Path
import torch
from diffusers import DiffusionPipeline

model_dir = Path("./DiCo-XL-256").resolve()
pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    local_files_only=True,
    custom_pipeline=str(model_dir / "pipeline.py"),
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

print(pipe.id2label[207])
print(pipe.get_label_ids("golden retriever"))

generator = torch.Generator(device="cuda").manual_seed(0)
image = pipe(
    class_labels="golden retriever",
    height=256,
    width=256,
    num_inference_steps=250,
    guidance_scale=1.4,
    generator=generator,
).images[0]
image.save("demo.png")
```

## Recommended inference settings

| Variant | Steps | CFG scale |
| --- | ---: | ---: |
| `DiCo-S-256` | 250 | 1.0 |
| `DiCo-B-256` | 250 | 1.0 |
| `DiCo-L-256` | 250 | 1.0 |
| `DiCo-XL-256` | 250 | 1.4 |

Classifier-free guidance applies to the first 3 latent channels only (DiT reproducibility convention).
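
The channel-limited guidance described above can be sketched as follows. This is a minimal illustration, not the code in `pipeline.py`: the function name `apply_partial_cfg` and its arguments are hypothetical, NumPy stands in for torch to keep the example light, and it assumes the remaining channels are taken from the conditional branch, as in the DiT reference sampler.

```python
import numpy as np


def apply_partial_cfg(cond_out, uncond_out, guidance_scale, guided_channels=3):
    """Blend conditional/unconditional predictions on the leading channels only.

    Hypothetical helper for illustration; inputs have shape (batch, channels, H, W).
    """
    cond_head = cond_out[:, :guided_channels]
    uncond_head = uncond_out[:, :guided_channels]
    # Standard classifier-free guidance, restricted to the first channels.
    guided = uncond_head + guidance_scale * (cond_head - uncond_head)
    # Assumption: all remaining channels pass through from the conditional branch.
    rest = cond_out[:, guided_channels:]
    return np.concatenate([guided, rest], axis=1)


# Example with 4-channel 32x32 latents, matching the 256x256 variants.
rng = np.random.default_rng(0)
cond = rng.standard_normal((2, 4, 32, 32))
uncond = rng.standard_normal((2, 4, 32, 32))
mixed = apply_partial_cfg(cond, uncond, guidance_scale=1.4)
print(mixed.shape)
```

With `guidance_scale=1.0` the blend reduces to the conditional prediction, so the CFG 1.0 setting listed for the S, B, and L variants amounts to sampling without guidance.
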
## Conversion

```bash
cd libs/DiCo-diffusers
python scripts/convert_dico_to_diffusers.py \
  --checkpoint /path/to/DiCo-XL-3750K-256x256.pt \
  --output /path/to/DiCo-XL-256 \
  --model-type DiCo-XL \
  --weights ema \
  --safe-serialization \
  --id2label ../../src/labels/id2label_en.json
```

## Citation

```bibtex
@inproceedings{ai2025dico,
  title={DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling},
  author={Yuang Ai and Qihang Fan and Xuefeng Hu and Zhenheng Yang and Ran He and Huaibo Huang},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025},
  url={https://openreview.net/forum?id=UnslcaZSnb}
}
```

## License

Weights are converted from checkpoints released under the [Apache 2.0 license](https://huggingface.co/shallowdream204/DiCo).