---
license: apache-2.0
library_name: diffusers
pipeline_tag: unconditional-image-generation
tags:
- diffusers
- image-generation
- class-conditional
- imagenet
- dico
- latent-diffusion
- convnet
widget:
- text: golden retriever
  output:
    url: DiCo-XL-256/demo.png
inference: true
---
# BiliSakura/DiCo-diffusers
Self-contained DiCo checkpoints for Hugging Face diffusers. Each variant folder ships its own `pipeline.py`, component modules, and weights.
Converted from [shallowdream204/DiCo](https://huggingface.co/shallowdream204/DiCo) using [DiCo-diffusers](https://github.com/Bili-Sakura/Visual-Generative-Foundation-Model-Collection/tree/main/libs/DiCo-diffusers).
## Available checkpoints
| Subfolder | Pipeline | Resolution | Source checkpoint | CFG scale | FID | IS | Params |
| --- | --- | ---: | --- | ---: | ---: | ---: | ---: |
| [`DiCo-S-256/`](DiCo-S-256/) | `DiCoPipeline` | 256×256 | `DiCo-S-400K-256x256.pt` | 1.0 | 49.97 | 31.38 | 33M |
| [`DiCo-B-256/`](DiCo-B-256/) | `DiCoPipeline` | 256×256 | `DiCo-B-400K-256x256.pt` | 1.0 | 27.20 | 56.52 | 130M |
| [`DiCo-L-256/`](DiCo-L-256/) | `DiCoPipeline` | 256×256 | `DiCo-L-400K-256x256.pt` | 1.0 | 13.66 | 91.37 | 464M |
| [`DiCo-XL-256/`](DiCo-XL-256/) | `DiCoPipeline` | 256×256 | `DiCo-XL-3750K-256x256.pt` | 1.4 | 2.05 | 282.17 | 701M |

DiCo denoises **VAE latents** (4 channels, 32×32 for 256×256 images) with a ConvNet U-Net and multi-scale adaLN conditioning. VAE: `stabilityai/sd-vae-ft-ema`. Scheduler: `DDIMScheduler` (1000 train steps, linear betas).
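The latent size follows from the VAE's spatial downsampling. The sketch below is a hypothetical helper (not part of `pipeline.py`) that assumes the 8× downsampling and 4 latent channels of `stabilityai/sd-vae-ft-ema`; it does not read these values from the repo's configs.

```python
from typing import Tuple

# Assumptions: sd-vae-ft-ema downsamples by 8 in each spatial dimension and
# produces 4 latent channels. These constants are hard-coded for illustration.
VAE_SCALE_FACTOR = 8
LATENT_CHANNELS = 4


def latent_shape(height: int, width: int, batch_size: int = 1) -> Tuple[int, int, int, int]:
    """Shape (B, C, H/8, W/8) of the latent tensor the denoiser operates on."""
    if height % VAE_SCALE_FACTOR or width % VAE_SCALE_FACTOR:
        raise ValueError("height and width must be multiples of 8")
    return (batch_size, LATENT_CHANNELS,
            height // VAE_SCALE_FACTOR, width // VAE_SCALE_FACTOR)


def compression_ratio(height: int, width: int) -> float:
    """Number of RGB pixel values per latent value."""
    _, c, h, w = latent_shape(height, width)
    return (3 * height * width) / (c * h * w)
```

For a 256×256 image, `latent_shape(256, 256)` gives `(1, 4, 32, 32)`, and the denoiser works on 48× fewer values than the RGB image (196,608 / 4,096).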
## Repo layout
```text
BiliSakura/DiCo-diffusers/
├── README.md
├── demo_inference.py
├── DiCo-S-256/
├── DiCo-B-256/
├── DiCo-L-256/
└── DiCo-XL-256/
    ├── pipeline.py
    ├── model_index.json
    ├── demo.png
    ├── scheduler/scheduler_config.json
    ├── transformer/
    └── vae/
```
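The tree above expands only `DiCo-XL-256/`; each variant folder is self-contained, with its own `pipeline.py`, component modules, and weights. The `scheduler/` folder only configures the built-in `DDIMScheduler` from the PyPI `diffusers` package, so no custom scheduler code is shipped.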
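To make "1000 train steps, linear betas" concrete, here is a pure-Python sketch of what a `DDIMScheduler` computes. It assumes diffusers' default `beta_start=1e-4`, `beta_end=0.02`, `"leading"` timestep spacing, `steps_offset=0`, and `set_alpha_to_one=True`; the real values live in `scheduler/scheduler_config.json` and may differ. It also ignores `eta` and `clip_sample`, which the real config may set.

```python
import math
from typing import List

# Assumed defaults; check scheduler/scheduler_config.json for the actual values.
NUM_TRAIN_TIMESTEPS = 1000
BETA_START, BETA_END = 1e-4, 0.02


def linear_alphas_cumprod(n: int = NUM_TRAIN_TIMESTEPS,
                          beta_start: float = BETA_START,
                          beta_end: float = BETA_END) -> List[float]:
    """Cumulative products of (1 - beta_t) for linearly spaced betas."""
    alphas_cumprod, prod = [], 1.0
    for i in range(n):
        beta = beta_start + (beta_end - beta_start) * i / (n - 1)
        prod *= 1.0 - beta
        alphas_cumprod.append(prod)
    return alphas_cumprod


def leading_timesteps(num_inference_steps: int,
                      n: int = NUM_TRAIN_TIMESTEPS,
                      steps_offset: int = 0) -> List[int]:
    """'leading' spacing: every (n // steps)-th training step, noisiest first."""
    step_ratio = n // num_inference_steps
    return [i * step_ratio + steps_offset for i in reversed(range(num_inference_steps))]


def ddim_step(x_t: float, eps: float, t: int, prev_t: int,
              alphas_cumprod: List[float], final_alpha_cumprod: float = 1.0) -> float:
    """Deterministic DDIM update (eta = 0, no clipping) for a single scalar value."""
    a_t = alphas_cumprod[t]
    # Past the last step there is no previous timestep; use final_alpha_cumprod.
    a_prev = alphas_cumprod[prev_t] if prev_t >= 0 else final_alpha_cumprod
    x0_pred = (x_t - math.sqrt(1.0 - a_t) * eps) / math.sqrt(a_t)
    return math.sqrt(a_prev) * x0_pred + math.sqrt(1.0 - a_prev) * eps
```

With 250 inference steps the step ratio is 1000 // 250 = 4, so sampling visits timesteps 996, 992, ..., 4, 0. To inspect the shipped configuration instead of these assumed defaults, load it with `DDIMScheduler.from_pretrained("./DiCo-XL-256", subfolder="scheduler")`.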
## ImageNet class labels
`id2label` is embedded in each variant's `model_index.json` (DiT-style).
- `pipe.id2label`: class id → English label
- `pipe.labels`: reverse map, English synonym → class id
- `pipe.get_label_ids("golden retriever")`: look up class ids from English labels
- `pipe(class_labels="golden retriever", ...)`: string labels are resolved to class ids automatically
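The reverse map can be sketched as follows. This is a hypothetical re-implementation, not the code in `pipeline.py`: it assumes the embedded `id2label` uses JSON string keys and DiT-style comma-separated synonyms (for example `"goldfish, Carassius auratus"`), and the function names are illustrative only.

```python
from typing import Dict, List, Union


def build_label_map(id2label: Dict[str, str]) -> Dict[str, int]:
    """Reverse map: every comma-separated English synonym -> class id."""
    labels: Dict[str, int] = {}
    for key, value in id2label.items():
        for synonym in value.split(","):
            labels[synonym.strip()] = int(key)
    return dict(sorted(labels.items()))


def get_label_ids(labels: Dict[str, int], label: Union[str, List[str]]) -> List[int]:
    """Resolve one label or a list of labels to class ids, rejecting unknown names."""
    names = [label] if isinstance(label, str) else list(label)
    missing = [name for name in names if name not in labels]
    if missing:
        raise ValueError(f"Unknown label(s): {missing}")
    return [labels[name] for name in names]
```

Because every synonym becomes its own key, either `"goldfish"` or `"Carassius auratus"` would resolve to the same id. The real `pipe.get_label_ids` may differ in return type or error handling.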
## Demo
![DiCo-XL-256 demo](DiCo-XL-256/demo.png)
Class 207 (golden retriever), 256×256, 250 steps, `guidance_scale=1.4`.
```bash
python demo_inference.py
python demo_inference.py --variant s # DiCo-S-256, CFG 1.0
```
## Load from a local clone
### ImageNet 256×256 (`DiCo-XL-256`)
```python
from pathlib import Path

import torch
from diffusers import DiffusionPipeline

# Load one variant folder from a local clone, using its bundled pipeline.py.
model_dir = Path("./DiCo-XL-256").resolve()
pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    local_files_only=True,
    custom_pipeline=str(model_dir / "pipeline.py"),
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# Label helpers backed by the id2label mapping embedded in model_index.json.
print(pipe.id2label[207])
print(pipe.get_label_ids("golden retriever"))

# Fixed seed for reproducible sampling; steps and CFG match the recommended XL settings.
generator = torch.Generator(device="cuda").manual_seed(0)
image = pipe(
    class_labels="golden retriever",  # resolved to class id 207
    height=256,
    width=256,
    num_inference_steps=250,
    guidance_scale=1.4,
    generator=generator,
).images[0]
image.save("demo.png")
```
## Recommended inference settings
| Variant | Steps | CFG scale |
| --- | ---: | ---: |
| `DiCo-S-256` | 250 | 1.0 |
| `DiCo-B-256` | 250 | 1.0 |
| `DiCo-L-256` | 250 | 1.0 |
| `DiCo-XL-256` | 250 | 1.4 |
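
Classifier-free guidance is applied to the first 3 latent channels only, following the DiT convention kept for exact reproducibility. With `guidance_scale=1.0` the guided prediction equals the conditional one, so the S, B, and L variants effectively sample without guidance.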
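As a minimal sketch of that convention, the function below applies DiT's `uncond + s * (cond - uncond)` rule to channels 0 to 2 and leaves the remaining channel at the conditional prediction. It assumes the DiCo pipeline behaves like DiT's `forward_with_cfg`; `dit_style_cfg` is a hypothetical name, and latents are plain lists (channels × flattened pixels) so the example has no dependencies.

```python
from typing import List

Latent = List[List[float]]  # channels x flattened (H * W) values


def dit_style_cfg(cond: Latent, uncond: Latent, guidance_scale: float,
                  guided_channels: int = 3) -> Latent:
    """Guide only the first `guided_channels` channels; keep the rest conditional."""
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond must have the same number of channels")
    out: Latent = []
    for c, (cond_ch, uncond_ch) in enumerate(zip(cond, uncond)):
        if c < guided_channels:
            # Standard CFG extrapolation away from the unconditional prediction.
            out.append([u + guidance_scale * (x - u) for x, u in zip(cond_ch, uncond_ch)])
        else:
            # Channels past the cutoff receive no guidance.
            out.append(list(cond_ch))
    return out
```

Setting `guidance_scale=1.0` returns the conditional prediction unchanged, which is why the 1.0 rows in the table above amount to unguided sampling.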
## Conversion
```bash
cd libs/DiCo-diffusers
python scripts/convert_dico_to_diffusers.py \
  --checkpoint /path/to/DiCo-XL-3750K-256x256.pt \
  --output /path/to/DiCo-XL-256 \
  --model-type DiCo-XL \
  --weights ema \
  --safe-serialization \
  --id2label ../../src/labels/id2label_en.json
```
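`--weights ema` presumably selects the exponential-moving-average copy of the weights. The sketch below is hypothetical (`select_state_dict` is not a function from the conversion script): it assumes DiCo's `.pt` files follow DiT's training-checkpoint format, a dict with `"model"`, `"ema"`, `"opt"`, and `"args"` entries, and also accepts a bare state dict, as DiT's loader does.

```python
from typing import Any, Dict

# Assumption: DiT-style checkpoints wrap weights under these keys.
WRAPPER_KEYS = ("model", "ema")


def select_state_dict(checkpoint: Dict[str, Any], weights: str = "ema") -> Dict[str, Any]:
    """Return the requested weights from a (possibly wrapped) checkpoint dict."""
    if weights not in WRAPPER_KEYS:
        raise ValueError(f"weights must be one of {WRAPPER_KEYS}, got {weights!r}")
    if any(key in checkpoint for key in WRAPPER_KEYS):
        if weights not in checkpoint:
            raise KeyError(f"checkpoint has no {weights!r} entry")
        return checkpoint[weights]
    # No wrapper keys: treat the checkpoint as an already-unwrapped state dict.
    return checkpoint
```

In a real conversion the dict would come from `torch.load(path, map_location="cpu")`, and the selected tensors would then be renamed to match the diffusers component before saving.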
## Citation
```bibtex
@inproceedings{ai2025dico,
  title={DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling},
  author={Yuang Ai and Qihang Fan and Xuefeng Hu and Zhenheng Yang and Ran He and Huaibo Huang},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025},
  url={https://openreview.net/forum?id=UnslcaZSnb}
}
```
## License
Weights are converted from checkpoints released under the [Apache 2.0 license](https://huggingface.co/shallowdream204/DiCo).