---
license: apache-2.0
library_name: diffusers
pipeline_tag: unconditional-image-generation
tags:
  - diffusers
  - image-generation
  - class-conditional
  - imagenet
  - dico
  - latent-diffusion
  - convnet
widget:
  - text: golden retriever
    output:
      url: DiCo-XL-256/demo.png
inference: true
---

# BiliSakura/DiCo-diffusers

Self-contained DiCo checkpoints for Hugging Face diffusers. Each variant folder ships its own `pipeline.py`, component modules, and weights.

Converted from [shallowdream204/DiCo](https://huggingface.co/shallowdream204/DiCo) using [DiCo-diffusers](https://github.com/Bili-Sakura/Visual-Generative-Foundation-Model-Collection/tree/main/libs/DiCo-diffusers).

## Available checkpoints

| Subfolder | Pipeline | Resolution | Source checkpoint | CFG | FID | IS | Params |
| --- | --- | ---: | --- | ---: | ---: | ---: | ---: |
| [`DiCo-S-256/`](DiCo-S-256/) | `DiCoPipeline` | 256×256 | `DiCo-S-400K-256x256.pt` | 1.0 | 49.97 | 31.38 | 33M |
| [`DiCo-B-256/`](DiCo-B-256/) | `DiCoPipeline` | 256×256 | `DiCo-B-400K-256x256.pt` | 1.0 | 27.20 | 56.52 | 130M |
| [`DiCo-L-256/`](DiCo-L-256/) | `DiCoPipeline` | 256×256 | `DiCo-L-400K-256x256.pt` | 1.0 | 13.66 | 91.37 | 464M |
| [`DiCo-XL-256/`](DiCo-XL-256/) | `DiCoPipeline` | 256×256 | `DiCo-XL-3750K-256x256.pt` | 1.4 | 2.05 | 282.17 | 701M |

DiCo denoises **VAE latents** (4 channels, 32×32 for 256×256 images) with a ConvNet U-Net and multi-scale adaLN conditioning. VAE: `stabilityai/sd-vae-ft-ema`. Scheduler: `DDIMScheduler` (1000 train steps, linear betas).
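
The 32×32 latent size follows from the VAE's spatial downsampling. As a minimal sketch, assuming the 8× downsampling factor of `stabilityai/sd-vae-ft-ema` (the factor and the helper name `latent_shape` are illustrative and not taken from this repo), the arithmetic looks like this:

```python
VAE_DOWNSAMPLE = 8   # assumption: spatial reduction factor of sd-vae-ft-ema
LATENT_CHANNELS = 4  # latent channels, as stated in the description above


def latent_shape(height: int, width: int) -> tuple[int, int, int]:
    """Return (channels, latent_height, latent_width) for a given image size."""
    if height % VAE_DOWNSAMPLE or width % VAE_DOWNSAMPLE:
        raise ValueError("height and width must be multiples of 8")
    return (LATENT_CHANNELS, height // VAE_DOWNSAMPLE, width // VAE_DOWNSAMPLE)


print(latent_shape(256, 256))  # (4, 32, 32)
```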

## Repo layout

```text
BiliSakura/DiCo-diffusers/
├── README.md
├── demo_inference.py
├── DiCo-S-256/
├── DiCo-B-256/
├── DiCo-L-256/
└── DiCo-XL-256/
    ├── pipeline.py
    ├── model_index.json
    ├── demo.png
    ├── scheduler/scheduler_config.json
    ├── transformer/
    └── vae/
```

Each variant is self-contained. The `scheduler/` folder holds only a config for the built-in `DDIMScheduler` from the PyPI release of diffusers, so no custom scheduler code is needed.

## ImageNet class labels

`id2label` is embedded in each variant's `model_index.json` (DiT-style).

- `pipe.id2label` — inspect id → English label correspondence
- `pipe.labels` — reverse map (English synonym → id)
- `pipe.get_label_ids("golden retriever")`
- `pipe(class_labels="golden retriever", ...)` — string labels resolved automatically
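
The lookups above can be pictured with a small stand-alone sketch. It assumes DiCo follows the convention of diffusers' `DiTPipeline`, where each `id2label` value is a comma-separated list of synonyms. The two sample entries and the helper names `build_label_map` and `get_label_ids` are illustrative and are not copied from `pipeline.py`.

```python
# Two sample ImageNet entries; the real map covers 1000 classes.
id2label = {0: "tench, Tinca tinca", 207: "golden retriever"}


def build_label_map(id2label: dict) -> dict:
    """Build the reverse map: every comma-separated synonym points to its id."""
    labels = {}
    for class_id, names in id2label.items():
        for name in names.split(","):
            labels[name.strip()] = int(class_id)
    return dict(sorted(labels.items()))


def get_label_ids(labels: dict, query) -> list:
    """Resolve one label string, or a list of them, to class ids."""
    queries = [query] if isinstance(query, str) else list(query)
    unknown = [q for q in queries if q not in labels]
    if unknown:
        raise ValueError(f"unknown label(s): {unknown}")
    return [labels[q] for q in queries]


labels = build_label_map(id2label)
print(get_label_ids(labels, "golden retriever"))  # [207]
```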

## Demo

![DiCo-XL-256 demo](DiCo-XL-256/demo.png)

Class 207 — golden retriever, 256×256, 250 steps, `guidance_scale=1.4`.

```bash
python demo_inference.py
python demo_inference.py --variant s   # DiCo-S-256, CFG 1.0
```

## Load from a local clone

### ImageNet 256×256 (`DiCo-XL-256`)

```python
from pathlib import Path
import torch
from diffusers import DiffusionPipeline

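# Point at one variant folder inside the local clone.
model_dir = Path("./DiCo-XL-256").resolve()
pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    local_files_only=True,  # load from disk only; do not contact the Hub
    custom_pipeline=str(model_dir / "pipeline.py"),  # pipeline code shipped with the variant
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# Inspect the label map in both directions: id -> name, then name -> id.
print(pipe.id2label[207])
print(pipe.get_label_ids("golden retriever"))

# Fix the seed so the sample is reproducible.
generator = torch.Generator(device="cuda").manual_seed(0)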
image = pipe(
    class_labels="golden retriever",
    height=256,
    width=256,
    num_inference_steps=250,
    guidance_scale=1.4,
    generator=generator,
).images[0]
image.save("demo.png")
```

## Recommended inference settings

| Variant | Steps | CFG scale |
| --- | ---: | ---: |
| `DiCo-S-256` | 250 | 1.0 |
| `DiCo-B-256` | 250 | 1.0 |
| `DiCo-L-256` | 250 | 1.0 |
| `DiCo-XL-256` | 250 | 1.4 |

Classifier-free guidance applies to the first 3 latent channels only (DiT reproducibility convention).
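
To make the channel split concrete, here is a minimal NumPy sketch of guidance restricted to the first three channels. It assumes the remaining channel keeps the conditional prediction, as in the reference DiT sampler. `guide_first_channels` is a hypothetical helper and not the code in `pipeline.py`.

```python
import numpy as np


def guide_first_channels(cond, uncond, scale, n_guided=3):
    """Apply classifier-free guidance to the first `n_guided` channels.

    cond, uncond: arrays of shape (batch, channels, height, width).
    Channels beyond `n_guided` are copied from `cond` unchanged (assumption).
    """
    out = cond.copy()
    out[:, :n_guided] = uncond[:, :n_guided] + scale * (
        cond[:, :n_guided] - uncond[:, :n_guided]
    )
    return out


rng = np.random.default_rng(0)
cond = rng.standard_normal((1, 4, 32, 32))    # stand-in conditional prediction
uncond = rng.standard_normal((1, 4, 32, 32))  # stand-in unconditional prediction
guided = guide_first_channels(cond, uncond, scale=1.4)
print(guided.shape)  # (1, 4, 32, 32)
```

Under this formulation, `scale=1.0` returns the conditional prediction unchanged, so the S, B, and L settings in the table above amount to sampling without guidance.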

## Conversion

```bash
cd libs/DiCo-diffusers

python scripts/convert_dico_to_diffusers.py \
  --checkpoint /path/to/DiCo-XL-3750K-256x256.pt \
  --output /path/to/DiCo-XL-256 \
  --model-type DiCo-XL \
  --weights ema \
  --safe-serialization \
  --id2label ../../src/labels/id2label_en.json
```

## Citation

```bibtex
@inproceedings{ai2025dico,
    title={DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling},
    author={Yuang Ai and Qihang Fan and Xuefeng Hu and Zhenheng Yang and Ran He and Huaibo Huang},
    booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
    year={2025},
    url={https://openreview.net/forum?id=UnslcaZSnb}
}
```

## License

Weights are converted from checkpoints released under the [Apache 2.0 license](https://huggingface.co/shallowdream204/DiCo).