---
license: apache-2.0
library_name: diffusers
pipeline_tag: unconditional-image-generation
tags:
  - diffusers
  - image-generation
  - class-conditional
  - imagenet
  - dico
  - latent-diffusion
  - convnet
widget:
  - text: golden retriever
    output:
      url: DiCo-XL-256/demo.png
inference: true
---

# BiliSakura/DiCo-diffusers

Self-contained DiCo checkpoints for Hugging Face diffusers. Each variant folder ships its own `pipeline.py`, component modules, and weights.

Converted from [shallowdream204/DiCo](https://huggingface.co/shallowdream204/DiCo) using [DiCo-diffusers](https://github.com/Bili-Sakura/Visual-Generative-Foundation-Model-Collection/tree/main/libs/DiCo-diffusers).

## Available checkpoints

| Subfolder | Pipeline | Resolution | Source checkpoint | CFG | FID | IS | Params |
| --- | --- | ---: | --- | ---: | ---: | ---: | ---: |
| [`DiCo-S-256/`](DiCo-S-256/) | `DiCoPipeline` | 256×256 | `DiCo-S-400K-256x256.pt` | 1.0 | 49.97 | 31.38 | 33M |
| [`DiCo-B-256/`](DiCo-B-256/) | `DiCoPipeline` | 256×256 | `DiCo-B-400K-256x256.pt` | 1.0 | 27.20 | 56.52 | 130M |
| [`DiCo-L-256/`](DiCo-L-256/) | `DiCoPipeline` | 256×256 | `DiCo-L-400K-256x256.pt` | 1.0 | 13.66 | 91.37 | 464M |
| [`DiCo-XL-256/`](DiCo-XL-256/) | `DiCoPipeline` | 256×256 | `DiCo-XL-3750K-256x256.pt` | 1.4 | 2.05 | 282.17 | 701M |

DiCo denoises **VAE latents** (4 channels, 32×32 for 256×256 images) with a ConvNet U-Net and multi-scale adaLN conditioning. VAE: `stabilityai/sd-vae-ft-ema`. Scheduler: `DDIMScheduler` (1000 train steps, linear betas).
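
The latent size quoted above follows from the VAE's spatial downsampling. The sketch below reproduces that arithmetic; the helper name `latent_shape` is hypothetical, and the factor of 8 is inferred from the 256×256 to 32×32 ratio stated here rather than read from the pipeline code.

```python
def latent_shape(height, width, channels=4, downsample=8):
    """Return the (channels, height, width) of the latent for a given image size.

    `downsample=8` is an assumption derived from 256 / 32 in this README.
    """
    if height % downsample or width % downsample:
        raise ValueError("height and width must be multiples of the downsample factor")
    return (channels, height // downsample, width // downsample)


print(latent_shape(256, 256))  # (4, 32, 32)
```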

## Repo layout

```text
BiliSakura/DiCo-diffusers/
├── README.md
├── demo_inference.py
├── DiCo-S-256/
├── DiCo-B-256/
├── DiCo-L-256/
└── DiCo-XL-256/
    ├── pipeline.py
    ├── model_index.json
    ├── demo.png
    ├── scheduler/scheduler_config.json
    ├── transformer/
    └── vae/
```

Each variant folder is self-contained. The `scheduler/` folder holds only a config for the built-in `DDIMScheduler` shipped with the PyPI release of diffusers.

## ImageNet class labels

`id2label` is embedded in each variant's `model_index.json` (DiT-style).

- `pipe.id2label`: inspect the id → English label correspondence
- `pipe.labels`: reverse map (English synonym → id)
- `pipe.get_label_ids("golden retriever")`
- `pipe(class_labels="golden retriever", ...)`: string labels are resolved automatically
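
To illustrate how a reverse map like `pipe.labels` could be derived from `id2label`, here is a minimal sketch. It assumes DiT-style entries in which synonyms are comma-separated and that lookups are case-insensitive; the function `build_label_lookup` and the two-entry sample dict are illustrative only, and the pipeline's actual implementation may differ.

```python
def build_label_lookup(id2label):
    """Map each lower-cased synonym to its class id.

    Assumes comma-separated synonyms per entry; the first id seen wins
    when two classes share a synonym.
    """
    lookup = {}
    for class_id, names in id2label.items():
        for name in names.split(","):
            lookup.setdefault(name.strip().lower(), int(class_id))
    return lookup


# Two ImageNet-1k entries used purely as sample data.
sample_id2label = {1: "goldfish, Carassius auratus", 207: "golden retriever"}
lookup = build_label_lookup(sample_id2label)
print(lookup["golden retriever"])  # 207
```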

## Demo

![DiCo-XL-256 demo](DiCo-XL-256/demo.png)

Class 207 (golden retriever), 256×256, 250 steps, `guidance_scale=1.4`.

```bash
python demo_inference.py
python demo_inference.py --variant s   # DiCo-S-256, CFG 1.0
```

## Load from a local clone

### ImageNet 256×256 (`DiCo-XL-256`)

```python
from pathlib import Path
import torch
from diffusers import DiffusionPipeline

model_dir = Path("./DiCo-XL-256").resolve()
pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    local_files_only=True,
    custom_pipeline=str(model_dir / "pipeline.py"),
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

print(pipe.id2label[207])
print(pipe.get_label_ids("golden retriever"))

generator = torch.Generator(device="cuda").manual_seed(0)
image = pipe(
    class_labels="golden retriever",
    height=256,
    width=256,
    num_inference_steps=250,
    guidance_scale=1.4,
    generator=generator,
).images[0]
image.save("demo.png")
```

## Recommended inference settings

| Variant | Steps | CFG scale |
| --- | ---: | ---: |
| `DiCo-S-256` | 250 | 1.0 |
| `DiCo-B-256` | 250 | 1.0 |
| `DiCo-L-256` | 250 | 1.0 |
| `DiCo-XL-256` | 250 | 1.4 |
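
If you switch between variants in a script, the table can be kept as a small lookup so the right settings travel with the folder name. This is a convenience sketch, not part of the repo: `RECOMMENDED` and `recommended_kwargs` are hypothetical names, and the keyword names match the `pipe(...)` call shown in the loading example.

```python
# Values copied from the table above.
RECOMMENDED = {
    "DiCo-S-256": {"num_inference_steps": 250, "guidance_scale": 1.0},
    "DiCo-B-256": {"num_inference_steps": 250, "guidance_scale": 1.0},
    "DiCo-L-256": {"num_inference_steps": 250, "guidance_scale": 1.0},
    "DiCo-XL-256": {"num_inference_steps": 250, "guidance_scale": 1.4},
}


def recommended_kwargs(variant):
    """Return a fresh copy of the sampling settings for a variant folder name."""
    try:
        return dict(RECOMMENDED[variant])
    except KeyError:
        raise ValueError(f"unknown variant: {variant!r}") from None


print(recommended_kwargs("DiCo-XL-256"))
```

The result can then be unpacked into the call, for example `pipe(class_labels="golden retriever", **recommended_kwargs("DiCo-XL-256"))`.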

Classifier-free guidance applies to the first 3 latent channels only (DiT reproducibility convention).
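
The following sketch shows what guiding only the first 3 channels means numerically. It uses NumPy arrays in place of the torch tensors the pipeline works with, and it assumes the remaining channel is taken unchanged from the conditional prediction, as in the reference DiT sampler; `cfg_first_three_channels` is a hypothetical helper, not a function from `pipeline.py`.

```python
import numpy as np


def cfg_first_three_channels(cond, uncond, scale, guided_channels=3):
    """Apply classifier-free guidance to the leading channels only.

    cond, uncond: noise predictions shaped (batch, channels, height, width).
    Channels beyond `guided_channels` are copied from `cond` (an assumption).
    """
    out = cond.copy()
    c = guided_channels
    out[:, :c] = uncond[:, :c] + scale * (cond[:, :c] - uncond[:, :c])
    return out


cond = np.ones((1, 4, 32, 32))
uncond = np.zeros((1, 4, 32, 32))
guided = cfg_first_three_channels(cond, uncond, scale=1.4)
print(guided[0, :, 0, 0])  # first three channels scaled, fourth left as-is
```

With `scale=1.0` the function returns the conditional prediction unchanged, which matches the CFG 1.0 rows in the table.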

## Conversion

```bash
cd libs/DiCo-diffusers

python scripts/convert_dico_to_diffusers.py \
  --checkpoint /path/to/DiCo-XL-3750K-256x256.pt \
  --output /path/to/DiCo-XL-256 \
  --model-type DiCo-XL \
  --weights ema \
  --safe-serialization \
  --id2label ../../src/labels/id2label_en.json
```

## Citation

```bibtex
@inproceedings{ai2025dico,
    title={DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling},
    author={Yuang Ai and Qihang Fan and Xuefeng Hu and Zhenheng Yang and Ran He and Huaibo Huang},
    booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
    year={2025},
    url={https://openreview.net/forum?id=UnslcaZSnb}
}
```

## License

Weights are converted from checkpoints released under the [Apache 2.0 license](https://huggingface.co/shallowdream204/DiCo).