---
library_name: diffusers
pipeline_tag: unconditional-image-generation
tags:
  - diffusers
  - deco
  - image-generation
  - class-conditional
  - imagenet
license: mit
inference: true
widget:
  - text: golden retriever
    output:
      url: DeCo-XL-16-512/demo.png
language:
  - en
---

# DeCo-diffusers

Diffusers-ready checkpoints for **DeCo** (Decoupled Conditioning), converted for local/offline use.

This root folder is a model collection that contains:

- `DeCo-XL-16-256`
- `DeCo-XL-16-512`
- `DeCo-XXL-16-512-t2i` (text-to-image; uses a `Qwen/Qwen3-1.7B` text encoder, bundled in its `text_encoder/` folder)

Each subfolder is a self-contained Diffusers model repo with:

- `pipeline.py`
- `transformer/transformer_deco.py`
- `scheduler/scheduling_deco_flow_match_euler_discrete.py`
- `transformer/diffusion_pytorch_model.safetensors`
- `vae/autoencoder_deco.py`

The class-conditional variants (`DeCo-XL-16-256`, `DeCo-XL-16-512`) embed English `id2label` directly in `model_index.json` (DiT-style), so class labels can be passed either as ImageNet ids or as English synonym strings:

- `pipe.id2label` — id → English label (comma-separated synonyms)
- `pipe.get_label_ids("golden retriever")` — English label → id
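A minimal sketch of how such a synonym lookup can work, in the spirit of Diffusers' `DiTPipeline`: every comma-separated synonym in `id2label` becomes a key that points at its class id. The helper names `build_label_index` and `get_label_ids` here are illustrative assumptions, not the DeCo pipeline's actual code, and the two-entry dictionary is a toy excerpt.

```python
# Hypothetical sketch (not the DeCo pipeline's code): build a DiT-style
# synonym -> id index from an id2label dict whose values are comma-separated
# English synonyms, then resolve one label or a list of labels to ids.
from typing import Dict, List, Union


def build_label_index(id2label: Dict[Union[int, str], str]) -> Dict[str, int]:
    """Map every synonym (whitespace-stripped) to its integer class id."""
    index: Dict[str, int] = {}
    for key, value in id2label.items():
        for synonym in value.split(","):
            index[synonym.strip()] = int(key)  # JSON keys may arrive as strings
    return index


def get_label_ids(index: Dict[str, int], labels: Union[str, List[str]]) -> List[int]:
    """Return class ids for one label or a list of labels; unknown labels raise."""
    if isinstance(labels, str):
        labels = [labels]
    missing = [label for label in labels if label not in index]
    if missing:
        raise ValueError(f"Unknown label(s): {missing}")
    return [index[label] for label in labels]


# Toy excerpt in the comma-separated format used by ImageNet id2label files.
id2label = {"207": "golden retriever", "281": "tabby, tabby cat"}
index = build_label_index(id2label)
print(get_label_ids(index, "golden retriever"))  # [207]
print(get_label_ids(index, ["tabby cat", "tabby"]))  # [281, 281]
```

Splitting on commas means any listed synonym works, which is why both `"tabby"` and `"tabby cat"` resolve to the same id.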

## Demo

![DeCo-XL-16-512 demo](DeCo-XL-16-512/demo.png)

Class-conditional sample (ImageNet class **207**, golden retriever), `DeCo-XL/16` at 512×512, 100 steps, CFG 5.0, seed 42.

## Model Paths

All paths are relative to this root folder:

| Model | Resolution | Source checkpoint | Local path |
| --- | ---: | --- | --- |
| DeCo-XL/16 | 256×256 | `imagenet256_epoch800.ckpt` (EMA) | `./DeCo-XL-16-256` |
| DeCo-XL/16 | 512×512 | `imagenet512_epoch340.ckpt` (EMA) | `./DeCo-XL-16-512` |
| DeCo-XXL/16 (text-to-image) | 512×512 | `t2i_DeCo.ckpt` (EMA) | `./DeCo-XXL-16-512-t2i` |

## Inference Demo (Diffusers)

### 1) Load a local subfolder checkpoint

```python
import torch
from diffusers import DiffusionPipeline

model_path = "./DeCo-XL-16-512"  # change to ./DeCo-XL-16-256 for 256px
device = "cuda" if torch.cuda.is_available() else "cpu"

pipe = DiffusionPipeline.from_pretrained(
    model_path,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to(device)

generator = torch.Generator(device=device).manual_seed(42)

# ImageNet class example: 207 = golden retriever
print(pipe.id2label[207])
print(pipe.get_label_ids("golden retriever"))  # [207]

result = pipe(
    class_labels="golden retriever",
    num_inference_steps=100,
    guidance_scale=5.0,  # use 3.2 for DeCo-XL-16-256
    generator=generator,
)

image = result.images[0]
image.save("deco_xl_512_demo.png")
```

### 2) Quick variant switch (256 model)

```python
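# Reuses torch, DiffusionPipeline, and device from the snippet above.
model_path = "./DeCo-XL-16-256"

pipe = DiffusionPipeline.from_pretrained(model_path, trust_remote_code=True).to(device)
generator = torch.Generator(device=device).manual_seed(42)  # re-seed for a reproducible sample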
image = pipe(
    class_labels=207,
    num_inference_steps=100,
    guidance_scale=3.2,
    generator=generator,
).images[0]
image.save("deco_xl_256_demo.png")
```

The pipeline also accepts integer class ids, a list of labels for a batch, and an optional `batch_size` that repeats a single label.
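As a rough illustration of how these input forms could be reduced to a single list of ids before sampling, here is a hypothetical normalizer. The function `normalize_class_labels` and its rules (strings resolved through a synonym index, `batch_size` only repeating a single label) are assumptions for the sketch, not the pipeline's documented behavior.

```python
# Hypothetical sketch: normalize class_labels (int, str, or a list mixing both)
# plus an optional batch_size into a flat list of ImageNet ids. The name and the
# exact rules are illustrative assumptions, not the DeCo pipeline's code.
from typing import Dict, List, Optional, Sequence, Union

Label = Union[int, str]


def normalize_class_labels(
    class_labels: Union[Label, Sequence[Label]],
    label_index: Dict[str, int],
    batch_size: Optional[int] = None,
) -> List[int]:
    # Wrap a single int or string so every case goes through the same loop.
    if isinstance(class_labels, (int, str)):
        class_labels = [class_labels]
    # Strings are looked up as English synonyms; anything else is treated as an id.
    ids = [label_index[x] if isinstance(x, str) else int(x) for x in class_labels]
    # batch_size repeats a single label; a list of labels already fixes the size.
    if batch_size is not None and batch_size != len(ids):
        if len(ids) != 1:
            raise ValueError("batch_size must match the number of labels or repeat a single one")
        ids = ids * batch_size
    return ids


label_index = {"golden retriever": 207}
print(normalize_class_labels(207, label_index, batch_size=3))  # [207, 207, 207]
print(normalize_class_labels(["golden retriever", 281], label_index))  # [207, 281]
```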

### 3) Text-to-image (`DeCo-XXL-16-512-t2i` / `t2i_DeCo.ckpt`)

Use the **AdamLM** scheduler defaults from the official DeCo release (25 steps, CFG 4.0, `timeshift=3.0`), not the class-conditional (c2i) 100-step / CFG 5.0 recipe:

```python
import torch
from diffusers import DiffusionPipeline

model_path = "./DeCo-XXL-16-512-t2i"
device = "cuda" if torch.cuda.is_available() else "cpu"

pipe = DiffusionPipeline.from_pretrained(
    model_path,
    trust_remote_code=True,
    custom_pipeline=f"{model_path}/pipeline.py",
    torch_dtype=torch.bfloat16,
).to(device)

# Bundled ./text_encoder (Qwen3-1.7B weights + tokenizer). Pipeline loads both from that folder.
# Denoiser runs in float32 during __call__ (matches official GenEval predict).

image = pipe(
    prompt="a golden retriever playing in the snow, high quality photograph",
    negative_prompt="Unrealistic, JPEG artifacts.",
    num_inference_steps=25,
    guidance_scale=4.0,
    timeshift=3.0,
    generator=torch.Generator(device="cpu").manual_seed(42),
).images[0]
image.save("deco_t2i_demo.png")
```
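The `timeshift` argument and the AdamLM scheduler are not documented further here. Assuming AdamLM is an Adams-style linear multistep solver for the flow ODE, the sketch below illustrates the general idea on a scalar toy problem: a timestep grid warped by an SD3-style shift, followed by a variable-step second-order Adams-Bashforth update that reuses the previous velocity. The shift formula, the time convention (σ = 1 is noise, σ = 0 is data), and all function names are assumptions, not DeCo's implementation.

```python
# Hypothetical sketch: shifted timestep grid + variable-step second-order
# Adams-Bashforth (AB2) integration, compared against Euler on a toy ODE.
# This is NOT the DeCo AdamLM implementation; the shift formula and the time
# convention (sigma = 1 is noise, sigma = 0 is data) are assumptions.
import math
from typing import Callable, List, Optional

Velocity = Callable[[float, float], float]


def shifted_sigmas(num_steps: int, shift: float) -> List[float]:
    """Uniform noise levels from 1 to 0, warped by sigma' = s*sigma / (1 + (s-1)*sigma)."""
    sigmas = [1.0 - i / num_steps for i in range(num_steps + 1)]
    return [shift * s / (1.0 + (shift - 1.0) * s) for s in sigmas]


def adams_bashforth2(velocity: Velocity, x: float, sigmas: List[float]) -> float:
    """Integrate dx/dsigma = velocity(x, sigma) along a decreasing sigma grid."""
    prev_v: Optional[float] = None
    prev_h: Optional[float] = None
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        h = s_next - s_cur  # negative: moving from noise toward data
        v = velocity(x, s_cur)  # one new velocity evaluation per step
        if prev_v is None or prev_h is None:
            x = x + h * v  # first step: plain Euler, since there is no history yet
        else:
            w = h / prev_h  # step-size ratio for the variable-step AB2 weights
            x = x + h * ((1.0 + 0.5 * w) * v - 0.5 * w * prev_v)
        prev_v, prev_h = v, h
    return x


def euler(velocity: Velocity, x: float, sigmas: List[float]) -> float:
    """First-order baseline on the same grid."""
    for s_cur, s_next in zip(sigmas[:-1], sigmas[1:]):
        x = x + (s_next - s_cur) * velocity(x, s_cur)
    return x


# Toy ODE dx/dsigma = x from sigma = 1 (x = 1) to sigma = 0; exact answer is e^-1.
sig = shifted_sigmas(25, 3.0)
exact = math.exp(-1.0)
err_ab2 = abs(adams_bashforth2(lambda x, s: x, 1.0, sig) - exact)
err_euler = abs(euler(lambda x, s: x, 1.0, sig) - exact)
print(err_ab2 < err_euler)  # True
```

With a shift above 1 the grid packs more steps near the noise end. The multistep update costs one velocity evaluation per step, like Euler, but gains second-order accuracy by reusing the previous step's velocity.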