---
library_name: diffusers
pipeline_tag: unconditional-image-generation
tags:
- diffusers
- deco
- image-generation
- class-conditional
- imagenet
license: mit
inference: true
widget:
  - text: golden retriever
    output:
      url: DeCo-XL-16-512/demo.png
language:
- en
---
# DeCo-diffusers
Diffusers-ready checkpoints for **DeCo** (Decoupled Conditioning), converted for local/offline use.
This root folder is a model collection that contains:
- `DeCo-XL-16-256`
- `DeCo-XL-16-512`
- `DeCo-XXL-16-512-t2i` (text-to-image; uses a `Qwen/Qwen3-1.7B` text encoder, bundled in its `text_encoder/` folder)
Each subfolder is a self-contained Diffusers model repo with:
- `pipeline.py`
- `transformer/transformer_deco.py`
- `scheduler/scheduling_deco_flow_match_euler_discrete.py`
- `transformer/diffusion_pytorch_model.safetensors`
- `vae/autoencoder_deco.py`
Each variant embeds English `id2label` directly in `model_index.json` (DiT-style), so class labels can be passed as
ImageNet ids or English synonym strings.
- `pipe.id2label`: maps an ImageNet id to its English label (comma-separated synonyms)
- `pipe.get_label_ids("golden retriever")`: maps English labels to a list of ids (here `[207]`)
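The lookup behaves like other DiT-style pipelines: each comma-separated synonym in `id2label` becomes a key that points back to its class id. The sketch below is a simplified, standalone illustration under that assumption, not the code in `pipeline.py`; the three-entry `id2label` dict is a hypothetical excerpt, and `build_label2id` / `get_label_ids` here are illustrative helpers.

```python
# Hypothetical excerpt: the real mapping (all ImageNet classes) lives in model_index.json.
id2label = {
    1: "goldfish, Carassius auratus",
    207: "golden retriever",
    208: "Labrador retriever",
}


def build_label2id(id2label: dict) -> dict:
    """Map every comma-separated synonym to its class id (sketch, not pipeline.py)."""
    label2id = {}
    for class_id, label in id2label.items():
        for synonym in label.split(","):
            # int() also handles string keys, as stored in JSON files.
            label2id[synonym.strip()] = int(class_id)
    return label2id


def get_label_ids(labels, label2id: dict) -> list:
    """Accept one English label or a list of labels and return their class ids."""
    if isinstance(labels, str):
        labels = [labels]
    missing = [name for name in labels if name not in label2id]
    if missing:
        raise ValueError(f"Unknown labels: {missing}")
    return [label2id[name] for name in labels]


label2id = build_label2id(id2label)
print(get_label_ids("golden retriever", label2id))  # [207]
print(get_label_ids(["goldfish", "Carassius auratus"], label2id))  # [1, 1]
```

Because every synonym maps to the same id, any of them can be used interchangeably as a class label.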
## Demo
![DeCo-XL-16-512 demo](DeCo-XL-16-512/demo.png)
Class-conditional sample (ImageNet class **207**, golden retriever), `DeCo-XL/16` at 512×512, 100 steps, CFG 5.0, seed 42.
## Model Paths
Paths are relative to this root folder:
| Model | Task | Resolution | Source checkpoint | Local path |
| --- | --- | ---: | --- | --- |
| DeCo-XL/16 | class-conditional (ImageNet) | 256×256 | `imagenet256_epoch800.ckpt` (EMA) | `./DeCo-XL-16-256` |
| DeCo-XL/16 | class-conditional (ImageNet) | 512×512 | `imagenet512_epoch340.ckpt` (EMA) | `./DeCo-XL-16-512` |
| DeCo-XXL/16 | text-to-image | 512×512 | `t2i_DeCo.ckpt` (EMA) | `./DeCo-XXL-16-512-t2i` |
## Inference Demo (Diffusers)
### 1) Load a local subfolder checkpoint
```python
import torch
from diffusers import DiffusionPipeline

model_path = "./DeCo-XL-16-512"  # change to ./DeCo-XL-16-256 for 256px
device = "cuda" if torch.cuda.is_available() else "cpu"

pipe = DiffusionPipeline.from_pretrained(
    model_path,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to(device)

generator = torch.Generator(device=device).manual_seed(42)

# ImageNet class example: 207 = golden retriever
print(pipe.id2label[207])
print(pipe.get_label_ids("golden retriever"))  # [207]

result = pipe(
    class_labels="golden retriever",
    num_inference_steps=100,
    guidance_scale=5.0,  # use 3.2 for DeCo-XL-16-256
    generator=generator,
)
image = result.images[0]
image.save("deco_xl_512_demo.png")
```
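`guidance_scale` is the classifier-free guidance (CFG) weight. In its standard form, the model is evaluated with and without the class condition and the two predictions are extrapolated as below. This is the textbook formulation; whether the DeCo pipeline adds refinements on top of it (for example, applying guidance only over part of the sampling trajectory) is not stated in this README.

$$
\hat{v} = v_\theta(x_t, t, \varnothing) + w \left( v_\theta(x_t, t, c) - v_\theta(x_t, t, \varnothing) \right)
$$

Here $c$ is the class label, $\varnothing$ the null (unconditional) label, and $w$ is `guidance_scale`. Setting $w = 1$ recovers the purely conditional prediction; larger values (5.0 at 512px, 3.2 at 256px in this README) push samples more strongly toward the requested class, typically at some cost in diversity.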
### 2) Quick variant switch (256 model)
```python
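import torch
from diffusers import DiffusionPipeline

model_path = "./DeCo-XL-16-256"
device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = DiffusionPipeline.from_pretrained(model_path, trust_remote_code=True).to(device)

# Re-seed so this sample is reproducible on its own
generator = torch.Generator(device=device).manual_seed(42)
image = pipe(
    class_labels=207,  # integer ImageNet id (golden retriever)
    num_inference_steps=100,
    guidance_scale=3.2,  # recommended CFG for the 256px model
    generator=generator,
).images[0]
image.save("deco_xl_256_demo.png")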
```
`class_labels` also accepts integer class ids and lists of labels for batched generation; the optional `batch_size` argument repeats a single label across the batch.
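One plausible way to reduce these input forms to one class id per image is sketched below. `normalize_class_labels` is a hypothetical helper written for illustration, and the rule that `batch_size` only repeats a single (non-list) label is an assumption based on the sentence above; `pipeline.py` is the authority on the actual handling.

```python
from typing import Sequence, Union

Label = Union[int, str]


def normalize_class_labels(
    class_labels: Union[Label, Sequence[Label]],
    label2id: dict,
    batch_size: int = 1,
) -> list:
    """Hypothetical helper: turn the accepted label forms into one id per image."""
    if isinstance(class_labels, (int, str)):
        # Assumption: a single label is repeated batch_size times.
        class_labels = [class_labels] * batch_size
    ids = []
    for label in class_labels:
        if isinstance(label, str):
            ids.append(label2id[label.strip()])  # English synonym -> ImageNet id
        else:
            ids.append(int(label))  # already an ImageNet id
    return ids


# Hypothetical two-entry lookup table for the example.
label2id = {"golden retriever": 207, "goldfish": 1}
print(normalize_class_labels("golden retriever", label2id, batch_size=3))  # [207, 207, 207]
print(normalize_class_labels(["golden retriever", "goldfish"], label2id))  # [207, 1]
```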
### 3) Text-to-image (`DeCo-XXL-16-512-t2i` / `t2i_DeCo.ckpt`)
This checkpoint uses the **AdamLM** sampler settings from the official DeCo release (25 steps, guidance 4.0, timeshift 3.0) rather than the class-conditional (c2i) recipe of 100 steps at CFG 5.0:
```python
import torch
from diffusers import DiffusionPipeline

model_path = "./DeCo-XXL-16-512-t2i"
device = "cuda" if torch.cuda.is_available() else "cpu"

pipe = DiffusionPipeline.from_pretrained(
    model_path,
    trust_remote_code=True,
    custom_pipeline=f"{model_path}/pipeline.py",
    torch_dtype=torch.bfloat16,
).to(device)

# The bundled ./text_encoder folder holds the Qwen3-1.7B weights and tokenizer;
# the pipeline loads both from it.
# The denoiser runs in float32 inside __call__ (matching the official GenEval predict setup).
image = pipe(
    prompt="a golden retriever playing in the snow, high quality photograph",
    negative_prompt="Unrealistic, JPEG artifacts.",
    num_inference_steps=25,
    guidance_scale=4.0,
    timeshift=3.0,
    generator=torch.Generator(device="cpu").manual_seed(42),
).images[0]
image.save("deco_t2i_demo.png")
```
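For intuition about `timeshift` and a multistep sampler, here is a self-contained sketch on a toy ODE. It rests on two assumptions this README does not spell out: that `timeshift` follows the common flow-matching shift t' = s·t / (1 + (s − 1)·t), with t = 1 as pure noise, which spends more (smaller) steps at the high-noise end when s > 1; and that "AdamLM" denotes an Adams-Bashforth linear multistep solver (second order here, with an Euler warm-up step). `shift_timesteps` and `adams_bashforth2` are hypothetical names, and the bundled scheduler may use a different time convention or order.

```python
import math


def shift_timesteps(ts: list, shift: float) -> list:
    """Assumed shift t' = s*t / (1 + (s - 1)*t); t=1 is pure noise, t=0 is data."""
    return [shift * t / (1.0 + (shift - 1.0) * t) for t in ts]


def adams_bashforth2(velocity, x: float, ts: list) -> float:
    """Integrate dx/dt = velocity(x, t) along the grid ts with 2nd-order Adams-Bashforth.

    The first step has no history, so it falls back to explicit Euler.
    """
    prev_v, prev_dt = None, None
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        dt = t_next - t_cur
        v = velocity(x, t_cur)
        if prev_v is None:
            x = x + dt * v  # Euler warm-up step
        else:
            # Variable-step AB2: linearly extrapolate the velocity from the last two evaluations.
            r = dt / prev_dt
            x = x + dt * ((1.0 + 0.5 * r) * v - 0.5 * r * prev_v)
        prev_v, prev_dt = v, dt
    return x


num_steps = 25
uniform = [1.0 - i / num_steps for i in range(num_steps + 1)]  # 1.0 -> 0.0
shifted = shift_timesteps(uniform, shift=3.0)  # endpoints stay at 1.0 and 0.0

# Toy ODE dx/dt = x: integrating from t=1 (x=1) down to t=0 gives exp(-1).
exact = math.exp(-1.0)
approx = adams_bashforth2(lambda x, t: x, 1.0, shifted)
print(f"exact={exact:.6f} ab2={approx:.6f}")
```

Reusing the previous velocity is what lets a multistep solver reach second-order accuracy with one model evaluation per step, which is why a low step count such as 25 can suffice.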