# BiliSakura/PixelFlow-diffusers

Self-contained PixelFlow checkpoints for Hugging Face diffusers. Each variant folder ships its own `pipeline.py`, component modules, and weights.

## Available checkpoints

| Subfolder | Pipeline | Task | Resolution | Params |
| --- | --- | --- | ---: | ---: |
| [`PixelFlow-256/`](PixelFlow-256/) | `PixelFlowPipeline` | class-to-image | 256×256 | 677M |
| [`PixelFlow-T2I/`](PixelFlow-T2I/) | `PixelFlowT2IPipeline` | text-to-image | 1024×1024 | 882M |

## Repo layout

```text
BiliSakura/PixelFlow-diffusers/
├── README.md
├── PixelFlow-256/
│   ├── pipeline.py
│   ├── model_index.json
│   ├── scheduler/scheduler_config.json
│   └── transformer/
└── PixelFlow-T2I/
    ├── pipeline.py
    ├── model_index.json
    ├── scheduler/scheduler_config.json
    ├── text_encoder/
    ├── tokenizer/
    └── transformer/
```

Each variant is self-contained. Its `scheduler/` folder holds `scheduler_config.json` alongside `scheduling_pixelflow.py`, which defines [`PixelFlowScheduler`](PixelFlow-256/scheduler/scheduling_pixelflow.py).

Inference needs no shared helper modules, only `diffusers` from PyPI plus the local variant directory.

## ImageNet class labels

For the class-conditional [`PixelFlow-256/`](PixelFlow-256/) variant, the `id2label` map is embedded in `PixelFlow-256/model_index.json` (DiT-style). The pipeline exposes it through:

- `pipe.id2label`: map from class id to English label
- `pipe.labels`: reverse map from English synonym to class id
- `pipe.get_label_ids("golden retriever")`: resolve a label string to its class id(s)
- `pipe(class_labels="golden retriever", ...)`: string labels are resolved automatically
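
The lookups above can be illustrated with a small standalone sketch. It assumes the DiT-style convention in which each `id2label` value may list several comma-separated synonyms. The three-entry map and the helper names `build_label_map` and `get_label_ids` are illustrative stand-ins, not the pipeline's actual code.

```python
from typing import Dict, List, Union

# Tiny illustrative excerpt; the full map lives in model_index.json.
id2label: Dict[int, str] = {
    0: "tench, Tinca tinca",
    1: "goldfish, Carassius auratus",
    207: "golden retriever",
}


def build_label_map(id2label: Dict[int, str]) -> Dict[str, int]:
    """Map every comma-separated synonym to its class id."""
    labels: Dict[str, int] = {}
    for class_id, names in id2label.items():
        for name in names.split(","):
            # int() also covers JSON-style string keys such as "207".
            labels[name.strip()] = int(class_id)
    return dict(sorted(labels.items()))


def get_label_ids(labels: Dict[str, int], label: Union[str, List[str]]) -> List[int]:
    """Resolve one label or a list of labels to class ids."""
    names = [label] if isinstance(label, str) else list(label)
    missing = [name for name in names if name not in labels]
    if missing:
        raise ValueError(f"Unknown label(s): {missing}")
    return [labels[name] for name in names]


labels = build_label_map(id2label)
print(get_label_ids(labels, "golden retriever"))              # [207]
print(get_label_ids(labels, ["tench", "Carassius auratus"]))  # [0, 1]
```

Raising on an unknown label, rather than silently skipping it, keeps a typo from producing an image of the wrong class.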

## Demo

Class-to-image:

```bash
python demo_inference_c2i.py
```

Text-to-image:

```bash
python demo_inference_t2i.py
```

## Load from a local clone

### Class-to-image (`PixelFlow-256`)

```python
from pathlib import Path
import torch
from diffusers import DiffusionPipeline

model_dir = Path("./PixelFlow-256").resolve()
pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    local_files_only=True,
    custom_pipeline=str(model_dir / "pipeline.py"),
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

print(pipe.id2label[207])
print(pipe.get_label_ids("golden retriever"))

generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
    class_labels="golden retriever",
    height=256,
    width=256,
    num_inference_steps=[10, 10, 10, 10],
    guidance_scale=4.0,
    generator=generator,
).images[0]
image.save("demo.png")
```

### Text-to-image (`PixelFlow-T2I`)

The pipeline uses [`google/flan-t5-xl`](https://huggingface.co/google/flan-t5-xl) as the text encoder when `text_encoder/` is not bundled in the variant folder.

```python
from pathlib import Path
import torch
from diffusers import DiffusionPipeline

model_dir = Path("./PixelFlow-T2I").resolve()
pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    local_files_only=True,
    custom_pipeline=str(model_dir / "pipeline.py"),
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
    prompt="A golden retriever playing in a sunny garden",
    height=1024,
    width=1024,
    num_inference_steps=[10, 10, 10, 10],
    guidance_scale=4.0,
    generator=generator,
).images[0]
image.save("demo.png")
```

In both cases, load a **variant subfolder** (e.g. `./PixelFlow-256`), not the repo root.
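
Passing the repo root by mistake can be caught before `from_pretrained` is called. The following is a minimal sketch of such a guard, assuming that a variant folder is recognizable by the `model_index.json` and `pipeline.py` files shown in the repo layout. The helper name `resolve_variant_dir` is hypothetical and not part of this repo or of `diffusers`.

```python
from pathlib import Path

# Files that, per the repo layout, sit at the top of every variant folder.
REQUIRED = ("model_index.json", "pipeline.py")


def _is_variant(path: Path) -> bool:
    """True if the folder contains every required top-level file."""
    return all((path / name).is_file() for name in REQUIRED)


def resolve_variant_dir(path) -> Path:
    """Return the resolved variant folder, or raise with a helpful hint."""
    path = Path(path).resolve()
    if _is_variant(path):
        return path
    # If the caller passed the repo root, suggest the variant subfolders.
    candidates = []
    if path.is_dir():
        candidates = sorted(
            child.name
            for child in path.iterdir()
            if child.is_dir() and _is_variant(child)
        )
    hint = f" Did you mean one of: {', '.join(candidates)}?" if candidates else ""
    raise FileNotFoundError(f"{path} is not a variant folder.{hint}")
```

With this in place, `model_dir = resolve_variant_dir("./PixelFlow-256")` can stand in for the `Path(...).resolve()` line in the examples above.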

## Load from the Hub

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "BiliSakura/PixelFlow-diffusers",
    subfolder="PixelFlow-256",
    custom_pipeline="pipeline.py",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

image = pipe(class_labels="golden retriever", num_inference_steps=[10, 10, 10, 10]).images[0]
```

For text-to-image, set `subfolder="PixelFlow-T2I"` and call the pipeline with `prompt=...` instead of `class_labels=...`.

## Conversion

```bash
python scripts/convert_pixelflow_to_diffusers.py \
  --checkpoint models/raw/PixelFlow/c2i/model.pt \
  --config models/raw/PixelFlow/c2i/config.yaml \
  --output models/BiliSakura/PixelFlow-diffusers/PixelFlow-256

python scripts/convert_pixelflow_to_diffusers.py \
  --checkpoint models/raw/PixelFlow/t2i/model.pt \
  --config models/raw/PixelFlow/t2i/config.yaml \
  --output models/BiliSakura/PixelFlow-diffusers/PixelFlow-T2I \
  --skip-text-encoder
```
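
To run both conversions in one go, the commands above can be assembled from a small job table. This is only a sketch: it reuses the script path, directories, and flags from the commands above, while the helper names `build_command` and `convert_all` are made up for illustration. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys
from pathlib import Path

RAW = Path("models/raw/PixelFlow")
OUT = Path("models/BiliSakura/PixelFlow-diffusers")

# (raw subfolder, output variant, extra flags), taken from the commands above.
JOBS = [
    ("c2i", "PixelFlow-256", []),
    ("t2i", "PixelFlow-T2I", ["--skip-text-encoder"]),
]


def build_command(raw_name, variant, extra):
    """Assemble the argument list for one conversion run."""
    return [
        sys.executable,
        "scripts/convert_pixelflow_to_diffusers.py",
        "--checkpoint", (RAW / raw_name / "model.pt").as_posix(),
        "--config", (RAW / raw_name / "config.yaml").as_posix(),
        "--output", (OUT / variant).as_posix(),
        *extra,
    ]


def convert_all(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for raw_name, variant, extra in JOBS:
        cmd = build_command(raw_name, variant, extra)
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing conversion.
            subprocess.run(cmd, check=True)


convert_all(dry_run=True)
```

Call `convert_all(dry_run=False)` from the project root once the printed commands look right.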

## Citation

```bibtex
@article{chen2025pixelflow,
  title={PixelFlow: Pixel-Space Flow Matching for High-Resolution Image Synthesis},
  author={Chen, Shoufa and others},
  year={2025},
  eprint={2504.07963},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```