BiliSakura/PixelFlow-diffusers

Self-contained PixelFlow checkpoints for Hugging Face diffusers. Each variant folder ships its own pipeline.py, component modules, and weights.

Available checkpoints

| Subfolder | Pipeline | Task | Resolution | Params |
| --- | --- | --- | --- | --- |
| PixelFlow-256/ | PixelFlowPipeline | class-to-image | 256×256 | 677M |
| PixelFlow-T2I/ | PixelFlowT2IPipeline | text-to-image | 1024×1024 | 882M |

Repo layout

BiliSakura/PixelFlow-diffusers/
├── README.md
├── PixelFlow-256/
│   ├── pipeline.py
│   ├── model_index.json
│   ├── scheduler/scheduler_config.json
│   └── transformer/
└── PixelFlow-T2I/
    ├── pipeline.py
    ├── model_index.json
    ├── scheduler/scheduler_config.json
    ├── text_encoder/
    ├── tokenizer/
    └── transformer/

Each variant is self-contained. The scheduler/ folder contains scheduler_config.json and scheduling_pixelflow.py, which defines PixelFlowScheduler.

Inference needs no shared helper modules: only diffusers from PyPI plus the local variant directory.
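
Before loading, it can help to confirm that a variant folder matches the layout shown in the tree. The following is a minimal sketch, not something this repo ships: the helper name missing_variant_files and the list of required entries are assumptions taken from the tree in this README.

```python
from pathlib import Path

# Entries each variant folder is expected to contain, per the tree in this README.
# Assumption: illustrative only. PixelFlow-T2I also has text_encoder/ and
# tokenizer/, which this sketch does not check.
REQUIRED_ENTRIES = (
    "pipeline.py",
    "model_index.json",
    "scheduler/scheduler_config.json",
    "transformer",
)


def missing_variant_files(variant_dir):
    """Return the required entries that are absent from variant_dir."""
    root = Path(variant_dir)
    return [name for name in REQUIRED_ENTRIES if not (root / name).exists()]
```

Calling missing_variant_files("./PixelFlow-256") returns an empty list when all four entries are present. Pointing it at the repo root instead would report every entry as missing, which is a quick way to catch a wrong path.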

ImageNet class labels

For class-conditional PixelFlow-256/, id2label is embedded in PixelFlow-256/model_index.json (DiT-style).

  • pipe.id2label: the id → English label mapping
  • pipe.labels: the reverse map (English synonym → id)
  • pipe.get_label_ids("golden retriever")
  • pipe(class_labels="golden retriever", ...): string labels are resolved automatically
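
Because id2label lives in model_index.json, the same lookups can be approximated without loading the pipeline. The sketch below is a hypothetical stand-in, not the code in pipeline.py. It assumes id2label is a JSON object with string keys and that each value lists comma-separated synonyms, as in DiT. The names load_id2label, build_label_index, and get_label_id are made up for illustration.

```python
import json
from pathlib import Path


def load_id2label(model_index_path):
    """Read id2label from model_index.json; JSON keys are strings, so cast to int."""
    config = json.loads(Path(model_index_path).read_text(encoding="utf-8"))
    return {int(key): value for key, value in config["id2label"].items()}


def build_label_index(id2label):
    """Map each lowercase synonym to a class id (the lowest id wins on duplicates)."""
    index = {}
    for class_id in sorted(id2label):
        # Assumption: synonyms within one label are separated by commas.
        for synonym in id2label[class_id].split(","):
            index.setdefault(synonym.strip().lower(), class_id)
    return index


def get_label_id(index, label):
    """Resolve one English label to a class id; raises KeyError if unknown."""
    return index[label.strip().lower()]
```

For example, build_label_index(load_id2label("PixelFlow-256/model_index.json")) gives a dictionary that get_label_id can query case-insensitively. The pipeline's own get_label_ids may behave differently, for instance by returning several ids.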

Demo

Class-to-image:

python demo_inference_c2i.py

Text-to-image:

python demo_inference_t2i.py

Load from a local clone

Class-to-image (PixelFlow-256)

from pathlib import Path
import torch
from diffusers import DiffusionPipeline

# Point at the variant folder, not the repo root.
model_dir = Path("./PixelFlow-256").resolve()
pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    local_files_only=True,  # load from disk only
    custom_pipeline=str(model_dir / "pipeline.py"),  # pipeline code bundled with the variant
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# Inspect the label mapping in both directions.
print(pipe.id2label[207])
print(pipe.get_label_ids("golden retriever"))

# Seed the generator so sampling is reproducible.
generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
    class_labels="golden retriever",
    height=256,
    width=256,
    num_inference_steps=[10, 10, 10, 10],  # list of step counts, presumably one per sampling stage
    guidance_scale=4.0,
    generator=generator,
).images[0]
image.save("demo.png")

Text-to-image (PixelFlow-T2I)

The pipeline uses google/flan-t5-xl when text_encoder/ is not bundled.

from pathlib import Path
import torch
from diffusers import DiffusionPipeline

# Same loading recipe as PixelFlow-256; only the folder and call arguments change.
model_dir = Path("./PixelFlow-T2I").resolve()
pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    local_files_only=True,
    custom_pipeline=str(model_dir / "pipeline.py"),
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# Seed the generator so sampling is reproducible.
generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
    prompt="A golden retriever playing in a sunny garden",
    height=1024,
    width=1024,
    num_inference_steps=[10, 10, 10, 10],
    guidance_scale=4.0,
    generator=generator,
).images[0]
image.save("demo.png")

Load a variant subfolder (e.g. ./PixelFlow-256), not the repo root.

Load from the Hub

import torch
from diffusers import DiffusionPipeline

# subfolder selects the variant inside the Hub repo.
pipe = DiffusionPipeline.from_pretrained(
    "BiliSakura/PixelFlow-diffusers",
    subfolder="PixelFlow-256",
    custom_pipeline="pipeline.py",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

image = pipe(class_labels="golden retriever", num_inference_steps=[10, 10, 10, 10]).images[0]

For text-to-image, swap in subfolder="PixelFlow-T2I" and call the pipeline with prompt=... .
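
To fetch only one variant rather than the whole repo, one option is to download that folder first and then follow the local-clone recipe. The sketch below relies on snapshot_download and its allow_patterns argument from huggingface_hub. The helper names variant_patterns and download_variant are hypothetical, and the approach has not been verified against this repo.

```python
from pathlib import Path


def variant_patterns(variant):
    """Glob patterns that restrict a snapshot download to one variant folder."""
    return [f"{variant}/*"]


def download_variant(variant, repo_id="BiliSakura/PixelFlow-diffusers"):
    """Download a single variant folder and return its local path."""
    # Imported lazily so variant_patterns works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    snapshot_dir = snapshot_download(
        repo_id=repo_id,
        allow_patterns=variant_patterns(variant),
    )
    return Path(snapshot_dir) / variant
```

The path returned by download_variant("PixelFlow-256") can then be passed to DiffusionPipeline.from_pretrained as model_dir, with custom_pipeline pointing at the pipeline.py inside it.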

Conversion

python scripts/convert_pixelflow_to_diffusers.py \
  --checkpoint models/raw/PixelFlow/c2i/model.pt \
  --config models/raw/PixelFlow/c2i/config.yaml \
  --output models/BiliSakura/PixelFlow-diffusers/PixelFlow-256

python scripts/convert_pixelflow_to_diffusers.py \
  --checkpoint models/raw/PixelFlow/t2i/model.pt \
  --config models/raw/PixelFlow/t2i/config.yaml \
  --output models/BiliSakura/PixelFlow-diffusers/PixelFlow-T2I \
  --skip-text-encoder
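
The two commands above can also be driven from one Python script, which is handy when both variants are reconverted together. This is a sketch only: the paths and flags are copied from the commands above, while the names JOBS, build_command, and convert_all are hypothetical.

```python
import subprocess
import sys

SCRIPT = "scripts/convert_pixelflow_to_diffusers.py"
RAW_ROOT = "models/raw/PixelFlow"
OUT_ROOT = "models/BiliSakura/PixelFlow-diffusers"

# (raw subfolder, output variant, extra flags), mirroring the two shell commands.
JOBS = [
    ("c2i", "PixelFlow-256", []),
    ("t2i", "PixelFlow-T2I", ["--skip-text-encoder"]),
]


def build_command(raw_name, variant, extra_flags):
    """Assemble the argument list for one conversion run."""
    return [
        sys.executable, SCRIPT,
        "--checkpoint", f"{RAW_ROOT}/{raw_name}/model.pt",
        "--config", f"{RAW_ROOT}/{raw_name}/config.yaml",
        "--output", f"{OUT_ROOT}/{variant}",
        *extra_flags,
    ]


def convert_all():
    """Run every conversion in sequence, stopping at the first failure."""
    for job in JOBS:
        subprocess.run(build_command(*job), check=True)
```

Run convert_all() from the directory that contains scripts/ so the relative paths resolve.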

Citation

@article{chen2025pixelflow,
  title={PixelFlow: Pixel-Space Flow Matching for High-Resolution Image Synthesis},
  author={Chen, Shoufa and others},
  year={2025},
  eprint={2504.07963},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}