---
license: mit
library_name: diffusers
pipeline_tag: text-to-image
tags:
- diffusers
- image-generation
- class-conditional
- text-to-image
- imagenet
- pixelgen
- flow-matching
- pixel-space
- jit
widget:
- text: golden retriever
  output:
    url: PixelGen-XL-16-256/demo.png
language:
- en
---
# BiliSakura/PixelGen-diffusers
Self-contained PixelGen checkpoints for Hugging Face diffusers. Each variant folder ships its own pipeline code, component modules, and weights.
Converted from upstream PixelGen checkpoints using [PixelGen-diffusers](https://github.com/Bili-Sakura/Visual-Generative-Foundation-Model-Collection/tree/main/libs/PixelGen-diffusers) in [Visual-Generative-Foundation-Model-Collection](https://github.com/Bili-Sakura/Visual-Generative-Foundation-Model-Collection).
## Available checkpoints
| Subfolder | Pipeline | Task | Resolution | Model type |
| --- | --- | --- | ---: | --- |
| [`PixelGen-XL-16-256/`](PixelGen-XL-16-256/) | `PixelGenC2IPipeline` | class-to-image | 256×256 | PixelGen-XL/16 |
| [`PixelGen-XXL-16-512-t2i/`](PixelGen-XXL-16-512-t2i/) | `PixelGenT2IPipeline` | text-to-image | 512×512 | PixelGen-XXL/16-T2I |
## Repo layout
```text
BiliSakura/PixelGen-diffusers/
├── README.md
├── PixelGen-XL-16-256/
│   ├── pipeline.py
│   ├── model_index.json
│   ├── demo.png
│   ├── scheduler/
│   │   ├── scheduler_config.json
│   │   └── scheduling_pixelgen.py
│   └── transformer/
│       ├── config.json
│       └── transformer_jit.py
└── PixelGen-XXL-16-512-t2i/
    ├── pipeline.py
    ├── model_index.json
    ├── conversion_metadata.json
    ├── scheduler/
    │   ├── scheduler_config.json
    │   └── scheduling_pixelgen.py
    ├── text_encoder/
    ├── tokenizer/
    └── transformer/
        ├── config.json
        ├── diffusion_pytorch_model.safetensors
        └── transformer_jit_t2i.py
```
Each class-conditional variant is self-contained: load with `custom_pipeline=.../pipeline.py` and `trust_remote_code=True`. PixelGen denoises directly in pixel space (no VAE).
## ImageNet class labels
For [`PixelGen-XL-16-256/`](PixelGen-XL-16-256/), `id2label` is embedded in `model_index.json` (DiT-style).
- `pipe.id2label`: inspect the id → English label correspondence
- `pipe.labels`: reverse map (English synonym → id)
- `pipe.get_label_ids("golden retriever")`
- `pipe(class_labels="golden retriever", ...)`: string labels are resolved automatically
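
To make the label lookup concrete, here is a minimal sketch of how a DiT-style `id2label` table can be turned into a reverse map and queried by name. It is not the code shipped in `pipeline.py`: the three-entry `ID2LABEL` excerpt, the helper names `build_label_map` and `get_label_ids`, and the case-insensitive matching are all assumptions made for illustration.

```python
# Hypothetical excerpt of a DiT-style id2label table. The real table lives in
# model_index.json and covers every ImageNet class.
ID2LABEL = {
    1: "goldfish, Carassius auratus",
    207: "golden retriever",
    208: "Labrador retriever",
}


def build_label_map(id2label):
    """Return a reverse map from each lowercase synonym to its class id."""
    labels = {}
    for class_id, names in id2label.items():
        # DiT-style entries list synonyms separated by commas.
        for name in names.split(","):
            labels[name.strip().lower()] = int(class_id)
    return labels


def get_label_ids(labels, queries):
    """Resolve one string or a list of strings to a list of class ids."""
    if isinstance(queries, str):
        queries = [queries]
    keys = [query.strip().lower() for query in queries]
    missing = [key for key in keys if key not in labels]
    if missing:
        raise ValueError(f"Unknown label(s): {missing}")
    return [labels[key] for key in keys]


LABELS = build_label_map(ID2LABEL)
print(get_label_ids(LABELS, "golden retriever"))
```

Resolving names up front keeps the sampling call free of string handling; an unknown name fails early with a `ValueError` instead of silently mapping to a wrong class.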
## Demo
![PixelGen-XL-16-256 demo](PixelGen-XL-16-256/demo.png)
Class 207 (golden retriever), 256×256, 50 steps, `guidance_scale=2.25`, Heun solver, `timeshift=2.0`.
## Load from Hugging Face
### Class-to-image (`PixelGen-XL-16-256`)
```python
import torch
from diffusers import DiffusionPipeline

# trust_remote_code=True lets diffusers run the pipeline code shipped in the repo.
pipe = DiffusionPipeline.from_pretrained(
    "BiliSakura/PixelGen-diffusers/PixelGen-XL-16-256",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda")

# Inspect the label mapping in both directions.
print(pipe.id2label[207])
print(pipe.get_label_ids("golden retriever"))

# Seed the generator for reproducible samples.
generator = torch.Generator(device="cuda").manual_seed(0)
images = pipe(
    class_labels="golden retriever",
    num_inference_steps=50,
    guidance_scale=2.25,
    generator=generator,
).images
```
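
Hub repository ids have the form `namespace/name`, so some `diffusers` versions may reject a three-part path such as the one above. If that happens, one possible workaround is to download just the variant folder and load it from disk. The sketch below assumes `huggingface_hub` is installed; `split_repo_path` and `fetch_variant` are hypothetical helper names, not part of this repo or of `diffusers`.

```python
from pathlib import Path


def split_repo_path(path):
    """Split 'namespace/name/sub/folder' into ('namespace/name', 'sub/folder')."""
    parts = path.strip("/").split("/")
    if len(parts) < 2:
        raise ValueError(f"Expected at least 'namespace/name', got {path!r}")
    return "/".join(parts[:2]), "/".join(parts[2:])


def fetch_variant(path):
    """Download only one variant folder and return its local directory."""
    # Imported lazily so split_repo_path works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    repo_id, subfolder = split_repo_path(path)
    # Restrict the download to files under the variant folder, if one is given.
    patterns = [f"{subfolder}/*"] if subfolder else None
    snapshot = snapshot_download(repo_id=repo_id, allow_patterns=patterns)
    return Path(snapshot) / subfolder
```

The directory returned by `fetch_variant("BiliSakura/PixelGen-diffusers/PixelGen-XL-16-256")` can then be passed to `DiffusionPipeline.from_pretrained` in the same way as `model_dir` in the "Load from a local clone" section.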
### Text-to-image (`PixelGen-XXL-16-512-t2i`)
The pipeline uses the bundled Qwen3 text encoder when `text_encoder/` is present; otherwise it downloads the encoder from the path recorded in `conversion_metadata.json`.
```python
import torch
from diffusers import DiffusionPipeline

# Move the pipeline to the GPU so it matches the CUDA generator below.
pipe = DiffusionPipeline.from_pretrained(
    "BiliSakura/PixelGen-diffusers/PixelGen-XXL-16-512-t2i",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda")

generator = torch.Generator(device="cuda").manual_seed(42)
images = pipe(
    prompt="A golden retriever playing in a sunny garden",
    num_inference_steps=50,
    guidance_scale=4.0,
    generator=generator,
).images
```
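
The text-encoder fallback described above can be sketched as a small path-resolution helper. This is an illustration, not the logic in `pipeline.py`: the function name `resolve_text_encoder` and the metadata key `text_encoder_path` are hypothetical, so check `conversion_metadata.json` for the actual field name.

```python
import json
from pathlib import Path

# Hypothetical key name; the real field in conversion_metadata.json may differ.
METADATA_KEY = "text_encoder_path"


def resolve_text_encoder(variant_dir):
    """Prefer the bundled text_encoder/ folder, else fall back to the metadata."""
    variant_dir = Path(variant_dir)
    bundled = variant_dir / "text_encoder"
    # An empty folder does not count as a bundled encoder.
    if bundled.is_dir() and any(bundled.iterdir()):
        return str(bundled)

    metadata_file = variant_dir / "conversion_metadata.json"
    if not metadata_file.is_file():
        raise FileNotFoundError(
            f"Found neither text_encoder/ nor {metadata_file.name} in {variant_dir}"
        )
    metadata = json.loads(metadata_file.read_text(encoding="utf-8"))
    if METADATA_KEY not in metadata:
        raise KeyError(f"{METADATA_KEY!r} is missing from {metadata_file.name}")
    return metadata[METADATA_KEY]
```

Calling `resolve_text_encoder("./PixelGen-XXL-16-512-t2i")` returns either the local folder or the recorded source path, which makes it easy to see in advance whether loading will trigger a download.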
## Load from a local clone
### Class-to-image (`PixelGen-XL-16-256`)
```python
from pathlib import Path

import torch
from diffusers import DiffusionPipeline

model_dir = Path("./PixelGen-XL-16-256").resolve()
pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    local_files_only=True,
    custom_pipeline=str(model_dir / "pipeline.py"),
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda")

generator = torch.Generator(device="cuda").manual_seed(0)
image = pipe(
    class_labels="golden retriever",
    num_inference_steps=50,
    guidance_scale=2.25,
    generator=generator,
).images[0]
image.save("demo.png")
```
## Recommended inference settings
| Variant | Steps | CFG scale | Solver | Timeshift | CFG interval |
| --- | ---: | ---: | --- | ---: | --- |
| `PixelGen-XL-16-256` | 50 | 2.25 | heun | 2.0 | [0.1, 0.9] |
| `PixelGen-XXL-16-512-t2i` | 25 | 4.0 | adam_lm | 3.0 | [0.0, 1.0] |
`height` and `width` are fixed by each checkpoint's `sample_size`. Custom sizes are not supported for these exports.
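
For scripting, the table above can be kept as a small lookup. The sketch below copies the values from the table; only `num_inference_steps` and `guidance_scale` appear as call arguments in the examples in this README, so the remaining key names (`sampling_method`, `timeshift`, and especially `cfg_interval`) are assumptions about how the settings might be passed, and the helper names are hypothetical.

```python
# Values copied from the recommended-settings table.
RECOMMENDED = {
    "PixelGen-XL-16-256": {
        "num_inference_steps": 50,
        "guidance_scale": 2.25,
        "sampling_method": "heun",
        "timeshift": 2.0,
        "cfg_interval": (0.1, 0.9),  # hypothetical key name
    },
    "PixelGen-XXL-16-512-t2i": {
        "num_inference_steps": 25,
        "guidance_scale": 4.0,
        "sampling_method": "adam_lm",
        "timeshift": 3.0,
        "cfg_interval": (0.0, 1.0),  # hypothetical key name
    },
}

# Keyword arguments that the README examples actually pass to the pipeline.
CONFIRMED_CALL_KWARGS = ("num_inference_steps", "guidance_scale")


def recommended(variant, **overrides):
    """Return a copy of the recommended settings with optional overrides."""
    if variant not in RECOMMENDED:
        raise KeyError(f"Unknown variant: {variant!r}")
    settings = dict(RECOMMENDED[variant])
    settings.update(overrides)
    return settings


def call_kwargs(variant, **overrides):
    """Return only the settings known to be accepted as call arguments."""
    settings = recommended(variant, **overrides)
    return {name: settings[name] for name in CONFIRMED_CALL_KWARGS}
```

With a loaded pipeline, `pipe(class_labels="golden retriever", **call_kwargs("PixelGen-XL-16-256"))` would reproduce the step count and CFG scale from the table.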
## Interface notes
- Class-conditional generation uses `class_labels` (integer ImageNet id or English synonym).
- `guidance_scale > 1.0` enables classifier-free guidance over a null class token.
- `sampling_method` accepts `heun` or `euler` for C2I; T2I defaults to `adam_lm`.
- `noise_scale` defaults to `1.0` at 256×256 and `2.0` at 512×512 when not specified.
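
As an illustration of the guidance behaviour noted above, the following sketch shows the standard classifier-free guidance blend restricted to a time interval. It uses plain Python lists instead of tensors and is not taken from the pipeline code: the function name `cfg_combine`, the normalized time `t` in [0, 1], and the choice to fall back to the conditional prediction outside the interval are assumptions.

```python
def cfg_combine(cond, uncond, guidance_scale, t, interval=(0.0, 1.0)):
    """Blend conditional and unconditional predictions for one sampling step.

    cond, uncond: model outputs as equal-length lists of floats.
    t: normalized time in [0, 1]; which end corresponds to noise is an assumption.
    interval: (low, high) range of t in which guidance is applied.
    """
    low, high = interval
    # Guidance is off when the scale is at most 1.0 or t is outside the interval.
    if guidance_scale <= 1.0 or not (low <= t <= high):
        return list(cond)
    # Standard CFG: move from the unconditional prediction toward the
    # conditional one, scaled by guidance_scale.
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]


guided = cfg_combine([1.0, 2.0], [0.0, 1.0], guidance_scale=2.0, t=0.5)
print(guided)
```

Skipping the blend outside the interval also means the unconditional forward pass can be skipped for those steps, which is one reason a narrower CFG interval can reduce sampling cost.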
## Citation
Source paper:
- [PixelGen: Improving Pixel Diffusion with Perceptual Loss](https://arxiv.org/abs/2602.02493)
- [Hugging Face Papers page](https://huggingface.co/papers/2602.02493)
```bibtex
@article{ma2026pixelgen,
  title={PixelGen: Improving Pixel Diffusion with Perceptual Loss},
  author={Zehong Ma and Ruihan Xu and Shiliang Zhang},
  year={2026},
  eprint={2602.02493},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2602.02493},
}
```