| license: mit | |
| library_name: diffusers | |
| pipeline_tag: text-to-image | |
| tags: | |
| - diffusers | |
| - image-generation | |
| - class-conditional | |
| - text-to-image | |
| - imagenet | |
| - pixelgen | |
| - flow-matching | |
| - pixel-space | |
| - jit | |
| widget: | |
| - text: golden retriever | |
| output: | |
| url: PixelGen-XL-16-256/demo.png | |
| language: | |
| - en | |
| # BiliSakura/PixelGen-diffusers | |
| Self-contained PixelGen checkpoints for Hugging Face diffusers. Each variant folder ships its own pipeline code, component modules, and weights. | |
| Converted from upstream PixelGen checkpoints using [PixelGen-diffusers](https://github.com/Bili-Sakura/Visual-Generative-Foundation-Model-Collection/tree/main/libs/PixelGen-diffusers) in [Visual-Generative-Foundation-Model-Collection](https://github.com/Bili-Sakura/Visual-Generative-Foundation-Model-Collection). | |
| ## Available checkpoints | |
| | Subfolder | Pipeline | Task | Resolution | Model type | | |
| | --- | --- | --- | ---: | --- | | |
| | [`PixelGen-XL-16-256/`](PixelGen-XL-16-256/) | `PixelGenC2IPipeline` | class-to-image | 256×256 | PixelGen-XL/16 | |
| | [`PixelGen-XXL-16-512-t2i/`](PixelGen-XXL-16-512-t2i/) | `PixelGenT2IPipeline` | text-to-image | 512×512 | PixelGen-XXL/16-T2I | |
| ## Repo layout | |
| ```text | |
| BiliSakura/PixelGen-diffusers/ | |
| ├── README.md | |
| ├── PixelGen-XL-16-256/ | |
| │   ├── pipeline.py | |
| │   ├── model_index.json | |
| │   ├── demo.png | |
| │   ├── scheduler/ | |
| │   │   ├── scheduler_config.json | |
| │   │   └── scheduling_pixelgen.py | |
| │   └── transformer/ | |
| │       ├── config.json | |
| │       └── transformer_jit.py | |
| └── PixelGen-XXL-16-512-t2i/ | |
|     ├── pipeline.py | |
|     ├── model_index.json | |
|     ├── conversion_metadata.json | |
|     ├── scheduler/ | |
|     │   ├── scheduler_config.json | |
|     │   └── scheduling_pixelgen.py | |
|     ├── text_encoder/ | |
|     ├── tokenizer/ | |
|     └── transformer/ | |
|         ├── config.json | |
|         ├── diffusion_pytorch_model.safetensors | |
|         └── transformer_jit_t2i.py | |
| ``` | |
| Each variant folder is self-contained: load it with `custom_pipeline=.../pipeline.py` and `trust_remote_code=True`. PixelGen denoises directly in pixel space, so no VAE is involved. | |
| ## ImageNet class labels | |
| For [`PixelGen-XL-16-256/`](PixelGen-XL-16-256/), `id2label` is embedded in `model_index.json` (DiT-style). | |
| - `pipe.id2label`: inspect the id → English label correspondence | |
| - `pipe.labels`: reverse map (English synonym → id) | |
| - `pipe.get_label_ids("golden retriever")`: look up the id for a label | |
| - `pipe(class_labels="golden retriever", ...)`: string labels are resolved automatically | |
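The label lookup described above can be pictured with a small stand-in. This sketch is not the pipeline's code: `build_label_map` and `resolve_label` are hypothetical helpers, `ID2LABEL` is a three-entry toy excerpt, and it assumes DiT-style entries in which synonyms are comma-separated.

```python
from typing import Dict, Union

# Toy excerpt of an ImageNet-style id -> label table (synonyms are comma-separated).
ID2LABEL: Dict[int, str] = {
    0: "tench, Tinca tinca",
    1: "goldfish, Carassius auratus",
    207: "golden retriever",
}


def build_label_map(id2label: Dict[int, str]) -> Dict[str, int]:
    """Build the reverse map: every lower-cased synonym points at its class id."""
    label_map: Dict[str, int] = {}
    for class_id, names in id2label.items():
        for name in names.split(","):
            label_map[name.strip().lower()] = class_id
    return label_map


def resolve_label(label: Union[int, str], label_map: Dict[str, int]) -> int:
    """Accept either an integer id or an English synonym and return the id."""
    if isinstance(label, int):
        return label
    key = label.strip().lower()
    if key not in label_map:
        raise ValueError(f"Unknown class label: {label!r}")
    return label_map[key]


# Hypothetical module-level reverse map built once from the toy table.
LABEL_MAP = build_label_map(ID2LABEL)
```

With this stand-in, `resolve_label("golden retriever", LABEL_MAP)` and `resolve_label(207, LABEL_MAP)` both give the same id, which mirrors how `class_labels` is described as accepting either form.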
| ## Demo | |
|  | |
| Class 207 (golden retriever), 256×256, 50 steps, `guidance_scale=2.25`, Heun solver, `timeshift=2.0`. | |
| ## Load from Hugging Face | |
| ### Class-to-image (`PixelGen-XL-16-256`) | |
| ```python | |
| import torch | |
| from diffusers import DiffusionPipeline | |
| pipe = DiffusionPipeline.from_pretrained( | |
| "BiliSakura/PixelGen-diffusers/PixelGen-XL-16-256", | |
| trust_remote_code=True, | |
| torch_dtype=torch.bfloat16, | |
| ).to("cuda") | |
| print(pipe.id2label[207]) | |
| print(pipe.get_label_ids("golden retriever")) | |
| generator = torch.Generator(device="cuda").manual_seed(0) | |
| images = pipe( | |
| class_labels="golden retriever", | |
| num_inference_steps=50, | |
| guidance_scale=2.25, | |
| generator=generator, | |
| ).images | |
| ``` | |
| ### Text-to-image (`PixelGen-XXL-16-512-t2i`) | |
| The pipeline uses the bundled Qwen3 text encoder when `text_encoder/` is present; otherwise it downloads the encoder from the path recorded in `conversion_metadata.json`. | |
| ```python | |
| import torch | |
| from diffusers import DiffusionPipeline | |
| pipe = DiffusionPipeline.from_pretrained( | |
| "BiliSakura/PixelGen-diffusers/PixelGen-XXL-16-512-t2i", | |
| trust_remote_code=True, | |
| torch_dtype=torch.bfloat16, | |
| ).to("cuda")  # keep the pipeline on the same device as the CUDA generator below | |
| generator = torch.Generator(device="cuda").manual_seed(42) | |
| images = pipe( | |
| prompt="A golden retriever playing in a sunny garden", | |
| num_inference_steps=50, | |
| guidance_scale=4.0, | |
| generator=generator, | |
| ).images | |
| ``` | |
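Some diffusers releases reject a `from_pretrained` identifier that contains more than one `/` unless it is an existing local directory. If the calls above fail for that reason, one possible workaround is to fetch only the variant folder with `huggingface_hub.snapshot_download` and then load it the same way as a local clone. This is a sketch under that assumption; `variant_patterns`, `download_variant`, and `load_variant` are hypothetical helper names, not part of this repo.

```python
from pathlib import Path
from typing import List

REPO_ID = "BiliSakura/PixelGen-diffusers"


def variant_patterns(variant: str) -> List[str]:
    """Glob patterns that select one variant subfolder of the repo."""
    return [f"{variant}/*"]


def download_variant(variant: str, repo_id: str = REPO_ID) -> Path:
    """Download only the files of one variant and return its local folder."""
    # Imported lazily so variant_patterns works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    root = snapshot_download(repo_id=repo_id, allow_patterns=variant_patterns(variant))
    return Path(root) / variant


def load_variant(variant: str, device: str = "cuda"):
    """Load a downloaded variant with its bundled pipeline.py."""
    # Heavy dependencies are imported only when a pipeline is actually loaded.
    import torch
    from diffusers import DiffusionPipeline

    model_dir = download_variant(variant)
    pipe = DiffusionPipeline.from_pretrained(
        str(model_dir),
        custom_pipeline=str(model_dir / "pipeline.py"),
        trust_remote_code=True,
        torch_dtype=torch.bfloat16,
    )
    return pipe.to(device)
```

A call such as `pipe = load_variant("PixelGen-XXL-16-512-t2i")` would then stand in for the `from_pretrained` call shown above.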
| ## Load from a local clone | |
| ### Class-to-image (`PixelGen-XL-16-256`) | |
| ```python | |
| from pathlib import Path | |
| import torch | |
| from diffusers import DiffusionPipeline | |
| model_dir = Path("./PixelGen-XL-16-256").resolve() | |
| pipe = DiffusionPipeline.from_pretrained( | |
| str(model_dir), | |
| local_files_only=True, | |
| custom_pipeline=str(model_dir / "pipeline.py"), | |
| trust_remote_code=True, | |
| torch_dtype=torch.bfloat16, | |
| ).to("cuda") | |
| generator = torch.Generator(device="cuda").manual_seed(0) | |
| image = pipe( | |
| class_labels="golden retriever", | |
| num_inference_steps=50, | |
| guidance_scale=2.25, | |
| generator=generator, | |
| ).images[0] | |
| image.save("demo.png") | |
| ``` | |
| ## Recommended inference settings | |
| | Variant | Steps | CFG scale | Solver | Timeshift | CFG interval | | |
| | --- | ---: | ---: | --- | ---: | --- | | |
| | `PixelGen-XL-16-256` | 50 | 2.25 | heun | 2.0 | [0.1, 0.9] | | |
| | `PixelGen-XXL-16-512-t2i` | 25 | 4.0 | adam_lm | 3.0 | [0.0, 1.0] | | |
| `height` and `width` are fixed by each checkpoint's `sample_size`. Custom sizes are not supported for these exports. | |
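The Solver column names ODE integrators. As a generic illustration of the difference between `euler` and `heun`, the example below integrates a toy velocity field `v(x, t) = -x`. It is a textbook sketch, not the code in `scheduling_pixelgen.py`; the function names, the fixed step size, and the time convention (t from 0 to 1) are assumptions made for the example.

```python
import math
from typing import Callable

Velocity = Callable[[float, float], float]


def euler_step(v: Velocity, x: float, t: float, dt: float) -> float:
    """First-order step: follow the velocity at the current point."""
    return x + dt * v(x, t)


def heun_step(v: Velocity, x: float, t: float, dt: float) -> float:
    """Second-order step: average the velocity at the start and at the Euler prediction."""
    v_start = v(x, t)
    x_pred = x + dt * v_start
    v_end = v(x_pred, t + dt)
    return x + dt * 0.5 * (v_start + v_end)


def integrate(step, v: Velocity, x0: float, num_steps: int) -> float:
    """Integrate from t=0 to t=1 with a fixed step size."""
    dt = 1.0 / num_steps
    x, t = x0, 0.0
    for _ in range(num_steps):
        x = step(v, x, t, dt)
        t += dt
    return x


def decay(x: float, t: float) -> float:
    """Toy velocity field dx/dt = -x, whose exact solution is x0 * exp(-t)."""
    return -x


exact = math.exp(-1.0)
euler_error = abs(integrate(euler_step, decay, 1.0, 50) - exact)
heun_error = abs(integrate(heun_step, decay, 1.0, 50) - exact)
```

On this toy problem the Heun error is smaller than the Euler error at the same step count. The trade-off is that a Heun step evaluates the velocity twice, so with a neural network in place of `decay` it would cost roughly two model calls per step.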
| ## Interface notes | |
| - Class-conditional generation uses `class_labels` (integer ImageNet id or English synonym). | |
| - `guidance_scale > 1.0` enables classifier-free guidance over a null class token. | |
| - `sampling_method` accepts `heun` or `euler` for C2I; T2I defaults to `adam_lm`. | |
| - `noise_scale` defaults to `1.0` at 256×256 and `2.0` at 512×512 when not specified. | |
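For readers unfamiliar with classifier-free guidance, the standard combination rule is sketched below, together with one plausible reading of the "CFG interval" column in the settings table. Both functions are illustrative assumptions: `cfg_combine` and `scale_at` are hypothetical names, and whether the pipeline gates guidance by time in exactly this way is not stated in this card.

```python
from typing import Tuple


def cfg_combine(cond, uncond, guidance_scale: float):
    """Standard classifier-free guidance: move the prediction away from the
    unconditional (null class) output. Works on floats or tensors."""
    return uncond + guidance_scale * (cond - uncond)


def scale_at(t: float, guidance_scale: float, interval: Tuple[float, float]) -> float:
    """Assumed interval gating: apply guidance only for t inside [low, high].
    Outside the interval the scale falls back to 1.0, which makes
    cfg_combine return the conditional output unchanged."""
    low, high = interval
    return guidance_scale if low <= t <= high else 1.0
```

With `guidance_scale=1.0` the rule returns the conditional prediction, which is consistent with the note that guidance is only enabled for values above `1.0`.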
| ## Citation | |
| Source paper: | |
| - [PixelGen: Improving Pixel Diffusion with Perceptual Loss](https://arxiv.org/abs/2602.02493) | |
| - [Hugging Face Papers page](https://huggingface.co/papers/2602.02493) | |
| ```bibtex | |
| @article{ma2026pixelgen, | |
| title={PixelGen: Improving Pixel Diffusion with Perceptual Loss}, | |
| author={Zehong Ma and Ruihan Xu and Shiliang Zhang}, | |
| year={2026}, | |
| eprint={2602.02493}, | |
| archivePrefix={arXiv}, | |
| primaryClass={cs.CV}, | |
| url={https://arxiv.org/abs/2602.02493}, | |
| } | |
| ``` | |