	---
	license: mit
	library_name: diffusers
	pipeline_tag: text-to-image
	tags:
	- diffusers
	- image-generation
	- class-conditional
	- text-to-image
	- imagenet
	- pixelgen
	- flow-matching
	- pixel-space
	- jit
	widget:
	- text: golden retriever
	  output:
	    url: PixelGen-XL-16-256/demo.png
	language:
	- en
	---

	# BiliSakura/PixelGen-diffusers

	Self-contained PixelGen checkpoints for Hugging Face diffusers. Each variant folder ships its own pipeline code, component modules, and weights.

	Converted from upstream PixelGen checkpoints using [PixelGen-diffusers](https://github.com/Bili-Sakura/Visual-Generative-Foundation-Model-Collection/tree/main/libs/PixelGen-diffusers) in [Visual-Generative-Foundation-Model-Collection](https://github.com/Bili-Sakura/Visual-Generative-Foundation-Model-Collection).

	## Available checkpoints

	| Subfolder | Pipeline | Task | Resolution | Model type |
	| --- | --- | --- | ---: | --- |
	| [`PixelGen-XL-16-256/`](PixelGen-XL-16-256/) | `PixelGenC2IPipeline` | class-to-image | 256×256 | PixelGen-XL/16 |
	| [`PixelGen-XXL-16-512-t2i/`](PixelGen-XXL-16-512-t2i/) | `PixelGenT2IPipeline` | text-to-image | 512×512 | PixelGen-XXL/16-T2I |

	## Repo layout

	```text
	BiliSakura/PixelGen-diffusers/
	├── README.md
	├── PixelGen-XL-16-256/
	│   ├── pipeline.py
	│   ├── model_index.json
	│   ├── demo.png
	│   ├── scheduler/
	│   │   ├── scheduler_config.json
	│   │   └── scheduling_pixelgen.py
	│   └── transformer/
	│       ├── config.json
	│       └── transformer_jit.py
	└── PixelGen-XXL-16-512-t2i/
	    ├── pipeline.py
	    ├── model_index.json
	    ├── conversion_metadata.json
	    ├── scheduler/
	    │   ├── scheduler_config.json
	    │   └── scheduling_pixelgen.py
	    ├── text_encoder/
	    ├── tokenizer/
	    └── transformer/
	        ├── config.json
	        ├── diffusion_pytorch_model.safetensors
	        └── transformer_jit_t2i.py
	```

	Each variant is self-contained: load it with `custom_pipeline=.../pipeline.py` and `trust_remote_code=True`. PixelGen denoises directly in pixel space, so no VAE is involved.
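To fetch a single variant folder rather than the whole repository, a filtered snapshot download is one option. The sketch below is an assumption about workflow, not something this repository documents: `variant_patterns` and `download_variant` are hypothetical helper names, while `snapshot_download` and its `allow_patterns` argument come from `huggingface_hub`.

```python
from pathlib import Path
from typing import List


def variant_patterns(variant: str) -> List[str]:
    """Glob patterns that select every file under one variant folder."""
    return [f"{variant}/*"]


def download_variant(
    variant: str, repo_id: str = "BiliSakura/PixelGen-diffusers"
) -> Path:
    """Download one variant folder and return its local path (needs network access)."""
    # Imported lazily so variant_patterns works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    root = snapshot_download(
        repo_id=repo_id,
        allow_patterns=variant_patterns(variant),
    )
    # The snapshot keeps the repo layout, so the variant sits in a subfolder.
    return Path(root) / variant
```

The returned path can then be passed to `DiffusionPipeline.from_pretrained` in the same way as a local clone, for example `model_dir = download_variant("PixelGen-XL-16-256")`.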

	## ImageNet class labels

	For [`PixelGen-XL-16-256/`](PixelGen-XL-16-256/), `id2label` is embedded in `model_index.json` (DiT-style).

	- `pipe.id2label` — inspect id → English label correspondence
	- `pipe.labels` — reverse map (English synonym → id)
	- `pipe.get_label_ids("golden retriever")`
	- `pipe(class_labels="golden retriever", ...)` — string labels resolved automatically
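The lookup behind these helpers can be pictured with a small standalone sketch. It assumes the DiT-style convention in which each `id2label` value is a comma-separated list of synonyms. The functions `build_label_map` and `get_label_ids` below are illustrative stand-ins, not the code shipped in `pipeline.py`, and the two-entry `id2label` is only an excerpt of the ImageNet-1k mapping.

```python
from typing import Dict, List, Union


def build_label_map(id2label: Dict[int, str]) -> Dict[str, int]:
    """Map every comma-separated synonym to its class id."""
    labels: Dict[str, int] = {}
    for class_id, names in id2label.items():
        for name in names.split(","):
            labels[name.strip()] = int(class_id)
    # Sorting makes the reverse map easier to browse.
    return dict(sorted(labels.items()))


def get_label_ids(labels: Dict[str, int], query: Union[str, List[str]]) -> List[int]:
    """Resolve one label or a list of labels to class ids; unknown names raise."""
    names = [query] if isinstance(query, str) else list(query)
    for name in names:
        if name not in labels:
            raise ValueError(f"{name!r} is not a known label")
    return [labels[name] for name in names]


# Tiny excerpt of the ImageNet-1k mapping, for illustration only.
id2label = {1: "goldfish, Carassius auratus", 207: "golden retriever"}
labels = build_label_map(id2label)

print(get_label_ids(labels, "golden retriever"))  # [207]
print(get_label_ids(labels, ["goldfish", "Carassius auratus"]))  # [1, 1]
```

Raising on unknown names, rather than silently falling back to a default class, is a design choice of this sketch; the actual pipeline may behave differently.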

	## Demo

	![PixelGen-XL-16-256 demo](PixelGen-XL-16-256/demo.png)

	Class 207 — golden retriever, 256×256, 50 steps, `guidance_scale=2.25`, Heun solver, `timeshift=2.0`.

	## Load from Hugging Face

	### Class-to-image (`PixelGen-XL-16-256`)

	```python
	import torch
	from diffusers import DiffusionPipeline

	# trust_remote_code=True allows diffusers to run the pipeline code shipped with the checkpoint.
	pipe = DiffusionPipeline.from_pretrained(
	    "BiliSakura/PixelGen-diffusers/PixelGen-XL-16-256",
	    trust_remote_code=True,
	    torch_dtype=torch.bfloat16,
	).to("cuda")

	# Inspect the label mapping in both directions.
	print(pipe.id2label[207])
	print(pipe.get_label_ids("golden retriever"))

	# A seeded generator makes sampling reproducible.
	generator = torch.Generator(device="cuda").manual_seed(0)
	images = pipe(
	    class_labels="golden retriever",
	    num_inference_steps=50,
	    guidance_scale=2.25,
	    generator=generator,
	).images
	```

	### Text-to-image (`PixelGen-XXL-16-512-t2i`)

	The pipeline uses the bundled Qwen3 text encoder when `text_encoder/` is present; otherwise it downloads the encoder from the path recorded in `conversion_metadata.json`.

	```python
	import torch
	from diffusers import DiffusionPipeline

	# Move the pipeline to the GPU so it matches the device of the generator below.
	pipe = DiffusionPipeline.from_pretrained(
	    "BiliSakura/PixelGen-diffusers/PixelGen-XXL-16-512-t2i",
	    trust_remote_code=True,
	    torch_dtype=torch.bfloat16,
	).to("cuda")

	generator = torch.Generator(device="cuda").manual_seed(42)
	images = pipe(
	    prompt="A golden retriever playing in a sunny garden",
	    num_inference_steps=50,
	    guidance_scale=4.0,
	    generator=generator,
	).images
	```

	## Load from a local clone

	### Class-to-image (`PixelGen-XL-16-256`)

	```python
	from pathlib import Path

	import torch
	from diffusers import DiffusionPipeline

	# Resolve to an absolute path so the custom pipeline file is found regardless of the working directory.
	model_dir = Path("./PixelGen-XL-16-256").resolve()
	pipe = DiffusionPipeline.from_pretrained(
	    str(model_dir),
	    local_files_only=True,
	    custom_pipeline=str(model_dir / "pipeline.py"),
	    trust_remote_code=True,
	    torch_dtype=torch.bfloat16,
	).to("cuda")

	generator = torch.Generator(device="cuda").manual_seed(0)
	image = pipe(
	    class_labels="golden retriever",
	    num_inference_steps=50,
	    guidance_scale=2.25,
	    generator=generator,
	).images[0]
	image.save("demo.png")
	```

	## Recommended inference settings

	| Variant | Steps | CFG scale | Solver | Timeshift | CFG interval |
	| --- | ---: | ---: | --- | ---: | --- |
	| `PixelGen-XL-16-256` | 50 | 2.25 | heun | 2.0 | [0.1, 0.9] |
	| `PixelGen-XXL-16-512-t2i` | 25 | 4.0 | adam_lm | 3.0 | [0.0, 1.0] |

	`height` and `width` are fixed by each checkpoint's `sample_size`. Custom sizes are not supported for these exports.
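For intuition about the Solver column, the sketch below contrasts a generic Euler step with a generic Heun (second-order) step on a toy ODE. It is textbook numerical integration under stated assumptions, not the scheduler in `scheduling_pixelgen.py`: the names `euler_step`, `heun_step`, `integrate`, and `toy_velocity` are hypothetical, and the scalar state stands in for an image tensor.

```python
import math
from typing import Callable

Velocity = Callable[[float, float], float]


def euler_step(v: Velocity, x: float, t: float, dt: float) -> float:
    """First-order update: follow the velocity at the current point."""
    return x + dt * v(x, t)


def heun_step(v: Velocity, x: float, t: float, dt: float) -> float:
    """Second-order update: average the velocity at the start and at an Euler prediction."""
    k1 = v(x, t)
    x_pred = x + dt * k1
    k2 = v(x_pred, t + dt)
    return x + 0.5 * dt * (k1 + k2)


def integrate(step, v: Velocity, x0: float, num_steps: int) -> float:
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with a fixed step size."""
    x, dt = x0, 1.0 / num_steps
    for i in range(num_steps):
        x = step(v, x, i * dt, dt)
    return x


def toy_velocity(x: float, t: float) -> float:
    """Toy ODE dx/dt = x, whose exact solution at t=1 is e for x(0)=1."""
    return x


euler_error = abs(integrate(euler_step, toy_velocity, 1.0, 50) - math.e)
heun_error = abs(integrate(heun_step, toy_velocity, 1.0, 50) - math.e)
print(f"euler error: {euler_error:.2e}, heun error: {heun_error:.2e}")
```

At the same step count Heun is more accurate on this toy problem, at the cost of two velocity evaluations per step instead of one.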

	## Interface notes

	- Class-conditional generation uses `class_labels` (integer ImageNet id or English synonym).
	- `guidance_scale > 1.0` enables classifier-free guidance over a null class token.
	- `sampling_method` accepts `heun` or `euler` for C2I; T2I defaults to `adam_lm`.
	- `noise_scale` defaults to `1.0` at 256×256 and `2.0` at 512×512 when not specified.
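The guidance behaviour described above can be sketched in a few lines. This is the standard classifier-free guidance combination, with guidance applied only inside a time interval. The function name `guided_prediction`, the assumption that `t` lies in [0, 1], and the inclusive interval bounds are illustrative choices, not details confirmed by the pipeline source.

```python
from typing import List, Sequence, Tuple


def guided_prediction(
    cond: Sequence[float],
    uncond: Sequence[float],
    t: float,
    guidance_scale: float,
    interval: Tuple[float, float] = (0.0, 1.0),
) -> List[float]:
    """Blend conditional and null-class predictions when guidance is active."""
    low, high = interval
    # Guidance is off when the scale is at most 1.0 or t falls outside the interval.
    if guidance_scale <= 1.0 or not (low <= t <= high):
        return list(cond)
    # Extrapolate from the unconditional prediction towards the conditional one.
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]


cond, uncond = [1.0, 2.0], [0.0, 1.0]
print(guided_prediction(cond, uncond, t=0.5, guidance_scale=2.25, interval=(0.1, 0.9)))
print(guided_prediction(cond, uncond, t=0.95, guidance_scale=2.25, interval=(0.1, 0.9)))
```

With the C2I settings from the table (`guidance_scale=2.25`, interval `[0.1, 0.9]`), the first call applies guidance and the second returns the conditional prediction unchanged.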

	## Citation

	Source paper:

	- [PixelGen: Improving Pixel Diffusion with Perceptual Loss](https://arxiv.org/abs/2602.02493)
	- [Hugging Face Papers page](https://huggingface.co/papers/2602.02493)

	```bibtex
	@article{ma2026pixelgen,
	  title={PixelGen: Improving Pixel Diffusion with Perceptual Loss},
	  author={Zehong Ma and Ruihan Xu and Shiliang Zhang},
	  year={2026},
	  eprint={2602.02493},
	  archivePrefix={arXiv},
	  primaryClass={cs.CV},
	  url={https://arxiv.org/abs/2602.02493},
	}
	```