PRXPixel (text-to-image, pixel space)

PRXPixel is a pixel-space variant of PRX. It denoises raw RGB directly (no VAE), conditions on a Qwen3-VL text encoder (rather than T5Gemma), and feeds the generation resolution into the timestep modulation. The denoiser is a ~7B-parameter PRXTransformer2DModel with a bottleneck patch projection and a resolution embedder.
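
To make "denoises raw RGB directly" concrete, here is a toy sketch of flow-matching Euler sampling in which the sampler state is the pixel array itself, so there is no latent decode at the end. It assumes the update rule used by diffusers' FlowMatchEulerDiscreteScheduler, where each step computes sample + (sigma_next - sigma) * model_output. The function names, the three-value "image", and the oracle velocity (noise - target) are hypothetical stand-ins for the real transformer, not PRXPixel's actual code.

```python
import random


def euler_flow_sample(velocity_fn, noise, num_steps):
    """Integrate from sigma=1 (pure noise) to sigma=0 (image) with Euler steps."""
    # Evenly spaced sigmas from 1.0 down to 0.0 (an assumption; real
    # schedulers may shift or warp this schedule).
    sigmas = [1.0 - i / num_steps for i in range(num_steps + 1)]
    x = list(noise)
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        v = velocity_fn(x, sigma)
        dt = sigma_next - sigma  # negative: we move toward sigma=0
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical toy "image": three pixel values in [-1, 1].
target = [0.8, -0.2, 0.5]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in target]


def oracle_velocity(x, sigma):
    # Stand-in for the denoiser: the exact velocity of the straight
    # noise-to-image path, i.e. (noise - image), independent of sigma.
    return [n - t for n, t in zip(noise, target)]


# The result is already in pixel space; no VAE decode step follows.
pixels = euler_flow_sample(oracle_velocity, noise, num_steps=28)
```

With a perfect velocity the loop lands on the target pixels regardless of step count; a learned denoiser only approximates that velocity, which is why the number of steps matters in practice.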

Resolution: 1024
Transformer: ~7B params, torch.bfloat16
Text encoder: Qwen3-VL text tower (Qwen3VLTextModel)
VAE: none (pixel space)
Scheduler: FlowMatchEulerDiscreteScheduler
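
As a rough guide to hardware needs, the weight footprint follows from the parameter count and dtype listed above, since bfloat16 stores each parameter in 2 bytes. The sketch below works that out for the stated ~7B figure. It covers the transformer weights only and ignores the text encoder, activations, and framework overhead, so treat the result as a lower bound rather than a measured number.

```python
BYTES_PER_PARAM_BF16 = 2  # bfloat16 is a 16-bit format


def weight_footprint_gib(num_params, bytes_per_param=BYTES_PER_PARAM_BF16):
    """Return the size of the raw weights in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


# ~7B parameters, as stated for the transformer (approximate figure).
transformer_gib = weight_footprint_gib(7e9)
print(f"Transformer weights: about {transformer_gib:.1f} GiB")
```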

Requirements

PRXPixelPipeline is not yet in a released version of diffusers. Install diffusers from the branch that adds it, and use transformers >= 4.57 (the version that introduced Qwen3VLTextModel):
pip install "transformers>=4.57"
pip install "git+https://github.com/huggingface/diffusers.git@prx-pixel-pipeline"
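
After installing, a quick check can confirm that the environment matches these requirements. This is a minimal sketch using only the standard library. The helper names are hypothetical, and the version parsing is deliberately simple: it reads the leading major.minor numbers and ignores suffixes such as .dev0.

```python
import re
from importlib import metadata


def major_minor(version):
    """Return (major, minor) from a string such as '4.57.1' or '4.58.0.dev0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version, minimum=(4, 57)):
    """True if the version's (major, minor) is at least the minimum."""
    return major_minor(version) >= minimum


def check_environment():
    """Return a list of human-readable problems; empty means the check passed."""
    problems = []
    try:
        installed = metadata.version("transformers")
        if not meets_minimum(installed):
            problems.append(f"transformers {installed} is older than 4.57")
    except metadata.PackageNotFoundError:
        problems.append("transformers is not installed")
    try:
        from diffusers import PRXPixelPipeline  # noqa: F401
    except ImportError:
        problems.append("diffusers does not provide PRXPixelPipeline; install the branch above")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```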

Usage

import torch
from diffusers import PRXPixelPipeline

# Load the pipeline weights in bfloat16 and move the pipeline to the GPU.
pipe = PRXPixelPipeline.from_pretrained("Photoroom/prxpixel-t2i", torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = "A front-facing portrait of a lion in the golden savanna at sunset."
# Generate one image and save it as a PNG.
image = pipe(prompt, num_inference_steps=28, guidance_scale=5.0).images[0]
image.save("prxpixel_output.png")
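
When tuning output quality, it can help to render the same prompt at several settings and compare the results. The sketch below only builds the keyword arguments and output filenames for such a sweep, using the two parameters shown in the usage example (num_inference_steps and guidance_scale). The helper name, the filename pattern, and the specific values are hypothetical, not recommended settings.

```python
from itertools import product


def sweep_settings(steps_options, guidance_options, stem="prxpixel"):
    """Yield (call_kwargs, filename) for every steps/guidance combination."""
    for steps, guidance in product(steps_options, guidance_options):
        kwargs = {"num_inference_steps": steps, "guidance_scale": guidance}
        filename = f"{stem}_steps{steps}_cfg{guidance}.png"
        yield kwargs, filename


# Illustrative values only.
runs = list(sweep_settings([20, 28], [3.5, 5.0]))

# With `pipe` and `prompt` defined as in the usage example:
# for kwargs, filename in runs:
#     pipe(prompt, **kwargs).images[0].save(filename)
```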

License

Released under the Apache 2.0 license. See LICENSE and NOTICE.
