PRXPixel (text-to-image, pixel space)

PRXPixel is a pixel-space variant of PRX. It denoises raw RGB images directly (no VAE), conditions on a Qwen3-VL text encoder instead of T5Gemma, and feeds the generation resolution into the timestep modulation. The denoiser is a ~7B-parameter PRXTransformer2DModel with a bottleneck patch projection and a resolution embedder.
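The sketch below shows one way a generation resolution could be fed into the timestep modulation. It is a hypothetical, generic AdaLN-style illustration, not PRXPixel's actual code. The following are all assumptions: the sinusoidal encoding of the resolution, summing the timestep and resolution embeddings, the layer sizes, and the class name ResolutionAwareModulation. Random weights stand in for trained parameters.

```python
import numpy as np


def sinusoidal_embedding(value: float, dim: int, max_period: float = 10000.0) -> np.ndarray:
    """Standard sinusoidal embedding of a scalar, as commonly used for diffusion timesteps."""
    half = dim // 2
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    args = value * freqs
    return np.concatenate([np.cos(args), np.sin(args)])


class ResolutionAwareModulation:
    """Hypothetical AdaLN-style block: (timestep, resolution) -> per-channel shift and scale."""

    def __init__(self, hidden: int, emb_dim: int = 64, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.emb_dim = emb_dim
        # Random weights stand in for trained parameters (assumption).
        self.w_t = rng.normal(0.0, 0.02, (emb_dim, hidden))
        self.w_res = rng.normal(0.0, 0.02, (emb_dim, hidden))
        self.w_mod = rng.normal(0.0, 0.02, (hidden, 2 * hidden))

    def __call__(self, tokens: np.ndarray, t: float, resolution: int) -> np.ndarray:
        # Embed the diffusion time and the generation resolution separately,
        # then sum them into a single conditioning vector.
        cond = sinusoidal_embedding(t * 1000.0, self.emb_dim) @ self.w_t
        cond = cond + sinusoidal_embedding(float(resolution), self.emb_dim) @ self.w_res
        cond = cond / (1.0 + np.exp(-cond))  # SiLU activation
        shift, scale = np.split(cond @ self.w_mod, 2)
        # LayerNorm without learned affine parameters, then modulate every token.
        mean = tokens.mean(axis=-1, keepdims=True)
        var = tokens.var(axis=-1, keepdims=True)
        normed = (tokens - mean) / np.sqrt(var + 1e-6)
        return normed * (1.0 + scale) + shift


tokens = np.random.default_rng(1).normal(size=(16, 128))  # 16 image tokens, width 128
mod = ResolutionAwareModulation(hidden=128)
out_512 = mod(tokens, t=0.5, resolution=512)
out_1024 = mod(tokens, t=0.5, resolution=1024)
print(out_512.shape)  # (16, 128)
```

Because the resolution enters the same conditioning vector as the timestep, the same tokens at the same timestep are modulated differently at 512 and 1024.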

  • Resolution: 1024
  • Transformer: ~7B params, torch.bfloat16
  • Text encoder: Qwen3-VL text tower (Qwen3VLTextModel)
  • VAE: none (pixel space)
  • Scheduler: FlowMatchEulerDiscreteScheduler
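With a flow-matching Euler scheduler and no VAE, sampling amounts to integrating a velocity field directly on RGB pixels. The NumPy sketch below illustrates that loop. It uses diffusers' flow-matching convention, x_t = (1 - sigma) * x0 + sigma * noise with the Euler update x + (sigma_next - sigma) * v. It also makes two simplifying assumptions. First, it uses a plain linear sigma schedule, whereas the real scheduler may shift timesteps. Second, it replaces the ~7B transformer with a hypothetical oracle_velocity function that knows the target image.

```python
import numpy as np


def euler_flow_sample(velocity_fn, noise: np.ndarray, num_steps: int) -> np.ndarray:
    """Integrate a flow-matching ODE from pure noise (sigma=1) to data (sigma=0)."""
    # Assumption: plain linear sigma schedule with no timestep shift.
    sigmas = np.linspace(1.0, 0.0, num_steps + 1)
    x = noise
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        v = velocity_fn(x, sigma)
        # Euler step in the same form as FlowMatchEulerDiscreteScheduler.step.
        x = x + (sigma_next - sigma) * v
    return x


rng = np.random.default_rng(0)
target = rng.uniform(-1.0, 1.0, size=(3, 8, 8))  # a tiny RGB "image"; no latent space
noise = rng.standard_normal(size=(3, 8, 8))


def oracle_velocity(x_t: np.ndarray, sigma: float) -> np.ndarray:
    # Hypothetical stand-in for the transformer. On the path
    # x_t = (1 - sigma) * target + sigma * noise, the velocity noise - target
    # equals (x_t - target) / sigma.
    return (x_t - target) / sigma


image = euler_flow_sample(oracle_velocity, noise, num_steps=28)
print(np.allclose(image, target))  # True
```

Because the oracle velocity is exact, every Euler step lands back on the straight noise-to-image path. A learned velocity only approximates this, which is why the number of inference steps (28 in the usage example) affects output quality.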

Requirements

PRXPixelPipeline is not yet included in a released version of diffusers. Install diffusers from the branch that adds it, and use transformers >= 4.57 (the version that introduced Qwen3VLTextModel):

pip install "transformers>=4.57"
pip install "git+https://github.com/huggingface/diffusers.git@prx-pixel-pipeline"
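A small check like the following can confirm both requirements before loading the model. This is a convenience sketch and is not part of the model repository. The helper names release_tuple and check_prxpixel_requirements are hypothetical. The version comparison looks only at the leading numeric release parts and ignores pre-release and dev suffixes, so it does not need the packaging library.

```python
from importlib.metadata import PackageNotFoundError, version


def release_tuple(v: str, width: int = 3) -> tuple:
    """Leading numeric release parts of a version string, e.g. '4.57.1.dev0' -> (4, 57, 1)."""
    parts = []
    for piece in v.split(".")[:width]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as 'rc1' or 'dev0'
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts + [0] * (width - len(parts)))


def check_prxpixel_requirements() -> list:
    """Return a list of human-readable problems; an empty list means the setup looks ready."""
    problems = []
    try:
        installed = version("transformers")
        if release_tuple(installed) < release_tuple("4.57"):
            problems.append(f"transformers {installed} is older than 4.57")
    except PackageNotFoundError:
        problems.append("transformers is not installed")
    try:
        from diffusers import PRXPixelPipeline  # noqa: F401
    except Exception:
        problems.append(
            "could not import PRXPixelPipeline from diffusers "
            "(is the prx-pixel-pipeline branch installed?)"
        )
    return problems


if __name__ == "__main__":
    issues = check_prxpixel_requirements()
    print("\n".join(issues) if issues else "Requirements look satisfied.")
```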

Usage

import torch
from diffusers import PRXPixelPipeline

# Load the pipeline (transformer, Qwen3-VL text encoder, scheduler) in bfloat16
pipe = PRXPixelPipeline.from_pretrained("Photoroom/prxpixel-t2i", torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = "A front-facing portrait of a lion in the golden savanna at sunset."
# 28 denoising steps with a guidance scale of 5.0; .images[0] is the first generated image
image = pipe(prompt, num_inference_steps=28, guidance_scale=5.0).images[0]
image.save("prxpixel_output.png")
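The guidance_scale argument controls classifier-free guidance. This card does not describe how PRXPixelPipeline applies it. Assuming it uses the standard combination, each denoising step runs the transformer with and without the prompt and blends the two predictions as shown below. apply_cfg is a hypothetical helper, and random arrays stand in for the transformer's outputs.

```python
import numpy as np


def apply_cfg(v_uncond: np.ndarray, v_cond: np.ndarray, guidance_scale: float) -> np.ndarray:
    """Standard classifier-free guidance: extrapolate from the unconditional prediction."""
    return v_uncond + guidance_scale * (v_cond - v_uncond)


rng = np.random.default_rng(0)
v_uncond = rng.standard_normal((3, 8, 8))  # stand-in for the unprompted prediction
v_cond = rng.standard_normal((3, 8, 8))    # stand-in for the prompted prediction

guided = apply_cfg(v_uncond, v_cond, guidance_scale=5.0)
```

Under this formula, a scale of 1.0 returns the conditional prediction unchanged. Larger values, such as the 5.0 used above, push further along the direction the prompt adds.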

License

Released under the Apache 2.0 license. See LICENSE and NOTICE.
