madtune/pixeldit-diffusers

@@ -21,12 +21,13 @@ All credit goes to the original authors at NVIDIA. This repo only provides a `Di
 ## What is PixelDiT?
-PixelDiT is a 1.3B parameter **pixel-space** diffusion transformer — no VAE, generates images directly in pixel space. Runs on **4GB VRAM** at 512px.
 - **Architecture**: MMDiT patch blocks + pixel pathway (PiT blocks)
 - **Text encoder**: Gemma-2-2B with chi_prompt instruction prefix
-- **Resolution**: up to 1024×1024
 - **Sampler**: Flow matching (FlowMatchEulerDiscreteScheduler, shift=4.0)
 ---
@@ -68,9 +69,9 @@ pipe.enable_model_cpu_offload()
 image = pipe(
     "a white horse galloping through a meadow at sunset, cinematic lighting",
     negative_prompt="blurry, flat, low quality, cartoon",
-    height=512, width=512,
-    num_inference_steps=20,
-    guidance_scale=3.5,
 ).images[0]
 image.save("out.jpg")
 ```

 ## What is PixelDiT?
+PixelDiT is a 1.3B parameter **pixel-space** diffusion transformer — no VAE, generates images directly in pixel space. Runs on **4GB VRAM**.
 - **Architecture**: MMDiT patch blocks + pixel pathway (PiT blocks)
 - **Text encoder**: Gemma-2-2B with chi_prompt instruction prefix
+- **Native resolution**: 1024×1024
 - **Sampler**: Flow matching (FlowMatchEulerDiscreteScheduler, shift=4.0)
+- **Minimum steps**: 45–50 — below 45 produces garbage output
 ---
 image = pipe(
     "a white horse galloping through a meadow at sunset, cinematic lighting",
     negative_prompt="blurry, flat, low quality, cartoon",
+    height=1024, width=1024,
+    num_inference_steps=50,   # minimum 45 — below that produces garbage
+    guidance_scale=7.5,
 ).images[0]
 image.save("out.jpg")
 ```
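
The limits stated in the updated README (at least 45 inference steps, native 1024×1024 resolution) can be checked before a generation call. The following is a minimal sketch under stated assumptions: `check_generation_settings`, `MIN_STEPS`, and `NATIVE_RESOLUTION` are hypothetical names introduced here for illustration and are not part of this repo or of diffusers. The thresholds are taken from the README text in the diff above.

```python
# Hypothetical helper, not part of pixeldit-diffusers or diffusers.
# It compares generation settings against the limits stated in the README.
MIN_STEPS = 45            # README: below 45 steps produces garbage output
NATIVE_RESOLUTION = 1024  # README: native resolution is 1024x1024


def check_generation_settings(height, width, num_inference_steps):
    """Return a list of warning strings; an empty list means no issues found."""
    warnings = []
    if num_inference_steps < MIN_STEPS:
        warnings.append(
            f"num_inference_steps={num_inference_steps} is below "
            f"the stated minimum of {MIN_STEPS}"
        )
    if (height, width) != (NATIVE_RESOLUTION, NATIVE_RESOLUTION):
        warnings.append(
            f"{height}x{width} differs from the native "
            f"{NATIVE_RESOLUTION}x{NATIVE_RESOLUTION} resolution"
        )
    return warnings


# Old example settings (512x512, 20 steps) versus the new ones (1024x1024, 50 steps).
print(check_generation_settings(512, 512, 20))
print(check_generation_settings(1024, 1024, 50))
```

The helper returns warnings rather than raising, since the README describes a quality threshold and not a hard API constraint; whether non-native resolutions should be flagged at all is an assumption of this sketch.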