How to use madtune/pixeldit-diffusers with Diffusers:

```shell
pip install -U diffusers transformers accelerate
```

```python
import torch
from diffusers import DiffusionPipeline

# switch to "mps" for apple devices
pipe = DiffusionPipeline.from_pretrained("nvidia/PixelDiT-1300M-1024px", dtype=torch.bfloat16, device_map="cuda")
pipe.load_lora_weights("madtune/pixeldit-diffusers")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]
```
Upload README.md with huggingface_hub
README.md
CHANGED
@@ -21,12 +21,13 @@ All credit goes to the original authors at NVIDIA. This repo only provides a `Di
 
 ## What is PixelDiT?
 
-PixelDiT is a 1.3B parameter **pixel-space** diffusion transformer — no VAE, generates images directly in pixel space. Runs on **4GB VRAM**
+PixelDiT is a 1.3B parameter **pixel-space** diffusion transformer — no VAE, generates images directly in pixel space. Runs on **4GB VRAM**.
 
 - **Architecture**: MMDiT patch blocks + pixel pathway (PiT blocks)
 - **Text encoder**: Gemma-2-2B with chi_prompt instruction prefix
-- **
+- **Native resolution**: 1024×1024
 - **Sampler**: Flow matching (FlowMatchEulerDiscreteScheduler, shift=4.0)
+- **Minimum steps**: 45–50 — below 45 produces garbage output
 
 ---
 
@@ -68,9 +69,9 @@ pipe.enable_model_cpu_offload()
 image = pipe(
     "a white horse galloping through a meadow at sunset, cinematic lighting",
     negative_prompt="blurry, flat, low quality, cartoon",
-    height=
-    num_inference_steps=
-    guidance_scale=
+    height=1024, width=1024,
+    num_inference_steps=50, # minimum 45 — below that produces garbage
+    guidance_scale=7.5,
 ).images[0]
 image.save("out.jpg")
 ```
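
The updated README states a native resolution of 1024×1024 and a floor of 45 inference steps. One way to make that floor hard to violate by accident is to build the call arguments through a small guard. The sketch below is illustrative only: `build_generation_kwargs`, `MIN_STEPS`, and `NATIVE_RESOLUTION` are hypothetical names that are not part of this repo or of Diffusers, and it assumes the pipeline accepts `prompt` as a keyword argument, as Diffusers pipelines commonly do.

```python
# Values copied from the README bullets; the helper itself is hypothetical.
NATIVE_RESOLUTION = 1024  # "Native resolution: 1024×1024"
MIN_STEPS = 45            # README: fewer than 45 steps produces garbage output


def build_generation_kwargs(
    prompt,
    negative_prompt=None,
    num_inference_steps=50,
    guidance_scale=7.5,
    height=NATIVE_RESOLUTION,
    width=NATIVE_RESOLUTION,
):
    """Return keyword arguments for a pipeline call, rejecting too few steps."""
    if num_inference_steps < MIN_STEPS:
        raise ValueError(
            f"num_inference_steps={num_inference_steps} is below the "
            f"README minimum of {MIN_STEPS}"
        )
    kwargs = {
        "prompt": prompt,
        "height": height,
        "width": width,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,
    }
    # Only pass a negative prompt when the caller supplied one.
    if negative_prompt is not None:
        kwargs["negative_prompt"] = negative_prompt
    return kwargs


if __name__ == "__main__":
    kwargs = build_generation_kwargs(
        "a white horse galloping through a meadow at sunset, cinematic lighting",
        negative_prompt="blurry, flat, low quality, cartoon",
    )
    print(kwargs)
    # With a loaded pipeline this would be used as:
    #   image = pipe(**kwargs).images[0]
```

The defaults mirror the values in the README example (50 steps, guidance scale 7.5, 1024×1024), so calling the helper with only a prompt reproduces that configuration.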