Upload README.md with huggingface_hub
README.md
CHANGED
@@ -1,3 +1,70 @@
---
license: other
tags:
- text-to-image
- diffusion
- pixeldit
- nvidia
- pixel-space
base_model: nvidia/PixelDiT-1300M-1024px
---

# PixelDiT 1.3B: Diffusers-Compatible Conversion

This is an **unofficial** Hugging Face-compatible conversion of NVIDIA's [PixelDiT-1300M-1024px](https://huggingface.co/nvidia/PixelDiT-1300M-1024px) model.

All credit goes to the original authors at NVIDIA. This repo only provides a `PreTrainedModel` wrapper that enables `from_pretrained`, `save_pretrained`, and LoRA fine-tuning via `peft`.

> **I do not own this model.** Original weights, architecture, and training are the work of NVIDIA Research. Please refer to their [original repository](https://huggingface.co/nvidia/PixelDiT-1300M-1024px) for license terms.

---

## What is PixelDiT?

PixelDiT is a 1.3B-parameter pixel-space diffusion transformer: it uses no VAE and generates images directly in pixel space. Text conditioning uses Gemma-2-2B with a `chi_prompt` prefix to produce rich visual descriptions.

- **Architecture**: MMDiT patch blocks + pixel pathway (PiT blocks)
- **Text encoder**: Gemma-2-2B (`Efficient-Large-Model/gemma-2-2b-it`)
- **Resolution**: up to 1024×1024
- **Sampler**: Flow matching (DPM-Solver++ recommended, 20 steps)

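To give a feel for what flow-matching sampling does, here is a minimal sketch that integrates a velocity field with plain Euler steps. It is not DPM-Solver++ and not this repo's sampler: `euler_flow_sample` and `make_toy_velocity` are hypothetical names, the toy velocity stands in for the network, and the time convention (t=0 is noise, t=1 is data) is an assumption that may differ from the one PixelDiT uses.

```python
def euler_flow_sample(x0, velocity, num_steps=20):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt                      # current time in [0, 1)
        v = velocity(x, t)              # a real sampler would call the model here
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(target):
    """Hypothetical stand-in for the network: always points straight at `target`."""
    def velocity(x, t):
        # Velocity of the straight-line path that reaches `target` at t=1.
        return [(ti - xi) / (1.0 - t) for ti, xi in zip(target, x)]
    return velocity


noise = [0.3, -1.2, 0.8]     # stand-in for Gaussian noise "pixels"
target = [1.0, 0.0, -1.0]    # stand-in for a clean image
sample = euler_flow_sample(noise, make_toy_velocity(target), num_steps=20)
```

With this toy field the 20 Euler steps carry the noise values onto the target. In a real pipeline the velocity comes from the transformer, and a higher-order solver such as DPM-Solver++ replaces the Euler update.
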
---

## Usage

```python
from pixeldit import PixelDiTPipeline

# Build the pipeline from the converted checkpoint
pipe = PixelDiTPipeline(pretrained="madtune/pixeldit-diffusers")
# The call result is indexed with [0] to get the first generated image
img = pipe("a white horse running in a meadow at sunset", height=512, width=512)[0]
img.save("out.jpg")
```

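For several prompts, a small wrapper around the call shown above may be convenient. This is only a sketch: `generate_all` is a hypothetical helper, and it assumes nothing about the pipeline beyond the call pattern used in this README (`pipe(prompt, height=..., width=...)[0]`).

```python
def generate_all(pipe, prompts, height=512, width=512):
    """Run `pipe` once per prompt and return a {prompt: first_image} dict."""
    results = {}
    for prompt in prompts:
        # Same call pattern as the README example; [0] picks the first image.
        results[prompt] = pipe(prompt, height=height, width=width)[0]
    return results


# Intended use with the pipeline from the example above (not executed here):
#   images = generate_all(pipe, ["a red fox in snow", "a lighthouse at dawn"])
#   for i, img in enumerate(images.values()):
#       img.save(f"out_{i}.jpg")
```
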
Install the package by cloning the repository and installing its dependencies:
```bash
git clone https://github.com/madtune/pixeldit-diffusers
cd pixeldit-diffusers
pip install transformers accelerate safetensors pillow
```

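After running the commands above, a quick check like the following can confirm that the dependencies are importable. It is a sketch, not part of the repository: `missing_packages` is a hypothetical helper, and the mapping only covers the four packages named in the `pip install` line (note that `pillow` is imported as `PIL`).

```python
import importlib.util

# pip package name -> importable module name
REQUIRED = {
    "transformers": "transformers",
    "accelerate": "accelerate",
    "safetensors": "safetensors",
    "pillow": "PIL",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


missing = missing_packages()
if missing:
    print("Missing packages: pip install " + " ".join(missing))
else:
    print("All listed dependencies are importable.")
```
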
---

## LoRA fine-tuning

```python
from pixeldit import PixelDiTModel
from peft import get_peft_model, LoraConfig

model = PixelDiTModel.from_pretrained("madtune/pixeldit-diffusers")
# Attach LoRA adapters to the listed qkv/proj modules
lora_cfg = LoraConfig(target_modules=["qkv_x", "qkv_y", "proj_x", "proj_y"])
model = get_peft_model(model, lora_cfg)
# Report how many parameters are trainable versus the total
model.print_trainable_parameters()
```

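To see why LoRA keeps the trainable parameter count small, the arithmetic can be sketched as follows. A LoRA adapter on a linear layer of shape `d_in × d_out` adds two low-rank matrices with `rank * (d_in + d_out)` weights in total. The layer sizes below are hypothetical and are not taken from PixelDiT's actual configuration.

```python
def lora_param_count(d_in, d_out, rank):
    """Weights added by one LoRA adapter: A is (d_in x rank), B is (rank x d_out)."""
    return rank * (d_in + d_out)


def full_param_count(d_in, d_out):
    """Weights in the frozen base linear layer (bias ignored)."""
    return d_in * d_out


# Hypothetical fused qkv projection: hidden size 1536 mapped to 3 * 1536.
d_in, d_out, rank = 1536, 3 * 1536, 8
added = lora_param_count(d_in, d_out, rank)
base = full_param_count(d_in, d_out)
print(f"LoRA adds {added} weights to a layer with {base} frozen weights "
      f"({100 * added / base:.2f}%)")
```

The figure printed by `model.print_trainable_parameters()` is the sum of this quantity over every module matched by `target_modules`.
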
---

## Credits

- **Original model**: [NVIDIA Research](https://huggingface.co/nvidia/PixelDiT-1300M-1024px)
- **Diffusers conversion**: [madtune](https://huggingface.co/madtune)
- **Paper**: *PixelDiT: Pixel-Space Diffusion Transformers for Text-to-Image Generation* (NVIDIA)