madtune committed · commit fe7e8a6 (verified) · 1 parent: 05cba49

Upload README.md with huggingface_hub

  1. README.md +68 -1
README.md CHANGED
@@ -1,3 +1,70 @@
  ---
- license: mit
  ---
  ---
+ license: other
+ tags:
+ - text-to-image
+ - diffusion
+ - pixeldit
+ - nvidia
+ - pixel-space
+ base_model: nvidia/PixelDiT-1300M-1024px
  ---
+
+ # PixelDiT 1.3B: Diffusers-Compatible Conversion
+
+ This is an **unofficial** Hugging Face-compatible conversion of NVIDIA's [PixelDiT-1300M-1024px](https://huggingface.co/nvidia/PixelDiT-1300M-1024px) model.
+
+ All credit goes to the original authors at NVIDIA. This repo only provides a `PreTrainedModel` wrapper that enables `from_pretrained`, `save_pretrained`, and LoRA fine-tuning via `peft`.
+
+ > **I do not own this model.** The original weights, architecture, and training are the work of NVIDIA Research. Please refer to their [original repository](https://huggingface.co/nvidia/PixelDiT-1300M-1024px) for license terms.
+
+ ---
+
+ ## What is PixelDiT?
+
+ PixelDiT is a 1.3B-parameter diffusion transformer that generates images directly in pixel space, with no VAE. Text conditioning uses Gemma-2-2B with a `chi_prompt` prefix to produce rich visual descriptions.
+
+ - **Architecture**: MMDiT patch blocks + pixel pathway (PiT blocks)
+ - **Text encoder**: Gemma-2-2B (`Efficient-Large-Model/gemma-2-2b-it`)
+ - **Resolution**: up to 1024×1024
+ - **Sampler**: Flow matching (DPM-Solver++ recommended, 20 steps)
+
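To make the sampler bullet concrete: a flow-matching model predicts a velocity field, and sampling integrates that field from noise (t = 0) to an image (t = 1). The sketch below uses plain fixed-step Euler integration with a toy velocity function standing in for the network. `euler_sample`, `toy_velocity`, and `TARGET` are hypothetical names introduced for illustration. This is not the DPM-Solver++ sampler recommended above, nor the repository's code, and the time convention (which end is noise) varies between implementations.

```python
from typing import Callable, List

Vector = List[float]


def euler_sample(
    velocity: Callable[[Vector, float], Vector],
    x0: Vector,
    num_steps: int = 20,
) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy stand-in for the network (assumption): the "image" is a fixed 3-vector.
TARGET = [0.25, -0.5, 1.0]


def toy_velocity(x: Vector, t: float) -> Vector:
    # On a straight noise-to-data path, the remaining displacement (target - x)
    # must be covered in the remaining time (1 - t).
    return [(ti - xi) / (1.0 - t) for ti, xi in zip(TARGET, x)]


noise = [1.3, 0.2, -0.7]
sample = euler_sample(toy_velocity, noise, num_steps=20)
```

With a real model, `velocity` would wrap a forward pass conditioned on the text embedding, and a higher-order solver would typically reach comparable quality in fewer steps than plain Euler.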
+ ---
+
+ ## Usage
+
+ ```python
+ from pixeldit import PixelDiTPipeline
+
+ # Load the converted weights from the Hub
+ pipe = PixelDiTPipeline(pretrained="madtune/pixeldit-diffusers")
+ # The result is indexable; [0] is the first generated image
+ img = pipe("a white horse running in a meadow at sunset", height=512, width=512)[0]
+ img.save("out.jpg")
+ ```
+
+ Clone the repository and install the dependencies:
+ ```bash
+ git clone https://github.com/madtune/pixeldit-diffusers
+ cd pixeldit-diffusers
+ pip install transformers accelerate safetensors pillow
+ ```
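A quick way to confirm the install step worked is to check that each dependency is importable. The helper below is a minimal sketch using only the standard library. `missing_modules` and `REQUIRED` are hypothetical names, and the list assumes the four packages from the `pip install` line (note that `pillow` is imported as `PIL`).

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names for the packages installed above; pillow is imported as PIL.
REQUIRED = ["transformers", "accelerate", "safetensors", "PIL"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("missing:", missing if missing else "none")
```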
+
+ ---
+
+ ## LoRA fine-tuning
+
+ ```python
+ from pixeldit import PixelDiTModel
+ from peft import get_peft_model, LoraConfig
+
+ model = PixelDiTModel.from_pretrained("madtune/pixeldit-diffusers")
+ # Attach LoRA adapters to every module whose name matches one of these entries
+ lora_cfg = LoraConfig(target_modules=["qkv_x", "qkv_y", "proj_x", "proj_y"])
+ model = get_peft_model(model, lora_cfg)
+ # Report how many parameters are trainable after wrapping
+ model.print_trainable_parameters()
+ ```
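For intuition about the number `print_trainable_parameters()` reports: LoRA freezes each targeted weight matrix of shape `d_out × d_in` and trains two low-rank factors of shapes `r × d_in` and `d_out × r`, which adds `r * (d_in + d_out)` trainable parameters per adapted linear layer. The sketch below does that arithmetic. The layer shapes are hypothetical placeholders, not PixelDiT's actual dimensions, and `rank=8` is assumed because it is the default `r` in `peft`'s `LoraConfig`.

```python
from typing import Dict, Tuple


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one linear layer: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def total_lora_params(layers: Dict[str, Tuple[int, int]], rank: int = 8) -> int:
    """Sum the LoRA parameter count over a mapping of layer name -> (d_in, d_out)."""
    return sum(lora_params(d_in, d_out, rank) for d_in, d_out in layers.values())


# Hypothetical shapes for illustration only; inspect the real model for actual sizes.
HYPOTHETICAL_BLOCK = {
    "qkv_x": (1024, 3072),   # fused query/key/value projection
    "proj_x": (1024, 1024),  # attention output projection
}

per_block = total_lora_params(HYPOTHETICAL_BLOCK, rank=8)
print(per_block)
```

Doubling the rank doubles the adapter size, so the count scales linearly in `r` while the frozen base weights stay untouched.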
+
+ ---
+
+ ## Credits
+
+ - **Original model**: [NVIDIA Research](https://huggingface.co/nvidia/PixelDiT-1300M-1024px)
+ - **Diffusers conversion**: [madtune](https://huggingface.co/madtune)
+ - **Paper**: *PixelDiT: Pixel-Space Diffusion Transformers for Text-to-Image Generation* (NVIDIA)