madtune committed
Commit 50e9bfe · verified · 1 Parent(s): d506f16

Upload README.md with huggingface_hub
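
The commit message above matches the default message that `huggingface_hub` generates for a single-file upload. A minimal sketch of how such a commit could be produced follows, assuming `HfApi.upload_file` was used. The helper names `readme_upload_kwargs` and `upload_readme` are hypothetical, and the `repo_id` is a placeholder because the actual repository name does not appear on this page.

```python
def readme_upload_kwargs(repo_id, local_path="README.md"):
    """Build keyword arguments for HfApi.upload_file.

    repo_id is a placeholder supplied by the caller; the real repository
    name is not shown in this commit view.
    """
    return {
        "path_or_fileobj": local_path,  # local file to upload
        "path_in_repo": "README.md",    # destination path inside the repo
        "repo_id": repo_id,
        "repo_type": "model",           # assumption: a model repository
    }


def upload_readme(repo_id, local_path="README.md"):
    """Upload the README; requires huggingface_hub and a logged-in token."""
    # Imported lazily so readme_upload_kwargs works without the dependency.
    from huggingface_hub import HfApi

    # No commit_message is passed, so the library's default message is used.
    return HfApi().upload_file(**readme_upload_kwargs(repo_id, local_path))
```

Calling `upload_readme("your-username/your-repo")` would push the local `README.md` as a new commit; the call needs network access and write permission on the repository.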

  1. README.md +6 -5
README.md CHANGED
@@ -21,12 +21,13 @@ All credit goes to the original authors at NVIDIA. This repo only provides a `Di

 ## What is PixelDiT?

-PixelDiT is a 1.3B parameter **pixel-space** diffusion transformer — no VAE, generates images directly in pixel space. Runs on **4GB VRAM** at 512px.
+PixelDiT is a 1.3B parameter **pixel-space** diffusion transformer — no VAE, generates images directly in pixel space. Runs on **4GB VRAM**.

 - **Architecture**: MMDiT patch blocks + pixel pathway (PiT blocks)
 - **Text encoder**: Gemma-2-2B with chi_prompt instruction prefix
-- **Resolution**: up to 1024×1024
+- **Native resolution**: 1024×1024
 - **Sampler**: Flow matching (FlowMatchEulerDiscreteScheduler, shift=4.0)
+- **Minimum steps**: 45–50 — below 45 produces garbage output

 ---

@@ -68,9 +69,9 @@ pipe.enable_model_cpu_offload()
 image = pipe(
     "a white horse galloping through a meadow at sunset, cinematic lighting",
     negative_prompt="blurry, flat, low quality, cartoon",
-    height=512, width=512,
-    num_inference_steps=20,
-    guidance_scale=3.5,
+    height=1024, width=1024,
+    num_inference_steps=50, # minimum 45 — below that produces garbage
+    guidance_scale=7.5,
 ).images[0]
 image.save("out.jpg")
 ```
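
The updated README says that fewer than 45 inference steps gives unusable output, so a caller may want to reject low step counts before starting a long generation. The following is a minimal sketch of such a guard, not part of the repository: `MIN_STEPS` and `pixeldit_call_kwargs` are hypothetical names, and the defaults simply mirror the values introduced in this commit.

```python
MIN_STEPS = 45  # threshold stated in the updated README


def pixeldit_call_kwargs(height=1024, width=1024,
                         num_inference_steps=50, guidance_scale=7.5):
    """Return keyword arguments for the pipeline call, rejecting low step counts."""
    if num_inference_steps < MIN_STEPS:
        raise ValueError(
            f"num_inference_steps={num_inference_steps} is below the "
            f"documented minimum of {MIN_STEPS}"
        )
    return {
        "height": height,
        "width": width,
        "num_inference_steps": num_inference_steps,
        "guidance_scale": guidance_scale,
    }
```

The result can be unpacked into the call shown in the README, for example `pipe(prompt, negative_prompt=negative, **pixeldit_call_kwargs())`, where `pipe`, `prompt`, and `negative` come from the surrounding README snippet.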