---
license: apache-2.0
language:
  - en
library_name: diffusers
tags:
  - diffusers
  - image-generation
  - class-conditional
  - nit
pipeline_tag: unconditional-image-generation
widget:
  - output:
      url: demo.png
---

# NiT-diffusers

Native diffusers implementation of **NiT** (Native-resolution Image Transformer). Each variant folder is self-contained:

- `pipeline.py`: `NiTPipeline`
- `scheduler/scheduler_config.json`: `FlowMatchEulerDiscreteScheduler` config (class ships with Diffusers)
- `transformer/nit_transformer_2d.py`: `NiTTransformer2DModel`
- `vae/`: `AutoencoderDC` weights + config

No separate `NiT-diffusers` package is needed at inference time: install `diffusers` from PyPI and use the custom code that ships in each variant directory.
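Because each variant folder must be complete on its own, a quick pre-flight check can catch a partial download before `from_pretrained` fails. The helper below is a minimal, hypothetical sketch (it is not part of this repo); the expected entries are taken from the layout listed above, plus the per-variant `model_index.json` described under "ImageNet class labels". It does not check weight file names, which may vary.

```python
from pathlib import Path

# Entries each variant folder is expected to contain, per the layout above.
# model_index.json sits at the variant root and also carries the id2label map.
EXPECTED_FILES = (
    "model_index.json",
    "pipeline.py",
    "scheduler/scheduler_config.json",
    "transformer/nit_transformer_2d.py",
)
EXPECTED_DIRS = ("vae",)


def missing_variant_files(variant_dir):
    """Return expected entries missing from a variant folder ([] if complete).

    Hypothetical helper for illustration only; weight files are not checked.
    """
    root = Path(variant_dir)
    missing = [name for name in EXPECTED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    return missing
```

For example, `missing_variant_files("./NiT-XL")` returns an empty list for a complete variant folder, and lists the missing paths otherwise (including every entry if you accidentally pass the repo root).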

## Available checkpoints

| Checkpoint | Path | Resolution | Recommended settings |
| --- | --- | --- | --- |
| NiT-S | `./NiT-S` | 256×256 | 250 steps, CFG 2.25, guidance interval (0.0, 0.7) |
| NiT-B | `./NiT-B` | 256×256 | 250 steps, CFG 2.25, guidance interval (0.0, 0.7) |
| NiT-L | `./NiT-L` | 512×512 | 250 steps, CFG 2.05, guidance interval (0.0, 0.7) |
| NiT-XL | `./NiT-XL` | 512×512 | 250 steps, CFG 2.05, guidance interval (0.0, 0.7) |
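If you switch between variants, it can help to keep the table in code and unpack it into the pipeline call. The dict and `sampling_kwargs` helper below are a hypothetical sketch, not part of the repo; the keyword names (`height`, `width`, `num_inference_steps`, `guidance_scale`, `guidance_interval`) are the ones used by the pipeline calls in the Inference examples.

```python
# Recommended sampling settings from the table above, keyed by variant folder name.
# Illustrative only: these names are not defined by the repo's code.
RECOMMENDED_SETTINGS = {
    "NiT-S": {"height": 256, "width": 256, "guidance_scale": 2.25},
    "NiT-B": {"height": 256, "width": 256, "guidance_scale": 2.25},
    "NiT-L": {"height": 512, "width": 512, "guidance_scale": 2.05},
    "NiT-XL": {"height": 512, "width": 512, "guidance_scale": 2.05},
}


def sampling_kwargs(variant):
    """Return pipeline call kwargs for a variant folder name such as "NiT-XL"."""
    if variant not in RECOMMENDED_SETTINGS:
        raise ValueError(
            f"unknown variant {variant!r}; expected one of {sorted(RECOMMENDED_SETTINGS)}"
        )
    return {
        **RECOMMENDED_SETTINGS[variant],
        # Shared by all four checkpoints.
        "num_inference_steps": 250,
        "guidance_interval": (0.0, 0.7),
    }
```

Usage would then look like `pipe(class_labels="golden retriever", generator=generator, **sampling_kwargs("NiT-XL"))`.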

## ImageNet class labels

Each variant keeps an English `id2label` map directly in its own `model_index.json` (DiT-style).

- `pipe.id2label`: inspect the id → English label correspondence
- `pipe.labels`: reverse map (English synonym → id), sorted for browsing
- `pipe.get_label_ids("golden retriever")`: look up the class id(s) for an English label
- `pipe(class_labels="golden retriever", ...)`: string labels are resolved to ids automatically
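To see how a reverse map like `pipe.labels` can be derived from `id2label`, here is a minimal sketch modeled on the way the Diffusers `DiTPipeline` builds its label index (split each entry on commas, map every synonym to its integer id, sort by name). NiT's own `pipeline.py` may differ in details, for example in the exact return type of `get_label_ids`; `build_label_index` and `lookup_label_ids` are hypothetical names, and the two-entry map is a toy excerpt.

```python
# Toy excerpt of an id2label map as stored in model_index.json (JSON keys are strings).
ID2LABEL = {
    "1": "goldfish, Carassius auratus",
    "207": "golden retriever",
}


def build_label_index(id2label):
    """Map every comma-separated English synonym to its integer class id, sorted by name."""
    labels = {}
    for key, value in id2label.items():
        for synonym in value.split(","):
            labels[synonym.strip()] = int(key)
    return dict(sorted(labels.items()))


def lookup_label_ids(labels, query):
    """Resolve one label or a list of labels to class ids, rejecting unknown names."""
    names = [query] if isinstance(query, str) else list(query)
    unknown = [name for name in names if name not in labels]
    if unknown:
        raise ValueError(f"unknown labels {unknown}; browse the label index for valid names")
    return [labels[name] for name in names]
```

With this toy map, `lookup_label_ids(build_label_index(ID2LABEL), "golden retriever")` returns `[207]`, and both "goldfish" and "Carassius auratus" resolve to id `1`.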

## Inference

Run the bundled demo script from the repo root:

```bash
python demo_inference.py
```

This writes `demo.png` using `NiT-XL` with the settings below.

```python
from pathlib import Path
import torch
from diffusers import DiffusionPipeline

model_dir = Path("./NiT-XL").resolve()
pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    local_files_only=True,
    custom_pipeline=str(model_dir / "pipeline.py"),
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

print(pipe.id2label[207])
print(pipe.get_label_ids("golden retriever"))

generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
    class_labels="golden retriever",
    height=512,
    width=512,
    num_inference_steps=250,
    guidance_scale=2.05,
    guidance_interval=(0.0, 0.7),
    generator=generator,
).images[0]
image.save("demo.png")
```

Always load a **variant subfolder** (e.g. `./NiT-XL`, `./NiT-L`, `./NiT-B`, or `./NiT-S`), not the repo root; each variant carries its own `model_index.json`.

For NiT-S / NiT-B at 256×256 (official defaults):

```python
from pathlib import Path
import torch
from diffusers import DiffusionPipeline

model_dir = Path("./NiT-S").resolve()  # or ./NiT-B
pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    local_files_only=True,
    custom_pipeline=str(model_dir / "pipeline.py"),
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

image = pipe(
    class_labels="golden retriever",
    height=256,
    width=256,
    num_inference_steps=250,
    guidance_scale=2.25,
    guidance_interval=(0.0, 0.7),
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]
```
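The `guidance_interval` argument restricts classifier-free guidance (CFG) to part of the sampling trajectory. The function below is a minimal, dependency-free sketch of that idea, not the pipeline's actual code: inside the interval it applies standard CFG, `uncond + scale * (cond - uncond)`; outside it, it returns the conditional prediction alone. It assumes a normalized time `t` in [0, 1]; which end of that range is pure noise depends on the pipeline's convention, and NiT's `pipeline.py` may organize this differently (for example, by skipping the unconditional forward pass outside the interval).

```python
def guided_velocity(v_cond, v_uncond, t, guidance_scale, guidance_interval=(0.0, 0.7)):
    """Classifier-free guidance applied only while t lies inside guidance_interval.

    v_cond / v_uncond stand in for the model's class-conditional and unconditional
    predictions (plain lists of floats here so the sketch has no dependencies).
    t is assumed to be a normalized time in [0, 1] in the sampler's own convention.
    """
    low, high = guidance_interval
    if low <= t <= high:
        # Standard CFG: push away from the unconditional prediction by guidance_scale.
        return [u + guidance_scale * (c - u) for c, u in zip(v_cond, v_uncond)]
    # Outside the interval, use the conditional prediction without guidance.
    return list(v_cond)
```

With `guidance_scale=2.05`, a step at `t=0.5` is guided, while a step at `t=0.9` simply returns the conditional prediction.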

To load from the Hub, pass the Hugging Face model id (`UserID/RepoID`) and pick the variant with `subfolder`:

```python
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "BiliSakura/NiT-diffusers",
    subfolder="NiT-XL",
    custom_pipeline="pipeline.py",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")
```

## Citation

```bibtex
@article{wang2025native,
  title={Native-Resolution Image Synthesis},
  author={Wang, Zidong and Bai, Lei and Yue, Xiangyu and Ouyang, Wanli and Zhang, Yiyuan},
  year={2025},
  eprint={2506.03131},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```