---
license: apache-2.0
language:
  - en
library_name: diffusers
tags:
  - diffusers
  - image-generation
  - class-conditional
  - nit
pipeline_tag: unconditional-image-generation
widget:
  - output:
      url: demo.png
---

# NiT-diffusers

Native diffusers implementation of **NiT** (Native-resolution Image Transformer). Each variant folder is self-contained:

- `pipeline.py` — `NiTPipeline`
- `scheduler/scheduler_config.json` — `FlowMatchEulerDiscreteScheduler` config (class ships with Diffusers)
- `transformer/nit_transformer_2d.py` — `NiTTransformer2DModel`
- `vae/` — `AutoencoderDC` weights + config

Inference needs no separate `NiT-diffusers` package: install `diffusers` from PyPI and the custom code in the variant directory supplies the rest.
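
Because each variant folder has to be complete on its own, it can help to check a local copy before loading it. The following is a minimal sketch, not part of this repository: `EXPECTED_ENTRIES` and `missing_variant_entries` are hypothetical names, and the entry list is taken from the layout above plus the `model_index.json` that each variant carries.

```python
from pathlib import Path

# Entries from the layout list above, plus the per-variant model_index.json.
# The tuple and the helper below are illustrative, not part of the repo.
EXPECTED_ENTRIES = (
    "model_index.json",
    "pipeline.py",
    "scheduler/scheduler_config.json",
    "transformer/nit_transformer_2d.py",
    "vae",
)


def missing_variant_entries(variant_dir):
    """Return the expected files or folders that are absent from variant_dir."""
    root = Path(variant_dir)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]


# An empty list means the folder has everything the layout above describes.
print(missing_variant_entries("./NiT-XL"))
```

The check only looks at names; it does not validate weights or config contents.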

## Available checkpoints

| Checkpoint | Path | Resolution | Recommended settings |
| --- | --- | --- | --- |
| NiT-S | `./NiT-S` | 256×256 | 250 steps, CFG 2.25, interval (0.0, 0.7) |
| NiT-B | `./NiT-B` | 256×256 | 250 steps, CFG 2.25, interval (0.0, 0.7) |
| NiT-L | `./NiT-L` | 512×512 | 250 steps, CFG 2.05, interval (0.0, 0.7) |
| NiT-XL | `./NiT-XL` | 512×512 | 250 steps, CFG 2.05, interval (0.0, 0.7) |
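
The table can also be kept as data so the right call arguments travel with the checkpoint name. This is a sketch under assumptions: `RECOMMENDED_SETTINGS` and `recommended_kwargs` are hypothetical names, and the keys simply mirror the keyword arguments used in the inference examples in this README.

```python
# Values copied from the table above; names are illustrative only.
_INTERVAL = (0.0, 0.7)

RECOMMENDED_SETTINGS = {
    "NiT-S": {"height": 256, "width": 256, "guidance_scale": 2.25},
    "NiT-B": {"height": 256, "width": 256, "guidance_scale": 2.25},
    "NiT-L": {"height": 512, "width": 512, "guidance_scale": 2.05},
    "NiT-XL": {"height": 512, "width": 512, "guidance_scale": 2.05},
}


def recommended_kwargs(variant):
    """Return a fresh dict of call arguments for the given checkpoint name."""
    if variant not in RECOMMENDED_SETTINGS:
        known = ", ".join(sorted(RECOMMENDED_SETTINGS))
        raise ValueError(f"unknown variant {variant!r}; expected one of: {known}")
    kwargs = dict(RECOMMENDED_SETTINGS[variant])
    # Shared across all four checkpoints in the table.
    kwargs["num_inference_steps"] = 250
    kwargs["guidance_interval"] = _INTERVAL
    return kwargs


print(recommended_kwargs("NiT-XL"))
```

With a loaded pipeline, the result could be passed as `pipe(class_labels="golden retriever", **recommended_kwargs("NiT-XL"))`.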

## ImageNet class labels

Each variant keeps an English `id2label` map directly in its own `model_index.json` (DiT-style).

- `pipe.id2label` — inspect id → English label correspondence
- `pipe.labels` — reverse map (English synonym → id), sorted for browsing
- `pipe.get_label_ids("golden retriever")`
- `pipe(class_labels="golden retriever", ...)` — string labels resolved automatically
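
To illustrate how a reverse map like `pipe.labels` can be derived from `id2label`, here is a standalone sketch. It assumes the DiT-style convention of comma-separated synonyms per class and uses a three-entry excerpt of ImageNet labels; `build_label_map` and the free function `get_label_ids` are illustrative stand-ins, not the code in `pipeline.py`.

```python
# Small excerpt of ImageNet-1k labels in comma-separated synonym form.
# JSON object keys are strings, so the ids are converted with int().
ID2LABEL = {
    "1": "goldfish, Carassius auratus",
    "207": "golden retriever",
    "388": "giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca",
}


def build_label_map(id2label):
    """Map every synonym to its class id, sorted by name for browsing."""
    labels = {}
    for class_id, names in id2label.items():
        for name in names.split(","):
            labels[name.strip()] = int(class_id)
    return dict(sorted(labels.items()))


def get_label_ids(labels, names):
    """Resolve one label string or a list of label strings to class ids."""
    if isinstance(names, str):
        names = [names]
    unknown = [name for name in names if name not in labels]
    if unknown:
        raise ValueError(f"unknown label(s): {unknown}")
    return [labels[name] for name in names]


LABELS = build_label_map(ID2LABEL)
print(get_label_ids(LABELS, "golden retriever"))    # [207]
print(get_label_ids(LABELS, ["panda", "goldfish"]))  # [388, 1]
```

Matching here is exact and case-sensitive; whether the bundled pipeline normalizes input differently is not stated in this README.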

## Inference

Run the bundled demo script from the repo root:

```bash
python demo_inference.py
```

This writes `demo.png` using `NiT-XL` with the settings below.

```python
from pathlib import Path
import torch
from diffusers import DiffusionPipeline

model_dir = Path("./NiT-XL").resolve()
pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    local_files_only=True,
    custom_pipeline=str(model_dir / "pipeline.py"),
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

print(pipe.id2label[207])
print(pipe.get_label_ids("golden retriever"))

generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
    class_labels="golden retriever",
    height=512,
    width=512,
    num_inference_steps=250,
    guidance_scale=2.05,
    guidance_interval=(0.0, 0.7),
    generator=generator,
).images[0]
image.save("demo.png")
```
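
The `guidance_interval=(0.0, 0.7)` argument limits classifier-free guidance to part of the sampling trajectory. How `NiTPipeline` implements this is not shown in this README; the sketch below only illustrates the general idea, assuming a normalized time `t` in [0, 1] and assuming the plain conditional prediction is used outside the interval. `guided_prediction` is a hypothetical name, and plain lists stand in for tensors.

```python
def guided_prediction(cond, uncond, t, guidance_scale, guidance_interval):
    """Apply classifier-free guidance only while t lies inside the interval.

    cond / uncond: model outputs with and without the class label.
    t: normalized sampling time in [0, 1] (assumed convention).
    """
    low, high = guidance_interval
    if low <= t <= high:
        # Standard CFG combination: uncond + scale * (cond - uncond).
        return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]
    # Outside the interval, fall back to the conditional output (assumption).
    return list(cond)


inside = guided_prediction([1.0], [0.0], t=0.5, guidance_scale=2.0, guidance_interval=(0.0, 0.7))
outside = guided_prediction([1.0], [0.0], t=0.9, guidance_scale=2.0, guidance_interval=(0.0, 0.7))
print(inside, outside)  # [2.0] [1.0]
```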

Load a **variant subfolder** (e.g. `./NiT-XL`, `./NiT-L`, `./NiT-B`, or `./NiT-S`), not the repo root.

For NiT-S / NiT-B at 256×256 (official defaults):

```python
from pathlib import Path
import torch
from diffusers import DiffusionPipeline

model_dir = Path("./NiT-S").resolve()  # or ./NiT-B
pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    local_files_only=True,
    custom_pipeline=str(model_dir / "pipeline.py"),
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

image = pipe(
    class_labels="golden retriever",
    height=256,
    width=256,
    num_inference_steps=250,
    guidance_scale=2.25,
    guidance_interval=(0.0, 0.7),
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]
```

To load from the Hub, pass the repository id in Hugging Face `UserID/RepoID` form and select the variant with `subfolder`:

```python
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "BiliSakura/NiT-diffusers",
    subfolder="NiT-XL",
    custom_pipeline="pipeline.py",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")
```

## Citation

```bibtex
@article{wang2025native,
  title={Native-Resolution Image Synthesis},
  author={Wang, Zidong and Bai, Lei and Yue, Xiangyu and Ouyang, Wanli and Zhang, Yiyuan},
  year={2025},
  eprint={2506.03131},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```