---
license: apache-2.0
language:
- en
library_name: diffusers
tags:
- diffusers
- image-generation
- class-conditional
- nit
pipeline_tag: unconditional-image-generation
widget:
- output:
    url: demo_images/demo_sde250_class207_seed42.png
---

# NiT-XL Diffusers (Class-Conditional)

A Native-resolution Image Transformer (NiT-XL) checkpoint packaged as a Diffusers-style repository with vendored custom code.

## What is included

- `transformer/`: `NiTTransformer2DModel` weights + config
- `scheduler/`: `NiTFlowMatchScheduler` config
- `vae/`: `AutoencoderDC` weights + config
- `custom_pipeline/`: local, self-contained implementations of:
  - `NiTPipeline`
  - `NiTTransformer2DModel`
  - `NiTFlowMatchScheduler`
- `test_inference.py`: standalone sampling script

This repository does **not** depend on an external `NiT-diffusers` checkout during inference. It includes a root `pipeline.py` custom entrypoint for Diffusers dynamic loading.

## Quickstart

### 1) Environment

Install dependencies (example):

```bash
pip install torch diffusers safetensors
```

If using this project's environment:

```bash
conda activate rsgen
```

### 2) Generate a demo image

Run from this repository root:

```bash
python test_inference.py \
  --class-label 207 \
  --height 512 \
  --width 512 \
  --steps 250 \
  --mode sde \
  --guidance-scale 2.05 \
  --guidance-low 0.0 \
  --guidance-high 0.7 \
  --output demo_images/demo_sde250_class207_seed42.png
```

## Python usage

```python
from pathlib import Path

import torch
from diffusers import DiffusionPipeline

model_dir = Path(".").resolve()
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = (
    torch.bfloat16
    if device == "cuda" and torch.cuda.is_bf16_supported()
    else torch.float32
)

pipe = DiffusionPipeline.from_pretrained(
    model_dir,
    custom_pipeline=str(model_dir / "pipeline.py"),
    local_files_only=True,
).to(device)

if device == "cuda":
    pipe.transformer.to(dtype=dtype)
    pipe.vae.to(dtype=dtype)

gen = torch.Generator(device=device).manual_seed(42)
result = pipe(
    class_labels=[207],
    height=512,
    width=512,
    num_inference_steps=250,
    mode="sde",
    guidance_scale=2.05,
    guidance_interval=(0.0, 0.7),
    generator=gen,
)
result.images[0].save("demo_images/sample.png")
```

For remote Hub loading:

```python
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "BiliSakura/NiT-XL-diffusers",
    custom_pipeline="pipeline",
)
```

## Recommended inference settings

- Resolution: `512x512`
- Mode: `sde`
- Steps: `250`
- Guidance scale: `2.05`
- Guidance interval: `(0.0, 0.7)`

Using a very low step count (for example `2`) is only a smoke test and will produce low-quality images.

## Demo

![NiT-XL demo image](demo_images/demo_sde250_class207_seed42.png)

## Citation

If you use this model or the NiT method in your work, please cite:

```bibtex
@article{wang2025native,
  title={Native-Resolution Image Synthesis},
  author={Wang, Zidong and Bai, Lei and Yue, Xiangyu and Ouyang, Wanli and Zhang, Yiyuan},
  year={2025},
  eprint={2506.03131},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```

## Notes

- This is a class-conditional generator (ImageNet label ids), not a text-to-image model.
- For reproducibility, set `--seed`.
- The vendored custom pipeline keeps inference behavior consistent without external code dependencies.
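The `guidance_interval` setting gates classifier-free guidance to part of the sampling trajectory. As an illustration of the idea only (not the vendored pipeline's exact implementation, which lives in `pipeline.py`), the sketch below applies guidance when a normalized timestep falls inside the interval; `cfg_mix` is a hypothetical helper name, and the normalized-time convention is an assumption.

```python
# Illustrative sketch of interval-gated classifier-free guidance (CFG).
# Assumption: guidance_interval bounds a normalized time t in [0, 1];
# the actual convention in the vendored pipeline may differ.

def cfg_mix(uncond: float, cond: float, scale: float, t: float,
            interval: tuple[float, float]) -> float:
    """Blend unconditional and conditional predictions.

    Guidance is applied only when t lies inside `interval`; outside it,
    the conditional prediction is used unchanged (effective scale 1.0).
    """
    low, high = interval
    if low <= t <= high:
        return uncond + scale * (cond - uncond)
    return cond

# With the recommended settings (scale 2.05, interval (0.0, 0.7)),
# a step at t = 0.5 is guided while a step at t = 0.9 is not.
guided = cfg_mix(uncond=0.0, cond=1.0, scale=2.05, t=0.5, interval=(0.0, 0.7))
unguided = cfg_mix(uncond=0.0, cond=1.0, scale=2.05, t=0.9, interval=(0.0, 0.7))
```

Restricting guidance to the early-to-middle portion of sampling is a common way to keep global structure conditioned on the class label while letting the final steps run unguided.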