---
license: mit
library_name: diffusers
tags:
- text-to-image
- diffusion
- nitro-e
- amd
- dc-ae-lite
base_model: amd/Nitro-E
---

# Nitro-E 512px Lite - Fast Decoding Variant

This is the Nitro-E 512px text-to-image diffusion model with **DC-AE-Lite** for faster image decoding.

## Key Features

- 🚀 **1.8× Faster Decoding**: Uses DC-AE-Lite instead of standard DC-AE
- 🎯 **Same Quality**: Similar reconstruction quality to standard DC-AE
- ⚡ **Drop-in Compatible**: Uses the same Nitro-E transformer weights (see the sketch after this list)
- 💾 **Memory Efficient**: Smaller decoder footprint

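Because only the decoder differs, the lite variant can in principle be assembled from the standard pipeline by swapping its VAE. The snippet below is a minimal sketch of that idea; whether the DC-AE-Lite checkpoint loads directly via `AutoencoderDC` and whether `NitroEPipeline` exposes its autoencoder as `pipe.vae` are assumptions to verify against the actual pipeline code.

```python
import torch
from diffusers import AutoencoderDC, NitroEPipeline

# Start from the standard 512px pipeline; the transformer weights are shared
# between the standard and lite variants.
pipe = NitroEPipeline.from_pretrained(
    "blanchon/nitro_e_512",
    torch_dtype=torch.bfloat16,
)

# Swap in the faster DC-AE-Lite decoder (assumption: the checkpoint is loadable
# with AutoencoderDC and the pipeline stores its autoencoder as `pipe.vae`).
pipe.vae = AutoencoderDC.from_pretrained(
    "dc-ai/dc-ae-lite-f32c32-diffusers",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")
```
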
## Performance Comparison

| VAE Variant | Decoding Speed | Quality |
|-------------|----------------|---------|
| DC-AE (Standard) | 1.0× | Reference |
| **DC-AE-Lite** | **1.8×** | Similar |

This makes Nitro-E even faster for real-time applications.

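The numbers above refer to the decoding step only, so a rough way to check them is to time just the VAE decoder on a fixed latent. A minimal sketch, assuming `pipe` (standard DC-AE) and `pipe_lite` (this model) are already loaded on `"cuda"` as in the Usage section, that each exposes its autoencoder as `.vae`, and that the 32-channel 16×16 latent layout listed under Technical Details applies:

```python
import time
import torch

# Dummy latent matching the f32c32 layout for 512x512 images: (1, 32, 16, 16).
latents = torch.randn(1, 32, 16, 16, dtype=torch.bfloat16, device="cuda")

def mean_decode_time(vae, iters=20):
    with torch.no_grad():
        vae.decode(latents)              # warm-up
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            vae.decode(latents)
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

print("DC-AE (standard):", mean_decode_time(pipe.vae))
print("DC-AE-Lite      :", mean_decode_time(pipe_lite.vae))
```
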
## Model Details

- **Transformer**: Nitro-E 512px (304M parameters)
- **VAE**: DC-AE-Lite-f32c32 (faster decoder)
- **Text Encoder**: Llama-3.2-1B
- **Scheduler**: Flow Matching with Euler Discrete
- **Attention**: Alternating Subregion Attention (ASA)

## Usage

```python
import torch
from diffusers import NitroEPipeline

# Load the lite variant
pipe = NitroEPipeline.from_pretrained(
    "blanchon/nitro_e_512_lite",
    torch_dtype=torch.bfloat16
)
pipe.to("cuda")

# Generate an image (1.8x faster decoding)
prompt = "A hot air balloon in the shape of a heart, Grand Canyon"
image = pipe(
    prompt=prompt,
    width=512,
    height=512,
    num_inference_steps=20,
    guidance_scale=4.5,
).images[0]

image.save("output.png")
```

## When to Use This Variant

**Use DC-AE-Lite (this model) when:**
- You need faster inference
- You are running real-time applications
- You are batch-processing many images
- Decoding is your bottleneck

**Use standard DC-AE when:**
- You need the best possible reconstruction quality
- Decoding speed is not critical

## Technical Details

### Architecture

- **Type**: E-MMDiT (Efficient Multi-scale Masked Diffusion Transformer)
- **Attention**: Alternating Subregion Attention (ASA)
- **Text Encoder**: Llama-3.2-1B
- **VAE**: DC-AE-Lite-f32c32 (1.8× faster decoding)
- **Scheduler**: Flow Matching with Euler Discrete Scheduler
- **Latent Size**: 16×16 for 512×512 images (see the sketch below)

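As a quick sanity check of the f32c32 layout (spatial downsampling factor 32, 32 latent channels), a 512×512 image should encode to a 16×16 latent. A minimal sketch, assuming the DC-AE-Lite checkpoint loads with diffusers' `AutoencoderDC` and that its `encode` output exposes a `.latent` tensor:

```python
import torch
from diffusers import AutoencoderDC

# Assumption: the lite checkpoint is loadable as a diffusers AutoencoderDC.
vae = AutoencoderDC.from_pretrained("dc-ai/dc-ae-lite-f32c32-diffusers")

# A 512x512 RGB image: f32 downsampling and c32 channels give a 32x16x16 latent.
image = torch.randn(1, 3, 512, 512)
with torch.no_grad():
    latent = vae.encode(image).latent
print(latent.shape)  # expected: torch.Size([1, 32, 16, 16])
```
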
### Recommended Settings

- **Steps**: 20 (good quality/speed trade-off)
- **Guidance Scale**: 4.5 (balanced)
- **Resolution**: 512×512 (optimized)

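For reproducible comparisons at these settings, it can help to fix the random seed. A minimal sketch reusing `pipe` from the Usage section; passing a `generator` is standard for diffusers pipelines but has not been verified against `NitroEPipeline` specifically:

```python
import torch

# Recommended settings with a fixed seed for reproducible outputs.
generator = torch.Generator(device="cuda").manual_seed(0)
image = pipe(
    prompt="A hot air balloon in the shape of a heart, Grand Canyon",
    width=512,
    height=512,
    num_inference_steps=20,
    guidance_scale=4.5,
    generator=generator,  # assumption: NitroEPipeline accepts a generator kwarg
).images[0]
image.save("seeded_output.png")
```
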
## Citation

```bibtex
@article{nitro-e-2025,
  title={Nitro-E: Efficient Training of Diffusion Models},
  author={AMD AI Group},
  journal={arXiv preprint arXiv:2510.27135},
  year={2025}
}
```

## License

Copyright (c) 2025 Advanced Micro Devices, Inc. All Rights Reserved.

Licensed under the MIT License.

## Related Models

- [Nitro-E 512px (Standard DC-AE)](https://huggingface.co/blanchon/nitro_e_512)
- [Nitro-E 1024px](https://huggingface.co/blanchon/nitro_e_1024)
- [Original AMD Nitro-E](https://huggingface.co/amd/Nitro-E)
- [DC-AE-Lite VAE](https://huggingface.co/dc-ai/dc-ae-lite-f32c32-diffusers)