---
license: mit
library_name: diffusers
tags:
- text-to-image
- diffusion
- nitro-e
- amd
- dc-ae-lite
base_model: amd/Nitro-E
---

# Nitro-E 512px Lite - Fast Decoding Variant

This is the Nitro-E 512px text-to-image diffusion model with **DC-AE-Lite** for faster image decoding.

## Key Features

- 🚀 **1.8× Faster Decoding**: Uses DC-AE-Lite instead of standard DC-AE
- 🎯 **Same Quality**: Similar reconstruction quality to standard DC-AE
- ⚡ **Drop-in Compatible**: Uses the same Nitro-E transformer weights (see the sketch after this list)
- 💾 **Memory Efficient**: Smaller decoder footprint

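Because only the decoder differs, the lite variant can in principle be assembled from the standard pipeline by swapping its VAE. The snippet below is a minimal sketch of that idea; whether the DC-AE-Lite checkpoint loads directly via `AutoencoderDC` and whether `NitroEPipeline` exposes its autoencoder as `pipe.vae` are assumptions to verify against the actual pipeline code.

```python
import torch
from diffusers import AutoencoderDC, NitroEPipeline

# Start from the standard 512px pipeline; the transformer weights are shared
# between the standard and lite variants.
pipe = NitroEPipeline.from_pretrained(
    "blanchon/nitro_e_512",
    torch_dtype=torch.bfloat16,
)

# Swap in the faster DC-AE-Lite decoder (assumption: the checkpoint is loadable
# with AutoencoderDC and the pipeline stores its autoencoder as `pipe.vae`).
pipe.vae = AutoencoderDC.from_pretrained(
    "dc-ai/dc-ae-lite-f32c32-diffusers",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")
```
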
## Performance Comparison

| VAE Variant | Decoding Speed | Quality |
|-------------|----------------|---------|
| DC-AE (Standard) | 1.0× | Reference |
| **DC-AE-Lite** | **1.8×** | Similar |

This makes Nitro-E even faster for real-time applications.

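The numbers above refer to the decoding step only, so a rough way to check them is to time just the VAE decoder on a fixed latent. A minimal sketch, assuming `pipe` (standard DC-AE) and `pipe_lite` (this model) are already loaded on `"cuda"` as in the Usage section, that each exposes its autoencoder as `.vae`, and that the 32-channel 16×16 latent layout listed under Technical Details applies:

```python
import time
import torch

# Dummy latent matching the f32c32 layout for 512x512 images: (1, 32, 16, 16).
latents = torch.randn(1, 32, 16, 16, dtype=torch.bfloat16, device="cuda")

def mean_decode_time(vae, iters=20):
    with torch.no_grad():
        vae.decode(latents)              # warm-up
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            vae.decode(latents)
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

print("DC-AE (standard):", mean_decode_time(pipe.vae))
print("DC-AE-Lite      :", mean_decode_time(pipe_lite.vae))
```
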
## Model Details

- **Transformer**: Nitro-E 512px (304M parameters)
- **VAE**: DC-AE-Lite-f32c32 (faster decoder)
- **Text Encoder**: Llama-3.2-1B
- **Scheduler**: Flow Matching with Euler Discrete
- **Attention**: Alternating Subregion Attention (ASA)

## Usage

```python
import torch
from diffusers import NitroEPipeline

# Load the lite variant
pipe = NitroEPipeline.from_pretrained(
    "blanchon/nitro_e_512_lite",
    torch_dtype=torch.bfloat16
)
pipe.to("cuda")

# Generate an image (1.8x faster decoding)
prompt = "A hot air balloon in the shape of a heart, Grand Canyon"
image = pipe(
    prompt=prompt,
    width=512,
    height=512,
    num_inference_steps=20,
    guidance_scale=4.5,
).images[0]

image.save("output.png")
```

## When to Use This Variant

**Use DC-AE-Lite (this model) when:**
- You need faster inference
- You are running real-time applications
- You are batch-processing many images
- Decoding is your bottleneck

**Use standard DC-AE when:**
- You need the best possible reconstruction quality
- Decoding speed is not critical

## Technical Details

### Architecture

- **Type**: E-MMDiT (Efficient Multi-scale Masked Diffusion Transformer)
- **Attention**: Alternating Subregion Attention (ASA)
- **Text Encoder**: Llama-3.2-1B
- **VAE**: DC-AE-Lite-f32c32 (1.8× faster decoding)
- **Scheduler**: Flow Matching with Euler Discrete Scheduler
- **Latent Size**: 16×16 for 512×512 images (see the sketch below)

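As a quick sanity check of the f32c32 layout (spatial downsampling factor 32, 32 latent channels), a 512×512 image should encode to a 16×16 latent. A minimal sketch, assuming the DC-AE-Lite checkpoint loads with diffusers' `AutoencoderDC` and that its `encode` output exposes a `.latent` tensor:

```python
import torch
from diffusers import AutoencoderDC

# Assumption: the lite checkpoint is loadable as a diffusers AutoencoderDC.
vae = AutoencoderDC.from_pretrained("dc-ai/dc-ae-lite-f32c32-diffusers")

# A 512x512 RGB image: f32 downsampling and c32 channels give a 32x16x16 latent.
image = torch.randn(1, 3, 512, 512)
with torch.no_grad():
    latent = vae.encode(image).latent
print(latent.shape)  # expected: torch.Size([1, 32, 16, 16])
```
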
### Recommended Settings

- **Steps**: 20 (good quality/speed trade-off)
- **Guidance Scale**: 4.5 (balanced)
- **Resolution**: 512×512 (optimized)

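For reproducible comparisons at these settings, it can help to fix the random seed. A minimal sketch reusing `pipe` from the Usage section; passing a `generator` is standard for diffusers pipelines but has not been verified against `NitroEPipeline` specifically:

```python
import torch

# Recommended settings with a fixed seed for reproducible outputs.
generator = torch.Generator(device="cuda").manual_seed(0)
image = pipe(
    prompt="A hot air balloon in the shape of a heart, Grand Canyon",
    width=512,
    height=512,
    num_inference_steps=20,
    guidance_scale=4.5,
    generator=generator,  # assumption: NitroEPipeline accepts a generator kwarg
).images[0]
image.save("seeded_output.png")
```
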
## Citation

```bibtex
@article{nitro-e-2025,
  title={Nitro-E: Efficient Training of Diffusion Models},
  author={AMD AI Group},
  journal={arXiv preprint arXiv:2510.27135},
  year={2025}
}
```

## License

Copyright (c) 2025 Advanced Micro Devices, Inc. All Rights Reserved.

Licensed under the MIT License.

## Related Models

- [Nitro-E 512px (Standard DC-AE)](https://huggingface.co/blanchon/nitro_e_512)
- [Nitro-E 1024px](https://huggingface.co/blanchon/nitro_e_1024)
- [Original AMD Nitro-E](https://huggingface.co/amd/Nitro-E)
- [DC-AE-Lite VAE](https://huggingface.co/dc-ai/dc-ae-lite-f32c32-diffusers)