---
license: mit
library_name: diffusers
tags:
- text-to-image
- diffusion
- nitro-e
- amd
- dc-ae-lite
base_model: amd/Nitro-E
---
# Nitro-E 512px Lite - Fast Decoding Variant
This is the Nitro-E 512px text-to-image diffusion model with **DC-AE-Lite** for faster image decoding.
## Key Features
- πŸš€ **1.8Γ— Faster Decoding**: Uses DC-AE-Lite instead of standard DC-AE
- 🎯 **Same Quality**: Similar reconstruction quality to standard DC-AE
- ⚑ **Drop-in Compatible**: Uses the same Nitro-E transformer weights
- πŸ’Ύ **Memory Efficient**: Smaller decoder footprint
## Performance Comparison
| VAE Variant | Decoding Speed | Quality |
|-------------|---------------|---------|
| DC-AE (Standard) | 1.0Γ— | Reference |
| **DC-AE-Lite** | **1.8Γ—** | Similar |
This makes Nitro-E even faster for real-time applications!
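Note that the 1.8× figure applies to the decoding stage only; the end-to-end gain depends on what fraction of total latency decoding accounts for. As a rough sanity check (the 30% decode fraction below is an illustrative assumption, not a measured figure for this model), Amdahl's law gives the overall speedup:

```python
def overall_speedup(decode_fraction: float, decode_speedup: float) -> float:
    """Amdahl's law: speed up only the decoding stage of the pipeline."""
    return 1.0 / ((1.0 - decode_fraction) + decode_fraction / decode_speedup)

# If decoding were 30% of total latency (illustrative assumption),
# a 1.8x faster decoder speeds up the whole pipeline by about 1.15x.
print(round(overall_speedup(0.30, 1.8), 2))
```

The fewer the sampling steps, the larger the share of time spent in the decoder, so few-step pipelines like Nitro-E benefit more from a faster VAE.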
## Model Details
- **Transformer**: Nitro-E 512px (304M parameters)
- **VAE**: DC-AE-Lite-f32c32 (faster decoder)
- **Text Encoder**: Llama-3.2-1B
- **Scheduler**: Flow Matching with Euler Discrete
- **Attention**: Alternating Subregion Attention (ASA)
## Usage
```python
import torch
from diffusers import NitroEPipeline

# Load the lite variant (DC-AE-Lite decoder, same transformer weights)
pipe = NitroEPipeline.from_pretrained(
    "blanchon/nitro_e_512_lite",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# Generate an image (decoding is ~1.8x faster than with standard DC-AE)
prompt = "A hot air balloon in the shape of a heart over the Grand Canyon"
image = pipe(
    prompt=prompt,
    width=512,
    height=512,
    num_inference_steps=20,
    guidance_scale=4.5,
).images[0]
image.save("output.png")
```
## When to Use This Variant
**Use DC-AE-Lite (this model) when:**
- You need faster inference
- Running real-time applications
- Batch processing many images
- Decoding is your bottleneck
**Use standard DC-AE when:**
- You need absolute best reconstruction quality
- Decoding speed is not critical
## Technical Details
### Architecture
- **Type**: E-MMDiT (Efficient Multi-scale Masked Diffusion Transformer)
- **Attention**: Alternating Subregion Attention (ASA)
- **Text Encoder**: Llama-3.2-1B
- **VAE**: DC-AE-Lite-f32c32 (1.8Γ— faster decoding)
- **Scheduler**: Flow Matching with Euler Discrete Scheduler
- **Latent Size**: 16Γ—16 for 512Γ—512 images
### Recommended Settings
- **Steps**: 20 (good quality/speed trade-off)
- **Guidance Scale**: 4.5 (balanced)
- **Resolution**: 512Γ—512 (optimized)
## Citation
```bibtex
@article{nitro-e-2025,
  title={Nitro-E: Efficient Training of Diffusion Models},
  author={AMD AI Group},
  journal={arXiv preprint arXiv:2510.27135},
  year={2025}
}
```
## License
Copyright (c) 2025 Advanced Micro Devices, Inc. All Rights Reserved.
Licensed under the MIT License.
## Related Models
- [Nitro-E 512px (Standard DC-AE)](https://huggingface.co/blanchon/nitro_e_512)
- [Nitro-E 1024px](https://huggingface.co/blanchon/nitro_e_1024)
- [Original AMD Nitro-E](https://huggingface.co/amd/Nitro-E)
- [DC-AE-Lite VAE](https://huggingface.co/dc-ai/dc-ae-lite-f32c32-diffusers)