---
license: mit
library_name: diffusers
tags:
- text-to-image
- diffusion
- nitro-e
- amd
base_model: amd/Nitro-E
---

# Nitro-E 1024px - Diffusers Integration

This is the Nitro-E 1024px text-to-image diffusion model in diffusers format.

## Model Description

Nitro-E is a family of text-to-image diffusion models focused on highly efficient training. With just 304M parameters, Nitro-E is designed to be resource-friendly for both training and inference.

**Key Features:**
- 304M parameters
- Efficient training: 1.5 days on 8x AMD Instinct MI300X GPUs
- High throughput on a single AMD Instinct MI300X GPU
- Consumer GPU support: fast 1024px image generation on a Strix Halo iGPU

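As a rough illustration of what the 304M parameter count means in practice, the sketch below estimates weight storage for two common dtypes. It assumes the 304M figure covers only the diffusion model itself, so the text encoder and VAE would add to the total; `weight_megabytes` is a hypothetical helper, not part of any library.

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}

NUM_PARAMS = 304_000_000  # parameter count stated in this model card


def weight_megabytes(num_params, dtype):
    """Approximate weight storage in megabytes (1 MB = 1e6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


bf16_mb = weight_megabytes(NUM_PARAMS, "bfloat16")
fp32_mb = weight_megabytes(NUM_PARAMS, "float32")
print(f"bfloat16: {bf16_mb:.0f} MB, float32: {fp32_mb:.0f} MB")
```

Under these assumptions the weights take roughly 0.6 GB in bfloat16. Activations and the other pipeline components are not included in this estimate.
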
## Model Variant

This is the **1024px** variant, optimized for generating 1024x1024 images.

**Note**: This variant uses standard attention, without Alternating Subregion Attention (ASA) subsampling.

## Original Model

This model is based on [amd/Nitro-E](https://huggingface.co/amd/Nitro-E) and has been converted to the diffusers format for easier integration and use.

## Usage

```python
import torch
from diffusers import NitroEPipeline

# Load the pipeline in bfloat16 to reduce memory use
pipe = NitroEPipeline.from_pretrained("blanchon/nitro_e_1024", torch_dtype=torch.bfloat16)
# "cuda" also selects AMD GPUs on ROCm builds of PyTorch
pipe = pipe.to("cuda")

# Generate a 1024x1024 image
prompt = "A hot air balloon in the shape of a heart grand canyon"
image = pipe(
    prompt=prompt,
    width=1024,
    height=1024,
    num_inference_steps=20,
    guidance_scale=4.5,
).images[0]

image.save("output.png")
```

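The `guidance_scale` argument in the usage example conventionally controls classifier-free guidance, where the model's unconditional and prompt-conditioned predictions are blended. The sketch below shows the usual combination rule on plain Python lists. `apply_guidance` is a hypothetical helper, and whether `NitroEPipeline` applies exactly this formula internally is an assumption.

```python
def apply_guidance(uncond, cond, guidance_scale):
    """Classifier-free guidance: move from the unconditional prediction
    toward (and past) the conditional one by a factor of guidance_scale."""
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


# Toy per-element predictions standing in for model outputs.
uncond = [0.0, 1.0, -2.0]
cond = [1.0, 1.0, 0.0]

no_guidance = apply_guidance(uncond, cond, 1.0)  # scale 1.0 returns the conditional prediction
guided = apply_guidance(uncond, cond, 4.5)       # scale 4.5 amplifies the prompt's influence
```

Higher scales generally follow the prompt more closely at some cost in diversity, so the value of 4.5 used above is a reasonable starting point rather than a fixed requirement.
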
## Technical Details

### Architecture
- **Type**: E-MMDiT (Efficient Multimodal Diffusion Transformer)
- **Attention**: Standard attention
- **Text Encoder**: Llama-3.2-1B
- **VAE**: DC-AE-f32c32 from MIT HAN Lab
- **Scheduler**: Flow matching with an Euler discrete scheduler
- **Sample Size**: 32 (latent space)

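To make the sample size and scheduler entries concrete, here is a minimal sketch of Euler sampling for flow matching in latent space. It assumes that `f32c32` in the VAE name means 32x spatial downsampling and 32 latent channels (consistent with a sample size of 32 for 1024px images) and that the sigma schedule runs uniformly from 1 to 0; the real scheduler may space its steps differently. `latent_shape` and `euler_flow_sample` are hypothetical helpers, and the velocity function is a toy stand-in for the transformer.

```python
import random

# Assumptions read off the "DC-AE-f32c32" name.
VAE_DOWNSAMPLE = 32   # spatial compression factor
LATENT_CHANNELS = 32  # number of latent channels


def latent_shape(height, width):
    """Return (channels, latent_height, latent_width) for an image size."""
    if height % VAE_DOWNSAMPLE or width % VAE_DOWNSAMPLE:
        raise ValueError("height and width must be multiples of 32")
    return (LATENT_CHANNELS, height // VAE_DOWNSAMPLE, width // VAE_DOWNSAMPLE)


def euler_flow_sample(velocity_fn, noise, num_steps):
    """Integrate dx/dsigma = v(x, sigma) from sigma=1 (noise) to sigma=0 (data)."""
    sigmas = [1.0 - i / num_steps for i in range(num_steps + 1)]
    x = list(noise)
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        v = velocity_fn(x, sigma)
        # Euler update; the step (sigma_next - sigma) is negative.
        x = [xi + (sigma_next - sigma) * vi for xi, vi in zip(x, v)]
    return x


# Toy check on the straight path x_sigma = (1 - sigma) * data + sigma * noise,
# whose exact velocity is (noise - data). A trained model would predict this.
channels, h, w = latent_shape(1024, 1024)
rng = random.Random(0)
size = channels * h * w
data = [rng.gauss(0.0, 1.0) for _ in range(size)]
noise = [rng.gauss(0.0, 1.0) for _ in range(size)]


def toy_velocity(x, sigma):
    return [n - d for n, d in zip(noise, data)]


sample = euler_flow_sample(toy_velocity, noise, num_steps=20)
max_error = max(abs(s - d) for s, d in zip(sample, data))
```

With the exact velocity, the Euler steps recover the data latent up to floating-point error. In the real pipeline the transformer supplies the velocity and the VAE decodes the final latent into a 1024x1024 image.
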
### Training
- **Dataset**: ~25M images (real + synthetic)
- **Duration**: 1.5 days on 8x AMD Instinct MI300X GPUs
- **Training Details**: See [Nitro-E Technical Report](https://arxiv.org/abs/2510.27135)

## Citation

If you use this model, please cite:

```bibtex
@article{nitro-e-2025,
  title={Nitro-E: Efficient Training of Diffusion Models},
  author={AMD AI Group},
  journal={arXiv preprint arXiv:2510.27135},
  year={2025}
}
```

## License

Copyright (c) 2025 Advanced Micro Devices, Inc. All Rights Reserved.

Licensed under the MIT License. See the [LICENSE](https://mit-license.org/) for details.

## Related Projects

- [Nitro-T](https://github.com/AMD-AGI/Nitro-T): Efficient training of diffusion models
- [Nitro-1](https://github.com/AMD-AGI/Nitro-1): One-step distillation of diffusion models
- [Original Nitro-E Repository](https://github.com/AMD-AGI/Nitro-E)
- [AMD Nitro-E on HuggingFace](https://huggingface.co/amd/Nitro-E)