---
license: mit
library_name: diffusers
tags:
- text-to-image
- diffusion
- nitro-e
- amd
base_model: amd/Nitro-E
---
# Nitro-E 1024px - Diffusers Integration
This is the Nitro-E 1024px text-to-image diffusion model in diffusers format.
## Model Description
Nitro-E is a family of text-to-image diffusion models focused on highly efficient training. With just 304M parameters, Nitro-E is designed to be resource-friendly for both training and inference.
**Key Features:**
- 304M parameters
- Efficient training: 1.5 days on 8x AMD Instinct MI300X GPUs
- High throughput: optimized for samples-per-second inference on a single MI300X
- Consumer GPU support: generates 1024px images quickly on a Strix Halo iGPU
## Model Variant
This is the **1024px** variant, optimized for generating 1024x1024 images.
**Note**: This variant uses standard attention (no ASA subsampling).
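As a quick sanity check on what "1024px" means internally: the DC-AE-f32c32 autoencoder listed under Technical Details downsamples each spatial dimension by a factor of 32, so the transformer actually denoises a 32x32 latent grid. A minimal sketch restating the numbers from this card:

```python
# The DC-AE-f32c32 VAE compresses each spatial dimension by a factor
# of 32, so a 1024x1024 image is denoised as a 32x32 latent grid --
# matching the "Sample Size: 32" listed under Technical Details.
vae_downscale_factor = 32      # the "f32" in DC-AE-f32c32
image_resolution = 1024        # this variant's output resolution

latent_size = image_resolution // vae_downscale_factor
print(latent_size)  # 32
```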
## Original Model
This model is based on [amd/Nitro-E](https://huggingface.co/amd/Nitro-E) and has been converted to the diffusers format for easier integration and use.
## Usage
```python
import torch
from diffusers import NitroEPipeline

# Load the pipeline in bfloat16 and move it to the GPU
pipe = NitroEPipeline.from_pretrained(
    "blanchon/nitro_e_1024",
    torch_dtype=torch.bfloat16,
)
pipe = pipe.to("cuda")

# Generate a 1024x1024 image
prompt = "A hot air balloon in the shape of a heart. Grand Canyon"
image = pipe(
    prompt=prompt,
    width=1024,
    height=1024,
    num_inference_steps=20,
    guidance_scale=4.5,
).images[0]
image.save("output.png")
```
## Technical Details
### Architecture
- **Type**: E-MMDiT (Efficient Multi-scale Masked Diffusion Transformer)
- **Attention**: Standard attention
- **Text Encoder**: Llama-3.2-1B
- **VAE**: DC-AE-f32c32 from MIT-Han-Lab
- **Scheduler**: Flow Matching with Euler Discrete Scheduler
- **Sample Size**: 32 (latent space)
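For intuition about the flow-matching Euler scheduler listed above, here is a minimal, library-free sketch of the update it performs: starting from noise at t=1, the sample moves along a velocity field toward data at t=0 in fixed Euler steps. The straight-line velocity used here is a toy stand-in for the trained transformer, not the actual model:

```python
# Toy sketch (pure Python, no ML libraries) of the Euler update used by
# a flow-matching scheduler: integrate dx/dt = velocity(x, t) from noise
# at t=1 down to data at t=0 in a fixed number of steps.
def euler_flow_matching(x, velocity, num_steps=20):
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = 1.0 - i * dt          # current timestep, from 1 toward 0
        x = x - velocity(x, t) * dt  # one Euler step along the field
    return x

# With the toy straight-line velocity v(x, t) = x / t, the trajectory
# follows x(t) = C * t, so the sample reaches (approximately) 0 at t=0.
final = euler_flow_matching(5.0, lambda x, t: x / t, num_steps=20)
```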
### Training
- **Dataset**: ~25M images (real + synthetic)
- **Duration**: 1.5 days on 8x AMD Instinct MI300X GPUs
- **Training Details**: See [Nitro-E Technical Report](https://arxiv.org/abs/2510.27135)
## Citation
If you use this model, please cite:
```bibtex
@article{nitro-e-2025,
  title={Nitro-E: Efficient Training of Diffusion Models},
  author={AMD AI Group},
  journal={arXiv preprint arXiv:2510.27135},
  year={2025}
}
```
## License
Copyright (c) 2025 Advanced Micro Devices, Inc. All Rights Reserved.
Licensed under the MIT License. See the [LICENSE](https://mit-license.org/) for details.
## Related Projects
- [Nitro-T](https://github.com/AMD-AGI/Nitro-T): Efficient training of diffusion models
- [Nitro-1](https://github.com/AMD-AGI/Nitro-1): One-step distillation of diffusion models
- [Original Nitro-E Repository](https://github.com/AMD-AGI/Nitro-E)
- [AMD Nitro-E on HuggingFace](https://huggingface.co/amd/Nitro-E)