---
license: mit
library_name: diffusers
tags:
- text-to-image
- diffusion
- nitro-e
- amd
base_model: amd/Nitro-E
---

# Nitro-E 512px - Diffusers Integration

This is the Nitro-E 512px text-to-image diffusion model in diffusers format.

## Model Description

Nitro-E is a family of text-to-image diffusion models focused on highly efficient training. With just 304M parameters, Nitro-E is designed to be resource-friendly for both training and inference.

**Key Features:**
- 304M parameters
- Efficient training: 1.5 days on 8x AMD Instinct MI300X GPUs
- High throughput: 18.8 samples/second on single MI300X
- Consumer GPU support: 0.16s per 512px image on Strix Halo iGPU
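As a rough sanity check on these figures (not a benchmark), the per-image latency implied by the MI300X throughput can be compared against the Strix Halo number:

```python
# Back-of-envelope comparison of the throughput figures above; the hardware
# numbers come from this card, and the ratio is illustrative only.
mi300x_throughput = 18.8                 # samples/second on a single MI300X
mi300x_latency = 1 / mi300x_throughput   # ~0.053 s per image
strix_halo_latency = 0.16                # s per 512px image on the iGPU

print(round(strix_halo_latency / mi300x_latency, 1))  # ~3.0
```

In other words, the consumer iGPU is roughly 3x slower per image than a single datacenter MI300X.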

## Model Variant

This is the **512px** variant, optimized for generating 512x512 images.

**Note**: This variant uses Alternating Subregion Attention (ASA) for efficiency.

## Original Model

This model is based on [amd/Nitro-E](https://huggingface.co/amd/Nitro-E) and has been converted to the diffusers format for easier integration and use.

## Usage

```python
import torch
from diffusers import NitroEPipeline

# Load pipeline
pipe = NitroEPipeline.from_pretrained("blanchon/nitro_e_512", torch_dtype=torch.bfloat16)
pipe = pipe.to("cuda")

# Generate 512x512 image
prompt = "A hot air balloon in the shape of a heart over the Grand Canyon"
image = pipe(
    prompt=prompt,
    width=512,
    height=512,
    num_inference_steps=20,
    guidance_scale=4.5,
).images[0]

image.save("output.png")
```
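Each of the `num_inference_steps` updates above is one Euler step of the flow-matching ODE. A minimal sketch of that update, assuming the model predicts a velocity field (this is illustrative, not the diffusers implementation):

```python
# One Euler step of a flow-matching ODE: the sample moves along the
# predicted velocity as the noise level sigma decreases toward 0.
def euler_step(x, velocity, sigma, sigma_next):
    return [xi + (sigma_next - sigma) * vi for xi, vi in zip(x, velocity)]

x = [1.0, -0.5]   # toy 2-element "latent"
v = [0.2, 0.4]    # toy model-predicted velocity
x_next = euler_step(x, v, sigma=1.0, sigma_next=0.9)
print(x_next)  # ≈ [0.98, -0.54]
```

The scheduler repeats this step along a decreasing sigma schedule until it reaches a clean latent, which the VAE then decodes into the final image.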

## Technical Details

### Architecture
- **Type**: E-MMDiT (Efficient Multimodal Diffusion Transformer)
- **Attention**: Alternating Subregion Attention (ASA)
- **Text Encoder**: Llama-3.2-1B
- **VAE**: DC-AE-f32c32 from MIT-Han-Lab
- **Scheduler**: Flow Matching with Euler Discrete Scheduler
- **Sample Size**: 16 (latent space)
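The sample size of 16 follows from the VAE's naming: in DC-AE-f32c32, "f32" denotes 32x spatial downsampling and "c32" denotes 32 latent channels, so a 512x512 image encodes to a 16x16 latent grid. A quick check of that arithmetic:

```python
# DC-AE-f32c32: "f32" = 32x spatial downsampling, "c32" = 32 latent channels.
image_size = 512
downsample_factor = 32
latent_channels = 32

sample_size = image_size // downsample_factor
print(sample_size)  # 16 -> the transformer operates on a 16x16x32 latent
```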

### Training
- **Dataset**: ~25M images (real + synthetic)
- **Duration**: 1.5 days on 8x AMD Instinct MI300X GPUs
- **Training Details**: See [Nitro-E Technical Report](https://arxiv.org/abs/2510.27135)

## Citation

If you use this model, please cite:

```bibtex
@article{nitro-e-2025,
  title={Nitro-E: Efficient Training of Diffusion Models},
  author={AMD AI Group},
  journal={arXiv preprint arXiv:2510.27135},
  year={2025}
}
```

## License

Copyright (c) 2025 Advanced Micro Devices, Inc. All Rights Reserved.

Licensed under the MIT License. See the [MIT License](https://mit-license.org/) for the full text.

## Related Projects

- [Nitro-T](https://github.com/AMD-AGI/Nitro-T): Efficient training of diffusion models
- [Nitro-1](https://github.com/AMD-AGI/Nitro-1): One-step distillation of diffusion models
- [Original Nitro-E Repository](https://github.com/AMD-AGI/Nitro-E)
- [AMD Nitro-E on HuggingFace](https://huggingface.co/amd/Nitro-E)