	---
	license: mit
	library_name: diffusers
	tags:
	- text-to-image
	- diffusion
	- nitro-e
	- amd
	base_model: amd/Nitro-E
	---

	# Nitro-E 1024px - Diffusers Integration

	This is the Nitro-E 1024px text-to-image diffusion model in diffusers format.

	## Model Description

	Nitro-E is a family of text-to-image diffusion models focused on highly efficient training. With just 304M parameters, Nitro-E is designed to be resource-friendly for both training and inference.

	Key Features:
	- 304M parameters
	- Efficient training: 1.5 days on 8x AMD Instinct MI300X GPUs
	- High throughput: tuned for samples per second on a single AMD Instinct MI300X
	- Consumer GPU support: fast 1024px image generation on the Strix Halo iGPU
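
To put the parameter count in perspective, the following back-of-the-envelope sketch estimates the memory needed to hold the weights alone. It assumes the 304M figure covers only the diffusion model (not the text encoder or VAE) and ignores activations; `weight_bytes` is a hypothetical helper written for this illustration.

```python
# Rough weight-memory estimate for a 304M-parameter model.
# Assumption: 304M counts the diffusion model only; the text encoder,
# VAE, and activation memory are not included.
PARAMS = 304_000_000
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weight_bytes(dtype: str, params: int = PARAMS) -> int:
    """Return the number of bytes needed to store `params` weights in `dtype`."""
    return params * BYTES_PER_PARAM[dtype]


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_bytes(dtype) / 1e6:.0f} MB")
```

Under these assumptions, loading in `bfloat16` halves the weight footprint relative to `float32`.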

	## Model Variant

	This is the 1024px variant, optimized for generating 1024x1024 images.

	Note: This variant uses standard attention, without Alternating Subregion Attention (ASA) subsampling.

	## Original Model

	This model is based on [amd/Nitro-E](https://huggingface.co/amd/Nitro-E) and has been converted to the diffusers format for easier integration and use.

	## Usage

	```python
	import torch
	from diffusers import NitroEPipeline

	# Load pipeline
	pipe = NitroEPipeline.from_pretrained("blanchon/nitro_e_1024", torch_dtype=torch.bfloat16)
	pipe = pipe.to("cuda")

	# Generate 1024x1024 image
	prompt = "A hot air balloon in the shape of a heart grand canyon"
	image = pipe(
	    prompt=prompt,
	    width=1024,
	    height=1024,
	    num_inference_steps=20,
	    guidance_scale=4.5,
	).images[0]

	image.save("output.png")
	```
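
The `num_inference_steps` argument sets how many solver steps the sampler takes. This card lists a flow-matching Euler scheduler, so the toy sketch below shows what one such sampling loop looks like in one dimension. It is an illustration only, not the pipeline's internals: `euler_flow_sample`, the uniform time grid, and the constant velocity field are all assumptions made for the example, and the real scheduler may use a different (for example shifted) time schedule.

```python
def euler_flow_sample(x_noise, velocity_fn, num_steps=20):
    """Integrate dx/dt = v(x, t) from t=1 (noise) to t=0 (data) with Euler steps."""
    x = x_noise
    for i in range(num_steps):
        t = 1.0 - i / num_steps
        t_next = 1.0 - (i + 1) / num_steps
        # Euler update: move along the predicted velocity by the (negative) time step.
        x = x + (t_next - t) * velocity_fn(x, t)
    return x


# Toy setup: a straight path x_t = (1 - t) * data + t * noise,
# whose velocity is the constant (noise - data).
data, noise = 0.25, -1.5
sample = euler_flow_sample(noise, lambda x, t: noise - data, num_steps=20)
print(abs(sample - data) < 1e-9)
```

With a learned, non-constant velocity field the result depends on the step count, which is why more steps generally trade speed for fidelity.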

	## Technical Details

	### Architecture
	- Type: E-MMDiT (Efficient Multimodal Diffusion Transformer)
	- Attention: Standard attention
	- Text Encoder: Llama-3.2-1B
	- VAE: DC-AE-f32c32 from MIT-Han-Lab
	- Scheduler: Flow Matching with Euler Discrete Scheduler
	- Sample Size: 32 (latent space)
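
The sample size of 32 follows from the VAE name: in DC-AE-f32c32, `f32` denotes a 32x spatial downsampling factor and `c32` denotes 32 latent channels. A minimal sketch of that arithmetic, where `latent_shape` is a hypothetical helper and not part of any library:

```python
VAE_DOWNSAMPLE = 32   # "f32" in DC-AE-f32c32: spatial compression factor
LATENT_CHANNELS = 32  # "c32" in DC-AE-f32c32: latent channel count


def latent_shape(height: int, width: int) -> tuple:
    """Return (channels, latent_height, latent_width) for an image size in pixels."""
    if height % VAE_DOWNSAMPLE or width % VAE_DOWNSAMPLE:
        raise ValueError(f"height and width must be multiples of {VAE_DOWNSAMPLE}")
    return (LATENT_CHANNELS, height // VAE_DOWNSAMPLE, width // VAE_DOWNSAMPLE)


print(latent_shape(1024, 1024))  # (32, 32, 32)
```

A 1024x1024 image therefore maps to a 32x32 latent grid, matching the sample size listed above.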

	### Training
	- Dataset: ~25M images (real + synthetic)
	- Duration: 1.5 days on 8x AMD Instinct MI300X GPUs
	- Training Details: See [Nitro-E Technical Report](https://arxiv.org/abs/2510.27135)
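
As a rough sense of scale, the stated duration converts to GPU-hours as follows. This assumes all 8 GPUs ran for the full 1.5 days; `gpu_hours` is a hypothetical helper for the arithmetic.

```python
def gpu_hours(days: float, num_gpus: int) -> float:
    """Convert wall-clock days on `num_gpus` devices into total GPU-hours."""
    return days * 24 * num_gpus


print(gpu_hours(1.5, 8))  # 288.0
```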

	## Citation

	If you use this model, please cite:

	```bibtex
	@article{nitro-e-2025,
	title={Nitro-E: Efficient Training of Diffusion Models},
	author={AMD AI Group},
	journal={arXiv preprint arXiv:2510.27135},
	year={2025}
	}
	```

	## License

	Copyright (c) 2025 Advanced Micro Devices, Inc. All Rights Reserved.

	Licensed under the MIT License. See the [LICENSE](https://mit-license.org/) for details.

	## Related Projects

	- [Nitro-T](https://github.com/AMD-AGI/Nitro-T): Efficient training of diffusion models
	- [Nitro-1](https://github.com/AMD-AGI/Nitro-1): One-step distillation of diffusion models
	- [Original Nitro-E Repository](https://github.com/AMD-AGI/Nitro-E)
	- [AMD Nitro-E on HuggingFace](https://huggingface.co/amd/Nitro-E)