# AuraFlow

AuraFlow is inspired by [Stable Diffusion 3](../pipelines/stable_diffusion/stable_diffusion_3) and is by far the largest text-to-image generation model that comes with an Apache 2.0 license. This model achieves state-of-the-art results on the [GenEval](https://github.com/djghosh13/geneval) benchmark.

It was developed by the Fal team and more details about it can be found in [this blog post](https://blog.fal.ai/auraflow/).

> [!TIP]
> AuraFlow can be quite expensive to run on consumer hardware devices. However, you can perform a suite of optimizations to run it faster and in a more memory-friendly manner. Check out [this section](https://huggingface.co/blog/sd3#memory-optimizations-for-sd3) for more details.
## Quantization

Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have a varying impact on image quality depending on the model.

Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [AuraFlowPipeline](/docs/diffusers/pr_12595/en/api/pipelines/aura_flow#diffusers.AuraFlowPipeline) for inference with bitsandbytes.
```py
import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, AuraFlowTransformer2DModel, AuraFlowPipeline
from transformers import BitsAndBytesConfig as BitsAndBytesConfig, T5EncoderModel

quant_config = BitsAndBytesConfig(load_in_8bit=True)
text_encoder_8bit = T5EncoderModel.from_pretrained(
    "fal/AuraFlow",
    subfolder="text_encoder",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
transformer_8bit = AuraFlowTransformer2DModel.from_pretrained(
    "fal/AuraFlow",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

pipeline = AuraFlowPipeline.from_pretrained(
    "fal/AuraFlow",
    text_encoder=text_encoder_8bit,
    transformer=transformer_8bit,
    torch_dtype=torch.float16,
    device_map="balanced",
)

prompt = "a tiny astronaut hatching from an egg on the moon"
image = pipeline(prompt).images[0]
image.save("auraflow.png")
```
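As a rough back-of-the-envelope sanity check, you can estimate the weight-memory savings from 8-bit storage: float16 uses 2 bytes per parameter, while 8-bit weights use roughly 1 byte. The parameter count below is an approximation for illustration, not an official figure:

```python
# Rough estimate of weight memory before and after 8-bit quantization.
# The parameter count is an assumption for illustration only.

def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed to store the weights alone (GiB)."""
    return num_params * bytes_per_param / 1024**3

transformer_params = 6.8e9  # assumed transformer size, ~6.8B parameters

fp16_gb = weight_memory_gb(transformer_params, 2.0)  # float16: 2 bytes/param
int8_gb = weight_memory_gb(transformer_params, 1.0)  # 8-bit: ~1 byte/param

print(f"fp16: {fp16_gb:.1f} GiB, int8: {int8_gb:.1f} GiB")
```

Note that this only covers the weights; activations, the VAE, and the text encoder add to the total footprint at inference time.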
Loading [GGUF checkpoints](https://huggingface.co/docs/diffusers/quantization/gguf) is also supported:
```py
import torch
from diffusers import (
    AuraFlowPipeline,
    GGUFQuantizationConfig,
    AuraFlowTransformer2DModel,
)

transformer = AuraFlowTransformer2DModel.from_single_file(
    "https://huggingface.co/city96/AuraFlow-v0.3-gguf/blob/main/aura_flow_0.3-Q2_K.gguf",
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)

pipeline = AuraFlowPipeline.from_pretrained(
    "fal/AuraFlow-v0.3",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)

prompt = "a cute pony in a field of flowers"
image = pipeline(prompt).images[0]
image.save("auraflow.png")
```
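GGUF quantization levels trade checkpoint size against quality. The sketch below estimates approximate file sizes for a few common quant types; the bits-per-weight figures and the parameter count are rough assumptions (real GGUF files mix tensor types, so actual sizes differ):

```python
# Approximate GGUF checkpoint sizes by quantization level.
# Bits-per-weight values and the parameter count are rough assumptions,
# not exact figures for the city96/AuraFlow-v0.3-gguf files.
APPROX_BITS_PER_WEIGHT = {"Q2_K": 2.6, "Q4_K": 4.5, "Q8_0": 8.5, "F16": 16.0}

def approx_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate on-disk size in GiB for a given quantization level."""
    return num_params * bits_per_weight / 8 / 1024**3

num_params = 6.8e9  # assumed transformer size
for name, bpw in APPROX_BITS_PER_WEIGHT.items():
    print(f"{name}: ~{approx_size_gb(num_params, bpw):.1f} GiB")
```

Lower-bit quants like `Q2_K` fit on smaller GPUs but can visibly degrade output quality, so prefer the highest quant level your hardware can hold.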
## Support for `torch.compile()`

AuraFlow can be compiled with `torch.compile()` to reduce inference latency, even across different resolutions. First, install PyTorch nightly following the instructions from [here](https://pytorch.org/). The snippet below shows the changes needed to enable this:

```diff
+ torch.fx.experimental._config.use_duck_shape = False
+ pipeline.transformer = torch.compile(
    pipeline.transformer, fullgraph=True, dynamic=True
)
```

Setting `use_duck_shape` to `False` instructs the compiler not to reuse the same symbolic variable for input sizes that happen to be equal, so dimensions such as height and width can vary independently without triggering recompilation. For more details, check out [this comment](https://github.com/huggingface/diffusers/pull/11327#discussion_r2047659790).

This yields speed improvements ranging from 100% at low resolutions to 30% at 1536x1536 resolution.

Thanks to [AstraliteHeart](https://github.com/huggingface/diffusers/pull/11297/) who helped us rewrite the [AuraFlowTransformer2DModel](/docs/diffusers/pr_12595/en/api/models/aura_flow_transformer2d#diffusers.AuraFlowTransformer2DModel) class so that the above works for different resolutions ([PR](https://github.com/huggingface/diffusers/pull/11297/)).
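To build intuition for why duck shaping matters here, the following is a hypothetical pure-Python sketch of the idea (not the actual `torch.fx` implementation): with duck shaping on, two dimensions that happen to be equal at trace time share one symbolic variable, so the compiled graph assumes they are always equal; with it off, each dimension gets its own symbol.

```python
# Conceptual sketch of "duck shaping" (hypothetical; not torch.fx internals).
# With duck shaping ON, dimensions that are equal at trace time share one
# symbol, so a graph traced at 1024x1024 assumes height == width forever.
# With it OFF, height and width stay independent, avoiding recompilation
# for non-square resolutions.
import itertools

def assign_symbols(sizes, duck_shape: bool):
    counter = itertools.count()
    by_value = {}
    symbols = []
    for size in sizes:
        if duck_shape and size in by_value:
            symbols.append(by_value[size])  # reuse the symbol for equal sizes
        else:
            sym = f"s{next(counter)}"
            by_value[size] = sym
            symbols.append(sym)
    return symbols

# Tracing at 1024x1024: duck shaping merges the two dimensions into one symbol.
print(assign_symbols([1024, 1024], duck_shape=True))   # ['s0', 's0']
# Disabling it keeps height and width as separate symbols.
print(assign_symbols([1024, 1024], duck_shape=False))  # ['s0', 's1']
```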
## AuraFlowPipeline[[diffusers.AuraFlowPipeline]]

<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class diffusers.AuraFlowPipeline</name><anchor>diffusers.AuraFlowPipeline</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/pipelines/aura_flow/pipeline_aura_flow.py#L123</source><parameters>[{"name": "tokenizer", "val": ": T5Tokenizer"}, {"name": "text_encoder", "val": ": UMT5EncoderModel"}, {"name": "vae", "val": ": AutoencoderKL"}, {"name": "transformer", "val": ": AuraFlowTransformer2DModel"}, {"name": "scheduler", "val": ": FlowMatchEulerDiscreteScheduler"}]</parameters><paramsdesc>- **tokenizer** (`T5TokenizerFast`) --
  Tokenizer of class
  [T5Tokenizer](https://huggingface.co/docs/transformers/model_doc/t5#transformers.T5Tokenizer).
- **text_encoder** (`T5EncoderModel`) --
  Frozen text-encoder. AuraFlow uses
  [T5](https://huggingface.co/docs/transformers/model_doc/t5#transformers.T5EncoderModel), specifically the
  [EleutherAI/pile-t5-xl](https://huggingface.co/EleutherAI/pile-t5-xl) variant.
- **vae** ([AutoencoderKL](/docs/diffusers/pr_12595/en/api/models/autoencoderkl#diffusers.AutoencoderKL)) --
  Variational Auto-Encoder (VAE) model to encode and decode images to and from latent representations.
- **transformer** ([AuraFlowTransformer2DModel](/docs/diffusers/pr_12595/en/api/models/aura_flow_transformer2d#diffusers.AuraFlowTransformer2DModel)) --
  Conditional Transformer (MMDiT and DiT) architecture to denoise the encoded image latents.
- **scheduler** ([FlowMatchEulerDiscreteScheduler](/docs/diffusers/pr_12595/en/api/schedulers/flow_match_euler_discrete#diffusers.FlowMatchEulerDiscreteScheduler)) --
  A scheduler to be used in combination with `transformer` to denoise the encoded image latents.</paramsdesc><paramgroups>0</paramgroups></docstring>
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>__call__</name><anchor>diffusers.AuraFlowPipeline.__call__</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/pipelines/aura_flow/pipeline_aura_flow.py#L438</source><parameters>[{"name": "prompt", "val": ": typing.Union[str, typing.List[str]] = None"}, {"name": "negative_prompt", "val": ": typing.Union[str, typing.List[str]] = None"}, {"name": "num_inference_steps", "val": ": int = 50"}, {"name": "sigmas", "val": ": typing.List[float] = None"}, {"name": "guidance_scale", "val": ": float = 3.5"}, {"name": "num_images_per_prompt", "val": ": typing.Optional[int] = 1"}, {"name": "height", "val": ": typing.Optional[int] = 1024"}, {"name": "width", "val": ": typing.Optional[int] = 1024"}, {"name": "generator", "val": ": typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None"}, {"name": "latents", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "prompt_embeds", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "prompt_attention_mask", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "negative_prompt_embeds", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "negative_prompt_attention_mask", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "max_sequence_length", "val": ": int = 256"}, {"name": "output_type", "val": ": typing.Optional[str] = 'pil'"}, {"name": "return_dict", "val": ": bool = True"}, {"name": "attention_kwargs", "val": ": typing.Optional[typing.Dict[str, typing.Any]] = None"}, {"name": "callback_on_step_end", "val": ": typing.Union[typing.Callable[[int, int, typing.Dict], NoneType], diffusers.callbacks.PipelineCallback, diffusers.callbacks.MultiPipelineCallbacks, NoneType] = None"}, {"name": "callback_on_step_end_tensor_inputs", "val": ": typing.List[str] = ['latents']"}]</parameters><paramsdesc>- **prompt** (`str` or `List[str]`, *optional*) -- | |
  The prompt or prompts to guide the image generation. If not defined, one has to pass `prompt_embeds`
  instead.
- **negative_prompt** (`str` or `List[str]`, *optional*) --
  The prompt or prompts not to guide the image generation. If not defined, one has to pass
  `negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is
  less than `1`).
- **height** (`int`, *optional*, defaults to `self.transformer.config.sample_size * self.vae_scale_factor`) --
  The height in pixels of the generated image. This is set to 1024 by default for best results.
- **width** (`int`, *optional*, defaults to `self.transformer.config.sample_size * self.vae_scale_factor`) --
  The width in pixels of the generated image. This is set to 1024 by default for best results.
- **num_inference_steps** (`int`, *optional*, defaults to 50) --
  The number of denoising steps. More denoising steps usually lead to a higher quality image at the
  expense of slower inference.
- **sigmas** (`List[float]`, *optional*) --
  Custom sigmas used to override the timestep spacing strategy of the scheduler. If `sigmas` is passed,
  `num_inference_steps` and `timesteps` must be `None`.
- **guidance_scale** (`float`, *optional*, defaults to 3.5) --
  Guidance scale as defined in [Classifier-Free Diffusion
  Guidance](https://huggingface.co/papers/2207.12598). `guidance_scale` is defined as `w` of equation 2
  of the [Imagen paper](https://huggingface.co/papers/2205.11487). Guidance scale is enabled by setting
  `guidance_scale > 1`. A higher guidance scale encourages the model to generate images that are closely
  linked to the text `prompt`, usually at the expense of lower image quality.
- **num_images_per_prompt** (`int`, *optional*, defaults to 1) --
  The number of images to generate per prompt.
- **generator** (`torch.Generator` or `List[torch.Generator]`, *optional*) --
  One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html)
  to make generation deterministic.
- **latents** (`torch.FloatTensor`, *optional*) --
  Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image
  generation. Can be used to tweak the same generation with different prompts. If not provided, a latents
  tensor will be generated by sampling using the supplied random `generator`.
- **prompt_embeds** (`torch.FloatTensor`, *optional*) --
  Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not
  provided, text embeddings will be generated from the `prompt` input argument.
- **prompt_attention_mask** (`torch.Tensor`, *optional*) --
  Pre-generated attention mask for text embeddings.
- **negative_prompt_embeds** (`torch.FloatTensor`, *optional*) --
  Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt
  weighting. If not provided, `negative_prompt_embeds` will be generated from the `negative_prompt` input
  argument.
- **negative_prompt_attention_mask** (`torch.Tensor`, *optional*) --
  Pre-generated attention mask for negative text embeddings.
- **output_type** (`str`, *optional*, defaults to `"pil"`) --
  The output format of the generated image. Choose between
  [PIL](https://pillow.readthedocs.io/en/stable/): `PIL.Image.Image` or `np.array`.
- **return_dict** (`bool`, *optional*, defaults to `True`) --
  Whether or not to return an [ImagePipelineOutput](/docs/diffusers/pr_12595/en/api/pipelines/ddim#diffusers.ImagePipelineOutput) instead
  of a plain tuple.
- **attention_kwargs** (`dict`, *optional*) --
  A kwargs dictionary that, if specified, is passed along to the `AttentionProcessor` as defined under
  `self.processor` in
  [diffusers.models.attention_processor](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).
- **callback_on_step_end** (`Callable`, *optional*) --
  A function that is called at the end of each denoising step during inference. The function is called
  with the following arguments: `callback_on_step_end(self: DiffusionPipeline, step: int, timestep: int,
  callback_kwargs: Dict)`. `callback_kwargs` will include a list of all tensors as specified by
  `callback_on_step_end_tensor_inputs`.
- **callback_on_step_end_tensor_inputs** (`List`, *optional*) --
  The list of tensor inputs for the `callback_on_step_end` function. The tensors specified in the list
  will be passed as the `callback_kwargs` argument. You will only be able to include variables listed in
  the `._callback_tensor_inputs` attribute of your pipeline class.
- **max_sequence_length** (`int`, defaults to 256) -- Maximum sequence length to use with the `prompt`.</paramsdesc><paramgroups>0</paramgroups></docstring>
Function invoked when calling the pipeline for generation.

<ExampleCodeBlock anchor="diffusers.AuraFlowPipeline.__call__.example">

Examples:

```py
>>> import torch
>>> from diffusers import AuraFlowPipeline

>>> pipe = AuraFlowPipeline.from_pretrained("fal/AuraFlow", torch_dtype=torch.float16)
>>> pipe = pipe.to("cuda")
>>> prompt = "A cat holding a sign that says hello world"
>>> image = pipe(prompt).images[0]
>>> image.save("aura_flow.png")
```

</ExampleCodeBlock>

Returns: [ImagePipelineOutput](/docs/diffusers/pr_12595/en/api/pipelines/ddim#diffusers.ImagePipelineOutput) or `tuple`:
If `return_dict` is `True`, [ImagePipelineOutput](/docs/diffusers/pr_12595/en/api/pipelines/ddim#diffusers.ImagePipelineOutput) is returned, otherwise a `tuple` is returned
where the first element is a list with the generated images.
</div>
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>encode_prompt</name><anchor>diffusers.AuraFlowPipeline.encode_prompt</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/pipelines/aura_flow/pipeline_aura_flow.py#L232</source><parameters>[{"name": "prompt", "val": ": typing.Union[str, typing.List[str]]"}, {"name": "negative_prompt", "val": ": typing.Union[str, typing.List[str]] = None"}, {"name": "do_classifier_free_guidance", "val": ": bool = True"}, {"name": "num_images_per_prompt", "val": ": int = 1"}, {"name": "device", "val": ": typing.Optional[torch.device] = None"}, {"name": "prompt_embeds", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "negative_prompt_embeds", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "prompt_attention_mask", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "negative_prompt_attention_mask", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "max_sequence_length", "val": ": int = 256"}, {"name": "lora_scale", "val": ": typing.Optional[float] = None"}]</parameters><paramsdesc>- **prompt** (`str` or `List[str]`, *optional*) -- | |
  The prompt to be encoded.
- **negative_prompt** (`str` or `List[str]`, *optional*) --
  The prompt not to guide the image generation. If not defined, one has to pass `negative_prompt_embeds`
  instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is less than `1`).
- **do_classifier_free_guidance** (`bool`, *optional*, defaults to `True`) --
  Whether to use classifier-free guidance or not.
- **num_images_per_prompt** (`int`, *optional*, defaults to 1) --
  Number of images that should be generated per prompt.
- **device** (`torch.device`, *optional*) --
  The torch device to place the resulting embeddings on.
- **prompt_embeds** (`torch.Tensor`, *optional*) --
  Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not
  provided, text embeddings will be generated from the `prompt` input argument.
- **prompt_attention_mask** (`torch.Tensor`, *optional*) --
  Pre-generated attention mask for text embeddings.
- **negative_prompt_embeds** (`torch.Tensor`, *optional*) --
  Pre-generated negative text embeddings.
- **negative_prompt_attention_mask** (`torch.Tensor`, *optional*) --
  Pre-generated attention mask for negative text embeddings.
- **max_sequence_length** (`int`, defaults to 256) -- Maximum sequence length to use for the prompt.
- **lora_scale** (`float`, *optional*) --
  A LoRA scale that will be applied to all LoRA layers of the text encoder if LoRA layers are loaded.</paramsdesc><paramgroups>0</paramgroups></docstring>

Encodes the prompt into text encoder hidden states.

</div></div>

<EditOnGithub source="https://github.com/huggingface/diffusers/blob/main/docs/source/en/api/pipelines/aura_flow.md" />