---
license: other
license_name: stabilityai-ai-community
license_link: LICENSE.md
tags:
- text-to-image
- stable-diffusion
- diffusers
inference: true
language:
- en
pipeline_tag: text-to-image
---

# Stable Diffusion 3.5 Large BF16

## Model

[Stable Diffusion 3.5 Large](https://stability.ai/news/introducing-stable-diffusion-3-5) is a Multimodal Diffusion Transformer (MMDiT) text-to-image model with improved performance in image quality, typography, complex prompt understanding, and resource efficiency.

Please note: This model is released under the [Stability Community License](https://stability.ai/community-license-agreement). Visit [Stability AI](https://stability.ai/license) to learn more, or [contact us](https://stability.ai/enterprise) for commercial licensing details.

### Model Description

- **Developed by:** Stability AI
- **Model type:** MMDiT text-to-image generative model
- **Model Description:** This model generates images based on text prompts. It is a [Multimodal Diffusion Transformer](https://arxiv.org/abs/2403.03206) that uses three fixed, pretrained text encoders, with QK-normalization to improve training stability.

### License

- **Community License:** Free for research, non-commercial, and commercial use by organizations or individuals with less than $1M in total annual revenue. More details can be found in the [Community License Agreement](https://stability.ai/community-license-agreement). Read more at https://stability.ai/license.
- **For individuals and organizations with annual revenue above $1M:** please [contact us](https://stability.ai/enterprise) to get an Enterprise License.

### Model Sources

For local or self-hosted use, we recommend [ComfyUI](https://github.com/comfyanonymous/ComfyUI) for node-based UI inference, or [diffusers](https://github.com/huggingface/diffusers) or the [SD3.5 GitHub repository](https://github.com/Stability-AI/sd3.5) for programmatic use.

- **ComfyUI:** [GitHub](https://github.com/comfyanonymous/ComfyUI), [Example Workflow](https://comfyanonymous.github.io/ComfyUI_examples/sd3/)
- **Hugging Face Space:** [Space](https://huggingface.co/spaces/stabilityai/stable-diffusion-3.5-large)
- **Diffusers:** [See below](#using-with-diffusers).
- **GitHub:** [SD3.5 repository](https://github.com/Stability-AI/sd3.5).
- **API Endpoints:**
  - [Stability AI API](https://platform.stability.ai/docs/api-reference#tag/Generate/paths/~1v2beta~1stable-image~1generate~1sd3/post)
  - [Replicate](https://replicate.com/stability-ai/stable-diffusion-3.5-large)
  - [Deepinfra](https://deepinfra.com/stabilityai/sd3.5)

### Implementation Details

- **QK Normalization:** Implements the QK-normalization technique to improve training stability.

- **Text Encoders:**
  - CLIPs: [OpenCLIP-ViT/G](https://github.com/mlfoundations/open_clip), [CLIP-ViT/L](https://github.com/openai/CLIP/tree/main), context length 77 tokens
  - T5: [T5-xxl](https://huggingface.co/google/t5-v1_1-xxl), context length 77/256 tokens at different stages of training

- **Training Data and Strategy:**

  This model was trained on a wide variety of data, including synthetic data and filtered publicly available data.

For more technical details of the original MMDiT architecture, please refer to the [Research paper](https://stability.ai/news/stable-diffusion-3-research-paper).
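
The QK-normalization idea can be sketched in plain NumPy. This is an illustrative toy, not the model's actual implementation: `rms_norm` and `qk_norm_attention` are hypothetical names, and the tensor shapes are arbitrary. The point is simply that queries and keys are normalized before the dot product, which bounds the attention logits and is what stabilizes training at scale.

```python
import numpy as np

def rms_norm(x, eps=1e-6):
    # RMS-normalize along the feature dimension (no learned scale in this toy).
    return x / np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def qk_norm_attention(q, k, v):
    # QK normalization: normalize q and k *before* the scaled dot product,
    # so the logits cannot blow up as activations grow during training.
    q, k = rms_norm(q), rms_norm(k)
    scores = softmax(q @ k.swapaxes(-1, -2) / np.sqrt(q.shape[-1]))
    return scores @ v

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(8, 64)) for _ in range(3))
out = qk_norm_attention(q, k, v)
print(out.shape)  # (8, 64)
```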

### Model Performance

See [blog](https://stability.ai/news/introducing-stable-diffusion-3-5) for our study of comparative performance in prompt adherence and aesthetic quality.

## Using with Diffusers

Upgrade to the latest version of the [🧨 diffusers library](https://github.com/huggingface/diffusers)

```
pip install -U diffusers
```

and then you can run

```py
import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16)
pipe = pipe.to("cuda")

image = pipe(
    "A capybara holding a sign that reads Hello World",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("capybara.png")
```
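
The `guidance_scale` argument controls classifier-free guidance: at each denoising step the model predicts noise twice, once with the prompt and once unconditionally, and the two predictions are combined so that larger scales push the sample harder toward the prompt. A minimal NumPy sketch of that combination step (illustrative only; `apply_cfg` is a hypothetical helper, not diffusers' internal code):

```python
import numpy as np

def apply_cfg(noise_uncond, noise_cond, guidance_scale):
    # Classifier-free guidance: extrapolate from the unconditional
    # prediction toward the text-conditioned one. A scale of 1.0 means
    # no extra guidance; larger values follow the prompt more strongly.
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)

# Toy stand-ins for the two noise predictions at one denoising step.
uncond = np.zeros((4, 4))
cond = np.ones((4, 4))
print(apply_cfg(uncond, cond, 3.5)[0, 0])  # 3.5
```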

### Contact

Please report any issues with the model or contact us:

* Safety issues: safety@stability.ai
* Security issues: security@stability.ai
* Privacy issues: privacy@stability.ai
* License and general: https://stability.ai/license
* Enterprise license: https://stability.ai/enterprise