Add files using upload-large-folder tool
- diffusers/.github/ISSUE_TEMPLATE/bug-report.yml +110 -0
- diffusers/.github/ISSUE_TEMPLATE/feature_request.md +20 -0
- diffusers/.github/ISSUE_TEMPLATE/new-model-addition.yml +31 -0
- diffusers/.github/ISSUE_TEMPLATE/remote-vae-pilot-feedback.yml +38 -0
- diffusers/.github/actions/setup-miniconda/action.yml +146 -0
- diffusers/.github/workflows/benchmark.yml +89 -0
- diffusers/.github/workflows/build_docker_images.yml +107 -0
- diffusers/.github/workflows/build_documentation.yml +27 -0
- diffusers/.github/workflows/build_pr_documentation.yml +23 -0
- diffusers/.github/workflows/mirror_community_pipeline.yml +102 -0
- diffusers/.github/workflows/nightly_tests.yml +612 -0
- diffusers/.github/workflows/notify_slack_about_release.yml +23 -0
- diffusers/.github/workflows/pr_dependency_test.yml +35 -0
- diffusers/.github/workflows/pr_flax_dependency_test.yml +38 -0
- diffusers/.github/workflows/pr_style_bot.yml +17 -0
- diffusers/.github/workflows/pr_test_fetcher.yml +177 -0
- diffusers/.github/workflows/pr_tests.yml +289 -0
- diffusers/.github/workflows/pr_tests_gpu.yml +296 -0
- diffusers/.github/workflows/pr_torch_dependency_test.yml +36 -0
- diffusers/.github/workflows/push_tests.yml +294 -0
- diffusers/.github/workflows/push_tests_fast.yml +98 -0
- diffusers/.github/workflows/push_tests_mps.yml +71 -0
- diffusers/.github/workflows/pypi_publish.yaml +81 -0
- diffusers/.github/workflows/release_tests_fast.yml +351 -0
- diffusers/.github/workflows/run_tests_from_a_pr.yml +74 -0
- diffusers/.github/workflows/ssh-pr-runner.yml +40 -0
- diffusers/.github/workflows/ssh-runner.yml +52 -0
- diffusers/.github/workflows/stale.yml +30 -0
- diffusers/.github/workflows/trufflehog.yml +18 -0
- diffusers/.github/workflows/typos.yml +14 -0
- diffusers/.github/workflows/update_metadata.yml +30 -0
- diffusers/.github/workflows/upload_pr_documentation.yml +16 -0
- diffusers/docs/source/_config.py +9 -0
- diffusers/docs/source/en/_toctree.yml +701 -0
- diffusers/docs/source/en/community_projects.md +90 -0
- diffusers/docs/source/en/conceptual/contribution.md +568 -0
- diffusers/docs/source/en/conceptual/ethical_guidelines.md +63 -0
- diffusers/docs/source/en/conceptual/evaluation.md +578 -0
- diffusers/docs/source/en/conceptual/philosophy.md +110 -0
- diffusers/docs/source/en/index.md +48 -0
- diffusers/docs/source/en/installation.md +194 -0
- diffusers/docs/source/en/quicktour.md +323 -0
- diffusers/docs/source/en/stable_diffusion.md +261 -0
- diffusers/docs/source/en/using-diffusers/conditional_image_generation.md +316 -0
- diffusers/docs/source/en/using-diffusers/consisid.md +96 -0
- diffusers/docs/source/en/using-diffusers/controlling_generation.md +217 -0
- diffusers/docs/source/en/using-diffusers/depth2img.md +46 -0
- diffusers/docs/source/en/using-diffusers/ip_adapter.md +790 -0
- diffusers/docs/source/en/using-diffusers/loading.md +583 -0
- diffusers/docs/source/en/using-diffusers/other-formats.md +512 -0
diffusers/.github/ISSUE_TEMPLATE/bug-report.yml
ADDED
@@ -0,0 +1,110 @@
+name: "\U0001F41B Bug Report"
+description: Report a bug on Diffusers
+labels: [ "bug" ]
+body:
+  - type: markdown
+    attributes:
+      value: |
+        Thanks a lot for taking the time to file this issue 🤗.
+        Issues do not only help to improve the library, but also publicly document common problems, questions, and workflows for the whole community!
+        Thus, issues are of the same importance as pull requests when contributing to this library ❤️.
+        In order to make your issue as **useful for the community as possible**, let's try to stick to some simple guidelines:
+        - 1. Please try to be as precise and concise as possible.
+             *Give your issue a fitting title. Assume that someone with very limited knowledge of Diffusers can understand your issue. Add links to the source code, documentation, other issues, pull requests, etc...*
+        - 2. If your issue is about something not working, **always** provide a reproducible code snippet. The reader should be able to reproduce your issue by **only copy-pasting your code snippet into a Python shell**.
+             *The community cannot solve your issue if it cannot reproduce it. If your bug is related to training, add your training script and make everything needed to train public. Otherwise, just add a simple Python code snippet.*
+        - 3. Add the **minimum** amount of code / context that is needed to understand and reproduce your issue.
+             *Make the life of maintainers easy. `diffusers` gets many issues every day. Make sure your issue is about one bug and one bug only. Make sure you add only the context and code needed to understand your issue - nothing more. Generally, every issue is a way of documenting this library, so try to make it a good documentation entry.*
+        - 4. For issues related to community pipelines (i.e., the pipelines located in the `examples/community` folder), please tag the author of the pipeline in your issue thread as those pipelines are not maintained.
+  - type: markdown
+    attributes:
+      value: |
+        For more in-detail information on how to write good issues you can have a look [here](https://huggingface.co/course/chapter8/5?fw=pt).
+  - type: textarea
+    id: bug-description
+    attributes:
+      label: Describe the bug
+      description: A clear and concise description of what the bug is. If you intend to submit a pull request for this issue, tell us in the description. Thanks!
+      placeholder: Bug description
+    validations:
+      required: true
+  - type: textarea
+    id: reproduction
+    attributes:
+      label: Reproduction
+      description: Please provide a minimal reproducible code snippet which we can copy/paste to reproduce the issue.
+      placeholder: Reproduction
+    validations:
+      required: true
+  - type: textarea
+    id: logs
+    attributes:
+      label: Logs
+      description: "Please include the Python logs if you can."
+      render: shell
+  - type: textarea
+    id: system-info
+    attributes:
+      label: System Info
+      description: Please share your system info with us. You can run the command `diffusers-cli env` and copy-paste its output below.
+      placeholder: Diffusers version, platform, Python version, ...
+    validations:
+      required: true
+  - type: textarea
+    id: who-can-help
+    attributes:
+      label: Who can help?
+      description: |
+        Your issue will be replied to more quickly if you can figure out the right person to tag with @.
+        If you know how to use git blame, that is the easiest way; otherwise, here is a rough guide of **who to tag**.
+
+        All issues are read by one of the core maintainers, so if you don't know who to tag, just leave this blank and
+        a core maintainer will ping the right person.
+
+        Please tag a maximum of 2 people.
+
+        Questions on DiffusionPipeline (Saving, Loading, From pretrained, ...): @sayakpaul @DN6
+
+        Questions on pipelines:
+        - Stable Diffusion @yiyixuxu @asomoza
+        - Stable Diffusion XL @yiyixuxu @sayakpaul @DN6
+        - Stable Diffusion 3: @yiyixuxu @sayakpaul @DN6 @asomoza
+        - Kandinsky @yiyixuxu
+        - ControlNet @sayakpaul @yiyixuxu @DN6
+        - T2I Adapter @sayakpaul @yiyixuxu @DN6
+        - IF @DN6
+        - Text-to-Video / Video-to-Video @DN6 @a-r-r-o-w
+        - Wuerstchen @DN6
+        - Other: @yiyixuxu @DN6
+        - Improving generation quality: @asomoza
+
+        Questions on models:
+        - UNet @DN6 @yiyixuxu @sayakpaul
+        - VAE @sayakpaul @DN6 @yiyixuxu
+        - Transformers/Attention @DN6 @yiyixuxu @sayakpaul
+
+        Questions on single file checkpoints: @DN6
+
+        Questions on Schedulers: @yiyixuxu
+
+        Questions on LoRA: @sayakpaul
+
+        Questions on Textual Inversion: @sayakpaul
+
+        Questions on Training:
+        - DreamBooth @sayakpaul
+        - Text-to-Image Fine-tuning @sayakpaul
+        - Textual Inversion @sayakpaul
+        - ControlNet @sayakpaul
+
+        Questions on Tests: @DN6 @sayakpaul @yiyixuxu
+
+        Questions on Documentation: @stevhliu
+
+        Questions on JAX- and MPS-related things: @pcuenca
+
+        Questions on audio pipelines: @sanchit-gandhi
+
+      placeholder: "@Username ..."
diffusers/.github/ISSUE_TEMPLATE/feature_request.md
ADDED
@@ -0,0 +1,20 @@
+---
+name: "\U0001F680 Feature Request"
+about: Suggest an idea for this project
+title: ''
+labels: ''
+assignees: ''
+
+---
+
+**Is your feature request related to a problem? Please describe.**
+A clear and concise description of what the problem is. Ex. I'm always frustrated when [...].
+
+**Describe the solution you'd like.**
+A clear and concise description of what you want to happen.
+
+**Describe alternatives you've considered.**
+A clear and concise description of any alternative solutions or features you've considered.
+
+**Additional context.**
+Add any other context or screenshots about the feature request here.
diffusers/.github/ISSUE_TEMPLATE/new-model-addition.yml
ADDED
@@ -0,0 +1,31 @@
+name: "\U0001F31F New Model/Pipeline/Scheduler Addition"
+description: Submit a proposal/request to implement a new diffusion model/pipeline/scheduler
+labels: [ "New model/pipeline/scheduler" ]
+
+body:
+  - type: textarea
+    id: description-request
+    validations:
+      required: true
+    attributes:
+      label: Model/Pipeline/Scheduler description
+      description: |
+        Put any and all important information relative to the model/pipeline/scheduler
+
+  - type: checkboxes
+    id: information-tasks
+    attributes:
+      label: Open source status
+      description: |
+        Please note that if the model implementation isn't available or if the weights aren't open-source, we are less likely to implement it in `diffusers`.
+      options:
+        - label: "The model implementation is available."
+        - label: "The model weights are available (Only relevant if addition is not a scheduler)."
+
+  - type: textarea
+    id: additional-info
+    attributes:
+      label: Provide useful links for the implementation
+      description: |
+        Please provide information regarding the implementation, the weights, and the authors.
+        Please mention the authors by @gh-username if you're aware of their usernames.
diffusers/.github/ISSUE_TEMPLATE/remote-vae-pilot-feedback.yml
ADDED
@@ -0,0 +1,38 @@
+name: "\U0001F31F Remote VAE"
+description: Feedback for remote VAE pilot
+labels: [ "Remote VAE" ]
+
+body:
+  - type: textarea
+    id: positive
+    validations:
+      required: true
+    attributes:
+      label: Did you like the remote VAE solution?
+      description: |
+        If you liked it, we would appreciate it if you could elaborate on what you liked.
+
+  - type: textarea
+    id: feedback
+    validations:
+      required: true
+    attributes:
+      label: What can be improved about the current solution?
+      description: |
+        Let us know the things you would like to see improved. Note that we will work on optimizing the solution once the pilot is over and we have usage.
+
+  - type: textarea
+    id: others
+    validations:
+      required: true
+    attributes:
+      label: What other VAEs would you like to see if the pilot goes well?
+      description: |
+        Provide a list of the VAEs you would like to see in the future if the pilot goes well.
+
+  - type: textarea
+    id: additional-info
+    attributes:
+      label: Notify the members of the team
+      description: |
+        Tag the following folks when submitting this feedback: @hlky @sayakpaul
diffusers/.github/actions/setup-miniconda/action.yml
ADDED
@@ -0,0 +1,146 @@
+name: Set up conda environment for testing
+
+description: Sets up miniconda in your ${RUNNER_TEMP} environment and gives you the ${CONDA_RUN} environment variable so you don't have to worry about polluting non-ephemeral runners anymore
+
+inputs:
+  python-version:
+    description: If set to any value, don't use sudo to clean the workspace
+    required: false
+    type: string
+    default: "3.9"
+  miniconda-version:
+    description: Miniconda version to install
+    required: false
+    type: string
+    default: "4.12.0"
+  environment-file:
+    description: Environment file to install dependencies from
+    required: false
+    type: string
+    default: ""
+
+runs:
+  using: composite
+  steps:
+    # Use the same trick from https://github.com/marketplace/actions/setup-miniconda
+    # to refresh the cache daily. This is kind of optional though
+    - name: Get date
+      id: get-date
+      shell: bash
+      run: echo "today=$(/bin/date -u '+%Y%m%d')d" >> $GITHUB_OUTPUT
+    - name: Setup miniconda cache
+      id: miniconda-cache
+      uses: actions/cache@v2
+      with:
+        path: ${{ runner.temp }}/miniconda
+        key: miniconda-${{ runner.os }}-${{ runner.arch }}-${{ inputs.python-version }}-${{ steps.get-date.outputs.today }}
+    - name: Install miniconda (${{ inputs.miniconda-version }})
+      if: steps.miniconda-cache.outputs.cache-hit != 'true'
+      env:
+        MINICONDA_VERSION: ${{ inputs.miniconda-version }}
+      shell: bash -l {0}
+      run: |
+        MINICONDA_INSTALL_PATH="${RUNNER_TEMP}/miniconda"
+        mkdir -p "${MINICONDA_INSTALL_PATH}"
+        case ${RUNNER_OS}-${RUNNER_ARCH} in
+          Linux-X64)
+            MINICONDA_ARCH="Linux-x86_64"
+            ;;
+          macOS-ARM64)
+            MINICONDA_ARCH="MacOSX-arm64"
+            ;;
+          macOS-X64)
+            MINICONDA_ARCH="MacOSX-x86_64"
+            ;;
+          *)
+            echo "::error::Platform ${RUNNER_OS}-${RUNNER_ARCH} currently unsupported using this action"
+            exit 1
+            ;;
+        esac
+        MINICONDA_URL="https://repo.anaconda.com/miniconda/Miniconda3-py39_${MINICONDA_VERSION}-${MINICONDA_ARCH}.sh"
+        curl -fsSL "${MINICONDA_URL}" -o "${MINICONDA_INSTALL_PATH}/miniconda.sh"
+        bash "${MINICONDA_INSTALL_PATH}/miniconda.sh" -b -u -p "${MINICONDA_INSTALL_PATH}"
+        rm -rf "${MINICONDA_INSTALL_PATH}/miniconda.sh"
+    - name: Update GitHub path to include miniconda install
+      shell: bash
+      run: |
+        MINICONDA_INSTALL_PATH="${RUNNER_TEMP}/miniconda"
+        echo "${MINICONDA_INSTALL_PATH}/bin" >> $GITHUB_PATH
+    - name: Setup miniconda env cache (with env file)
+      id: miniconda-env-cache-env-file
+      if: ${{ runner.os }} == 'macOS' && ${{ inputs.environment-file }} != ''
+      uses: actions/cache@v2
+      with:
+        path: ${{ runner.temp }}/conda-python-${{ inputs.python-version }}
+        key: miniconda-env-${{ runner.os }}-${{ runner.arch }}-${{ inputs.python-version }}-${{ steps.get-date.outputs.today }}-${{ hashFiles(inputs.environment-file) }}
+    - name: Setup miniconda env cache (without env file)
+      id: miniconda-env-cache
+      if: ${{ runner.os }} == 'macOS' && ${{ inputs.environment-file }} == ''
+      uses: actions/cache@v2
+      with:
+        path: ${{ runner.temp }}/conda-python-${{ inputs.python-version }}
+        key: miniconda-env-${{ runner.os }}-${{ runner.arch }}-${{ inputs.python-version }}-${{ steps.get-date.outputs.today }}
+    - name: Setup conda environment with python (v${{ inputs.python-version }})
+      if: steps.miniconda-env-cache-env-file.outputs.cache-hit != 'true' && steps.miniconda-env-cache.outputs.cache-hit != 'true'
+      shell: bash
+      env:
+        PYTHON_VERSION: ${{ inputs.python-version }}
+        ENV_FILE: ${{ inputs.environment-file }}
+      run: |
+        CONDA_BASE_ENV="${RUNNER_TEMP}/conda-python-${PYTHON_VERSION}"
+        ENV_FILE_FLAG=""
+        if [[ -f "${ENV_FILE}" ]]; then
+          ENV_FILE_FLAG="--file ${ENV_FILE}"
+        elif [[ -n "${ENV_FILE}" ]]; then
+          echo "::warning::Specified env file (${ENV_FILE}) not found, not going to include it"
+        fi
+        conda create \
+          --yes \
+          --prefix "${CONDA_BASE_ENV}" \
+          "python=${PYTHON_VERSION}" \
+          ${ENV_FILE_FLAG} \
+          cmake=3.22 \
+          conda-build=3.21 \
+          ninja=1.10 \
+          pkg-config=0.29 \
+          wheel=0.37
+    - name: Clone the base conda environment and update GitHub env
+      shell: bash
+      env:
+        PYTHON_VERSION: ${{ inputs.python-version }}
+        CONDA_BASE_ENV: ${{ runner.temp }}/conda-python-${{ inputs.python-version }}
+      run: |
+        CONDA_ENV="${RUNNER_TEMP}/conda_environment_${GITHUB_RUN_ID}"
+        conda create \
+          --yes \
+          --prefix "${CONDA_ENV}" \
+          --clone "${CONDA_BASE_ENV}"
+        # TODO: conda-build could not be cloned because it hardcodes the path, so it
+        # could not be cached
+        conda install --yes -p ${CONDA_ENV} conda-build=3.21
+        echo "CONDA_ENV=${CONDA_ENV}" >> "${GITHUB_ENV}"
+        echo "CONDA_RUN=conda run -p ${CONDA_ENV} --no-capture-output" >> "${GITHUB_ENV}"
+        echo "CONDA_BUILD=conda run -p ${CONDA_ENV} conda-build" >> "${GITHUB_ENV}"
+        echo "CONDA_INSTALL=conda install -p ${CONDA_ENV}" >> "${GITHUB_ENV}"
+    - name: Get disk space usage and throw an error for low disk space
+      shell: bash
+      run: |
+        echo "Print the available disk space for manual inspection"
+        df -h
+        # Set the minimum requirement space to 4GB
+        MINIMUM_AVAILABLE_SPACE_IN_GB=4
+        MINIMUM_AVAILABLE_SPACE_IN_KB=$(($MINIMUM_AVAILABLE_SPACE_IN_GB * 1024 * 1024))
+        # Use KB to avoid floating point warning like 3.1GB
+        df -k | tr -s ' ' | cut -d' ' -f 4,9 | while read -r LINE;
+        do
+          AVAIL=$(echo $LINE | cut -f1 -d' ')
+          MOUNT=$(echo $LINE | cut -f2 -d' ')
+          if [ "$MOUNT" = "/" ]; then
+            if [ "$AVAIL" -lt "$MINIMUM_AVAILABLE_SPACE_IN_KB" ]; then
+              echo "There is only ${AVAIL}KB free space left in $MOUNT, which is less than the minimum requirement of ${MINIMUM_AVAILABLE_SPACE_IN_KB}KB. Please help create an issue to PyTorch Release Engineering via https://github.com/pytorch/test-infra/issues and provide the link to the workflow run."
+              exit 1;
+            else
+              echo "There is ${AVAIL}KB free space left in $MOUNT, continue"
+            fi
+          fi
+        done
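The low-disk-space step above converts a GB threshold to KB and compares it against the "available" column of `df -k` for the root mount. A minimal standalone sketch of that arithmetic and comparison (the `check_mount` helper and its sample inputs are hypothetical; only the threshold math mirrors the action):

```shell
#!/usr/bin/env bash
# Sketch of the disk-space check from the action's final step.
# check_mount is a hypothetical helper that mimics reading one `df -k` line
# (available KB and mount point, as selected by `cut -d' ' -f 4,9`).
MINIMUM_AVAILABLE_SPACE_IN_GB=4
MINIMUM_AVAILABLE_SPACE_IN_KB=$((MINIMUM_AVAILABLE_SPACE_IN_GB * 1024 * 1024))

check_mount() {
  local AVAIL=$1 MOUNT=$2
  # Only the root mount is checked, exactly as in the action.
  if [ "$MOUNT" = "/" ] && [ "$AVAIL" -lt "$MINIMUM_AVAILABLE_SPACE_IN_KB" ]; then
    echo "low"
  else
    echo "ok"
  fi
}

echo "$MINIMUM_AVAILABLE_SPACE_IN_KB"   # 4 GB expressed in KB: 4194304
check_mount 2097152 /                   # 2 GB free on / -> low
check_mount 8388608 /                   # 8 GB free on / -> ok
```

Working in KB keeps the comparison a pure integer test, which is why the action avoids parsing `df -h`'s human-readable values like `3.1G`.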
diffusers/.github/workflows/benchmark.yml
ADDED
@@ -0,0 +1,89 @@
+name: Benchmarking tests
+
+on:
+  workflow_dispatch:
+  schedule:
+    - cron: "30 1 1,15 * *" # every 2 weeks on the 1st and the 15th of every month at 1:30 AM
+
+env:
+  DIFFUSERS_IS_CI: yes
+  HF_HUB_ENABLE_HF_TRANSFER: 1
+  HF_HOME: /mnt/cache
+  OMP_NUM_THREADS: 8
+  MKL_NUM_THREADS: 8
+  BASE_PATH: benchmark_outputs
+
+jobs:
+  torch_models_cuda_benchmark_tests:
+    env:
+      SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL_BENCHMARK }}
+    name: Torch Core Models CUDA Benchmarking Tests
+    strategy:
+      fail-fast: false
+      max-parallel: 1
+    runs-on:
+      group: aws-g6e-4xlarge
+    container:
+      image: diffusers/diffusers-pytorch-cuda
+      options: --shm-size "16gb" --ipc host --gpus 0
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+      - name: NVIDIA-SMI
+        run: |
+          nvidia-smi
+      - name: Install dependencies
+        run: |
+          apt update
+          apt install -y libpq-dev postgresql-client
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m uv pip install -e [quality,test]
+          python -m uv pip install -r benchmarks/requirements.txt
+      - name: Environment
+        run: |
+          python utils/print_env.py
+      - name: Diffusers Benchmarking
+        env:
+          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+        run: |
+          cd benchmarks && python run_all.py
+
+      - name: Push results to the Hub
+        env:
+          HF_TOKEN: ${{ secrets.DIFFUSERS_BOT_TOKEN }}
+        run: |
+          cd benchmarks && python push_results.py
+          mkdir $BASE_PATH && cp *.csv $BASE_PATH
+
+      - name: Test suite reports artifacts
+        if: ${{ always() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: benchmark_test_reports
+          path: benchmarks/${{ env.BASE_PATH }}
+
+      # TODO: enable this once the connection problem has been resolved.
+      - name: Update benchmarking results to DB
+        env:
+          PGDATABASE: metrics
+          PGHOST: ${{ secrets.DIFFUSERS_BENCHMARKS_PGHOST }}
+          PGUSER: transformers_benchmarks
+          PGPASSWORD: ${{ secrets.DIFFUSERS_BENCHMARKS_PGPASSWORD }}
+          BRANCH_NAME: ${{ github.head_ref || github.ref_name }}
+        run: |
+          git config --global --add safe.directory /__w/diffusers/diffusers
+          commit_id=$GITHUB_SHA
+          commit_msg=$(git show -s --format=%s "$commit_id" | cut -c1-70)
+          cd benchmarks && python populate_into_db.py "$BRANCH_NAME" "$commit_id" "$commit_msg"
+
+      - name: Report success status
+        if: ${{ success() }}
+        run: |
+          pip install requests && python utils/notify_benchmarking_status.py --status=success
+
+      - name: Report failure status
+        if: ${{ failure() }}
+        run: |
+          pip install requests && python utils/notify_benchmarking_status.py --status=failure
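Before populating the DB, the benchmark workflow truncates the commit subject to 70 characters with `cut -c1-70`. A small sketch of that truncation in isolation (the sample messages are made up; only the `cut` invocation matches the workflow):

```shell
#!/usr/bin/env bash
# Sketch of the commit-message truncation in the "Update benchmarking results
# to DB" step. In the workflow the input comes from `git show -s --format=%s`;
# here we use hypothetical sample strings instead.
long_msg=$(printf 'x%.0s' $(seq 1 100))          # a 100-character message
short_msg="Fix scheduler step"                   # well under the limit

truncated=$(printf '%s' "$long_msg" | cut -c1-70)
kept=$(printf '%s' "$short_msg" | cut -c1-70)

echo "${#truncated}"   # 70: long messages are clipped to the first 70 chars
echo "$kept"           # short messages pass through unchanged
```

Clipping on the shell side keeps oversized subjects from ever reaching the `populate_into_db.py` arguments.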
diffusers/.github/workflows/build_docker_images.yml
ADDED
@@ -0,0 +1,107 @@
+name: Test, build, and push Docker images
+
+on:
+  pull_request: # During PRs, we just check if the changed Dockerfiles can be successfully built
+    branches:
+      - main
+    paths:
+      - "docker/**"
+  workflow_dispatch:
+  schedule:
+    - cron: "0 0 * * *" # every day at midnight
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
+  cancel-in-progress: true
+
+env:
+  REGISTRY: diffusers
+  CI_SLACK_CHANNEL: ${{ secrets.CI_DOCKER_CHANNEL }}
+
+jobs:
+  test-build-docker-images:
+    runs-on:
+      group: aws-general-8-plus
+    if: github.event_name == 'pull_request'
+    steps:
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v1
+
+      - name: Check out code
+        uses: actions/checkout@v3
+
+      - name: Find Changed Dockerfiles
+        id: file_changes
+        uses: jitterbit/get-changed-files@v1
+        with:
+          format: "space-delimited"
+          token: ${{ secrets.GITHUB_TOKEN }}
+
+      - name: Build Changed Docker Images
+        env:
+          CHANGED_FILES: ${{ steps.file_changes.outputs.all }}
+        run: |
+          echo "$CHANGED_FILES"
+          for FILE in $CHANGED_FILES; do
+            # skip anything that isn't still on disk
+            if [[ ! -f "$FILE" ]]; then
+              echo "Skipping removed file $FILE"
+              continue
+            fi
+            if [[ "$FILE" == docker/*Dockerfile ]]; then
+              DOCKER_PATH="${FILE%/Dockerfile}"
+              DOCKER_TAG=$(basename "$DOCKER_PATH")
+              echo "Building Docker image for $DOCKER_TAG"
+              docker build -t "$DOCKER_TAG" "$DOCKER_PATH"
+            fi
+          done
+        if: steps.file_changes.outputs.all != ''
+
+  build-and-push-docker-images:
+    runs-on:
+      group: aws-general-8-plus
+    if: github.event_name != 'pull_request'
+
+    permissions:
+      contents: read
+      packages: write
+
+    strategy:
+      fail-fast: false
+      matrix:
+        image-name:
+          - diffusers-pytorch-cpu
+          - diffusers-pytorch-cuda
+          - diffusers-pytorch-cuda
+          - diffusers-pytorch-xformers-cuda
+          - diffusers-pytorch-minimum-cuda
+          - diffusers-doc-builder
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v3
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v1
+      - name: Login to Docker Hub
+        uses: docker/login-action@v2
+        with:
+          username: ${{ env.REGISTRY }}
+          password: ${{ secrets.DOCKERHUB_TOKEN }}
+      - name: Build and push
+        uses: docker/build-push-action@v3
+        with:
+          no-cache: true
+          context: ./docker/${{ matrix.image-name }}
+          push: true
+          tags: ${{ env.REGISTRY }}/${{ matrix.image-name }}:latest
+
+      - name: Post to a Slack channel
+        id: slack
+        uses: huggingface/hf-workflows/.github/actions/post-slack@main
+        with:
+          # Slack channel id, channel name, or user id to post message.
+          # See also: https://api.slack.com/methods/chat.postMessage#channels
+          slack_channel: ${{ env.CI_SLACK_CHANNEL }}
+          title: "🤗 Results of the ${{ matrix.image-name }} Docker Image build"
+          status: ${{ job.status }}
+          slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
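The PR job above derives an image tag from each changed Dockerfile path: a glob filter, a suffix-stripping parameter expansion, and `basename`. A standalone sketch of that derivation (the sample path is hypothetical; the expansion and `basename` call match the workflow):

```shell
#!/usr/bin/env bash
# Sketch of the tag derivation in the "Build Changed Docker Images" step.
# FILE stands in for one entry of the changed-files list.
FILE="docker/diffusers-pytorch-cpu/Dockerfile"

if [[ "$FILE" == docker/*Dockerfile ]]; then
  DOCKER_PATH="${FILE%/Dockerfile}"       # strip the trailing "/Dockerfile"
  DOCKER_TAG=$(basename "$DOCKER_PATH")   # last path component becomes the tag
  echo "$DOCKER_TAG"                      # diffusers-pytorch-cpu
fi
```

So each `docker/<name>/Dockerfile` builds locally as image `<name>`, which is also how the push job's `matrix.image-name` entries map back to directories under `docker/`.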
diffusers/.github/workflows/build_documentation.yml
ADDED
@@ -0,0 +1,27 @@
+name: Build documentation
+
+on:
+  push:
+    branches:
+      - main
+      - doc-builder*
+      - v*-release
+      - v*-patch
+    paths:
+      - "src/diffusers/**.py"
+      - "examples/**"
+      - "docs/**"
+
+jobs:
+  build:
+    uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@main
+    with:
+      commit_sha: ${{ github.sha }}
+      install_libgl1: true
+      package: diffusers
+      notebook_folder: diffusers_doc
+      languages: en ko zh ja pt
+      custom_container: diffusers/diffusers-doc-builder
+    secrets:
+      token: ${{ secrets.HUGGINGFACE_PUSH }}
+      hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
diffusers/.github/workflows/build_pr_documentation.yml
ADDED
@@ -0,0 +1,23 @@
+name: Build PR Documentation
+
+on:
+  pull_request:
+    paths:
+      - "src/diffusers/**.py"
+      - "examples/**"
+      - "docs/**"
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
+  cancel-in-progress: true
+
+jobs:
+  build:
+    uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@main
+    with:
+      commit_sha: ${{ github.event.pull_request.head.sha }}
+      pr_number: ${{ github.event.number }}
+      install_libgl1: true
+      package: diffusers
+      languages: en ko zh ja pt
+      custom_container: diffusers/diffusers-doc-builder
diffusers/.github/workflows/mirror_community_pipeline.yml
ADDED
@@ -0,0 +1,102 @@
name: Mirror Community Pipeline

on:
  # Push changes on the main branch
  push:
    branches:
      - main
    paths:
      - 'examples/community/**.py'

    # And on tag creation (e.g. `v0.28.1`)
    tags:
      - '*'

  # Manual trigger with ref input
  workflow_dispatch:
    inputs:
      ref:
        description: "Either 'main' or a tag ref"
        required: true
        default: 'main'

jobs:
  mirror_community_pipeline:
    env:
      SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL_COMMUNITY_MIRROR }}

    runs-on: ubuntu-22.04
    steps:
      # Checkout to correct ref
      #   If workflow dispatch
      #     If ref is 'main', set:
      #       CHECKOUT_REF=refs/heads/main
      #       PATH_IN_REPO=main
      #     Else it must be a tag. Set:
      #       CHECKOUT_REF=refs/tags/{tag}
      #       PATH_IN_REPO={tag}
      #   If not workflow dispatch
      #     If ref is 'refs/heads/main' => set 'main'
      #     Else it must be a tag => set {tag}
      - name: Set checkout_ref and path_in_repo
        run: |
          if [ "${{ github.event_name }}" == "workflow_dispatch" ]; then
            if [ -z "${{ github.event.inputs.ref }}" ]; then
              echo "Error: Missing ref input"
              exit 1
            elif [ "${{ github.event.inputs.ref }}" == "main" ]; then
              echo "CHECKOUT_REF=refs/heads/main" >> $GITHUB_ENV
              echo "PATH_IN_REPO=main" >> $GITHUB_ENV
            else
              echo "CHECKOUT_REF=refs/tags/${{ github.event.inputs.ref }}" >> $GITHUB_ENV
              echo "PATH_IN_REPO=${{ github.event.inputs.ref }}" >> $GITHUB_ENV
            fi
          elif [ "${{ github.ref }}" == "refs/heads/main" ]; then
            echo "CHECKOUT_REF=${{ github.ref }}" >> $GITHUB_ENV
            echo "PATH_IN_REPO=main" >> $GITHUB_ENV
          else
            # e.g. refs/tags/v0.28.1 -> v0.28.1
            echo "CHECKOUT_REF=${{ github.ref }}" >> $GITHUB_ENV
            echo "PATH_IN_REPO=$(echo ${{ github.ref }} | sed 's/^refs\/tags\///')" >> $GITHUB_ENV
          fi
      - name: Print env vars
        run: |
          echo "CHECKOUT_REF: ${{ env.CHECKOUT_REF }}"
          echo "PATH_IN_REPO: ${{ env.PATH_IN_REPO }}"
      - uses: actions/checkout@v3
        with:
          ref: ${{ env.CHECKOUT_REF }}

      # Setup + install dependencies
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.10"
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install --upgrade huggingface_hub

      # Check secret is set
      - name: whoami
        run: huggingface-cli whoami
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN_MIRROR_COMMUNITY_PIPELINES }}

      # Push to HF! (under subfolder based on checkout ref)
      # https://huggingface.co/datasets/diffusers/community-pipelines-mirror
      - name: Mirror community pipeline to HF
        run: huggingface-cli upload diffusers/community-pipelines-mirror ./examples/community ${PATH_IN_REPO} --repo-type dataset
        env:
          PATH_IN_REPO: ${{ env.PATH_IN_REPO }}
          HF_TOKEN: ${{ secrets.HF_TOKEN_MIRROR_COMMUNITY_PIPELINES }}

      - name: Report success status
        if: ${{ success() }}
        run: |
          pip install requests && python utils/notify_community_pipelines_mirror.py --status=success

      - name: Report failure status
        if: ${{ failure() }}
        run: |
          pip install requests && python utils/notify_community_pipelines_mirror.py --status=failure
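The branch/tag mapping performed in the "Set checkout_ref and path_in_repo" step of the mirror workflow can be sketched as a small shell function for local reasoning. This is an illustrative sketch only: the function name `map_path_in_repo` is hypothetical and not part of the repository; the workflow itself writes the result to `$GITHUB_ENV` instead of printing it.

```shell
# Hypothetical helper mirroring the workflow's PATH_IN_REPO derivation:
# the main branch maps to the "main" subfolder, and a tag ref like
# refs/tags/v0.28.1 maps to a subfolder named after the tag (v0.28.1).
map_path_in_repo() {
  ref="$1"
  if [ "$ref" = "refs/heads/main" ]; then
    echo "main"
  else
    # e.g. refs/tags/v0.28.1 -> v0.28.1
    echo "$ref" | sed 's/^refs\/tags\///'
  fi
}

map_path_in_repo refs/heads/main     # -> main
map_path_in_repo refs/tags/v0.28.1   # -> v0.28.1
```

This is why each release's community pipelines end up under their own subfolder of the `diffusers/community-pipelines-mirror` dataset, while pushes to `main` keep overwriting the `main/` subfolder.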
diffusers/.github/workflows/nightly_tests.yml
ADDED
@@ -0,0 +1,612 @@
name: Nightly and release tests on main/release branch

on:
  workflow_dispatch:
  schedule:
    - cron: "0 0 * * *" # every day at midnight

env:
  DIFFUSERS_IS_CI: yes
  HF_HUB_ENABLE_HF_TRANSFER: 1
  OMP_NUM_THREADS: 8
  MKL_NUM_THREADS: 8
  PYTEST_TIMEOUT: 600
  RUN_SLOW: yes
  RUN_NIGHTLY: yes
  PIPELINE_USAGE_CUTOFF: 0
  SLACK_API_TOKEN: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
  CONSOLIDATED_REPORT_PATH: consolidated_test_report.md

jobs:
  setup_torch_cuda_pipeline_matrix:
    name: Setup Torch Pipelines CUDA Slow Tests Matrix
    runs-on:
      group: aws-general-8-plus
    container:
      image: diffusers/diffusers-pytorch-cpu
    outputs:
      pipeline_test_matrix: ${{ steps.fetch_pipeline_matrix.outputs.pipeline_test_matrix }}
    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2
      - name: Install dependencies
        run: |
          pip install -e .[test]
          pip install huggingface_hub
      - name: Fetch Pipeline Matrix
        id: fetch_pipeline_matrix
        run: |
          matrix=$(python utils/fetch_torch_cuda_pipeline_test_matrix.py)
          echo $matrix
          echo "pipeline_test_matrix=$matrix" >> $GITHUB_OUTPUT

      - name: Pipeline Tests Artifacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: test-pipelines.json
          path: reports

  run_nightly_tests_for_torch_pipelines:
    name: Nightly Torch Pipelines CUDA Tests
    needs: setup_torch_cuda_pipeline_matrix
    strategy:
      fail-fast: false
      max-parallel: 8
      matrix:
        module: ${{ fromJson(needs.setup_torch_cuda_pipeline_matrix.outputs.pipeline_test_matrix) }}
    runs-on:
      group: aws-g4dn-2xlarge
    container:
      image: diffusers/diffusers-pytorch-cuda
      options: --shm-size "16gb" --ipc host --gpus 0
    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2
      - name: NVIDIA-SMI
        run: nvidia-smi
      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install -e [quality,test]
          pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
          python -m uv pip install pytest-reportlog
      - name: Environment
        run: |
          python utils/print_env.py
      - name: Pipeline CUDA Test
        env:
          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
          # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
          CUBLAS_WORKSPACE_CONFIG: :16:8
        run: |
          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
            -s -v -k "not Flax and not Onnx" \
            --make-reports=tests_pipeline_${{ matrix.module }}_cuda \
            --report-log=tests_pipeline_${{ matrix.module }}_cuda.log \
            tests/pipelines/${{ matrix.module }}
      - name: Failure short reports
        if: ${{ failure() }}
        run: |
          cat reports/tests_pipeline_${{ matrix.module }}_cuda_stats.txt
          cat reports/tests_pipeline_${{ matrix.module }}_cuda_failures_short.txt
      - name: Test suite reports artifacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: pipeline_${{ matrix.module }}_test_reports
          path: reports

  run_nightly_tests_for_other_torch_modules:
    name: Nightly Torch CUDA Tests
    runs-on:
      group: aws-g4dn-2xlarge
    container:
      image: diffusers/diffusers-pytorch-cuda
      options: --shm-size "16gb" --ipc host --gpus 0
    defaults:
      run:
        shell: bash
    strategy:
      fail-fast: false
      max-parallel: 2
      matrix:
        module: [models, schedulers, lora, others, single_file, examples]
    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2

      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install -e [quality,test]
          python -m uv pip install peft@git+https://github.com/huggingface/peft.git
          pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
          python -m uv pip install pytest-reportlog
      - name: Environment
        run: python utils/print_env.py

      - name: Run nightly PyTorch CUDA tests for non-pipeline modules
        if: ${{ matrix.module != 'examples'}}
        env:
          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
          # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
          CUBLAS_WORKSPACE_CONFIG: :16:8
        run: |
          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
            -s -v -k "not Flax and not Onnx" \
            --make-reports=tests_torch_${{ matrix.module }}_cuda \
            --report-log=tests_torch_${{ matrix.module }}_cuda.log \
            tests/${{ matrix.module }}

      - name: Run nightly example tests with Torch
        if: ${{ matrix.module == 'examples' }}
        env:
          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
          # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
          CUBLAS_WORKSPACE_CONFIG: :16:8
        run: |
          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
            -s -v --make-reports=examples_torch_cuda \
            --report-log=examples_torch_cuda.log \
            examples/

      - name: Failure short reports
        if: ${{ failure() }}
        run: |
          cat reports/tests_torch_${{ matrix.module }}_cuda_stats.txt
          cat reports/tests_torch_${{ matrix.module }}_cuda_failures_short.txt

      - name: Test suite reports artifacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: torch_${{ matrix.module }}_cuda_test_reports
          path: reports

  run_torch_compile_tests:
    name: PyTorch Compile CUDA tests

    runs-on:
      group: aws-g4dn-2xlarge

    container:
      image: diffusers/diffusers-pytorch-cuda
      options: --gpus 0 --shm-size "16gb" --ipc host

    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2

      - name: NVIDIA-SMI
        run: |
          nvidia-smi
      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install -e [quality,test,training]
      - name: Environment
        run: |
          python utils/print_env.py
      - name: Run torch compile tests on GPU
        env:
          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
          RUN_COMPILE: yes
        run: |
          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v -k "compile" --make-reports=tests_torch_compile_cuda tests/
      - name: Failure short reports
        if: ${{ failure() }}
        run: cat reports/tests_torch_compile_cuda_failures_short.txt

      - name: Test suite reports artifacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: torch_compile_test_reports
          path: reports

  run_big_gpu_torch_tests:
    name: Torch tests on big GPU
    strategy:
      fail-fast: false
      max-parallel: 2
    runs-on:
      group: aws-g6e-xlarge-plus
    container:
      image: diffusers/diffusers-pytorch-cuda
      options: --shm-size "16gb" --ipc host --gpus 0
    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2
      - name: NVIDIA-SMI
        run: nvidia-smi
      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install -e [quality,test]
          python -m uv pip install peft@git+https://github.com/huggingface/peft.git
          pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
          python -m uv pip install pytest-reportlog
      - name: Environment
        run: |
          python utils/print_env.py
      - name: Selected Torch CUDA Test on big GPU
        env:
          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
          # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
          CUBLAS_WORKSPACE_CONFIG: :16:8
          BIG_GPU_MEMORY: 40
        run: |
          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
            -m "big_accelerator" \
            --make-reports=tests_big_gpu_torch_cuda \
            --report-log=tests_big_gpu_torch_cuda.log \
            tests/
      - name: Failure short reports
        if: ${{ failure() }}
        run: |
          cat reports/tests_big_gpu_torch_cuda_stats.txt
          cat reports/tests_big_gpu_torch_cuda_failures_short.txt
      - name: Test suite reports artifacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: torch_cuda_big_gpu_test_reports
          path: reports

  torch_minimum_version_cuda_tests:
    name: Torch Minimum Version CUDA Tests
    runs-on:
      group: aws-g4dn-2xlarge
    container:
      image: diffusers/diffusers-pytorch-minimum-cuda
      options: --shm-size "16gb" --ipc host --gpus 0
    defaults:
      run:
        shell: bash
    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2

      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install -e [quality,test]
          python -m uv pip install peft@git+https://github.com/huggingface/peft.git
          pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git

      - name: Environment
        run: |
          python utils/print_env.py

      - name: Run PyTorch CUDA tests
        env:
          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
          # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
          CUBLAS_WORKSPACE_CONFIG: :16:8
        run: |
          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
            -s -v -k "not Flax and not Onnx" \
            --make-reports=tests_torch_minimum_version_cuda \
            tests/models/test_modeling_common.py \
            tests/pipelines/test_pipelines_common.py \
            tests/pipelines/test_pipeline_utils.py \
            tests/pipelines/test_pipelines.py \
            tests/pipelines/test_pipelines_auto.py \
            tests/schedulers/test_schedulers.py \
            tests/others

      - name: Failure short reports
        if: ${{ failure() }}
        run: |
          cat reports/tests_torch_minimum_version_cuda_stats.txt
          cat reports/tests_torch_minimum_version_cuda_failures_short.txt

      - name: Test suite reports artifacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: torch_minimum_version_cuda_test_reports
          path: reports

  run_nightly_quantization_tests:
    name: Torch quantization nightly tests
    strategy:
      fail-fast: false
      max-parallel: 2
      matrix:
        config:
          - backend: "bitsandbytes"
            test_location: "bnb"
            additional_deps: ["peft"]
          - backend: "gguf"
            test_location: "gguf"
            additional_deps: ["peft"]
          - backend: "torchao"
            test_location: "torchao"
            additional_deps: []
          - backend: "optimum_quanto"
            test_location: "quanto"
            additional_deps: []
    runs-on:
      group: aws-g6e-xlarge-plus
    container:
      image: diffusers/diffusers-pytorch-cuda
      options: --shm-size "20gb" --ipc host --gpus 0
    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2
      - name: NVIDIA-SMI
        run: nvidia-smi
      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install -e [quality,test]
          python -m uv pip install -U ${{ matrix.config.backend }}
          if [ "${{ join(matrix.config.additional_deps, ' ') }}" != "" ]; then
            python -m uv pip install ${{ join(matrix.config.additional_deps, ' ') }}
          fi
          python -m uv pip install pytest-reportlog
      - name: Environment
        run: |
          python utils/print_env.py
      - name: ${{ matrix.config.backend }} quantization tests on GPU
        env:
          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
          # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
          CUBLAS_WORKSPACE_CONFIG: :16:8
          BIG_GPU_MEMORY: 40
        run: |
          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
            --make-reports=tests_${{ matrix.config.backend }}_torch_cuda \
            --report-log=tests_${{ matrix.config.backend }}_torch_cuda.log \
            tests/quantization/${{ matrix.config.test_location }}
      - name: Failure short reports
        if: ${{ failure() }}
        run: |
          cat reports/tests_${{ matrix.config.backend }}_torch_cuda_stats.txt
          cat reports/tests_${{ matrix.config.backend }}_torch_cuda_failures_short.txt
      - name: Test suite reports artifacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: torch_cuda_${{ matrix.config.backend }}_reports
          path: reports

  run_nightly_pipeline_level_quantization_tests:
    name: Torch quantization nightly tests
    strategy:
      fail-fast: false
      max-parallel: 2
    runs-on:
      group: aws-g6e-xlarge-plus
    container:
      image: diffusers/diffusers-pytorch-cuda
      options: --shm-size "20gb" --ipc host --gpus 0
    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2
      - name: NVIDIA-SMI
        run: nvidia-smi
      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install -e [quality,test]
          python -m uv pip install -U bitsandbytes optimum_quanto
          python -m uv pip install pytest-reportlog
      - name: Environment
        run: |
          python utils/print_env.py
      - name: Pipeline-level quantization tests on GPU
        env:
          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
          # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
          CUBLAS_WORKSPACE_CONFIG: :16:8
          BIG_GPU_MEMORY: 40
        run: |
          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
            --make-reports=tests_pipeline_level_quant_torch_cuda \
            --report-log=tests_pipeline_level_quant_torch_cuda.log \
            tests/quantization/test_pipeline_level_quantization.py
      - name: Failure short reports
        if: ${{ failure() }}
        run: |
          cat reports/tests_pipeline_level_quant_torch_cuda_stats.txt
          cat reports/tests_pipeline_level_quant_torch_cuda_failures_short.txt
      - name: Test suite reports artifacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: torch_cuda_pipeline_level_quant_reports
          path: reports

  generate_consolidated_report:
    name: Generate Consolidated Test Report
    needs: [
      run_nightly_tests_for_torch_pipelines,
      run_nightly_tests_for_other_torch_modules,
      run_torch_compile_tests,
      run_big_gpu_torch_tests,
      run_nightly_quantization_tests,
      run_nightly_pipeline_level_quantization_tests,
      # run_nightly_onnx_tests,
      torch_minimum_version_cuda_tests,
      # run_flax_tpu_tests
    ]
    if: always()
    runs-on:
      group: aws-general-8-plus
    container:
      image: diffusers/diffusers-pytorch-cpu
    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2

      - name: Create reports directory
        run: mkdir -p combined_reports

      - name: Download all test reports
        uses: actions/download-artifact@v4
        with:
          path: artifacts

      - name: Prepare reports
        run: |
          # Move all report files to a single directory for processing
          find artifacts -name "*.txt" -exec cp {} combined_reports/ \;

      - name: Install dependencies
        run: |
          pip install -e .[test]
          pip install slack_sdk tabulate

      - name: Generate consolidated report
        run: |
          python utils/consolidated_test_report.py \
            --reports_dir combined_reports \
            --output_file $CONSOLIDATED_REPORT_PATH \
            --slack_channel_name diffusers-ci-nightly

      - name: Show consolidated report
        run: |
          cat $CONSOLIDATED_REPORT_PATH >> $GITHUB_STEP_SUMMARY

      - name: Upload consolidated report
        uses: actions/upload-artifact@v4
        with:
          name: consolidated_test_report
          path: ${{ env.CONSOLIDATED_REPORT_PATH }}

# M1 runner currently not well supported
# TODO: (Dhruv) add these back when we setup better testing for Apple Silicon
# run_nightly_tests_apple_m1:
#   name: Nightly PyTorch MPS tests on MacOS
#   runs-on: [ self-hosted, apple-m1 ]
#   if: github.event_name == 'schedule'
#
#   steps:
#     - name: Checkout diffusers
#       uses: actions/checkout@v3
#       with:
#         fetch-depth: 2
#
#     - name: Clean checkout
#       shell: arch -arch arm64 bash {0}
#       run: |
#         git clean -fxd
#     - name: Setup miniconda
#       uses: ./.github/actions/setup-miniconda
#       with:
#         python-version: 3.9
#
#     - name: Install dependencies
#       shell: arch -arch arm64 bash {0}
#       run: |
#         ${CONDA_RUN} python -m pip install --upgrade pip uv
#         ${CONDA_RUN} python -m uv pip install -e [quality,test]
#         ${CONDA_RUN} python -m uv pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu
#         ${CONDA_RUN} python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate
#         ${CONDA_RUN} python -m uv pip install pytest-reportlog
#     - name: Environment
#       shell: arch -arch arm64 bash {0}
#       run: |
#         ${CONDA_RUN} python utils/print_env.py
#     - name: Run nightly PyTorch tests on M1 (MPS)
#       shell: arch -arch arm64 bash {0}
#       env:
#         HF_HOME: /System/Volumes/Data/mnt/cache
#         HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
#       run: |
#         ${CONDA_RUN} python -m pytest -n 1 -s -v --make-reports=tests_torch_mps \
#           --report-log=tests_torch_mps.log \
#           tests/
#     - name: Failure short reports
#       if: ${{ failure() }}
#       run: cat reports/tests_torch_mps_failures_short.txt
#
#     - name: Test suite reports artifacts
#       if: ${{ always() }}
#       uses: actions/upload-artifact@v4
#       with:
#         name: torch_mps_test_reports
#         path: reports
#
#     - name: Generate Report and Notify Channel
#       if: always()
#       run: |
#         pip install slack_sdk tabulate
#         python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
diffusers/.github/workflows/notify_slack_about_release.yml
ADDED
@@ -0,0 +1,23 @@
name: Notify Slack about a release

on:
  workflow_dispatch:
  release:
    types: [published]

jobs:
  build:
    runs-on: ubuntu-22.04

    steps:
      - uses: actions/checkout@v3

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.8'

      - name: Notify Slack about the release
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
        run: pip install requests && python utils/notify_slack_about_release.py
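The `utils/notify_slack_about_release.py` script invoked above is not part of this diff; as a minimal hypothetical sketch of what such a script does, the following posts a message to a Slack incoming webhook using only the standard library. Only the `SLACK_WEBHOOK_URL` environment variable comes from the workflow; the function names and message text are assumptions for illustration.

```python
import json
import os
from urllib import request


def build_payload(tag: str, release_url: str) -> dict:
    # Minimal Slack incoming-webhook payload announcing a release.
    return {"text": f"New diffusers release {tag}: {release_url}"}


def notify(payload: dict) -> None:
    # SLACK_WEBHOOK_URL is injected by the workflow from repository secrets.
    webhook_url = os.environ["SLACK_WEBHOOK_URL"]
    req = request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    request.urlopen(req)  # Slack replies with "ok" on success
```

The real script uses `requests` (installed in the step's `run:` line); the urllib version here just avoids a third-party dependency for the sketch.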
diffusers/.github/workflows/pr_dependency_test.yml
ADDED
@@ -0,0 +1,35 @@
name: Run dependency tests

on:
  pull_request:
    branches:
      - main
    paths:
      - "src/diffusers/**.py"
  push:
    branches:
      - main

concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
  cancel-in-progress: true

jobs:
  check_dependencies:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.8"
      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m pip install --upgrade pip uv
          python -m uv pip install -e .
          python -m uv pip install pytest
      - name: Check for soft dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          pytest tests/others/test_dependencies.py
diffusers/.github/workflows/pr_flax_dependency_test.yml
ADDED
@@ -0,0 +1,38 @@
name: Run Flax dependency tests

on:
  pull_request:
    branches:
      - main
    paths:
      - "src/diffusers/**.py"
  push:
    branches:
      - main

concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
  cancel-in-progress: true

jobs:
  check_flax_dependencies:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.8"
      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m pip install --upgrade pip uv
          python -m uv pip install -e .
          python -m uv pip install "jax[cpu]>=0.2.16,!=0.3.2"
          python -m uv pip install "flax>=0.4.1"
          python -m uv pip install "jaxlib>=0.1.65"
          python -m uv pip install pytest
      - name: Check for soft dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          pytest tests/others/test_dependencies.py
diffusers/.github/workflows/pr_style_bot.yml
ADDED
@@ -0,0 +1,17 @@
name: PR Style Bot

on:
  issue_comment:
    types: [created]

permissions:
  contents: write
  pull-requests: write

jobs:
  style:
    uses: huggingface/huggingface_hub/.github/workflows/style-bot-action.yml@main
    with:
      python_quality_dependencies: "[quality]"
    secrets:
      bot_token: ${{ secrets.HF_STYLE_BOT_ACTION }}
diffusers/.github/workflows/pr_test_fetcher.yml
ADDED
@@ -0,0 +1,177 @@
name: Fast tests for PRs - Test Fetcher

on: workflow_dispatch

env:
  DIFFUSERS_IS_CI: yes
  OMP_NUM_THREADS: 4
  MKL_NUM_THREADS: 4
  PYTEST_TIMEOUT: 60

concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
  cancel-in-progress: true

jobs:
  setup_pr_tests:
    name: Setup PR Tests
    runs-on:
      group: aws-general-8-plus
    container:
      image: diffusers/diffusers-pytorch-cpu
      options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
    defaults:
      run:
        shell: bash
    outputs:
      matrix: ${{ steps.set_matrix.outputs.matrix }}
      test_map: ${{ steps.set_matrix.outputs.test_map }}
    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 0
      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install -e [quality,test]
      - name: Environment
        run: |
          python utils/print_env.py
          echo $(git --version)
      - name: Fetch Tests
        run: |
          python utils/tests_fetcher.py | tee test_preparation.txt
      - name: Report fetched tests
        uses: actions/upload-artifact@v3
        with:
          name: test_fetched
          path: test_preparation.txt
      - id: set_matrix
        name: Create Test Matrix
        # The `keys` is used as GitHub actions matrix for jobs, i.e. `models`, `pipelines`, etc.
        # The `test_map` is used to get the actual identified test files under each key.
        # If no test to run (so no `test_map.json` file), create a dummy map (empty matrix will fail)
        run: |
          if [ -f test_map.json ]; then
              keys=$(python3 -c 'import json; fp = open("test_map.json"); test_map = json.load(fp); fp.close(); d = list(test_map.keys()); print(json.dumps(d))')
              test_map=$(python3 -c 'import json; fp = open("test_map.json"); test_map = json.load(fp); fp.close(); print(json.dumps(test_map))')
          else
              keys=$(python3 -c 'keys = ["dummy"]; print(keys)')
              test_map=$(python3 -c 'test_map = {"dummy": []}; print(test_map)')
          fi
          echo $keys
          echo $test_map
          echo "matrix=$keys" >> $GITHUB_OUTPUT
          echo "test_map=$test_map" >> $GITHUB_OUTPUT

  run_pr_tests:
    name: Run PR Tests
    needs: setup_pr_tests
    if: contains(fromJson(needs.setup_pr_tests.outputs.matrix), 'dummy') != true
    strategy:
      fail-fast: false
      max-parallel: 2
      matrix:
        modules: ${{ fromJson(needs.setup_pr_tests.outputs.matrix) }}
    runs-on:
      group: aws-general-8-plus
    container:
      image: diffusers/diffusers-pytorch-cpu
      options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
    defaults:
      run:
        shell: bash
    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2

      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m pip install -e [quality,test]
          python -m pip install accelerate

      - name: Environment
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python utils/print_env.py

      - name: Run all selected tests on CPU
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m pytest -n 2 --dist=loadfile -v --make-reports=${{ matrix.modules }}_tests_cpu ${{ fromJson(needs.setup_pr_tests.outputs.test_map)[matrix.modules] }}

      - name: Failure short reports
        if: ${{ failure() }}
        continue-on-error: true
        run: |
          cat reports/${{ matrix.modules }}_tests_cpu_stats.txt
          cat reports/${{ matrix.modules }}_tests_cpu_failures_short.txt

      - name: Test suite reports artifacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v3
        with:
          name: ${{ matrix.modules }}_test_reports
          path: reports

  run_staging_tests:
    strategy:
      fail-fast: false
      matrix:
        config:
          - name: Hub tests for models, schedulers, and pipelines
            framework: hub_tests_pytorch
            runner: aws-general-8-plus
            image: diffusers/diffusers-pytorch-cpu
            report: torch_hub

    name: ${{ matrix.config.name }}
    runs-on:
      group: ${{ matrix.config.runner }}
    container:
      image: ${{ matrix.config.image }}
      options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/

    defaults:
      run:
        shell: bash

    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2

      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m pip install -e [quality,test]

      - name: Environment
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python utils/print_env.py

      - name: Run Hub tests for models, schedulers, and pipelines on a staging env
        if: ${{ matrix.config.framework == 'hub_tests_pytorch' }}
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          HUGGINGFACE_CO_STAGING=true python -m pytest \
            -m "is_staging_test" \
            --make-reports=tests_${{ matrix.config.report }} \
            tests

      - name: Failure short reports
        if: ${{ failure() }}
        run: cat reports/tests_${{ matrix.config.report }}_failures_short.txt

      - name: Test suite reports artifacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: pr_${{ matrix.config.report }}_test_reports
          path: reports
diffusers/.github/workflows/pr_tests.yml
ADDED
@@ -0,0 +1,289 @@
name: Fast tests for PRs

on:
  pull_request:
    branches: [main]
    paths:
      - "src/diffusers/**.py"
      - "benchmarks/**.py"
      - "examples/**.py"
      - "scripts/**.py"
      - "tests/**.py"
      - ".github/**.yml"
      - "utils/**.py"
      - "setup.py"
  push:
    branches:
      - ci-*

concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
  cancel-in-progress: true

env:
  DIFFUSERS_IS_CI: yes
  HF_HUB_ENABLE_HF_TRANSFER: 1
  OMP_NUM_THREADS: 4
  MKL_NUM_THREADS: 4
  PYTEST_TIMEOUT: 60

jobs:
  check_code_quality:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.8"
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install .[quality]
      - name: Check quality
        run: make quality
      - name: Check if failure
        if: ${{ failure() }}
        run: |
          echo "Quality check failed. Please ensure the right dependency versions are installed with 'pip install -e .[quality]' and run 'make style && make quality'" >> $GITHUB_STEP_SUMMARY

  check_repository_consistency:
    needs: check_code_quality
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.8"
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install .[quality]
      - name: Check repo consistency
        run: |
          python utils/check_copies.py
          python utils/check_dummies.py
          python utils/check_support_list.py
          make deps_table_check_updated
      - name: Check if failure
        if: ${{ failure() }}
        run: |
          echo "Repo consistency check failed. Please ensure the right dependency versions are installed with 'pip install -e .[quality]' and run 'make fix-copies'" >> $GITHUB_STEP_SUMMARY

  run_fast_tests:
    needs: [check_code_quality, check_repository_consistency]
    strategy:
      fail-fast: false
      matrix:
        config:
          - name: Fast PyTorch Pipeline CPU tests
            framework: pytorch_pipelines
            runner: aws-highmemory-32-plus
            image: diffusers/diffusers-pytorch-cpu
            report: torch_cpu_pipelines
          - name: Fast PyTorch Models & Schedulers CPU tests
            framework: pytorch_models
            runner: aws-general-8-plus
            image: diffusers/diffusers-pytorch-cpu
            report: torch_cpu_models_schedulers
          - name: PyTorch Example CPU tests
            framework: pytorch_examples
            runner: aws-general-8-plus
            image: diffusers/diffusers-pytorch-cpu
            report: torch_example_cpu

    name: ${{ matrix.config.name }}

    runs-on:
      group: ${{ matrix.config.runner }}

    container:
      image: ${{ matrix.config.image }}
      options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/

    defaults:
      run:
        shell: bash

    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2

      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install -e [quality,test]
          pip uninstall transformers -y && python -m uv pip install -U transformers@git+https://github.com/huggingface/transformers.git --no-deps
          pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git --no-deps

      - name: Environment
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python utils/print_env.py

      - name: Run fast PyTorch Pipeline CPU tests
        if: ${{ matrix.config.framework == 'pytorch_pipelines' }}
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m pytest -n 8 --max-worker-restart=0 --dist=loadfile \
            -s -v -k "not Flax and not Onnx" \
            --make-reports=tests_${{ matrix.config.report }} \
            tests/pipelines

      - name: Run fast PyTorch Model Scheduler CPU tests
        if: ${{ matrix.config.framework == 'pytorch_models' }}
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
            -s -v -k "not Flax and not Onnx and not Dependency" \
            --make-reports=tests_${{ matrix.config.report }} \
            tests/models tests/schedulers tests/others

      - name: Run example PyTorch CPU tests
        if: ${{ matrix.config.framework == 'pytorch_examples' }}
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install peft timm
          python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
            --make-reports=tests_${{ matrix.config.report }} \
            examples

      - name: Failure short reports
        if: ${{ failure() }}
        run: cat reports/tests_${{ matrix.config.report }}_failures_short.txt

      - name: Test suite reports artifacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: pr_${{ matrix.config.framework }}_${{ matrix.config.report }}_test_reports
          path: reports

  run_staging_tests:
    needs: [check_code_quality, check_repository_consistency]
    strategy:
      fail-fast: false
      matrix:
        config:
          - name: Hub tests for models, schedulers, and pipelines
            framework: hub_tests_pytorch
            runner:
              group: aws-general-8-plus
            image: diffusers/diffusers-pytorch-cpu
            report: torch_hub

    name: ${{ matrix.config.name }}

    runs-on: ${{ matrix.config.runner }}

    container:
      image: ${{ matrix.config.image }}
      options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/

    defaults:
      run:
        shell: bash

    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2

      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install -e [quality,test]

      - name: Environment
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python utils/print_env.py

      - name: Run Hub tests for models, schedulers, and pipelines on a staging env
        if: ${{ matrix.config.framework == 'hub_tests_pytorch' }}
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          HUGGINGFACE_CO_STAGING=true python -m pytest \
            -m "is_staging_test" \
            --make-reports=tests_${{ matrix.config.report }} \
            tests

      - name: Failure short reports
        if: ${{ failure() }}
        run: cat reports/tests_${{ matrix.config.report }}_failures_short.txt

      - name: Test suite reports artifacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: pr_${{ matrix.config.report }}_test_reports
          path: reports

  run_lora_tests:
    needs: [check_code_quality, check_repository_consistency]
    strategy:
      fail-fast: false

    name: LoRA tests with PEFT main

    runs-on:
      group: aws-general-8-plus

    container:
      image: diffusers/diffusers-pytorch-cpu
      options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/

    defaults:
      run:
        shell: bash

    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2

      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install -e [quality,test]
          # TODO (sayakpaul, DN6): revisit `--no-deps`
          python -m pip install -U peft@git+https://github.com/huggingface/peft.git --no-deps
          python -m uv pip install -U transformers@git+https://github.com/huggingface/transformers.git --no-deps
          python -m uv pip install -U tokenizers
          pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git --no-deps

      - name: Environment
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python utils/print_env.py

      - name: Run fast PyTorch LoRA tests with PEFT
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
            -s -v \
            --make-reports=tests_peft_main \
            tests/lora/
          python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
            -s -v \
            --make-reports=tests_models_lora_peft_main \
            tests/models/ -k "lora"

      - name: Failure short reports
        if: ${{ failure() }}
        run: |
          cat reports/tests_peft_main_failures_short.txt
          cat reports/tests_models_lora_peft_main_failures_short.txt

      - name: Test suite reports artifacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: pr_main_test_reports
          path: reports
diffusers/.github/workflows/pr_tests_gpu.yml
ADDED
@@ -0,0 +1,296 @@
name: Fast GPU Tests on PR

on:
  pull_request:
    branches: main
    paths:
      - "src/diffusers/models/modeling_utils.py"
      - "src/diffusers/models/model_loading_utils.py"
      - "src/diffusers/pipelines/pipeline_utils.py"
      - "src/diffusers/pipeline_loading_utils.py"
      - "src/diffusers/loaders/lora_base.py"
      - "src/diffusers/loaders/lora_pipeline.py"
      - "src/diffusers/loaders/peft.py"
      - "tests/pipelines/test_pipelines_common.py"
      - "tests/models/test_modeling_common.py"
  workflow_dispatch:

concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
  cancel-in-progress: true

env:
  DIFFUSERS_IS_CI: yes
  OMP_NUM_THREADS: 8
  MKL_NUM_THREADS: 8
  HF_HUB_ENABLE_HF_TRANSFER: 1
  PYTEST_TIMEOUT: 600
  PIPELINE_USAGE_CUTOFF: 1000000000 # set high cutoff so that only always-test pipelines run

jobs:
  check_code_quality:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.8"
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install .[quality]
      - name: Check quality
        run: make quality
      - name: Check if failure
        if: ${{ failure() }}
        run: |
          echo "Quality check failed. Please ensure the right dependency versions are installed with 'pip install -e .[quality]' and run 'make style && make quality'" >> $GITHUB_STEP_SUMMARY

  check_repository_consistency:
    needs: check_code_quality
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.8"
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install .[quality]
      - name: Check repo consistency
        run: |
          python utils/check_copies.py
          python utils/check_dummies.py
          python utils/check_support_list.py
          make deps_table_check_updated
      - name: Check if failure
        if: ${{ failure() }}
        run: |
          echo "Repo consistency check failed. Please ensure the right dependency versions are installed with 'pip install -e .[quality]' and run 'make fix-copies'" >> $GITHUB_STEP_SUMMARY

  setup_torch_cuda_pipeline_matrix:
    needs: [check_code_quality, check_repository_consistency]
    name: Setup Torch Pipelines CUDA Slow Tests Matrix
    runs-on:
      group: aws-general-8-plus
    container:
      image: diffusers/diffusers-pytorch-cpu
    outputs:
      pipeline_test_matrix: ${{ steps.fetch_pipeline_matrix.outputs.pipeline_test_matrix }}
    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2
      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install -e [quality,test]
      - name: Environment
        run: |
          python utils/print_env.py
      - name: Fetch Pipeline Matrix
        id: fetch_pipeline_matrix
        run: |
          matrix=$(python utils/fetch_torch_cuda_pipeline_test_matrix.py)
          echo $matrix
          echo "pipeline_test_matrix=$matrix" >> $GITHUB_OUTPUT
      - name: Pipeline Tests Artifacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: test-pipelines.json
          path: reports

  torch_pipelines_cuda_tests:
    name: Torch Pipelines CUDA Tests
    needs: setup_torch_cuda_pipeline_matrix
    strategy:
      fail-fast: false
      max-parallel: 8
      matrix:
        module: ${{ fromJson(needs.setup_torch_cuda_pipeline_matrix.outputs.pipeline_test_matrix) }}
    runs-on:
      group: aws-g4dn-2xlarge
    container:
      image: diffusers/diffusers-pytorch-cuda
      options: --shm-size "16gb" --ipc host --gpus 0
    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2

      - name: NVIDIA-SMI
        run: |
          nvidia-smi
      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install -e [quality,test]
|
| 134 |
+
pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
|
| 135 |
+
pip uninstall transformers -y && python -m uv pip install -U transformers@git+https://github.com/huggingface/transformers.git --no-deps
|
| 136 |
+
|
| 137 |
+
- name: Environment
|
| 138 |
+
run: |
|
| 139 |
+
python utils/print_env.py
|
| 140 |
+
- name: Extract tests
|
| 141 |
+
id: extract_tests
|
| 142 |
+
run: |
|
| 143 |
+
pattern=$(python utils/extract_tests_from_mixin.py --type pipeline)
|
| 144 |
+
echo "$pattern" > /tmp/test_pattern.txt
|
| 145 |
+
echo "pattern_file=/tmp/test_pattern.txt" >> $GITHUB_OUTPUT
|
| 146 |
+
|
| 147 |
+
- name: PyTorch CUDA checkpoint tests on Ubuntu
|
| 148 |
+
env:
|
| 149 |
+
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
|
| 150 |
+
# https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
|
| 151 |
+
CUBLAS_WORKSPACE_CONFIG: :16:8
|
| 152 |
+
run: |
|
| 153 |
+
if [ "${{ matrix.module }}" = "ip_adapters" ]; then
|
| 154 |
+
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
|
| 155 |
+
-s -v -k "not Flax and not Onnx" \
|
| 156 |
+
--make-reports=tests_pipeline_${{ matrix.module }}_cuda \
|
| 157 |
+
tests/pipelines/${{ matrix.module }}
|
| 158 |
+
else
|
| 159 |
+
pattern=$(cat ${{ steps.extract_tests.outputs.pattern_file }})
|
| 160 |
+
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
|
| 161 |
+
-s -v -k "not Flax and not Onnx and $pattern" \
|
| 162 |
+
--make-reports=tests_pipeline_${{ matrix.module }}_cuda \
|
| 163 |
+
tests/pipelines/${{ matrix.module }}
|
| 164 |
+
fi
|
| 165 |
+
|
| 166 |
+
- name: Failure short reports
|
| 167 |
+
if: ${{ failure() }}
|
| 168 |
+
run: |
|
| 169 |
+
cat reports/tests_pipeline_${{ matrix.module }}_cuda_stats.txt
|
| 170 |
+
cat reports/tests_pipeline_${{ matrix.module }}_cuda_failures_short.txt
|
| 171 |
+
- name: Test suite reports artifacts
|
| 172 |
+
if: ${{ always() }}
|
| 173 |
+
uses: actions/upload-artifact@v4
|
| 174 |
+
with:
|
| 175 |
+
name: pipeline_${{ matrix.module }}_test_reports
|
| 176 |
+
path: reports
|
| 177 |
+
|
| 178 |
+
torch_cuda_tests:
|
| 179 |
+
name: Torch CUDA Tests
|
| 180 |
+
needs: [check_code_quality, check_repository_consistency]
|
| 181 |
+
runs-on:
|
| 182 |
+
group: aws-g4dn-2xlarge
|
| 183 |
+
container:
|
| 184 |
+
image: diffusers/diffusers-pytorch-cuda
|
| 185 |
+
options: --shm-size "16gb" --ipc host --gpus 0
|
| 186 |
+
defaults:
|
| 187 |
+
run:
|
| 188 |
+
shell: bash
|
| 189 |
+
strategy:
|
| 190 |
+
fail-fast: false
|
| 191 |
+
max-parallel: 4
|
| 192 |
+
matrix:
|
| 193 |
+
module: [models, schedulers, lora, others]
|
| 194 |
+
steps:
|
| 195 |
+
- name: Checkout diffusers
|
| 196 |
+
uses: actions/checkout@v3
|
| 197 |
+
with:
|
| 198 |
+
fetch-depth: 2
|
| 199 |
+
|
| 200 |
+
- name: Install dependencies
|
| 201 |
+
run: |
|
| 202 |
+
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
|
| 203 |
+
python -m uv pip install -e [quality,test]
|
| 204 |
+
python -m uv pip install peft@git+https://github.com/huggingface/peft.git
|
| 205 |
+
pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
|
| 206 |
+
pip uninstall transformers -y && python -m uv pip install -U transformers@git+https://github.com/huggingface/transformers.git --no-deps
|
| 207 |
+
|
| 208 |
+
- name: Environment
|
| 209 |
+
run: |
|
| 210 |
+
python utils/print_env.py
|
| 211 |
+
|
| 212 |
+
- name: Extract tests
|
| 213 |
+
id: extract_tests
|
| 214 |
+
run: |
|
| 215 |
+
pattern=$(python utils/extract_tests_from_mixin.py --type ${{ matrix.module }})
|
| 216 |
+
echo "$pattern" > /tmp/test_pattern.txt
|
| 217 |
+
echo "pattern_file=/tmp/test_pattern.txt" >> $GITHUB_OUTPUT
|
| 218 |
+
|
| 219 |
+
- name: Run PyTorch CUDA tests
|
| 220 |
+
env:
|
| 221 |
+
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
|
| 222 |
+
# https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
|
| 223 |
+
CUBLAS_WORKSPACE_CONFIG: :16:8
|
| 224 |
+
run: |
|
| 225 |
+
pattern=$(cat ${{ steps.extract_tests.outputs.pattern_file }})
|
| 226 |
+
if [ -z "$pattern" ]; then
|
| 227 |
+
python -m pytest -n 1 -sv --max-worker-restart=0 --dist=loadfile -k "not Flax and not Onnx" tests/${{ matrix.module }} \
|
| 228 |
+
--make-reports=tests_torch_cuda_${{ matrix.module }}
|
| 229 |
+
else
|
| 230 |
+
python -m pytest -n 1 -sv --max-worker-restart=0 --dist=loadfile -k "not Flax and not Onnx and $pattern" tests/${{ matrix.module }} \
|
| 231 |
+
--make-reports=tests_torch_cuda_${{ matrix.module }}
|
| 232 |
+
fi
|
| 233 |
+
|
| 234 |
+
- name: Failure short reports
|
| 235 |
+
if: ${{ failure() }}
|
| 236 |
+
run: |
|
| 237 |
+
cat reports/tests_torch_cuda_${{ matrix.module }}_stats.txt
|
| 238 |
+
cat reports/tests_torch_cuda_${{ matrix.module }}_failures_short.txt
|
| 239 |
+
|
| 240 |
+
- name: Test suite reports artifacts
|
| 241 |
+
if: ${{ always() }}
|
| 242 |
+
uses: actions/upload-artifact@v4
|
| 243 |
+
with:
|
| 244 |
+
name: torch_cuda_test_reports_${{ matrix.module }}
|
| 245 |
+
path: reports
|
| 246 |
+
|
| 247 |
+
run_examples_tests:
|
| 248 |
+
name: Examples PyTorch CUDA tests on Ubuntu
|
| 249 |
+
needs: [check_code_quality, check_repository_consistency]
|
| 250 |
+
runs-on:
|
| 251 |
+
group: aws-g4dn-2xlarge
|
| 252 |
+
|
| 253 |
+
container:
|
| 254 |
+
image: diffusers/diffusers-pytorch-cuda
|
| 255 |
+
options: --gpus 0 --shm-size "16gb" --ipc host
|
| 256 |
+
steps:
|
| 257 |
+
- name: Checkout diffusers
|
| 258 |
+
uses: actions/checkout@v3
|
| 259 |
+
with:
|
| 260 |
+
fetch-depth: 2
|
| 261 |
+
|
| 262 |
+
- name: NVIDIA-SMI
|
| 263 |
+
run: |
|
| 264 |
+
nvidia-smi
|
| 265 |
+
- name: Install dependencies
|
| 266 |
+
run: |
|
| 267 |
+
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
|
| 268 |
+
pip uninstall transformers -y && python -m uv pip install -U transformers@git+https://github.com/huggingface/transformers.git --no-deps
|
| 269 |
+
python -m uv pip install -e [quality,test,training]
|
| 270 |
+
|
| 271 |
+
- name: Environment
|
| 272 |
+
run: |
|
| 273 |
+
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
|
| 274 |
+
python utils/print_env.py
|
| 275 |
+
|
| 276 |
+
- name: Run example tests on GPU
|
| 277 |
+
env:
|
| 278 |
+
HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
|
| 279 |
+
run: |
|
| 280 |
+
python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
|
| 281 |
+
python -m uv pip install timm
|
| 282 |
+
python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v --make-reports=examples_torch_cuda examples/
|
| 283 |
+
|
| 284 |
+
- name: Failure short reports
|
| 285 |
+
if: ${{ failure() }}
|
| 286 |
+
run: |
|
| 287 |
+
cat reports/examples_torch_cuda_stats.txt
|
| 288 |
+
cat reports/examples_torch_cuda_failures_short.txt
|
| 289 |
+
|
| 290 |
+
- name: Test suite reports artifacts
|
| 291 |
+
if: ${{ always() }}
|
| 292 |
+
uses: actions/upload-artifact@v4
|
| 293 |
+
with:
|
| 294 |
+
name: examples_test_reports
|
| 295 |
+
path: reports
|
| 296 |
+
|
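The `Fetch Pipeline Matrix` and `Extract tests` steps above both hand values to later consumers by appending `key=value` records to the file that `$GITHUB_OUTPUT` points at. A minimal local sketch of that mechanism, with `GITHUB_OUTPUT` as a hypothetical stand-in path (on a real runner the file is provided by the runner) and a hard-coded matrix in place of the real `utils/fetch_torch_cuda_pipeline_test_matrix.py`:

```shell
# Stand-in for the runner-provided output file.
GITHUB_OUTPUT="$(mktemp)"

# Producer step: compute a value and publish it as a step output.
matrix='["stable_diffusion", "controlnet"]'
echo "pipeline_test_matrix=$matrix" >> "$GITHUB_OUTPUT"

# Consumer side: the workflow engine parses the key=value record back out
# before exposing it as steps.fetch_pipeline_matrix.outputs.pipeline_test_matrix;
# emulate that parse here.
published=$(grep '^pipeline_test_matrix=' "$GITHUB_OUTPUT" | cut -d= -f2-)
echo "$published"
```

Downstream jobs then deserialize the published JSON string with `fromJson(...)` to drive the `matrix.module` fan-out.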
diffusers/.github/workflows/pr_torch_dependency_test.yml
ADDED
|
@@ -0,0 +1,36 @@
name: Run Torch dependency tests

on:
  pull_request:
    branches:
      - main
    paths:
      - "src/diffusers/**.py"
  push:
    branches:
      - main

concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
  cancel-in-progress: true

jobs:
  check_torch_dependencies:
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.8"
      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m pip install --upgrade pip uv
          python -m uv pip install -e .
          python -m uv pip install torch torchvision torchaudio
          python -m uv pip install pytest
      - name: Check for soft dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          pytest tests/others/test_dependencies.py
diffusers/.github/workflows/push_tests.yml
ADDED
|
@@ -0,0 +1,294 @@
name: Fast GPU Tests on main

on:
  workflow_dispatch:
  push:
    branches:
      - main
    paths:
      - "src/diffusers/**.py"
      - "examples/**.py"
      - "tests/**.py"

env:
  DIFFUSERS_IS_CI: yes
  OMP_NUM_THREADS: 8
  MKL_NUM_THREADS: 8
  HF_HUB_ENABLE_HF_TRANSFER: 1
  PYTEST_TIMEOUT: 600
  PIPELINE_USAGE_CUTOFF: 50000

jobs:
  setup_torch_cuda_pipeline_matrix:
    name: Setup Torch Pipelines CUDA Slow Tests Matrix
    runs-on:
      group: aws-general-8-plus
    container:
      image: diffusers/diffusers-pytorch-cpu
    outputs:
      pipeline_test_matrix: ${{ steps.fetch_pipeline_matrix.outputs.pipeline_test_matrix }}
    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2
      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install -e [quality,test]
      - name: Environment
        run: |
          python utils/print_env.py
      - name: Fetch Pipeline Matrix
        id: fetch_pipeline_matrix
        run: |
          matrix=$(python utils/fetch_torch_cuda_pipeline_test_matrix.py)
          echo $matrix
          echo "pipeline_test_matrix=$matrix" >> $GITHUB_OUTPUT
      - name: Pipeline Tests Artifacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: test-pipelines.json
          path: reports

  torch_pipelines_cuda_tests:
    name: Torch Pipelines CUDA Tests
    needs: setup_torch_cuda_pipeline_matrix
    strategy:
      fail-fast: false
      max-parallel: 8
      matrix:
        module: ${{ fromJson(needs.setup_torch_cuda_pipeline_matrix.outputs.pipeline_test_matrix) }}
    runs-on:
      group: aws-g4dn-2xlarge
    container:
      image: diffusers/diffusers-pytorch-cuda
      options: --shm-size "16gb" --ipc host --gpus 0
    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2
      - name: NVIDIA-SMI
        run: |
          nvidia-smi
      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install -e [quality,test]
          pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
      - name: Environment
        run: |
          python utils/print_env.py
      - name: PyTorch CUDA checkpoint tests on Ubuntu
        env:
          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
          # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
          CUBLAS_WORKSPACE_CONFIG: :16:8
        run: |
          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
            -s -v -k "not Flax and not Onnx" \
            --make-reports=tests_pipeline_${{ matrix.module }}_cuda \
            tests/pipelines/${{ matrix.module }}
      - name: Failure short reports
        if: ${{ failure() }}
        run: |
          cat reports/tests_pipeline_${{ matrix.module }}_cuda_stats.txt
          cat reports/tests_pipeline_${{ matrix.module }}_cuda_failures_short.txt
      - name: Test suite reports artifacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: pipeline_${{ matrix.module }}_test_reports
          path: reports

  torch_cuda_tests:
    name: Torch CUDA Tests
    runs-on:
      group: aws-g4dn-2xlarge
    container:
      image: diffusers/diffusers-pytorch-cuda
      options: --shm-size "16gb" --ipc host --gpus 0
    defaults:
      run:
        shell: bash
    strategy:
      fail-fast: false
      max-parallel: 2
      matrix:
        module: [models, schedulers, lora, others, single_file]
    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2

      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install -e [quality,test]
          python -m uv pip install peft@git+https://github.com/huggingface/peft.git
          pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git

      - name: Environment
        run: |
          python utils/print_env.py

      - name: Run PyTorch CUDA tests
        env:
          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
          # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
          CUBLAS_WORKSPACE_CONFIG: :16:8
        run: |
          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
            -s -v -k "not Flax and not Onnx" \
            --make-reports=tests_torch_cuda_${{ matrix.module }} \
            tests/${{ matrix.module }}

      - name: Failure short reports
        if: ${{ failure() }}
        run: |
          cat reports/tests_torch_cuda_${{ matrix.module }}_stats.txt
          cat reports/tests_torch_cuda_${{ matrix.module }}_failures_short.txt

      - name: Test suite reports artifacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: torch_cuda_test_reports_${{ matrix.module }}
          path: reports

  run_torch_compile_tests:
    name: PyTorch Compile CUDA tests

    runs-on:
      group: aws-g4dn-2xlarge

    container:
      image: diffusers/diffusers-pytorch-cuda
      options: --gpus 0 --shm-size "16gb" --ipc host

    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2

      - name: NVIDIA-SMI
        run: |
          nvidia-smi
      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install -e [quality,test,training]
      - name: Environment
        run: |
          python utils/print_env.py
      - name: Run example tests on GPU
        env:
          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
          RUN_COMPILE: yes
        run: |
          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v -k "compile" --make-reports=tests_torch_compile_cuda tests/
      - name: Failure short reports
        if: ${{ failure() }}
        run: cat reports/tests_torch_compile_cuda_failures_short.txt

      - name: Test suite reports artifacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: torch_compile_test_reports
          path: reports

  run_xformers_tests:
    name: PyTorch xformers CUDA tests

    runs-on:
      group: aws-g4dn-2xlarge

    container:
      image: diffusers/diffusers-pytorch-xformers-cuda
      options: --gpus 0 --shm-size "16gb" --ipc host

    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2

      - name: NVIDIA-SMI
        run: |
          nvidia-smi
      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install -e [quality,test,training]
      - name: Environment
        run: |
          python utils/print_env.py
      - name: Run example tests on GPU
        env:
          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
        run: |
          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v -k "xformers" --make-reports=tests_torch_xformers_cuda tests/
      - name: Failure short reports
        if: ${{ failure() }}
        run: cat reports/tests_torch_xformers_cuda_failures_short.txt

      - name: Test suite reports artifacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: torch_xformers_test_reports
          path: reports

  run_examples_tests:
    name: Examples PyTorch CUDA tests on Ubuntu

    runs-on:
      group: aws-g4dn-2xlarge

    container:
      image: diffusers/diffusers-pytorch-cuda
      options: --gpus 0 --shm-size "16gb" --ipc host
    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2

      - name: NVIDIA-SMI
        run: |
          nvidia-smi
      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install -e [quality,test,training]

      - name: Environment
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python utils/print_env.py

      - name: Run example tests on GPU
        env:
          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install timm
          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v --make-reports=examples_torch_cuda examples/

      - name: Failure short reports
        if: ${{ failure() }}
        run: |
          cat reports/examples_torch_cuda_stats.txt
          cat reports/examples_torch_cuda_failures_short.txt

      - name: Test suite reports artifacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: examples_test_reports
          path: reports
diffusers/.github/workflows/push_tests_fast.yml
ADDED
|
@@ -0,0 +1,98 @@
name: Fast tests on main

on:
  push:
    branches:
      - main
    paths:
      - "src/diffusers/**.py"
      - "examples/**.py"
      - "tests/**.py"

concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
  cancel-in-progress: true

env:
  DIFFUSERS_IS_CI: yes
  HF_HOME: /mnt/cache
  OMP_NUM_THREADS: 8
  MKL_NUM_THREADS: 8
  HF_HUB_ENABLE_HF_TRANSFER: 1
  PYTEST_TIMEOUT: 600
  RUN_SLOW: no

jobs:
  run_fast_tests:
    strategy:
      fail-fast: false
      matrix:
        config:
          - name: Fast PyTorch CPU tests on Ubuntu
            framework: pytorch
            runner: aws-general-8-plus
            image: diffusers/diffusers-pytorch-cpu
            report: torch_cpu
          - name: PyTorch Example CPU tests on Ubuntu
            framework: pytorch_examples
            runner: aws-general-8-plus
            image: diffusers/diffusers-pytorch-cpu
            report: torch_example_cpu

    name: ${{ matrix.config.name }}

    runs-on:
      group: ${{ matrix.config.runner }}

    container:
      image: ${{ matrix.config.image }}
      options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/

    defaults:
      run:
        shell: bash

    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2

      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install -e [quality,test]

      - name: Environment
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python utils/print_env.py

      - name: Run fast PyTorch CPU tests
        if: ${{ matrix.config.framework == 'pytorch' }}
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
            -s -v -k "not Flax and not Onnx" \
            --make-reports=tests_${{ matrix.config.report }} \
            tests/

      - name: Run example PyTorch CPU tests
        if: ${{ matrix.config.framework == 'pytorch_examples' }}
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install peft timm
          python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
            --make-reports=tests_${{ matrix.config.report }} \
            examples

      - name: Failure short reports
        if: ${{ failure() }}
        run: cat reports/tests_${{ matrix.config.report }}_failures_short.txt

      - name: Test suite reports artifacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: pr_${{ matrix.config.report }}_test_reports
          path: reports
diffusers/.github/workflows/push_tests_mps.yml
ADDED
|
@@ -0,0 +1,71 @@
name: Fast mps tests on main

on:
  workflow_dispatch:

env:
  DIFFUSERS_IS_CI: yes
  HF_HOME: /mnt/cache
  OMP_NUM_THREADS: 8
  MKL_NUM_THREADS: 8
  HF_HUB_ENABLE_HF_TRANSFER: 1
  PYTEST_TIMEOUT: 600
  RUN_SLOW: no

concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
  cancel-in-progress: true

jobs:
  run_fast_tests_apple_m1:
    name: Fast PyTorch MPS tests on MacOS
    runs-on: macos-13-xlarge

    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2

      - name: Clean checkout
        shell: arch -arch arm64 bash {0}
        run: |
          git clean -fxd

      - name: Setup miniconda
        uses: ./.github/actions/setup-miniconda
        with:
          python-version: 3.9

      - name: Install dependencies
        shell: arch -arch arm64 bash {0}
        run: |
          ${CONDA_RUN} python -m pip install --upgrade pip uv
          ${CONDA_RUN} python -m uv pip install -e ".[quality,test]"
          ${CONDA_RUN} python -m uv pip install torch torchvision torchaudio
          ${CONDA_RUN} python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate.git
          ${CONDA_RUN} python -m uv pip install transformers --upgrade

      - name: Environment
        shell: arch -arch arm64 bash {0}
        run: |
          ${CONDA_RUN} python utils/print_env.py

      - name: Run fast PyTorch tests on M1 (MPS)
        shell: arch -arch arm64 bash {0}
        env:
          HF_HOME: /System/Volumes/Data/mnt/cache
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: |
          ${CONDA_RUN} python -m pytest -n 0 -s -v --make-reports=tests_torch_mps tests/

      - name: Failure short reports
        if: ${{ failure() }}
        run: cat reports/tests_torch_mps_failures_short.txt

      - name: Test suite reports artifacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: pr_torch_mps_test_reports
          path: reports
diffusers/.github/workflows/pypi_publish.yaml
ADDED
@@ -0,0 +1,81 @@
# Adapted from https://blog.deepjyoti30.dev/pypi-release-github-action

name: PyPI release

on:
  workflow_dispatch:
  push:
    tags:
      - "*"

jobs:
  find-and-checkout-latest-branch:
    runs-on: ubuntu-22.04
    outputs:
      latest_branch: ${{ steps.set_latest_branch.outputs.latest_branch }}
    steps:
      - name: Checkout Repo
        uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.8'

      - name: Fetch latest branch
        id: fetch_latest_branch
        run: |
          pip install -U requests packaging
          LATEST_BRANCH=$(python utils/fetch_latest_release_branch.py)
          echo "Latest branch: $LATEST_BRANCH"
          echo "latest_branch=$LATEST_BRANCH" >> $GITHUB_ENV

      - name: Set latest branch output
        id: set_latest_branch
        run: echo "::set-output name=latest_branch::${{ env.latest_branch }}"

  release:
    needs: find-and-checkout-latest-branch
    runs-on: ubuntu-22.04

    steps:
      - name: Checkout Repo
        uses: actions/checkout@v3
        with:
          ref: ${{ needs.find-and-checkout-latest-branch.outputs.latest_branch }}

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.8"

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -U setuptools wheel twine
          pip install -U torch --index-url https://download.pytorch.org/whl/cpu
          pip install -U transformers

      - name: Build the dist files
        run: python setup.py bdist_wheel && python setup.py sdist

      - name: Publish to the test PyPI
        env:
          TWINE_USERNAME: ${{ secrets.TEST_PYPI_USERNAME }}
          TWINE_PASSWORD: ${{ secrets.TEST_PYPI_PASSWORD }}
        run: twine upload dist/* -r pypitest --repository-url=https://test.pypi.org/legacy/

      - name: Test installing diffusers and importing
        run: |
          pip install diffusers && pip uninstall diffusers -y
          pip install -i https://test.pypi.org/simple/ diffusers
          python -c "from diffusers import __version__; print(__version__)"
          python -c "from diffusers import DiffusionPipeline; pipe = DiffusionPipeline.from_pretrained('fusing/unet-ldm-dummy-update'); pipe()"
          python -c "from diffusers import DiffusionPipeline; pipe = DiffusionPipeline.from_pretrained('hf-internal-testing/tiny-stable-diffusion-pipe', safety_checker=None); pipe('ah suh du')"
          python -c "from diffusers import *"

      - name: Publish to PyPI
        env:
          TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
          TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
        run: twine upload dist/* -r pypi
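The "Test installing diffusers and importing" step above is essentially an install-then-import smoke test: install the candidate wheel from TestPyPI, import it, and confirm it reports a version. A minimal, generic sketch of that check in Python (the `verify_installable` helper is illustrative, not part of the workflow):

```python
import importlib


def verify_installable(package: str) -> str:
    """Import an installed package and return its version string,
    mirroring the workflow's post-install smoke test."""
    module = importlib.import_module(package)
    version = getattr(module, "__version__", None)
    if not version:
        raise RuntimeError(f"{package} imported but exposes no __version__")
    return version
```

Running this right after a TestPyPI install catches broken sdists (missing files, bad entry modules) before the real PyPI upload in the final step.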
diffusers/.github/workflows/release_tests_fast.yml
ADDED
@@ -0,0 +1,351 @@
# Duplicate workflow to push_tests.yml that is meant to run on release/patch branches as a final check
# Creating a duplicate workflow here is simpler than adding complex path/branch parsing logic to push_tests.yml
# Needs to be updated if push_tests.yml updated
name: (Release) Fast GPU Tests on main

on:
  push:
    branches:
      - "v*.*.*-release"
      - "v*.*.*-patch"

env:
  DIFFUSERS_IS_CI: yes
  OMP_NUM_THREADS: 8
  MKL_NUM_THREADS: 8
  PYTEST_TIMEOUT: 600
  PIPELINE_USAGE_CUTOFF: 50000

jobs:
  setup_torch_cuda_pipeline_matrix:
    name: Setup Torch Pipelines CUDA Slow Tests Matrix
    runs-on:
      group: aws-general-8-plus
    container:
      image: diffusers/diffusers-pytorch-cpu
    outputs:
      pipeline_test_matrix: ${{ steps.fetch_pipeline_matrix.outputs.pipeline_test_matrix }}
    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2
      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install -e [quality,test]
      - name: Environment
        run: |
          python utils/print_env.py
      - name: Fetch Pipeline Matrix
        id: fetch_pipeline_matrix
        run: |
          matrix=$(python utils/fetch_torch_cuda_pipeline_test_matrix.py)
          echo $matrix
          echo "pipeline_test_matrix=$matrix" >> $GITHUB_OUTPUT
      - name: Pipeline Tests Artifacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: test-pipelines.json
          path: reports

  torch_pipelines_cuda_tests:
    name: Torch Pipelines CUDA Tests
    needs: setup_torch_cuda_pipeline_matrix
    strategy:
      fail-fast: false
      max-parallel: 8
      matrix:
        module: ${{ fromJson(needs.setup_torch_cuda_pipeline_matrix.outputs.pipeline_test_matrix) }}
    runs-on:
      group: aws-g4dn-2xlarge
    container:
      image: diffusers/diffusers-pytorch-cuda
      options: --shm-size "16gb" --ipc host --gpus 0
    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2
      - name: NVIDIA-SMI
        run: |
          nvidia-smi
      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install -e [quality,test]
          pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
      - name: Environment
        run: |
          python utils/print_env.py
      - name: Slow PyTorch CUDA checkpoint tests on Ubuntu
        env:
          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
          # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
          CUBLAS_WORKSPACE_CONFIG: :16:8
        run: |
          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
            -s -v -k "not Flax and not Onnx" \
            --make-reports=tests_pipeline_${{ matrix.module }}_cuda \
            tests/pipelines/${{ matrix.module }}
      - name: Failure short reports
        if: ${{ failure() }}
        run: |
          cat reports/tests_pipeline_${{ matrix.module }}_cuda_stats.txt
          cat reports/tests_pipeline_${{ matrix.module }}_cuda_failures_short.txt
      - name: Test suite reports artifacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: pipeline_${{ matrix.module }}_test_reports
          path: reports

  torch_cuda_tests:
    name: Torch CUDA Tests
    runs-on:
      group: aws-g4dn-2xlarge
    container:
      image: diffusers/diffusers-pytorch-cuda
      options: --shm-size "16gb" --ipc host --gpus 0
    defaults:
      run:
        shell: bash
    strategy:
      fail-fast: false
      max-parallel: 2
      matrix:
        module: [models, schedulers, lora, others, single_file]
    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2

      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install -e [quality,test]
          python -m uv pip install peft@git+https://github.com/huggingface/peft.git
          pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git

      - name: Environment
        run: |
          python utils/print_env.py

      - name: Run PyTorch CUDA tests
        env:
          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
          # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
          CUBLAS_WORKSPACE_CONFIG: :16:8
        run: |
          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
            -s -v -k "not Flax and not Onnx" \
            --make-reports=tests_torch_${{ matrix.module }}_cuda \
            tests/${{ matrix.module }}

      - name: Failure short reports
        if: ${{ failure() }}
        run: |
          cat reports/tests_torch_${{ matrix.module }}_cuda_stats.txt
          cat reports/tests_torch_${{ matrix.module }}_cuda_failures_short.txt

      - name: Test suite reports artifacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: torch_cuda_${{ matrix.module }}_test_reports
          path: reports

  torch_minimum_version_cuda_tests:
    name: Torch Minimum Version CUDA Tests
    runs-on:
      group: aws-g4dn-2xlarge
    container:
      image: diffusers/diffusers-pytorch-minimum-cuda
      options: --shm-size "16gb" --ipc host --gpus 0
    defaults:
      run:
        shell: bash
    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2

      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install -e [quality,test]
          python -m uv pip install peft@git+https://github.com/huggingface/peft.git
          pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git

      - name: Environment
        run: |
          python utils/print_env.py

      - name: Run PyTorch CUDA tests
        env:
          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
          # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
          CUBLAS_WORKSPACE_CONFIG: :16:8
        run: |
          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
            -s -v -k "not Flax and not Onnx" \
            --make-reports=tests_torch_minimum_cuda \
            tests/models/test_modeling_common.py \
            tests/pipelines/test_pipelines_common.py \
            tests/pipelines/test_pipeline_utils.py \
            tests/pipelines/test_pipelines.py \
            tests/pipelines/test_pipelines_auto.py \
            tests/schedulers/test_schedulers.py \
            tests/others

      - name: Failure short reports
        if: ${{ failure() }}
        run: |
          cat reports/tests_torch_minimum_version_cuda_stats.txt
          cat reports/tests_torch_minimum_version_cuda_failures_short.txt

      - name: Test suite reports artifacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: torch_minimum_version_cuda_test_reports
          path: reports

  run_torch_compile_tests:
    name: PyTorch Compile CUDA tests

    runs-on:
      group: aws-g4dn-2xlarge

    container:
      image: diffusers/diffusers-pytorch-cuda
      options: --gpus 0 --shm-size "16gb" --ipc host

    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2

      - name: NVIDIA-SMI
        run: |
          nvidia-smi
      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install -e [quality,test,training]
      - name: Environment
        run: |
          python utils/print_env.py
      - name: Run torch compile tests on GPU
        env:
          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
          RUN_COMPILE: yes
        run: |
          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v -k "compile" --make-reports=tests_torch_compile_cuda tests/
      - name: Failure short reports
        if: ${{ failure() }}
        run: cat reports/tests_torch_compile_cuda_failures_short.txt

      - name: Test suite reports artifacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: torch_compile_test_reports
          path: reports

  run_xformers_tests:
    name: PyTorch xformers CUDA tests

    runs-on:
      group: aws-g4dn-2xlarge

    container:
      image: diffusers/diffusers-pytorch-xformers-cuda
      options: --gpus 0 --shm-size "16gb" --ipc host

    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2

      - name: NVIDIA-SMI
        run: |
          nvidia-smi
      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install -e [quality,test,training]
      - name: Environment
        run: |
          python utils/print_env.py
      - name: Run example tests on GPU
        env:
          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
        run: |
          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v -k "xformers" --make-reports=tests_torch_xformers_cuda tests/
      - name: Failure short reports
        if: ${{ failure() }}
        run: cat reports/tests_torch_xformers_cuda_failures_short.txt

      - name: Test suite reports artifacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: torch_xformers_test_reports
          path: reports

  run_examples_tests:
    name: Examples PyTorch CUDA tests on Ubuntu

    runs-on:
      group: aws-g4dn-2xlarge

    container:
      image: diffusers/diffusers-pytorch-cuda
      options: --gpus 0 --shm-size "16gb" --ipc host

    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2

      - name: NVIDIA-SMI
        run: |
          nvidia-smi

      - name: Install dependencies
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install -e [quality,test,training]

      - name: Environment
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python utils/print_env.py

      - name: Run example tests on GPU
        env:
          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install timm
          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v --make-reports=examples_torch_cuda examples/

      - name: Failure short reports
        if: ${{ failure() }}
        run: |
          cat reports/examples_torch_cuda_stats.txt
          cat reports/examples_torch_cuda_failures_short.txt

      - name: Test suite reports artifacts
        if: ${{ always() }}
        uses: actions/upload-artifact@v4
        with:
          name: examples_test_reports
          path: reports
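In the workflow above, the setup job hands the pipeline list to the matrix jobs by writing a `key=json` line to `$GITHUB_OUTPUT`, which `fromJson(...)` then parses inside `strategy.matrix`. A small sketch of that round trip (the `make_matrix_output` helper and the module names are illustrative, not from the repo):

```python
import json


def make_matrix_output(modules: list[str]) -> str:
    """Serialize a test-module list the way a setup job writes it to
    $GITHUB_OUTPUT: a single 'name=<compact JSON>' line."""
    matrix = json.dumps(modules, separators=(",", ":"))
    return f"pipeline_test_matrix={matrix}"


line = make_matrix_output(["stable_diffusion", "controlnet"])
# Everything after '=' must round-trip through fromJson on the consumer side.
key, _, value = line.partition("=")
assert key == "pipeline_test_matrix"
assert json.loads(value) == ["stable_diffusion", "controlnet"]
```

Compact JSON (no spaces) keeps the output on one line, which matters because `$GITHUB_OUTPUT` treats each line as a separate key/value pair.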
diffusers/.github/workflows/run_tests_from_a_pr.yml
ADDED
@@ -0,0 +1,74 @@
name: Check running SLOW tests from a PR (only GPU)

on:
  workflow_dispatch:
    inputs:
      docker_image:
        default: 'diffusers/diffusers-pytorch-cuda'
        description: 'Name of the Docker image'
        required: true
      pr_number:
        description: 'PR number to test on'
        required: true
      test:
        description: 'Tests to run (e.g.: `tests/models`).'
        required: true

env:
  DIFFUSERS_IS_CI: yes
  IS_GITHUB_CI: "1"
  HF_HOME: /mnt/cache
  OMP_NUM_THREADS: 8
  MKL_NUM_THREADS: 8
  PYTEST_TIMEOUT: 600
  RUN_SLOW: yes

jobs:
  run_tests:
    name: "Run a test on our runner from a PR"
    runs-on:
      group: aws-g4dn-2xlarge
    container:
      image: ${{ github.event.inputs.docker_image }}
      options: --gpus 0 --privileged --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/

    steps:
      - name: Validate test files input
        id: validate_test_files
        env:
          PY_TEST: ${{ github.event.inputs.test }}
        run: |
          if [[ ! "$PY_TEST" =~ ^tests/ ]]; then
            echo "Error: The input string must start with 'tests/'."
            exit 1
          fi

          if [[ ! "$PY_TEST" =~ ^tests/(models|pipelines|lora) ]]; then
            echo "Error: The input string must contain either 'models', 'pipelines', or 'lora' after 'tests/'."
            exit 1
          fi

          if [[ "$PY_TEST" == *";"* ]]; then
            echo "Error: The input string must not contain ';'."
            exit 1
          fi
          echo "$PY_TEST"

        shell: bash -e {0}

      - name: Checkout PR branch
        uses: actions/checkout@v4
        with:
          ref: refs/pull/${{ inputs.pr_number }}/head

      - name: Install pytest
        run: |
          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
          python -m uv pip install -e [quality,test]
          python -m uv pip install peft

      - name: Run tests
        env:
          PY_TEST: ${{ github.event.inputs.test }}
        run: |
          pytest "$PY_TEST"
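The `Validate test files input` step above guards the free-form `test` input with three bash checks: it must start with `tests/`, must target `models`, `pipelines`, or `lora`, and must not contain `;` (a basic command-injection guard, since the value is later passed to `pytest`). The same logic sketched in Python for clarity (the `validate_test_path` helper is illustrative, not part of the workflow):

```python
import re


def validate_test_path(py_test: str) -> None:
    """Replicate the workflow's bash validation of the `test` input."""
    if not py_test.startswith("tests/"):
        raise ValueError("The input string must start with 'tests/'.")
    # Only allow test trees the maintainers expect to run on GPU runners.
    if not re.match(r"^tests/(models|pipelines|lora)", py_test):
        raise ValueError(
            "The input string must contain 'models', 'pipelines', or 'lora' after 'tests/'."
        )
    # Reject ';' so the value cannot chain extra shell commands.
    if ";" in py_test:
        raise ValueError("The input string must not contain ';'.")
```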
diffusers/.github/workflows/ssh-pr-runner.yml
ADDED
@@ -0,0 +1,40 @@
name: SSH into PR runners

on:
  workflow_dispatch:
    inputs:
      docker_image:
        description: 'Name of the Docker image'
        required: true

env:
  IS_GITHUB_CI: "1"
  HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
  HF_HOME: /mnt/cache
  DIFFUSERS_IS_CI: yes
  OMP_NUM_THREADS: 8
  MKL_NUM_THREADS: 8
  RUN_SLOW: yes

jobs:
  ssh_runner:
    name: "SSH"
    runs-on:
      group: aws-highmemory-32-plus
    container:
      image: ${{ github.event.inputs.docker_image }}
      options: --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface/diffusers:/mnt/cache/ --privileged

    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2

      - name: Tailscale # In order to be able to SSH when a test fails
        uses: huggingface/tailscale-action@main
        with:
          authkey: ${{ secrets.TAILSCALE_SSH_AUTHKEY }}
          slackChannel: ${{ secrets.SLACK_CIFEEDBACK_CHANNEL }}
          slackToken: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
          waitForSSH: true
diffusers/.github/workflows/ssh-runner.yml
ADDED
@@ -0,0 +1,52 @@
name: SSH into GPU runners

on:
  workflow_dispatch:
    inputs:
      runner_type:
        description: 'Type of runner to test (aws-g6-4xlarge-plus: a10, aws-g4dn-2xlarge: t4, aws-g6e-xlarge-plus: L40)'
        type: choice
        required: true
        options:
          - aws-g6-4xlarge-plus
          - aws-g4dn-2xlarge
          - aws-g6e-xlarge-plus
      docker_image:
        description: 'Name of the Docker image'
        required: true

env:
  IS_GITHUB_CI: "1"
  HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
  HF_HOME: /mnt/cache
  DIFFUSERS_IS_CI: yes
  OMP_NUM_THREADS: 8
  MKL_NUM_THREADS: 8
  RUN_SLOW: yes

jobs:
  ssh_runner:
    name: "SSH"
    runs-on:
      group: "${{ github.event.inputs.runner_type }}"
    container:
      image: ${{ github.event.inputs.docker_image }}
      options: --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface/diffusers:/mnt/cache/ --gpus 0 --privileged

    steps:
      - name: Checkout diffusers
        uses: actions/checkout@v3
        with:
          fetch-depth: 2

      - name: NVIDIA-SMI
        run: |
          nvidia-smi

      - name: Tailscale # In order to be able to SSH when a test fails
        uses: huggingface/tailscale-action@main
        with:
          authkey: ${{ secrets.TAILSCALE_SSH_AUTHKEY }}
          slackChannel: ${{ secrets.SLACK_CIFEEDBACK_CHANNEL }}
          slackToken: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
          waitForSSH: true
diffusers/.github/workflows/stale.yml
ADDED
@@ -0,0 +1,30 @@
name: Stale Bot

on:
  schedule:
    - cron: "0 15 * * *"

jobs:
  close_stale_issues:
    name: Close Stale Issues
    if: github.repository == 'huggingface/diffusers'
    runs-on: ubuntu-22.04
    permissions:
      issues: write
      pull-requests: write
    env:
      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    steps:
      - uses: actions/checkout@v2

      - name: Setup Python
        uses: actions/setup-python@v1
        with:
          python-version: 3.8

      - name: Install requirements
        run: |
          pip install PyGithub
      - name: Close stale issues
        run: |
          python utils/stale.py
diffusers/.github/workflows/trufflehog.yml
ADDED
@@ -0,0 +1,18 @@
on:
  push:

name: Secret Leaks

jobs:
  trufflehog:
    runs-on: ubuntu-22.04
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Secret Scanning
        uses: trufflesecurity/trufflehog@main
        with:
          extra_args: --results=verified,unknown
diffusers/.github/workflows/typos.yml
ADDED
@@ -0,0 +1,14 @@
name: Check typos

on:
  workflow_dispatch:

jobs:
  build:
    runs-on: ubuntu-22.04

    steps:
      - uses: actions/checkout@v3

      - name: typos-action
        uses: crate-ci/typos@v1.12.4
diffusers/.github/workflows/update_metadata.yml
ADDED
@@ -0,0 +1,30 @@
name: Update Diffusers metadata

on:
  workflow_dispatch:
  push:
    branches:
      - main
      - update_diffusers_metadata*

jobs:
  update_metadata:
    runs-on: ubuntu-22.04
    defaults:
      run:
        shell: bash -l {0}

    steps:
      - uses: actions/checkout@v3

      - name: Setup environment
        run: |
          pip install --upgrade pip
          pip install datasets pandas
          pip install .[torch]

      - name: Update metadata
        env:
          HF_TOKEN: ${{ secrets.SAYAK_HF_TOKEN }}
        run: |
          python utils/update_metadata.py --commit_sha ${{ github.sha }}
diffusers/.github/workflows/upload_pr_documentation.yml
ADDED
@@ -0,0 +1,16 @@
name: Upload PR Documentation

on:
  workflow_run:
    workflows: ["Build PR Documentation"]
    types:
      - completed

jobs:
  build:
    uses: huggingface/doc-builder/.github/workflows/upload_pr_documentation.yml@main
    with:
      package_name: diffusers
    secrets:
      hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
      comment_bot_token: ${{ secrets.COMMENT_BOT_TOKEN }}
diffusers/docs/source/_config.py
ADDED
@@ -0,0 +1,9 @@
# docstyle-ignore
INSTALL_CONTENT = """
# Diffusers installation
! pip install diffusers transformers datasets accelerate
# To install from source instead of the last release, comment the command above and uncomment the following one.
# ! pip install git+https://github.com/huggingface/diffusers.git
"""

notebook_first_cells = [{"type": "code", "content": INSTALL_CONTENT}]
diffusers/docs/source/en/_toctree.yml
ADDED
|
@@ -0,0 +1,701 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
- sections:
  - local: index
    title: 🧨 Diffusers
  - local: quicktour
    title: Quicktour
  - local: stable_diffusion
    title: Effective and efficient diffusion
  - local: installation
    title: Installation
  title: Get started
- sections:
  - local: tutorials/tutorial_overview
    title: Overview
  - local: using-diffusers/write_own_pipeline
    title: Understanding pipelines, models and schedulers
  - local: tutorials/autopipeline
    title: AutoPipeline
  - local: tutorials/basic_training
    title: Train a diffusion model
  title: Tutorials
- sections:
  - local: using-diffusers/loading
    title: Load pipelines
  - local: using-diffusers/custom_pipeline_overview
    title: Load community pipelines and components
  - local: using-diffusers/schedulers
    title: Load schedulers and models
  - local: using-diffusers/other-formats
    title: Model files and layouts
  - local: using-diffusers/push_to_hub
    title: Push files to the Hub
  title: Load pipelines and adapters
- sections:
  - local: tutorials/using_peft_for_inference
    title: LoRA
  - local: using-diffusers/ip_adapter
    title: IP-Adapter
  - local: using-diffusers/controlnet
    title: ControlNet
  - local: using-diffusers/t2i_adapter
    title: T2I-Adapter
  - local: using-diffusers/dreambooth
    title: DreamBooth
  - local: using-diffusers/textual_inversion_inference
    title: Textual inversion
  title: Adapters
  isExpanded: false
- sections:
  - local: using-diffusers/unconditional_image_generation
    title: Unconditional image generation
  - local: using-diffusers/conditional_image_generation
    title: Text-to-image
  - local: using-diffusers/img2img
    title: Image-to-image
  - local: using-diffusers/inpaint
    title: Inpainting
  - local: using-diffusers/text-img2vid
    title: Video generation
  - local: using-diffusers/depth2img
    title: Depth-to-image
  title: Generative tasks
- sections:
  - local: using-diffusers/overview_techniques
    title: Overview
  - local: using-diffusers/create_a_server
    title: Create a server
  - local: using-diffusers/batched_inference
    title: Batch inference
  - local: training/distributed_inference
    title: Distributed inference
  - local: using-diffusers/scheduler_features
    title: Scheduler features
  - local: using-diffusers/callback
    title: Pipeline callbacks
  - local: using-diffusers/reusing_seeds
    title: Reproducible pipelines
  - local: using-diffusers/image_quality
    title: Controlling image quality
  - local: using-diffusers/weighted_prompts
    title: Prompt techniques
  title: Inference techniques
- sections:
  - local: advanced_inference/outpaint
    title: Outpainting
  title: Advanced inference
- sections:
  - local: hybrid_inference/overview
    title: Overview
  - local: hybrid_inference/vae_decode
    title: VAE Decode
  - local: hybrid_inference/vae_encode
    title: VAE Encode
  - local: hybrid_inference/api_reference
    title: API Reference
  title: Hybrid Inference
- sections:
  - local: modular_diffusers/overview
    title: Overview
  - local: modular_diffusers/modular_pipeline
    title: Modular Pipeline
  - local: modular_diffusers/components_manager
    title: Components Manager
  - local: modular_diffusers/modular_diffusers_states
    title: Modular Diffusers States
  - local: modular_diffusers/pipeline_block
    title: Pipeline Block
  - local: modular_diffusers/sequential_pipeline_blocks
    title: Sequential Pipeline Blocks
  - local: modular_diffusers/loop_sequential_pipeline_blocks
    title: Loop Sequential Pipeline Blocks
  - local: modular_diffusers/auto_pipeline_blocks
    title: Auto Pipeline Blocks
  - local: modular_diffusers/end_to_end_guide
    title: End-to-End Example
  title: Modular Diffusers
- sections:
  - local: using-diffusers/consisid
    title: ConsisID
  - local: using-diffusers/sdxl
    title: Stable Diffusion XL
  - local: using-diffusers/sdxl_turbo
    title: SDXL Turbo
  - local: using-diffusers/kandinsky
    title: Kandinsky
  - local: using-diffusers/omnigen
    title: OmniGen
  - local: using-diffusers/pag
    title: PAG
  - local: using-diffusers/inference_with_lcm
    title: Latent Consistency Model
  - local: using-diffusers/shap-e
    title: Shap-E
  - local: using-diffusers/diffedit
    title: DiffEdit
  - local: using-diffusers/inference_with_tcd_lora
    title: Trajectory Consistency Distillation-LoRA
  - local: using-diffusers/svd
    title: Stable Video Diffusion
  - local: using-diffusers/marigold_usage
    title: Marigold Computer Vision
  title: Specific pipeline examples
- sections:
  - local: training/overview
    title: Overview
  - local: training/create_dataset
    title: Create a dataset for training
  - local: training/adapt_a_model
    title: Adapt a model to a new task
  - isExpanded: false
    sections:
    - local: training/unconditional_training
      title: Unconditional image generation
    - local: training/text2image
      title: Text-to-image
    - local: training/sdxl
      title: Stable Diffusion XL
    - local: training/kandinsky
      title: Kandinsky 2.2
    - local: training/wuerstchen
      title: Wuerstchen
    - local: training/controlnet
      title: ControlNet
    - local: training/t2i_adapters
      title: T2I-Adapters
    - local: training/instructpix2pix
      title: InstructPix2Pix
    - local: training/cogvideox
      title: CogVideoX
    title: Models
  - isExpanded: false
    sections:
    - local: training/text_inversion
      title: Textual Inversion
    - local: training/dreambooth
      title: DreamBooth
    - local: training/lora
      title: LoRA
    - local: training/custom_diffusion
      title: Custom Diffusion
    - local: training/lcm_distill
      title: Latent Consistency Distillation
    - local: training/ddpo
      title: Reinforcement learning training with DDPO
    title: Methods
  title: Training
- sections:
  - local: quantization/overview
    title: Getting Started
  - local: quantization/bitsandbytes
    title: bitsandbytes
  - local: quantization/gguf
    title: gguf
  - local: quantization/torchao
    title: torchao
  - local: quantization/quanto
    title: quanto
  title: Quantization Methods
- sections:
  - local: optimization/fp16
    title: Accelerate inference
  - local: optimization/cache
    title: Caching
  - local: optimization/memory
    title: Reduce memory usage
  - local: optimization/speed-memory-optims
    title: Compile and offloading quantized models
  - local: optimization/pruna
    title: Pruna
  - local: optimization/xformers
    title: xFormers
  - local: optimization/tome
    title: Token merging
  - local: optimization/deepcache
    title: DeepCache
  - local: optimization/tgate
    title: TGATE
  - local: optimization/xdit
    title: xDiT
  - local: optimization/para_attn
    title: ParaAttention
  - sections:
    - local: using-diffusers/stable_diffusion_jax_how_to
      title: JAX/Flax
    - local: optimization/onnx
      title: ONNX
    - local: optimization/open_vino
      title: OpenVINO
    - local: optimization/coreml
      title: Core ML
    title: Optimized model formats
  - sections:
    - local: optimization/mps
      title: Metal Performance Shaders (MPS)
    - local: optimization/habana
      title: Intel Gaudi
    - local: optimization/neuron
      title: AWS Neuron
    title: Optimized hardware
  title: Accelerate inference and reduce memory
- sections:
  - local: conceptual/philosophy
    title: Philosophy
  - local: using-diffusers/controlling_generation
    title: Controlled generation
  - local: conceptual/contribution
    title: How to contribute?
  - local: conceptual/ethical_guidelines
    title: Diffusers' Ethical Guidelines
  - local: conceptual/evaluation
    title: Evaluating Diffusion Models
  title: Conceptual Guides
- sections:
  - local: community_projects
    title: Projects built with Diffusers
  title: Community Projects
- sections:
  - isExpanded: false
    sections:
    - local: api/configuration
      title: Configuration
    - local: api/logging
      title: Logging
    - local: api/outputs
      title: Outputs
    - local: api/quantization
      title: Quantization
    title: Main Classes
  - isExpanded: false
    sections:
    - local: api/loaders/ip_adapter
      title: IP-Adapter
    - local: api/loaders/lora
      title: LoRA
    - local: api/loaders/single_file
      title: Single files
    - local: api/loaders/textual_inversion
      title: Textual Inversion
    - local: api/loaders/unet
      title: UNet
    - local: api/loaders/transformer_sd3
      title: SD3Transformer2D
    - local: api/loaders/peft
      title: PEFT
    title: Loaders
  - isExpanded: false
    sections:
    - local: api/models/overview
      title: Overview
    - local: api/models/auto_model
      title: AutoModel
    - sections:
      - local: api/models/controlnet
        title: ControlNetModel
      - local: api/models/controlnet_union
        title: ControlNetUnionModel
      - local: api/models/controlnet_flux
        title: FluxControlNetModel
      - local: api/models/controlnet_hunyuandit
        title: HunyuanDiT2DControlNetModel
      - local: api/models/controlnet_sana
        title: SanaControlNetModel
      - local: api/models/controlnet_sd3
        title: SD3ControlNetModel
      - local: api/models/controlnet_sparsectrl
        title: SparseControlNetModel
      title: ControlNets
    - sections:
      - local: api/models/allegro_transformer3d
        title: AllegroTransformer3DModel
      - local: api/models/aura_flow_transformer2d
        title: AuraFlowTransformer2DModel
      - local: api/models/chroma_transformer
        title: ChromaTransformer2DModel
      - local: api/models/cogvideox_transformer3d
        title: CogVideoXTransformer3DModel
      - local: api/models/cogview3plus_transformer2d
        title: CogView3PlusTransformer2DModel
      - local: api/models/cogview4_transformer2d
        title: CogView4Transformer2DModel
      - local: api/models/consisid_transformer3d
        title: ConsisIDTransformer3DModel
      - local: api/models/cosmos_transformer3d
        title: CosmosTransformer3DModel
      - local: api/models/dit_transformer2d
        title: DiTTransformer2DModel
      - local: api/models/easyanimate_transformer3d
        title: EasyAnimateTransformer3DModel
      - local: api/models/flux_transformer
        title: FluxTransformer2DModel
      - local: api/models/hidream_image_transformer
        title: HiDreamImageTransformer2DModel
      - local: api/models/hunyuan_transformer2d
        title: HunyuanDiT2DModel
      - local: api/models/hunyuan_video_transformer_3d
        title: HunyuanVideoTransformer3DModel
      - local: api/models/latte_transformer3d
        title: LatteTransformer3DModel
      - local: api/models/ltx_video_transformer3d
        title: LTXVideoTransformer3DModel
      - local: api/models/lumina2_transformer2d
        title: Lumina2Transformer2DModel
      - local: api/models/lumina_nextdit2d
        title: LuminaNextDiT2DModel
      - local: api/models/mochi_transformer3d
        title: MochiTransformer3DModel
      - local: api/models/omnigen_transformer
        title: OmniGenTransformer2DModel
      - local: api/models/pixart_transformer2d
        title: PixArtTransformer2DModel
      - local: api/models/prior_transformer
        title: PriorTransformer
      - local: api/models/sana_transformer2d
        title: SanaTransformer2DModel
      - local: api/models/sd3_transformer2d
        title: SD3Transformer2DModel
      - local: api/models/stable_audio_transformer
        title: StableAudioDiTModel
      - local: api/models/transformer2d
        title: Transformer2DModel
      - local: api/models/transformer_temporal
        title: TransformerTemporalModel
      - local: api/models/wan_transformer_3d
        title: WanTransformer3DModel
      title: Transformers
    - sections:
      - local: api/models/stable_cascade_unet
        title: StableCascadeUNet
      - local: api/models/unet
        title: UNet1DModel
      - local: api/models/unet2d-cond
        title: UNet2DConditionModel
      - local: api/models/unet2d
        title: UNet2DModel
      - local: api/models/unet3d-cond
        title: UNet3DConditionModel
      - local: api/models/unet-motion
        title: UNetMotionModel
      - local: api/models/uvit2d
        title: UViT2DModel
      title: UNets
    - sections:
      - local: api/models/asymmetricautoencoderkl
        title: AsymmetricAutoencoderKL
      - local: api/models/autoencoder_dc
        title: AutoencoderDC
      - local: api/models/autoencoderkl
        title: AutoencoderKL
      - local: api/models/autoencoderkl_allegro
        title: AutoencoderKLAllegro
      - local: api/models/autoencoderkl_cogvideox
        title: AutoencoderKLCogVideoX
      - local: api/models/autoencoderkl_cosmos
        title: AutoencoderKLCosmos
      - local: api/models/autoencoder_kl_hunyuan_video
        title: AutoencoderKLHunyuanVideo
      - local: api/models/autoencoderkl_ltx_video
        title: AutoencoderKLLTXVideo
      - local: api/models/autoencoderkl_magvit
        title: AutoencoderKLMagvit
      - local: api/models/autoencoderkl_mochi
        title: AutoencoderKLMochi
      - local: api/models/autoencoder_kl_wan
        title: AutoencoderKLWan
      - local: api/models/consistency_decoder_vae
        title: ConsistencyDecoderVAE
      - local: api/models/autoencoder_oobleck
        title: Oobleck AutoEncoder
      - local: api/models/autoencoder_tiny
        title: Tiny AutoEncoder
      - local: api/models/vq
        title: VQModel
      title: VAEs
    title: Models
  - isExpanded: false
    sections:
    - local: api/pipelines/overview
      title: Overview
    - local: api/pipelines/allegro
      title: Allegro
    - local: api/pipelines/amused
      title: aMUSEd
    - local: api/pipelines/animatediff
      title: AnimateDiff
    - local: api/pipelines/attend_and_excite
      title: Attend-and-Excite
    - local: api/pipelines/audioldm
      title: AudioLDM
    - local: api/pipelines/audioldm2
      title: AudioLDM 2
    - local: api/pipelines/aura_flow
      title: AuraFlow
    - local: api/pipelines/auto_pipeline
      title: AutoPipeline
    - local: api/pipelines/blip_diffusion
      title: BLIP-Diffusion
    - local: api/pipelines/chroma
      title: Chroma
    - local: api/pipelines/cogvideox
      title: CogVideoX
    - local: api/pipelines/cogview3
      title: CogView3
    - local: api/pipelines/cogview4
      title: CogView4
    - local: api/pipelines/consisid
      title: ConsisID
    - local: api/pipelines/consistency_models
      title: Consistency Models
    - local: api/pipelines/controlnet
      title: ControlNet
    - local: api/pipelines/controlnet_flux
      title: ControlNet with Flux.1
    - local: api/pipelines/controlnet_hunyuandit
      title: ControlNet with Hunyuan-DiT
    - local: api/pipelines/controlnet_sd3
      title: ControlNet with Stable Diffusion 3
    - local: api/pipelines/controlnet_sdxl
      title: ControlNet with Stable Diffusion XL
    - local: api/pipelines/controlnet_sana
      title: ControlNet-Sana
    - local: api/pipelines/controlnetxs
      title: ControlNet-XS
    - local: api/pipelines/controlnetxs_sdxl
      title: ControlNet-XS with Stable Diffusion XL
    - local: api/pipelines/controlnet_union
      title: ControlNetUnion
    - local: api/pipelines/cosmos
      title: Cosmos
    - local: api/pipelines/dance_diffusion
      title: Dance Diffusion
    - local: api/pipelines/ddim
      title: DDIM
    - local: api/pipelines/ddpm
      title: DDPM
    - local: api/pipelines/deepfloyd_if
      title: DeepFloyd IF
    - local: api/pipelines/diffedit
      title: DiffEdit
    - local: api/pipelines/dit
      title: DiT
    - local: api/pipelines/easyanimate
      title: EasyAnimate
    - local: api/pipelines/flux
      title: Flux
    - local: api/pipelines/control_flux_inpaint
      title: FluxControlInpaint
    - local: api/pipelines/framepack
      title: Framepack
    - local: api/pipelines/hidream
      title: HiDream-I1
    - local: api/pipelines/hunyuandit
      title: Hunyuan-DiT
    - local: api/pipelines/hunyuan_video
      title: HunyuanVideo
    - local: api/pipelines/i2vgenxl
      title: I2VGen-XL
    - local: api/pipelines/pix2pix
      title: InstructPix2Pix
    - local: api/pipelines/kandinsky
      title: Kandinsky 2.1
    - local: api/pipelines/kandinsky_v22
      title: Kandinsky 2.2
    - local: api/pipelines/kandinsky3
      title: Kandinsky 3
    - local: api/pipelines/kolors
      title: Kolors
    - local: api/pipelines/latent_consistency_models
      title: Latent Consistency Models
    - local: api/pipelines/latent_diffusion
      title: Latent Diffusion
    - local: api/pipelines/latte
      title: Latte
    - local: api/pipelines/ledits_pp
      title: LEDITS++
    - local: api/pipelines/ltx_video
      title: LTXVideo
    - local: api/pipelines/lumina2
      title: Lumina 2.0
    - local: api/pipelines/lumina
      title: Lumina-T2X
    - local: api/pipelines/marigold
      title: Marigold
    - local: api/pipelines/mochi
      title: Mochi
    - local: api/pipelines/panorama
      title: MultiDiffusion
    - local: api/pipelines/musicldm
      title: MusicLDM
    - local: api/pipelines/omnigen
      title: OmniGen
    - local: api/pipelines/pag
      title: PAG
    - local: api/pipelines/paint_by_example
      title: Paint by Example
    - local: api/pipelines/pia
      title: Personalized Image Animator (PIA)
    - local: api/pipelines/pixart
      title: PixArt-α
    - local: api/pipelines/pixart_sigma
      title: PixArt-Σ
    - local: api/pipelines/sana
      title: Sana
    - local: api/pipelines/sana_sprint
      title: Sana Sprint
    - local: api/pipelines/self_attention_guidance
      title: Self-Attention Guidance
    - local: api/pipelines/semantic_stable_diffusion
      title: Semantic Guidance
    - local: api/pipelines/shap_e
      title: Shap-E
    - local: api/pipelines/stable_audio
      title: Stable Audio
    - local: api/pipelines/stable_cascade
      title: Stable Cascade
    - sections:
      - local: api/pipelines/stable_diffusion/overview
        title: Overview
      - local: api/pipelines/stable_diffusion/depth2img
        title: Depth-to-image
      - local: api/pipelines/stable_diffusion/gligen
        title: GLIGEN (Grounded Language-to-Image Generation)
      - local: api/pipelines/stable_diffusion/image_variation
        title: Image variation
      - local: api/pipelines/stable_diffusion/img2img
        title: Image-to-image
      - local: api/pipelines/stable_diffusion/svd
        title: Image-to-video
      - local: api/pipelines/stable_diffusion/inpaint
        title: Inpainting
      - local: api/pipelines/stable_diffusion/k_diffusion
        title: K-Diffusion
      - local: api/pipelines/stable_diffusion/latent_upscale
        title: Latent upscaler
      - local: api/pipelines/stable_diffusion/ldm3d_diffusion
        title: LDM3D Text-to-(RGB, Depth), Text-to-(RGB-pano, Depth-pano), LDM3D Upscaler
      - local: api/pipelines/stable_diffusion/stable_diffusion_safe
        title: Safe Stable Diffusion
      - local: api/pipelines/stable_diffusion/sdxl_turbo
        title: SDXL Turbo
      - local: api/pipelines/stable_diffusion/stable_diffusion_2
        title: Stable Diffusion 2
      - local: api/pipelines/stable_diffusion/stable_diffusion_3
        title: Stable Diffusion 3
      - local: api/pipelines/stable_diffusion/stable_diffusion_xl
        title: Stable Diffusion XL
      - local: api/pipelines/stable_diffusion/upscale
        title: Super-resolution
      - local: api/pipelines/stable_diffusion/adapter
        title: T2I-Adapter
      - local: api/pipelines/stable_diffusion/text2img
        title: Text-to-image
      title: Stable Diffusion
    - local: api/pipelines/stable_unclip
      title: Stable unCLIP
    - local: api/pipelines/text_to_video
      title: Text-to-video
    - local: api/pipelines/text_to_video_zero
      title: Text2Video-Zero
    - local: api/pipelines/unclip
      title: unCLIP
    - local: api/pipelines/unidiffuser
      title: UniDiffuser
    - local: api/pipelines/value_guided_sampling
      title: Value-guided sampling
    - local: api/pipelines/visualcloze
      title: VisualCloze
    - local: api/pipelines/wan
      title: Wan
    - local: api/pipelines/wuerstchen
      title: Wuerstchen
    title: Pipelines
  - isExpanded: false
    sections:
    - local: api/schedulers/overview
      title: Overview
    - local: api/schedulers/cm_stochastic_iterative
      title: CMStochasticIterativeScheduler
    - local: api/schedulers/ddim_cogvideox
      title: CogVideoXDDIMScheduler
    - local: api/schedulers/multistep_dpm_solver_cogvideox
      title: CogVideoXDPMScheduler
    - local: api/schedulers/consistency_decoder
      title: ConsistencyDecoderScheduler
    - local: api/schedulers/cosine_dpm
      title: CosineDPMSolverMultistepScheduler
    - local: api/schedulers/ddim_inverse
      title: DDIMInverseScheduler
    - local: api/schedulers/ddim
      title: DDIMScheduler
    - local: api/schedulers/ddpm
      title: DDPMScheduler
    - local: api/schedulers/deis
      title: DEISMultistepScheduler
    - local: api/schedulers/multistep_dpm_solver_inverse
      title: DPMSolverMultistepInverse
    - local: api/schedulers/multistep_dpm_solver
      title: DPMSolverMultistepScheduler
    - local: api/schedulers/dpm_sde
      title: DPMSolverSDEScheduler
    - local: api/schedulers/singlestep_dpm_solver
      title: DPMSolverSinglestepScheduler
    - local: api/schedulers/edm_multistep_dpm_solver
      title: EDMDPMSolverMultistepScheduler
    - local: api/schedulers/edm_euler
      title: EDMEulerScheduler
    - local: api/schedulers/euler_ancestral
      title: EulerAncestralDiscreteScheduler
    - local: api/schedulers/euler
      title: EulerDiscreteScheduler
    - local: api/schedulers/flow_match_euler_discrete
      title: FlowMatchEulerDiscreteScheduler
    - local: api/schedulers/flow_match_heun_discrete
      title: FlowMatchHeunDiscreteScheduler
    - local: api/schedulers/heun
      title: HeunDiscreteScheduler
    - local: api/schedulers/ipndm
      title: IPNDMScheduler
    - local: api/schedulers/stochastic_karras_ve
      title: KarrasVeScheduler
    - local: api/schedulers/dpm_discrete_ancestral
      title: KDPM2AncestralDiscreteScheduler
    - local: api/schedulers/dpm_discrete
      title: KDPM2DiscreteScheduler
    - local: api/schedulers/lcm
      title: LCMScheduler
    - local: api/schedulers/lms_discrete
      title: LMSDiscreteScheduler
    - local: api/schedulers/pndm
      title: PNDMScheduler
    - local: api/schedulers/repaint
      title: RePaintScheduler
    - local: api/schedulers/score_sde_ve
      title: ScoreSdeVeScheduler
    - local: api/schedulers/score_sde_vp
      title: ScoreSdeVpScheduler
    - local: api/schedulers/tcd
      title: TCDScheduler
    - local: api/schedulers/unipc
      title: UniPCMultistepScheduler
    - local: api/schedulers/vq_diffusion
      title: VQDiffusionScheduler
    title: Schedulers
  - isExpanded: false
    sections:
    - local: api/internal_classes_overview
      title: Overview
    - local: api/attnprocessor
      title: Attention Processor
    - local: api/activations
      title: Custom activation functions
    - local: api/cache
      title: Caching methods
    - local: api/normalization
      title: Custom normalization layers
    - local: api/utilities
      title: Utilities
    - local: api/image_processor
      title: VAE Image Processor
    - local: api/video_processor
      title: Video Processor
    title: Internal classes
  title: API
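The `_toctree.yml` above drives the docs sidebar: each entry either points at a page via `local` or nests further entries under `sections`. As a minimal sketch (using Python literals rather than actual YAML parsing, which would typically go through `yaml.safe_load`), this is how such a nested structure can be walked to list every page path in sidebar order; the `collect_locals` helper and the sample fragment are illustrative, not part of the repository:

```python
def collect_locals(entries):
    """Recursively gather every `local` doc path from a toctree-like structure."""
    paths = []
    for entry in entries:
        if "local" in entry:
            paths.append(entry["local"])
        if "sections" in entry:
            paths.extend(collect_locals(entry["sections"]))
    return paths

# Hypothetical fragment mirroring the shape of the "Get started" section.
toc = [
    {
        "title": "Get started",
        "sections": [
            {"local": "index", "title": "🧨 Diffusers"},
            {"local": "quicktour", "title": "Quicktour"},
            {"local": "installation", "title": "Installation"},
        ],
    }
]

print(collect_locals(toc))  # every page path, in sidebar order
```

Each `local` path resolves to a markdown file under `docs/source/en/`, which is why the `community_projects` entry above corresponds to the `community_projects.md` file that follows.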
diffusers/docs/source/en/community_projects.md
ADDED
|
@@ -0,0 +1,90 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Community Projects

Welcome to Community Projects. This space is dedicated to showcasing the incredible work and innovative applications created by our vibrant community using the `diffusers` library.

This section aims to:

- Highlight diverse and inspiring projects built with `diffusers`
- Foster knowledge sharing within our community
- Provide real-world examples of how `diffusers` can be leveraged

Happy exploring, and thank you for being part of the Diffusers community!

<table>
<tr>
<th>Project Name</th>
<th>Description</th>
</tr>
<tr style="border-top: 2px solid black">
<td><a href="https://github.com/carson-katri/dream-textures"> dream-textures </a></td>
<td>Stable Diffusion built-in to Blender</td>
</tr>
<tr style="border-top: 2px solid black">
<td><a href="https://github.com/megvii-research/HiDiffusion"> HiDiffusion </a></td>
<td>Increases the resolution and speed of your diffusion model by adding only a single line of code</td>
</tr>
<tr style="border-top: 2px solid black">
<td><a href="https://github.com/lllyasviel/IC-Light"> IC-Light </a></td>
<td>IC-Light is a project to manipulate the illumination of images</td>
</tr>
<tr style="border-top: 2px solid black">
<td><a href="https://github.com/InstantID/InstantID"> InstantID </a></td>
<td>InstantID: Zero-shot Identity-Preserving Generation in Seconds</td>
</tr>
<tr style="border-top: 2px solid black">
<td><a href="https://github.com/Sanster/IOPaint"> IOPaint </a></td>
<td>Image inpainting tool powered by SOTA AI models. Remove any unwanted object, defect, or person from your pictures, or erase and replace (powered by Stable Diffusion) anything in your pictures.</td>
</tr>
<tr style="border-top: 2px solid black">
<td><a href="https://github.com/bmaltais/kohya_ss"> Kohya </a></td>
<td>Gradio GUI for Kohya's Stable Diffusion trainers</td>
</tr>
<tr style="border-top: 2px solid black">
<td><a href="https://github.com/magic-research/magic-animate"> MagicAnimate </a></td>
<td>MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model</td>
</tr>
<tr style="border-top: 2px solid black">
<td><a href="https://github.com/levihsu/OOTDiffusion"> OOTDiffusion </a></td>
<td>Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on</td>
</tr>
<tr style="border-top: 2px solid black">
<td><a href="https://github.com/vladmandic/automatic"> SD.Next </a></td>
<td>SD.Next: Advanced Implementation of Stable Diffusion and other Diffusion-based generative image models</td>
</tr>
<tr style="border-top: 2px solid black">
<td><a href="https://github.com/ashawkey/stable-dreamfusion"> stable-dreamfusion </a></td>
<td>Text-to-3D &amp; Image-to-3D &amp; Mesh Exportation with NeRF + Diffusion</td>
</tr>
<tr style="border-top: 2px solid black">
<td><a href="https://github.com/HVision-NKU/StoryDiffusion"> StoryDiffusion </a></td>
<td>StoryDiffusion can create a magic story by generating consistent images and videos.</td>
</tr>
<tr style="border-top: 2px solid black">
<td><a href="https://github.com/cumulo-autumn/StreamDiffusion"> StreamDiffusion </a></td>
<td>A Pipeline-Level Solution for Real-Time Interactive Generation</td>
</tr>
<tr style="border-top: 2px solid black">
<td><a href="https://github.com/Netwrck/stable-diffusion-server"> Stable Diffusion Server </a></td>
<td>A server configured for Inpainting/Generation/img2img with one Stable Diffusion model</td>
</tr>
<tr style="border-top: 2px solid black">
<td><a href="https://github.com/suzukimain/auto_diffusers"> Model Search </a></td>
<td>Search models on Civitai and Hugging Face</td>
</tr>
<tr style="border-top: 2px solid black">
<td><a href="https://github.com/beinsezii/skrample"> Skrample </a></td>
<td>Fully modular scheduler functions with first-class diffusers integration.</td>
</tr>
</table>

diffusers/docs/source/en/conceptual/contribution.md
ADDED
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# How to contribute to Diffusers 🧨

We ❤️ contributions from the open-source community! Everyone is welcome, and all types of participation – not just code – are valued and appreciated. Answering questions, helping others, reaching out, and improving the documentation are all immensely valuable to the community, so don't be afraid and get involved if you're up for it!

Everyone is encouraged to start by saying 👋 in our public Discord channel. We discuss the latest trends in diffusion models, ask questions, show off personal projects, help each other with contributions, or just hang out ☕. <a href="https://discord.gg/G7tWnz98XR"><img alt="Join us on Discord" src="https://img.shields.io/discord/823813159592001537?color=5865F2&logo=discord&logoColor=white"></a>

Whichever way you choose to contribute, we strive to be part of an open, welcoming, and kind community. Please read our [code of conduct](https://github.com/huggingface/diffusers/blob/main/CODE_OF_CONDUCT.md) and be mindful to respect it during your interactions. We also recommend you become familiar with the [ethical guidelines](https://huggingface.co/docs/diffusers/conceptual/ethical_guidelines) that guide our project, and we ask you to adhere to the same principles of transparency and responsibility.

We enormously value feedback from the community, so please do not be afraid to speak up if you believe you have valuable feedback that can help improve the library - every message, comment, issue, and pull request (PR) is read and considered.

## Overview

You can contribute in many ways, ranging from answering questions on issues and discussions to adding new diffusion models to the core library.

In the following, we give an overview of different ways to contribute, ranked by difficulty in ascending order. All of them are valuable to the community.

* 1. Asking and answering questions on [the Diffusers discussion forum](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers) or on [Discord](https://discord.gg/G7tWnz98XR).
* 2. Opening new issues on [the GitHub Issues tab](https://github.com/huggingface/diffusers/issues/new/choose) or new discussions on [the GitHub Discussions tab](https://github.com/huggingface/diffusers/discussions/new/choose).
* 3. Answering issues on [the GitHub Issues tab](https://github.com/huggingface/diffusers/issues) or discussions on [the GitHub Discussions tab](https://github.com/huggingface/diffusers/discussions).
* 4. Fixing a simple issue, marked by the "Good first issue" label, see [here](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22).
* 5. Contributing to the [documentation](https://github.com/huggingface/diffusers/tree/main/docs/source).
* 6. Contributing a [Community Pipeline](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3Acommunity-examples).
* 7. Contributing to the [examples](https://github.com/huggingface/diffusers/tree/main/examples).
* 8. Fixing a more difficult issue, marked by the "Good second issue" label, see [here](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22Good+second+issue%22).
* 9. Adding a new pipeline, model, or scheduler, see ["New Pipeline/Model"](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+pipeline%2Fmodel%22) and ["New scheduler"](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+scheduler%22) issues. For this contribution, please have a look at the [Design Philosophy](https://github.com/huggingface/diffusers/blob/main/PHILOSOPHY.md).

As said before, **all contributions are valuable to the community**.
In the following, we will explain each contribution in a bit more detail.

For all contributions 4-9, you will need to open a PR. How to do so is explained in detail in [Opening a pull request](#how-to-open-a-pr).

### 1. Asking and answering questions on the Diffusers discussion forum or on the Diffusers Discord

Any question or comment related to the Diffusers library can be asked on the [discussion forum](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/) or on [Discord](https://discord.gg/G7tWnz98XR). Such questions and comments include (but are not limited to):
- Reports of training or inference experiments in an attempt to share knowledge
- Presentation of personal projects
- Questions about non-official training examples
- Project proposals
- General feedback
- Paper summaries
- Asking for help on personal projects that build on top of the Diffusers library
- General questions
- Ethical questions regarding diffusion models
- ...

Every question that is asked on the forum or on Discord actively encourages the community to publicly share knowledge, and might very well help a beginner in the future who has the same question you're having. Please do pose any questions you might have.
In the same spirit, you are of immense help to the community by answering such questions, because this way you are publicly documenting knowledge for everybody to learn from.

**Please** keep in mind that the more effort you put into asking or answering a question, the higher the quality of the publicly documented knowledge. In the same way, well-posed and well-answered questions create a high-quality knowledge database accessible to everybody, while badly posed questions or answers reduce the overall quality of the public knowledge database.
In short, a high-quality question or answer is *precise*, *concise*, *relevant*, *easy to understand*, *accessible*, and *well-formatted/well-posed*. For more information, please have a look through the [How to write a good issue](#how-to-write-a-good-issue) section.

**NOTE about channels**:
[*The forum*](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63) is much better indexed by search engines, such as Google. Posts are ranked by popularity rather than chronologically, so it's easier to look up questions and answers that were posted some time ago.
In addition, questions and answers posted in the forum can easily be linked to.
In contrast, *Discord* has a chat-like format that invites fast back-and-forth communication.
While it will most likely take less time for you to get an answer to your question on Discord, your question won't be visible anymore over time. Also, it's much harder to find information that was posted a while back on Discord. We therefore strongly recommend using the forum for high-quality questions and answers in an attempt to create long-lasting knowledge for the community. If discussions on Discord lead to very interesting answers and conclusions, we recommend posting the results on the forum to make the information more available for future readers.

### 2. Opening new issues on the GitHub issues tab

The 🧨 Diffusers library is robust and reliable thanks to the users who notify us of the problems they encounter. So thank you for reporting an issue.

Remember, GitHub issues are reserved for technical questions directly related to the Diffusers library, bug reports, feature requests, or feedback on the library design.

In a nutshell, this means that everything that is **not** related to the **code of the Diffusers library** (including the documentation) should **not** be asked on GitHub, but rather on either the [forum](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63) or [Discord](https://discord.gg/G7tWnz98XR).

**Please consider the following guidelines when opening a new issue**:
- Make sure you have searched whether your issue has already been asked before (use the search bar on GitHub under Issues).
- Please never report a new issue on another (related) issue. If another issue is highly related, please open a new issue nevertheless and link to the related issue.
- Make sure your issue is written in English. If you are not comfortable in English, please use one of the great, free online translation services, such as [DeepL](https://www.deepl.com/translator), to translate from your native language to English.
- Check whether your issue might be solved by updating to the newest Diffusers version. Before posting your issue, please make sure that `python -c "import diffusers; print(diffusers.__version__)"` is higher than or matches the latest Diffusers version.
- Remember that the more effort you put into opening a new issue, the higher the quality of your answer will be and the better the overall quality of the Diffusers issues.

New issues usually include the following.

#### 2.1. Reproducible, minimal bug reports

A bug report should always have a reproducible code snippet and be as minimal and concise as possible.
In more detail, this means:
- Narrow the bug down as much as you can; **do not just dump your whole code file**.
- Format your code.
- Do not include any external libraries except for those Diffusers depends on.
- **Always** provide all necessary information about your environment; for this, you can run `diffusers-cli env` in your shell and copy-paste the displayed information into the issue.
- Explain the issue. If the reader doesn't know what the issue is and why it is an issue, they cannot solve it.
- **Always** make sure the reader can reproduce your issue with as little effort as possible. If your code snippet cannot be run because of missing libraries or undefined variables, the reader cannot help you. Make sure your reproducible code snippet is as minimal as possible and can be copy-pasted into a simple Python shell.
- If reproducing your issue requires a model and/or dataset, make sure the reader has access to that model or dataset. You can always upload your model or dataset to the [Hub](https://huggingface.co) to make it easily downloadable. Try to keep your model and dataset as small as possible, to make the reproduction of your issue as effortless as possible.
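
Putting these guidelines together, a minimal bug report snippet tends to have the following shape. This is a hypothetical skeleton only: the commented-out failing call and model name are placeholders you would replace with your own reduced reproduction.

```py
# Hypothetical skeleton of a minimal bug report snippet:
# state the environment up front, then keep only the lines
# needed to trigger the problem.
import platform
import sys

print(f"Python {sys.version.split()[0]} on {platform.system()}")

# The actual failing call goes here, reduced to its essentials, e.g.:
# from diffusers import DiffusionPipeline
# pipe = DiffusionPipeline.from_pretrained("<small-test-model>")
# pipe(...)  # raises <the error you are reporting>
```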

For more information, please have a look through the [How to write a good issue](#how-to-write-a-good-issue) section.

You can open a bug report [here](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=bug&projects=&template=bug-report.yml).

#### 2.2. Feature requests

A world-class feature request addresses the following points:

1. Motivation first:
* Is it related to a problem/frustration with the library? If so, please explain why. Providing a code snippet that demonstrates the problem is best.
* Is it related to something you would need for a project? We'd love to hear about it!
* Is it something you worked on and think could benefit the community? Awesome! Tell us what problem it solved for you.
2. Write a *full paragraph* describing the feature;
3. Provide a **code snippet** that demonstrates its future use;
4. In case this is related to a paper, please attach a link;
5. Attach any additional information (drawings, screenshots, etc.) you think may help.

You can open a feature request [here](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feature_request.md&title=).

#### 2.3 Feedback

Feedback about the library design and why it is good or not good helps the core maintainers immensely to build a user-friendly library. To understand the philosophy behind the current design, please have a look [here](https://huggingface.co/docs/diffusers/conceptual/philosophy). If you feel like a certain design choice does not fit with the current design philosophy, please explain why and how it should be changed. If a certain design choice follows the design philosophy too strictly, hence restricting use cases, explain why and how it should be changed.
If a certain design choice is very useful for you, please also leave a note, as this is great feedback for future design decisions.

You can open an issue about feedback [here](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feedback.md&title=).

#### 2.4 Technical questions

Technical questions are mainly about why certain code of the library was written in a certain way, or what a certain part of the code does. Please make sure to link to the code in question and provide details on why this part of the code is difficult to understand.

You can open an issue about a technical question [here](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=bug&template=bug-report.yml).

#### 2.5 Proposal to add a new model, scheduler, or pipeline

If the diffusion model community released a new model, pipeline, or scheduler that you would like to see in the Diffusers library, please provide the following information:

* A short description of the diffusion pipeline, model, or scheduler, and a link to the paper or public release.
* A link to any of its open-source implementation(s).
* A link to the model weights if they are available.

If you are willing to contribute the model yourself, let us know so we can best guide you. Also, don't forget to tag the original author of the component (model, scheduler, pipeline, etc.) by GitHub handle if you can find it.

You can open a request for a model/pipeline/scheduler [here](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=New+model%2Fpipeline%2Fscheduler&template=new-model-addition.yml).

### 3. Answering issues on the GitHub issues tab

Answering issues on GitHub might require some technical knowledge of Diffusers, but we encourage everybody to give it a try even if you are not 100% certain that your answer is correct.
Some tips to give a high-quality answer to an issue:
- Be as concise and minimal as possible.
- Stay on topic. An answer to the issue should concern the issue and only the issue.
- Provide links to code, papers, or other sources that prove or support your point.
- Answer in code. If a simple code snippet is the answer to the issue or shows how the issue can be solved, please provide a fully reproducible code snippet.

Also, many issues tend to be simply off-topic, duplicates of other issues, or irrelevant. It is of great help to the maintainers if you can answer such issues by encouraging the author of the issue to be more precise, providing the link to a duplicated issue, or redirecting them to [the forum](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63) or [Discord](https://discord.gg/G7tWnz98XR).

If you have verified that the reported bug is correct and requires a correction in the source code, please have a look at the next sections.

For all of the following contributions, you will need to open a PR. It is explained in detail how to do so in the [Opening a pull request](#how-to-open-a-pr) section.

### 4. Fixing a "Good first issue"

*Good first issues* are marked by the [Good first issue](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22) label. Usually, the issue already explains how a potential solution should look, so that it is easier to fix.
If the issue hasn't been closed and you would like to try to fix it, you can just leave a message saying "I would like to try this issue.". There are usually three scenarios:
- a.) The issue description already proposes a fix. In this case, and if the solution makes sense to you, you can open a PR or draft PR to fix it.
- b.) The issue description does not propose a fix. In this case, you can ask what a proposed fix could look like, and someone from the Diffusers team should answer shortly. If you have a good idea of how to fix it, feel free to directly open a PR.
- c.) There is already an open PR to fix the issue, but the issue hasn't been closed yet. If the PR has gone stale, you can simply open a new PR and link to the stale PR. PRs often go stale if the original contributor who wanted to fix the issue suddenly cannot find the time anymore to proceed. This often happens in open-source and is very normal. In this case, the community will be very happy if you give it a new try and leverage the knowledge of the existing PR. If there is already a PR and it is active, you can help the author by giving suggestions, reviewing the PR, or even asking whether you can contribute to the PR.

### 5. Contribute to the documentation

A good library **always** has good documentation! The official documentation is often one of the first points of contact for new users of the library, and therefore contributing to the documentation is a **highly valuable contribution**.

Contributing to the documentation can take many forms:

- Correcting spelling or grammatical errors.
- Correcting incorrect formatting of a docstring. If you see that the official documentation is weirdly displayed or a link is broken, we would be very happy if you took some time to correct it.
- Correcting the shape or dimensions of a docstring input or output tensor.
- Clarifying documentation that is hard to understand or incorrect.
- Updating outdated code examples.
- Translating the documentation to another language.

Anything displayed on [the official Diffusers doc page](https://huggingface.co/docs/diffusers/index) is part of the official documentation and can be corrected or adjusted in the respective [documentation source](https://github.com/huggingface/diffusers/tree/main/docs/source).

Please have a look at [this page](https://github.com/huggingface/diffusers/tree/main/docs) on how to verify changes made to the documentation locally.

### 6. Contribute a community pipeline

> [!TIP]
> Read the [Community pipelines](../using-diffusers/custom_pipeline_overview#community-pipelines) guide to learn more about the difference between a GitHub and Hugging Face Hub community pipeline. If you're interested in why we have community pipelines, take a look at GitHub Issue [#841](https://github.com/huggingface/diffusers/issues/841) (basically, we can't maintain all the possible ways diffusion models can be used for inference, but we also don't want to prevent the community from building them).

Contributing a community pipeline is a great way to share your creativity and work with the community. It lets you build on top of the [`DiffusionPipeline`] so that anyone can load and use it by setting the `custom_pipeline` parameter. This section will walk you through how to create a simple pipeline where the UNet only does a single forward pass and calls the scheduler once (a "one-step" pipeline).

1. Create a one_step_unet.py file for your community pipeline. This file can contain whatever package you want to use as long as it's installed by the user. Make sure you only have one pipeline class that inherits from [`DiffusionPipeline`] to load model weights and the scheduler configuration from the Hub. Add a UNet and scheduler to the `__init__` function.

You should also add the `register_modules` function to ensure your pipeline and its components can be saved with [`~DiffusionPipeline.save_pretrained`].

```py
from diffusers import DiffusionPipeline
import torch


class UnetSchedulerOneForwardPipeline(DiffusionPipeline):
    def __init__(self, unet, scheduler):
        super().__init__()

        self.register_modules(unet=unet, scheduler=scheduler)
```
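
To see why `register_modules` matters, here is a much-simplified, purely illustrative sketch of the idea (this is *not* the actual diffusers implementation): registered components become attributes of the pipeline, and their class names are recorded in a config that a loader can later use to rebuild the pipeline.

```py
# Hypothetical toy illustration of register_modules-style registration.
class ToyPipeline:
    def __init__(self):
        self.config = {}

    def register_modules(self, **modules):
        for name, module in modules.items():
            setattr(self, name, module)  # accessible as self.unet, self.scheduler, ...
            self.config[name] = type(module).__name__  # recorded for save/load


class ToyUNet:
    pass


class ToyScheduler:
    pass


pipe = ToyPipeline()
pipe.register_modules(unet=ToyUNet(), scheduler=ToyScheduler())
print(pipe.config)  # {'unet': 'ToyUNet', 'scheduler': 'ToyScheduler'}
```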
|
| 221 |
+
|
| 222 |
+
1. In the forward pass (which we recommend defining as `__call__`), you can add any feature you'd like. For the "one-step" pipeline, create a random image and call the UNet and scheduler once by setting `timestep=1`.
|
| 223 |
+
|
| 224 |
+
```py
|
| 225 |
+
from diffusers import DiffusionPipeline
|
| 226 |
+
import torch
|
| 227 |
+
|
| 228 |
+
class UnetSchedulerOneForwardPipeline(DiffusionPipeline):
|
| 229 |
+
def __init__(self, unet, scheduler):
|
| 230 |
+
super().__init__()
|
| 231 |
+
|
| 232 |
+
self.register_modules(unet=unet, scheduler=scheduler)
|
| 233 |
+
|
| 234 |
+
def __call__(self):
|
| 235 |
+
image = torch.randn(
|
| 236 |
+
(1, self.unet.config.in_channels, self.unet.config.sample_size, self.unet.config.sample_size),
|
| 237 |
+
)
|
| 238 |
+
timestep = 1
|
| 239 |
+
|
| 240 |
+
model_output = self.unet(image, timestep).sample
|
| 241 |
+
scheduler_output = self.scheduler.step(model_output, timestep, image).prev_sample
|
| 242 |
+
|
| 243 |
+
return scheduler_output
|
| 244 |
+
```
|
| 245 |
+
|
| 246 |
+
Now you can run the pipeline by passing a UNet and scheduler to it or load pretrained weights if the pipeline structure is identical.
|
| 247 |
+
|
| 248 |
+
```py
|
| 249 |
+
from diffusers import DDPMScheduler, UNet2DModel
|
| 250 |
+
|
| 251 |
+
scheduler = DDPMScheduler()
|
| 252 |
+
unet = UNet2DModel()
|
| 253 |
+
|
| 254 |
+
pipeline = UnetSchedulerOneForwardPipeline(unet=unet, scheduler=scheduler)
|
| 255 |
+
output = pipeline()
|
| 256 |
+
# load pretrained weights
|
| 257 |
+
pipeline = UnetSchedulerOneForwardPipeline.from_pretrained("google/ddpm-cifar10-32", use_safetensors=True)
|
| 258 |
+
output = pipeline()
|
| 259 |
+
```
|
| 260 |
+
|
| 261 |
+
You can either share your pipeline as a GitHub community pipeline or Hub community pipeline.
|
| 262 |
+
|
| 263 |
+
<hfoptions id="pipeline type">
|
| 264 |
+
<hfoption id="GitHub pipeline">
|
| 265 |
+
|
| 266 |
+
Share your GitHub pipeline by opening a pull request on the Diffusers [repository](https://github.com/huggingface/diffusers) and add the one_step_unet.py file to the [examples/community](https://github.com/huggingface/diffusers/tree/main/examples/community) subfolder.
|
| 267 |
+
|
| 268 |
+
</hfoption>
|
| 269 |
+
<hfoption id="Hub pipeline">
|
| 270 |
+
|
| 271 |
+
Share your Hub pipeline by creating a model repository on the Hub and uploading the one_step_unet.py file to it.
|
| 272 |
+
|
| 273 |
+
</hfoption>
|
| 274 |
+
</hfoptions>
|
| 275 |
+
|
| 276 |
+
### 7. Contribute to training examples

Diffusers examples are a collection of training scripts that reside in [examples](https://github.com/huggingface/diffusers/tree/main/examples).

We support two types of training examples:

- Official training examples
- Research training examples

Research training examples are located in [examples/research_projects](https://github.com/huggingface/diffusers/tree/main/examples/research_projects) whereas official training examples include all folders under [examples](https://github.com/huggingface/diffusers/tree/main/examples) except the `research_projects` and `community` folders.
The official training examples are maintained by the Diffusers' core maintainers whereas the research training examples are maintained by the community.
This is because of the same reasons put forward in [6. Contribute a community pipeline](#6-contribute-a-community-pipeline) for official pipelines vs. community pipelines: it is not feasible for the core maintainers to maintain all possible training methods for diffusion models.
If the Diffusers core maintainers and the community consider a certain training paradigm to be too experimental or not popular enough, the corresponding training code should be put in the `research_projects` folder and maintained by the author.

Both official training and research examples consist of a directory that contains one or more training scripts, a `requirements.txt` file, and a `README.md` file. In order for the user to make use of the training examples, it is required to clone the repository:

```bash
git clone https://github.com/huggingface/diffusers
```

as well as to install all additional dependencies required for training:

```bash
cd diffusers
pip install -r examples/<your-example-folder>/requirements.txt
```

Therefore when adding an example, the `requirements.txt` file shall define all pip dependencies required for your training example so that once all of them are installed, the user can run the example's training script. See, for example, the [DreamBooth `requirements.txt` file](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/requirements.txt).

Training examples of the Diffusers library should adhere to the following philosophy:
- All the code necessary to run the example should be found in a single Python file.
- One should be able to run the example from the command line with `python <your-example>.py --args`.
- Examples should be kept simple and serve as **an example** of how to use Diffusers for training. The purpose of example scripts is **not** to create state-of-the-art diffusion models, but rather to reproduce known training schemes without adding too much custom logic. As a byproduct of this point, our examples also strive to serve as good educational materials.
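The single-file, command-line-driven shape described above can be sketched with a minimal, framework-free script (illustrative only — real Diffusers examples use PyTorch and Accelerate, and the argument names below are hypothetical):

```python
import argparse
import random


def parse_args():
    # Every example exposes its options on the command line.
    parser = argparse.ArgumentParser(description="Minimal single-file training example (illustrative).")
    parser.add_argument("--learning_rate", type=float, default=0.1)
    parser.add_argument("--num_steps", type=int, default=100)
    parser.add_argument("--seed", type=int, default=0)
    return parser


def train(learning_rate, num_steps, seed):
    # Toy stand-in for a training loop: gradient descent fitting w to minimize (w - 3)^2.
    random.seed(seed)
    w = random.random()
    for _ in range(num_steps):
        grad = 2 * (w - 3.0)
        w -= learning_rate * grad
    return w


# Run as `python <your-example>.py --learning_rate 0.2 --num_steps 50`;
# argv is passed explicitly here so the snippet is self-contained.
args = parse_args().parse_args(["--learning_rate", "0.2", "--num_steps", "50"])
final_w = train(args.learning_rate, args.num_steps, args.seed)
print(f"final parameter: {final_w:.4f}")  # → final parameter: 3.0000
```

A real example would replace the toy loop with model, dataloader, and optimizer setup, but the overall structure — argument parsing, a `train` function, a script entry point in one file — is the same.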

To contribute an example, it is highly recommended to look at already existing examples such as [dreambooth](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth.py) to get an idea of how they should look.
We strongly advise contributors to make use of the [Accelerate library](https://github.com/huggingface/accelerate) as it's tightly integrated with Diffusers.
Once an example script works, please make sure to add a comprehensive `README.md` that states exactly how to use the example. This README should include:
- An example command on how to run the example script as shown [here](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth#running-locally-with-pytorch).
- A link to some training results (logs, models, etc.) that show what the user can expect as shown [here](https://api.wandb.ai/report/patrickvonplaten/xm6cd5q5).
- If you are adding a non-official/research training example, **please don't forget** to add a sentence stating that you are maintaining this training example, including your git handle, as shown [here](https://github.com/huggingface/diffusers/tree/main/examples/research_projects/intel_opts#diffusers-examples-with-intel-optimizations).

If you are contributing to the official training examples, please also make sure to add a test to its folder such as [examples/dreambooth/test_dreambooth.py](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/test_dreambooth.py). This is not necessary for non-official training examples.

### 8. Fixing a "Good second issue"

*Good second issues* are marked by the [Good second issue](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22Good+second+issue%22) label. Good second issues are usually more complicated to solve than [Good first issues](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22).
The issue description usually gives less guidance on how to fix the issue and requires a decent understanding of the library by the interested contributor.
If you are interested in tackling a good second issue, feel free to open a PR to fix it and link the PR to the issue. If you see that a PR has already been opened for this issue but did not get merged, have a look to understand why it wasn't merged and try to open an improved PR.
Good second issues are usually more difficult to get merged compared to good first issues, so don't hesitate to ask for help from the core maintainers. If your PR is almost finished, the core maintainers can also jump into your PR and commit to it in order to get it merged.
### 9. Adding pipelines, models, schedulers

Pipelines, models, and schedulers are the most important pieces of the Diffusers library.
They provide easy access to state-of-the-art diffusion technologies and thus allow the community to build powerful generative AI applications.

By adding a new model, pipeline, or scheduler you might enable a new powerful use case for any of the user interfaces relying on Diffusers, which can be of immense value for the whole generative AI ecosystem.

Diffusers has a couple of open feature requests for all three components - feel free to look through them if you don't know yet which specific component you would like to add:
- [Model or pipeline](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+pipeline%2Fmodel%22)
- [Scheduler](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+scheduler%22)

Before adding any of the three components, it is strongly recommended that you give the [Philosophy guide](philosophy) a read to better understand the design of any of the three components. Please be aware that we cannot merge model, scheduler, or pipeline additions that strongly diverge from our design philosophy, as it will lead to API inconsistencies. If you fundamentally disagree with a design choice, please open a [Feedback issue](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feedback.md&title=) instead so that it can be discussed whether a certain design pattern/design choice shall be changed everywhere in the library and whether we shall update our design philosophy. Consistency across the library is very important for us.

Please make sure to add links to the original codebase/paper to the PR and ideally also ping the original author directly on the PR so that they can follow the progress and potentially help with questions.

If you are unsure or stuck in the PR, don't hesitate to leave a message to ask for a first review or help.

#### Copied from mechanism

A unique and important feature to understand when adding any pipeline, model or scheduler code is the `# Copied from` mechanism. You'll see this all over the Diffusers codebase, and the reason we use it is to keep the codebase easy to understand and maintain. Marking code with the `# Copied from` mechanism forces the marked code to be identical to the code it was copied from. This makes it easy to update and propagate changes across many files whenever you run `make fix-copies`.

For example, in the code example below, [`~diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput`] is the original code and `AltDiffusionPipelineOutput` uses the `# Copied from` mechanism to copy it. The only difference is changing the class prefix from `Stable` to `Alt`.

```py
# Copied from diffusers.pipelines.stable_diffusion.pipeline_output.StableDiffusionPipelineOutput with Stable->Alt
class AltDiffusionPipelineOutput(BaseOutput):
    """
    Output class for Alt Diffusion pipelines.

    Args:
        images (`List[PIL.Image.Image]` or `np.ndarray`)
            List of denoised PIL images of length `batch_size` or NumPy array of shape `(batch_size, height, width,
            num_channels)`.
        nsfw_content_detected (`List[bool]`)
            List indicating whether the corresponding generated image contains "not-safe-for-work" (nsfw) content or
            `None` if safety checking could not be performed.
    """
```
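Conceptually, the consistency check behind this mechanism is a text transformation: apply the `Old->New` replacements from the `# Copied from ... with Old->New` comment to the original source and verify the result matches the copy. Here is an illustrative sketch of that idea (not the actual `make fix-copies` implementation):

```python
def apply_copy_pattern(original_source: str, replacements: dict) -> str:
    # Apply each `Old->New` substitution declared in the `# Copied from` comment.
    result = original_source
    for old, new in replacements.items():
        result = result.replace(old, new)
    return result


def is_consistent(original_source: str, copied_source: str, replacements: dict) -> bool:
    # The copy is valid only if it is exactly the transformed original;
    # any drift means the copy must be regenerated from the source of truth.
    return apply_copy_pattern(original_source, replacements) == copied_source


original = 'class StableDiffusionPipelineOutput(BaseOutput):\n    """Output class for Stable Diffusion pipelines."""\n'
copied = 'class AltDiffusionPipelineOutput(BaseOutput):\n    """Output class for Alt Diffusion pipelines."""\n'

print(is_consistent(original, copied, {"Stable": "Alt"}))  # True
```

If someone edits the original `StableDiffusionPipelineOutput` without regenerating the copy, this check fails — which is how the mechanism keeps the many near-duplicate files in the codebase in sync.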

To learn more, read this section of the [~Don't~ Repeat Yourself*](https://huggingface.co/blog/transformers-design-philosophy#4-machine-learning-models-are-static) blog post.

## How to write a good issue

**The better your issue is written, the higher the chances that it will be quickly resolved.**

1. Make sure that you've used the correct template for your issue. You can pick between *Bug Report*, *Feature Request*, *Feedback about API Design*, *New model/pipeline/scheduler addition*, *Forum*, or a blank issue. Make sure to pick the correct one when opening [a new issue](https://github.com/huggingface/diffusers/issues/new/choose).
2. **Be precise**: Give your issue a fitting title. Try to formulate your issue description as simply as possible. The more precise you are when submitting an issue, the less time it takes to understand the issue and potentially solve it. Make sure to open an issue for one issue only and not for multiple issues. If you found multiple issues, simply open multiple issues. If your issue is a bug, try to be as precise as possible about what bug it is - you should not just write "Error in diffusers".
3. **Reproducibility**: No reproducible code snippet == no solution. If you encounter a bug, maintainers **have to be able to reproduce** it. Make sure that you include a code snippet that can be copy-pasted into a Python interpreter to reproduce the issue. Make sure that your code snippet works, *i.e.* that there are no missing imports or missing links to images, ... Your issue should contain an error message **and** a code snippet that can be copy-pasted without any changes to reproduce the exact same error message. If your issue is using local model weights or local data that cannot be accessed by the reader, the issue cannot be solved. If you cannot share your data or model, try to make a dummy model or dummy data.
4. **Minimalistic**: Try to help the reader as much as you can to understand the issue as quickly as possible by staying as concise as possible. Remove all code / all information that is irrelevant to the issue. If you have found a bug, try to create the easiest code example you can to demonstrate your issue; do not just dump your whole workflow into the issue as soon as you have found a bug. E.g., if you train a model and get an error at some point during the training, you should first try to understand what part of the training code is responsible for the error and try to reproduce it with a couple of lines. Try to use dummy data instead of full datasets.
5. Add links. If you are referring to a certain naming, method, or model make sure to provide a link so that the reader can better understand what you mean. If you are referring to a specific PR or issue, make sure to link it to your issue. Do not assume that the reader knows what you are talking about. The more links you add to your issue the better.
6. Formatting. Make sure to nicely format your issue by formatting code into Python code syntax, and error messages into normal code syntax. See the [official GitHub formatting docs](https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax) for more information.
7. Think of your issue not as a ticket to be solved, but rather as a beautiful entry to a well-written encyclopedia. Every added issue is a contribution to publicly available knowledge. By adding a nicely written issue you not only make it easier for maintainers to solve your issue, but you are helping the whole community to better understand a certain aspect of the library.

## How to write a good PR

1. Be a chameleon. Understand existing design patterns and syntax and make sure your code additions flow seamlessly into the existing code base. Pull requests that significantly diverge from existing design patterns or user interfaces will not be merged.
2. Be laser focused. A pull request should solve one problem and one problem only. Make sure to not fall into the trap of "also fixing another problem while we're adding it". It is much more difficult to review pull requests that solve multiple, unrelated problems at once.
3. If helpful, try to add a code snippet that displays an example of how your addition can be used.
4. The title of your pull request should be a summary of its contribution.
5. If your pull request addresses an issue, please mention the issue number in the pull request description to make sure they are linked (and people consulting the issue know you are working on it);
6. To indicate a work in progress please prefix the title with `[WIP]`. These are useful to avoid duplicated work, and to differentiate it from PRs ready to be merged;
7. Try to formulate and format your text as explained in [How to write a good issue](#how-to-write-a-good-issue).
8. Make sure existing tests pass;
9. Add high-coverage tests. No quality testing = no merge.
   - If you are adding new `@slow` tests, make sure they pass using `RUN_SLOW=1 python -m pytest tests/test_my_new_model.py`. CircleCI does not run the slow tests, but GitHub Actions does every night!
10. All public methods must have informative docstrings that work nicely with markdown. See [`pipeline_latent_diffusion.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/latent_diffusion/pipeline_latent_diffusion.py) for an example.
11. Due to the rapidly growing repository, it is important to make sure that no files that would significantly weigh down the repository are added. This includes images, videos, and other non-text files. We prefer to leverage a hf.co hosted `dataset` like [`hf-internal-testing`](https://huggingface.co/hf-internal-testing) or [huggingface/documentation-images](https://huggingface.co/datasets/huggingface/documentation-images) to place these files. If it is an external contribution, feel free to add the images to your PR and ask a Hugging Face member to migrate your images to this dataset.
## How to open a PR

Before writing code, we strongly advise you to search through the existing PRs or issues to make sure that nobody is already working on the same thing. If you are unsure, it is always a good idea to open an issue to get some feedback.

You will need basic `git` proficiency to be able to contribute to 🧨 Diffusers. `git` is not the easiest tool to use but it has the greatest manual. Type `git --help` in a shell and enjoy. If you prefer books, [Pro Git](https://git-scm.com/book/en/v2) is a very good reference.

Follow these steps to start contributing ([supported Python versions](https://github.com/huggingface/diffusers/blob/83bc6c94eaeb6f7704a2a428931cf2d9ad973ae9/setup.py#L270)):

1. Fork the [repository](https://github.com/huggingface/diffusers) by clicking on the 'Fork' button on the repository's page. This creates a copy of the code under your GitHub user account.

2. Clone your fork to your local disk, and add the base repository as a remote:

   ```bash
   $ git clone git@github.com:<your GitHub handle>/diffusers.git
   $ cd diffusers
   $ git remote add upstream https://github.com/huggingface/diffusers.git
   ```

3. Create a new branch to hold your development changes:

   ```bash
   $ git checkout -b a-descriptive-name-for-my-changes
   ```

   **Do not** work on the `main` branch.

4. Set up a development environment by running the following command in a virtual environment:

   ```bash
   $ pip install -e ".[dev]"
   ```

   If you have already cloned the repo, you might need to `git pull` to get the most recent changes in the library.

5. Develop the features on your branch.

   As you work on the features, you should make sure that the test suite passes. You should run the tests impacted by your changes like this:

   ```bash
   $ pytest tests/<TEST_TO_RUN>.py
   ```

   Before you run the tests, please make sure you install the dependencies required for testing. You can do so with this command:

   ```bash
   $ pip install -e ".[test]"
   ```

   You can also run the full test suite with the following command, but it takes a beefy machine to produce a result in a decent amount of time now that Diffusers has grown a lot. Here is the command for it:

   ```bash
   $ make test
   ```

   🧨 Diffusers relies on `black` and `isort` to format its source code consistently. After you make changes, apply automatic style corrections and code verifications that can't be automated in one go with:

   ```bash
   $ make style
   ```

   🧨 Diffusers also uses `ruff` and a few custom scripts to check for coding mistakes. Quality control runs in CI; however, you can also run the same checks with:

   ```bash
   $ make quality
   ```

   Once you're happy with your changes, add changed files using `git add` and make a commit with `git commit` to record your changes locally:

   ```bash
   $ git add modified_file.py
   $ git commit -m "A descriptive message about your changes."
   ```

   It is a good idea to sync your copy of the code with the original repository regularly. This way you can quickly account for changes:

   ```bash
   $ git pull upstream main
   ```

   Push the changes to your account using:

   ```bash
   $ git push -u origin a-descriptive-name-for-my-changes
   ```

6. Once you are satisfied, go to the webpage of your fork on GitHub. Click on 'Pull request' to send your changes to the project maintainers for review.

7. It's OK if maintainers ask you for changes. It happens to core contributors too! So everyone can see the changes in the pull request, work in your local branch and push the changes to your fork. They will automatically appear in the pull request.

### Tests

An extensive test suite is included to test the library behavior and several examples. Library tests can be found in the [tests folder](https://github.com/huggingface/diffusers/tree/main/tests).

We like `pytest` and `pytest-xdist` because they're faster. From the root of the repository, here's how to run tests with `pytest` for the library:

```bash
$ python -m pytest -n auto --dist=loadfile -s -v ./tests/
```

In fact, that's how `make test` is implemented!

You can specify a smaller set of tests in order to test only the feature you're working on.

By default, slow tests are skipped. Set the `RUN_SLOW` environment variable to `yes` to run them. This will download many gigabytes of models — make sure you have enough disk space and a good Internet connection, or a lot of patience!

```bash
$ RUN_SLOW=yes python -m pytest -n auto --dist=loadfile -s -v ./tests/
```

`unittest` is fully supported; here's how to run tests with it:

```bash
$ python -m unittest discover -s tests -t . -v
$ python -m unittest discover -s examples -t examples -v
```

### Syncing forked main with upstream (HuggingFace) main

To avoid pinging the upstream repository, which adds reference notes to each upstream PR and sends unnecessary notifications to the developers involved in these PRs, please follow these steps when syncing the main branch of a forked repository:
1. When possible, avoid syncing with the upstream using a branch and PR on the forked repository. Instead, merge directly into the forked main.
2. If a PR is absolutely necessary, use the following steps after checking out your branch:
```bash
$ git checkout -b your-branch-for-syncing
$ git pull --squash --no-commit upstream main
$ git commit -m '<your message without GitHub references>'
$ git push --set-upstream origin your-branch-for-syncing
```

### Style guide

For documentation strings, 🧨 Diffusers follows the [Google style](https://google.github.io/styleguide/pyguide.html).
diffusers/docs/source/en/conceptual/ethical_guidelines.md
ADDED
@@ -0,0 +1,63 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# 🧨 Diffusers’ Ethical Guidelines

## Preamble

[Diffusers](https://huggingface.co/docs/diffusers/index) provides pre-trained diffusion models and serves as a modular toolbox for inference and training.

Given its real-world applications and potential negative impacts on society, we think it is important to provide the project with ethical guidelines to guide the development, users’ contributions, and usage of the Diffusers library.

The risks associated with using this technology are still being examined, but to name a few: copyright issues for artists; deep-fake exploitation; sexual content generation in inappropriate contexts; non-consensual impersonation; harmful social biases perpetuating the oppression of marginalized groups.
We will keep tracking risks and adapt the following guidelines based on the community's responsiveness and valuable feedback.

## Scope

The Diffusers community will apply the following ethical guidelines to the project’s development and help coordinate how the community will integrate the contributions, especially concerning sensitive topics related to ethical concerns.

## Ethical guidelines

The following ethical guidelines apply generally, but we will primarily implement them when dealing with ethically sensitive issues while making a technical choice. Furthermore, we commit to adapting those ethical principles over time following emerging harms related to the state of the art of the technology in question.

- **Transparency**: we are committed to being transparent in managing PRs, explaining our choices to users, and making technical decisions.

- **Consistency**: we are committed to guaranteeing our users the same level of attention in project management, keeping it technically stable and consistent.

- **Simplicity**: with a desire to make it easy to use and exploit the Diffusers library, we are committed to keeping the project’s goals lean and coherent.

- **Accessibility**: the Diffusers project helps lower the entry bar for contributors who can help run it even without technical expertise. Doing so makes research artifacts more accessible to the community.

- **Reproducibility**: we aim to be transparent about the reproducibility of upstream code, models, and datasets when made available through the Diffusers library.

- **Responsibility**: as a community and through teamwork, we hold a collective responsibility to our users by anticipating and mitigating this technology's potential risks and dangers.

## Examples of implementations: Safety features and Mechanisms

The team works daily to make the technical and non-technical tools available to deal with the potential ethical and social risks associated with diffusion technology. Moreover, the community's input is invaluable in ensuring these features' implementation and raising awareness with us.

- [**Community tab**](https://huggingface.co/docs/hub/repositories-pull-requests-discussions): it enables the community to discuss and better collaborate on a project.

- **Bias exploration and evaluation**: the Hugging Face team provides a [space](https://huggingface.co/spaces/society-ethics/DiffusionBiasExplorer) to demonstrate the biases in Stable Diffusion interactively. In this sense, we support and encourage bias explorers and evaluations.

- **Encouraging safety in deployment**

  - [**Safe Stable Diffusion**](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/stable_diffusion_safe): It mitigates the well-known issue that models, like Stable Diffusion, that are trained on unfiltered, web-crawled datasets tend to suffer from inappropriate degeneration. Related paper: [Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models](https://huggingface.co/papers/2211.05105).

  - [**Safety Checker**](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/safety_checker.py): It checks and compares the class probability of a set of hard-coded harmful concepts in the embedding space against an image after it has been generated. The harmful concepts are intentionally hidden to prevent reverse engineering of the checker.

- **Staged releases on the Hub**: in particularly sensitive situations, access to some repositories should be restricted. This staged release is an intermediary step that allows the repository’s authors to have more control over its use.

- **Licensing**: [OpenRAILs](https://huggingface.co/blog/open_rail), a new type of licensing, allow us to ensure free access while having a set of restrictions that ensure more responsible use.
diffusers/docs/source/en/conceptual/evaluation.md
ADDED
@@ -0,0 +1,578 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Evaluating Diffusion Models

<a target="_blank" href="https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/evaluation.ipynb">
    <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

> [!TIP]
> This document is now outdated given the emergence of dedicated evaluation frameworks for diffusion-based image generation. Please check
> out works like [HEIM](https://crfm.stanford.edu/helm/heim/latest/), [T2I-Compbench](https://huggingface.co/papers/2307.06350), and
> [GenEval](https://huggingface.co/papers/2310.11513).

Evaluation of generative models like [Stable Diffusion](https://huggingface.co/docs/diffusers/stable_diffusion) is subjective in nature. But as practitioners and researchers, we often have to make careful choices amongst many different possibilities. So, when working with different generative models (like GANs, Diffusion, etc.), how do we choose one over the other?

Qualitative evaluation of such models can be error-prone and might incorrectly influence a decision.
However, quantitative metrics don't necessarily correspond to image quality. So, usually, a combination
of both qualitative and quantitative evaluations provides a stronger signal when choosing one model
over the other.

In this document, we provide a non-exhaustive overview of qualitative and quantitative methods to evaluate Diffusion models. For quantitative methods, we specifically focus on how to implement them alongside `diffusers`.

The methods shown in this document can also be used to evaluate different [noise schedulers](https://huggingface.co/docs/diffusers/main/en/api/schedulers/overview) while keeping the underlying generation model fixed.

## Scenarios

We cover Diffusion models with the following pipelines:

- Text-guided image generation (such as the [`StableDiffusionPipeline`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/text2img)).
- Text-guided image generation, additionally conditioned on an input image (such as the [`StableDiffusionImg2ImgPipeline`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/img2img) and [`StableDiffusionInstructPix2PixPipeline`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/pix2pix)).
- Class-conditioned image generation models (such as the [`DiTPipeline`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/dit)).

## Qualitative Evaluation

Qualitative evaluation typically involves human assessment of generated images. Quality is measured across aspects such as compositionality, image-text alignment, and spatial relations. Common prompts provide a degree of uniformity for subjective metrics.
DrawBench and PartiPrompts are prompt datasets used for qualitative benchmarking; they were introduced by [Imagen](https://imagen.research.google/) and [Parti](https://parti.research.google/) respectively.

From the [official Parti website](https://parti.research.google/):

> PartiPrompts (P2) is a rich set of over 1600 prompts in English that we release as part of this work. P2 can be used to measure model capabilities across various categories and challenge aspects.



PartiPrompts has the following columns:

- Prompt
- Category of the prompt (such as “Abstract”, “World Knowledge”, etc.)
- Challenge reflecting the difficulty (such as “Basic”, “Complex”, “Writing & Symbols”, etc.)

These benchmarks allow for side-by-side human evaluation of different image generation models.

For this, the 🧨 Diffusers team has built **Open Parti Prompts**, a community-driven qualitative benchmark based on Parti Prompts to compare state-of-the-art open-source diffusion models:
- [Open Parti Prompts Game](https://huggingface.co/spaces/OpenGenAI/open-parti-prompts): For 10 parti prompts, 4 generated images are shown and the user selects the image that suits the prompt best.
- [Open Parti Prompts Leaderboard](https://huggingface.co/spaces/OpenGenAI/parti-prompts-leaderboard): The leaderboard comparing the currently best open-source diffusion models to each other.

To manually compare images, let's see how we can use `diffusers` on a couple of PartiPrompts.

Below we show some prompts sampled across different challenges: Basic, Complex, Linguistic Structures, Imagination, and Writing & Symbols. Here we are using PartiPrompts as a [dataset](https://huggingface.co/datasets/nateraw/parti-prompts).

```python
from datasets import load_dataset

# prompts = load_dataset("nateraw/parti-prompts", split="train")
# prompts = prompts.shuffle()
# sample_prompts = [prompts[i]["Prompt"] for i in range(5)]

# Fixing these sample prompts in the interest of reproducibility.
sample_prompts = [
    "a corgi",
    "a hot air balloon with a yin-yang symbol, with the moon visible in the daytime sky",
    "a car with no windows",
    "a cube made of porcupine",
    'The saying "BE EXCELLENT TO EACH OTHER" written on a red brick wall with a graffiti image of a green alien wearing a tuxedo. A yellow fire hydrant is on a sidewalk in the foreground.',
]
```

Now we can use these prompts to generate some images using Stable Diffusion ([v1-4 checkpoint](https://huggingface.co/CompVis/stable-diffusion-v1-4)):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the pipeline we sample from.
sd_pipeline = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16).to("cuda")

seed = 0
generator = torch.manual_seed(seed)

images = sd_pipeline(sample_prompts, num_images_per_prompt=1, generator=generator).images
```



We can also set `num_images_per_prompt` accordingly to compare different images for the same prompt. Running the same pipeline but with a different checkpoint ([v1-5](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5)) yields:



Once several images are generated from all the prompts using multiple models (under evaluation), these results are presented to human evaluators for scoring. For
more details on the DrawBench and PartiPrompts benchmarks, refer to their respective papers.

<Tip>

It is useful to look at some inference samples while a model is training to measure the
training progress. In our [training scripts](https://github.com/huggingface/diffusers/tree/main/examples/), we support this utility with additional support for
logging to TensorBoard and Weights & Biases.

</Tip>

## Quantitative Evaluation

In this section, we will walk you through how to evaluate three different diffusion pipelines using:

- CLIP score
- CLIP directional similarity
- FID

### Text-guided image generation

[CLIP score](https://huggingface.co/papers/2104.08718) measures the compatibility of image-caption pairs. Higher CLIP scores imply higher compatibility 🔼. The CLIP score is a quantitative measurement of the qualitative concept "compatibility". Image-caption pair compatibility can also be thought of as the semantic similarity between the image and the caption. CLIP score was found to have a high correlation with human judgement.
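
Concretely, `torchmetrics` computes the score as `max(100 * cos(E_I, E_C), 0)`, where `E_I` and `E_C` are the CLIP embeddings of the image and the caption. As a minimal sketch, with hypothetical vectors standing in for real CLIP embeddings:

```python
import numpy as np

def clip_score_from_embeddings(image_emb, text_emb):
    # CLIPScore = max(100 * cos(E_I, E_C), 0) -- the convention used by torchmetrics.
    cos = np.dot(image_emb, text_emb) / (np.linalg.norm(image_emb) * np.linalg.norm(text_emb))
    return max(100.0 * cos, 0.0)

# Hypothetical embeddings standing in for real CLIP image/text features.
image_emb = np.array([0.6, 0.8])
text_emb = np.array([1.0, 0.0])
print(clip_score_from_embeddings(image_emb, text_emb))  # ~60.0 for these toy vectors
```

Because the cosine is clamped at zero and scaled by 100, the score lands in `[0, 100]`.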

Let's first load a [`StableDiffusionPipeline`]:

```python
from diffusers import StableDiffusionPipeline
import torch

model_ckpt = "CompVis/stable-diffusion-v1-4"
sd_pipeline = StableDiffusionPipeline.from_pretrained(model_ckpt, torch_dtype=torch.float16).to("cuda")
```

Generate some images with multiple prompts:

```python
prompts = [
    "a photo of an astronaut riding a horse on mars",
    "A high tech solarpunk utopia in the Amazon rainforest",
    "A pikachu fine dining with a view to the Eiffel Tower",
    "A mecha robot in a favela in expressionist style",
    "an insect robot preparing a delicious meal",
    "A small cabin on top of a snowy mountain in the style of Disney, artstation",
]

images = sd_pipeline(prompts, num_images_per_prompt=1, output_type="np").images

print(images.shape)
# (6, 512, 512, 3)
```

And then, we calculate the CLIP score.

```python
from torchmetrics.functional.multimodal import clip_score
from functools import partial

clip_score_fn = partial(clip_score, model_name_or_path="openai/clip-vit-base-patch16")

def calculate_clip_score(images, prompts):
    images_int = (images * 255).astype("uint8")
    clip_score = clip_score_fn(torch.from_numpy(images_int).permute(0, 3, 1, 2), prompts).detach()
    return round(float(clip_score), 4)

sd_clip_score = calculate_clip_score(images, prompts)
print(f"CLIP score: {sd_clip_score}")
# CLIP score: 35.7038
```

In the above example, we generated one image per prompt. If we generated multiple images per prompt, we would have to take the average score from the generated images per prompt.
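
A minimal sketch of that aggregation, using hypothetical per-image scores rather than real pipeline outputs, could look like this:

```python
import numpy as np

# Hypothetical CLIP scores: one row per prompt, one column per generated image.
scores = np.array([
    [34.2, 36.1, 35.0],  # images for prompt 1
    [33.5, 33.9, 34.4],  # images for prompt 2
])

# First average over the images of each prompt, then across prompts.
per_prompt_mean = scores.mean(axis=1)
overall_clip_score = round(float(per_prompt_mean.mean()), 4)
print(overall_clip_score)
```

With the same number of images per prompt, this two-step average equals a flat mean over all images; it differs only when per-prompt image counts differ.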

Now, if we wanted to compare two checkpoints compatible with the [`StableDiffusionPipeline`], we should pass a generator while calling the pipeline. First, we generate images with a
fixed seed with the [v1-4 Stable Diffusion checkpoint](https://huggingface.co/CompVis/stable-diffusion-v1-4):

```python
seed = 0
generator = torch.manual_seed(seed)

images = sd_pipeline(prompts, num_images_per_prompt=1, generator=generator, output_type="np").images
```

Then we load the [v1-5 checkpoint](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5) to generate images:

```python
model_ckpt_1_5 = "stable-diffusion-v1-5/stable-diffusion-v1-5"
sd_pipeline_1_5 = StableDiffusionPipeline.from_pretrained(model_ckpt_1_5, torch_dtype=torch.float16).to("cuda")

images_1_5 = sd_pipeline_1_5(prompts, num_images_per_prompt=1, generator=generator, output_type="np").images
```

And finally, we compare their CLIP scores:

```python
sd_clip_score_1_4 = calculate_clip_score(images, prompts)
print(f"CLIP Score with v-1-4: {sd_clip_score_1_4}")
# CLIP Score with v-1-4: 34.9102

sd_clip_score_1_5 = calculate_clip_score(images_1_5, prompts)
print(f"CLIP Score with v-1-5: {sd_clip_score_1_5}")
# CLIP Score with v-1-5: 36.2137
```

It seems like the [v1-5](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5) checkpoint performs better than its predecessor. Note, however, that the number of prompts we used to compute the CLIP scores is quite low. For a more practical evaluation, this number should be way higher, and the prompts should be diverse.

<Tip warning={true}>

By construction, there are some limitations in this score. The captions in the training dataset
were crawled from the web and extracted from `alt` and similar tags associated with an image on the internet.
They are not necessarily representative of what a human being would use to describe an image. Hence we
had to "engineer" some prompts here.

</Tip>

### Image-conditioned text-to-image generation

In this case, we condition the generation pipeline with an input image as well as a text prompt. Let's take the [`StableDiffusionInstructPix2PixPipeline`] as an example. It takes an edit instruction as an input prompt and an input image to be edited.

Here is one example:



One strategy to evaluate such a model is to measure the consistency of the change between the two images (in [CLIP](https://huggingface.co/docs/transformers/model_doc/clip) space) with the change between the two image captions (as shown in [CLIP-Guided Domain Adaptation of Image Generators](https://huggingface.co/papers/2108.00946)). This is referred to as the "**CLIP directional similarity**".

- Caption 1 corresponds to the input image (image 1) that is to be edited.
- Caption 2 corresponds to the edited image (image 2). It should reflect the edit instruction.

Following is a pictorial overview:



We have prepared a mini dataset to implement this metric. Let's first load the dataset.

```python
from datasets import load_dataset

dataset = load_dataset("sayakpaul/instructpix2pix-demo", split="train")
dataset.features
```

```bash
{'input': Value(dtype='string', id=None),
 'edit': Value(dtype='string', id=None),
 'output': Value(dtype='string', id=None),
 'image': Image(decode=True, id=None)}
```

Here we have:

- `input` is a caption corresponding to the `image`.
- `edit` denotes the edit instruction.
- `output` denotes the modified caption reflecting the `edit` instruction.

Let's take a look at a sample.

```python
idx = 0
print(f"Original caption: {dataset[idx]['input']}")
print(f"Edit instruction: {dataset[idx]['edit']}")
print(f"Modified caption: {dataset[idx]['output']}")
```

```bash
Original caption: 2. FAROE ISLANDS: An archipelago of 18 mountainous isles in the North Atlantic Ocean between Norway and Iceland, the Faroe Islands has 'everything you could hope for', according to Big 7 Travel. It boasts 'crystal clear waterfalls, rocky cliffs that seem to jut out of nowhere and velvety green hills'
Edit instruction: make the isles all white marble
Modified caption: 2. WHITE MARBLE ISLANDS: An archipelago of 18 mountainous white marble isles in the North Atlantic Ocean between Norway and Iceland, the White Marble Islands has 'everything you could hope for', according to Big 7 Travel. It boasts 'crystal clear waterfalls, rocky cliffs that seem to jut out of nowhere and velvety green hills'
```

And here is the image:

```python
dataset[idx]["image"]
```



We will first edit the images of our dataset with the edit instruction and compute the directional similarity.

Let's first load the [`StableDiffusionInstructPix2PixPipeline`]:

```python
from diffusers import StableDiffusionInstructPix2PixPipeline

instruct_pix2pix_pipeline = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")
```

Now, we perform the edits:

```python
import numpy as np


def edit_image(input_image, instruction):
    image = instruct_pix2pix_pipeline(
        instruction,
        image=input_image,
        output_type="np",
        generator=generator,
    ).images[0]
    return image

input_images = []
original_captions = []
modified_captions = []
edited_images = []

for idx in range(len(dataset)):
    input_image = dataset[idx]["image"]
    edit_instruction = dataset[idx]["edit"]
    edited_image = edit_image(input_image, edit_instruction)

    input_images.append(np.array(input_image))
    original_captions.append(dataset[idx]["input"])
    modified_captions.append(dataset[idx]["output"])
    edited_images.append(edited_image)
```

To measure the directional similarity, we first load CLIP's image and text encoders:

```python
from transformers import (
    CLIPTokenizer,
    CLIPTextModelWithProjection,
    CLIPVisionModelWithProjection,
    CLIPImageProcessor,
)

clip_id = "openai/clip-vit-large-patch14"
tokenizer = CLIPTokenizer.from_pretrained(clip_id)
text_encoder = CLIPTextModelWithProjection.from_pretrained(clip_id).to("cuda")
image_processor = CLIPImageProcessor.from_pretrained(clip_id)
image_encoder = CLIPVisionModelWithProjection.from_pretrained(clip_id).to("cuda")
```

Notice that we are using a particular CLIP checkpoint, i.e., `openai/clip-vit-large-patch14`. This is because the Stable Diffusion pre-training was performed with this CLIP variant. For more details, refer to the [documentation](https://huggingface.co/docs/transformers/model_doc/clip).

Next, we prepare a PyTorch `nn.Module` to compute directional similarity:

```python
import torch.nn as nn
import torch.nn.functional as F


class DirectionalSimilarity(nn.Module):
    def __init__(self, tokenizer, text_encoder, image_processor, image_encoder):
        super().__init__()
        self.tokenizer = tokenizer
        self.text_encoder = text_encoder
        self.image_processor = image_processor
        self.image_encoder = image_encoder

    def preprocess_image(self, image):
        image = self.image_processor(image, return_tensors="pt")["pixel_values"]
        return {"pixel_values": image.to("cuda")}

    def tokenize_text(self, text):
        inputs = self.tokenizer(
            text,
            max_length=self.tokenizer.model_max_length,
            padding="max_length",
            truncation=True,
            return_tensors="pt",
        )
        return {"input_ids": inputs.input_ids.to("cuda")}

    def encode_image(self, image):
        preprocessed_image = self.preprocess_image(image)
        image_features = self.image_encoder(**preprocessed_image).image_embeds
        image_features = image_features / image_features.norm(dim=1, keepdim=True)
        return image_features

    def encode_text(self, text):
        tokenized_text = self.tokenize_text(text)
        text_features = self.text_encoder(**tokenized_text).text_embeds
        text_features = text_features / text_features.norm(dim=1, keepdim=True)
        return text_features

    def compute_directional_similarity(self, img_feat_one, img_feat_two, text_feat_one, text_feat_two):
        sim_direction = F.cosine_similarity(img_feat_two - img_feat_one, text_feat_two - text_feat_one)
        return sim_direction

    def forward(self, image_one, image_two, caption_one, caption_two):
        img_feat_one = self.encode_image(image_one)
        img_feat_two = self.encode_image(image_two)
        text_feat_one = self.encode_text(caption_one)
        text_feat_two = self.encode_text(caption_two)
        directional_similarity = self.compute_directional_similarity(
            img_feat_one, img_feat_two, text_feat_one, text_feat_two
        )
        return directional_similarity
```

Let's put `DirectionalSimilarity` to use now.

```python
dir_similarity = DirectionalSimilarity(tokenizer, text_encoder, image_processor, image_encoder)
scores = []

for i in range(len(input_images)):
    original_image = input_images[i]
    original_caption = original_captions[i]
    edited_image = edited_images[i]
    modified_caption = modified_captions[i]

    similarity_score = dir_similarity(original_image, edited_image, original_caption, modified_caption)
    scores.append(float(similarity_score.detach().cpu()))

print(f"CLIP directional similarity: {np.mean(scores)}")
# CLIP directional similarity: 0.0797976553440094
```

Like the CLIP Score, the higher the CLIP directional similarity, the better it is.

It should be noted that the `StableDiffusionInstructPix2PixPipeline` exposes two arguments, namely `image_guidance_scale` and `guidance_scale`, that let you control the quality of the final edited image. We encourage you to experiment with these two arguments and see their impact on the directional similarity.

We can extend the idea of this metric to measure how similar the original image and its edited version are. To do that, we can just do `F.cosine_similarity(img_feat_two, img_feat_one)`. For these kinds of edits, we would still want the primary semantics of the images to be preserved as much as possible, i.e., a high similarity score.

We can use these metrics for similar pipelines such as the [`StableDiffusionPix2PixZeroPipeline`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/pix2pix_zero#diffusers.StableDiffusionPix2PixZeroPipeline).

<Tip>

Both CLIP score and CLIP directional similarity rely on the CLIP model, which can make the evaluations biased.

</Tip>

***Extending metrics like IS, FID (discussed later), or KID can be difficult*** when the model under evaluation was pre-trained on a large image-captioning dataset (such as the [LAION-5B dataset](https://laion.ai/blog/laion-5b/)). This is because underlying these metrics is an InceptionNet (pre-trained on the ImageNet-1k dataset) used for extracting intermediate image features. The pre-training dataset of Stable Diffusion may have limited overlap with the pre-training dataset of InceptionNet, so it is not a good candidate here for feature extraction.

***Using the above metrics helps evaluate models that are class-conditioned. For example, [DiT](https://huggingface.co/docs/diffusers/main/en/api/pipelines/dit), which was pre-trained conditioned on the ImageNet-1k classes.***

### Class-conditioned image generation

Class-conditioned generative models are usually pre-trained on a class-labeled dataset such as [ImageNet-1k](https://huggingface.co/datasets/imagenet-1k). Popular metrics for evaluating these models include Fréchet Inception Distance (FID), Kernel Inception Distance (KID), and Inception Score (IS). In this document, we focus on FID ([Heusel et al.](https://huggingface.co/papers/1706.08500)). We show how to compute it with the [`DiTPipeline`](https://huggingface.co/docs/diffusers/api/pipelines/dit), which uses the [DiT model](https://huggingface.co/papers/2212.09748) under the hood.

FID aims to measure how similar two datasets of images are. As per [this resource](https://mmgeneration.readthedocs.io/en/latest/quick_run.html#fid):

> Fréchet Inception Distance is a measure of similarity between two datasets of images. It was shown to correlate well with the human judgment of visual quality and is most often used to evaluate the quality of samples of Generative Adversarial Networks. FID is calculated by computing the Fréchet distance between two Gaussians fitted to feature representations of the Inception network.

These two datasets are essentially the dataset of real images and the dataset of fake images (generated images in our case). FID is usually calculated with two large datasets. However, for this document, we will work with two mini datasets.
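
The Fréchet distance itself has a closed form: fit a Gaussian (mean `mu`, covariance `S`) to each feature set and compute `||mu1 - mu2||^2 + Tr(S1 + S2 - 2 * sqrt(S1 @ S2))`. A minimal NumPy sketch of this final step, using random stand-in features instead of real Inception activations:

```python
import numpy as np

def _sqrtm_psd(mat):
    # Matrix square root of a symmetric PSD matrix via eigendecomposition.
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)
    return (vecs * np.sqrt(vals)) @ vecs.T

def frechet_distance(feats_one, feats_two):
    # Fit a Gaussian to each feature set, then compute
    # ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 * (S1 S2)^(1/2)).
    mu1, mu2 = feats_one.mean(axis=0), feats_two.mean(axis=0)
    s1 = np.cov(feats_one, rowvar=False)
    s2 = np.cov(feats_two, rowvar=False)
    sqrt_s1 = _sqrtm_psd(s1)
    # Tr((S1 S2)^(1/2)) == Tr((S1^(1/2) S2 S1^(1/2))^(1/2)), which keeps everything symmetric.
    covmean = _sqrtm_psd(sqrt_s1 @ s2 @ sqrt_s1)
    return float(((mu1 - mu2) ** 2).sum() + np.trace(s1 + s2 - 2.0 * covmean))

# Random features standing in for Inception activations of two image sets.
rng = np.random.default_rng(0)
feats_a = rng.normal(size=(64, 8))
feats_b = rng.normal(loc=0.5, size=(64, 8))

print(frechet_distance(feats_a, feats_a))  # identical feature sets give ~0
print(frechet_distance(feats_a, feats_b) > frechet_distance(feats_a, feats_a))
```

In practice, `torchmetrics` takes care of both the Inception feature extraction and this distance computation for us.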
|
| 443 |
+
|
| 444 |
+
Let's first download a few images from the ImageNet-1k training set:
|
| 445 |
+
|
| 446 |
+
```python
|
| 447 |
+
from zipfile import ZipFile
|
| 448 |
+
import requests
|
| 449 |
+
|
| 450 |
+
|
| 451 |
+
def download(url, local_filepath):
|
| 452 |
+
r = requests.get(url)
|
| 453 |
+
with open(local_filepath, "wb") as f:
|
| 454 |
+
f.write(r.content)
|
| 455 |
+
return local_filepath
|
| 456 |
+
|
| 457 |
+
dummy_dataset_url = "https://hf.co/datasets/sayakpaul/sample-datasets/resolve/main/sample-imagenet-images.zip"
|
| 458 |
+
local_filepath = download(dummy_dataset_url, dummy_dataset_url.split("/")[-1])
|
| 459 |
+
|
| 460 |
+
with ZipFile(local_filepath, "r") as zipper:
|
| 461 |
+
zipper.extractall(".")
|
| 462 |
+
```
|
| 463 |
+
|
| 464 |
+
```python
|
| 465 |
+
from PIL import Image
|
| 466 |
+
import os
|
| 467 |
+
import numpy as np
|
| 468 |
+
|
| 469 |
+
dataset_path = "sample-imagenet-images"
|
| 470 |
+
image_paths = sorted([os.path.join(dataset_path, x) for x in os.listdir(dataset_path)])
|
| 471 |
+
|
| 472 |
+
real_images = [np.array(Image.open(path).convert("RGB")) for path in image_paths]
|
| 473 |
+
```
|
| 474 |
+
|
| 475 |
+
These are 10 images from the following ImageNet-1k classes: "cassette_player", "chain_saw" (x2), "church", "gas_pump" (x3), "parachute" (x2), and "tench".
|
| 476 |
+
|
| 477 |
+
<p align="center">
|
| 478 |
+
<img src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/evaluation_diffusion_models/real-images.png" alt="real-images"><br>
|
| 479 |
+
<em>Real images.</em>
|
| 480 |
+
</p>
|
| 481 |
+
|
| 482 |
+
Now that the images are loaded, let's apply some lightweight pre-processing on them to use them for FID calculation.
|
| 483 |
+
|
| 484 |
+
```python
|
| 485 |
+
from torchvision.transforms import functional as F
|
| 486 |
+
import torch
|
| 487 |
+
|
| 488 |
+
|
| 489 |
+
def preprocess_image(image):
|
| 490 |
+
image = torch.tensor(image).unsqueeze(0)
|
| 491 |
+
image = image.permute(0, 3, 1, 2) / 255.0
|
| 492 |
+
return F.center_crop(image, (256, 256))
|
| 493 |
+
|
| 494 |
+
real_images = torch.cat([preprocess_image(image) for image in real_images])
|
| 495 |
+
print(real_images.shape)
|
| 496 |
+
# torch.Size([10, 3, 256, 256])
|
| 497 |
+
```
|
| 498 |
+
|
| 499 |
+
We now load the [`DiTPipeline`](https://huggingface.co/docs/diffusers/api/pipelines/dit) to generate images conditioned on the above-mentioned classes.
|
| 500 |
+
|
| 501 |
+
```python
|
| 502 |
+
from diffusers import DiTPipeline, DPMSolverMultistepScheduler
|
| 503 |
+
|
| 504 |
+
dit_pipeline = DiTPipeline.from_pretrained("facebook/DiT-XL-2-256", torch_dtype=torch.float16)
|
| 505 |
+
dit_pipeline.scheduler = DPMSolverMultistepScheduler.from_config(dit_pipeline.scheduler.config)
|
| 506 |
+
dit_pipeline = dit_pipeline.to("cuda")
|
| 507 |
+
|
| 508 |
+
seed = 0
|
| 509 |
+
generator = torch.manual_seed(seed)
|
| 510 |
+
|
| 511 |
+
|
| 512 |
+
words = [
|
| 513 |
+
"cassette player",
|
| 514 |
+
"chainsaw",
|
| 515 |
+
"chainsaw",
|
| 516 |
+
"church",
|
| 517 |
+
"gas pump",
|
| 518 |
+
"gas pump",
|
| 519 |
+
"gas pump",
|
| 520 |
+
"parachute",
|
| 521 |
+
"parachute",
|
| 522 |
+
"tench",
|
| 523 |
+
]
|
| 524 |
+
|
| 525 |
+
class_ids = dit_pipeline.get_label_ids(words)
|
| 526 |
+
output = dit_pipeline(class_labels=class_ids, generator=generator, output_type="np")
|
| 527 |
+
|
| 528 |
+
fake_images = output.images
|
| 529 |
+
fake_images = torch.tensor(fake_images)
|
| 530 |
+
fake_images = fake_images.permute(0, 3, 1, 2)
|
| 531 |
+
print(fake_images.shape)
|
| 532 |
+
# torch.Size([10, 3, 256, 256])
|
| 533 |
+
```
|
| 534 |
+
|
| 535 |
+
Now, we can compute the FID using [`torchmetrics`](https://torchmetrics.readthedocs.io/).
|
| 536 |
+
|
| 537 |
+
```python
|
| 538 |
+
from torchmetrics.image.fid import FrechetInceptionDistance
|
| 539 |
+
|
| 540 |
+
fid = FrechetInceptionDistance(normalize=True)
|
| 541 |
+
fid.update(real_images, real=True)
|
| 542 |
+
fid.update(fake_images, real=False)
|
| 543 |
+
|
| 544 |
+
print(f"FID: {float(fid.compute())}")
|
| 545 |
+
# FID: 177.7147216796875
|
| 546 |
+
```

The lower the FID, the better it is. Several things can influence FID here:

- Number of images (both real and fake)
- Randomness induced in the diffusion process
- Number of inference steps in the diffusion process
- The scheduler being used in the diffusion process

For the last two points, it is, therefore, a good practice to run the evaluation across different seeds and inference steps, and then report an average result.
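
For instance, the seed-averaging just mentioned can be sketched as follows. `compute_fid_for_seed` is a hypothetical helper, and its scores are made up for illustration; in practice it would re-run the generation and FID computation above with a different `torch.manual_seed`.

```python
import statistics

def compute_fid_for_seed(seed: int) -> float:
    # Hypothetical stand-in for re-running the DiT pipeline with
    # `generator = torch.manual_seed(seed)` and returning `float(fid.compute())`.
    made_up_scores = {0: 177.7, 1: 181.2, 2: 175.9, 3: 179.4}
    return made_up_scores[seed]

seeds = [0, 1, 2, 3]
scores = [compute_fid_for_seed(s) for s in seeds]

# Report the mean (and spread) instead of a single-seed number.
print(f"FID: {statistics.mean(scores):.2f} ± {statistics.stdev(scores):.2f}")
```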

<Tip warning={true}>

FID results tend to be fragile as they depend on a lot of factors:

* The specific Inception model used during computation.
* The implementation accuracy of the computation.
* The image format (not the same if we start from PNGs vs JPGs).

Keeping that in mind, FID is often most useful when comparing similar runs, but it is hard to reproduce paper results unless the authors carefully disclose the FID measurement code.

These points apply to other related metrics too, such as KID and IS.

</Tip>

As a final step, let's visually inspect the `fake_images`.

<p align="center">
    <img src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/evaluation_diffusion_models/fake-images.png" alt="fake-images"><br>
    <em>Fake images.</em>
</p>

diffusers/docs/source/en/conceptual/philosophy.md
ADDED
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Philosophy

🧨 Diffusers provides **state-of-the-art** pretrained diffusion models across multiple modalities.
Its purpose is to serve as a **modular toolbox** for both inference and training.

We aim to build a library that stands the test of time and therefore take API design very seriously.

In a nutshell, Diffusers is built to be a natural extension of PyTorch. Therefore, most of our design choices are based on [PyTorch's Design Principles](https://pytorch.org/docs/stable/community/design.html#pytorch-design-philosophy). Let's go over the most important ones:

## Usability over Performance

- While Diffusers has many built-in performance-enhancing features (see [Memory and Speed](https://huggingface.co/docs/diffusers/optimization/fp16)), models are always loaded with the highest precision and lowest optimization. Therefore, by default, diffusion pipelines are always instantiated on CPU with float32 precision if not otherwise defined by the user. This ensures usability across different platforms and accelerators and means that no complex installations are required to run the library.
- Diffusers aims to be a **lightweight** package and therefore has very few required dependencies, but many soft dependencies that can improve performance (such as `accelerate`, `safetensors`, `onnx`, etc.). We strive to keep the library as lightweight as possible so that it can be added without much concern as a dependency on other packages.
- Diffusers prefers simple, self-explanatory code over condensed, magic code. This means that shorthand code syntaxes such as lambda functions and advanced PyTorch operators are often not desired.

## Simple over easy

As PyTorch states, **explicit is better than implicit** and **simple is better than complex**. This design philosophy is reflected in multiple parts of the library:
- We follow PyTorch's API with methods like [`DiffusionPipeline.to`](https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.to) to let the user handle device management.
- Raising concise error messages is preferred to silently correcting erroneous input. Diffusers aims at teaching the user, rather than making the library as easy to use as possible.
- Complex model vs. scheduler logic is exposed instead of magically handled inside. Schedulers/Samplers are separated from diffusion models with minimal dependencies on each other. This forces the user to write the unrolled denoising loop. However, the separation allows for easier debugging and gives the user more control over adapting the denoising process or switching out diffusion models or schedulers.
- Separately trained components of the diffusion pipeline, *e.g.* the text encoder, the UNet, and the variational autoencoder, each have their own model class. This forces the user to handle the interaction between the different model components, and the serialization format separates the model components into different files. However, this allows for easier debugging and customization. DreamBooth or Textual Inversion training is very simple thanks to Diffusers' ability to separate single components of the diffusion pipeline.

## Tweakable, contributor-friendly over abstraction

For large parts of the library, Diffusers adopts an important design principle of the [Transformers library](https://github.com/huggingface/transformers), which is to prefer copy-pasted code over hasty abstractions. This design principle is very opinionated and stands in stark contrast to popular design principles such as [Don't repeat yourself (DRY)](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself).
In short, just like Transformers does for modeling files, Diffusers prefers to keep an extremely low level of abstraction and very self-contained code for pipelines and schedulers.
Functions, long code blocks, and even classes can be copied across multiple files, which at first can look like a bad, sloppy design choice that makes the library unmaintainable.
**However**, this design has proven to be extremely successful for Transformers and makes a lot of sense for community-driven, open-source machine learning libraries because:
- Machine learning is an extremely fast-moving field in which paradigms, model architectures, and algorithms are changing rapidly, which makes it very difficult to define long-lasting code abstractions.
- Machine learning practitioners like to be able to quickly tweak existing code for ideation and research and therefore prefer self-contained code over code that contains many abstractions.
- Open-source libraries rely on community contributions and therefore must be easy to contribute to. The more abstract the code, the more dependencies, the harder it is to read, and the harder it is to contribute to. Contributors simply stop contributing to very abstract libraries out of fear of breaking vital functionality. If contributing to a library cannot break other fundamental code, not only is it more inviting for potential new contributors, but it is also easier to review and contribute to multiple parts in parallel.

At Hugging Face, we call this design the **single-file policy**, which means that almost all of the code of a certain class should be written in a single, self-contained file. To read more about the philosophy, you can have a look at [this blog post](https://huggingface.co/blog/transformers-design-philosophy).

In Diffusers, we follow this philosophy for both pipelines and schedulers, but only partly for diffusion models. The reason we don't follow this design fully for diffusion models is that almost all diffusion pipelines, such as [DDPM](https://huggingface.co/docs/diffusers/api/pipelines/ddpm), [Stable Diffusion](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/overview#stable-diffusion-pipelines), [unCLIP (DALL·E 2)](https://huggingface.co/docs/diffusers/api/pipelines/unclip), and [Imagen](https://imagen.research.google/), rely on the same diffusion model, the [UNet](https://huggingface.co/docs/diffusers/api/models/unet2d-cond).

Great, now you should have a general understanding of why 🧨 Diffusers is designed the way it is 🤗.
We try to apply these design principles consistently across the library. Nevertheless, there are some minor exceptions to the philosophy or some unlucky design choices. If you have feedback regarding the design, we would ❤️ to hear it [directly on GitHub](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feedback.md&title=).

## Design Philosophy in Details

Now, let's look a bit into the nitty-gritty details of the design philosophy. Diffusers essentially consists of three major classes: [pipelines](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines), [models](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models), and [schedulers](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers).
Let's walk through more in-detail design decisions for each class.

### Pipelines

Pipelines are designed to be easy to use (and therefore do not follow [*Simple over easy*](#simple-over-easy) 100%), are not feature complete, and should loosely be seen as examples of how to use [models](#models) and [schedulers](#schedulers) for inference.

The following design principles are followed:
- Pipelines follow the single-file policy. All pipelines can be found in individual directories under src/diffusers/pipelines. One pipeline folder corresponds to one diffusion paper/project/release. Multiple pipeline files can be gathered in one pipeline folder, as is done for [`src/diffusers/pipelines/stable_diffusion`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/stable_diffusion). If pipelines share similar functionality, one can make use of the [# Copied from mechanism](https://github.com/huggingface/diffusers/blob/125d783076e5bd9785beb05367a2d2566843a271/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py#L251).
- Pipelines all inherit from [`DiffusionPipeline`].
- Every pipeline consists of different model and scheduler components that are documented in the [`model_index.json` file](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5/blob/main/model_index.json), are accessible under the same name as attributes of the pipeline, and can be shared between pipelines with the [`DiffusionPipeline.components`](https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.components) function.
- Every pipeline should be loadable via the [`DiffusionPipeline.from_pretrained`](https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.from_pretrained) function.
- Pipelines should be used **only** for inference.
- Pipelines should be very readable, self-explanatory, and easy to tweak.
- Pipelines should be designed to build on top of each other and be easy to integrate into higher-level APIs.
- Pipelines are **not** intended to be feature-complete user interfaces. For feature-complete user interfaces one should rather have a look at [InvokeAI](https://github.com/invoke-ai/InvokeAI), [Diffuzers](https://github.com/abhishekkrthakur/diffuzers), and [lama-cleaner](https://github.com/Sanster/lama-cleaner).
- Every pipeline should have one and only one way to run it: via a `__call__` method. The naming of the `__call__` arguments should be shared across all pipelines.
- Pipelines should be named after the task they are intended to solve.
- In almost all cases, novel diffusion pipelines shall be implemented in a new pipeline folder/file.

### Models

Models are designed as configurable toolboxes that are natural extensions of [PyTorch's Module class](https://pytorch.org/docs/stable/generated/torch.nn.Module.html). They only partly follow the **single-file policy**.

The following design principles are followed:
- Models correspond to **a type of model architecture**. *E.g.* the [`UNet2DConditionModel`] class is used for all UNet variations that expect 2D image inputs and are conditioned on some context.
- All models can be found in [`src/diffusers/models`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models), and every model architecture shall be defined in its own file, e.g. [`unets/unet_2d_condition.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unets/unet_2d_condition.py), [`transformers/transformer_2d.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/transformers/transformer_2d.py), etc.
- Models **do not** follow the single-file policy and should make use of smaller model building blocks, such as [`attention.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention.py), [`resnet.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/resnet.py), [`embeddings.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/embeddings.py), etc. **Note**: This is in stark contrast to Transformers' modeling files and shows that models do not really follow the single-file policy.
- Models intend to expose complexity, just like PyTorch's `Module` class, and give clear error messages.
- Models all inherit from `ModelMixin` and `ConfigMixin`.
- Models can be optimized for performance when doing so doesn't demand major code changes, keeps backward compatibility, and gives a significant memory or compute gain.
- Models should by default have the highest precision and lowest performance setting.
- To integrate new model checkpoints whose general architecture can be classified as an architecture that already exists in Diffusers, the existing model architecture shall be adapted to make it work with the new checkpoint. One should only create a new file if the model architecture is fundamentally different.
- Models should be designed to be easily extendable to future changes. This can be achieved by limiting public function arguments and configuration arguments, and by "foreseeing" future changes; *e.g.*, it is usually better to add string `"...type"` arguments that can easily be extended to new future types than boolean `is_..._type` arguments. Only the minimum amount of changes shall be made to existing architectures to make a new model checkpoint work.
- The model design is a difficult trade-off between keeping code readable and concise and supporting many model checkpoints. For most parts of the modeling code, classes shall be adapted for new model checkpoints, while there are some exceptions where it is preferred to add new classes to make sure the code is kept concise and readable long-term, such as [UNet blocks](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unets/unet_2d_blocks.py) and [Attention processors](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).
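
As a small illustration of the `"...type"`-argument guideline above (the class and argument names here are invented for the sketch, not actual Diffusers code):

```python
class ToyBlock:
    # A boolean flag such as `is_group_norm: bool` would lock this API into
    # exactly two cases; a string `norm_type` stays open to future variants
    # without breaking existing callers.
    def __init__(self, norm_type: str = "layer"):
        supported = ("layer", "group", "rms")
        if norm_type not in supported:
            raise ValueError(f"`norm_type` must be one of {supported}, got {norm_type!r}.")
        self.norm_type = norm_type

block = ToyBlock(norm_type="rms")
print(block.norm_type)  # rms
```

Adding a fourth norm type later only extends the `supported` tuple; a `is_..._type` boolean would have required a new argument and a deprecation cycle.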

### Schedulers

Schedulers are responsible for guiding the denoising process for inference as well as for defining a noise schedule for training. They are designed as individual classes with loadable configuration files and strongly follow the **single-file policy**.

The following design principles are followed:
- All schedulers are found in [`src/diffusers/schedulers`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers).
- Schedulers are **not** allowed to import from large utils files and shall be kept very self-contained.
- One scheduler Python file corresponds to one scheduler algorithm (as might be defined in a paper).
- If schedulers share similar functionalities, we can make use of the `# Copied from` mechanism.
- Schedulers all inherit from `SchedulerMixin` and `ConfigMixin`.
- Schedulers can be easily swapped out with the [`ConfigMixin.from_config`](https://huggingface.co/docs/diffusers/main/en/api/configuration#diffusers.ConfigMixin.from_config) method as explained in detail [here](../using-diffusers/schedulers).
- Every scheduler has to have a `set_num_inference_steps` and a `step` function. `set_num_inference_steps(...)` has to be called before every denoising process, *i.e.* before `step(...)` is called.
- Every scheduler exposes the timesteps to be "looped over" via a `timesteps` attribute, which is an array of timesteps the model will be called upon.
- The `step(...)` function takes a predicted model output and the "current" sample (x_t) and returns the "previous", slightly more denoised sample (x_t-1).
- Given the complexity of diffusion schedulers, the `step` function does not expose all the complexity and can be a bit of a "black box".
- In almost all cases, novel schedulers shall be implemented in a new scheduling file.
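
The scheduler contract described above (a `set_num_inference_steps` method, a `timesteps` attribute to loop over, and a `step` that maps x_t to x_t-1) can be sketched with a toy class. This is an illustrative stand-in only; a real scheduler would inherit from `SchedulerMixin` and `ConfigMixin` and implement an actual sampling rule.

```python
class ToyScheduler:
    """Minimal sketch of the scheduler interface described above."""

    def __init__(self, num_train_timesteps: int = 1000):
        self.num_train_timesteps = num_train_timesteps
        self.timesteps = []

    def set_num_inference_steps(self, num_inference_steps: int):
        # Expose the timesteps to be "looped over", from most to least noisy.
        stride = self.num_train_timesteps // num_inference_steps
        self.timesteps = list(range(self.num_train_timesteps - 1, -1, -stride))

    def step(self, model_output: float, timestep: int, sample: float) -> float:
        # Toy update rule: blend the current sample x_t toward the model's
        # prediction to obtain the slightly more denoised x_{t-1}.
        alpha = timestep / self.num_train_timesteps
        return alpha * sample + (1 - alpha) * model_output

scheduler = ToyScheduler()
scheduler.set_num_inference_steps(4)
print(scheduler.timesteps)  # [999, 749, 499, 249]

sample = 1.0  # stands in for pure noise
for t in scheduler.timesteps:
    model_output = 0.0  # a real denoising model would predict this
    sample = scheduler.step(model_output, t, sample)
print(sample)
```

Note how the unrolled loop lives in user code, as the *Simple over easy* section requires, while the scheduler only supplies `timesteps` and the per-step update.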

diffusers/docs/source/en/index.md
ADDED
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

<p align="center">
    <br>
    <img src="https://raw.githubusercontent.com/huggingface/diffusers/77aadfee6a891ab9fcfb780f87c693f7a5beeb8e/docs/source/imgs/diffusers_library.jpg" width="400"/>
    <br>
</p>

# Diffusers

🤗 Diffusers is the go-to library for state-of-the-art pretrained diffusion models for generating images, audio, and even 3D structures of molecules. Whether you're looking for a simple inference solution or want to train your own diffusion model, 🤗 Diffusers is a modular toolbox that supports both. Our library is designed with a focus on [usability over performance](conceptual/philosophy#usability-over-performance), [simple over easy](conceptual/philosophy#simple-over-easy), and [customizability over abstractions](conceptual/philosophy#tweakable-contributorfriendly-over-abstraction).

The library has three main components:

- State-of-the-art diffusion pipelines for inference with just a few lines of code. There are many pipelines in 🤗 Diffusers; check out the table in the pipeline [overview](api/pipelines/overview) for a complete list of available pipelines and the tasks they solve.
- Interchangeable [noise schedulers](api/schedulers/overview) for balancing trade-offs between generation speed and quality.
- Pretrained [models](api/models) that can be used as building blocks, and combined with schedulers, for creating your own end-to-end diffusion systems.

<div class="mt-10">
  <div class="w-full flex flex-col space-y-4 md:space-y-0 md:grid md:grid-cols-2 md:gap-y-4 md:gap-x-5">
    <a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./tutorials/tutorial_overview"
      ><div class="w-full text-center bg-gradient-to-br from-blue-400 to-blue-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Tutorials</div>
      <p class="text-gray-700">Learn the fundamental skills you need to start generating outputs, build your own diffusion system, and train a diffusion model. We recommend starting here if you're using 🤗 Diffusers for the first time!</p>
    </a>
    <a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./using-diffusers/loading_overview"
      ><div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">How-to guides</div>
      <p class="text-gray-700">Practical guides to help you load pipelines, models, and schedulers. You'll also learn how to use pipelines for specific tasks, control how outputs are generated, optimize for inference speed, and use different training techniques.</p>
    </a>
    <a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./conceptual/philosophy"
      ><div class="w-full text-center bg-gradient-to-br from-pink-400 to-pink-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Conceptual guides</div>
      <p class="text-gray-700">Understand why the library was designed the way it was, and learn more about the ethical guidelines and safety implementations for using the library.</p>
    </a>
    <a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./api/models/overview"
      ><div class="w-full text-center bg-gradient-to-br from-purple-400 to-purple-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Reference</div>
      <p class="text-gray-700">Technical descriptions of how 🤗 Diffusers classes and methods work.</p>
    </a>
  </div>
</div>
diffusers/docs/source/en/installation.md
ADDED
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Installation

🤗 Diffusers is tested on Python 3.8+, PyTorch 1.7.0+, and Flax. Follow the installation instructions below for the deep learning library you are using:

- [PyTorch](https://pytorch.org/get-started/locally/) installation instructions
- [Flax](https://flax.readthedocs.io/en/latest/) installation instructions

## Install with pip

You should install 🤗 Diffusers in a [virtual environment](https://docs.python.org/3/library/venv.html).
If you're unfamiliar with Python virtual environments, take a look at this [guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/).
A virtual environment makes it easier to manage different projects and avoid compatibility issues between dependencies.

Create a virtual environment with Python or [uv](https://docs.astral.sh/uv/) (refer to [Installation](https://docs.astral.sh/uv/getting-started/installation/) for installation instructions), a fast Rust-based Python package and project manager.

<hfoptions id="install">
<hfoption id="uv">

```bash
uv venv my-env
source my-env/bin/activate
```

</hfoption>
<hfoption id="Python">

```bash
python -m venv my-env
source my-env/bin/activate
```

</hfoption>
</hfoptions>

You should also install 🤗 Transformers because 🤗 Diffusers relies on its models.

<frameworkcontent>
<pt>

PyTorch only supports Python 3.8 - 3.11 on Windows. Install Diffusers with uv.

```bash
uv pip install diffusers["torch"] transformers
```

You can also install Diffusers with pip.

```bash
pip install diffusers["torch"] transformers
```

</pt>
<jax>

Install Diffusers with uv.

```bash
uv pip install diffusers["flax"] transformers
```

You can also install Diffusers with pip.

```bash
pip install diffusers["flax"] transformers
```

</jax>
</frameworkcontent>

## Install with conda

After activating your virtual environment, install 🤗 Diffusers with `conda` (maintained by the community):

```bash
conda install -c conda-forge diffusers
```

## Install from source

Before installing 🤗 Diffusers from source, make sure you have PyTorch and 🤗 Accelerate installed.

To install 🤗 Accelerate:

```bash
pip install accelerate
```

Then install 🤗 Diffusers from source:

```bash
pip install git+https://github.com/huggingface/diffusers
```

This command installs the bleeding edge `main` version rather than the latest `stable` version.
The `main` version is useful for staying up-to-date with the latest developments,
for instance, if a bug has been fixed since the last official release but a new release hasn't been rolled out yet.
However, this means the `main` version may not always be stable.
We strive to keep the `main` version operational, and most issues are usually resolved within a few hours or a day.
If you run into a problem, please open an [Issue](https://github.com/huggingface/diffusers/issues/new/choose) so we can fix it even sooner!

## Editable install

You will need an editable install if you'd like to:

* Use the `main` version of the source code.
* Contribute to 🤗 Diffusers and need to test changes in the code.

Clone the repository and install 🤗 Diffusers with the following commands:

```bash
git clone https://github.com/huggingface/diffusers.git
cd diffusers
```

<frameworkcontent>
<pt>

```bash
pip install -e ".[torch]"
```

</pt>
<jax>

```bash
pip install -e ".[flax]"
```

</jax>
</frameworkcontent>

These commands link your Python library paths to the folder you cloned the repository into.
Python will now look inside the folder you cloned to in addition to the normal library paths.
For example, if your Python packages are typically installed in `~/anaconda3/envs/main/lib/python3.10/site-packages/`, Python will also search the `~/diffusers/` folder you cloned to.

<Tip warning={true}>

You must keep the `diffusers` folder if you want to keep using the library.

</Tip>

Now you can easily update your clone to the latest version of 🤗 Diffusers with the following command:

```bash
cd ~/diffusers/
git pull
```

Your Python environment will find the `main` version of 🤗 Diffusers on the next run.

## Cache

Model weights and files are downloaded from the Hub to a cache, which is usually located in your home directory. You can change the cache location by specifying the `HF_HOME` or `HUGGINGFACE_HUB_CACHE` environment variables or by configuring the `cache_dir` parameter in methods like [`~DiffusionPipeline.from_pretrained`].

Cached files allow you to run 🤗 Diffusers offline. To prevent 🤗 Diffusers from connecting to the internet, set the `HF_HUB_OFFLINE` environment variable to `1` and 🤗 Diffusers will only load previously downloaded files in the cache.

```shell
export HF_HUB_OFFLINE=1
```

For more details about managing and cleaning the cache, take a look at the [caching](https://huggingface.co/docs/huggingface_hub/guides/manage-cache) guide.
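
To make the precedence of those settings concrete, here is a small sketch of how a cache location could be resolved from the environment. `resolve_cache_dir` is a hypothetical helper written for illustration; the real resolution logic lives in `huggingface_hub`.

```python
from pathlib import Path

def resolve_cache_dir(env: dict) -> Path:
    # Hypothetical helper mirroring the precedence described above:
    # an explicit HUGGINGFACE_HUB_CACHE wins, then HF_HOME/hub,
    # then the default location under the home directory.
    if "HUGGINGFACE_HUB_CACHE" in env:
        return Path(env["HUGGINGFACE_HUB_CACHE"])
    if "HF_HOME" in env:
        return Path(env["HF_HOME"]) / "hub"
    return Path.home() / ".cache" / "huggingface" / "hub"

print(resolve_cache_dir({"HF_HOME": "/data/hf"}))  # /data/hf/hub
```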
|
| 171 |
+
|
| 172 |
+
## Telemetry logging
|
| 173 |
+
|
| 174 |
+
Our library gathers telemetry information during [`~DiffusionPipeline.from_pretrained`] requests.
|
| 175 |
+
The data gathered includes the version of 🤗 Diffusers and PyTorch/Flax, the requested model or pipeline class,
|
| 176 |
+
and the path to a pretrained checkpoint if it is hosted on the Hugging Face Hub.
|
| 177 |
+
This usage data helps us debug issues and prioritize new features.
|
| 178 |
+
Telemetry is only sent when loading models and pipelines from the Hub,
|
| 179 |
+
and it is not collected if you're loading local files.
|
| 180 |
+
|
| 181 |
+
We understand that not everyone wants to share additional information, and we respect your privacy.
|
| 182 |
+
You can disable telemetry collection by setting the `HF_HUB_DISABLE_TELEMETRY` environment variable from your terminal:
|
| 183 |
+
|
| 184 |
+
On Linux/MacOS:
|
| 185 |
+
|
| 186 |
+
```bash
|
| 187 |
+
export HF_HUB_DISABLE_TELEMETRY=1
|
| 188 |
+
```
|
| 189 |
+
|
| 190 |
+
On Windows:
|
| 191 |
+
|
| 192 |
+
```bash
|
| 193 |
+
set HF_HUB_DISABLE_TELEMETRY=1
|
| 194 |
+
```
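If you prefer, the same variable can be set from Python; note that it must be set before 🤗 Diffusers is imported for it to take effect:

```python
import os

# disable telemetry before importing diffusers
os.environ["HF_HUB_DISABLE_TELEMETRY"] = "1"
```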
|
diffusers/docs/source/en/quicktour.md
ADDED
|
@@ -0,0 +1,323 @@
| 1 |
+
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
|
| 2 |
+
|
| 3 |
+
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
| 4 |
+
the License. You may obtain a copy of the License at
|
| 5 |
+
|
| 6 |
+
http://www.apache.org/licenses/LICENSE-2.0
|
| 7 |
+
|
| 8 |
+
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
| 9 |
+
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
| 10 |
+
specific language governing permissions and limitations under the License.
|
| 11 |
+
-->
|
| 12 |
+
|
| 13 |
+
[[open-in-colab]]
|
| 14 |
+
|
| 15 |
+
# Quicktour
|
| 16 |
+
|
| 17 |
+
Diffusion models are trained to denoise random Gaussian noise step-by-step to generate a sample of interest, such as an image or audio. This has sparked a tremendous amount of interest in generative AI, and you have probably seen examples of diffusion generated images on the internet. 🧨 Diffusers is a library aimed at making diffusion models widely accessible to everyone.
|
| 18 |
+
|
| 19 |
+
Whether you're a developer or an everyday user, this quicktour will introduce you to 🧨 Diffusers and help you get up and generating quickly! There are three main components of the library to know about:
|
| 20 |
+
|
| 21 |
+
* The [`DiffusionPipeline`] is a high-level end-to-end class designed to rapidly generate samples from pretrained diffusion models for inference.
|
| 22 |
+
* Popular pretrained [model](./api/models) architectures and modules that can be used as building blocks for creating diffusion systems.
|
| 23 |
+
* Many different [schedulers](./api/schedulers/overview) - algorithms that control how noise is added for training, and how to generate denoised images during inference.
|
| 24 |
+
|
| 25 |
+
The quicktour will show you how to use the [`DiffusionPipeline`] for inference, and then walk you through how to combine a model and scheduler to replicate what's happening inside the [`DiffusionPipeline`].
|
| 26 |
+
|
| 27 |
+
<Tip>
|
| 28 |
+
|
| 29 |
+
The quicktour is a simplified version of the introductory 🧨 Diffusers [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/diffusers_intro.ipynb) to help you get started quickly. If you want to learn more about 🧨 Diffusers' goal, design philosophy, and additional details about its core API, check out the notebook!
|
| 30 |
+
|
| 31 |
+
</Tip>
|
| 32 |
+
|
| 33 |
+
Before you begin, make sure you have all the necessary libraries installed:
|
| 34 |
+
|
| 35 |
+
```py
|
| 36 |
+
# uncomment to install the necessary libraries in Colab
|
| 37 |
+
#!pip install --upgrade diffusers accelerate transformers
|
| 38 |
+
```
|
| 39 |
+
|
| 40 |
+
- [🤗 Accelerate](https://huggingface.co/docs/accelerate/index) speeds up model loading for inference and training.
|
| 41 |
+
- [🤗 Transformers](https://huggingface.co/docs/transformers/index) is required to run the most popular diffusion models, such as [Stable Diffusion](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/overview).
|
| 42 |
+
|
| 43 |
+
## DiffusionPipeline
|
| 44 |
+
|
| 45 |
+
The [`DiffusionPipeline`] is the easiest way to use a pretrained diffusion system for inference. It is an end-to-end system containing the model and the scheduler. You can use the [`DiffusionPipeline`] out-of-the-box for many tasks. Take a look at the table below for some supported tasks, and for a complete list of supported tasks, check out the [🧨 Diffusers Summary](./api/pipelines/overview#diffusers-summary) table.
|
| 46 |
+
|
| 47 |
+
| **Task** | **Description** | **Pipeline** |
|
| 48 |
+
|------------------------------|--------------------------------------------------------------------------------------------------------------|-----------------|
|
| 49 |
+
| Unconditional Image Generation | generate an image from Gaussian noise | [unconditional_image_generation](./using-diffusers/unconditional_image_generation) |
|
| 50 |
+
| Text-Guided Image Generation | generate an image given a text prompt | [conditional_image_generation](./using-diffusers/conditional_image_generation) |
|
| 51 |
+
| Text-Guided Image-to-Image Translation | adapt an image guided by a text prompt | [img2img](./using-diffusers/img2img) |
|
| 52 |
+
| Text-Guided Image-Inpainting | fill the masked part of an image given the image, the mask and a text prompt | [inpaint](./using-diffusers/inpaint) |
|
| 53 |
+
| Text-Guided Depth-to-Image Translation | adapt parts of an image guided by a text prompt while preserving structure via depth estimation | [depth2img](./using-diffusers/depth2img) |
|
| 54 |
+
|
| 55 |
+
Start by creating an instance of a [`DiffusionPipeline`] and specify which pipeline checkpoint you would like to download.
|
| 56 |
+
You can use the [`DiffusionPipeline`] for any [checkpoint](https://huggingface.co/models?library=diffusers&sort=downloads) stored on the Hugging Face Hub.
|
| 57 |
+
In this quicktour, you'll load the [`stable-diffusion-v1-5`](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5) checkpoint for text-to-image generation.
|
| 58 |
+
|
| 59 |
+
<Tip warning={true}>
|
| 60 |
+
|
| 61 |
+
For [Stable Diffusion](https://huggingface.co/CompVis/stable-diffusion) models, please carefully read the [license](https://huggingface.co/spaces/CompVis/stable-diffusion-license) first before running the model. 🧨 Diffusers implements a [`safety_checker`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/safety_checker.py) to prevent offensive or harmful content, but the model's improved image generation capabilities can still produce potentially harmful content.
|
| 62 |
+
|
| 63 |
+
</Tip>
|
| 64 |
+
|
| 65 |
+
Load the model with the [`~DiffusionPipeline.from_pretrained`] method:
|
| 66 |
+
|
| 67 |
+
```python
|
| 68 |
+
>>> from diffusers import DiffusionPipeline
|
| 69 |
+
|
| 70 |
+
>>> pipeline = DiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", use_safetensors=True)
|
| 71 |
+
```
|
| 72 |
+
|
| 73 |
+
The [`DiffusionPipeline`] downloads and caches all modeling, tokenization, and scheduling components. You'll see that the Stable Diffusion pipeline is composed of the [`UNet2DConditionModel`] and [`PNDMScheduler`] among other things:
|
| 74 |
+
|
| 75 |
+
```py
|
| 76 |
+
>>> pipeline
|
| 77 |
+
StableDiffusionPipeline {
|
| 78 |
+
"_class_name": "StableDiffusionPipeline",
|
| 79 |
+
"_diffusers_version": "0.21.4",
|
| 80 |
+
...,
|
| 81 |
+
"scheduler": [
|
| 82 |
+
"diffusers",
|
| 83 |
+
"PNDMScheduler"
|
| 84 |
+
],
|
| 85 |
+
...,
|
| 86 |
+
"unet": [
|
| 87 |
+
"diffusers",
|
| 88 |
+
"UNet2DConditionModel"
|
| 89 |
+
],
|
| 90 |
+
"vae": [
|
| 91 |
+
"diffusers",
|
| 92 |
+
"AutoencoderKL"
|
| 93 |
+
]
|
| 94 |
+
}
|
| 95 |
+
```
|
| 96 |
+
|
| 97 |
+
We strongly recommend running the pipeline on a GPU because the model consists of roughly 1.4 billion parameters.
|
| 98 |
+
You can move the generator object to a GPU, just like you would in PyTorch:
|
| 99 |
+
|
| 100 |
+
```python
|
| 101 |
+
>>> pipeline.to("cuda")
|
| 102 |
+
```
|
| 103 |
+
|
| 104 |
+
Now you can pass a text prompt to the `pipeline` to generate an image, and then access the denoised image. By default, the image output is wrapped in a [`PIL.Image`](https://pillow.readthedocs.io/en/stable/reference/Image.html?highlight=image#the-image-class) object.
|
| 105 |
+
|
| 106 |
+
```python
|
| 107 |
+
>>> image = pipeline("An image of a squirrel in Picasso style").images[0]
|
| 108 |
+
>>> image
|
| 109 |
+
```
|
| 110 |
+
|
| 111 |
+
<div class="flex justify-center">
|
| 112 |
+
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/image_of_squirrel_painting.png"/>
|
| 113 |
+
</div>
|
| 114 |
+
|
| 115 |
+
Save the image by calling `save`:
|
| 116 |
+
|
| 117 |
+
```python
|
| 118 |
+
>>> image.save("image_of_squirrel_painting.png")
|
| 119 |
+
```
|
| 120 |
+
|
| 121 |
+
### Local pipeline
|
| 122 |
+
|
| 123 |
+
You can also use the pipeline locally. The only difference is you need to download the weights first:
|
| 124 |
+
|
| 125 |
+
```bash
|
| 126 |
+
!git lfs install
|
| 127 |
+
!git clone https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5
|
| 128 |
+
```
|
| 129 |
+
|
| 130 |
+
Then load the saved weights into the pipeline:
|
| 131 |
+
|
| 132 |
+
```python
|
| 133 |
+
>>> pipeline = DiffusionPipeline.from_pretrained("./stable-diffusion-v1-5", use_safetensors=True)
|
| 134 |
+
```
|
| 135 |
+
|
| 136 |
+
Now, you can run the pipeline as you would in the section above.
|
| 137 |
+
|
| 138 |
+
### Swapping schedulers
|
| 139 |
+
|
| 140 |
+
Different schedulers come with different denoising speeds and quality trade-offs. The best way to find out which one works best for you is to try them out! One of the main features of 🧨 Diffusers is to allow you to easily switch between schedulers. For example, to replace the default [`PNDMScheduler`] with the [`EulerDiscreteScheduler`], load it with the [`~diffusers.ConfigMixin.from_config`] method:
|
| 141 |
+
|
| 142 |
+
```py
|
| 143 |
+
>>> from diffusers import EulerDiscreteScheduler
|
| 144 |
+
|
| 145 |
+
>>> pipeline = DiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", use_safetensors=True)
|
| 146 |
+
>>> pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config)
|
| 147 |
+
```
|
| 148 |
+
|
| 149 |
+
Try generating an image with the new scheduler and see if you notice a difference!
|
| 150 |
+
|
| 151 |
+
In the next section, you'll take a closer look at the components - the model and scheduler - that make up the [`DiffusionPipeline`] and learn how to use these components to generate an image of a cat.
|
| 152 |
+
|
| 153 |
+
## Models
|
| 154 |
+
|
| 155 |
+
Most models take a noisy sample, and at each timestep it predicts the *noise residual* (other models learn to predict the previous sample directly or the velocity or [`v-prediction`](https://github.com/huggingface/diffusers/blob/5e5ce13e2f89ac45a0066cb3f369462a3cf1d9ef/src/diffusers/schedulers/scheduling_ddim.py#L110)), the difference between a less noisy image and the input image. You can mix and match models to create other diffusion systems.
|
| 156 |
+
|
| 157 |
+
Models are initialized with the [`~ModelMixin.from_pretrained`] method, which also locally caches the model weights so it is faster the next time you load the model. For the quicktour, you'll load the [`UNet2DModel`], a basic unconditional image generation model with a checkpoint trained on cat images:
|
| 158 |
+
|
| 159 |
+
```py
|
| 160 |
+
>>> from diffusers import UNet2DModel
|
| 161 |
+
|
| 162 |
+
>>> repo_id = "google/ddpm-cat-256"
|
| 163 |
+
>>> model = UNet2DModel.from_pretrained(repo_id, use_safetensors=True)
|
| 164 |
+
```
|
| 165 |
+
|
| 166 |
+
> [!TIP]
|
| 167 |
+
> Use the [`AutoModel`] API to automatically select a model class if you're unsure of which one to use.
|
| 168 |
+
|
| 169 |
+
To access the model parameters, inspect `model.config`:
|
| 170 |
+
|
| 171 |
+
```py
|
| 172 |
+
>>> model.config
|
| 173 |
+
```
|
| 174 |
+
|
| 175 |
+
The model configuration is a 🧊 frozen 🧊 dictionary, which means those parameters can't be changed after the model is created. This is intentional and ensures that the parameters used to define the model architecture at the start remain the same, while other parameters can still be adjusted during inference.
|
| 176 |
+
|
| 177 |
+
Some of the most important parameters are:
|
| 178 |
+
|
| 179 |
+
* `sample_size`: the height and width dimension of the input sample.
|
| 180 |
+
* `in_channels`: the number of input channels of the input sample.
|
| 181 |
+
* `down_block_types` and `up_block_types`: the type of down- and upsampling blocks used to create the UNet architecture.
|
| 182 |
+
* `block_out_channels`: the number of output channels of the downsampling blocks; also used in reverse order for the number of input channels of the upsampling blocks.
|
| 183 |
+
* `layers_per_block`: the number of ResNet blocks present in each UNet block.
|
| 184 |
+
|
| 185 |
+
To use the model for inference, create a tensor of random Gaussian noise with the shape of the expected sample. It should have a `batch` axis because the model can receive multiple random noises, a `channel` axis corresponding to the number of input channels, and a `sample_size` axis for the height and width of the image:
|
| 186 |
+
|
| 187 |
+
```py
|
| 188 |
+
>>> import torch
|
| 189 |
+
|
| 190 |
+
>>> torch.manual_seed(0)
|
| 191 |
+
|
| 192 |
+
>>> noisy_sample = torch.randn(1, model.config.in_channels, model.config.sample_size, model.config.sample_size)
|
| 193 |
+
>>> noisy_sample.shape
|
| 194 |
+
torch.Size([1, 3, 256, 256])
|
| 195 |
+
```
|
| 196 |
+
|
| 197 |
+
For inference, pass the noisy image and a `timestep` to the model. The `timestep` indicates how noisy the input image is, with more noise at the beginning and less at the end. This helps the model determine its position in the diffusion process, whether it is closer to the start or the end. Access the `sample` attribute of the output to get the model prediction:
|
| 198 |
+
|
| 199 |
+
```py
|
| 200 |
+
>>> with torch.no_grad():
|
| 201 |
+
... noisy_residual = model(sample=noisy_sample, timestep=2).sample
|
| 202 |
+
```
|
| 203 |
+
|
| 204 |
+
To generate actual examples though, you'll need a scheduler to guide the denoising process. In the next section, you'll learn how to couple a model with a scheduler.
|
| 205 |
+
|
| 206 |
+
## Schedulers
|
| 207 |
+
|
| 208 |
+
Schedulers manage going from a noisy sample to a less noisy sample given the model output - in this case, it is the `noisy_residual`.
|
| 209 |
+
|
| 210 |
+
<Tip>
|
| 211 |
+
|
| 212 |
+
🧨 Diffusers is a toolbox for building diffusion systems. While the [`DiffusionPipeline`] is a convenient way to get started with a pre-built diffusion system, you can also choose your own model and scheduler components separately to build a custom diffusion system.
|
| 213 |
+
|
| 214 |
+
</Tip>
|
| 215 |
+
|
| 216 |
+
For the quicktour, you'll instantiate the [`DDPMScheduler`] with its [`~diffusers.ConfigMixin.from_config`] method:
|
| 217 |
+
|
| 218 |
+
```py
|
| 219 |
+
>>> from diffusers import DDPMScheduler
|
| 220 |
+
|
| 221 |
+
>>> scheduler = DDPMScheduler.from_pretrained(repo_id)
|
| 222 |
+
>>> scheduler
|
| 223 |
+
DDPMScheduler {
|
| 224 |
+
"_class_name": "DDPMScheduler",
|
| 225 |
+
"_diffusers_version": "0.21.4",
|
| 226 |
+
"beta_end": 0.02,
|
| 227 |
+
"beta_schedule": "linear",
|
| 228 |
+
"beta_start": 0.0001,
|
| 229 |
+
"clip_sample": true,
|
| 230 |
+
"clip_sample_range": 1.0,
|
| 231 |
+
"dynamic_thresholding_ratio": 0.995,
|
| 232 |
+
"num_train_timesteps": 1000,
|
| 233 |
+
"prediction_type": "epsilon",
|
| 234 |
+
"sample_max_value": 1.0,
|
| 235 |
+
"steps_offset": 0,
|
| 236 |
+
"thresholding": false,
|
| 237 |
+
"timestep_spacing": "leading",
|
| 238 |
+
"trained_betas": null,
|
| 239 |
+
"variance_type": "fixed_small"
|
| 240 |
+
}
|
| 241 |
+
```
|
| 242 |
+
|
| 243 |
+
<Tip>
|
| 244 |
+
|
| 245 |
+
💡 Unlike a model, a scheduler does not have trainable weights and is parameter-free!
|
| 246 |
+
|
| 247 |
+
</Tip>
|
| 248 |
+
|
| 249 |
+
Some of the most important parameters are:
|
| 250 |
+
|
| 251 |
+
* `num_train_timesteps`: the length of the denoising process or, in other words, the number of timesteps required to process random Gaussian noise into a data sample.
|
| 252 |
+
* `beta_schedule`: the type of noise schedule to use for inference and training.
|
| 253 |
+
* `beta_start` and `beta_end`: the start and end noise values for the noise schedule.
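To make these values concrete, here is a pure-Python sketch (not the actual diffusers implementation) of the linear beta schedule the config above describes, evenly spacing `num_train_timesteps` values between `beta_start` and `beta_end`:

```python
# linear beta schedule matching the DDPMScheduler config shown above
beta_start, beta_end, num_train_timesteps = 0.0001, 0.02, 1000

betas = [
    beta_start + (beta_end - beta_start) * t / (num_train_timesteps - 1)
    for t in range(num_train_timesteps)
]

print(len(betas), betas[0], betas[-1])
```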
|
| 254 |
+
|
| 255 |
+
To predict a slightly less noisy image, pass the following to the scheduler's [`~diffusers.DDPMScheduler.step`] method: model output, `timestep`, and current `sample`.
|
| 256 |
+
|
| 257 |
+
```py
|
| 258 |
+
>>> less_noisy_sample = scheduler.step(model_output=noisy_residual, timestep=2, sample=noisy_sample).prev_sample
|
| 259 |
+
>>> less_noisy_sample.shape
|
| 260 |
+
torch.Size([1, 3, 256, 256])
|
| 261 |
+
```
|
| 262 |
+
|
| 263 |
+
The `less_noisy_sample` can be passed to the next `timestep` where it'll get even less noisy! Let's bring it all together now and visualize the entire denoising process.
|
| 264 |
+
|
| 265 |
+
First, create a function that postprocesses and displays the denoised image as a `PIL.Image`:
|
| 266 |
+
|
| 267 |
+
```py
|
| 268 |
+
>>> import PIL.Image
|
| 269 |
+
>>> import numpy as np
|
| 270 |
+
|
| 271 |
+
|
| 272 |
+
>>> def display_sample(sample, i):
|
| 273 |
+
... image_processed = sample.cpu().permute(0, 2, 3, 1)
|
| 274 |
+
... image_processed = (image_processed + 1.0) * 127.5
|
| 275 |
+
... image_processed = image_processed.numpy().astype(np.uint8)
|
| 276 |
+
|
| 277 |
+
... image_pil = PIL.Image.fromarray(image_processed[0])
|
| 278 |
+
... display(f"Image at step {i}")
|
| 279 |
+
... display(image_pil)
|
| 280 |
+
```
|
| 281 |
+
|
| 282 |
+
To speed up the denoising process, move the input and model to a GPU:
|
| 283 |
+
|
| 284 |
+
```py
|
| 285 |
+
>>> model.to("cuda")
|
| 286 |
+
>>> noisy_sample = noisy_sample.to("cuda")
|
| 287 |
+
```
|
| 288 |
+
|
| 289 |
+
Now create a denoising loop that predicts the residual of the less noisy sample, and computes the less noisy sample with the scheduler:
|
| 290 |
+
|
| 291 |
+
```py
|
| 292 |
+
>>> import tqdm
|
| 293 |
+
|
| 294 |
+
>>> sample = noisy_sample
|
| 295 |
+
|
| 296 |
+
>>> for i, t in enumerate(tqdm.tqdm(scheduler.timesteps)):
|
| 297 |
+
... # 1. predict noise residual
|
| 298 |
+
... with torch.no_grad():
|
| 299 |
+
... residual = model(sample, t).sample
|
| 300 |
+
|
| 301 |
+
... # 2. compute less noisy image and set x_t -> x_t-1
|
| 302 |
+
... sample = scheduler.step(residual, t, sample).prev_sample
|
| 303 |
+
|
| 304 |
+
... # 3. optionally look at image
|
| 305 |
+
... if (i + 1) % 50 == 0:
|
| 306 |
+
... display_sample(sample, i + 1)
|
| 307 |
+
```
|
| 308 |
+
|
| 309 |
+
Sit back and watch as a cat is generated from nothing but noise! 😻
|
| 310 |
+
|
| 311 |
+
<div class="flex justify-center">
|
| 312 |
+
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/diffusion-quicktour.png"/>
|
| 313 |
+
</div>
|
| 314 |
+
|
| 315 |
+
## Next steps
|
| 316 |
+
|
| 317 |
+
Hopefully, you generated some cool images with 🧨 Diffusers in this quicktour! For your next steps, you can:
|
| 318 |
+
|
| 319 |
+
* Train or finetune a model to generate your own images in the [training](./tutorials/basic_training) tutorial.
|
| 320 |
+
* See example official and community [training or finetuning scripts](https://github.com/huggingface/diffusers/tree/main/examples#-diffusers-examples) for a variety of use cases.
|
| 321 |
+
* Learn more about loading, accessing, changing, and comparing schedulers in the [Using different Schedulers](./using-diffusers/schedulers) guide.
|
| 322 |
+
* Explore prompt engineering, speed and memory optimizations, and tips and tricks for generating higher-quality images with the [Stable Diffusion](./stable_diffusion) guide.
|
| 323 |
+
* Dive deeper into speeding up 🧨 Diffusers with guides on [optimized PyTorch on a GPU](./optimization/fp16), and inference guides for running [Stable Diffusion on Apple Silicon (M1/M2)](./optimization/mps) and [ONNX Runtime](./optimization/onnx).
|
diffusers/docs/source/en/stable_diffusion.md
ADDED
|
@@ -0,0 +1,261 @@
| 1 |
+
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
|
| 2 |
+
|
| 3 |
+
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
| 4 |
+
the License. You may obtain a copy of the License at
|
| 5 |
+
|
| 6 |
+
http://www.apache.org/licenses/LICENSE-2.0
|
| 7 |
+
|
| 8 |
+
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
| 9 |
+
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
| 10 |
+
specific language governing permissions and limitations under the License.
|
| 11 |
+
-->
|
| 12 |
+
|
| 13 |
+
# Effective and efficient diffusion
|
| 14 |
+
|
| 15 |
+
[[open-in-colab]]
|
| 16 |
+
|
| 17 |
+
Getting the [`DiffusionPipeline`] to generate images in a certain style or include what you want can be tricky. Oftentimes, you have to run the [`DiffusionPipeline`] several times before you end up with an image you're happy with. But generating something out of nothing is a computationally intensive process, especially if you're running inference over and over again.
|
| 18 |
+
|
| 19 |
+
This is why it's important to get the most *computational* (speed) and *memory* (GPU vRAM) efficiency from the pipeline to reduce the time between inference cycles so you can iterate faster.
|
| 20 |
+
|
| 21 |
+
This tutorial walks you through how to generate faster and better with the [`DiffusionPipeline`].
|
| 22 |
+
|
| 23 |
+
Begin by loading the [`stable-diffusion-v1-5/stable-diffusion-v1-5`](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5) model:
|
| 24 |
+
|
| 25 |
+
```python
|
| 26 |
+
from diffusers import DiffusionPipeline
|
| 27 |
+
|
| 28 |
+
model_id = "stable-diffusion-v1-5/stable-diffusion-v1-5"
|
| 29 |
+
pipeline = DiffusionPipeline.from_pretrained(model_id, use_safetensors=True)
|
| 30 |
+
```
|
| 31 |
+
|
| 32 |
+
The example prompt you'll use is a portrait of an old warrior chief, but feel free to use your own prompt:
|
| 33 |
+
|
| 34 |
+
```python
|
| 35 |
+
prompt = "portrait photo of a old warrior chief"
|
| 36 |
+
```
|
| 37 |
+
|
| 38 |
+
## Speed
|
| 39 |
+
|
| 40 |
+
<Tip>
|
| 41 |
+
|
| 42 |
+
💡 If you don't have access to a GPU, you can use one for free from a GPU provider like [Colab](https://colab.research.google.com/)!
|
| 43 |
+
|
| 44 |
+
</Tip>
|
| 45 |
+
|
| 46 |
+
One of the simplest ways to speed up inference is to place the pipeline on a GPU the same way you would with any PyTorch module:
|
| 47 |
+
|
| 48 |
+
```python
|
| 49 |
+
pipeline = pipeline.to("cuda")
|
| 50 |
+
```
|
| 51 |
+
|
| 52 |
+
To make sure you can use the same image and improve on it, use a [`Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html) and set a seed for [reproducibility](./using-diffusers/reusing_seeds):
|
| 53 |
+
|
| 54 |
+
```python
|
| 55 |
+
import torch
|
| 56 |
+
|
| 57 |
+
generator = torch.Generator("cuda").manual_seed(0)
|
| 58 |
+
```
|
| 59 |
+
|
| 60 |
+
Now you can generate an image:
|
| 61 |
+
|
| 62 |
+
```python
|
| 63 |
+
image = pipeline(prompt, generator=generator).images[0]
|
| 64 |
+
image
|
| 65 |
+
```
|
| 66 |
+
|
| 67 |
+
<div class="flex justify-center">
|
| 68 |
+
<img src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/stable_diffusion_101/sd_101_1.png">
|
| 69 |
+
</div>
|
| 70 |
+
|
| 71 |
+
This process took ~30 seconds on a T4 GPU (it might be faster if your allocated GPU is better than a T4). By default, the [`DiffusionPipeline`] runs inference with full `float32` precision for 50 inference steps. You can speed this up by switching to a lower precision like `float16` or running fewer inference steps.
|
| 72 |
+
|
| 73 |
+
Let's start by loading the model in `float16` and generate an image:
|
| 74 |
+
|
| 75 |
+
```python
|
| 76 |
+
import torch
|
| 77 |
+
|
| 78 |
+
pipeline = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16, use_safetensors=True)
|
| 79 |
+
pipeline = pipeline.to("cuda")
|
| 80 |
+
generator = torch.Generator("cuda").manual_seed(0)
|
| 81 |
+
image = pipeline(prompt, generator=generator).images[0]
|
| 82 |
+
image
|
| 83 |
+
```
|
| 84 |
+
|
| 85 |
+
<div class="flex justify-center">
|
| 86 |
+
<img src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/stable_diffusion_101/sd_101_2.png">
|
| 87 |
+
</div>
|
| 88 |
+
|
| 89 |
+
This time, it only took ~11 seconds to generate the image, which is almost 3x faster than before!
|
| 90 |
+
|
| 91 |
+
<Tip>
|
| 92 |
+
|
| 93 |
+
💡 We strongly suggest always running your pipelines in `float16`, and so far, we've rarely seen any degradation in output quality.
|
| 94 |
+
|
| 95 |
+
</Tip>
|
| 96 |
+
|
| 97 |
+
Another option is to reduce the number of inference steps. Choosing a more efficient scheduler could help decrease the number of steps without sacrificing output quality. You can find which schedulers are compatible with the current model in the [`DiffusionPipeline`] with the `compatibles` property:
|
| 98 |
+
|
| 99 |
+
```python
|
| 100 |
+
pipeline.scheduler.compatibles
|
| 101 |
+
[
|
| 102 |
+
diffusers.schedulers.scheduling_lms_discrete.LMSDiscreteScheduler,
|
| 103 |
+
diffusers.schedulers.scheduling_unipc_multistep.UniPCMultistepScheduler,
|
| 104 |
+
diffusers.schedulers.scheduling_k_dpm_2_discrete.KDPM2DiscreteScheduler,
|
| 105 |
+
diffusers.schedulers.scheduling_deis_multistep.DEISMultistepScheduler,
|
| 106 |
+
diffusers.schedulers.scheduling_euler_discrete.EulerDiscreteScheduler,
|
| 107 |
+
diffusers.schedulers.scheduling_dpmsolver_multistep.DPMSolverMultistepScheduler,
|
| 108 |
+
diffusers.schedulers.scheduling_ddpm.DDPMScheduler,
|
| 109 |
+
diffusers.schedulers.scheduling_dpmsolver_singlestep.DPMSolverSinglestepScheduler,
|
| 110 |
+
diffusers.schedulers.scheduling_k_dpm_2_ancestral_discrete.KDPM2AncestralDiscreteScheduler,
|
| 111 |
+
diffusers.utils.dummy_torch_and_torchsde_objects.DPMSolverSDEScheduler,
|
| 112 |
+
diffusers.schedulers.scheduling_heun_discrete.HeunDiscreteScheduler,
|
| 113 |
+
diffusers.schedulers.scheduling_pndm.PNDMScheduler,
|
| 114 |
+
diffusers.schedulers.scheduling_euler_ancestral_discrete.EulerAncestralDiscreteScheduler,
|
| 115 |
+
diffusers.schedulers.scheduling_ddim.DDIMScheduler,
|
| 116 |
+
]
|
| 117 |
+
```
|
| 118 |
+
|
| 119 |
+
The Stable Diffusion model uses the [`PNDMScheduler`] by default, which usually requires ~50 inference steps, but more performant schedulers like [`DPMSolverMultistepScheduler`] require only ~20 or 25 inference steps. Use the [`~ConfigMixin.from_config`] method to load a new scheduler:
|
| 120 |
+
|
| 121 |
+
```python
|
| 122 |
+
from diffusers import DPMSolverMultistepScheduler
|
| 123 |
+
|
| 124 |
+
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)
|
| 125 |
+
```
|
| 126 |
+
|
| 127 |
+
Now set the `num_inference_steps` to 20:
|
| 128 |
+
|
| 129 |
+
```python
|
| 130 |
+
generator = torch.Generator("cuda").manual_seed(0)
|
| 131 |
+
image = pipeline(prompt, generator=generator, num_inference_steps=20).images[0]
|
| 132 |
+
image
|
| 133 |
+
```
|
| 134 |
+
|
| 135 |
+
<div class="flex justify-center">
|
| 136 |
+
<img src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/stable_diffusion_101/sd_101_3.png">
|
| 137 |
+
</div>
|
| 138 |
+
|
| 139 |
+
Great, you've managed to cut the inference time to just 4 seconds! ⚡️
|
| 140 |
+
|
| 141 |
+
## Memory
|
| 142 |
+
|
| 143 |
+
The other key to improving pipeline performance is consuming less memory, which indirectly implies more speed, since you're often trying to maximize the number of images generated per second. The easiest way to see how many images you can generate at once is to try out different batch sizes until you get an `OutOfMemoryError` (OOM).
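That trial-and-error search can be sketched as a small helper (this is not part of diffusers; `run` stands in for your generation call, and the exception type is configurable since a real CUDA OOM typically surfaces as `torch.cuda.OutOfMemoryError`, a subclass of `RuntimeError`):

```python
def max_batch_size(run, oom_error=RuntimeError, start=1, limit=64):
    """Double the batch size until `run` raises an out-of-memory error."""
    best = 0
    batch_size = start
    while batch_size <= limit:
        try:
            run(batch_size)  # e.g. pipeline(**get_inputs(batch_size=batch_size))
            best = batch_size
            batch_size *= 2  # double until we hit the memory ceiling
        except oom_error:
            break
    return best
```

With attention slicing enabled (shown below), you would expect this probe to report a larger batch size than with the default configuration.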

Create a function that'll generate a batch of images from a list of prompts and `Generators`. Make sure to assign each `Generator` a seed so you can reuse it if it produces a good result.

```python
def get_inputs(batch_size=1):
    generator = [torch.Generator("cuda").manual_seed(i) for i in range(batch_size)]
    prompts = batch_size * [prompt]
    num_inference_steps = 20

    return {"prompt": prompts, "generator": generator, "num_inference_steps": num_inference_steps}
```

Start with `batch_size=4` and see how much memory you've consumed:

```python
from diffusers.utils import make_image_grid

images = pipeline(**get_inputs(batch_size=4)).images
make_image_grid(images, 2, 2)
```

Unless you have a GPU with more vRAM, the code above probably returned an `OOM` error! Most of the memory is taken up by the cross-attention layers. Instead of running this operation in a batch, you can run it sequentially to save a significant amount of memory. All you have to do is configure the pipeline to use the [`~DiffusionPipeline.enable_attention_slicing`] function:

```python
pipeline.enable_attention_slicing()
```
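Conceptually, attention slicing computes the attention operation over smaller query slices one at a time, so the full attention matrix is never materialized at once, at the cost of a little extra compute. A rough NumPy sketch of the idea (illustrative only, not the actual diffusers implementation):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(q, k, v):
    # Standard scaled dot-product attention over all queries at once.
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

def sliced_attention(q, k, v, slice_size=2):
    # Each slice of queries attends to all keys/values independently, so peak
    # memory scales with slice_size * len(k) instead of len(q) * len(k).
    out = np.empty((q.shape[0], v.shape[1]))
    for i in range(0, q.shape[0], slice_size):
        out[i : i + slice_size] = attention(q[i : i + slice_size], k, v)
    return out

rng = np.random.default_rng(0)
q, k, v = rng.standard_normal((3, 8, 4))
```

Because softmax is applied row-wise, the sliced result is numerically identical to the full computation.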

Now try increasing the `batch_size` to 8!

```python
images = pipeline(**get_inputs(batch_size=8)).images
make_image_grid(images, rows=2, cols=4)
```

<div class="flex justify-center">
  <img src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/stable_diffusion_101/sd_101_5.png">
</div>

Whereas before you couldn't even generate a batch of 4 images, now you can generate a batch of 8 images at ~3.5 seconds per image! This is probably the fastest you can go on a T4 GPU without sacrificing quality.

## Quality

In the last two sections, you learned how to optimize the speed of your pipeline by using `fp16`, reducing the number of inference steps by using a more performant scheduler, and enabling attention slicing to reduce memory consumption. Now you're going to focus on how to improve the quality of generated images.

### Better checkpoints

The most obvious step is to use better checkpoints. The Stable Diffusion model is a good starting point, and since its official launch, several improved versions have also been released. However, using a newer version doesn't automatically mean you'll get better results. You'll still have to experiment with different checkpoints yourself, and do a little research (such as using [negative prompts](https://minimaxir.com/2022/11/stable-diffusion-negative-prompt/)) to get the best results.

As the field grows, there are more and more high-quality checkpoints finetuned to produce certain styles. Try exploring the [Hub](https://huggingface.co/models?library=diffusers&sort=downloads) and [Diffusers Gallery](https://huggingface.co/spaces/huggingface-projects/diffusers-gallery) to find one you're interested in!

### Better pipeline components
You can also try replacing the current pipeline components with a newer version. Let's try loading the latest [autoencoder](https://huggingface.co/stabilityai/stable-diffusion-2-1/tree/main/vae) from Stability AI into the pipeline, and generate some images:

```python
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16).to("cuda")
pipeline.vae = vae
images = pipeline(**get_inputs(batch_size=8)).images
make_image_grid(images, rows=2, cols=4)
```

<div class="flex justify-center">
  <img src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/stable_diffusion_101/sd_101_6.png">
</div>

### Better prompt engineering

The text prompt you use to generate an image is super important, so much so that it is called *prompt engineering*. Some considerations to keep in mind during prompt engineering are:

- How is the image, or images similar to the one I want to generate, stored on the internet?
- What additional detail can I give that steers the model towards the style I want?

With this in mind, let's improve the prompt to include color and higher quality details:

```python
prompt += ", tribal panther make up, blue on red, side profile, looking away, serious eyes"
prompt += " 50mm portrait photography, hard rim lighting photography--beta --ar 2:3 --beta --upbeta"
```

Generate a batch of images with the new prompt:

```python
images = pipeline(**get_inputs(batch_size=8)).images
make_image_grid(images, rows=2, cols=4)
```

<div class="flex justify-center">
  <img src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/stable_diffusion_101/sd_101_7.png">
</div>

Pretty impressive! Let's tweak the second image - corresponding to the `Generator` with a seed of `1` - a bit more by adding some text about the age of the subject:

```python
prompts = [
    "portrait photo of the oldest warrior chief, tribal panther make up, blue on red, side profile, looking away, serious eyes 50mm portrait photography, hard rim lighting photography--beta --ar 2:3 --beta --upbeta",
    "portrait photo of an old warrior chief, tribal panther make up, blue on red, side profile, looking away, serious eyes 50mm portrait photography, hard rim lighting photography--beta --ar 2:3 --beta --upbeta",
    "portrait photo of a warrior chief, tribal panther make up, blue on red, side profile, looking away, serious eyes 50mm portrait photography, hard rim lighting photography--beta --ar 2:3 --beta --upbeta",
    "portrait photo of a young warrior chief, tribal panther make up, blue on red, side profile, looking away, serious eyes 50mm portrait photography, hard rim lighting photography--beta --ar 2:3 --beta --upbeta",
]

generator = [torch.Generator("cuda").manual_seed(1) for _ in range(len(prompts))]
images = pipeline(prompt=prompts, generator=generator, num_inference_steps=25).images
make_image_grid(images, 2, 2)
```

<div class="flex justify-center">
  <img src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/stable_diffusion_101/sd_101_8.png">
</div>

## Next steps

In this tutorial, you learned how to optimize a [`DiffusionPipeline`] for computational and memory efficiency as well as improving the quality of generated outputs. If you're interested in making your pipeline even faster, take a look at the following resources:

- Learn how [PyTorch 2.0](./optimization/fp16) and [`torch.compile`](https://pytorch.org/docs/stable/generated/torch.compile.html) can yield 5-300% faster inference speed. On an A100 GPU, inference can be up to 50% faster!
- If you can't use PyTorch 2, we recommend you install [xFormers](./optimization/xformers). Its memory-efficient attention mechanism works great with PyTorch 1.13.1 for faster speed and reduced memory consumption.
- Other optimization techniques, such as model offloading, are covered in [this guide](./optimization/fp16).
diffusers/docs/source/en/using-diffusers/conditional_image_generation.md
ADDED
@@ -0,0 +1,316 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Text-to-image

[[open-in-colab]]

When you think of diffusion models, text-to-image is usually one of the first things that come to mind. Text-to-image generates an image from a text description (for example, "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k") which is also known as a *prompt*.

From a very high level, a diffusion model takes a prompt and some random initial noise, and iteratively removes the noise to construct an image. The *denoising* process is guided by the prompt, and once it ends after a predetermined number of time steps, the image representation is decoded into an image.

<Tip>

Read the [How does Stable Diffusion work?](https://huggingface.co/blog/stable_diffusion#how-does-stable-diffusion-work) blog post to learn more about how a latent diffusion model works.

</Tip>

You can generate images from a prompt in 🤗 Diffusers in two steps:

1. Load a checkpoint into the [`AutoPipelineForText2Image`] class, which automatically detects the appropriate pipeline class to use based on the checkpoint:

```py
from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16"
).to("cuda")
```

2. Pass a prompt to the pipeline to generate an image:

```py
image = pipeline(
    "stained glass of darth vader, backlight, centered composition, masterpiece, photorealistic, 8k"
).images[0]
image
```

<div class="flex justify-center">
  <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/text2img-vader.png"/>
</div>

## Popular models

The most common text-to-image models are [Stable Diffusion v1.5](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5), [Stable Diffusion XL (SDXL)](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0), and [Kandinsky 2.2](https://huggingface.co/kandinsky-community/kandinsky-2-2-decoder). There are also ControlNet models or adapters that can be used with text-to-image models for more direct control in generating images. The results from each model are slightly different because of their architecture and training process, but no matter which model you choose, their usage is more or less the same. Let's use the same prompt for each model and compare their results.

### Stable Diffusion v1.5

[Stable Diffusion v1.5](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5) is a latent diffusion model initialized from [Stable Diffusion v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4), and finetuned for 595K steps on 512x512 images from the LAION-Aesthetics V2 dataset. You can use this model like:

```py
from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16"
).to("cuda")
generator = torch.Generator("cuda").manual_seed(31)
image = pipeline("Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", generator=generator).images[0]
image
```

### Stable Diffusion XL

SDXL is a much larger version of the previous Stable Diffusion models, and involves a two-stage model process that adds even more details to an image. It also includes some additional *micro-conditionings* to generate high-quality images with centered subjects. Take a look at the more comprehensive [SDXL](sdxl) guide to learn more about how to use it. In general, you can use SDXL like:

```py
from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16"
).to("cuda")
generator = torch.Generator("cuda").manual_seed(31)
image = pipeline("Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", generator=generator).images[0]
image
```

### Kandinsky 2.2

The Kandinsky model is a bit different from the Stable Diffusion models because it also uses an image prior model to create embeddings that are used to better align text and images in the diffusion model.

The easiest way to use Kandinsky 2.2 is:

```py
from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
).to("cuda")
generator = torch.Generator("cuda").manual_seed(31)
image = pipeline("Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", generator=generator).images[0]
image
```

### ControlNet

ControlNet models are auxiliary models or adapters that are finetuned on top of text-to-image models, such as [Stable Diffusion v1.5](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5). Using ControlNet models in combination with text-to-image models offers diverse options for more explicit control over how to generate an image. With ControlNet, you add an additional conditioning input image to the model. For example, if you provide an image of a human pose (usually represented as multiple keypoints that are connected into a skeleton) as a conditioning input, the model generates an image that follows the pose of the image. Check out the more in-depth [ControlNet](controlnet) guide to learn more about other conditioning inputs and how to use them.

In this example, let's condition the ControlNet with a human pose estimation image. Load the ControlNet model pretrained on human pose estimations:

```py
from diffusers import ControlNetModel, AutoPipelineForText2Image
from diffusers.utils import load_image
import torch

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_openpose", torch_dtype=torch.float16, variant="fp16"
).to("cuda")
pose_image = load_image("https://huggingface.co/lllyasviel/control_v11p_sd15_openpose/resolve/main/images/control.png")
```

Pass the `controlnet` to the [`AutoPipelineForText2Image`], and provide the prompt and pose estimation image:

```py
pipeline = AutoPipelineForText2Image.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16, variant="fp16"
).to("cuda")
generator = torch.Generator("cuda").manual_seed(31)
image = pipeline("Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", image=pose_image, generator=generator).images[0]
image
```

<div class="flex flex-row gap-4">
  <div class="flex-1">
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/text2img-1.png"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">Stable Diffusion v1.5</figcaption>
  </div>
  <div class="flex-1">
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sdxl-text2img.png"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">Stable Diffusion XL</figcaption>
  </div>
  <div class="flex-1">
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/text2img-2.png"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">Kandinsky 2.2</figcaption>
  </div>
  <div class="flex-1">
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/text2img-3.png"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">ControlNet (pose conditioning)</figcaption>
  </div>
</div>

## Configure pipeline parameters

There are a number of parameters that can be configured in the pipeline that affect how an image is generated. You can change the image's output size, specify a negative prompt to improve image quality, and more. This section dives deeper into how to use these parameters.

### Height and width

The `height` and `width` parameters control the height and width (in pixels) of the generated image. By default, the Stable Diffusion v1.5 model outputs 512x512 images, but you can change this to any size that is a multiple of 8. For example, to create a rectangular image:

```py
from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16"
).to("cuda")
image = pipeline(
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", height=768, width=512
).images[0]
image
```

<div class="flex justify-center">
  <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/text2img-hw.png"/>
</div>

<Tip warning={true}>

Other models may have different default image sizes depending on the image sizes in the training dataset. For example, SDXL's default image size is 1024x1024 and using lower `height` and `width` values may result in lower quality images. Make sure you check the model's API reference first!

</Tip>
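Since sizes have to be multiples of 8, arbitrary target dimensions need to be snapped to the nearest valid value first. A small hypothetical helper (not part of diffusers) that rounds a requested size down to a valid one:

```python
def snap_to_multiple(value, base=8):
    """Round `value` down to the nearest multiple of `base` (at least one `base`)."""
    return max(base, (value // base) * base)

# e.g. a requested 771x515 image becomes 768x512
height, width = snap_to_multiple(771), snap_to_multiple(515)
```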

### Guidance scale

The `guidance_scale` parameter affects how much the prompt influences image generation. A lower value gives the model "creativity" to generate images that are more loosely related to the prompt. Higher `guidance_scale` values push the model to follow the prompt more closely, and if this value is too high, you may observe some artifacts in the generated image.

```py
from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
image = pipeline(
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", guidance_scale=3.5
).images[0]
image
```

<div class="flex flex-row gap-4">
  <div class="flex-1">
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/text2img-guidance-scale-2.5.png"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">guidance_scale = 2.5</figcaption>
  </div>
  <div class="flex-1">
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/text2img-guidance-scale-7.5.png"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">guidance_scale = 7.5</figcaption>
  </div>
  <div class="flex-1">
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/text2img-guidance-scale-10.5.png"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">guidance_scale = 10.5</figcaption>
  </div>
</div>
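Under the hood, classifier-free guidance forms each denoising step's noise prediction as a linear combination of an unconditional prediction and a prompt-conditioned one, with `guidance_scale` as the weight. A schematic sketch with scalars standing in for the real tensors (an assumption-level simplification of the actual pipeline internals):

```python
def apply_guidance(noise_uncond, noise_cond, guidance_scale):
    # guidance_scale = 1.0 returns the conditional prediction unchanged;
    # larger values extrapolate further toward the prompt.
    return noise_uncond + guidance_scale * (noise_cond - noise_uncond)
```

With `guidance_scale=0` the prompt is ignored entirely, which matches the intuition above that lower values give the model more freedom.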

### Negative prompt

Just like how a prompt guides generation, a *negative prompt* steers the model away from things you don't want the model to generate. This is commonly used to improve overall image quality by removing poor or bad image features such as "low resolution" or "bad details". You can also use a negative prompt to remove or modify the content and style of an image.

```py
from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
image = pipeline(
    prompt="Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    negative_prompt="ugly, deformed, disfigured, poor details, bad anatomy",
).images[0]
image
```

<div class="flex flex-row gap-4">
  <div class="flex-1">
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/text2img-neg-prompt-1.png"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">negative_prompt = "ugly, deformed, disfigured, poor details, bad anatomy"</figcaption>
  </div>
  <div class="flex-1">
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/text2img-neg-prompt-2.png"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">negative_prompt = "astronaut"</figcaption>
  </div>
</div>

### Generator

A [`torch.Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html#generator) object enables reproducibility in a pipeline by setting a manual seed. You can use a `Generator` to generate batches of images and iteratively improve on an image generated from a seed as detailed in the [Improve image quality with deterministic generation](reusing_seeds) guide.

You can set a seed and `Generator` as shown below. Creating an image with a `Generator` should return the same result each time instead of randomly generating a new image.

```py
from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
generator = torch.Generator(device="cuda").manual_seed(30)
image = pipeline(
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    generator=generator,
).images[0]
image
```
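The reproducibility comes purely from seeding: any seeded pseudo-random generator replays the same sequence for the same seed, and the pipeline draws its initial noise from the seeded `torch.Generator`. The same principle, shown with Python's stdlib generator for illustration:

```python
import random

# Two generators with the same seed produce identical sequences...
a, b = random.Random(30), random.Random(30)
run_a = [a.random() for _ in range(3)]
run_b = [b.random() for _ in range(3)]
assert run_a == run_b

# ...while a different seed produces a different sequence.
run_c = [random.Random(31).random() for _ in range(3)]
```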

## Control image generation

There are several ways to exert more control over how an image is generated outside of configuring a pipeline's parameters, such as prompt weighting and ControlNet models.

### Prompt weighting

Prompt weighting is a technique for increasing or decreasing the importance of concepts in a prompt to emphasize or minimize certain features in an image. We recommend using the [Compel](https://github.com/damian0815/compel) library to help you generate the weighted prompt embeddings.

<Tip>

Learn how to create the prompt embeddings in the [Prompt weighting](weighted_prompts) guide. This example focuses on how to use the prompt embeddings in the pipeline.

</Tip>

Once you've created the embeddings, you can pass them to the `prompt_embeds` (and `negative_prompt_embeds` if you're using a negative prompt) parameter in the pipeline.

```py
from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
image = pipeline(
    prompt_embeds=prompt_embeds,  # generated from Compel
    negative_prompt_embeds=negative_prompt_embeds,  # generated from Compel
).images[0]
```

### ControlNet

As you saw in the [ControlNet](#controlnet) section, these models offer a more flexible and accurate way to generate images by incorporating an additional conditioning image input. Each ControlNet model is pretrained on a particular type of conditioning image to generate new images that resemble it. For example, if you take a ControlNet model pretrained on depth maps, you can give the model a depth map as a conditioning input and it'll generate an image that preserves the spatial information in it. This is quicker and easier than specifying the depth information in a prompt. You can even combine multiple conditioning inputs with a [MultiControlNet](controlnet#multicontrolnet)!

There are many types of conditioning inputs you can use, and 🤗 Diffusers supports ControlNet for Stable Diffusion and SDXL models. Take a look at the more comprehensive [ControlNet](controlnet) guide to learn how you can use these models.

## Optimize

Diffusion models are large, and the iterative nature of denoising an image is computationally intensive. But this doesn't mean you need access to powerful - or even many - GPUs to use them. There are many optimization techniques for running diffusion models on consumer and free-tier resources. For example, you can load model weights in half-precision to save GPU memory and increase speed, or offload model components to the CPU and move them to the GPU only when needed to save even more memory.

PyTorch 2.0 also supports a more memory-efficient attention mechanism called [*scaled dot product attention*](../optimization/fp16#scaled-dot-product-attention) that is automatically enabled if you're using PyTorch 2.0. You can combine this with [`torch.compile`](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) to speed your code up even more:

```py
from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16").to("cuda")
pipeline.unet = torch.compile(pipeline.unet, mode="reduce-overhead", fullgraph=True)
```

For more tips on how to optimize your code to save memory and speed up inference, read the [Accelerate inference](../optimization/fp16) and [Reduce memory usage](../optimization/memory) guides.
diffusers/docs/source/en/using-diffusers/consisid.md
ADDED
@@ -0,0 +1,96 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# ConsisID

[ConsisID](https://github.com/PKU-YuanGroup/ConsisID) is an identity-preserving text-to-video generation model that keeps the face consistent in the generated video through frequency decomposition. The main features of ConsisID are:

- Frequency decomposition: The characteristics of the DiT architecture are analyzed from a frequency-domain perspective, and based on these characteristics, a control-information injection method is designed.
- Consistency training strategy: A coarse-to-fine training strategy, a dynamic masking loss, and a dynamic cross-face loss further enhance the model's generalization ability and identity-preservation performance.
- Inference without finetuning: Previous methods required case-by-case finetuning on the input identity before inference, leading to significant time and computational costs. In contrast, ConsisID is tuning-free.

This guide will walk you through using ConsisID for identity-preserving text-to-video generation.

## Load Model Checkpoints

Model weights may be stored in separate subfolders on the Hub or locally, in which case you should use the [`~DiffusionPipeline.from_pretrained`] method.

```python
# !pip install consisid_eva_clip insightface facexlib
import torch
from diffusers import ConsisIDPipeline
from diffusers.pipelines.consisid.consisid_utils import prepare_face_models, process_face_embeddings_infer
from huggingface_hub import snapshot_download

# Download checkpoints
snapshot_download(repo_id="BestWishYsh/ConsisID-preview", local_dir="BestWishYsh/ConsisID-preview")

# Load the face helper models used to preprocess the input face image
face_helper_1, face_helper_2, face_clip_model, face_main_model, eva_transform_mean, eva_transform_std = prepare_face_models("BestWishYsh/ConsisID-preview", device="cuda", dtype=torch.bfloat16)

# Load the ConsisID base model
pipe = ConsisIDPipeline.from_pretrained("BestWishYsh/ConsisID-preview", torch_dtype=torch.bfloat16)
pipe.to("cuda")
```

## Identity-Preserving Text-to-Video

For identity-preserving text-to-video, pass a text prompt and an image containing a clear face (preferably half-body or full-body). By default, ConsisID generates a 720x480 video for the best results.

```python
from diffusers.utils import export_to_video

prompt = "The video captures a boy walking along a city street, filmed in black and white on a classic 35mm camera. His expression is thoughtful, his brow slightly furrowed as if he's lost in contemplation. The film grain adds a textured, timeless quality to the image, evoking a sense of nostalgia. Around him, the cityscape is filled with vintage buildings, cobblestone sidewalks, and softly blurred figures passing by, their outlines faint and indistinct. Streetlights cast a gentle glow, while shadows play across the boy's path, adding depth to the scene. The lighting highlights the boy's subtle smile, hinting at a fleeting moment of curiosity. The overall cinematic atmosphere, complete with classic film still aesthetics and dramatic contrasts, gives the scene an evocative and introspective feel."
image = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/consisid/consisid_input.png?download=true"

id_cond, id_vit_hidden, image, face_kps = process_face_embeddings_infer(face_helper_1, face_clip_model, face_helper_2, eva_transform_mean, eva_transform_std, face_main_model, "cuda", torch.bfloat16, image, is_align_face=True)

video = pipe(image=image, prompt=prompt, num_inference_steps=50, guidance_scale=6.0, use_dynamic_cfg=False, id_vit_hidden=id_vit_hidden, id_cond=id_cond, kps_cond=face_kps, generator=torch.Generator("cuda").manual_seed(42))
export_to_video(video.frames[0], "output.mp4", fps=8)
```
<table>
<tr>
<th style="text-align: center;">Face Image</th>
<th style="text-align: center;">Video</th>
<th style="text-align: center;">Description</th>
</tr>
<tr>
<td><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/consisid/consisid_image_0.png?download=true" style="height: auto; width: 600px;"></td>
<td><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/consisid/consisid_output_0.gif?download=true" style="height: auto; width: 2000px;"></td>
<td>The video, in a beautifully crafted animated style, features a confident woman riding a horse through a lush forest clearing. Her expression is focused yet serene as she adjusts her wide-brimmed hat with a practiced hand. She wears a flowy bohemian dress, which moves gracefully with the rhythm of the horse, the fabric flowing fluidly in the animated motion. The dappled sunlight filters through the trees, casting soft, painterly patterns on the forest floor. Her posture is poised, showing both control and elegance as she guides the horse with ease. The animation's gentle, fluid style adds a dreamlike quality to the scene, with the woman’s calm demeanor and the peaceful surroundings evoking a sense of freedom and harmony.</td>
</tr>
<tr>
<td><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/consisid/consisid_image_1.png?download=true" style="height: auto; width: 600px;"></td>
<td><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/consisid/consisid_output_1.gif?download=true" style="height: auto; width: 2000px;"></td>
<td>The video, in a captivating animated style, shows a woman standing in the center of a snowy forest, her eyes narrowed in concentration as she extends her hand forward. She is dressed in a deep blue cloak, her breath visible in the cold air, which is rendered with soft, ethereal strokes. A faint smile plays on her lips as she summons a wisp of ice magic, watching with focus as the surrounding trees and ground begin to shimmer and freeze, covered in delicate ice crystals. The animation’s fluid motion brings the magic to life, with the frost spreading outward in intricate, sparkling patterns. The environment is painted with soft, watercolor-like hues, enhancing the magical, dreamlike atmosphere. The overall mood is serene yet powerful, with the quiet winter air amplifying the delicate beauty of the frozen scene.</td>
</tr>
<tr>
<td><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/consisid/consisid_image_2.png?download=true" style="height: auto; width: 600px;"></td>
<td><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/consisid/consisid_output_2.gif?download=true" style="height: auto; width: 2000px;"></td>
<td>The animation features a whimsical portrait of a balloon seller standing in a gentle breeze, captured with soft, hazy brushstrokes that evoke the feel of a serene spring day. His face is framed by a gentle smile, his eyes squinting slightly against the sun, while a few wisps of hair flutter in the wind. He is dressed in a light, pastel-colored shirt, and the balloons around him sway with the wind, adding a sense of playfulness to the scene. The background blurs softly, with hints of a vibrant market or park, enhancing the light-hearted, yet tender mood of the moment.</td>
</tr>
<tr>
<td><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/consisid/consisid_image_3.png?download=true" style="height: auto; width: 600px;"></td>
<td><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/consisid/consisid_output_3.gif?download=true" style="height: auto; width: 2000px;"></td>
<td>The video captures a boy walking along a city street, filmed in black and white on a classic 35mm camera. His expression is thoughtful, his brow slightly furrowed as if he's lost in contemplation. The film grain adds a textured, timeless quality to the image, evoking a sense of nostalgia. Around him, the cityscape is filled with vintage buildings, cobblestone sidewalks, and softly blurred figures passing by, their outlines faint and indistinct. Streetlights cast a gentle glow, while shadows play across the boy's path, adding depth to the scene. The lighting highlights the boy's subtle smile, hinting at a fleeting moment of curiosity. The overall cinematic atmosphere, complete with classic film still aesthetics and dramatic contrasts, gives the scene an evocative and introspective feel.</td>
</tr>
<tr>
<td><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/consisid/consisid_image_4.png?download=true" style="height: auto; width: 600px;"></td>
<td><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/consisid/consisid_output_4.gif?download=true" style="height: auto; width: 2000px;"></td>
<td>The video features a baby wearing a bright superhero cape, standing confidently with arms raised in a powerful pose. The baby has a determined look on their face, with eyes wide and lips pursed in concentration, as if ready to take on a challenge. The setting appears playful, with colorful toys scattered around and a soft rug underfoot, while sunlight streams through a nearby window, highlighting the fluttering cape and adding to the impression of heroism. The overall atmosphere is lighthearted and fun, with the baby's expressions capturing a mix of innocence and an adorable attempt at bravery, as if truly ready to save the day.</td>
</tr>
</table>

## Resources

Learn more about ConsisID with the following resources.
- A [video](https://www.youtube.com/watch?v=PhlgC-bI5SQ) demonstrating ConsisID's main features.
- The research paper, [Identity-Preserving Text-to-Video Generation by Frequency Decomposition](https://hf.co/papers/2411.17440), for more details.
diffusers/docs/source/en/using-diffusers/controlling_generation.md
ADDED
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Controlled generation

Controlling the outputs generated by diffusion models has long been pursued by the community and is now an active research topic. In many popular diffusion models, subtle changes in the inputs, both images and text prompts, can drastically change the outputs. In an ideal world, we want to be able to control how semantics are preserved and changed.

Most examples of preserving semantics reduce to being able to accurately map a change in input to a change in output. For example, adding an adjective to a subject in a prompt should preserve the entire image, modifying only the changed subject. Or, image variation of a particular subject should preserve the subject's pose.

Additionally, there are qualities of generated images that we would like to influence beyond semantic preservation. In general, we would like our outputs to be of good quality, adhere to a particular style, or be realistic.

We will document some of the techniques `diffusers` supports to control generation of diffusion models. Much of this is cutting-edge research and can be quite nuanced. If something needs clarifying or you have a suggestion, don't hesitate to open a discussion on the [forum](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63) or a [GitHub issue](https://github.com/huggingface/diffusers/issues).

We provide a high-level explanation of how generation can be controlled as well as a snippet of the technical details. For more in-depth explanations of the technical details, the original papers linked from the pipelines are always the best resources.

Depending on the use case, one should choose a technique accordingly. In many cases, these techniques can be combined. For example, one can combine Textual Inversion with SEGA to provide more semantic guidance to the outputs generated using Textual Inversion.

Unless otherwise mentioned, these are techniques that work with existing models and don't require their own weights.

1. [InstructPix2Pix](#instruct-pix2pix)
2. [Pix2Pix Zero](#pix2pix-zero)
3. [Attend and Excite](#attend-and-excite)
4. [Semantic Guidance](#semantic-guidance-sega)
5. [Self-attention Guidance](#self-attention-guidance-sag)
6. [Depth2Image](#depth2image)
7. [MultiDiffusion Panorama](#multidiffusion-panorama)
8. [DreamBooth](#dreambooth)
9. [Textual Inversion](#textual-inversion)
10. [ControlNet](#controlnet)
11. [Prompt Weighting](#prompt-weighting)
12. [Custom Diffusion](#custom-diffusion)
13. [Model Editing](#model-editing)
14. [DiffEdit](#diffedit)
15. [T2I-Adapter](#t2i-adapter)
16. [FABRIC](#fabric)

For convenience, we provide a table to denote which methods are inference-only and which require fine-tuning/training.

| **Method** | **Inference only** | **Requires training /<br> fine-tuning** | **Comments** |
| :-------------------------------------------------: | :----------------: | :-------------------------------------: | :---------------------------------------------------------------------------------------------: |
| [InstructPix2Pix](#instruct-pix2pix) | ✅ | ❌ | Can additionally be<br>fine-tuned for better <br>performance on specific <br>edit instructions. |
| [Pix2Pix Zero](#pix2pix-zero) | ✅ | ❌ | |
| [Attend and Excite](#attend-and-excite) | ✅ | ❌ | |
| [Semantic Guidance](#semantic-guidance-sega) | ✅ | ❌ | |
| [Self-attention Guidance](#self-attention-guidance-sag) | ✅ | ❌ | |
| [Depth2Image](#depth2image) | ✅ | ❌ | |
| [MultiDiffusion Panorama](#multidiffusion-panorama) | ✅ | ❌ | |
| [DreamBooth](#dreambooth) | ❌ | ✅ | |
| [Textual Inversion](#textual-inversion) | ❌ | ✅ | |
| [ControlNet](#controlnet) | ✅ | ❌ | A ControlNet can be <br>trained/fine-tuned on<br>a custom conditioning. |
| [Prompt Weighting](#prompt-weighting) | ✅ | ❌ | |
| [Custom Diffusion](#custom-diffusion) | ❌ | ✅ | |
| [Model Editing](#model-editing) | ✅ | ❌ | |
| [DiffEdit](#diffedit) | ✅ | ❌ | |
| [T2I-Adapter](#t2i-adapter) | ✅ | ❌ | |
| [Fabric](#fabric) | ✅ | ❌ | |

## InstructPix2Pix

[Paper](https://huggingface.co/papers/2211.09800)

[InstructPix2Pix](../api/pipelines/pix2pix) is fine-tuned from Stable Diffusion to support editing input images. It takes as inputs an image and a prompt describing an edit, and it outputs the edited image.
InstructPix2Pix has been explicitly trained to work well with [InstructGPT](https://openai.com/blog/instruction-following/)-like prompts.
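At inference time, the InstructPix2Pix paper combines three noise predictions with separate guidance scales for the input image and the edit instruction. A minimal NumPy sketch of that combination; the function name and toy arrays are illustrative, not part of the diffusers API:

```python
import numpy as np

def instruct_pix2pix_guidance(eps_uncond, eps_img, eps_full, image_scale=1.5, text_scale=7.5):
    """Combine the three noise predictions used by InstructPix2Pix-style
    classifier-free guidance: eps_uncond has neither conditioning, eps_img is
    conditioned on the input image only, and eps_full on image + edit prompt."""
    return (
        eps_uncond
        + image_scale * (eps_img - eps_uncond)
        + text_scale * (eps_full - eps_img)
    )

# Toy noise predictions for a 2x2 "latent"
eps_uncond = np.zeros((2, 2))
eps_img = np.ones((2, 2))
eps_full = 2 * np.ones((2, 2))

# With both scales at 1.0 the combination collapses to the fully conditioned estimate
guided = instruct_pix2pix_guidance(eps_uncond, eps_img, eps_full, image_scale=1.0, text_scale=1.0)
```

Raising `text_scale` makes the output follow the edit instruction more aggressively, while `image_scale` pulls the result back toward the input image.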

## Pix2Pix Zero

[Paper](https://huggingface.co/papers/2302.03027)

[Pix2Pix Zero](../api/pipelines/pix2pix_zero) allows modifying an image so that one concept or subject is translated to another while preserving the general image semantics.

The denoising process is guided from one conceptual embedding towards another. The intermediate latents are optimized during the denoising process to push the attention maps towards reference attention maps. The reference attention maps come from the denoising process of the input image and are used to encourage semantic preservation.

Pix2Pix Zero can be used to edit both synthetic and real images.

- To edit synthetic images, one first generates an image given a caption.
Next, we generate image captions for the concept that shall be edited and for the new target concept. We can use a model like [Flan-T5](https://huggingface.co/docs/transformers/model_doc/flan-t5) for this purpose. Then, "mean" prompt embeddings for both the source and target concepts are created via the text encoder. Finally, the pix2pix-zero algorithm is used to edit the synthetic image.
- To edit a real image, one first generates an image caption using a model like [BLIP](https://huggingface.co/docs/transformers/model_doc/blip). Then one applies DDIM inversion on the prompt and image to generate "inverse" latents. As before, "mean" prompt embeddings for both source and target concepts are created, and finally the pix2pix-zero algorithm, in combination with the "inverse" latents, is used to edit the image.

<Tip>

Pix2Pix Zero is the first model that allows "zero-shot" image editing. This means the model
can edit an image in less than a minute on a consumer GPU, as shown [here](../api/pipelines/pix2pix_zero#usage-example).

</Tip>

As mentioned above, Pix2Pix Zero optimizes the latents (and not any of the UNet, VAE, or the text encoder) to steer the generation toward a specific concept. This means that the overall
pipeline might require more memory than a standard [StableDiffusionPipeline](../api/pipelines/stable_diffusion/text2img).

<Tip>

An important distinction between methods like InstructPix2Pix and Pix2Pix Zero is that the former
involves fine-tuning the pre-trained weights while the latter does not. This means that you can
apply Pix2Pix Zero to any of the available Stable Diffusion models.

</Tip>

## Attend and Excite

[Paper](https://huggingface.co/papers/2301.13826)

[Attend and Excite](../api/pipelines/attend_and_excite) allows subjects in the prompt to be faithfully represented in the final image.

A set of token indices is given as input, corresponding to the subjects in the prompt that need to be present in the image. During denoising, each token index is guaranteed to have a minimum attention threshold for at least one patch of the image. The intermediate latents are iteratively optimized during the denoising process to strengthen the attention of the most neglected subject token until the attention threshold is passed for all subject tokens.

Like Pix2Pix Zero, Attend and Excite also involves a mini optimization loop (leaving the pre-trained weights untouched) in its pipeline and can require more memory than the usual [StableDiffusionPipeline](../api/pipelines/stable_diffusion/text2img).
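The optimization objective can be sketched numerically: for each subject token, take its strongest attention patch, then penalize the most neglected token. A toy NumPy version; the function, shapes, and random maps are illustrative, not the pipeline's internals:

```python
import numpy as np

def neglect_loss(attention_maps, subject_tokens):
    """attention_maps: (num_tokens, H, W) cross-attention maps.
    Attend-and-Excite style loss: take each subject token's strongest patch
    activation, and penalize the most neglected (weakest) subject token."""
    per_token_max = np.array([attention_maps[t].max() for t in subject_tokens])
    return float(1.0 - per_token_max.min())

rng = np.random.default_rng(0)
maps = rng.uniform(0.0, 1.0, size=(5, 4, 4))
maps[2] *= 0.1  # token 2 is "neglected": weak attention everywhere

loss_with_neglect = neglect_loss(maps, subject_tokens=[1, 2])
loss_without = neglect_loss(maps, subject_tokens=[1])
# The neglected token dominates the loss; the pipeline backpropagates this
# loss to the latents to strengthen that token's attention
```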

## Semantic Guidance (SEGA)

[Paper](https://huggingface.co/papers/2301.12247)

[SEGA](../api/pipelines/semantic_stable_diffusion) allows applying or removing one or more concepts from an image. The strength of the concept can also be controlled. For example, the smile concept can be used to incrementally increase or decrease the smile of a portrait.

Similar to how classifier-free guidance provides guidance via empty prompt inputs, SEGA provides guidance on conceptual prompts. Multiple of these conceptual prompts can be applied simultaneously. Each conceptual prompt can either add or remove its concept depending on whether the guidance is applied positively or negatively.

Unlike Pix2Pix Zero or Attend and Excite, SEGA directly interacts with the diffusion process instead of performing any explicit gradient-based optimization.
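The concept arithmetic can be sketched as follows: start from classifier-free guidance on the main prompt, then push the noise estimate along (or against) each concept direction. A simplified NumPy illustration that omits SEGA's warm-up and per-element thresholding heuristics; names and toy values are illustrative:

```python
import numpy as np

def sega_combine(eps_uncond, eps_text, concept_eps, signs, guidance_scale=7.5, edit_scale=5.0):
    """Sketch of SEGA-style guidance: classifier-free guidance on the main
    prompt, plus one direction per concept. signs[i] = +1 adds concept i,
    -1 removes it."""
    guided = eps_uncond + guidance_scale * (eps_text - eps_uncond)
    for eps_c, sign in zip(concept_eps, signs):
        guided = guided + sign * edit_scale * (eps_c - eps_uncond)
    return guided

eps_uncond = np.zeros((2, 2))
eps_text = np.ones((2, 2))
smile = np.full((2, 2), 0.5)  # noise estimate for the concept prompt "smile"

add_smile = sega_combine(eps_uncond, eps_text, [smile], [+1], guidance_scale=1.0, edit_scale=1.0)
remove_smile = sega_combine(eps_uncond, eps_text, [smile], [-1], guidance_scale=1.0, edit_scale=1.0)
```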

## Self-attention Guidance (SAG)

[Paper](https://huggingface.co/papers/2210.00939)

[Self-attention Guidance](../api/pipelines/self_attention_guidance) improves the general quality of images.

SAG provides guidance from predictions that are not conditioned on high-frequency details towards fully conditioned images. The high-frequency details are extracted from the UNet self-attention maps.
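The guidance rule can be sketched as pushing the prediction away from what the model would produce for a degraded, high-frequency-removed input. A 1D NumPy toy in which a box blur stands in for the Gaussian blur SAG applies to attention-masked regions; names are illustrative:

```python
import numpy as np

def box_blur_1d(x, k=3):
    """Cheap stand-in for the blur that removes high-frequency detail."""
    kernel = np.ones(k) / k
    return np.convolve(x, kernel, mode="same")

def sag_guidance(eps_cond, eps_degraded, sag_scale=0.75):
    """SAG-style guidance: move the prediction away from the degraded estimate,
    which amplifies the high-frequency content the blur removed."""
    return eps_cond + sag_scale * (eps_cond - eps_degraded)

signal = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])  # alternating = pure high frequency
degraded = box_blur_1d(signal)
guided = sag_guidance(signal, degraded, sag_scale=1.0)
```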

## Depth2Image

[Project](https://huggingface.co/stabilityai/stable-diffusion-2-depth)

[Depth2Image](../api/pipelines/stable_diffusion/depth2img) is fine-tuned from Stable Diffusion to better preserve semantics for text-guided image variation.

It conditions on a monocular depth estimate of the original image.

## MultiDiffusion Panorama

[Paper](https://huggingface.co/papers/2302.08113)

[MultiDiffusion Panorama](../api/pipelines/panorama) defines a new generation process over a pre-trained diffusion model. This process binds together multiple diffusion generation methods that can be readily applied to generate high-quality and diverse images. Results adhere to user-provided controls, such as a desired aspect ratio (e.g., panorama) and spatial guiding signals, ranging from tight segmentation masks to bounding boxes.
MultiDiffusion Panorama makes it possible to generate high-quality images at arbitrary aspect ratios (e.g., panoramas).
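The core fusion idea can be sketched in one dimension: denoise overlapping windows of a wide latent independently, then average the per-pixel results wherever windows overlap. A toy NumPy version; the real method applies this to 2D latent windows at every denoising step, and the stub denoiser here is purely illustrative:

```python
import numpy as np

def fuse_windows(latent_width, window=8, stride=4, denoise=lambda w: w + 1.0):
    """MultiDiffusion-style fusion on a 1D latent: run the (stub) denoiser on
    each overlapping window, accumulate the results, and divide by how many
    windows covered each position."""
    latent = np.zeros(latent_width)
    acc = np.zeros(latent_width)
    counts = np.zeros(latent_width)
    for start in range(0, latent_width - window + 1, stride):
        acc[start:start + window] += denoise(latent[start:start + window])
        counts[start:start + window] += 1
    return acc / counts

fused = fuse_windows(16)
# Every position is covered by at least one window; since the stub denoiser
# agrees everywhere, averaging reproduces its output exactly
```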

## Fine-tuning your own models

In addition to pre-trained models, Diffusers has training scripts for fine-tuning models on user-provided data.

## DreamBooth

[Project](https://dreambooth.github.io/)

[DreamBooth](../training/dreambooth) fine-tunes a model to teach it about a new subject. For example, a few pictures of a person can be used to generate images of that person in different styles.

## Textual Inversion

[Paper](https://huggingface.co/papers/2208.01618)

[Textual Inversion](../training/text_inversion) fine-tunes a model to teach it about a new concept. For example, a few pictures of a style of artwork can be used to generate images in that style.

## ControlNet

[Paper](https://huggingface.co/papers/2302.05543)

[ControlNet](../api/pipelines/controlnet) is an auxiliary network which adds an extra condition.
There are 8 canonical pre-trained ControlNets trained on different conditionings such as edge detection, scribbles,
depth maps, and semantic segmentations.

## Prompt Weighting

[Prompt weighting](../using-diffusers/weighted_prompts) is a simple technique that puts more attention weight on certain parts of the text
input.
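Conceptually, this means scaling the embeddings of selected tokens before they reach the cross-attention layers. A toy NumPy sketch; real implementations such as the compel library handle prompt parsing and normalization differently, and the helper below is illustrative:

```python
import numpy as np

def weight_prompt_embeddings(token_embeddings, weights):
    """Toy prompt weighting: scale each token's embedding so weighted tokens
    pull harder on the image, then rescale so the mean embedding norm matches
    the original prompt's (keeping the overall magnitude stable)."""
    weighted = token_embeddings * np.asarray(weights)[:, None]
    original_norm = np.linalg.norm(token_embeddings, axis=1).mean()
    new_norm = np.linalg.norm(weighted, axis=1).mean()
    return weighted * (original_norm / new_norm)

rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))    # 4 tokens, 8-dim embeddings
weights = [1.0, 1.0, 1.3, 1.0]   # emphasize the third token, e.g. "(red:1.3)"

weighted = weight_prompt_embeddings(emb, weights)
```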

## Custom Diffusion

[Paper](https://huggingface.co/papers/2212.04488)

[Custom Diffusion](../training/custom_diffusion) only fine-tunes the cross-attention maps of a pre-trained
text-to-image diffusion model. It also allows for additionally performing Textual Inversion. It supports
multi-concept training by design. Like DreamBooth and Textual Inversion, Custom Diffusion is also used to
teach a pre-trained text-to-image diffusion model about new concepts to generate outputs involving the
concept(s) of interest.

## Model Editing

[Paper](https://huggingface.co/papers/2303.08084)

The [text-to-image model editing pipeline](../api/pipelines/model_editing) helps you mitigate some of the incorrect implicit assumptions a pre-trained text-to-image
diffusion model might make about the subjects present in the input prompt. For example, if you prompt Stable Diffusion to generate images for "A pack of roses", the roses in the generated images
are more likely to be red. This pipeline helps you change that assumption.

## DiffEdit

[Paper](https://huggingface.co/papers/2210.11427)

[DiffEdit](../api/pipelines/diffedit) allows for semantic editing of input images along with
input prompts while preserving the original input images as much as possible.

## T2I-Adapter

[Paper](https://huggingface.co/papers/2302.08453)

[T2I-Adapter](../api/pipelines/stable_diffusion/adapter) is an auxiliary network which adds an extra condition.
There are 8 canonical pre-trained adapters trained on different conditionings such as edge detection, sketch,
depth maps, and semantic segmentations.

## Fabric

[Paper](https://huggingface.co/papers/2307.10159)

[Fabric](https://github.com/huggingface/diffusers/tree/442017ccc877279bcf24fbe92f92d3d0def191b6/examples/community#stable-diffusion-fabric-pipeline) is a training-free
approach applicable to a wide range of popular diffusion models, which exploits
the self-attention layer present in the most widely used architectures to condition
the diffusion process on a set of feedback images.
diffusers/docs/source/en/using-diffusers/depth2img.md
ADDED
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Text-guided depth-to-image generation

[[open-in-colab]]

The [`StableDiffusionDepth2ImgPipeline`] lets you pass a text prompt and an initial image to condition the generation of new images. In addition, you can also pass a `depth_map` to preserve the image structure. If no `depth_map` is provided, the pipeline automatically predicts the depth via an integrated [depth-estimation model](https://github.com/isl-org/MiDaS).

Start by creating an instance of the [`StableDiffusionDepth2ImgPipeline`]:

```python
import torch
from diffusers import StableDiffusionDepth2ImgPipeline
from diffusers.utils import load_image, make_image_grid

pipeline = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth",
    torch_dtype=torch.float16,
    use_safetensors=True,
).to("cuda")
```

Now pass your prompt to the pipeline. You can also pass a `negative_prompt` to prevent certain words from guiding how an image is generated:

```python
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
init_image = load_image(url)
prompt = "two tigers"
negative_prompt = "bad, deformed, ugly, bad anatomy"
image = pipeline(prompt=prompt, image=init_image, negative_prompt=negative_prompt, strength=0.7).images[0]
make_image_grid([init_image, image], rows=1, cols=2)
```
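The `strength` argument controls how much noise is added to the initial image before denoising starts, which in turn determines how many of the scheduler's steps are actually run. A sketch of the mapping commonly used by img2img-style pipelines; this is a simplification of the pipeline internals, shown for illustration:

```python
def steps_from_strength(num_inference_steps, strength):
    """How img2img-style pipelines typically derive the number of denoising
    steps actually executed: strength decides how far into the noise schedule
    the initial image is pushed before denoising begins."""
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return num_inference_steps - t_start  # steps actually executed

# strength=0.7 with a 50-step schedule runs 35 denoising steps;
# strength=1.0 adds full noise, ignoring the initial image structure entirely
assert steps_from_strength(50, 0.7) == 35
assert steps_from_strength(50, 1.0) == 50
```

Lower `strength` values keep more of the initial image's structure at the cost of weaker prompt adherence.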

| Input | Output |
|-------|--------|
| <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/coco-cats.png" width="500"/> | <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/depth2img-tigers.png" width="500"/> |
diffusers/docs/source/en/using-diffusers/ip_adapter.md
ADDED
|
@@ -0,0 +1,790 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# IP-Adapter

[IP-Adapter](https://huggingface.co/papers/2308.06721) is a lightweight adapter designed to integrate image-based guidance with text-to-image diffusion models. The adapter uses an image encoder to extract image features that are passed to newly added cross-attention layers in the UNet and fine-tuned. The original UNet model and the existing cross-attention layers corresponding to text features are frozen. Decoupling the cross-attention for image and text features enables more fine-grained and controllable generation.
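
The decoupled cross-attention above can be sketched in a few lines of NumPy: the query attends over the text features and the image features separately, and the image branch is weighted by the IP-Adapter scale before the two outputs are summed. This is a conceptual sketch with made-up shapes, not the actual `IPAdapterAttnProcessor` implementation in diffusers.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention
    return softmax(q @ k.T / np.sqrt(k.shape[-1])) @ v

def decoupled_cross_attention(q, text_k, text_v, image_k, image_v, scale=0.8):
    # text branch: frozen cross-attention weights
    # image branch: newly added layers, weighted by the IP-Adapter scale
    return attention(q, text_k, text_v) + scale * attention(q, image_k, image_v)

rng = np.random.default_rng(0)
q = rng.standard_normal((16, 64))                   # queries from UNet hidden states
text_k, text_v = rng.standard_normal((2, 77, 64))   # text token features
image_k, image_v = rng.standard_normal((2, 4, 64))  # image features
out = decoupled_cross_attention(q, text_k, text_v, image_k, image_v)
print(out.shape)  # (16, 64)
```

Setting `scale=0.0` zeroes the image branch, which is why a scale of `0` recovers the text-only behavior of the base model.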

IP-Adapter files are typically ~100MB because they only contain the image embeddings. This means you need to load a model first, and then load the IP-Adapter with [`~loaders.IPAdapterMixin.load_ip_adapter`].

> [!TIP]
> IP-Adapters are available for many models, such as [Flux](../api/pipelines/flux#ip-adapter) and [Stable Diffusion 3](../api/pipelines/stable_diffusion/stable_diffusion_3). The examples in this guide use Stable Diffusion and Stable Diffusion XL.

Use the [`~loaders.IPAdapterMixin.set_ip_adapter_scale`] method to scale the influence of the IP-Adapter during generation. A value of `1.0` means the model is only conditioned on the image prompt, and `0.5` typically produces balanced results between the text and image prompts.

```py
import torch
from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16
).to("cuda")
pipeline.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="sdxl_models",
    weight_name="ip-adapter_sdxl.bin"
)
pipeline.set_ip_adapter_scale(0.8)
```

Pass an image to `ip_adapter_image` along with a text prompt to generate an image.

```py
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_diner.png")
pipeline(
    prompt="a polar bear sitting in a chair drinking a milkshake",
    ip_adapter_image=image,
    negative_prompt="deformed, ugly, wrong proportion, low res, bad anatomy, worst quality, low quality",
).images[0]
```

<div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
  <figure>
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_diner.png" width="400" alt="IP-Adapter image"/>
    <figcaption style="text-align: center;">IP-Adapter image</figcaption>
  </figure>
  <figure>
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_diner_2.png" width="400" alt="generated image"/>
    <figcaption style="text-align: center;">generated image</figcaption>
  </figure>
</div>

Take a look at the examples below to learn how to use IP-Adapter for other tasks.

<hfoptions id="usage">
<hfoption id="image-to-image">

```py
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

pipeline = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16
).to("cuda")
pipeline.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="sdxl_models",
    weight_name="ip-adapter_sdxl.bin"
)
pipeline.set_ip_adapter_scale(0.8)

image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_bear_1.png")
ip_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_gummy.png")
pipeline(
    prompt="best quality, high quality",
    image=image,
    ip_adapter_image=ip_image,
    strength=0.5,
).images[0]
```

<div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
  <figure>
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_bear_1.png" width="300" alt="input image"/>
    <figcaption style="text-align: center;">input image</figcaption>
  </figure>
  <figure>
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_gummy.png" width="300" alt="IP-Adapter image"/>
    <figcaption style="text-align: center;">IP-Adapter image</figcaption>
  </figure>
  <figure>
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_bear_3.png" width="300" alt="generated image"/>
    <figcaption style="text-align: center;">generated image</figcaption>
  </figure>
</div>

</hfoption>
<hfoption id="inpainting">

```py
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

pipeline = AutoPipelineForInpainting.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16
).to("cuda")
pipeline.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="sdxl_models",
    weight_name="ip-adapter_sdxl.bin"
)
pipeline.set_ip_adapter_scale(0.6)

mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_mask.png")
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_bear_1.png")
ip_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_gummy.png")
pipeline(
    prompt="a cute gummy bear waving",
    image=image,
    mask_image=mask_image,
    ip_adapter_image=ip_image,
).images[0]
```

<div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
  <figure>
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_bear_1.png" width="300" alt="input image"/>
    <figcaption style="text-align: center;">input image</figcaption>
  </figure>
  <figure>
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_gummy.png" width="300" alt="IP-Adapter image"/>
    <figcaption style="text-align: center;">IP-Adapter image</figcaption>
  </figure>
  <figure>
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_inpaint.png" width="300" alt="generated image"/>
    <figcaption style="text-align: center;">generated image</figcaption>
  </figure>
</div>

</hfoption>
<hfoption id="video">

The [`~DiffusionPipeline.enable_model_cpu_offload`] method is useful for reducing memory, and it should be enabled **after** the IP-Adapter is loaded. Otherwise, the IP-Adapter's image encoder is also offloaded to the CPU and returns an error.

```py
import torch
from diffusers import AnimateDiffPipeline, DDIMScheduler, MotionAdapter
from diffusers.utils import export_to_gif, load_image

adapter = MotionAdapter.from_pretrained(
    "guoyww/animatediff-motion-adapter-v1-5-2",
    torch_dtype=torch.float16
)
pipeline = AnimateDiffPipeline.from_pretrained(
    "emilianJR/epiCRealism",
    motion_adapter=adapter,
    torch_dtype=torch.float16
)
scheduler = DDIMScheduler.from_pretrained(
    "emilianJR/epiCRealism",
    subfolder="scheduler",
    clip_sample=False,
    timestep_spacing="linspace",
    beta_schedule="linear",
    steps_offset=1,
)
pipeline.scheduler = scheduler
pipeline.enable_vae_slicing()
pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
pipeline.enable_model_cpu_offload()

ip_adapter_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_inpaint.png")
frames = pipeline(
    prompt="A cute gummy bear waving",
    negative_prompt="bad quality, worse quality, low resolution",
    ip_adapter_image=ip_adapter_image,
    num_frames=16,
    guidance_scale=7.5,
    num_inference_steps=50,
).frames[0]
export_to_gif(frames, "gummy_bear.gif")
```

<div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
  <figure>
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_inpaint.png" width="400" alt="IP-Adapter image"/>
    <figcaption style="text-align: center;">IP-Adapter image</figcaption>
  </figure>
  <figure>
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/gummy_bear.gif" width="400" alt="generated video"/>
    <figcaption style="text-align: center;">generated video</figcaption>
  </figure>
</div>

</hfoption>
</hfoptions>

## Model variants

There are two variants of IP-Adapter, Plus and FaceID. The Plus variant uses patch embeddings and the ViT-H image encoder. The FaceID variant uses face embeddings generated by InsightFace.

<hfoptions id="ipadapter-variants">
<hfoption id="IP-Adapter Plus">

```py
import torch
from diffusers import AutoPipelineForText2Image
from transformers import CLIPVisionModelWithProjection

image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "h94/IP-Adapter",
    subfolder="models/image_encoder",
    torch_dtype=torch.float16
)

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    image_encoder=image_encoder,
    torch_dtype=torch.float16
).to("cuda")

pipeline.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="sdxl_models",
    weight_name="ip-adapter-plus_sdxl_vit-h.safetensors"
)
```

</hfoption>
<hfoption id="IP-Adapter FaceID">

```py
import torch
from diffusers import AutoPipelineForText2Image

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16
).to("cuda")

pipeline.load_ip_adapter(
    "h94/IP-Adapter-FaceID",
    subfolder=None,
    weight_name="ip-adapter-faceid_sdxl.bin",
    image_encoder_folder=None
)
```

To use an IP-Adapter FaceID Plus model, load the CLIP image encoder with [`~transformers.CLIPVisionModelWithProjection`].

```py
import torch
from diffusers import AutoPipelineForText2Image
from transformers import CLIPVisionModelWithProjection

image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "laion/CLIP-ViT-H-14-laion2B-s32B-b79K",
    torch_dtype=torch.float16,
)

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    image_encoder=image_encoder,
    torch_dtype=torch.float16
).to("cuda")

pipeline.load_ip_adapter(
    "h94/IP-Adapter-FaceID",
    subfolder=None,
    weight_name="ip-adapter-faceid-plus_sd15.bin"
)
```

</hfoption>
</hfoptions>

## Image embeddings

The `prepare_ip_adapter_image_embeds` method generates image embeddings that you can reuse when running the pipeline multiple times with more than one image. Loading and encoding multiple images each time you use the pipeline can be inefficient. It is more efficient to precompute the image embeddings ahead of time, save them to disk, and load them when you need them.

```py
import torch
from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16
).to("cuda")

# reuse the example image from earlier in this guide
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_diner.png")
image_embeds = pipeline.prepare_ip_adapter_image_embeds(
    ip_adapter_image=image,
    ip_adapter_image_embeds=None,
    device="cuda",
    num_images_per_prompt=1,
    do_classifier_free_guidance=True,
)

torch.save(image_embeds, "image_embeds.ipadpt")
```

Reload the image embeddings by passing them to the `ip_adapter_image_embeds` parameter. Set `image_encoder_folder` to `None` because you don't need the image encoder anymore to generate the image embeddings.

> [!TIP]
> You can also load image embeddings from other sources such as ComfyUI.

```py
pipeline.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="sdxl_models",
    image_encoder_folder=None,
    weight_name="ip-adapter_sdxl.bin"
)
pipeline.set_ip_adapter_scale(0.8)
image_embeds = torch.load("image_embeds.ipadpt")
generator = torch.Generator(device="cuda").manual_seed(0)  # optional, for reproducible results
pipeline(
    prompt="a polar bear sitting in a chair drinking a milkshake",
    ip_adapter_image_embeds=image_embeds,
    negative_prompt="deformed, ugly, wrong proportion, low res, bad anatomy, worst quality, low quality",
    num_inference_steps=100,
    generator=generator,
).images[0]
```

## Masking

Binary masking assigns an IP-Adapter image to a specific area of the output image, making it useful for composing multiple IP-Adapter images. Each IP-Adapter image requires a binary mask.

Load the [`~image_processor.IPAdapterMaskProcessor`] to preprocess the image masks. For the best results, provide the output `height` and `width` so that masks with different aspect ratios are appropriately sized. If the input masks already match the aspect ratio of the generated image, you don't need to set `height` and `width`.

```py
import torch
from diffusers import AutoPipelineForText2Image
from diffusers.image_processor import IPAdapterMaskProcessor
from diffusers.utils import load_image

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16
).to("cuda")

mask1 = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_mask_mask1.png")
mask2 = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_mask_mask2.png")

processor = IPAdapterMaskProcessor()
masks = processor.preprocess([mask1, mask2], height=1024, width=1024)
```

<div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
  <figure>
    <img src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/ip_mask_mask1.png" width="200" alt="mask 1"/>
    <figcaption style="text-align: center;">mask 1</figcaption>
  </figure>
  <figure>
    <img src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/ip_mask_mask2.png" width="200" alt="mask 2"/>
    <figcaption style="text-align: center;">mask 2</figcaption>
  </figure>
</div>

Provide both the IP-Adapter images and their scales as a list. Pass the preprocessed masks to `cross_attention_kwargs` in the pipeline.

```py
face_image1 = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_mask_girl1.png")
face_image2 = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_mask_girl2.png")

pipeline.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="sdxl_models",
    weight_name=["ip-adapter-plus-face_sdxl_vit-h.safetensors"]
)
pipeline.set_ip_adapter_scale([[0.7, 0.7]])

ip_images = [[face_image1, face_image2]]
masks = [masks.reshape(1, masks.shape[0], masks.shape[2], masks.shape[3])]

pipeline(
    prompt="2 girls",
    ip_adapter_image=ip_images,
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
    cross_attention_kwargs={"ip_adapter_masks": masks}
).images[0]
```
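
Conceptually, each mask confines its IP-Adapter image's attention contribution to the masked region before the contributions are added to the text branch, so each reference image only influences its own area. A minimal NumPy sketch under that assumption (illustrative only, not the actual `IPAdapterAttnProcessor` code):

```python
import numpy as np

def masked_ip_attention(text_out, ip_outs, masks, scales):
    """Combine per-image attention outputs, each confined to its binary mask.

    text_out: (H, W, C) text cross-attention output
    ip_outs:  list of (H, W, C) image-branch attention outputs
    masks:    list of (H, W) binary masks, one per IP-Adapter image
    scales:   list of floats, one per IP-Adapter image
    """
    out = text_out.copy()
    for ip_out, mask, scale in zip(ip_outs, masks, scales):
        out += scale * mask[..., None] * ip_out  # zero contribution outside the mask
    return out

H, W, C = 8, 8, 4
text_out = np.zeros((H, W, C))
ip_a = np.ones((H, W, C))       # stand-in for image 1's attention output
ip_b = 2 * np.ones((H, W, C))   # stand-in for image 2's attention output
left = np.zeros((H, W))
left[:, : W // 2] = 1           # image 1 controls the left half
right = 1 - left                # image 2 controls the right half
out = masked_ip_attention(text_out, [ip_a, ip_b], [left, right], [0.7, 0.7])
print(out[0, 0, 0], out[0, -1, 0])  # 0.7 1.4
```

In the pipeline itself, this per-region blending is handled for you once the masks are passed through `cross_attention_kwargs`.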

<div style="display: flex; flex-direction: column; gap: 10px;">
  <div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
    <figure>
      <img src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/ip_mask_girl1.png" width="400" alt="IP-Adapter image 1"/>
      <figcaption style="text-align: center;">IP-Adapter image 1</figcaption>
    </figure>
    <figure>
      <img src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/ip_mask_girl2.png" width="400" alt="IP-Adapter image 2"/>
      <figcaption style="text-align: center;">IP-Adapter image 2</figcaption>
    </figure>
  </div>
  <div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
    <figure>
      <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_attention_mask_result_seed_0.png" width="400" alt="Generated image with mask"/>
      <figcaption style="text-align: center;">generated with mask</figcaption>
    </figure>
    <figure>
      <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_no_attention_mask_result_seed_0.png" width="400" alt="Generated image without mask"/>
      <figcaption style="text-align: center;">generated without mask</figcaption>
    </figure>
  </div>
</div>

## Applications

The section below covers some popular applications of IP-Adapter.

### Face models

Generating faces and preserving their details can be challenging. To help generate more accurate faces, there are checkpoints specifically conditioned on images of cropped faces. You can find the face models in the [h94/IP-Adapter](https://huggingface.co/h94/IP-Adapter) repository or the [h94/IP-Adapter-FaceID](https://huggingface.co/h94/IP-Adapter-FaceID) repository. The FaceID checkpoints use the FaceID embeddings from [InsightFace](https://github.com/deepinsight/insightface) instead of CLIP image embeddings.

We recommend using the [`DDIMScheduler`] or [`EulerDiscreteScheduler`] for face models.

<hfoptions id="usage">
<hfoption id="h94/IP-Adapter">

```py
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler
from diffusers.utils import load_image

pipeline = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)
pipeline.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="models",
    weight_name="ip-adapter-full-face_sd15.bin"
)

pipeline.set_ip_adapter_scale(0.5)
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_einstein_base.png")

pipeline(
    prompt="A photo of Einstein as a chef, wearing an apron, cooking in a French restaurant",
    ip_adapter_image=image,
    negative_prompt="lowres, bad anatomy, worst quality, low quality",
    num_inference_steps=100,
).images[0]
```

<div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
  <figure>
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_einstein_base.png" width="400" alt="IP-Adapter image"/>
    <figcaption style="text-align: center;">IP-Adapter image</figcaption>
  </figure>
  <figure>
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_einstein.png" width="400" alt="generated image"/>
    <figcaption style="text-align: center;">generated image</figcaption>
  </figure>
</div>

</hfoption>
<hfoption id="h94/IP-Adapter-FaceID">

For FaceID models, extract the face embeddings and pass them as a list of tensors to `ip_adapter_image_embeds`.

```py
# pip install insightface
import cv2
import numpy as np
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler
from diffusers.utils import load_image
from insightface.app import FaceAnalysis

pipeline = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)
pipeline.load_ip_adapter(
    "h94/IP-Adapter-FaceID",
    subfolder=None,
    weight_name="ip-adapter-faceid_sd15.bin",
    image_encoder_folder=None
)
pipeline.set_ip_adapter_scale(0.6)

image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_mask_girl1.png")

ref_images_embeds = []
app = FaceAnalysis(name="buffalo_l", providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
app.prepare(ctx_id=0, det_size=(640, 640))
image = cv2.cvtColor(np.asarray(image), cv2.COLOR_BGR2RGB)
faces = app.get(image)
image = torch.from_numpy(faces[0].normed_embedding)
ref_images_embeds.append(image.unsqueeze(0))
ref_images_embeds = torch.stack(ref_images_embeds, dim=0).unsqueeze(0)
neg_ref_images_embeds = torch.zeros_like(ref_images_embeds)
id_embeds = torch.cat([neg_ref_images_embeds, ref_images_embeds]).to(dtype=torch.float16, device="cuda")

pipeline(
    prompt="A photo of a girl",
    ip_adapter_image_embeds=[id_embeds],
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
).images[0]
```

The IP-Adapter FaceID Plus and Plus v2 models require CLIP image embeddings. Prepare the face embeddings, and then extract and pass the CLIP embeddings to the hidden image projection layers.

```py
# continues the FaceID example above; ip_adapter_images and num_images come from your inputs
clip_embeds = pipeline.prepare_ip_adapter_image_embeds(
    [ip_adapter_images], None, torch.device("cuda"), num_images, True
)[0]

pipeline.unet.encoder_hid_proj.image_projection_layers[0].clip_embeds = clip_embeds.to(dtype=torch.float16)
# set shortcut to True if using the IP-Adapter FaceID Plus v2 model
pipeline.unet.encoder_hid_proj.image_projection_layers[0].shortcut = False
```

</hfoption>
</hfoptions>

| 527 |
+
### Multiple IP-Adapters
|
| 528 |
+
|
| 529 |
+
Combine multiple IP-Adapters to generate images in more diverse styles. For example, you can use IP-Adapter Face to generate consistent faces and characters and IP-Adapter Plus to generate those faces in specific styles.
|
| 530 |
+
|
| 531 |
+
Load an image encoder with [`~transformers.CLIPVisionModelWithProjection`].
|
| 532 |
+
|
| 533 |
+
```py
|
| 534 |
+
import torch
|
| 535 |
+
from diffusers import AutoPipelineForText2Image, DDIMScheduler
|
| 536 |
+
from transformers import CLIPVisionModelWithProjection
|
| 537 |
+
from diffusers.utils import load_image
|
| 538 |
+
|
| 539 |
+
image_encoder = CLIPVisionModelWithProjection.from_pretrained(
|
| 540 |
+
"h94/IP-Adapter",
|
| 541 |
+
subfolder="models/image_encoder",
|
| 542 |
+
torch_dtype=torch.float16,
|
| 543 |
+
)
|
| 544 |
+
```
|
| 545 |
+
|
| 546 |
+
Load a base model, scheduler and the following IP-Adapters.
|
| 547 |
+
|
| 548 |
+
- [ip-adapter-plus_sdxl_vit-h](https://huggingface.co/h94/IP-Adapter#ip-adapter-for-sdxl-10) uses patch embeddings and a ViT-H image encoder
|
| 549 |
+
- [ip-adapter-plus-face_sdxl_vit-h](https://huggingface.co/h94/IP-Adapter#ip-adapter-for-sdxl-10) uses patch embeddings and a ViT-H image encoder but it is conditioned on images of cropped faces
|
| 550 |
+
|
| 551 |
+
```py
|
| 552 |
+
pipeline = AutoPipelineForText2Image.from_pretrained(
|
| 553 |
+
"stabilityai/stable-diffusion-xl-base-1.0",
|
| 554 |
+
torch_dtype=torch.float16,
|
| 555 |
+
image_encoder=image_encoder,
|
| 556 |
+
)
|
| 557 |
+
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)
|
| 558 |
+
pipeline.load_ip_adapter(
|
| 559 |
+
"h94/IP-Adapter",
|
| 560 |
+
subfolder="sdxl_models",
|
| 561 |
+
weight_name=["ip-adapter-plus_sdxl_vit-h.safetensors", "ip-adapter-plus-face_sdxl_vit-h.safetensors"]
|
| 562 |
+
)
|
| 563 |
+
pipeline.set_ip_adapter_scale([0.7, 0.3])
|
| 564 |
+
# enable_model_cpu_offload to reduce memory usage
|
| 565 |
+
pipeline.enable_model_cpu_offload()
|
| 566 |
+
```
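The scales passed to `set_ip_adapter_scale` line up positionally with the adapters in the order their weights were loaded. A small illustrative sketch of that pairing (plain Python, not diffusers internals):

```python
# The scale list is matched positionally to the weight_name list passed to
# load_ip_adapter: first adapter gets the first scale, and so on.
weight_names = [
    "ip-adapter-plus_sdxl_vit-h.safetensors",        # style adapter
    "ip-adapter-plus-face_sdxl_vit-h.safetensors",   # face adapter
]
scales = [0.7, 0.3]

adapter_scales = dict(zip(weight_names, scales))
# the style adapter is weighted 0.7, the face adapter 0.3
```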

Load an image and a folder containing images of a certain style to apply.

```py
face_image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/women_input.png")
style_folder = "https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/style_ziggy"
style_images = [load_image(f"{style_folder}/img{i}.png") for i in range(10)]
```

<div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
  <figure>
    <img src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/women_input.png" width="400" alt="Face image"/>
    <figcaption style="text-align: center;">face image</figcaption>
  </figure>
  <figure>
    <img src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/ip_style_grid.png" width="400" alt="Style images"/>
    <figcaption style="text-align: center;">style images</figcaption>
  </figure>
</div>

Pass the style and face images as a list to `ip_adapter_image`.

```py
generator = torch.Generator(device="cpu").manual_seed(0)

pipeline(
    prompt="wonderwoman",
    ip_adapter_image=[style_images, face_image],
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
).images[0]
```

<div style="display: flex; justify-content: center;">
  <figure>
    <img src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/ip_multi_out.png" width="400" alt="Generated image"/>
    <figcaption style="text-align: center;">generated image</figcaption>
  </figure>
</div>

### Instant generation

[Latent Consistency Models (LCM)](../api/pipelines/latent_consistency_models) can generate images in 4 steps or fewer, unlike other diffusion models, which require many more steps, making generation feel "instantaneous". IP-Adapters are compatible with LCM models to instantly generate images.

Load the IP-Adapter weights, and load the LoRA weights with [`~loaders.StableDiffusionLoraLoaderMixin.load_lora_weights`].

```py
import torch
from diffusers import DiffusionPipeline, LCMScheduler
from diffusers.utils import load_image

pipeline = DiffusionPipeline.from_pretrained(
    "sd-dreambooth-library/herge-style",
    torch_dtype=torch.float16
)

pipeline.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="models",
    weight_name="ip-adapter_sd15.bin"
)
pipeline.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")
pipeline.scheduler = LCMScheduler.from_config(pipeline.scheduler.config)
# enable_model_cpu_offload to reduce memory usage
pipeline.enable_model_cpu_offload()
```

Try using a lower IP-Adapter scale to condition generation more on the style you want to apply, and remember to use the special token in your prompt to trigger its generation.

```py
pipeline.set_ip_adapter_scale(0.4)

prompt = "herge_style woman in armor, best quality, high quality"

ip_adapter_image = load_image("https://user-images.githubusercontent.com/24734142/266492875-2d50d223-8475-44f0-a7c6-08b51cb53572.png")
pipeline(
    prompt=prompt,
    ip_adapter_image=ip_adapter_image,
    num_inference_steps=4,
    guidance_scale=1,
).images[0]
```

<div style="display: flex; justify-content: center;">
  <figure>
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_herge.png" width="400" alt="Generated image"/>
    <figcaption style="text-align: center;">generated image</figcaption>
  </figure>
</div>

### Structural control

For structural control, combine IP-Adapter with [ControlNet](../api/pipelines/controlnet) conditioned on depth maps, edge maps, pose estimations, and more.

The example below loads a [`ControlNetModel`] checkpoint conditioned on depth maps and combines it with an IP-Adapter.

```py
import torch
from diffusers.utils import load_image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11f1p_sd15_depth",
    torch_dtype=torch.float16
)

pipeline = StableDiffusionControlNetPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")
pipeline.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="models",
    weight_name="ip-adapter_sd15.bin"
)
```

Pass the depth map and IP-Adapter image to the pipeline.

```py
pipeline(
    prompt="best quality, high quality",
    image=depth_map,
    ip_adapter_image=ip_adapter_image,
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
).images[0]
```

<div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
  <figure>
    <img src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/statue.png" width="300" alt="IP-Adapter image"/>
    <figcaption style="text-align: center;">IP-Adapter image</figcaption>
  </figure>
  <figure>
    <img src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/depth.png" width="300" alt="Depth map"/>
    <figcaption style="text-align: center;">depth map</figcaption>
  </figure>
  <figure>
    <img src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/ipa-controlnet-out.png" width="300" alt="Generated image"/>
    <figcaption style="text-align: center;">generated image</figcaption>
  </figure>
</div>

### Style and layout control

For style and layout control, combine IP-Adapter with [InstantStyle](https://huggingface.co/papers/2404.02733). InstantStyle separates *style* (color, texture, overall feel) and *content* from each other. It only applies the style in style-specific blocks of the model to prevent it from distorting other areas of an image. This generates images with stronger, more consistent styles and better control over the layout.

The IP-Adapter is only activated for specific parts of the model. Use the [`~loaders.IPAdapterMixin.set_ip_adapter_scale`] method to scale the influence of the IP-Adapter in different layers. The example below activates the IP-Adapter in the second layer of the model's down `block_2` and up `block_0`. Down `block_2` is where the IP-Adapter injects layout information, and up `block_0` is where style is injected.

```py
import torch
from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16
).to("cuda")
pipeline.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="sdxl_models",
    weight_name="ip-adapter_sdxl.bin"
)

scale = {
    "down": {"block_2": [0.0, 1.0]},
    "up": {"block_0": [0.0, 1.0, 0.0]},
}
pipeline.set_ip_adapter_scale(scale)
```

Load the style image and generate an image.

```py
style_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg")

pipeline(
    prompt="a cat, masterpiece, best quality, high quality",
    ip_adapter_image=style_image,
    negative_prompt="text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
    guidance_scale=5,
).images[0]
```

<div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
  <figure>
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg" width="400" alt="Style image"/>
    <figcaption style="text-align: center;">style image</figcaption>
  </figure>
  <figure>
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/datasets/cat_style_layout.png" width="400" alt="Generated image"/>
    <figcaption style="text-align: center;">generated image</figcaption>
  </figure>
</div>

You can also insert the IP-Adapter in all the model layers. This tends to generate images that focus more on the image prompt and may reduce the diversity of generated images. Alternatively, only activate the IP-Adapter in up `block_0`, the style layer.

> [!TIP]
> You don't need to specify all the layers in the `scale` dictionary. Layers not included are set to 0, which means the IP-Adapter is disabled there.

```py
scale = {
    "up": {"block_0": [0.0, 1.0, 0.0]},
}
pipeline.set_ip_adapter_scale(scale)

pipeline(
    prompt="a cat, masterpiece, best quality, high quality",
    ip_adapter_image=style_image,
    negative_prompt="text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
    guidance_scale=5,
).images[0]
```
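Blocks missing from the `scale` dictionary default to 0, which disables the IP-Adapter there. The sketch below illustrates that defaulting behavior; `all_blocks` is a made-up block layout for illustration, not the real SDXL architecture:

```python
# Hypothetical sketch of expanding a partial InstantStyle scale dict: any
# block not listed falls back to a list of zeros (adapter disabled).
def expand_scale(partial, all_blocks):
    """Fill in 0.0 for every block missing from the partial scale dict."""
    expanded = {}
    for section, blocks in all_blocks.items():
        expanded[section] = {}
        for name, width in blocks.items():
            expanded[section][name] = partial.get(section, {}).get(name, [0.0] * width)
    return expanded

# Invented layout: widths are the number of attention layers per block.
all_blocks = {"down": {"block_2": 2}, "up": {"block_0": 3, "block_1": 3}}
scale = {"up": {"block_0": [0.0, 1.0, 0.0]}}

expanded = expand_scale(scale, all_blocks)
# down.block_2 and up.block_1 default to zeros, so the IP-Adapter is off there
```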

<div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
  <figure>
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/datasets/cat_style_only.png" width="400" alt="Generated image (style only)"/>
    <figcaption style="text-align: center;">style-layer generated image</figcaption>
  </figure>
  <figure>
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/datasets/cat_ip_adapter.png" width="400" alt="Generated image (IP-Adapter only)"/>
    <figcaption style="text-align: center;">all layers generated image</figcaption>
  </figure>
</div>
diffusers/docs/source/en/using-diffusers/loading.md
ADDED
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Load pipelines

[[open-in-colab]]

Diffusion systems consist of multiple components like parameterized models and schedulers that interact in complex ways. That is why we designed the [`DiffusionPipeline`] to wrap the complexity of the entire diffusion system into an easy-to-use API. At the same time, the [`DiffusionPipeline`] is entirely customizable so you can modify each component to build a diffusion system for your use case.

This guide will show you how to load:

- pipelines from the Hub and locally
- different components into a pipeline
- multiple pipelines without increasing memory usage
- checkpoint variants such as different floating point types or non-exponential mean averaged (EMA) weights
## Load a pipeline

> [!TIP]
> Skip to the [DiffusionPipeline explained](#diffusionpipeline-explained) section if you're interested in an explanation about how the [`DiffusionPipeline`] class works.

There are two ways to load a pipeline for a task:

1. Load the generic [`DiffusionPipeline`] class and allow it to automatically detect the correct pipeline class from the checkpoint.
2. Load a specific pipeline class for a specific task.

<hfoptions id="pipelines">
<hfoption id="generic pipeline">

The [`DiffusionPipeline`] class is a simple and generic way to load the latest trending diffusion model from the [Hub](https://huggingface.co/models?library=diffusers&sort=trending). It uses the [`~DiffusionPipeline.from_pretrained`] method to automatically detect the correct pipeline class for a task from the checkpoint, downloads and caches all the required configuration and weight files, and returns a pipeline ready for inference.

```python
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", use_safetensors=True)
```

This same checkpoint can also be used for an image-to-image task. The [`DiffusionPipeline`] class can handle any task as long as you provide the appropriate inputs. For example, for an image-to-image task, you need to pass an initial image to the pipeline.

```py
from diffusers import DiffusionPipeline
from diffusers.utils import load_image

pipeline = DiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", use_safetensors=True)

init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-init.png")
prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipeline(prompt, image=init_image).images[0]
```

</hfoption>
<hfoption id="specific pipeline">

Checkpoints can be loaded by their specific pipeline class if you already know it. For example, to load a Stable Diffusion model, use the [`StableDiffusionPipeline`] class.

```python
from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", use_safetensors=True)
```

This same checkpoint may also be used for another task like image-to-image. To differentiate what task you want to use the checkpoint for, you have to use the corresponding task-specific pipeline class. For example, to use the same checkpoint for image-to-image, use the [`StableDiffusionImg2ImgPipeline`] class.

```py
from diffusers import StableDiffusionImg2ImgPipeline

pipeline = StableDiffusionImg2ImgPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", use_safetensors=True)
```

</hfoption>
</hfoptions>

Use the Space below to gauge a pipeline's memory requirements before you download and load it, to see if it runs on your hardware.

<div class="block dark:hidden">
  <iframe
    src="https://diffusers-compute-pipeline-size.hf.space?__theme=light"
    width="850"
    height="1600"
  ></iframe>
</div>
<div class="hidden dark:block">
  <iframe
    src="https://diffusers-compute-pipeline-size.hf.space?__theme=dark"
    width="850"
    height="1600"
  ></iframe>
</div>

### Specifying component-specific data types

You can customize the data types of individual submodels by passing a dictionary to the `torch_dtype` parameter. This lets you load different components of a pipeline in different floating point precisions. For instance, to load the transformer in `torch.bfloat16` and all other components in `torch.float16`, pass a dictionary mapping:

```python
from diffusers import HunyuanVideoPipeline
import torch

pipe = HunyuanVideoPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",
    torch_dtype={"transformer": torch.bfloat16, "default": torch.float16},
)
print(pipe.transformer.dtype, pipe.vae.dtype)  # (torch.bfloat16, torch.float16)
```

If a component is not explicitly specified in the dictionary and no `default` is provided, it will be loaded with `torch.float32`.
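The lookup order described above (component entry, then `default`, then `torch.float32`) can be sketched in plain Python, with dtypes represented as strings to keep the example torch-free:

```python
# Sketch of the torch_dtype dictionary resolution: a component uses its own
# entry if present, else the "default" entry, else float32.
def resolve_dtype(component, torch_dtype):
    """Return the dtype a named component would be loaded with."""
    if component in torch_dtype:
        return torch_dtype[component]
    return torch_dtype.get("default", "float32")

mapping = {"transformer": "bfloat16", "default": "float16"}
# resolve_dtype("transformer", mapping) -> "bfloat16"
# resolve_dtype("vae", mapping)         -> "float16" (falls back to default)
```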

### Local pipeline

To load a pipeline locally, use [git-lfs](https://git-lfs.github.com/) to manually download a checkpoint to your local disk.

```bash
git-lfs install
git clone https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5
```

This creates a local folder, `./stable-diffusion-v1-5`, on your disk, and you should pass its path to [`~DiffusionPipeline.from_pretrained`].

```python
from diffusers import DiffusionPipeline

stable_diffusion = DiffusionPipeline.from_pretrained("./stable-diffusion-v1-5", use_safetensors=True)
```

The [`~DiffusionPipeline.from_pretrained`] method won't download files from the Hub when it detects a local path, but this also means it won't download and cache the latest changes to a checkpoint.
## Customize a pipeline

You can customize a pipeline by loading different components into it. This is important because you can:

- change to a scheduler with faster generation speed or higher generation quality depending on your needs (check the `scheduler.compatibles` attribute on your pipeline to see compatible schedulers)
- change a default pipeline component to a newer and better performing one

For example, let's customize the default [stabilityai/stable-diffusion-xl-base-1.0](https://hf.co/stabilityai/stable-diffusion-xl-base-1.0) checkpoint with:

- The [`HeunDiscreteScheduler`] to generate higher quality images at the expense of slower generation speed. You must pass the `subfolder="scheduler"` parameter in [`~HeunDiscreteScheduler.from_pretrained`] to load the scheduler configuration from the correct [subfolder](https://hf.co/stabilityai/stable-diffusion-xl-base-1.0/tree/main/scheduler) of the pipeline repository.
- A more stable VAE that runs in fp16.

```py
from diffusers import StableDiffusionXLPipeline, HeunDiscreteScheduler, AutoencoderKL
import torch

scheduler = HeunDiscreteScheduler.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", subfolder="scheduler")
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16, use_safetensors=True)
```

Now pass the new scheduler and VAE to the [`StableDiffusionXLPipeline`].

```py
pipeline = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    scheduler=scheduler,
    vae=vae,
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True
).to("cuda")
```
## Reuse a pipeline

When you load multiple pipelines that share the same model components, it makes sense to reuse the shared components instead of reloading everything into memory again, especially if your hardware is memory-constrained. For example:

1. You generated an image with the [`StableDiffusionPipeline`] but you want to improve its quality with the [`StableDiffusionSAGPipeline`]. Both of these pipelines share the same pretrained model, so it'd be a waste of memory to load the same model twice.
2. You want to add a model component, like a [`MotionAdapter`](../api/pipelines/animatediff#animatediffpipeline), to [`AnimateDiffPipeline`] which was instantiated from an existing [`StableDiffusionPipeline`]. Again, both pipelines share the same pretrained model, so it'd be a waste of memory to load an entirely new pipeline again.

With the [`DiffusionPipeline.from_pipe`] API, you can switch between multiple pipelines to take advantage of their different features without increasing memory usage. It is similar to turning a feature in your pipeline on and off.

> [!TIP]
> To switch between tasks (rather than features), use the [`~DiffusionPipeline.from_pipe`] method with the [AutoPipeline](../api/pipelines/auto_pipeline) class, which automatically identifies the pipeline class based on the task (learn more in the [AutoPipeline](../tutorials/autopipeline) tutorial).

Let's start with a [`StableDiffusionPipeline`] and then reuse the loaded model components to create a [`StableDiffusionSAGPipeline`] to increase generation quality. You'll use the [`StableDiffusionPipeline`] with an [IP-Adapter](./ip_adapter) to generate a bear eating pizza.

```python
import torch
from diffusers import DiffusionPipeline, StableDiffusionSAGPipeline
from diffusers.utils import load_image

image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/load_neg_embed.png")

pipe_sd = DiffusionPipeline.from_pretrained("SG161222/Realistic_Vision_V6.0_B1_noVAE", torch_dtype=torch.float16)
pipe_sd.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
pipe_sd.set_ip_adapter_scale(0.6)
pipe_sd.to("cuda")

generator = torch.Generator(device="cpu").manual_seed(33)
out_sd = pipe_sd(
    prompt="bear eats pizza",
    negative_prompt="wrong white balance, dark, sketches,worst quality,low quality",
    ip_adapter_image=image,
    num_inference_steps=50,
    generator=generator,
).images[0]
out_sd
```

<div class="flex justify-center">
  <img class="rounded-xl" src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/from_pipe_out_sd_0.png"/>
</div>

For reference, you can check how much memory this process consumed.

```python
def bytes_to_giga_bytes(bytes):
    return bytes / 1024 / 1024 / 1024
print(f"Max memory allocated: {bytes_to_giga_bytes(torch.cuda.max_memory_allocated())} GB")
"Max memory allocated: 4.406213283538818 GB"
```

Now, reuse the same pipeline components from [`StableDiffusionPipeline`] in [`StableDiffusionSAGPipeline`] with the [`~DiffusionPipeline.from_pipe`] method.

> [!WARNING]
> Some pipeline methods may not function properly on new pipelines created with [`~DiffusionPipeline.from_pipe`]. For instance, the [`~DiffusionPipeline.enable_model_cpu_offload`] method installs hooks on the model components based on a unique offloading sequence for each pipeline. If the models are executed in a different order in the new pipeline, the CPU offloading may not work correctly.
>
> To ensure everything works as expected, we recommend re-applying a pipeline method on a new pipeline created with [`~DiffusionPipeline.from_pipe`].

```python
pipe_sag = StableDiffusionSAGPipeline.from_pipe(
    pipe_sd
)

generator = torch.Generator(device="cpu").manual_seed(33)
out_sag = pipe_sag(
    prompt="bear eats pizza",
    negative_prompt="wrong white balance, dark, sketches,worst quality,low quality",
    ip_adapter_image=image,
    num_inference_steps=50,
    generator=generator,
    guidance_scale=1.0,
    sag_scale=0.75
).images[0]
out_sag
```

<div class="flex justify-center">
  <img class="rounded-xl" src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/from_pipe_out_sag_1.png"/>
</div>

If you check the memory usage, you'll see it remains the same as before because [`StableDiffusionPipeline`] and [`StableDiffusionSAGPipeline`] share the same pipeline components. This allows you to use them interchangeably without any additional memory overhead.

```py
print(f"Max memory allocated: {bytes_to_giga_bytes(torch.cuda.max_memory_allocated())} GB")
"Max memory allocated: 4.406213283538818 GB"
```
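The reason memory stays flat is that `from_pipe` hands the new pipeline references to the same component objects rather than copies. A hypothetical toy pipeline (not the real diffusers implementation) makes the sharing visible:

```python
# Toy sketch of from_pipe-style component sharing. TinyPipeline is invented
# for illustration; the real DiffusionPipeline.from_pipe is more involved.
class TinyPipeline:
    def __init__(self, unet, vae, text_encoder):
        self.unet, self.vae, self.text_encoder = unet, vae, text_encoder

    @classmethod
    def from_pipe(cls, other, **overrides):
        # Reuse the existing component objects; only overrides are new.
        components = {
            "unet": other.unet,
            "vae": other.vae,
            "text_encoder": other.text_encoder,
        }
        components.update(overrides)
        return cls(**components)

pipe_a = TinyPipeline(unet=object(), vae=object(), text_encoder=object())
pipe_b = TinyPipeline.from_pipe(pipe_a)
# pipe_b holds the identical component objects, so no extra weights in memory
```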

Let's animate the image with the [`AnimateDiffPipeline`] and also add a [`MotionAdapter`] module to the pipeline. For the [`AnimateDiffPipeline`], you need to unload the IP-Adapter first and reload it *after* you've created your new pipeline (this only applies to the [`AnimateDiffPipeline`]).

```py
from diffusers import AnimateDiffPipeline, MotionAdapter, DDIMScheduler
from diffusers.utils import export_to_gif

pipe_sag.unload_ip_adapter()
adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16)

pipe_animate = AnimateDiffPipeline.from_pipe(pipe_sd, motion_adapter=adapter)
pipe_animate.scheduler = DDIMScheduler.from_config(pipe_animate.scheduler.config, beta_schedule="linear")
# load IP-Adapter and LoRA weights again
pipe_animate.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
pipe_animate.load_lora_weights("guoyww/animatediff-motion-lora-zoom-out", adapter_name="zoom-out")
pipe_animate.to("cuda")

generator = torch.Generator(device="cpu").manual_seed(33)
pipe_animate.set_adapters("zoom-out", adapter_weights=0.75)
out = pipe_animate(
    prompt="bear eats pizza",
    num_frames=16,
    num_inference_steps=50,
    ip_adapter_image=image,
    generator=generator,
).frames[0]
export_to_gif(out, "out_animate.gif")
```

<div class="flex justify-center">
    <img class="rounded-xl" src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/from_pipe_out_animate_3.gif"/>
</div>

The [`AnimateDiffPipeline`] is more memory-intensive and consumes 15GB of memory (see the [Memory usage of from_pipe](#memory-usage-of-from_pipe) section to learn what this means for your memory usage).

```py
print(f"Max memory allocated: {bytes_to_giga_bytes(torch.cuda.max_memory_allocated())} GB")
"Max memory allocated: 15.178664207458496 GB"
```

### Modify from_pipe components

Pipelines loaded with [`~DiffusionPipeline.from_pipe`] can be customized with different model components or methods. However, whenever you modify the *state* of the model components, it affects all the other pipelines that share the same components. For example, if you call [`~diffusers.loaders.IPAdapterMixin.unload_ip_adapter`] on the [`StableDiffusionSAGPipeline`], you won't be able to use IP-Adapter with the [`StableDiffusionPipeline`] because it's been removed from their shared components.

```py
pipe_sag.unload_ip_adapter()

generator = torch.Generator(device="cpu").manual_seed(33)
out_sd = pipe_sd(
    prompt="bear eats pizza",
    negative_prompt="wrong white balance, dark, sketches,worst quality,low quality",
    ip_adapter_image=image,
    num_inference_steps=50,
    generator=generator,
).images[0]
"AttributeError: 'NoneType' object has no attribute 'image_projection_layers'"
```

### Memory usage of from_pipe

The memory requirement of loading multiple pipelines with [`~DiffusionPipeline.from_pipe`] is determined by the pipeline with the highest memory usage, regardless of the number of pipelines you create.

| Pipeline | Memory usage (GB) |
|---|---|
| StableDiffusionPipeline | 4.400 |
| StableDiffusionSAGPipeline | 4.400 |
| AnimateDiffPipeline | 15.178 |

The [`AnimateDiffPipeline`] has the highest memory requirement, so the *total memory usage* is based only on the [`AnimateDiffPipeline`]. Your memory usage will not increase if you create additional pipelines, as long as their memory requirements don't exceed that of the [`AnimateDiffPipeline`]. Each pipeline can be used interchangeably without any additional memory overhead.
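In other words, with shared components the peak memory is governed by a maximum rather than a sum. A toy illustration of that arithmetic (the numbers are the measurements quoted in this guide):

```python
# per-pipeline peak memory in GB, as measured earlier in this guide
pipeline_memory_gb = {
    "StableDiffusionPipeline": 4.400,
    "StableDiffusionSAGPipeline": 4.400,
    "AnimateDiffPipeline": 15.178,
}

# with from_pipe, the components are shared, so total usage is the maximum
shared_total = max(pipeline_memory_gb.values())

# without sharing, each pipeline would hold its own copy of every component
separate_total = sum(pipeline_memory_gb.values())

print(shared_total, round(separate_total, 3))
```

Sharing keeps the peak at 15.178 GB instead of nearly 24 GB for three independently loaded pipelines.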

## Safety checker

Diffusers implements a [safety checker](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/safety_checker.py) for Stable Diffusion models which can generate harmful content. The safety checker screens the generated output against known hardcoded not-safe-for-work (NSFW) content. If for whatever reason you'd like to disable the safety checker, pass `safety_checker=None` to the [`~DiffusionPipeline.from_pretrained`] method.

```python
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", safety_checker=None, use_safetensors=True)
"""
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide by the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend keeping the safety filter enabled in all public-facing circumstances, disabling it only for use cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
"""
```

## Checkpoint variants

A checkpoint variant is usually a checkpoint whose weights are:

- Stored in a different floating point type, such as [torch.float16](https://pytorch.org/docs/stable/tensors.html#data-types), because it only requires half the bandwidth and storage to download. You can't use this variant if you're continuing training or using a CPU.
- Non-exponential mean averaged (non-EMA) weights, which shouldn't be used for inference. You should use this variant to continue finetuning a model.

> [!TIP]
> When the checkpoints have identical model structures, but they were trained on different datasets and with a different training setup, they should be stored in separate repositories. For example, [stabilityai/stable-diffusion-2](https://hf.co/stabilityai/stable-diffusion-2) and [stabilityai/stable-diffusion-2-1](https://hf.co/stabilityai/stable-diffusion-2-1) are stored in separate repositories.

Otherwise, a variant is **identical** to the original checkpoint. They have exactly the same serialization format (like [safetensors](./using_safetensors)), model structure, and their weights have identical tensor shapes.

| **checkpoint type** | **weight name**                             | **argument for loading weights** |
|---------------------|---------------------------------------------|----------------------------------|
| original            | diffusion_pytorch_model.safetensors         |                                  |
| floating point      | diffusion_pytorch_model.fp16.safetensors    | `variant`, `torch_dtype`         |
| non-EMA             | diffusion_pytorch_model.non_ema.safetensors | `variant`                        |
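The rows in the table follow a simple naming pattern: the variant string is inserted between the base weight name and the file extension. A hypothetical helper (not part of Diffusers) that mirrors this convention:

```python
def variant_weight_name(variant=None, base="diffusion_pytorch_model", ext="safetensors"):
    # an fp16 or non_ema variant is stored as <base>.<variant>.<ext>;
    # the original checkpoint has no variant segment at all
    if variant is None:
        return f"{base}.{ext}"
    return f"{base}.{variant}.{ext}"

print(variant_weight_name())           # diffusion_pytorch_model.safetensors
print(variant_weight_name("fp16"))     # diffusion_pytorch_model.fp16.safetensors
print(variant_weight_name("non_ema"))  # diffusion_pytorch_model.non_ema.safetensors
```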

There are two important arguments for loading variants:

- `torch_dtype` specifies the floating point precision of the loaded checkpoint. For example, if you want to save bandwidth by loading a fp16 variant, you should set `variant="fp16"` and `torch_dtype=torch.float16` to *convert the weights* to fp16. Otherwise, the fp16 weights are converted to the default fp32 precision.

  If you only set `torch_dtype=torch.float16`, the default fp32 weights are downloaded first and then converted to fp16.

- `variant` specifies which files should be loaded from the repository. For example, if you want to load a non-EMA variant of a UNet from [stable-diffusion-v1-5/stable-diffusion-v1-5](https://hf.co/stable-diffusion-v1-5/stable-diffusion-v1-5/tree/main/unet), set `variant="non_ema"` to download the `non_ema` file.

<hfoptions id="variants">
<hfoption id="fp16">

```py
from diffusers import DiffusionPipeline
import torch

pipeline = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", variant="fp16", torch_dtype=torch.float16, use_safetensors=True
)
```

</hfoption>
<hfoption id="non-EMA">

```py
pipeline = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", variant="non_ema", use_safetensors=True
)
```

</hfoption>
</hfoptions>

Use the `variant` parameter in the [`DiffusionPipeline.save_pretrained`] method to save a checkpoint as a different floating point type or as a non-EMA variant. You should try to save a variant to the same folder as the original checkpoint, so you have the option of loading both from the same folder.

<hfoptions id="save">
<hfoption id="fp16">

```python
from diffusers import DiffusionPipeline

pipeline.save_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", variant="fp16")
```

</hfoption>
<hfoption id="non_ema">

```py
pipeline.save_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", variant="non_ema")
```

</hfoption>
</hfoptions>

If you don't save the variant to an existing folder, you must specify the `variant` argument; otherwise, it'll throw an `Exception` because it can't find the original checkpoint.

```python
# 👎 this won't work
pipeline = DiffusionPipeline.from_pretrained(
    "./stable-diffusion-v1-5", torch_dtype=torch.float16, use_safetensors=True
)
# 👍 this works
pipeline = DiffusionPipeline.from_pretrained(
    "./stable-diffusion-v1-5", variant="fp16", torch_dtype=torch.float16, use_safetensors=True
)
```

## DiffusionPipeline explained

As a class method, [`DiffusionPipeline.from_pretrained`] is responsible for two things:

- Download the latest version of the folder structure required for inference and cache it. If the latest folder structure is available in the local cache, [`DiffusionPipeline.from_pretrained`] reuses the cache and won't redownload the files.
- Load the cached weights into the correct pipeline [class](../api/pipelines/overview#diffusers-summary) - retrieved from the `model_index.json` file - and return an instance of it.

The pipelines' underlying folder structure corresponds directly with their class instances. For example, the [`StableDiffusionPipeline`] corresponds to the folder structure in [`stable-diffusion-v1-5/stable-diffusion-v1-5`](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5).

```python
from diffusers import DiffusionPipeline

repo_id = "stable-diffusion-v1-5/stable-diffusion-v1-5"
pipeline = DiffusionPipeline.from_pretrained(repo_id, use_safetensors=True)
print(pipeline)
```

You'll see the pipeline is an instance of [`StableDiffusionPipeline`], which consists of seven components:

- `"feature_extractor"`: a [`~transformers.CLIPImageProcessor`] from 🤗 Transformers.
- `"safety_checker"`: a [component](https://github.com/huggingface/diffusers/blob/e55687e1e15407f60f32242027b7bb8170e58266/src/diffusers/pipelines/stable_diffusion/safety_checker.py#L32) for screening against harmful content.
- `"scheduler"`: an instance of [`PNDMScheduler`].
- `"text_encoder"`: a [`~transformers.CLIPTextModel`] from 🤗 Transformers.
- `"tokenizer"`: a [`~transformers.CLIPTokenizer`] from 🤗 Transformers.
- `"unet"`: an instance of [`UNet2DConditionModel`].
- `"vae"`: an instance of [`AutoencoderKL`].

```json
StableDiffusionPipeline {
  "feature_extractor": [
    "transformers",
    "CLIPImageProcessor"
  ],
  "safety_checker": [
    "stable_diffusion",
    "StableDiffusionSafetyChecker"
  ],
  "scheduler": [
    "diffusers",
    "PNDMScheduler"
  ],
  "text_encoder": [
    "transformers",
    "CLIPTextModel"
  ],
  "tokenizer": [
    "transformers",
    "CLIPTokenizer"
  ],
  "unet": [
    "diffusers",
    "UNet2DConditionModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}
```

Compare the components of the pipeline instance to the [`stable-diffusion-v1-5/stable-diffusion-v1-5`](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5/tree/main) folder structure, and you'll see there is a separate folder for each of the components in the repository:

```
.
├── feature_extractor
│   └── preprocessor_config.json
├── model_index.json
├── safety_checker
│   ├── config.json
│   ├── model.fp16.safetensors
│   ├── model.safetensors
│   ├── pytorch_model.bin
│   └── pytorch_model.fp16.bin
├── scheduler
│   └── scheduler_config.json
├── text_encoder
│   ├── config.json
│   ├── model.fp16.safetensors
│   ├── model.safetensors
│   ├── pytorch_model.bin
│   └── pytorch_model.fp16.bin
├── tokenizer
│   ├── merges.txt
│   ├── special_tokens_map.json
│   ├── tokenizer_config.json
│   └── vocab.json
├── unet
│   ├── config.json
│   ├── diffusion_pytorch_model.bin
│   ├── diffusion_pytorch_model.fp16.bin
│   ├── diffusion_pytorch_model.fp16.safetensors
│   ├── diffusion_pytorch_model.non_ema.bin
│   ├── diffusion_pytorch_model.non_ema.safetensors
│   └── diffusion_pytorch_model.safetensors
└── vae
    ├── config.json
    ├── diffusion_pytorch_model.bin
    ├── diffusion_pytorch_model.fp16.bin
    ├── diffusion_pytorch_model.fp16.safetensors
    └── diffusion_pytorch_model.safetensors
```

You can access each of the components of the pipeline as an attribute to view its configuration:

```py
pipeline.tokenizer
CLIPTokenizer(
    name_or_path="/root/.cache/huggingface/hub/models--runwayml--stable-diffusion-v1-5/snapshots/39593d5650112b4cc580433f6b0435385882d819/tokenizer",
    vocab_size=49408,
    model_max_length=77,
    is_fast=False,
    padding_side="right",
    truncation_side="right",
    special_tokens={
        "bos_token": AddedToken("<|startoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True),
        "eos_token": AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True),
        "unk_token": AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True),
        "pad_token": "<|endoftext|>",
    },
    clean_up_tokenization_spaces=True
)
```

Every pipeline expects a [`model_index.json`](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5/blob/main/model_index.json) file that tells the [`DiffusionPipeline`]:

- which pipeline class to load from `_class_name`
- which version of 🧨 Diffusers was used to create the model in `_diffusers_version`
- what components from which library are stored in the subfolders (`name` corresponds to the component and subfolder name, `library` corresponds to the name of the library to load the class from, and `class` corresponds to the class name)
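To make the `name`/`library`/`class` mapping concrete, here is a sketch that walks a truncated, hand-written `model_index.json` dictionary the way the points above describe (the parsing helper is illustrative, not the Diffusers loader itself):

```python
import json

# a truncated model_index.json; keys starting with "_" are pipeline-level
# metadata, every other key maps a component name to [library, class]
model_index = json.loads("""
{
  "_class_name": "StableDiffusionPipeline",
  "_diffusers_version": "0.6.0",
  "scheduler": ["diffusers", "PNDMScheduler"],
  "unet": ["diffusers", "UNet2DConditionModel"]
}
""")

pipeline_class = model_index["_class_name"]
components = {
    name: {"library": value[0], "class": value[1]}
    for name, value in model_index.items()
    if not name.startswith("_")
}
print(pipeline_class)        # StableDiffusionPipeline
print(components["unet"])    # {'library': 'diffusers', 'class': 'UNet2DConditionModel'}
```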

```json
{
  "_class_name": "StableDiffusionPipeline",
  "_diffusers_version": "0.6.0",
  "feature_extractor": [
    "transformers",
    "CLIPImageProcessor"
  ],
  "safety_checker": [
    "stable_diffusion",
    "StableDiffusionSafetyChecker"
  ],
  "scheduler": [
    "diffusers",
    "PNDMScheduler"
  ],
  "text_encoder": [
    "transformers",
    "CLIPTextModel"
  ],
  "tokenizer": [
    "transformers",
    "CLIPTokenizer"
  ],
  "unet": [
    "diffusers",
    "UNet2DConditionModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}
```
diffusers/docs/source/en/using-diffusers/other-formats.md
ADDED
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Model files and layouts

[[open-in-colab]]

Diffusion models are saved in various file types and organized in different layouts. Diffusers stores model weights as safetensors files in the *Diffusers-multifolder* layout, and it also supports loading files (like safetensors and ckpt files) from a *single-file* layout which is commonly used in the diffusion ecosystem.

Each layout has its own benefits and use cases, and this guide will show you how to load the different files and layouts, and how to convert them.

## Files

PyTorch model weights are typically saved with Python's [pickle](https://docs.python.org/3/library/pickle.html) utility as ckpt or bin files. However, pickle is not secure and pickled files may contain malicious code that can be executed. This vulnerability is a serious concern given the popularity of model sharing. To address this security issue, the [Safetensors](https://hf.co/docs/safetensors) library was developed as a secure alternative to pickle, which saves models as safetensors files.

### safetensors

> [!TIP]
> Learn more about the design decisions and why safetensor files are preferred for saving and loading model weights in the [Safetensors audited as really safe and becoming the default](https://blog.eleuther.ai/safetensors-security-audit/) blog post.

[Safetensors](https://hf.co/docs/safetensors) is a safe and fast file format for securely storing and loading tensors. Safetensors restricts the header size to limit certain types of attacks, supports lazy loading (useful for distributed setups), and has generally faster loading speeds.
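Lazy loading is possible because a safetensors file starts with a small JSON header that records each tensor's dtype, shape, and byte range, so a reader can locate a tensor without deserializing the whole file. A rough sketch of that layout, building and parsing a minimal blob by hand rather than with the safetensors library:

```python
import json
import struct

# hand-build a minimal safetensors-style blob: an 8-byte little-endian
# header length, a JSON header describing one tensor, then the raw data
header = json.dumps(
    {"weight": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]}}
).encode("utf-8")
data = struct.pack("<2f", 1.0, 2.0)  # two float32 values = 8 bytes
blob = struct.pack("<Q", len(header)) + header + data

# a reader only needs the header to know where each tensor lives
(header_len,) = struct.unpack("<Q", blob[:8])
parsed = json.loads(blob[8 : 8 + header_len])
print(parsed["weight"]["shape"])  # [2]
```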

Make sure you have the [Safetensors](https://hf.co/docs/safetensors) library installed.

```py
!pip install safetensors
```

Safetensors stores weights in a safetensors file. Diffusers loads safetensors files by default if they're available and the Safetensors library is installed. There are two ways safetensors files can be organized:

1. Diffusers-multifolder layout: there may be several separate safetensors files, one for each pipeline component (text encoder, UNet, VAE), organized in subfolders (check out the [stable-diffusion-v1-5/stable-diffusion-v1-5](https://hf.co/stable-diffusion-v1-5/stable-diffusion-v1-5/tree/main) repository as an example)
2. single-file layout: all the model weights may be saved in a single file (check out the [WarriorMama777/OrangeMixs](https://hf.co/WarriorMama777/OrangeMixs/tree/main/Models/AbyssOrangeMix) repository as an example)

<hfoptions id="safetensors">
<hfoption id="multifolder">

Use the [`~DiffusionPipeline.from_pretrained`] method to load a model with safetensors files stored in multiple folders.

```py
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    use_safetensors=True
)
```

</hfoption>
<hfoption id="single file">

Use the [`~loaders.FromSingleFileMixin.from_single_file`] method to load a model with all the weights stored in a single safetensors file.

```py
from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_single_file(
    "https://huggingface.co/WarriorMama777/OrangeMixs/blob/main/Models/AbyssOrangeMix/AbyssOrangeMix.safetensors"
)
```

</hfoption>
</hfoptions>

#### LoRAs

[LoRAs](../tutorials/using_peft_for_inference) are lightweight checkpoints fine-tuned to generate images or video in a specific style. If you are using a checkpoint trained with a Diffusers training script, the LoRA configuration is automatically saved as metadata in a safetensors file. When the safetensors file is loaded, the metadata is parsed to correctly configure the LoRA and avoid missing or incorrect LoRA configurations.

The easiest way to inspect the metadata, if available, is by clicking on the Safetensors logo next to the weights.

<div class="flex justify-center">
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/safetensors_lora.png"/>
</div>

For LoRAs that aren't trained with Diffusers, you can still save metadata with the `transformer_lora_adapter_metadata` and `text_encoder_lora_adapter_metadata` arguments in [`~loaders.FluxLoraLoaderMixin.save_lora_weights`] as long as it is a safetensors file.

```py
import torch
from diffusers import FluxPipeline

pipeline = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipeline.load_lora_weights("linoyts/yarn_art_Flux_LoRA")
pipeline.save_lora_weights(
    transformer_lora_adapter_metadata={"r": 16, "lora_alpha": 16},
    text_encoder_lora_adapter_metadata={"r": 8, "lora_alpha": 8}
)
```

### ckpt

> [!WARNING]
> Pickled files may be unsafe because they can be exploited to execute malicious code. It is recommended to use safetensors files instead where possible, or convert the weights to safetensors files.

PyTorch's [torch.save](https://pytorch.org/docs/stable/generated/torch.save.html) function uses Python's [pickle](https://docs.python.org/3/library/pickle.html) utility to serialize and save models. These files are saved as a ckpt file and they contain the entire model's weights.

Use the [`~loaders.FromSingleFileMixin.from_single_file`] method to directly load a ckpt file.

```py
from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_single_file(
    "https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5/blob/main/v1-5-pruned.ckpt"
)
```

## Storage layout

There are two ways model files are organized, either in a Diffusers-multifolder layout or in a single-file layout. The Diffusers-multifolder layout is the default, and each component file (text encoder, UNet, VAE) is stored in a separate subfolder. Diffusers also supports loading models from a single-file layout where all the components are bundled together.

### Diffusers-multifolder

The Diffusers-multifolder layout is the default storage layout for Diffusers. Each component's (text encoder, UNet, VAE) weights are stored in a separate subfolder. The weights can be stored as safetensors or ckpt files.

<div class="flex flex-row gap-4">
  <div class="flex-1">
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/multifolder-layout.png"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">multifolder layout</figcaption>
  </div>
  <div class="flex-1">
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/multifolder-unet.png"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">UNet subfolder</figcaption>
  </div>
</div>

To load from Diffusers-multifolder layout, use the [`~DiffusionPipeline.from_pretrained`] method.

```py
import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")
```

Benefits of using the Diffusers-multifolder layout include:

1. Faster to load each component file individually or in parallel.
2. Reduced memory usage because you only load the components you need. For example, models like [SDXL Turbo](https://hf.co/stabilityai/sdxl-turbo), [SDXL Lightning](https://hf.co/ByteDance/SDXL-Lightning), and [Hyper-SD](https://hf.co/ByteDance/Hyper-SD) have the same components except for the UNet. You can reuse their shared components with the [`~DiffusionPipeline.from_pipe`] method without consuming any additional memory (take a look at the [Reuse a pipeline](./loading#reuse-a-pipeline) guide) and only load the UNet. This way, you don't need to download redundant components and unnecessarily use more memory.

   ```py
   import torch
   from diffusers import StableDiffusionXLPipeline, UNet2DConditionModel, EulerDiscreteScheduler

   # download one model
   sdxl_pipeline = StableDiffusionXLPipeline.from_pretrained(
       "stabilityai/stable-diffusion-xl-base-1.0",
       torch_dtype=torch.float16,
       variant="fp16",
       use_safetensors=True,
   ).to("cuda")

   # switch UNet for another model
   unet = UNet2DConditionModel.from_pretrained(
       "stabilityai/sdxl-turbo",
       subfolder="unet",
       torch_dtype=torch.float16,
       variant="fp16",
       use_safetensors=True
   )
   # reuse all the same components in new model except for the UNet
   turbo_pipeline = StableDiffusionXLPipeline.from_pipe(
       sdxl_pipeline, unet=unet,
   ).to("cuda")
   turbo_pipeline.scheduler = EulerDiscreteScheduler.from_config(
       turbo_pipeline.scheduler.config,
       timestep_spacing="trailing"
   )
   image = turbo_pipeline(
       "an astronaut riding a unicorn on mars",
       num_inference_steps=1,
       guidance_scale=0.0,
   ).images[0]
   image
   ```

3. Reduced storage requirements because if a component, such as the SDXL [VAE](https://hf.co/madebyollin/sdxl-vae-fp16-fix), is shared across multiple models, you only need to download and store a single copy of it instead of downloading and storing it multiple times. For 10 SDXL models, this can save ~3.5GB of storage. The storage savings are even greater for newer models like PixArt Sigma, where the [text encoder](https://hf.co/PixArt-alpha/PixArt-Sigma-XL-2-1024-MS/tree/main/text_encoder) alone is ~19GB!
4. Flexibility to replace a component in the model with a newer or better version.

    ```py
    import torch
    from diffusers import DiffusionPipeline, AutoencoderKL

    vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16, use_safetensors=True)
    pipeline = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        vae=vae,
        torch_dtype=torch.float16,
        variant="fp16",
        use_safetensors=True,
    ).to("cuda")
    ```

5. More visibility and information about a model's components, which are stored in a [config.json](https://hf.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/unet/config.json) file in each component subfolder.
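
Because each component's config.json is plain JSON, its attributes can be inspected without loading any model weights. The sketch below is illustrative only: the sample dict mirrors a few fields from a typical UNet config (exact keys vary by model), and `summarize_component` is a hypothetical helper, not a Diffusers API.

```python
import json

# Illustrative only: a few fields as they appear in a typical unet/config.json.
config_json = """
{
  "_class_name": "UNet2DConditionModel",
  "in_channels": 4,
  "sample_size": 128,
  "cross_attention_dim": 2048
}
"""

def summarize_component(config: dict) -> str:
    # One-line summary: class name plus the non-private attributes.
    cls = config.get("_class_name", "unknown")
    details = ", ".join(
        f"{k}={v}" for k, v in sorted(config.items()) if not k.startswith("_")
    )
    return f"{cls}({details})"

print(summarize_component(json.loads(config_json)))
```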

### Single-file

The single-file layout stores all the model weights in a single file. The weights of all the model components (text encoder, UNet, VAE) are kept together instead of in separate subfolders. This can be a safetensors or ckpt file.

<div class="flex justify-center">
  <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/single-file-layout.png"/>
</div>

To load from a single-file layout, use the [`~loaders.FromSingleFileMixin.from_single_file`] method.

```py
import torch
from diffusers import StableDiffusionXLPipeline

pipeline = StableDiffusionXLPipeline.from_single_file(
    "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/sd_xl_base_1.0.safetensors",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")
```

Benefits of using a single-file layout include:

1. Easy compatibility with diffusion interfaces such as [ComfyUI](https://github.com/comfyanonymous/ComfyUI) or [Automatic1111](https://github.com/AUTOMATIC1111/stable-diffusion-webui), which commonly use a single-file layout.
2. Easier to manage (download and share) a single file.

### DDUF

> [!WARNING]
> DDUF is an experimental file format and APIs related to it can change in the future.

DDUF (**D**DUF **D**iffusion **U**nified **F**ormat) is a file format designed to make storing, distributing, and using diffusion models much easier. Built on the ZIP file format, DDUF offers a standardized, efficient, and flexible way to package all parts of a diffusion model into a single, easy-to-manage file. It provides a balance between the Diffusers multi-folder format and the widely popular single-file format.
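
Because DDUF is built on ZIP, a `.dduf` archive can be inspected with Python's standard `zipfile` module. The sketch below builds a tiny mock archive in memory with a multi-folder-like structure (not a real model) and lists its entries; the assumption that entries are stored uncompressed (`ZIP_STORED`) follows the DDUF specification.

```python
import io
import zipfile

# A tiny mock archive (not a real model) mimicking the multi-folder structure.
mock_entries = {
    "model_index.json": "{}",
    "unet/config.json": "{}",
    "vae/config.json": "{}",
}

buf = io.BytesIO()
# DDUF stores entries uncompressed, so ZIP_STORED is used here.
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    for name, text in mock_entries.items():
        zf.writestr(name, text)

# Any ZIP tool can now list the archive's contents.
with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
print(names)
```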

Learn more details about DDUF on the Hugging Face Hub [documentation](https://huggingface.co/docs/hub/dduf).

Pass a checkpoint to the `dduf_file` parameter to load it in [`DiffusionPipeline`].

```py
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "DDUF/FLUX.1-dev-DDUF", dduf_file="FLUX.1-dev.dduf", torch_dtype=torch.bfloat16
).to("cuda")
image = pipe(
    "photo of a cat holding a sign that says Diffusers", num_inference_steps=50, guidance_scale=3.5
).images[0]
image.save("cat.png")
```

To save a pipeline as a `.dduf` checkpoint, use the [`~huggingface_hub.export_folder_as_dduf`] utility, which takes care of all the necessary file-level validations.

```py
from huggingface_hub import export_folder_as_dduf
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)

save_folder = "flux-dev"
pipe.save_pretrained(save_folder)
export_folder_as_dduf("flux-dev.dduf", folder_path=save_folder)
```

> [!TIP]
> Packaging and loading quantized checkpoints in the DDUF format is supported as long as they respect the multi-folder structure.

## Convert layout and files

Diffusers provides many scripts and methods to convert storage layouts and file formats to enable broader support across the diffusion ecosystem.

Take a look at the [diffusers/scripts](https://github.com/huggingface/diffusers/tree/main/scripts) collection to find a script that fits your conversion needs.

> [!TIP]
> Scripts that have "`to_diffusers`" appended at the end convert a model to the Diffusers-multifolder layout. Each script has its own specific set of arguments for configuring the conversion, so make sure you check what arguments are available!

For example, to convert a Stable Diffusion XL model stored in Diffusers-multifolder layout to a single-file layout, run the [convert_diffusers_to_original_sdxl.py](https://github.com/huggingface/diffusers/blob/main/scripts/convert_diffusers_to_original_sdxl.py) script. Provide the path of the model to convert and the path to save the converted model to. You can optionally specify whether to save the model as a safetensors file and whether to save it in half-precision.

```bash
python convert_diffusers_to_original_sdxl.py --model_path path/to/model/to/convert --checkpoint_path path/to/save/model/to --use_safetensors
```

You can also save a model to the Diffusers-multifolder layout with the [`~DiffusionPipeline.save_pretrained`] method. This creates a directory for you if it doesn't already exist, and it saves the files as safetensors by default.

```py
from diffusers import StableDiffusionXLPipeline

pipeline = StableDiffusionXLPipeline.from_single_file(
    "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/sd_xl_base_1.0.safetensors",
)
pipeline.save_pretrained("sdxl-multifolder")  # save_pretrained requires a save directory
```

Lastly, there are also Spaces, such as [SD To Diffusers](https://hf.co/spaces/diffusers/sd-to-diffusers) and [SD-XL To Diffusers](https://hf.co/spaces/diffusers/sdxl-to-diffusers), that provide a more user-friendly interface for converting models to the Diffusers-multifolder layout. This is the easiest and most convenient option for converting layouts, and it'll open a PR on your model repository with the converted files. However, this option is not as reliable as running a script, and the Space may fail for more complicated models.

## Single-file layout usage

Now that you're familiar with the differences between the Diffusers-multifolder and single-file layout, this section shows you how to load models and pipeline components, customize configuration options for loading, and load local files with the [`~loaders.FromSingleFileMixin.from_single_file`] method.

### Load a pipeline or model

Pass the file path of the pipeline or model to the [`~loaders.FromSingleFileMixin.from_single_file`] method to load it.

<hfoptions id="pipeline-model">
<hfoption id="pipeline">

```py
from diffusers import StableDiffusionXLPipeline

ckpt_path = "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/sd_xl_base_1.0_0.9vae.safetensors"
pipeline = StableDiffusionXLPipeline.from_single_file(ckpt_path)
```

</hfoption>
<hfoption id="model">

```py
from diffusers import StableCascadeUNet

ckpt_path = "https://huggingface.co/stabilityai/stable-cascade/blob/main/stage_b_lite.safetensors"
model = StableCascadeUNet.from_single_file(ckpt_path)
```

</hfoption>
</hfoptions>

Customize components in the pipeline by passing them directly to the [`~loaders.FromSingleFileMixin.from_single_file`] method. For example, you can use a different scheduler in a pipeline.

```py
from diffusers import StableDiffusionXLPipeline, DDIMScheduler

ckpt_path = "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/sd_xl_base_1.0_0.9vae.safetensors"
scheduler = DDIMScheduler()
pipeline = StableDiffusionXLPipeline.from_single_file(ckpt_path, scheduler=scheduler)
```

Or you could use a ControlNet model in the pipeline.

```py
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

ckpt_path = "https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5/blob/main/v1-5-pruned-emaonly.safetensors"
controlnet = ControlNetModel.from_pretrained("lllyasviel/control_v11p_sd15_canny")
pipeline = StableDiffusionControlNetPipeline.from_single_file(ckpt_path, controlnet=controlnet)
```

### Customize configuration options

Models have a configuration file that defines their attributes, like the number of inputs in a UNet. A pipeline's configuration options are available in the pipeline's class. For example, if you look at the [`StableDiffusionXLInstructPix2PixPipeline`] class, there is an option to scale the image latents with the `is_cosxl_edit` parameter.

These configuration files can be found in the model's Hub repository or another location from which the configuration file originated (for example, a GitHub repository or locally on your device).

<hfoptions id="config-file">
<hfoption id="Hub configuration file">

> [!TIP]
> The [`~loaders.FromSingleFileMixin.from_single_file`] method automatically maps the checkpoint to the appropriate model repository, but there are cases where it is useful to use the `config` parameter. For example, if the model components in the checkpoint are different from the original checkpoint or if a checkpoint doesn't have the necessary metadata to correctly determine the configuration to use for the pipeline.

The [`~loaders.FromSingleFileMixin.from_single_file`] method automatically determines the configuration to use from the configuration file in the model repository. You could also explicitly specify the configuration to use by providing the repository id to the `config` parameter.

```py
from diffusers import StableDiffusionXLPipeline

ckpt_path = "https://huggingface.co/segmind/SSD-1B/blob/main/SSD-1B.safetensors"
repo_id = "segmind/SSD-1B"

pipeline = StableDiffusionXLPipeline.from_single_file(ckpt_path, config=repo_id)
```

The model loads the configuration files for the [UNet](https://huggingface.co/segmind/SSD-1B/blob/main/unet/config.json), [VAE](https://huggingface.co/segmind/SSD-1B/blob/main/vae/config.json), and [text encoder](https://huggingface.co/segmind/SSD-1B/blob/main/text_encoder/config.json) from their respective subfolders in the repository.

</hfoption>
<hfoption id="original configuration file">

The [`~loaders.FromSingleFileMixin.from_single_file`] method can also load the original configuration file of a pipeline that is stored elsewhere. Pass a local path or URL of the original configuration file to the `original_config` parameter.

```py
from diffusers import StableDiffusionXLPipeline

ckpt_path = "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/sd_xl_base_1.0_0.9vae.safetensors"
original_config = "https://raw.githubusercontent.com/Stability-AI/generative-models/main/configs/inference/sd_xl_base.yaml"

pipeline = StableDiffusionXLPipeline.from_single_file(ckpt_path, original_config=original_config)
```

> [!TIP]
> Diffusers attempts to infer the pipeline components based on the type signatures of the pipeline class when you use `original_config` with `local_files_only=True`, instead of fetching the configuration files from the model repository on the Hub. This prevents breaking changes in code that can't connect to the internet to fetch the necessary configuration files.
>
> This is not as reliable as providing a path to a local model repository with the `config` parameter, and might lead to errors during pipeline configuration. To avoid errors, run the pipeline with `local_files_only=False` once to download the appropriate pipeline configuration files to the local cache.

</hfoption>
</hfoptions>

While the configuration files specify the pipeline's or model's default parameters, you can override them by providing the parameters directly to the [`~loaders.FromSingleFileMixin.from_single_file`] method. Any parameter supported by the model or pipeline class can be configured in this way.

<hfoptions id="override">
<hfoption id="pipeline">

For example, to scale the image latents in [`StableDiffusionXLInstructPix2PixPipeline`], pass the `is_cosxl_edit` parameter.

```python
from diffusers import StableDiffusionXLInstructPix2PixPipeline

ckpt_path = "https://huggingface.co/stabilityai/cosxl/blob/main/cosxl_edit.safetensors"
pipeline = StableDiffusionXLInstructPix2PixPipeline.from_single_file(ckpt_path, config="diffusers/sdxl-instructpix2pix-768", is_cosxl_edit=True)
```

</hfoption>
<hfoption id="model">

For example, to upcast the attention dimensions in a [`UNet2DConditionModel`], pass the `upcast_attention` parameter.

```python
from diffusers import UNet2DConditionModel

ckpt_path = "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/sd_xl_base_1.0_0.9vae.safetensors"
model = UNet2DConditionModel.from_single_file(ckpt_path, upcast_attention=True)
```

</hfoption>
</hfoptions>

### Local files

In Diffusers>=v0.28.0, the [`~loaders.FromSingleFileMixin.from_single_file`] method attempts to configure a pipeline or model by inferring the model type from the keys in the checkpoint file. The inferred model type is used to determine the appropriate model repository on the Hugging Face Hub to configure the model or pipeline.

For example, any single file checkpoint based on the Stable Diffusion XL base model will use the [stabilityai/stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) model repository to configure the pipeline.
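
The idea behind key-based inference can be sketched as follows. This is a simplified, hypothetical heuristic, not the actual Diffusers implementation; `guess_model_family` and the return labels are illustrative names, and the key prefixes are abbreviated examples of the state-dict keys found in original-format checkpoints.

```python
# Hypothetical sketch: state-dict key names in a single-file checkpoint hint at
# which model family it contains, which in turn selects a config repository.
def guess_model_family(state_dict_keys: list[str]) -> str:
    if any(k.startswith("conditioner.embedders.1") for k in state_dict_keys):
        return "sdxl"  # SDXL checkpoints carry a second text encoder
    if any(k.startswith("cond_stage_model.") for k in state_dict_keys):
        return "sd15"  # SD 1.x-style single text encoder
    return "unknown"

# Keys abbreviated for illustration only.
sdxl_like = [
    "conditioner.embedders.1.model.ln_final.weight",
    "model.diffusion_model.input_blocks.0.0.weight",
]
print(guess_model_family(sdxl_like))
```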

But if you're working in an environment with restricted internet access, you should download the configuration files with the [`~huggingface_hub.snapshot_download`] function, and the model checkpoint with the [`~huggingface_hub.hf_hub_download`] function. By default, these files are downloaded to the Hugging Face Hub [cache directory](https://huggingface.co/docs/huggingface_hub/en/guides/manage-cache), but you can specify a preferred directory to download the files to with the `local_dir` parameter.

Pass the configuration and checkpoint paths to the [`~loaders.FromSingleFileMixin.from_single_file`] method to load locally.

<hfoptions id="local">
<hfoption id="Hub cache directory">

```python
from huggingface_hub import hf_hub_download, snapshot_download
from diffusers import StableDiffusionXLPipeline

my_local_checkpoint_path = hf_hub_download(
    repo_id="segmind/SSD-1B",
    filename="SSD-1B.safetensors"
)

my_local_config_path = snapshot_download(
    repo_id="segmind/SSD-1B",
    allow_patterns=["*.json", "**/*.json", "*.txt", "**/*.txt"]
)

pipeline = StableDiffusionXLPipeline.from_single_file(my_local_checkpoint_path, config=my_local_config_path, local_files_only=True)
```

</hfoption>
<hfoption id="specific local directory">

```python
from huggingface_hub import hf_hub_download, snapshot_download
from diffusers import StableDiffusionXLPipeline

my_local_checkpoint_path = hf_hub_download(
    repo_id="segmind/SSD-1B",
    filename="SSD-1B.safetensors",
    local_dir="my_local_checkpoints"
)

my_local_config_path = snapshot_download(
    repo_id="segmind/SSD-1B",
    allow_patterns=["*.json", "**/*.json", "*.txt", "**/*.txt"],
    local_dir="my_local_config"
)

pipeline = StableDiffusionXLPipeline.from_single_file(my_local_checkpoint_path, config=my_local_config_path, local_files_only=True)
```

</hfoption>
</hfoptions>

#### Local files without symlink

> [!TIP]
> In huggingface_hub>=v0.23.0, the `local_dir_use_symlinks` argument isn't necessary for the [`~huggingface_hub.hf_hub_download`] and [`~huggingface_hub.snapshot_download`] functions.

The [`~loaders.FromSingleFileMixin.from_single_file`] method relies on the [huggingface_hub](https://hf.co/docs/huggingface_hub/index) caching mechanism to fetch and store checkpoints and configuration files for models and pipelines. If you're working with a file system that does not support symlinking, you should download the checkpoint file to a local directory first, and disable symlinking with the `local_dir_use_symlinks=False` parameter in the [`~huggingface_hub.hf_hub_download`] and [`~huggingface_hub.snapshot_download`] functions.

```python
from huggingface_hub import hf_hub_download, snapshot_download

my_local_checkpoint_path = hf_hub_download(
    repo_id="segmind/SSD-1B",
    filename="SSD-1B.safetensors",
    local_dir="my_local_checkpoints",
    local_dir_use_symlinks=False
)
print("My local checkpoint: ", my_local_checkpoint_path)

my_local_config_path = snapshot_download(
    repo_id="segmind/SSD-1B",
    allow_patterns=["*.json", "**/*.json", "*.txt", "**/*.txt"],
    local_dir="my_local_config",
    local_dir_use_symlinks=False,
)
print("My local config: ", my_local_config_path)
```

Then you can pass the local paths to the `pretrained_model_link_or_path` and `config` parameters.

```python
pipeline = StableDiffusionXLPipeline.from_single_file(my_local_checkpoint_path, config=my_local_config_path, local_files_only=True)
```