lsmpp committed
Commit 7c445e6 · verified · 1 Parent(s): e31d39e

Add files using upload-large-folder tool

This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. diffusers/.github/ISSUE_TEMPLATE/bug-report.yml +110 -0
  2. diffusers/.github/ISSUE_TEMPLATE/feature_request.md +20 -0
  3. diffusers/.github/ISSUE_TEMPLATE/new-model-addition.yml +31 -0
  4. diffusers/.github/ISSUE_TEMPLATE/remote-vae-pilot-feedback.yml +38 -0
  5. diffusers/.github/actions/setup-miniconda/action.yml +146 -0
  6. diffusers/.github/workflows/benchmark.yml +89 -0
  7. diffusers/.github/workflows/build_docker_images.yml +107 -0
  8. diffusers/.github/workflows/build_documentation.yml +27 -0
  9. diffusers/.github/workflows/build_pr_documentation.yml +23 -0
  10. diffusers/.github/workflows/mirror_community_pipeline.yml +102 -0
  11. diffusers/.github/workflows/nightly_tests.yml +612 -0
  12. diffusers/.github/workflows/notify_slack_about_release.yml +23 -0
  13. diffusers/.github/workflows/pr_dependency_test.yml +35 -0
  14. diffusers/.github/workflows/pr_flax_dependency_test.yml +38 -0
  15. diffusers/.github/workflows/pr_style_bot.yml +17 -0
  16. diffusers/.github/workflows/pr_test_fetcher.yml +177 -0
  17. diffusers/.github/workflows/pr_tests.yml +289 -0
  18. diffusers/.github/workflows/pr_tests_gpu.yml +296 -0
  19. diffusers/.github/workflows/pr_torch_dependency_test.yml +36 -0
  20. diffusers/.github/workflows/push_tests.yml +294 -0
  21. diffusers/.github/workflows/push_tests_fast.yml +98 -0
  22. diffusers/.github/workflows/push_tests_mps.yml +71 -0
  23. diffusers/.github/workflows/pypi_publish.yaml +81 -0
  24. diffusers/.github/workflows/release_tests_fast.yml +351 -0
  25. diffusers/.github/workflows/run_tests_from_a_pr.yml +74 -0
  26. diffusers/.github/workflows/ssh-pr-runner.yml +40 -0
  27. diffusers/.github/workflows/ssh-runner.yml +52 -0
  28. diffusers/.github/workflows/stale.yml +30 -0
  29. diffusers/.github/workflows/trufflehog.yml +18 -0
  30. diffusers/.github/workflows/typos.yml +14 -0
  31. diffusers/.github/workflows/update_metadata.yml +30 -0
  32. diffusers/.github/workflows/upload_pr_documentation.yml +16 -0
  33. diffusers/docs/source/_config.py +9 -0
  34. diffusers/docs/source/en/_toctree.yml +701 -0
  35. diffusers/docs/source/en/community_projects.md +90 -0
  36. diffusers/docs/source/en/conceptual/contribution.md +568 -0
  37. diffusers/docs/source/en/conceptual/ethical_guidelines.md +63 -0
  38. diffusers/docs/source/en/conceptual/evaluation.md +578 -0
  39. diffusers/docs/source/en/conceptual/philosophy.md +110 -0
  40. diffusers/docs/source/en/index.md +48 -0
  41. diffusers/docs/source/en/installation.md +194 -0
  42. diffusers/docs/source/en/quicktour.md +323 -0
  43. diffusers/docs/source/en/stable_diffusion.md +261 -0
  44. diffusers/docs/source/en/using-diffusers/conditional_image_generation.md +316 -0
  45. diffusers/docs/source/en/using-diffusers/consisid.md +96 -0
  46. diffusers/docs/source/en/using-diffusers/controlling_generation.md +217 -0
  47. diffusers/docs/source/en/using-diffusers/depth2img.md +46 -0
  48. diffusers/docs/source/en/using-diffusers/ip_adapter.md +790 -0
  49. diffusers/docs/source/en/using-diffusers/loading.md +583 -0
  50. diffusers/docs/source/en/using-diffusers/other-formats.md +512 -0
diffusers/.github/ISSUE_TEMPLATE/bug-report.yml ADDED
@@ -0,0 +1,110 @@
+ name: "\U0001F41B Bug Report"
+ description: Report a bug on Diffusers
+ labels: [ "bug" ]
+ body:
+   - type: markdown
+     attributes:
+       value: |
+         Thanks a lot for taking the time to file this issue 🤗.
+         Issues do not only help to improve the library, but also publicly document common problems, questions, and workflows for the whole community!
+         Thus, issues are of the same importance as pull requests when contributing to this library ❤️.
+         In order to make your issue as **useful for the community as possible**, let's try to stick to some simple guidelines:
+         - 1. Please try to be as precise and concise as possible.
+              *Give your issue a fitting title. Assume that someone with very limited knowledge of Diffusers can understand your issue. Add links to the source code, documentation, other issues, pull requests, etc...*
+         - 2. If your issue is about something not working, **always** provide a reproducible code snippet. The reader should be able to reproduce your issue by **only copy-pasting your code snippet into a Python shell**.
+              *The community cannot solve your issue if it cannot reproduce it. If your bug is related to training, add your training script and make everything needed to train public. Otherwise, just add a simple Python code snippet.*
+         - 3. Add the **minimum** amount of code / context that is needed to understand and reproduce your issue.
+              *Make the life of maintainers easy. `diffusers` gets many issues every day. Make sure your issue is about one bug and one bug only. Make sure you add only the context and code needed to understand your issue - nothing more. Generally, every issue is a way of documenting this library; try to make it a good documentation entry.*
+         - 4. For issues related to community pipelines (i.e., the pipelines located in the `examples/community` folder), please tag the author of the pipeline in your issue thread as those pipelines are not maintained.
+   - type: markdown
+     attributes:
+       value: |
+         For more in-detail information on how to write good issues you can have a look [here](https://huggingface.co/course/chapter8/5?fw=pt).
+   - type: textarea
+     id: bug-description
+     attributes:
+       label: Describe the bug
+       description: A clear and concise description of what the bug is. If you intend to submit a pull request for this issue, tell us in the description. Thanks!
+       placeholder: Bug description
+     validations:
+       required: true
+   - type: textarea
+     id: reproduction
+     attributes:
+       label: Reproduction
+       description: Please provide a minimal reproducible code snippet which we can copy/paste to reproduce the issue.
+       placeholder: Reproduction
+     validations:
+       required: true
+   - type: textarea
+     id: logs
+     attributes:
+       label: Logs
+       description: "Please include the Python logs if you can."
+       render: shell
+   - type: textarea
+     id: system-info
+     attributes:
+       label: System Info
+       description: Please share your system info with us. You can run the command `diffusers-cli env` and copy-paste its output below.
+       placeholder: Diffusers version, platform, Python version, ...
+     validations:
+       required: true
+   - type: textarea
+     id: who-can-help
+     attributes:
+       label: Who can help?
+       description: |
+         Your issue will be replied to more quickly if you can figure out the right person to tag with @.
+         If you know how to use git blame, that is the easiest way; otherwise, here is a rough guide of **who to tag**.
+
+         All issues are read by one of the core maintainers, so if you don't know who to tag, just leave this blank and
+         a core maintainer will ping the right person.
+
+         Please tag a maximum of 2 people.
+
+         Questions on DiffusionPipeline (Saving, Loading, From pretrained, ...): @sayakpaul @DN6
+
+         Questions on pipelines:
+           - Stable Diffusion @yiyixuxu @asomoza
+           - Stable Diffusion XL @yiyixuxu @sayakpaul @DN6
+           - Stable Diffusion 3: @yiyixuxu @sayakpaul @DN6 @asomoza
+           - Kandinsky @yiyixuxu
+           - ControlNet @sayakpaul @yiyixuxu @DN6
+           - T2I Adapter @sayakpaul @yiyixuxu @DN6
+           - IF @DN6
+           - Text-to-Video / Video-to-Video @DN6 @a-r-r-o-w
+           - Wuerstchen @DN6
+           - Other: @yiyixuxu @DN6
+           - Improving generation quality: @asomoza
+
+         Questions on models:
+           - UNet @DN6 @yiyixuxu @sayakpaul
+           - VAE @sayakpaul @DN6 @yiyixuxu
+           - Transformers/Attention @DN6 @yiyixuxu @sayakpaul
+
+         Questions on single file checkpoints: @DN6
+
+         Questions on Schedulers: @yiyixuxu
+
+         Questions on LoRA: @sayakpaul
+
+         Questions on Textual Inversion: @sayakpaul
+
+         Questions on Training:
+           - DreamBooth @sayakpaul
+           - Text-to-Image Fine-tuning @sayakpaul
+           - Textual Inversion @sayakpaul
+           - ControlNet @sayakpaul
+
+         Questions on Tests: @DN6 @sayakpaul @yiyixuxu
+
+         Questions on Documentation: @stevhliu
+
+         Questions on JAX- and MPS-related things: @pcuenca
+
+         Questions on audio pipelines: @sanchit-gandhi
+
+       placeholder: "@Username ..."
diffusers/.github/ISSUE_TEMPLATE/feature_request.md ADDED
@@ -0,0 +1,20 @@
+ ---
+ name: "\U0001F680 Feature Request"
+ about: Suggest an idea for this project
+ title: ''
+ labels: ''
+ assignees: ''
+
+ ---
+
+ **Is your feature request related to a problem? Please describe.**
+ A clear and concise description of what the problem is. Ex. I'm always frustrated when [...].
+
+ **Describe the solution you'd like.**
+ A clear and concise description of what you want to happen.
+
+ **Describe alternatives you've considered.**
+ A clear and concise description of any alternative solutions or features you've considered.
+
+ **Additional context.**
+ Add any other context or screenshots about the feature request here.
diffusers/.github/ISSUE_TEMPLATE/new-model-addition.yml ADDED
@@ -0,0 +1,31 @@
+ name: "\U0001F31F New Model/Pipeline/Scheduler Addition"
+ description: Submit a proposal/request to implement a new diffusion model/pipeline/scheduler
+ labels: [ "New model/pipeline/scheduler" ]
+
+ body:
+   - type: textarea
+     id: description-request
+     validations:
+       required: true
+     attributes:
+       label: Model/Pipeline/Scheduler description
+       description: |
+         Put any and all important information relative to the model/pipeline/scheduler
+
+   - type: checkboxes
+     id: information-tasks
+     attributes:
+       label: Open source status
+       description: |
+         Please note that if the model implementation isn't available or if the weights aren't open-source, we are less likely to implement it in `diffusers`.
+       options:
+         - label: "The model implementation is available."
+         - label: "The model weights are available (Only relevant if addition is not a scheduler)."
+
+   - type: textarea
+     id: additional-info
+     attributes:
+       label: Provide useful links for the implementation
+       description: |
+         Please provide information regarding the implementation, the weights, and the authors.
+         Please mention the authors by @gh-username if you're aware of their usernames.
diffusers/.github/ISSUE_TEMPLATE/remote-vae-pilot-feedback.yml ADDED
@@ -0,0 +1,38 @@
+ name: "\U0001F31F Remote VAE"
+ description: Feedback for remote VAE pilot
+ labels: [ "Remote VAE" ]
+
+ body:
+   - type: textarea
+     id: positive
+     validations:
+       required: true
+     attributes:
+       label: Did you like the remote VAE solution?
+       description: |
+         If you liked it, we would appreciate it if you could elaborate on what you liked.
+
+   - type: textarea
+     id: feedback
+     validations:
+       required: true
+     attributes:
+       label: What can be improved about the current solution?
+       description: |
+         Let us know the things you would like to see improved. Note that we will work on optimizing the solution once the pilot is over and we have usage data.
+
+   - type: textarea
+     id: others
+     validations:
+       required: true
+     attributes:
+       label: What other VAEs would you like to see if the pilot goes well?
+       description: |
+         Provide a list of the VAEs you would like to see in the future if the pilot goes well.
+
+   - type: textarea
+     id: additional-info
+     attributes:
+       label: Notify the members of the team
+       description: |
+         Tag the following folks when submitting this feedback: @hlky @sayakpaul
diffusers/.github/actions/setup-miniconda/action.yml ADDED
@@ -0,0 +1,146 @@
+ name: Set up conda environment for testing
+
+ description: Sets up miniconda in your ${RUNNER_TEMP} environment and gives you the ${CONDA_RUN} environment variable so you don't have to worry about polluting non-ephemeral runners anymore
+
+ inputs:
+   python-version:
+     description: Python version to install in the conda environment
+     required: false
+     type: string
+     default: "3.9"
+   miniconda-version:
+     description: Miniconda version to install
+     required: false
+     type: string
+     default: "4.12.0"
+   environment-file:
+     description: Environment file to install dependencies from
+     required: false
+     type: string
+     default: ""
+
+ runs:
+   using: composite
+   steps:
+     # Use the same trick from https://github.com/marketplace/actions/setup-miniconda
+     # to refresh the cache daily. This is kind of optional though
+     - name: Get date
+       id: get-date
+       shell: bash
+       run: echo "today=$(/bin/date -u '+%Y%m%d')d" >> $GITHUB_OUTPUT
+     - name: Setup miniconda cache
+       id: miniconda-cache
+       uses: actions/cache@v2
+       with:
+         path: ${{ runner.temp }}/miniconda
+         key: miniconda-${{ runner.os }}-${{ runner.arch }}-${{ inputs.python-version }}-${{ steps.get-date.outputs.today }}
+     - name: Install miniconda (${{ inputs.miniconda-version }})
+       if: steps.miniconda-cache.outputs.cache-hit != 'true'
+       env:
+         MINICONDA_VERSION: ${{ inputs.miniconda-version }}
+       shell: bash -l {0}
+       run: |
+         MINICONDA_INSTALL_PATH="${RUNNER_TEMP}/miniconda"
+         mkdir -p "${MINICONDA_INSTALL_PATH}"
+         case ${RUNNER_OS}-${RUNNER_ARCH} in
+           Linux-X64)
+             MINICONDA_ARCH="Linux-x86_64"
+             ;;
+           macOS-ARM64)
+             MINICONDA_ARCH="MacOSX-arm64"
+             ;;
+           macOS-X64)
+             MINICONDA_ARCH="MacOSX-x86_64"
+             ;;
+           *)
+             echo "::error::Platform ${RUNNER_OS}-${RUNNER_ARCH} currently unsupported using this action"
+             exit 1
+             ;;
+         esac
+         MINICONDA_URL="https://repo.anaconda.com/miniconda/Miniconda3-py39_${MINICONDA_VERSION}-${MINICONDA_ARCH}.sh"
+         curl -fsSL "${MINICONDA_URL}" -o "${MINICONDA_INSTALL_PATH}/miniconda.sh"
+         bash "${MINICONDA_INSTALL_PATH}/miniconda.sh" -b -u -p "${MINICONDA_INSTALL_PATH}"
+         rm -rf "${MINICONDA_INSTALL_PATH}/miniconda.sh"
+     - name: Update GitHub path to include miniconda install
+       shell: bash
+       run: |
+         MINICONDA_INSTALL_PATH="${RUNNER_TEMP}/miniconda"
+         echo "${MINICONDA_INSTALL_PATH}/bin" >> $GITHUB_PATH
+     - name: Setup miniconda env cache (with env file)
+       id: miniconda-env-cache-env-file
+       if: ${{ runner.os }} == 'macOS' && ${{ inputs.environment-file }} != ''
+       uses: actions/cache@v2
+       with:
+         path: ${{ runner.temp }}/conda-python-${{ inputs.python-version }}
+         key: miniconda-env-${{ runner.os }}-${{ runner.arch }}-${{ inputs.python-version }}-${{ steps.get-date.outputs.today }}-${{ hashFiles(inputs.environment-file) }}
+     - name: Setup miniconda env cache (without env file)
+       id: miniconda-env-cache
+       if: ${{ runner.os }} == 'macOS' && ${{ inputs.environment-file }} == ''
+       uses: actions/cache@v2
+       with:
+         path: ${{ runner.temp }}/conda-python-${{ inputs.python-version }}
+         key: miniconda-env-${{ runner.os }}-${{ runner.arch }}-${{ inputs.python-version }}-${{ steps.get-date.outputs.today }}
+     - name: Setup conda environment with python (v${{ inputs.python-version }})
+       if: steps.miniconda-env-cache-env-file.outputs.cache-hit != 'true' && steps.miniconda-env-cache.outputs.cache-hit != 'true'
+       shell: bash
+       env:
+         PYTHON_VERSION: ${{ inputs.python-version }}
+         ENV_FILE: ${{ inputs.environment-file }}
+       run: |
+         CONDA_BASE_ENV="${RUNNER_TEMP}/conda-python-${PYTHON_VERSION}"
+         ENV_FILE_FLAG=""
+         if [[ -f "${ENV_FILE}" ]]; then
+           ENV_FILE_FLAG="--file ${ENV_FILE}"
+         elif [[ -n "${ENV_FILE}" ]]; then
+           echo "::warning::Specified env file (${ENV_FILE}) not found, not going to include it"
+         fi
+         conda create \
+           --yes \
+           --prefix "${CONDA_BASE_ENV}" \
+           "python=${PYTHON_VERSION}" \
+           ${ENV_FILE_FLAG} \
+           cmake=3.22 \
+           conda-build=3.21 \
+           ninja=1.10 \
+           pkg-config=0.29 \
+           wheel=0.37
+     - name: Clone the base conda environment and update GitHub env
+       shell: bash
+       env:
+         PYTHON_VERSION: ${{ inputs.python-version }}
+         CONDA_BASE_ENV: ${{ runner.temp }}/conda-python-${{ inputs.python-version }}
+       run: |
+         CONDA_ENV="${RUNNER_TEMP}/conda_environment_${GITHUB_RUN_ID}"
+         conda create \
+           --yes \
+           --prefix "${CONDA_ENV}" \
+           --clone "${CONDA_BASE_ENV}"
+         # TODO: conda-build could not be cloned because it hardcodes the path, so it
+         # could not be cached
+         conda install --yes -p ${CONDA_ENV} conda-build=3.21
+         echo "CONDA_ENV=${CONDA_ENV}" >> "${GITHUB_ENV}"
+         echo "CONDA_RUN=conda run -p ${CONDA_ENV} --no-capture-output" >> "${GITHUB_ENV}"
+         echo "CONDA_BUILD=conda run -p ${CONDA_ENV} conda-build" >> "${GITHUB_ENV}"
+         echo "CONDA_INSTALL=conda install -p ${CONDA_ENV}" >> "${GITHUB_ENV}"
+     - name: Get disk space usage and throw an error for low disk space
+       shell: bash
+       run: |
+         echo "Print the available disk space for manual inspection"
+         df -h
+         # Set the minimum requirement space to 4GB
+         MINIMUM_AVAILABLE_SPACE_IN_GB=4
+         MINIMUM_AVAILABLE_SPACE_IN_KB=$(($MINIMUM_AVAILABLE_SPACE_IN_GB * 1024 * 1024))
+         # Use KB to avoid floating point warning like 3.1GB
+         df -k | tr -s ' ' | cut -d' ' -f 4,9 | while read -r LINE;
+         do
+           AVAIL=$(echo $LINE | cut -f1 -d' ')
+           MOUNT=$(echo $LINE | cut -f2 -d' ')
+           if [ "$MOUNT" = "/" ]; then
+             if [ "$AVAIL" -lt "$MINIMUM_AVAILABLE_SPACE_IN_KB" ]; then
+               echo "There is only ${AVAIL}KB free space left in $MOUNT, which is less than the minimum requirement of ${MINIMUM_AVAILABLE_SPACE_IN_KB}KB. Please help create an issue to PyTorch Release Engineering via https://github.com/pytorch/test-infra/issues and provide the link to the workflow run."
+               exit 1;
+             else
+               echo "There is ${AVAIL}KB free space left in $MOUNT, continue"
+             fi
+           fi
+         done
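The final step of the action above gates the run on free space at the root mount. As a minimal standalone sketch of that same check (assuming a Linux runner where `df -k` reports the root filesystem; the 4 GB floor mirrors the action's `MINIMUM_AVAILABLE_SPACE_IN_GB`), the parsing can be reduced to:

```shell
#!/usr/bin/env bash
# Sketch of the action's low-disk-space gate: query free KB on "/" and
# compare against a 4 GB minimum, failing the script when below it.
MINIMUM_AVAILABLE_SPACE_IN_KB=$((4 * 1024 * 1024))
# `df -k /` prints a header plus one data row; field 4 is available KB.
AVAIL=$(df -k / | tail -1 | tr -s ' ' | cut -d' ' -f4)
if [ "$AVAIL" -lt "$MINIMUM_AVAILABLE_SPACE_IN_KB" ]; then
  echo "low disk space: only ${AVAIL}KB free on /"
  exit 1
fi
echo "ok: ${AVAIL}KB free on /"
```

Using `df -k /` directly avoids the `while read` loop over every mount that the action uses, at the cost of only checking the root filesystem.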
diffusers/.github/workflows/benchmark.yml ADDED
@@ -0,0 +1,89 @@
+ name: Benchmarking tests
+
+ on:
+   workflow_dispatch:
+   schedule:
+     - cron: "30 1 1,15 * *" # on the 1st and the 15th of every month at 1:30 AM
+
+ env:
+   DIFFUSERS_IS_CI: yes
+   HF_HUB_ENABLE_HF_TRANSFER: 1
+   HF_HOME: /mnt/cache
+   OMP_NUM_THREADS: 8
+   MKL_NUM_THREADS: 8
+   BASE_PATH: benchmark_outputs
+
+ jobs:
+   torch_models_cuda_benchmark_tests:
+     env:
+       SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL_BENCHMARK }}
+     name: Torch Core Models CUDA Benchmarking Tests
+     strategy:
+       fail-fast: false
+       max-parallel: 1
+     runs-on:
+       group: aws-g6e-4xlarge
+     container:
+       image: diffusers/diffusers-pytorch-cuda
+       options: --shm-size "16gb" --ipc host --gpus 0
+     steps:
+       - name: Checkout diffusers
+         uses: actions/checkout@v3
+         with:
+           fetch-depth: 2
+       - name: NVIDIA-SMI
+         run: |
+           nvidia-smi
+       - name: Install dependencies
+         run: |
+           apt update
+           apt install -y libpq-dev postgresql-client
+           python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+           python -m uv pip install -e [quality,test]
+           python -m uv pip install -r benchmarks/requirements.txt
+       - name: Environment
+         run: |
+           python utils/print_env.py
+       - name: Diffusers Benchmarking
+         env:
+           HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+         run: |
+           cd benchmarks && python run_all.py
+
+       - name: Push results to the Hub
+         env:
+           HF_TOKEN: ${{ secrets.DIFFUSERS_BOT_TOKEN }}
+         run: |
+           cd benchmarks && python push_results.py
+           mkdir $BASE_PATH && cp *.csv $BASE_PATH
+
+       - name: Test suite reports artifacts
+         if: ${{ always() }}
+         uses: actions/upload-artifact@v4
+         with:
+           name: benchmark_test_reports
+           path: benchmarks/${{ env.BASE_PATH }}
+
+       # TODO: enable this once the connection problem has been resolved.
+       - name: Update benchmarking results to DB
+         env:
+           PGDATABASE: metrics
+           PGHOST: ${{ secrets.DIFFUSERS_BENCHMARKS_PGHOST }}
+           PGUSER: transformers_benchmarks
+           PGPASSWORD: ${{ secrets.DIFFUSERS_BENCHMARKS_PGPASSWORD }}
+           BRANCH_NAME: ${{ github.head_ref || github.ref_name }}
+         run: |
+           git config --global --add safe.directory /__w/diffusers/diffusers
+           commit_id=$GITHUB_SHA
+           commit_msg=$(git show -s --format=%s "$commit_id" | cut -c1-70)
+           cd benchmarks && python populate_into_db.py "$BRANCH_NAME" "$commit_id" "$commit_msg"
+
+       - name: Report success status
+         if: ${{ success() }}
+         run: |
+           pip install requests && python utils/notify_benchmarking_status.py --status=success
+
+       - name: Report failure status
+         if: ${{ failure() }}
+         run: |
+           pip install requests && python utils/notify_benchmarking_status.py --status=failure
diffusers/.github/workflows/build_docker_images.yml ADDED
@@ -0,0 +1,107 @@
+ name: Test, build, and push Docker images
+
+ on:
+   pull_request: # During PRs, we just check if the changed Dockerfiles can be successfully built
+     branches:
+       - main
+     paths:
+       - "docker/**"
+   workflow_dispatch:
+   schedule:
+     - cron: "0 0 * * *" # every day at midnight
+
+ concurrency:
+   group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
+   cancel-in-progress: true
+
+ env:
+   REGISTRY: diffusers
+   CI_SLACK_CHANNEL: ${{ secrets.CI_DOCKER_CHANNEL }}
+
+ jobs:
+   test-build-docker-images:
+     runs-on:
+       group: aws-general-8-plus
+     if: github.event_name == 'pull_request'
+     steps:
+       - name: Set up Docker Buildx
+         uses: docker/setup-buildx-action@v1
+
+       - name: Check out code
+         uses: actions/checkout@v3
+
+       - name: Find Changed Dockerfiles
+         id: file_changes
+         uses: jitterbit/get-changed-files@v1
+         with:
+           format: "space-delimited"
+           token: ${{ secrets.GITHUB_TOKEN }}
+
+       - name: Build Changed Docker Images
+         env:
+           CHANGED_FILES: ${{ steps.file_changes.outputs.all }}
+         run: |
+           echo "$CHANGED_FILES"
+           for FILE in $CHANGED_FILES; do
+             # skip anything that isn't still on disk
+             if [[ ! -f "$FILE" ]]; then
+               echo "Skipping removed file $FILE"
+               continue
+             fi
+             if [[ "$FILE" == docker/*Dockerfile ]]; then
+               DOCKER_PATH="${FILE%/Dockerfile}"
+               DOCKER_TAG=$(basename "$DOCKER_PATH")
+               echo "Building Docker image for $DOCKER_TAG"
+               docker build -t "$DOCKER_TAG" "$DOCKER_PATH"
+             fi
+           done
+         if: steps.file_changes.outputs.all != ''
+
+   build-and-push-docker-images:
+     runs-on:
+       group: aws-general-8-plus
+     if: github.event_name != 'pull_request'
+
+     permissions:
+       contents: read
+       packages: write
+
+     strategy:
+       fail-fast: false
+       matrix:
+         image-name:
+           - diffusers-pytorch-cpu
+           - diffusers-pytorch-cuda
+           - diffusers-pytorch-xformers-cuda
+           - diffusers-pytorch-minimum-cuda
+           - diffusers-doc-builder
+
+     steps:
+       - name: Checkout repository
+         uses: actions/checkout@v3
+       - name: Set up Docker Buildx
+         uses: docker/setup-buildx-action@v1
+       - name: Login to Docker Hub
+         uses: docker/login-action@v2
+         with:
+           username: ${{ env.REGISTRY }}
+           password: ${{ secrets.DOCKERHUB_TOKEN }}
+       - name: Build and push
+         uses: docker/build-push-action@v3
+         with:
+           no-cache: true
+           context: ./docker/${{ matrix.image-name }}
+           push: true
+           tags: ${{ env.REGISTRY }}/${{ matrix.image-name }}:latest
+
+       - name: Post to a Slack channel
+         id: slack
+         uses: huggingface/hf-workflows/.github/actions/post-slack@main
+         with:
+           # Slack channel id, channel name, or user id to post message.
+           # See also: https://api.slack.com/methods/chat.postMessage#channels
+           slack_channel: ${{ env.CI_SLACK_CHANNEL }}
+           title: "🤗 Results of the ${{ matrix.image-name }} Docker Image build"
+           status: ${{ job.status }}
+           slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
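The PR job above derives the image tag from each changed Dockerfile's path. A quick dry run of that derivation (with `docker/diffusers-pytorch-cuda/Dockerfile` as a hypothetical changed-file path, and without actually invoking `docker build`):

```shell
#!/usr/bin/env bash
# Dry run of the tag-derivation logic from the "Build Changed Docker Images"
# step: strip the trailing /Dockerfile, then take the parent directory name.
FILE="docker/diffusers-pytorch-cuda/Dockerfile"
if [[ "$FILE" == docker/*Dockerfile ]]; then
  DOCKER_PATH="${FILE%/Dockerfile}"      # docker/diffusers-pytorch-cuda
  DOCKER_TAG=$(basename "$DOCKER_PATH")  # diffusers-pytorch-cuda
  echo "would build tag: $DOCKER_TAG from context: $DOCKER_PATH"
fi
```

The `${FILE%/Dockerfile}` parameter expansion removes the shortest matching suffix, so the docker build context is exactly the directory that contains the Dockerfile.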
diffusers/.github/workflows/build_documentation.yml ADDED
@@ -0,0 +1,27 @@
+ name: Build documentation
+
+ on:
+   push:
+     branches:
+       - main
+       - doc-builder*
+       - v*-release
+       - v*-patch
+     paths:
+       - "src/diffusers/**.py"
+       - "examples/**"
+       - "docs/**"
+
+ jobs:
+   build:
+     uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@main
+     with:
+       commit_sha: ${{ github.sha }}
+       install_libgl1: true
+       package: diffusers
+       notebook_folder: diffusers_doc
+       languages: en ko zh ja pt
+       custom_container: diffusers/diffusers-doc-builder
+     secrets:
+       token: ${{ secrets.HUGGINGFACE_PUSH }}
+       hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
diffusers/.github/workflows/build_pr_documentation.yml ADDED
@@ -0,0 +1,23 @@
+ name: Build PR Documentation
+
+ on:
+   pull_request:
+     paths:
+       - "src/diffusers/**.py"
+       - "examples/**"
+       - "docs/**"
+
+ concurrency:
+   group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
+   cancel-in-progress: true
+
+ jobs:
+   build:
+     uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@main
+     with:
+       commit_sha: ${{ github.event.pull_request.head.sha }}
+       pr_number: ${{ github.event.number }}
+       install_libgl1: true
+       package: diffusers
+       languages: en ko zh ja pt
+       custom_container: diffusers/diffusers-doc-builder
diffusers/.github/workflows/mirror_community_pipeline.yml ADDED
@@ -0,0 +1,102 @@
+ name: Mirror Community Pipeline
+
+ on:
+   # Push changes on the main branch
+   push:
+     branches:
+       - main
+     paths:
+       - 'examples/community/**.py'
+
+     # And on tag creation (e.g. `v0.28.1`)
+     tags:
+       - '*'
+
+   # Manual trigger with ref input
+   workflow_dispatch:
+     inputs:
+       ref:
+         description: "Either 'main' or a tag ref"
+         required: true
+         default: 'main'
+
+ jobs:
+   mirror_community_pipeline:
+     env:
+       SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL_COMMUNITY_MIRROR }}
+
+     runs-on: ubuntu-22.04
+     steps:
+       # Checkout to correct ref
+       #   If workflow dispatch
+       #     If ref is 'main', set:
+       #       CHECKOUT_REF=refs/heads/main
+       #       PATH_IN_REPO=main
+       #     Else it must be a tag. Set:
+       #       CHECKOUT_REF=refs/tags/{tag}
+       #       PATH_IN_REPO={tag}
+       #   If not workflow dispatch
+       #     If ref is 'refs/heads/main' => set 'main'
+       #     Else it must be a tag => set {tag}
+       - name: Set checkout_ref and path_in_repo
+         run: |
+           if [ "${{ github.event_name }}" == "workflow_dispatch" ]; then
+             if [ -z "${{ github.event.inputs.ref }}" ]; then
+               echo "Error: Missing ref input"
+               exit 1
+             elif [ "${{ github.event.inputs.ref }}" == "main" ]; then
+               echo "CHECKOUT_REF=refs/heads/main" >> $GITHUB_ENV
+               echo "PATH_IN_REPO=main" >> $GITHUB_ENV
+             else
+               echo "CHECKOUT_REF=refs/tags/${{ github.event.inputs.ref }}" >> $GITHUB_ENV
+               echo "PATH_IN_REPO=${{ github.event.inputs.ref }}" >> $GITHUB_ENV
+             fi
+           elif [ "${{ github.ref }}" == "refs/heads/main" ]; then
+             echo "CHECKOUT_REF=${{ github.ref }}" >> $GITHUB_ENV
+             echo "PATH_IN_REPO=main" >> $GITHUB_ENV
+           else
+             # e.g. refs/tags/v0.28.1 -> v0.28.1
+             echo "CHECKOUT_REF=${{ github.ref }}" >> $GITHUB_ENV
+             echo "PATH_IN_REPO=$(echo ${{ github.ref }} | sed 's/^refs\/tags\///')" >> $GITHUB_ENV
+           fi
+       - name: Print env vars
+         run: |
+           echo "CHECKOUT_REF: ${{ env.CHECKOUT_REF }}"
+           echo "PATH_IN_REPO: ${{ env.PATH_IN_REPO }}"
+       - uses: actions/checkout@v3
+         with:
+           ref: ${{ env.CHECKOUT_REF }}
+
+       # Setup + install dependencies
+       - name: Set up Python
+         uses: actions/setup-python@v4
+         with:
+           python-version: "3.10"
+       - name: Install dependencies
+         run: |
+           python -m pip install --upgrade pip
+           pip install --upgrade huggingface_hub
+
+       # Check secret is set
+       - name: whoami
+         run: huggingface-cli whoami
+         env:
+           HF_TOKEN: ${{ secrets.HF_TOKEN_MIRROR_COMMUNITY_PIPELINES }}
+
+       # Push to HF! (under subfolder based on checkout ref)
+       # https://huggingface.co/datasets/diffusers/community-pipelines-mirror
+       - name: Mirror community pipeline to HF
+         run: huggingface-cli upload diffusers/community-pipelines-mirror ./examples/community ${PATH_IN_REPO} --repo-type dataset
+         env:
+           PATH_IN_REPO: ${{ env.PATH_IN_REPO }}
+           HF_TOKEN: ${{ secrets.HF_TOKEN_MIRROR_COMMUNITY_PIPELINES }}
+
+       - name: Report success status
+         if: ${{ success() }}
+         run: |
+           pip install requests && python utils/notify_community_pipelines_mirror.py --status=success
+
+       - name: Report failure status
+         if: ${{ failure() }}
+         run: |
+           pip install requests && python utils/notify_community_pipelines_mirror.py --status=failure
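The first step of the mirror job maps a git ref to the upload subfolder: `refs/heads/main` mirrors into `main`, while a tag ref mirrors into the bare tag name. That mapping can be exercised locally (using `refs/tags/v0.28.1` as an example tag ref, as in the workflow's own comment):

```shell
#!/usr/bin/env bash
# Standalone check of the ref-to-subfolder mapping used by the mirror job.
ref="refs/tags/v0.28.1"
if [ "$ref" = "refs/heads/main" ]; then
  PATH_IN_REPO=main
else
  # Strip a leading "refs/tags/" so only the tag name remains.
  PATH_IN_REPO=$(echo "$ref" | sed 's/^refs\/tags\///')
fi
echo "$PATH_IN_REPO"  # v0.28.1
```

Because `sed` only removes the anchored `refs/tags/` prefix, any ref that is neither `refs/heads/main` nor a tag ref would pass through unchanged, which is why the workflow treats "not main" as "must be a tag".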
diffusers/.github/workflows/nightly_tests.yml ADDED
@@ -0,0 +1,612 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
+name: Nightly and release tests on main/release branch
+
+on:
+  workflow_dispatch:
+  schedule:
+    - cron: "0 0 * * *" # every day at midnight
+
+env:
+  DIFFUSERS_IS_CI: yes
+  HF_HUB_ENABLE_HF_TRANSFER: 1
+  OMP_NUM_THREADS: 8
+  MKL_NUM_THREADS: 8
+  PYTEST_TIMEOUT: 600
+  RUN_SLOW: yes
+  RUN_NIGHTLY: yes
+  PIPELINE_USAGE_CUTOFF: 0
+  SLACK_API_TOKEN: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
+  CONSOLIDATED_REPORT_PATH: consolidated_test_report.md
+
+jobs:
+  setup_torch_cuda_pipeline_matrix:
+    name: Setup Torch Pipelines CUDA Slow Tests Matrix
+    runs-on:
+      group: aws-general-8-plus
+    container:
+      image: diffusers/diffusers-pytorch-cpu
+    outputs:
+      pipeline_test_matrix: ${{ steps.fetch_pipeline_matrix.outputs.pipeline_test_matrix }}
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+      - name: Install dependencies
+        run: |
+          pip install -e .[test]
+          pip install huggingface_hub
+      - name: Fetch Pipeline Matrix
+        id: fetch_pipeline_matrix
+        run: |
+          matrix=$(python utils/fetch_torch_cuda_pipeline_test_matrix.py)
+          echo $matrix
+          echo "pipeline_test_matrix=$matrix" >> $GITHUB_OUTPUT
+
+      - name: Pipeline Tests Artifacts
+        if: ${{ always() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: test-pipelines.json
+          path: reports
+
+  run_nightly_tests_for_torch_pipelines:
+    name: Nightly Torch Pipelines CUDA Tests
+    needs: setup_torch_cuda_pipeline_matrix
+    strategy:
+      fail-fast: false
+      max-parallel: 8
+      matrix:
+        module: ${{ fromJson(needs.setup_torch_cuda_pipeline_matrix.outputs.pipeline_test_matrix) }}
+    runs-on:
+      group: aws-g4dn-2xlarge
+    container:
+      image: diffusers/diffusers-pytorch-cuda
+      options: --shm-size "16gb" --ipc host --gpus 0
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+      - name: NVIDIA-SMI
+        run: nvidia-smi
+      - name: Install dependencies
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m uv pip install -e [quality,test]
+          pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
+          python -m uv pip install pytest-reportlog
+      - name: Environment
+        run: |
+          python utils/print_env.py
+      - name: Pipeline CUDA Test
+        env:
+          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+          # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
+          CUBLAS_WORKSPACE_CONFIG: :16:8
+        run: |
+          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
+            -s -v -k "not Flax and not Onnx" \
+            --make-reports=tests_pipeline_${{ matrix.module }}_cuda \
+            --report-log=tests_pipeline_${{ matrix.module }}_cuda.log \
+            tests/pipelines/${{ matrix.module }}
+      - name: Failure short reports
+        if: ${{ failure() }}
+        run: |
+          cat reports/tests_pipeline_${{ matrix.module }}_cuda_stats.txt
+          cat reports/tests_pipeline_${{ matrix.module }}_cuda_failures_short.txt
+      - name: Test suite reports artifacts
+        if: ${{ always() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: pipeline_${{ matrix.module }}_test_reports
+          path: reports
+
+  run_nightly_tests_for_other_torch_modules:
+    name: Nightly Torch CUDA Tests
+    runs-on:
+      group: aws-g4dn-2xlarge
+    container:
+      image: diffusers/diffusers-pytorch-cuda
+      options: --shm-size "16gb" --ipc host --gpus 0
+    defaults:
+      run:
+        shell: bash
+    strategy:
+      fail-fast: false
+      max-parallel: 2
+      matrix:
+        module: [models, schedulers, lora, others, single_file, examples]
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+
+      - name: Install dependencies
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m uv pip install -e [quality,test]
+          python -m uv pip install peft@git+https://github.com/huggingface/peft.git
+          pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
+          python -m uv pip install pytest-reportlog
+      - name: Environment
+        run: python utils/print_env.py
+
+      - name: Run nightly PyTorch CUDA tests for non-pipeline modules
+        if: ${{ matrix.module != 'examples'}}
+        env:
+          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+          # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
+          CUBLAS_WORKSPACE_CONFIG: :16:8
+        run: |
+          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
+            -s -v -k "not Flax and not Onnx" \
+            --make-reports=tests_torch_${{ matrix.module }}_cuda \
+            --report-log=tests_torch_${{ matrix.module }}_cuda.log \
+            tests/${{ matrix.module }}
+
+      - name: Run nightly example tests with Torch
+        if: ${{ matrix.module == 'examples' }}
+        env:
+          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+          # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
+          CUBLAS_WORKSPACE_CONFIG: :16:8
+        run: |
+          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
+            -s -v --make-reports=examples_torch_cuda \
+            --report-log=examples_torch_cuda.log \
+            examples/
+
+      - name: Failure short reports
+        if: ${{ failure() }}
+        run: |
+          cat reports/tests_torch_${{ matrix.module }}_cuda_stats.txt
+          cat reports/tests_torch_${{ matrix.module }}_cuda_failures_short.txt
+
+      - name: Test suite reports artifacts
+        if: ${{ always() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: torch_${{ matrix.module }}_cuda_test_reports
+          path: reports
+
+  run_torch_compile_tests:
+    name: PyTorch Compile CUDA tests
+
+    runs-on:
+      group: aws-g4dn-2xlarge
+
+    container:
+      image: diffusers/diffusers-pytorch-cuda
+      options: --gpus 0 --shm-size "16gb" --ipc host
+
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+
+      - name: NVIDIA-SMI
+        run: |
+          nvidia-smi
+      - name: Install dependencies
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m uv pip install -e [quality,test,training]
+      - name: Environment
+        run: |
+          python utils/print_env.py
+      - name: Run torch compile tests on GPU
+        env:
+          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+          RUN_COMPILE: yes
+        run: |
+          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v -k "compile" --make-reports=tests_torch_compile_cuda tests/
+      - name: Failure short reports
+        if: ${{ failure() }}
+        run: cat reports/tests_torch_compile_cuda_failures_short.txt
+
+      - name: Test suite reports artifacts
+        if: ${{ always() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: torch_compile_test_reports
+          path: reports
+
+  run_big_gpu_torch_tests:
+    name: Torch tests on big GPU
+    strategy:
+      fail-fast: false
+      max-parallel: 2
+    runs-on:
+      group: aws-g6e-xlarge-plus
+    container:
+      image: diffusers/diffusers-pytorch-cuda
+      options: --shm-size "16gb" --ipc host --gpus 0
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+      - name: NVIDIA-SMI
+        run: nvidia-smi
+      - name: Install dependencies
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m uv pip install -e [quality,test]
+          python -m uv pip install peft@git+https://github.com/huggingface/peft.git
+          pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
+          python -m uv pip install pytest-reportlog
+      - name: Environment
+        run: |
+          python utils/print_env.py
+      - name: Selected Torch CUDA Test on big GPU
+        env:
+          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+          # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
+          CUBLAS_WORKSPACE_CONFIG: :16:8
+          BIG_GPU_MEMORY: 40
+        run: |
+          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
+            -m "big_accelerator" \
+            --make-reports=tests_big_gpu_torch_cuda \
+            --report-log=tests_big_gpu_torch_cuda.log \
+            tests/
+      - name: Failure short reports
+        if: ${{ failure() }}
+        run: |
+          cat reports/tests_big_gpu_torch_cuda_stats.txt
+          cat reports/tests_big_gpu_torch_cuda_failures_short.txt
+      - name: Test suite reports artifacts
+        if: ${{ always() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: torch_cuda_big_gpu_test_reports
+          path: reports
+
+  torch_minimum_version_cuda_tests:
+    name: Torch Minimum Version CUDA Tests
+    runs-on:
+      group: aws-g4dn-2xlarge
+    container:
+      image: diffusers/diffusers-pytorch-minimum-cuda
+      options: --shm-size "16gb" --ipc host --gpus 0
+    defaults:
+      run:
+        shell: bash
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+
+      - name: Install dependencies
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m uv pip install -e [quality,test]
+          python -m uv pip install peft@git+https://github.com/huggingface/peft.git
+          pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
+
+      - name: Environment
+        run: |
+          python utils/print_env.py
+
+      - name: Run PyTorch CUDA tests
+        env:
+          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+          # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
+          CUBLAS_WORKSPACE_CONFIG: :16:8
+        run: |
+          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
+            -s -v -k "not Flax and not Onnx" \
+            --make-reports=tests_torch_minimum_version_cuda \
+            tests/models/test_modeling_common.py \
+            tests/pipelines/test_pipelines_common.py \
+            tests/pipelines/test_pipeline_utils.py \
+            tests/pipelines/test_pipelines.py \
+            tests/pipelines/test_pipelines_auto.py \
+            tests/schedulers/test_schedulers.py \
+            tests/others
+
+      - name: Failure short reports
+        if: ${{ failure() }}
+        run: |
+          cat reports/tests_torch_minimum_version_cuda_stats.txt
+          cat reports/tests_torch_minimum_version_cuda_failures_short.txt
+
+      - name: Test suite reports artifacts
+        if: ${{ always() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: torch_minimum_version_cuda_test_reports
+          path: reports
+
+  run_nightly_quantization_tests:
+    name: Torch quantization nightly tests
+    strategy:
+      fail-fast: false
+      max-parallel: 2
+      matrix:
+        config:
+          - backend: "bitsandbytes"
+            test_location: "bnb"
+            additional_deps: ["peft"]
+          - backend: "gguf"
+            test_location: "gguf"
+            additional_deps: ["peft"]
+          - backend: "torchao"
+            test_location: "torchao"
+            additional_deps: []
+          - backend: "optimum_quanto"
+            test_location: "quanto"
+            additional_deps: []
+    runs-on:
+      group: aws-g6e-xlarge-plus
+    container:
+      image: diffusers/diffusers-pytorch-cuda
+      options: --shm-size "20gb" --ipc host --gpus 0
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+      - name: NVIDIA-SMI
+        run: nvidia-smi
+      - name: Install dependencies
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m uv pip install -e [quality,test]
+          python -m uv pip install -U ${{ matrix.config.backend }}
+          if [ "${{ join(matrix.config.additional_deps, ' ') }}" != "" ]; then
+            python -m uv pip install ${{ join(matrix.config.additional_deps, ' ') }}
+          fi
+          python -m uv pip install pytest-reportlog
+      - name: Environment
+        run: |
+          python utils/print_env.py
+      - name: ${{ matrix.config.backend }} quantization tests on GPU
+        env:
+          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+          # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
+          CUBLAS_WORKSPACE_CONFIG: :16:8
+          BIG_GPU_MEMORY: 40
+        run: |
+          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
+            --make-reports=tests_${{ matrix.config.backend }}_torch_cuda \
+            --report-log=tests_${{ matrix.config.backend }}_torch_cuda.log \
+            tests/quantization/${{ matrix.config.test_location }}
+      - name: Failure short reports
+        if: ${{ failure() }}
+        run: |
+          cat reports/tests_${{ matrix.config.backend }}_torch_cuda_stats.txt
+          cat reports/tests_${{ matrix.config.backend }}_torch_cuda_failures_short.txt
+      - name: Test suite reports artifacts
+        if: ${{ always() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: torch_cuda_${{ matrix.config.backend }}_reports
+          path: reports
+
+  run_nightly_pipeline_level_quantization_tests:
+    name: Torch pipeline-level quantization nightly tests
+    strategy:
+      fail-fast: false
+      max-parallel: 2
+    runs-on:
+      group: aws-g6e-xlarge-plus
+    container:
+      image: diffusers/diffusers-pytorch-cuda
+      options: --shm-size "20gb" --ipc host --gpus 0
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+      - name: NVIDIA-SMI
+        run: nvidia-smi
+      - name: Install dependencies
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m uv pip install -e [quality,test]
+          python -m uv pip install -U bitsandbytes optimum_quanto
+          python -m uv pip install pytest-reportlog
+      - name: Environment
+        run: |
+          python utils/print_env.py
+      - name: Pipeline-level quantization tests on GPU
+        env:
+          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+          # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
+          CUBLAS_WORKSPACE_CONFIG: :16:8
+          BIG_GPU_MEMORY: 40
+        run: |
+          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
+            --make-reports=tests_pipeline_level_quant_torch_cuda \
+            --report-log=tests_pipeline_level_quant_torch_cuda.log \
+            tests/quantization/test_pipeline_level_quantization.py
+      - name: Failure short reports
+        if: ${{ failure() }}
+        run: |
+          cat reports/tests_pipeline_level_quant_torch_cuda_stats.txt
+          cat reports/tests_pipeline_level_quant_torch_cuda_failures_short.txt
+      - name: Test suite reports artifacts
+        if: ${{ always() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: torch_cuda_pipeline_level_quant_reports
+          path: reports
+
+  generate_consolidated_report:
+    name: Generate Consolidated Test Report
+    needs: [
+      run_nightly_tests_for_torch_pipelines,
+      run_nightly_tests_for_other_torch_modules,
+      run_torch_compile_tests,
+      run_big_gpu_torch_tests,
+      run_nightly_quantization_tests,
+      run_nightly_pipeline_level_quantization_tests,
+      # run_nightly_onnx_tests,
+      torch_minimum_version_cuda_tests,
+      # run_flax_tpu_tests
+    ]
+    if: always()
+    runs-on:
+      group: aws-general-8-plus
+    container:
+      image: diffusers/diffusers-pytorch-cpu
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+
+      - name: Create reports directory
+        run: mkdir -p combined_reports
+
+      - name: Download all test reports
+        uses: actions/download-artifact@v4
+        with:
+          path: artifacts
+
+      - name: Prepare reports
+        run: |
+          # Move all report files to a single directory for processing
+          find artifacts -name "*.txt" -exec cp {} combined_reports/ \;
+
+      - name: Install dependencies
+        run: |
+          pip install -e .[test]
+          pip install slack_sdk tabulate
+
+      - name: Generate consolidated report
+        run: |
+          python utils/consolidated_test_report.py \
+            --reports_dir combined_reports \
+            --output_file $CONSOLIDATED_REPORT_PATH \
+            --slack_channel_name diffusers-ci-nightly
+
+      - name: Show consolidated report
+        run: |
+          cat $CONSOLIDATED_REPORT_PATH >> $GITHUB_STEP_SUMMARY
+
+      - name: Upload consolidated report
+        uses: actions/upload-artifact@v4
+        with:
+          name: consolidated_test_report
+          path: ${{ env.CONSOLIDATED_REPORT_PATH }}
+
+  # M1 runner currently not well supported
+  # TODO: (Dhruv) add these back when we setup better testing for Apple Silicon
+  # run_nightly_tests_apple_m1:
+  #   name: Nightly PyTorch MPS tests on MacOS
+  #   runs-on: [ self-hosted, apple-m1 ]
+  #   if: github.event_name == 'schedule'
+  #
+  #   steps:
+  #     - name: Checkout diffusers
+  #       uses: actions/checkout@v3
+  #       with:
+  #         fetch-depth: 2
+  #
+  #     - name: Clean checkout
+  #       shell: arch -arch arm64 bash {0}
+  #       run: |
+  #         git clean -fxd
+  #     - name: Setup miniconda
+  #       uses: ./.github/actions/setup-miniconda
+  #       with:
+  #         python-version: 3.9
+  #
+  #     - name: Install dependencies
+  #       shell: arch -arch arm64 bash {0}
+  #       run: |
+  #         ${CONDA_RUN} python -m pip install --upgrade pip uv
+  #         ${CONDA_RUN} python -m uv pip install -e [quality,test]
+  #         ${CONDA_RUN} python -m uv pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cpu
+  #         ${CONDA_RUN} python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate
+  #         ${CONDA_RUN} python -m uv pip install pytest-reportlog
+  #     - name: Environment
+  #       shell: arch -arch arm64 bash {0}
+  #       run: |
+  #         ${CONDA_RUN} python utils/print_env.py
+  #     - name: Run nightly PyTorch tests on M1 (MPS)
+  #       shell: arch -arch arm64 bash {0}
+  #       env:
+  #         HF_HOME: /System/Volumes/Data/mnt/cache
+  #         HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+  #       run: |
+  #         ${CONDA_RUN} python -m pytest -n 1 -s -v --make-reports=tests_torch_mps \
+  #           --report-log=tests_torch_mps.log \
+  #           tests/
+  #     - name: Failure short reports
+  #       if: ${{ failure() }}
+  #       run: cat reports/tests_torch_mps_failures_short.txt
+  #
+  #     - name: Test suite reports artifacts
+  #       if: ${{ always() }}
+  #       uses: actions/upload-artifact@v4
+  #       with:
+  #         name: torch_mps_test_reports
+  #         path: reports
+  #
+  #     - name: Generate Report and Notify Channel
+  #       if: always()
+  #       run: |
+  #         pip install slack_sdk tabulate
+  #         python utils/log_reports.py >> $GITHUB_STEP_SUMMARY
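In the nightly workflow above, the `setup_torch_cuda_pipeline_matrix` job passes its matrix to the downstream job by writing a single `pipeline_test_matrix=<json>` line to `$GITHUB_OUTPUT`, which `fromJson(...)` then parses into `matrix.module`. A minimal sketch of that contract (the helper name and module names are illustrative, not the actual contents of `utils/fetch_torch_cuda_pipeline_test_matrix.py`):

```python
import json
import os
import tempfile


def emit_pipeline_matrix(modules, output_path):
    """Serialize the module list as one JSON line and append it to the
    GitHub Actions output file as `pipeline_test_matrix=<json>` -- the
    shape that `fromJson(...)` expects in the consuming job."""
    matrix = json.dumps(modules)
    # $GITHUB_OUTPUT takes `key=value` lines; the value must stay on one line.
    with open(output_path, "a") as fh:
        fh.write(f"pipeline_test_matrix={matrix}\n")
    return matrix


if __name__ == "__main__":
    # Hypothetical module names; the real list is computed by the fetch script.
    out_file = os.environ.get("GITHUB_OUTPUT") or tempfile.mktemp()
    print(emit_pipeline_matrix(["stable_diffusion", "controlnet"], out_file))
```

Keeping the value on a single line matters: a multi-line value would need the heredoc `key<<EOF` form of `$GITHUB_OUTPUT` instead.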
diffusers/.github/workflows/notify_slack_about_release.yml ADDED
@@ -0,0 +1,23 @@
+name: Notify Slack about a release
+
+on:
+  workflow_dispatch:
+  release:
+    types: [published]
+
+jobs:
+  build:
+    runs-on: ubuntu-22.04
+
+    steps:
+      - uses: actions/checkout@v3
+
+      - name: Setup Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: '3.8'
+
+      - name: Notify Slack about the release
+        env:
+          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
+        run: pip install requests && python utils/notify_slack_about_release.py
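The notify step delegates to `utils/notify_slack_about_release.py`, whose contents are not shown in this diff. As a hedged sketch of what such a script might do, here is a minimal Slack incoming-webhook notifier: the payload shape (`{"text": ...}`) is the standard incoming-webhook format, but the message wording and helper names are assumptions, and the stdlib `urllib` is used here instead of `requests`:

```python
import json
import os
import urllib.request


def build_release_payload(tag, url):
    # Standard Slack incoming-webhook payload: a single `text` field.
    return {"text": f"New diffusers release: {tag}\n{url}"}


def notify(webhook_url, payload):
    """POST the JSON payload to the incoming-webhook URL."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    payload = build_release_payload(
        "v0.33.0",  # hypothetical tag for illustration
        "https://github.com/huggingface/diffusers/releases",
    )
    webhook = os.environ.get("SLACK_WEBHOOK_URL")
    if webhook:
        notify(webhook, payload)
    else:
        # No webhook configured (e.g. a dry run): just show the payload.
        print(json.dumps(payload))
```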
diffusers/.github/workflows/pr_dependency_test.yml ADDED
@@ -0,0 +1,35 @@
+name: Run dependency tests
+
+on:
+  pull_request:
+    branches:
+      - main
+    paths:
+      - "src/diffusers/**.py"
+  push:
+    branches:
+      - main
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
+  cancel-in-progress: true
+
+jobs:
+  check_dependencies:
+    runs-on: ubuntu-22.04
+    steps:
+      - uses: actions/checkout@v3
+      - name: Set up Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: "3.8"
+      - name: Install dependencies
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m pip install --upgrade pip uv
+          python -m uv pip install -e .
+          python -m uv pip install pytest
+      - name: Check for soft dependencies
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          pytest tests/others/test_dependencies.py
diffusers/.github/workflows/pr_flax_dependency_test.yml ADDED
@@ -0,0 +1,38 @@
+name: Run Flax dependency tests
+
+on:
+  pull_request:
+    branches:
+      - main
+    paths:
+      - "src/diffusers/**.py"
+  push:
+    branches:
+      - main
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
+  cancel-in-progress: true
+
+jobs:
+  check_flax_dependencies:
+    runs-on: ubuntu-22.04
+    steps:
+      - uses: actions/checkout@v3
+      - name: Set up Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: "3.8"
+      - name: Install dependencies
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m pip install --upgrade pip uv
+          python -m uv pip install -e .
+          python -m uv pip install "jax[cpu]>=0.2.16,!=0.3.2"
+          python -m uv pip install "flax>=0.4.1"
+          python -m uv pip install "jaxlib>=0.1.65"
+          python -m uv pip install pytest
+      - name: Check for soft dependencies
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          pytest tests/others/test_dependencies.py
diffusers/.github/workflows/pr_style_bot.yml ADDED
@@ -0,0 +1,17 @@
+name: PR Style Bot
+
+on:
+  issue_comment:
+    types: [created]
+
+permissions:
+  contents: write
+  pull-requests: write
+
+jobs:
+  style:
+    uses: huggingface/huggingface_hub/.github/workflows/style-bot-action.yml@main
+    with:
+      python_quality_dependencies: "[quality]"
+    secrets:
+      bot_token: ${{ secrets.HF_STYLE_BOT_ACTION }}
diffusers/.github/workflows/pr_test_fetcher.yml ADDED
@@ -0,0 +1,177 @@
+name: Fast tests for PRs - Test Fetcher
+
+on: workflow_dispatch
+
+env:
+  DIFFUSERS_IS_CI: yes
+  OMP_NUM_THREADS: 4
+  MKL_NUM_THREADS: 4
+  PYTEST_TIMEOUT: 60
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
+  cancel-in-progress: true
+
+jobs:
+  setup_pr_tests:
+    name: Setup PR Tests
+    runs-on:
+      group: aws-general-8-plus
+    container:
+      image: diffusers/diffusers-pytorch-cpu
+      options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
+    defaults:
+      run:
+        shell: bash
+    outputs:
+      matrix: ${{ steps.set_matrix.outputs.matrix }}
+      test_map: ${{ steps.set_matrix.outputs.test_map }}
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 0
+      - name: Install dependencies
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m uv pip install -e [quality,test]
+      - name: Environment
+        run: |
+          python utils/print_env.py
+          echo $(git --version)
+      - name: Fetch Tests
+        run: |
+          python utils/tests_fetcher.py | tee test_preparation.txt
+      - name: Report fetched tests
+        uses: actions/upload-artifact@v3
+        with:
+          name: test_fetched
+          path: test_preparation.txt
+      - id: set_matrix
+        name: Create Test Matrix
+        # The `keys` is used as GitHub actions matrix for jobs, i.e. `models`, `pipelines`, etc.
+        # The `test_map` is used to get the actual identified test files under each key.
+        # If no test to run (so no `test_map.json` file), create a dummy map (empty matrix will fail)
+        run: |
+          if [ -f test_map.json ]; then
+            keys=$(python3 -c 'import json; fp = open("test_map.json"); test_map = json.load(fp); fp.close(); d = list(test_map.keys()); print(json.dumps(d))')
+            test_map=$(python3 -c 'import json; fp = open("test_map.json"); test_map = json.load(fp); fp.close(); print(json.dumps(test_map))')
+          else
+            keys=$(python3 -c 'keys = ["dummy"]; print(keys)')
+            test_map=$(python3 -c 'test_map = {"dummy": []}; print(test_map)')
+          fi
+          echo $keys
+          echo $test_map
+          echo "matrix=$keys" >> $GITHUB_OUTPUT
+          echo "test_map=$test_map" >> $GITHUB_OUTPUT
+
+  run_pr_tests:
+    name: Run PR Tests
+    needs: setup_pr_tests
+    if: contains(fromJson(needs.setup_pr_tests.outputs.matrix), 'dummy') != true
+    strategy:
+      fail-fast: false
+      max-parallel: 2
+      matrix:
+        modules: ${{ fromJson(needs.setup_pr_tests.outputs.matrix) }}
+    runs-on:
+      group: aws-general-8-plus
+    container:
+      image: diffusers/diffusers-pytorch-cpu
+      options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
+    defaults:
+      run:
+        shell: bash
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+
+      - name: Install dependencies
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m pip install -e [quality,test]
+          python -m pip install accelerate
+
+      - name: Environment
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python utils/print_env.py
+
+      - name: Run all selected tests on CPU
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m pytest -n 2 --dist=loadfile -v --make-reports=${{ matrix.modules }}_tests_cpu ${{ fromJson(needs.setup_pr_tests.outputs.test_map)[matrix.modules] }}
+
+      - name: Failure short reports
+        if: ${{ failure() }}
+        continue-on-error: true
+        run: |
+          cat reports/${{ matrix.modules }}_tests_cpu_stats.txt
+          cat reports/${{ matrix.modules }}_tests_cpu_failures_short.txt
+
+      - name: Test suite reports artifacts
+        if: ${{ always() }}
+        uses: actions/upload-artifact@v3
+        with:
+          name: ${{ matrix.modules }}_test_reports
+          path: reports
+
+  run_staging_tests:
+    strategy:
+      fail-fast: false
+      matrix:
+        config:
+          - name: Hub tests for models, schedulers, and pipelines
+            framework: hub_tests_pytorch
+            runner: aws-general-8-plus
+            image: diffusers/diffusers-pytorch-cpu
+            report: torch_hub
+
+    name: ${{ matrix.config.name }}
+    runs-on:
+      group: ${{ matrix.config.runner }}
+    container:
+      image: ${{ matrix.config.image }}
+      options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
+
+    defaults:
+      run:
+        shell: bash
+
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+
+      - name: Install dependencies
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m pip install -e [quality,test]
+
+      - name: Environment
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python utils/print_env.py
+
+      - name: Run Hub tests for models, schedulers, and pipelines on a staging env
+        if: ${{ matrix.config.framework == 'hub_tests_pytorch' }}
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          HUGGINGFACE_CO_STAGING=true python -m pytest \
+            -m "is_staging_test" \
+            --make-reports=tests_${{ matrix.config.report }} \
+            tests
+
+      - name: Failure short reports
+        if: ${{ failure() }}
+        run: cat reports/tests_${{ matrix.config.report }}_failures_short.txt
+
+      - name: Test suite reports artifacts
+        if: ${{ always() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: pr_${{ matrix.config.report }}_test_reports
+          path: reports
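The `set_matrix` step in `pr_test_fetcher.yml` derives two outputs from `test_map.json`: the map's keys become the job matrix, and the map itself resolves each key to its test files. Note that the original shell fallback prints a Python repr (`['dummy']`, single quotes) rather than JSON; the sketch below uses `json.dumps` for both branches so either output stays `fromJson`-compatible. The function name is illustrative:

```python
import json
import os


def build_matrix(test_map_path):
    """Mirror of the `set_matrix` step: return (keys_json, test_map_json)
    from test_map.json, falling back to a dummy entry so the downstream
    matrix is never empty (an empty matrix fails the workflow)."""
    if os.path.isfile(test_map_path):
        with open(test_map_path) as fh:
            test_map = json.load(fh)
    else:
        # No tests selected: emit a sentinel the run_pr_tests job skips via
        # `contains(fromJson(...), 'dummy') != true`.
        test_map = {"dummy": []}
    keys = list(test_map.keys())
    return json.dumps(keys), json.dumps(test_map)
```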
diffusers/.github/workflows/pr_tests.yml ADDED
@@ -0,0 +1,289 @@
+ name: Fast tests for PRs
+
+ on:
+   pull_request:
+     branches: [main]
+     paths:
+       - "src/diffusers/**.py"
+       - "benchmarks/**.py"
+       - "examples/**.py"
+       - "scripts/**.py"
+       - "tests/**.py"
+       - ".github/**.yml"
+       - "utils/**.py"
+       - "setup.py"
+   push:
+     branches:
+       - ci-*
+
+ concurrency:
+   group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
+   cancel-in-progress: true
+
+ env:
+   DIFFUSERS_IS_CI: yes
+   HF_HUB_ENABLE_HF_TRANSFER: 1
+   OMP_NUM_THREADS: 4
+   MKL_NUM_THREADS: 4
+   PYTEST_TIMEOUT: 60
+
+ jobs:
+   check_code_quality:
+     runs-on: ubuntu-22.04
+     steps:
+       - uses: actions/checkout@v3
+       - name: Set up Python
+         uses: actions/setup-python@v4
+         with:
+           python-version: "3.8"
+       - name: Install dependencies
+         run: |
+           python -m pip install --upgrade pip
+           pip install .[quality]
+       - name: Check quality
+         run: make quality
+       - name: Check if failure
+         if: ${{ failure() }}
+         run: |
+           echo "Quality check failed. Please ensure the right dependency versions are installed with 'pip install -e .[quality]' and run 'make style && make quality'" >> $GITHUB_STEP_SUMMARY
+
+   check_repository_consistency:
+     needs: check_code_quality
+     runs-on: ubuntu-22.04
+     steps:
+       - uses: actions/checkout@v3
+       - name: Set up Python
+         uses: actions/setup-python@v4
+         with:
+           python-version: "3.8"
+       - name: Install dependencies
+         run: |
+           python -m pip install --upgrade pip
+           pip install .[quality]
+       - name: Check repo consistency
+         run: |
+           python utils/check_copies.py
+           python utils/check_dummies.py
+           python utils/check_support_list.py
+           make deps_table_check_updated
+       - name: Check if failure
+         if: ${{ failure() }}
+         run: |
+           echo "Repo consistency check failed. Please ensure the right dependency versions are installed with 'pip install -e .[quality]' and run 'make fix-copies'" >> $GITHUB_STEP_SUMMARY
+
+   run_fast_tests:
+     needs: [check_code_quality, check_repository_consistency]
+     strategy:
+       fail-fast: false
+       matrix:
+         config:
+           - name: Fast PyTorch Pipeline CPU tests
+             framework: pytorch_pipelines
+             runner: aws-highmemory-32-plus
+             image: diffusers/diffusers-pytorch-cpu
+             report: torch_cpu_pipelines
+           - name: Fast PyTorch Models & Schedulers CPU tests
+             framework: pytorch_models
+             runner: aws-general-8-plus
+             image: diffusers/diffusers-pytorch-cpu
+             report: torch_cpu_models_schedulers
+           - name: PyTorch Example CPU tests
+             framework: pytorch_examples
+             runner: aws-general-8-plus
+             image: diffusers/diffusers-pytorch-cpu
+             report: torch_example_cpu
+
+     name: ${{ matrix.config.name }}
+
+     runs-on:
+       group: ${{ matrix.config.runner }}
+
+     container:
+       image: ${{ matrix.config.image }}
+       options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
+
+     defaults:
+       run:
+         shell: bash
+
+     steps:
+       - name: Checkout diffusers
+         uses: actions/checkout@v3
+         with:
+           fetch-depth: 2
+
+       - name: Install dependencies
+         run: |
+           python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+           python -m uv pip install -e [quality,test]
+           pip uninstall transformers -y && python -m uv pip install -U transformers@git+https://github.com/huggingface/transformers.git --no-deps
+           pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git --no-deps
+
+       - name: Environment
+         run: |
+           python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+           python utils/print_env.py
+
+       - name: Run fast PyTorch Pipeline CPU tests
+         if: ${{ matrix.config.framework == 'pytorch_pipelines' }}
+         run: |
+           python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+           python -m pytest -n 8 --max-worker-restart=0 --dist=loadfile \
+             -s -v -k "not Flax and not Onnx" \
+             --make-reports=tests_${{ matrix.config.report }} \
+             tests/pipelines
+
+       - name: Run fast PyTorch Model Scheduler CPU tests
+         if: ${{ matrix.config.framework == 'pytorch_models' }}
+         run: |
+           python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+           python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
+             -s -v -k "not Flax and not Onnx and not Dependency" \
+             --make-reports=tests_${{ matrix.config.report }} \
+             tests/models tests/schedulers tests/others
+
+       - name: Run example PyTorch CPU tests
+         if: ${{ matrix.config.framework == 'pytorch_examples' }}
+         run: |
+           python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+           python -m uv pip install peft timm
+           python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
+             --make-reports=tests_${{ matrix.config.report }} \
+             examples
+
+       - name: Failure short reports
+         if: ${{ failure() }}
+         run: cat reports/tests_${{ matrix.config.report }}_failures_short.txt
+
+       - name: Test suite reports artifacts
+         if: ${{ always() }}
+         uses: actions/upload-artifact@v4
+         with:
+           name: pr_${{ matrix.config.framework }}_${{ matrix.config.report }}_test_reports
+           path: reports
+
+   run_staging_tests:
+     needs: [check_code_quality, check_repository_consistency]
+     strategy:
+       fail-fast: false
+       matrix:
+         config:
+           - name: Hub tests for models, schedulers, and pipelines
+             framework: hub_tests_pytorch
+             runner:
+               group: aws-general-8-plus
+             image: diffusers/diffusers-pytorch-cpu
+             report: torch_hub
+
+     name: ${{ matrix.config.name }}
+
+     runs-on: ${{ matrix.config.runner }}
+
+     container:
+       image: ${{ matrix.config.image }}
+       options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
+
+     defaults:
+       run:
+         shell: bash
+
+     steps:
+       - name: Checkout diffusers
+         uses: actions/checkout@v3
+         with:
+           fetch-depth: 2
+
+       - name: Install dependencies
+         run: |
+           python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+           python -m uv pip install -e [quality,test]
+
+       - name: Environment
+         run: |
+           python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+           python utils/print_env.py
+
+       - name: Run Hub tests for models, schedulers, and pipelines on a staging env
+         if: ${{ matrix.config.framework == 'hub_tests_pytorch' }}
+         run: |
+           python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+           HUGGINGFACE_CO_STAGING=true python -m pytest \
+             -m "is_staging_test" \
+             --make-reports=tests_${{ matrix.config.report }} \
+             tests
+
+       - name: Failure short reports
+         if: ${{ failure() }}
+         run: cat reports/tests_${{ matrix.config.report }}_failures_short.txt
+
+       - name: Test suite reports artifacts
+         if: ${{ always() }}
+         uses: actions/upload-artifact@v4
+         with:
+           name: pr_${{ matrix.config.report }}_test_reports
+           path: reports
+
+   run_lora_tests:
+     needs: [check_code_quality, check_repository_consistency]
+     strategy:
+       fail-fast: false
+
+     name: LoRA tests with PEFT main
+
+     runs-on:
+       group: aws-general-8-plus
+
+     container:
+       image: diffusers/diffusers-pytorch-cpu
+       options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
+
+     defaults:
+       run:
+         shell: bash
+
+     steps:
+       - name: Checkout diffusers
+         uses: actions/checkout@v3
+         with:
+           fetch-depth: 2
+
+       - name: Install dependencies
+         run: |
+           python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+           python -m uv pip install -e [quality,test]
+           # TODO (sayakpaul, DN6): revisit `--no-deps`
+           python -m pip install -U peft@git+https://github.com/huggingface/peft.git --no-deps
+           python -m uv pip install -U transformers@git+https://github.com/huggingface/transformers.git --no-deps
+           python -m uv pip install -U tokenizers
+           pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git --no-deps
+
+       - name: Environment
+         run: |
+           python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+           python utils/print_env.py
+
+       - name: Run fast PyTorch LoRA tests with PEFT
+         run: |
+           python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+           python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
+             -s -v \
+             --make-reports=tests_peft_main \
+             tests/lora/
+           python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
+             -s -v \
+             --make-reports=tests_models_lora_peft_main \
+             tests/models/ -k "lora"
+
+       - name: Failure short reports
+         if: ${{ failure() }}
+         run: |
+           cat reports/tests_peft_main_failures_short.txt
+           cat reports/tests_models_lora_peft_main_failures_short.txt
+
+       - name: Test suite reports artifacts
+         if: ${{ always() }}
+         uses: actions/upload-artifact@v4
+         with:
+           name: pr_main_test_reports
+           path: reports
+
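Every `run:` step in `pr_tests.yml` repeats `python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"`. That is not redundancy: each step runs in a fresh shell, so an exported `PATH` does not survive into the next step, and re-prepending the venv's `bin` directory is what makes its `python` win the lookup. A minimal sketch of the PATH-precedence mechanism, using a throwaway stub executable instead of a real venv:

```shell
# Create a fake "venv bin dir" containing a stub `python`.
bindir=$(mktemp -d)
printf '#!/bin/sh\necho venv-python\n' > "$bindir/python"
chmod +x "$bindir/python"

# Prepending the dir makes the stub shadow any system python...
PATH="$bindir:$PATH"
python    # → venv-python

# ...but only in this shell; a sibling CI step would start from a clean PATH.
```

(`python -m venv` is cheap to re-run on an existing directory, which is why the workflows can afford to repeat the line rather than persist state between steps.)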
diffusers/.github/workflows/pr_tests_gpu.yml ADDED
@@ -0,0 +1,296 @@
+ name: Fast GPU Tests on PR
+
+ on:
+   pull_request:
+     branches: main
+     paths:
+       - "src/diffusers/models/modeling_utils.py"
+       - "src/diffusers/models/model_loading_utils.py"
+       - "src/diffusers/pipelines/pipeline_utils.py"
+       - "src/diffusers/pipeline_loading_utils.py"
+       - "src/diffusers/loaders/lora_base.py"
+       - "src/diffusers/loaders/lora_pipeline.py"
+       - "src/diffusers/loaders/peft.py"
+       - "tests/pipelines/test_pipelines_common.py"
+       - "tests/models/test_modeling_common.py"
+   workflow_dispatch:
+
+ concurrency:
+   group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
+   cancel-in-progress: true
+
+ env:
+   DIFFUSERS_IS_CI: yes
+   OMP_NUM_THREADS: 8
+   MKL_NUM_THREADS: 8
+   HF_HUB_ENABLE_HF_TRANSFER: 1
+   PYTEST_TIMEOUT: 600
+   PIPELINE_USAGE_CUTOFF: 1000000000 # set high cutoff so that only always-test pipelines run
+
+ jobs:
+   check_code_quality:
+     runs-on: ubuntu-22.04
+     steps:
+       - uses: actions/checkout@v3
+       - name: Set up Python
+         uses: actions/setup-python@v4
+         with:
+           python-version: "3.8"
+       - name: Install dependencies
+         run: |
+           python -m pip install --upgrade pip
+           pip install .[quality]
+       - name: Check quality
+         run: make quality
+       - name: Check if failure
+         if: ${{ failure() }}
+         run: |
+           echo "Quality check failed. Please ensure the right dependency versions are installed with 'pip install -e .[quality]' and run 'make style && make quality'" >> $GITHUB_STEP_SUMMARY
+
+   check_repository_consistency:
+     needs: check_code_quality
+     runs-on: ubuntu-22.04
+     steps:
+       - uses: actions/checkout@v3
+       - name: Set up Python
+         uses: actions/setup-python@v4
+         with:
+           python-version: "3.8"
+       - name: Install dependencies
+         run: |
+           python -m pip install --upgrade pip
+           pip install .[quality]
+       - name: Check repo consistency
+         run: |
+           python utils/check_copies.py
+           python utils/check_dummies.py
+           python utils/check_support_list.py
+           make deps_table_check_updated
+       - name: Check if failure
+         if: ${{ failure() }}
+         run: |
+           echo "Repo consistency check failed. Please ensure the right dependency versions are installed with 'pip install -e .[quality]' and run 'make fix-copies'" >> $GITHUB_STEP_SUMMARY
+
+   setup_torch_cuda_pipeline_matrix:
+     needs: [check_code_quality, check_repository_consistency]
+     name: Setup Torch Pipelines CUDA Slow Tests Matrix
+     runs-on:
+       group: aws-general-8-plus
+     container:
+       image: diffusers/diffusers-pytorch-cpu
+     outputs:
+       pipeline_test_matrix: ${{ steps.fetch_pipeline_matrix.outputs.pipeline_test_matrix }}
+     steps:
+       - name: Checkout diffusers
+         uses: actions/checkout@v3
+         with:
+           fetch-depth: 2
+       - name: Install dependencies
+         run: |
+           python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+           python -m uv pip install -e [quality,test]
+       - name: Environment
+         run: |
+           python utils/print_env.py
+       - name: Fetch Pipeline Matrix
+         id: fetch_pipeline_matrix
+         run: |
+           matrix=$(python utils/fetch_torch_cuda_pipeline_test_matrix.py)
+           echo $matrix
+           echo "pipeline_test_matrix=$matrix" >> $GITHUB_OUTPUT
+       - name: Pipeline Tests Artifacts
+         if: ${{ always() }}
+         uses: actions/upload-artifact@v4
+         with:
+           name: test-pipelines.json
+           path: reports
+
+   torch_pipelines_cuda_tests:
+     name: Torch Pipelines CUDA Tests
+     needs: setup_torch_cuda_pipeline_matrix
+     strategy:
+       fail-fast: false
+       max-parallel: 8
+       matrix:
+         module: ${{ fromJson(needs.setup_torch_cuda_pipeline_matrix.outputs.pipeline_test_matrix) }}
+     runs-on:
+       group: aws-g4dn-2xlarge
+     container:
+       image: diffusers/diffusers-pytorch-cuda
+       options: --shm-size "16gb" --ipc host --gpus 0
+     steps:
+       - name: Checkout diffusers
+         uses: actions/checkout@v3
+         with:
+           fetch-depth: 2
+
+       - name: NVIDIA-SMI
+         run: |
+           nvidia-smi
+       - name: Install dependencies
+         run: |
+           python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+           python -m uv pip install -e [quality,test]
+           pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
+           pip uninstall transformers -y && python -m uv pip install -U transformers@git+https://github.com/huggingface/transformers.git --no-deps
+
+       - name: Environment
+         run: |
+           python utils/print_env.py
+       - name: Extract tests
+         id: extract_tests
+         run: |
+           pattern=$(python utils/extract_tests_from_mixin.py --type pipeline)
+           echo "$pattern" > /tmp/test_pattern.txt
+           echo "pattern_file=/tmp/test_pattern.txt" >> $GITHUB_OUTPUT
+
+       - name: PyTorch CUDA checkpoint tests on Ubuntu
+         env:
+           HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+           # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
+           CUBLAS_WORKSPACE_CONFIG: :16:8
+         run: |
+           if [ "${{ matrix.module }}" = "ip_adapters" ]; then
+             python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
+               -s -v -k "not Flax and not Onnx" \
+               --make-reports=tests_pipeline_${{ matrix.module }}_cuda \
+               tests/pipelines/${{ matrix.module }}
+           else
+             pattern=$(cat ${{ steps.extract_tests.outputs.pattern_file }})
+             python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
+               -s -v -k "not Flax and not Onnx and $pattern" \
+               --make-reports=tests_pipeline_${{ matrix.module }}_cuda \
+               tests/pipelines/${{ matrix.module }}
+           fi
+
+       - name: Failure short reports
+         if: ${{ failure() }}
+         run: |
+           cat reports/tests_pipeline_${{ matrix.module }}_cuda_stats.txt
+           cat reports/tests_pipeline_${{ matrix.module }}_cuda_failures_short.txt
+       - name: Test suite reports artifacts
+         if: ${{ always() }}
+         uses: actions/upload-artifact@v4
+         with:
+           name: pipeline_${{ matrix.module }}_test_reports
+           path: reports
+
+   torch_cuda_tests:
+     name: Torch CUDA Tests
+     needs: [check_code_quality, check_repository_consistency]
+     runs-on:
+       group: aws-g4dn-2xlarge
+     container:
+       image: diffusers/diffusers-pytorch-cuda
+       options: --shm-size "16gb" --ipc host --gpus 0
+     defaults:
+       run:
+         shell: bash
+     strategy:
+       fail-fast: false
+       max-parallel: 4
+       matrix:
+         module: [models, schedulers, lora, others]
+     steps:
+       - name: Checkout diffusers
+         uses: actions/checkout@v3
+         with:
+           fetch-depth: 2
+
+       - name: Install dependencies
+         run: |
+           python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+           python -m uv pip install -e [quality,test]
+           python -m uv pip install peft@git+https://github.com/huggingface/peft.git
+           pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
+           pip uninstall transformers -y && python -m uv pip install -U transformers@git+https://github.com/huggingface/transformers.git --no-deps
+
+       - name: Environment
+         run: |
+           python utils/print_env.py
+
+       - name: Extract tests
+         id: extract_tests
+         run: |
+           pattern=$(python utils/extract_tests_from_mixin.py --type ${{ matrix.module }})
+           echo "$pattern" > /tmp/test_pattern.txt
+           echo "pattern_file=/tmp/test_pattern.txt" >> $GITHUB_OUTPUT
+
+       - name: Run PyTorch CUDA tests
+         env:
+           HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+           # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
+           CUBLAS_WORKSPACE_CONFIG: :16:8
+         run: |
+           pattern=$(cat ${{ steps.extract_tests.outputs.pattern_file }})
+           if [ -z "$pattern" ]; then
+             python -m pytest -n 1 -sv --max-worker-restart=0 --dist=loadfile -k "not Flax and not Onnx" tests/${{ matrix.module }} \
+               --make-reports=tests_torch_cuda_${{ matrix.module }}
+           else
+             python -m pytest -n 1 -sv --max-worker-restart=0 --dist=loadfile -k "not Flax and not Onnx and $pattern" tests/${{ matrix.module }} \
+               --make-reports=tests_torch_cuda_${{ matrix.module }}
+           fi
+
+       - name: Failure short reports
+         if: ${{ failure() }}
+         run: |
+           cat reports/tests_torch_cuda_${{ matrix.module }}_stats.txt
+           cat reports/tests_torch_cuda_${{ matrix.module }}_failures_short.txt
+
+       - name: Test suite reports artifacts
+         if: ${{ always() }}
+         uses: actions/upload-artifact@v4
+         with:
+           name: torch_cuda_test_reports_${{ matrix.module }}
+           path: reports
+
+   run_examples_tests:
+     name: Examples PyTorch CUDA tests on Ubuntu
+     needs: [check_code_quality, check_repository_consistency]
+     runs-on:
+       group: aws-g4dn-2xlarge
+
+     container:
+       image: diffusers/diffusers-pytorch-cuda
+       options: --gpus 0 --shm-size "16gb" --ipc host
+     steps:
+       - name: Checkout diffusers
+         uses: actions/checkout@v3
+         with:
+           fetch-depth: 2
+
+       - name: NVIDIA-SMI
+         run: |
+           nvidia-smi
+       - name: Install dependencies
+         run: |
+           python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+           pip uninstall transformers -y && python -m uv pip install -U transformers@git+https://github.com/huggingface/transformers.git --no-deps
+           python -m uv pip install -e [quality,test,training]
+
+       - name: Environment
+         run: |
+           python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+           python utils/print_env.py
+
+       - name: Run example tests on GPU
+         env:
+           HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+         run: |
+           python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+           python -m uv pip install timm
+           python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v --make-reports=examples_torch_cuda examples/
+
+       - name: Failure short reports
+         if: ${{ failure() }}
+         run: |
+           cat reports/examples_torch_cuda_stats.txt
+           cat reports/examples_torch_cuda_failures_short.txt
+
+       - name: Test suite reports artifacts
+         if: ${{ always() }}
+         uses: actions/upload-artifact@v4
+         with:
+           name: examples_test_reports
+           path: reports
+
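The `setup_torch_cuda_pipeline_matrix` job above feeds the GPU matrix to downstream jobs by appending a `key=value` line to the file named by `$GITHUB_OUTPUT`; the runner then exposes it as `steps.fetch_pipeline_matrix.outputs.pipeline_test_matrix`, which `fromJson(...)` turns back into a list. A minimal sketch of the file mechanics, simulating `$GITHUB_OUTPUT` with a temp file (the sample matrix values are illustrative, not the real pipeline list):

```shell
# Simulate the output file the Actions runner provides to each step.
GITHUB_OUTPUT=$(mktemp)

# What the "Fetch Pipeline Matrix" step effectively does:
matrix='["stable_diffusion", "controlnet"]'
echo "pipeline_test_matrix=$matrix" >> "$GITHUB_OUTPUT"

# What the runner later parses out and hands to fromJson():
grep '^pipeline_test_matrix=' "$GITHUB_OUTPUT" | cut -d= -f2-
# → ["stable_diffusion", "controlnet"]
```

Because the value crosses a job boundary as plain text, the matrix script must emit valid single-line JSON; the `echo $matrix` in the step is just for log visibility.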
diffusers/.github/workflows/pr_torch_dependency_test.yml ADDED
@@ -0,0 +1,36 @@
+ name: Run Torch dependency tests
+
+ on:
+   pull_request:
+     branches:
+       - main
+     paths:
+       - "src/diffusers/**.py"
+   push:
+     branches:
+       - main
+
+ concurrency:
+   group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
+   cancel-in-progress: true
+
+ jobs:
+   check_torch_dependencies:
+     runs-on: ubuntu-22.04
+     steps:
+       - uses: actions/checkout@v3
+       - name: Set up Python
+         uses: actions/setup-python@v4
+         with:
+           python-version: "3.8"
+       - name: Install dependencies
+         run: |
+           python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+           python -m pip install --upgrade pip uv
+           python -m uv pip install -e .
+           python -m uv pip install torch torchvision torchaudio
+           python -m uv pip install pytest
+       - name: Check for soft dependencies
+         run: |
+           python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+           pytest tests/others/test_dependencies.py
diffusers/.github/workflows/push_tests.yml ADDED
@@ -0,0 +1,294 @@
+ name: Fast GPU Tests on main
+
+ on:
+   workflow_dispatch:
+   push:
+     branches:
+       - main
+     paths:
+       - "src/diffusers/**.py"
+       - "examples/**.py"
+       - "tests/**.py"
+
+ env:
+   DIFFUSERS_IS_CI: yes
+   OMP_NUM_THREADS: 8
+   MKL_NUM_THREADS: 8
+   HF_HUB_ENABLE_HF_TRANSFER: 1
+   PYTEST_TIMEOUT: 600
+   PIPELINE_USAGE_CUTOFF: 50000
+
+ jobs:
+   setup_torch_cuda_pipeline_matrix:
+     name: Setup Torch Pipelines CUDA Slow Tests Matrix
+     runs-on:
+       group: aws-general-8-plus
+     container:
+       image: diffusers/diffusers-pytorch-cpu
+     outputs:
+       pipeline_test_matrix: ${{ steps.fetch_pipeline_matrix.outputs.pipeline_test_matrix }}
+     steps:
+       - name: Checkout diffusers
+         uses: actions/checkout@v3
+         with:
+           fetch-depth: 2
+       - name: Install dependencies
+         run: |
+           python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+           python -m uv pip install -e [quality,test]
+       - name: Environment
+         run: |
+           python utils/print_env.py
+       - name: Fetch Pipeline Matrix
+         id: fetch_pipeline_matrix
+         run: |
+           matrix=$(python utils/fetch_torch_cuda_pipeline_test_matrix.py)
+           echo $matrix
+           echo "pipeline_test_matrix=$matrix" >> $GITHUB_OUTPUT
+       - name: Pipeline Tests Artifacts
+         if: ${{ always() }}
+         uses: actions/upload-artifact@v4
+         with:
+           name: test-pipelines.json
+           path: reports
+
+   torch_pipelines_cuda_tests:
+     name: Torch Pipelines CUDA Tests
+     needs: setup_torch_cuda_pipeline_matrix
+     strategy:
+       fail-fast: false
+       max-parallel: 8
+       matrix:
+         module: ${{ fromJson(needs.setup_torch_cuda_pipeline_matrix.outputs.pipeline_test_matrix) }}
+     runs-on:
+       group: aws-g4dn-2xlarge
+     container:
+       image: diffusers/diffusers-pytorch-cuda
+       options: --shm-size "16gb" --ipc host --gpus 0
+     steps:
+       - name: Checkout diffusers
+         uses: actions/checkout@v3
+         with:
+           fetch-depth: 2
+       - name: NVIDIA-SMI
+         run: |
+           nvidia-smi
+       - name: Install dependencies
+         run: |
+           python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+           python -m uv pip install -e [quality,test]
+           pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
+       - name: Environment
+         run: |
+           python utils/print_env.py
+       - name: PyTorch CUDA checkpoint tests on Ubuntu
+         env:
+           HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+           # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
+           CUBLAS_WORKSPACE_CONFIG: :16:8
+         run: |
+           python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
+             -s -v -k "not Flax and not Onnx" \
+             --make-reports=tests_pipeline_${{ matrix.module }}_cuda \
+             tests/pipelines/${{ matrix.module }}
+       - name: Failure short reports
+         if: ${{ failure() }}
+         run: |
+           cat reports/tests_pipeline_${{ matrix.module }}_cuda_stats.txt
+           cat reports/tests_pipeline_${{ matrix.module }}_cuda_failures_short.txt
+       - name: Test suite reports artifacts
+         if: ${{ always() }}
+         uses: actions/upload-artifact@v4
+         with:
+           name: pipeline_${{ matrix.module }}_test_reports
+           path: reports
+
+   torch_cuda_tests:
+     name: Torch CUDA Tests
+     runs-on:
+       group: aws-g4dn-2xlarge
+     container:
+       image: diffusers/diffusers-pytorch-cuda
+       options: --shm-size "16gb" --ipc host --gpus 0
+     defaults:
+       run:
+         shell: bash
+     strategy:
+       fail-fast: false
+       max-parallel: 2
+       matrix:
+         module: [models, schedulers, lora, others, single_file]
+     steps:
+       - name: Checkout diffusers
+         uses: actions/checkout@v3
+         with:
+           fetch-depth: 2
+
+       - name: Install dependencies
+         run: |
+           python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+           python -m uv pip install -e [quality,test]
+           python -m uv pip install peft@git+https://github.com/huggingface/peft.git
+           pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
+
+       - name: Environment
+         run: |
+           python utils/print_env.py
+
+       - name: Run PyTorch CUDA tests
+         env:
+           HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+           # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
+           CUBLAS_WORKSPACE_CONFIG: :16:8
+         run: |
+           python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
+             -s -v -k "not Flax and not Onnx" \
+             --make-reports=tests_torch_cuda_${{ matrix.module }} \
+             tests/${{ matrix.module }}
+
+       - name: Failure short reports
+         if: ${{ failure() }}
+         run: |
+           cat reports/tests_torch_cuda_${{ matrix.module }}_stats.txt
+           cat reports/tests_torch_cuda_${{ matrix.module }}_failures_short.txt
+
+       - name: Test suite reports artifacts
+         if: ${{ always() }}
+         uses: actions/upload-artifact@v4
+         with:
+           name: torch_cuda_test_reports_${{ matrix.module }}
+           path: reports
+
+   run_torch_compile_tests:
+     name: PyTorch Compile CUDA tests
+
+     runs-on:
+       group: aws-g4dn-2xlarge
+
+     container:
+       image: diffusers/diffusers-pytorch-cuda
+       options: --gpus 0 --shm-size "16gb" --ipc host
+
+     steps:
+       - name: Checkout diffusers
+         uses: actions/checkout@v3
+         with:
+           fetch-depth: 2
+
+       - name: NVIDIA-SMI
+         run: |
+           nvidia-smi
+       - name: Install dependencies
+         run: |
+           python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+           python -m uv pip install -e [quality,test,training]
+       - name: Environment
+         run: |
+           python utils/print_env.py
+       - name: Run example tests on GPU
+         env:
+           HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+           RUN_COMPILE: yes
+         run: |
+           python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v -k "compile" --make-reports=tests_torch_compile_cuda tests/
+       - name: Failure short reports
+         if: ${{ failure() }}
+         run: cat reports/tests_torch_compile_cuda_failures_short.txt
+
+       - name: Test suite reports artifacts
+         if: ${{ always() }}
+         uses: actions/upload-artifact@v4
+         with:
+           name: torch_compile_test_reports
+           path: reports
+
+   run_xformers_tests:
+     name: PyTorch xformers CUDA tests
+
+     runs-on:
+       group: aws-g4dn-2xlarge
+
+     container:
+       image: diffusers/diffusers-pytorch-xformers-cuda
+       options: --gpus 0 --shm-size "16gb" --ipc host
+
+     steps:
+       - name: Checkout diffusers
+         uses: actions/checkout@v3
+         with:
+           fetch-depth: 2
+
+       - name: NVIDIA-SMI
+         run: |
+           nvidia-smi
+       - name: Install dependencies
+         run: |
+           python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+           python -m uv pip install -e [quality,test,training]
+       - name: Environment
+         run: |
+           python utils/print_env.py
+       - name: Run example tests on GPU
+         env:
+           HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+         run: |
+           python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v -k "xformers" --make-reports=tests_torch_xformers_cuda tests/
+       - name: Failure short reports
+         if: ${{ failure() }}
+         run: cat reports/tests_torch_xformers_cuda_failures_short.txt
+
+       - name: Test suite reports artifacts
+         if: ${{ always() }}
+         uses: actions/upload-artifact@v4
+         with:
+           name: torch_xformers_test_reports
+           path: reports
+
+   run_examples_tests:
+     name: Examples PyTorch CUDA tests on Ubuntu
+
+     runs-on:
+       group: aws-g4dn-2xlarge
+
+     container:
+       image: diffusers/diffusers-pytorch-cuda
+       options: --gpus 0 --shm-size "16gb" --ipc host
+     steps:
+       - name: Checkout diffusers
+         uses: actions/checkout@v3
+         with:
+           fetch-depth: 2
+
+       - name: NVIDIA-SMI
+         run: |
+           nvidia-smi
+       - name: Install dependencies
+         run: |
+           python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+           python -m uv pip install -e [quality,test,training]
+
+       - name: Environment
+         run: |
+           python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+           python utils/print_env.py
+
+       - name: Run example tests on GPU
+         env:
+           HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+         run: |
+           python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+           python -m uv pip install timm
+           python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v --make-reports=examples_torch_cuda examples/
+
+       - name: Failure short reports
+         if: ${{ failure() }}
+         run: |
+           cat reports/examples_torch_cuda_stats.txt
+           cat reports/examples_torch_cuda_failures_short.txt
+
+       - name: Test suite reports artifacts
+         if: ${{ always() }}
+         uses: actions/upload-artifact@v4
+         with:
+           name: examples_test_reports
+           path: reports
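The CUDA jobs in these workflows build their pytest `-k` selection expressions in shell. When part of the expression is generated (as with the pattern file in `pr_tests_gpu.yml`'s "Extract tests" step), the `[ -z "$pattern" ]` guard matters: appending an empty pattern would produce an invalid expression ending in `and `. A hedged sketch of that branching logic in isolation:

```shell
# Simulate an empty generated pattern (e.g. no mixin tests extracted).
pattern=""
if [ -z "$pattern" ]; then
  # Fall back to the static deselection expression.
  kexpr="not Flax and not Onnx"
else
  # Only append when there is something to append.
  kexpr="not Flax and not Onnx and $pattern"
fi
echo "$kexpr"   # → not Flax and not Onnx
```

The same guard could in principle apply anywhere a `-k` expression is assembled from a script's output.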
diffusers/.github/workflows/push_tests_fast.yml ADDED
@@ -0,0 +1,98 @@
1
+ name: Fast tests on main
2
+
3
+ on:
4
+ push:
5
+ branches:
6
+ - main
7
+ paths:
8
+ - "src/diffusers/**.py"
9
+ - "examples/**.py"
10
+ - "tests/**.py"
11
+
12
+ concurrency:
13
+ group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
14
+ cancel-in-progress: true
15
+
16
+ env:
17
+ DIFFUSERS_IS_CI: yes
18
+ HF_HOME: /mnt/cache
19
+ OMP_NUM_THREADS: 8
20
+ MKL_NUM_THREADS: 8
21
+ HF_HUB_ENABLE_HF_TRANSFER: 1
22
+ PYTEST_TIMEOUT: 600
23
+ RUN_SLOW: no
24
+
25
+ jobs:
26
+ run_fast_tests:
27
+ strategy:
28
+ fail-fast: false
29
+ matrix:
30
+ config:
31
+ - name: Fast PyTorch CPU tests on Ubuntu
32
+ framework: pytorch
33
+ runner: aws-general-8-plus
34
+ image: diffusers/diffusers-pytorch-cpu
35
+ report: torch_cpu
36
+ - name: PyTorch Example CPU tests on Ubuntu
37
+ framework: pytorch_examples
38
+ runner: aws-general-8-plus
39
+ image: diffusers/diffusers-pytorch-cpu
40
+ report: torch_example_cpu
41
+
42
+ name: ${{ matrix.config.name }}
43
+
44
+ runs-on:
45
+ group: ${{ matrix.config.runner }}
46
+
47
+ container:
48
+ image: ${{ matrix.config.image }}
49
+ options: --shm-size "16gb" --ipc host -v /mnt/hf_cache:/mnt/cache/
50
+
51
+ defaults:
52
+ run:
53
+ shell: bash
54
+
55
+ steps:
56
+ - name: Checkout diffusers
57
+ uses: actions/checkout@v3
58
+ with:
59
+ fetch-depth: 2
60
+
61
+ - name: Install dependencies
62
+ run: |
63
+ python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
64
+ python -m uv pip install -e [quality,test]
65
+
66
+ - name: Environment
67
+ run: |
68
+ python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
69
+ python utils/print_env.py
70
+
71
+ - name: Run fast PyTorch CPU tests
72
+ if: ${{ matrix.config.framework == 'pytorch' }}
73
+ run: |
74
+ python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
75
+ python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
76
+ -s -v -k "not Flax and not Onnx" \
77
+ --make-reports=tests_${{ matrix.config.report }} \
78
+ tests/
79
+
80
+ - name: Run example PyTorch CPU tests
81
+ if: ${{ matrix.config.framework == 'pytorch_examples' }}
82
+ run: |
83
+ python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
84
+ python -m uv pip install peft timm
85
+ python -m pytest -n 4 --max-worker-restart=0 --dist=loadfile \
86
+ --make-reports=tests_${{ matrix.config.report }} \
87
+ examples
88
+
89
+ - name: Failure short reports
90
+ if: ${{ failure() }}
91
+ run: cat reports/tests_${{ matrix.config.report }}_failures_short.txt
92
+
93
+ - name: Test suite reports artifacts
94
+ if: ${{ always() }}
95
+ uses: actions/upload-artifact@v4
96
+ with:
97
+ name: pr_${{ matrix.config.report }}_test_reports
98
+ path: reports
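Note that every `run:` step above re-creates the venv and re-exports `PATH`: each workflow step starts a fresh shell, so environment changes made in one step do not carry over to the next. A minimal sketch of that behavior, runnable locally (the `/opt/venv` path is just the one the workflow uses; no venv needs to exist for the PATH demonstration):

```shell
# Each CI step is a fresh shell: simulate two "steps" as separate bash -c calls.
step1=$(bash -c 'export PATH="/opt/venv/bin:$PATH"; echo "$PATH"')
step2=$(bash -c 'echo "$PATH"')

# Step 1 sees the venv on PATH; step 2 does not, hence the repeated export.
case "$step1" in /opt/venv/bin:*) echo "step1: venv on PATH" ;; esac
case "$step2" in /opt/venv/bin:*) echo "step2: venv on PATH" ;; *) echo "step2: venv NOT on PATH" ;; esac
```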
diffusers/.github/workflows/push_tests_mps.yml ADDED
@@ -0,0 +1,71 @@
+name: Fast mps tests on main
+
+on:
+  workflow_dispatch:
+
+env:
+  DIFFUSERS_IS_CI: yes
+  HF_HOME: /mnt/cache
+  OMP_NUM_THREADS: 8
+  MKL_NUM_THREADS: 8
+  HF_HUB_ENABLE_HF_TRANSFER: 1
+  PYTEST_TIMEOUT: 600
+  RUN_SLOW: no
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
+  cancel-in-progress: true
+
+jobs:
+  run_fast_tests_apple_m1:
+    name: Fast PyTorch MPS tests on macOS
+    runs-on: macos-13-xlarge
+
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+
+      - name: Clean checkout
+        shell: arch -arch arm64 bash {0}
+        run: |
+          git clean -fxd
+
+      - name: Setup miniconda
+        uses: ./.github/actions/setup-miniconda
+        with:
+          python-version: 3.9
+
+      - name: Install dependencies
+        shell: arch -arch arm64 bash {0}
+        run: |
+          ${CONDA_RUN} python -m pip install --upgrade pip uv
+          ${CONDA_RUN} python -m uv pip install -e ".[quality,test]"
+          ${CONDA_RUN} python -m uv pip install torch torchvision torchaudio
+          ${CONDA_RUN} python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate.git
+          ${CONDA_RUN} python -m uv pip install transformers --upgrade
+
+      - name: Environment
+        shell: arch -arch arm64 bash {0}
+        run: |
+          ${CONDA_RUN} python utils/print_env.py
+
+      - name: Run fast PyTorch tests on M1 (MPS)
+        shell: arch -arch arm64 bash {0}
+        env:
+          HF_HOME: /System/Volumes/Data/mnt/cache
+          HF_TOKEN: ${{ secrets.HF_TOKEN }}
+        run: |
+          ${CONDA_RUN} python -m pytest -n 0 -s -v --make-reports=tests_torch_mps tests/
+
+      - name: Failure short reports
+        if: ${{ failure() }}
+        run: cat reports/tests_torch_mps_failures_short.txt
+
+      - name: Test suite reports artifacts
+        if: ${{ always() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: pr_torch_mps_test_reports
+          path: reports
diffusers/.github/workflows/pypi_publish.yaml ADDED
@@ -0,0 +1,81 @@
+# Adapted from https://blog.deepjyoti30.dev/pypi-release-github-action
+
+name: PyPI release
+
+on:
+  workflow_dispatch:
+  push:
+    tags:
+      - "*"
+
+jobs:
+  find-and-checkout-latest-branch:
+    runs-on: ubuntu-22.04
+    outputs:
+      latest_branch: ${{ steps.set_latest_branch.outputs.latest_branch }}
+    steps:
+      - name: Checkout Repo
+        uses: actions/checkout@v3
+
+      - name: Set up Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: '3.8'
+
+      - name: Fetch latest branch
+        id: fetch_latest_branch
+        run: |
+          pip install -U requests packaging
+          LATEST_BRANCH=$(python utils/fetch_latest_release_branch.py)
+          echo "Latest branch: $LATEST_BRANCH"
+          echo "latest_branch=$LATEST_BRANCH" >> $GITHUB_ENV
+
+      - name: Set latest branch output
+        id: set_latest_branch
+        run: echo "latest_branch=${{ env.latest_branch }}" >> "$GITHUB_OUTPUT"
+
+  release:
+    needs: find-and-checkout-latest-branch
+    runs-on: ubuntu-22.04
+
+    steps:
+      - name: Checkout Repo
+        uses: actions/checkout@v3
+        with:
+          ref: ${{ needs.find-and-checkout-latest-branch.outputs.latest_branch }}
+
+      - name: Setup Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: "3.8"
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install -U setuptools wheel twine
+          pip install -U torch --index-url https://download.pytorch.org/whl/cpu
+          pip install -U transformers
+
+      - name: Build the dist files
+        run: python setup.py bdist_wheel && python setup.py sdist
+
+      - name: Publish to the test PyPI
+        env:
+          TWINE_USERNAME: ${{ secrets.TEST_PYPI_USERNAME }}
+          TWINE_PASSWORD: ${{ secrets.TEST_PYPI_PASSWORD }}
+        run: twine upload dist/* -r pypitest --repository-url=https://test.pypi.org/legacy/
+
+      - name: Test installing diffusers and importing
+        run: |
+          pip install diffusers && pip uninstall diffusers -y
+          pip install -i https://test.pypi.org/simple/ diffusers
+          python -c "from diffusers import __version__; print(__version__)"
+          python -c "from diffusers import DiffusionPipeline; pipe = DiffusionPipeline.from_pretrained('fusing/unet-ldm-dummy-update'); pipe()"
+          python -c "from diffusers import DiffusionPipeline; pipe = DiffusionPipeline.from_pretrained('hf-internal-testing/tiny-stable-diffusion-pipe', safety_checker=None); pipe('ah suh du')"
+          python -c "from diffusers import *"
+
+      - name: Publish to PyPI
+        env:
+          TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
+          TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
+        run: twine upload dist/* -r pypi
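The first job delegates branch selection to `utils/fetch_latest_release_branch.py`. A hypothetical sketch of that kind of logic (the real script's implementation is not shown here): pick the branch whose `vX.Y.Z-release` prefix carries the highest version tuple.

```python
import re

def latest_release_branch(branches):
    """Return the release branch with the highest vX.Y.Z version, or None."""
    best = None
    for name in branches:
        m = re.fullmatch(r"v(\d+)\.(\d+)\.(\d+)-release", name)
        if m:
            key = tuple(int(g) for g in m.groups())  # numeric compare, not lexicographic
            if best is None or key > best[0]:
                best = (key, name)
    return best[1] if best else None

print(latest_release_branch(["v0.30.0-release", "v0.31.0-release", "main"]))
# → v0.31.0-release
```

Comparing numeric tuples avoids the classic string-sort trap where `v0.9.0` would sort above `v0.10.0`.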
diffusers/.github/workflows/release_tests_fast.yml ADDED
@@ -0,0 +1,351 @@
+# Duplicate workflow to push_tests.yml that is meant to run on release/patch branches as a final check
+# Creating a duplicate workflow here is simpler than adding complex path/branch parsing logic to push_tests.yml
+# Needs to be updated if push_tests.yml updated
+name: (Release) Fast GPU Tests on main
+
+on:
+  push:
+    branches:
+      - "v*.*.*-release"
+      - "v*.*.*-patch"
+
+env:
+  DIFFUSERS_IS_CI: yes
+  OMP_NUM_THREADS: 8
+  MKL_NUM_THREADS: 8
+  PYTEST_TIMEOUT: 600
+  PIPELINE_USAGE_CUTOFF: 50000
+
+jobs:
+  setup_torch_cuda_pipeline_matrix:
+    name: Setup Torch Pipelines CUDA Slow Tests Matrix
+    runs-on:
+      group: aws-general-8-plus
+    container:
+      image: diffusers/diffusers-pytorch-cpu
+    outputs:
+      pipeline_test_matrix: ${{ steps.fetch_pipeline_matrix.outputs.pipeline_test_matrix }}
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+      - name: Install dependencies
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m uv pip install -e [quality,test]
+      - name: Environment
+        run: |
+          python utils/print_env.py
+      - name: Fetch Pipeline Matrix
+        id: fetch_pipeline_matrix
+        run: |
+          matrix=$(python utils/fetch_torch_cuda_pipeline_test_matrix.py)
+          echo $matrix
+          echo "pipeline_test_matrix=$matrix" >> $GITHUB_OUTPUT
+      - name: Pipeline Tests Artifacts
+        if: ${{ always() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: test-pipelines.json
+          path: reports
+
+  torch_pipelines_cuda_tests:
+    name: Torch Pipelines CUDA Tests
+    needs: setup_torch_cuda_pipeline_matrix
+    strategy:
+      fail-fast: false
+      max-parallel: 8
+      matrix:
+        module: ${{ fromJson(needs.setup_torch_cuda_pipeline_matrix.outputs.pipeline_test_matrix) }}
+    runs-on:
+      group: aws-g4dn-2xlarge
+    container:
+      image: diffusers/diffusers-pytorch-cuda
+      options: --shm-size "16gb" --ipc host --gpus 0
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+      - name: NVIDIA-SMI
+        run: |
+          nvidia-smi
+      - name: Install dependencies
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m uv pip install -e [quality,test]
+          pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
+      - name: Environment
+        run: |
+          python utils/print_env.py
+      - name: Slow PyTorch CUDA checkpoint tests on Ubuntu
+        env:
+          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+          # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
+          CUBLAS_WORKSPACE_CONFIG: :16:8
+        run: |
+          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
+            -s -v -k "not Flax and not Onnx" \
+            --make-reports=tests_pipeline_${{ matrix.module }}_cuda \
+            tests/pipelines/${{ matrix.module }}
+      - name: Failure short reports
+        if: ${{ failure() }}
+        run: |
+          cat reports/tests_pipeline_${{ matrix.module }}_cuda_stats.txt
+          cat reports/tests_pipeline_${{ matrix.module }}_cuda_failures_short.txt
+      - name: Test suite reports artifacts
+        if: ${{ always() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: pipeline_${{ matrix.module }}_test_reports
+          path: reports
+
+  torch_cuda_tests:
+    name: Torch CUDA Tests
+    runs-on:
+      group: aws-g4dn-2xlarge
+    container:
+      image: diffusers/diffusers-pytorch-cuda
+      options: --shm-size "16gb" --ipc host --gpus 0
+    defaults:
+      run:
+        shell: bash
+    strategy:
+      fail-fast: false
+      max-parallel: 2
+      matrix:
+        module: [models, schedulers, lora, others, single_file]
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+
+      - name: Install dependencies
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m uv pip install -e [quality,test]
+          python -m uv pip install peft@git+https://github.com/huggingface/peft.git
+          pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
+
+      - name: Environment
+        run: |
+          python utils/print_env.py
+
+      - name: Run PyTorch CUDA tests
+        env:
+          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+          # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
+          CUBLAS_WORKSPACE_CONFIG: :16:8
+        run: |
+          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
+            -s -v -k "not Flax and not Onnx" \
+            --make-reports=tests_torch_${{ matrix.module }}_cuda \
+            tests/${{ matrix.module }}
+
+      - name: Failure short reports
+        if: ${{ failure() }}
+        run: |
+          cat reports/tests_torch_${{ matrix.module }}_cuda_stats.txt
+          cat reports/tests_torch_${{ matrix.module }}_cuda_failures_short.txt
+
+      - name: Test suite reports artifacts
+        if: ${{ always() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: torch_cuda_${{ matrix.module }}_test_reports
+          path: reports
+
+  torch_minimum_version_cuda_tests:
+    name: Torch Minimum Version CUDA Tests
+    runs-on:
+      group: aws-g4dn-2xlarge
+    container:
+      image: diffusers/diffusers-pytorch-minimum-cuda
+      options: --shm-size "16gb" --ipc host --gpus 0
+    defaults:
+      run:
+        shell: bash
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+
+      - name: Install dependencies
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m uv pip install -e [quality,test]
+          python -m uv pip install peft@git+https://github.com/huggingface/peft.git
+          pip uninstall accelerate -y && python -m uv pip install -U accelerate@git+https://github.com/huggingface/accelerate.git
+
+      - name: Environment
+        run: |
+          python utils/print_env.py
+
+      - name: Run PyTorch CUDA tests
+        env:
+          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+          # https://pytorch.org/docs/stable/notes/randomness.html#avoiding-nondeterministic-algorithms
+          CUBLAS_WORKSPACE_CONFIG: :16:8
+        run: |
+          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile \
+            -s -v -k "not Flax and not Onnx" \
+            --make-reports=tests_torch_minimum_version_cuda \
+            tests/models/test_modeling_common.py \
+            tests/pipelines/test_pipelines_common.py \
+            tests/pipelines/test_pipeline_utils.py \
+            tests/pipelines/test_pipelines.py \
+            tests/pipelines/test_pipelines_auto.py \
+            tests/schedulers/test_schedulers.py \
+            tests/others
+
+      - name: Failure short reports
+        if: ${{ failure() }}
+        run: |
+          cat reports/tests_torch_minimum_version_cuda_stats.txt
+          cat reports/tests_torch_minimum_version_cuda_failures_short.txt
+
+      - name: Test suite reports artifacts
+        if: ${{ always() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: torch_minimum_version_cuda_test_reports
+          path: reports
+
+  run_torch_compile_tests:
+    name: PyTorch Compile CUDA tests
+
+    runs-on:
+      group: aws-g4dn-2xlarge
+
+    container:
+      image: diffusers/diffusers-pytorch-cuda
+      options: --gpus 0 --shm-size "16gb" --ipc host
+
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+
+      - name: NVIDIA-SMI
+        run: |
+          nvidia-smi
+      - name: Install dependencies
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m uv pip install -e [quality,test,training]
+      - name: Environment
+        run: |
+          python utils/print_env.py
+      - name: Run torch compile tests on GPU
+        env:
+          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+          RUN_COMPILE: yes
+        run: |
+          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v -k "compile" --make-reports=tests_torch_compile_cuda tests/
+      - name: Failure short reports
+        if: ${{ failure() }}
+        run: cat reports/tests_torch_compile_cuda_failures_short.txt
+
+      - name: Test suite reports artifacts
+        if: ${{ always() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: torch_compile_test_reports
+          path: reports
+
+  run_xformers_tests:
+    name: PyTorch xformers CUDA tests
+
+    runs-on:
+      group: aws-g4dn-2xlarge
+
+    container:
+      image: diffusers/diffusers-pytorch-xformers-cuda
+      options: --gpus 0 --shm-size "16gb" --ipc host
+
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+
+      - name: NVIDIA-SMI
+        run: |
+          nvidia-smi
+      - name: Install dependencies
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m uv pip install -e [quality,test,training]
+      - name: Environment
+        run: |
+          python utils/print_env.py
+      - name: Run example tests on GPU
+        env:
+          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+        run: |
+          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v -k "xformers" --make-reports=tests_torch_xformers_cuda tests/
+      - name: Failure short reports
+        if: ${{ failure() }}
+        run: cat reports/tests_torch_xformers_cuda_failures_short.txt
+
+      - name: Test suite reports artifacts
+        if: ${{ always() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: torch_xformers_test_reports
+          path: reports
+
+  run_examples_tests:
+    name: Examples PyTorch CUDA tests on Ubuntu
+
+    runs-on:
+      group: aws-g4dn-2xlarge
+
+    container:
+      image: diffusers/diffusers-pytorch-cuda
+      options: --gpus 0 --shm-size "16gb" --ipc host
+
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+
+      - name: NVIDIA-SMI
+        run: |
+          nvidia-smi
+
+      - name: Install dependencies
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m uv pip install -e [quality,test,training]
+
+      - name: Environment
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python utils/print_env.py
+
+      - name: Run example tests on GPU
+        env:
+          HF_TOKEN: ${{ secrets.DIFFUSERS_HF_HUB_READ_TOKEN }}
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m uv pip install timm
+          python -m pytest -n 1 --max-worker-restart=0 --dist=loadfile -s -v --make-reports=examples_torch_cuda examples/
+
+      - name: Failure short reports
+        if: ${{ failure() }}
+        run: |
+          cat reports/examples_torch_cuda_stats.txt
+          cat reports/examples_torch_cuda_failures_short.txt
+
+      - name: Test suite reports artifacts
+        if: ${{ always() }}
+        uses: actions/upload-artifact@v4
+        with:
+          name: examples_test_reports
+          path: reports
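The fan-out in this workflow hinges on the matrix-setup job printing a JSON list that `fromJson()` expands into one job per pipeline module. A hypothetical sketch of that contract (module names here are illustrative, not the script's actual output):

```python
import json

# The setup job prints a JSON array of module names to $GITHUB_OUTPUT ...
modules = ["stable_diffusion", "controlnet", "ip_adapters"]
matrix = json.dumps(modules)
print(matrix)

# ... and each fanned-out job then runs the equivalent of:
for module in json.loads(matrix):
    print(f"pytest tests/pipelines/{module} --make-reports=tests_pipeline_{module}_cuda")
```

Because the output must round-trip through `fromJson()`, the script has to emit strict JSON (double quotes, no trailing commas), not a Python `repr` of the list.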
diffusers/.github/workflows/run_tests_from_a_pr.yml ADDED
@@ -0,0 +1,74 @@
+name: Check running SLOW tests from a PR (only GPU)
+
+on:
+  workflow_dispatch:
+    inputs:
+      docker_image:
+        default: 'diffusers/diffusers-pytorch-cuda'
+        description: 'Name of the Docker image'
+        required: true
+      pr_number:
+        description: 'PR number to test on'
+        required: true
+      test:
+        description: 'Tests to run (e.g.: `tests/models`).'
+        required: true
+
+env:
+  DIFFUSERS_IS_CI: yes
+  IS_GITHUB_CI: "1"
+  HF_HOME: /mnt/cache
+  OMP_NUM_THREADS: 8
+  MKL_NUM_THREADS: 8
+  PYTEST_TIMEOUT: 600
+  RUN_SLOW: yes
+
+jobs:
+  run_tests:
+    name: "Run a test on our runner from a PR"
+    runs-on:
+      group: aws-g4dn-2xlarge
+    container:
+      image: ${{ github.event.inputs.docker_image }}
+      options: --gpus 0 --privileged --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
+
+    steps:
+      - name: Validate test files input
+        id: validate_test_files
+        env:
+          PY_TEST: ${{ github.event.inputs.test }}
+        run: |
+          if [[ ! "$PY_TEST" =~ ^tests/ ]]; then
+            echo "Error: The input string must start with 'tests/'."
+            exit 1
+          fi
+
+          if [[ ! "$PY_TEST" =~ ^tests/(models|pipelines|lora) ]]; then
+            echo "Error: The input string must contain either 'models', 'pipelines', or 'lora' after 'tests/'."
+            exit 1
+          fi
+
+          if [[ "$PY_TEST" == *";"* ]]; then
+            echo "Error: The input string must not contain ';'."
+            exit 1
+          fi
+          echo "$PY_TEST"
+
+        shell: bash -e {0}
+
+      - name: Checkout PR branch
+        uses: actions/checkout@v4
+        with:
+          ref: refs/pull/${{ inputs.pr_number }}/head
+
+      - name: Install pytest
+        run: |
+          python -m venv /opt/venv && export PATH="/opt/venv/bin:$PATH"
+          python -m uv pip install -e [quality,test]
+          python -m uv pip install peft
+
+      - name: Run tests
+        env:
+          PY_TEST: ${{ github.event.inputs.test }}
+        run: |
+          pytest "$PY_TEST"
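The validation step guards the workflow against arbitrary input: the path must start with `tests/`, must target one of the allowed test groups, and must not smuggle in a second command via `;`. A local mirror of those rules, assuming the same three checks (the `validate` helper is ours, not part of the workflow):

```shell
# Mirror of the "Validate test files input" step, runnable in bash.
validate() {
  local PY_TEST="$1"
  [[ "$PY_TEST" =~ ^tests/ ]] || { echo "invalid: $PY_TEST"; return 1; }
  [[ "$PY_TEST" =~ ^tests/(models|pipelines|lora) ]] || { echo "invalid: $PY_TEST"; return 1; }
  [[ "$PY_TEST" != *";"* ]] || { echo "invalid: $PY_TEST"; return 1; }
  echo "valid: $PY_TEST"
}

validate "tests/models/unets"               # accepted
validate "src/diffusers" || true            # rejected: must start with tests/
validate "tests/models; rm -rf /" || true   # rejected: ';' is not allowed
```

Note the check is a prefix allowlist, not shell escaping; quoting `"$PY_TEST"` in the final `pytest` call is what keeps the remaining characters inert.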
diffusers/.github/workflows/ssh-pr-runner.yml ADDED
@@ -0,0 +1,40 @@
+name: SSH into PR runners
+
+on:
+  workflow_dispatch:
+    inputs:
+      docker_image:
+        description: 'Name of the Docker image'
+        required: true
+
+env:
+  IS_GITHUB_CI: "1"
+  HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
+  HF_HOME: /mnt/cache
+  DIFFUSERS_IS_CI: yes
+  OMP_NUM_THREADS: 8
+  MKL_NUM_THREADS: 8
+  RUN_SLOW: yes
+
+jobs:
+  ssh_runner:
+    name: "SSH"
+    runs-on:
+      group: aws-highmemory-32-plus
+    container:
+      image: ${{ github.event.inputs.docker_image }}
+      options: --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface/diffusers:/mnt/cache/ --privileged
+
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+
+      - name: Tailscale # In order to be able to SSH when a test fails
+        uses: huggingface/tailscale-action@main
+        with:
+          authkey: ${{ secrets.TAILSCALE_SSH_AUTHKEY }}
+          slackChannel: ${{ secrets.SLACK_CIFEEDBACK_CHANNEL }}
+          slackToken: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
+          waitForSSH: true
diffusers/.github/workflows/ssh-runner.yml ADDED
@@ -0,0 +1,52 @@
+name: SSH into GPU runners
+
+on:
+  workflow_dispatch:
+    inputs:
+      runner_type:
+        description: 'Type of runner to test (aws-g6-4xlarge-plus: a10, aws-g4dn-2xlarge: t4, aws-g6e-xlarge-plus: L40)'
+        type: choice
+        required: true
+        options:
+          - aws-g6-4xlarge-plus
+          - aws-g4dn-2xlarge
+          - aws-g6e-xlarge-plus
+      docker_image:
+        description: 'Name of the Docker image'
+        required: true
+
+env:
+  IS_GITHUB_CI: "1"
+  HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
+  HF_HOME: /mnt/cache
+  DIFFUSERS_IS_CI: yes
+  OMP_NUM_THREADS: 8
+  MKL_NUM_THREADS: 8
+  RUN_SLOW: yes
+
+jobs:
+  ssh_runner:
+    name: "SSH"
+    runs-on:
+      group: "${{ github.event.inputs.runner_type }}"
+    container:
+      image: ${{ github.event.inputs.docker_image }}
+      options: --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface/diffusers:/mnt/cache/ --gpus 0 --privileged
+
+    steps:
+      - name: Checkout diffusers
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 2
+
+      - name: NVIDIA-SMI
+        run: |
+          nvidia-smi
+
+      - name: Tailscale # In order to be able to SSH when a test fails
+        uses: huggingface/tailscale-action@main
+        with:
+          authkey: ${{ secrets.TAILSCALE_SSH_AUTHKEY }}
+          slackChannel: ${{ secrets.SLACK_CIFEEDBACK_CHANNEL }}
+          slackToken: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
+          waitForSSH: true
diffusers/.github/workflows/stale.yml ADDED
@@ -0,0 +1,30 @@
+name: Stale Bot
+
+on:
+  schedule:
+    - cron: "0 15 * * *"
+
+jobs:
+  close_stale_issues:
+    name: Close Stale Issues
+    if: github.repository == 'huggingface/diffusers'
+    runs-on: ubuntu-22.04
+    permissions:
+      issues: write
+      pull-requests: write
+    env:
+      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+    steps:
+      - uses: actions/checkout@v2
+
+      - name: Setup Python
+        uses: actions/setup-python@v1
+        with:
+          python-version: 3.8
+
+      - name: Install requirements
+        run: |
+          pip install PyGithub
+      - name: Close stale issues
+        run: |
+          python utils/stale.py
diffusers/.github/workflows/trufflehog.yml ADDED
@@ -0,0 +1,18 @@
+on:
+  push:
+
+name: Secret Leaks
+
+jobs:
+  trufflehog:
+    runs-on: ubuntu-22.04
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+      - name: Secret Scanning
+        uses: trufflesecurity/trufflehog@main
+        with:
+          extra_args: --results=verified,unknown
diffusers/.github/workflows/typos.yml ADDED
@@ -0,0 +1,14 @@
+name: Check typos
+
+on:
+  workflow_dispatch:
+
+jobs:
+  build:
+    runs-on: ubuntu-22.04
+
+    steps:
+      - uses: actions/checkout@v3
+
+      - name: typos-action
+        uses: crate-ci/typos@v1.12.4
diffusers/.github/workflows/update_metadata.yml ADDED
@@ -0,0 +1,30 @@
+name: Update Diffusers metadata
+
+on:
+  workflow_dispatch:
+  push:
+    branches:
+      - main
+      - update_diffusers_metadata*
+
+jobs:
+  update_metadata:
+    runs-on: ubuntu-22.04
+    defaults:
+      run:
+        shell: bash -l {0}
+
+    steps:
+      - uses: actions/checkout@v3
+
+      - name: Setup environment
+        run: |
+          pip install --upgrade pip
+          pip install datasets pandas
+          pip install .[torch]
+
+      - name: Update metadata
+        env:
+          HF_TOKEN: ${{ secrets.SAYAK_HF_TOKEN }}
+        run: |
+          python utils/update_metadata.py --commit_sha ${{ github.sha }}
diffusers/.github/workflows/upload_pr_documentation.yml ADDED
@@ -0,0 +1,16 @@
+name: Upload PR Documentation
+
+on:
+  workflow_run:
+    workflows: ["Build PR Documentation"]
+    types:
+      - completed
+
+jobs:
+  build:
+    uses: huggingface/doc-builder/.github/workflows/upload_pr_documentation.yml@main
+    with:
+      package_name: diffusers
+    secrets:
+      hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
+      comment_bot_token: ${{ secrets.COMMENT_BOT_TOKEN }}
diffusers/docs/source/_config.py ADDED
@@ -0,0 +1,9 @@
+# docstyle-ignore
+INSTALL_CONTENT = """
+# Diffusers installation
+! pip install diffusers transformers datasets accelerate
+# To install from source instead of the last release, comment the command above and uncomment the following one.
+# ! pip install git+https://github.com/huggingface/diffusers.git
+"""
+
+notebook_first_cells = [{"type": "code", "content": INSTALL_CONTENT}]
diffusers/docs/source/en/_toctree.yml ADDED
@@ -0,0 +1,701 @@
+- sections:
+  - local: index
+    title: 🧨 Diffusers
+  - local: quicktour
+    title: Quicktour
+  - local: stable_diffusion
+    title: Effective and efficient diffusion
+  - local: installation
+    title: Installation
+  title: Get started
+- sections:
+  - local: tutorials/tutorial_overview
+    title: Overview
+  - local: using-diffusers/write_own_pipeline
+    title: Understanding pipelines, models and schedulers
+  - local: tutorials/autopipeline
+    title: AutoPipeline
+  - local: tutorials/basic_training
+    title: Train a diffusion model
+  title: Tutorials
+- sections:
+  - local: using-diffusers/loading
+    title: Load pipelines
+  - local: using-diffusers/custom_pipeline_overview
+    title: Load community pipelines and components
+  - local: using-diffusers/schedulers
+    title: Load schedulers and models
+  - local: using-diffusers/other-formats
+    title: Model files and layouts
+  - local: using-diffusers/push_to_hub
+    title: Push files to the Hub
+  title: Load pipelines and adapters
+- sections:
+  - local: tutorials/using_peft_for_inference
+    title: LoRA
+  - local: using-diffusers/ip_adapter
+    title: IP-Adapter
+  - local: using-diffusers/controlnet
+    title: ControlNet
+  - local: using-diffusers/t2i_adapter
+    title: T2I-Adapter
+  - local: using-diffusers/dreambooth
+    title: DreamBooth
+  - local: using-diffusers/textual_inversion_inference
+    title: Textual inversion
+  title: Adapters
+  isExpanded: false
+- sections:
+  - local: using-diffusers/unconditional_image_generation
+    title: Unconditional image generation
+  - local: using-diffusers/conditional_image_generation
+    title: Text-to-image
+  - local: using-diffusers/img2img
+    title: Image-to-image
+  - local: using-diffusers/inpaint
+    title: Inpainting
+  - local: using-diffusers/text-img2vid
+    title: Video generation
+  - local: using-diffusers/depth2img
+    title: Depth-to-image
+  title: Generative tasks
+- sections:
+  - local: using-diffusers/overview_techniques
+    title: Overview
+  - local: using-diffusers/create_a_server
+    title: Create a server
+  - local: using-diffusers/batched_inference
+    title: Batch inference
+  - local: training/distributed_inference
+    title: Distributed inference
+  - local: using-diffusers/scheduler_features
+    title: Scheduler features
+  - local: using-diffusers/callback
+    title: Pipeline callbacks
+  - local: using-diffusers/reusing_seeds
+    title: Reproducible pipelines
+  - local: using-diffusers/image_quality
+    title: Controlling image quality
+  - local: using-diffusers/weighted_prompts
+    title: Prompt techniques
+  title: Inference techniques
+- sections:
+  - local: advanced_inference/outpaint
+    title: Outpainting
+  title: Advanced inference
+- sections:
+  - local: hybrid_inference/overview
+    title: Overview
+  - local: hybrid_inference/vae_decode
+    title: VAE Decode
+  - local: hybrid_inference/vae_encode
+    title: VAE Encode
+  - local: hybrid_inference/api_reference
+    title: API Reference
+  title: Hybrid Inference
+- sections:
+  - local: modular_diffusers/overview
+    title: Overview
+  - local: modular_diffusers/modular_pipeline
+    title: Modular Pipeline
+  - local: modular_diffusers/components_manager
+    title: Components Manager
+  - local: modular_diffusers/modular_diffusers_states
+    title: Modular Diffusers States
+  - local: modular_diffusers/pipeline_block
+    title: Pipeline Block
+  - local: modular_diffusers/sequential_pipeline_blocks
+    title: Sequential Pipeline Blocks
+  - local: modular_diffusers/loop_sequential_pipeline_blocks
+    title: Loop Sequential Pipeline Blocks
+  - local: modular_diffusers/auto_pipeline_blocks
+    title: Auto Pipeline Blocks
+  - local: modular_diffusers/end_to_end_guide
+    title: End-to-End Example
+  title: Modular Diffusers
+- sections:
+  - local: using-diffusers/consisid
+    title: ConsisID
+  - local: using-diffusers/sdxl
+    title: Stable Diffusion XL
+  - local: using-diffusers/sdxl_turbo
+    title: SDXL Turbo
+  - local: using-diffusers/kandinsky
+    title: Kandinsky
+  - local: using-diffusers/omnigen
+    title: OmniGen
+  - local: using-diffusers/pag
+    title: PAG
+  - local: using-diffusers/inference_with_lcm
+    title: Latent Consistency Model
+  - local: using-diffusers/shap-e
+    title: Shap-E
+  - local: using-diffusers/diffedit
+    title: DiffEdit
+  - local: using-diffusers/inference_with_tcd_lora
+    title: Trajectory Consistency Distillation-LoRA
+  - local: using-diffusers/svd
+    title: Stable Video Diffusion
+  - local: using-diffusers/marigold_usage
+    title: Marigold Computer Vision
+  title: Specific pipeline examples
+- sections:
+  - local: training/overview
+    title: Overview
+  - local: training/create_dataset
+    title: Create a dataset for training
+  - local: training/adapt_a_model
+    title: Adapt a model to a new task
+  - isExpanded: false
+    sections:
+    - local: training/unconditional_training
+      title: Unconditional image generation
+    - local: training/text2image
+      title: Text-to-image
+    - local: training/sdxl
+      title: Stable Diffusion XL
+    - local: training/kandinsky
+      title: Kandinsky 2.2
+    - local: training/wuerstchen
+      title: Wuerstchen
+    - local: training/controlnet
+      title: ControlNet
+    - local: training/t2i_adapters
+      title: T2I-Adapters
+    - local: training/instructpix2pix
165
+ - local: training/instructpix2pix
166
+ title: InstructPix2Pix
167
+ - local: training/cogvideox
168
+ title: CogVideoX
169
+ title: Models
170
+ - isExpanded: false
171
+ sections:
172
+ - local: training/text_inversion
173
+ title: Textual Inversion
174
+ - local: training/dreambooth
175
+ title: DreamBooth
176
+ - local: training/lora
177
+ title: LoRA
178
+ - local: training/custom_diffusion
179
+ title: Custom Diffusion
180
+ - local: training/lcm_distill
181
+ title: Latent Consistency Distillation
182
+ - local: training/ddpo
183
+ title: Reinforcement learning training with DDPO
184
+ title: Methods
185
+ title: Training
186
+ - sections:
187
+ - local: quantization/overview
188
+ title: Getting Started
189
+ - local: quantization/bitsandbytes
190
+ title: bitsandbytes
191
+ - local: quantization/gguf
192
+ title: gguf
193
+ - local: quantization/torchao
194
+ title: torchao
195
+ - local: quantization/quanto
196
+ title: quanto
197
+ title: Quantization Methods
198
+ - sections:
199
+ - local: optimization/fp16
200
+ title: Accelerate inference
201
+ - local: optimization/cache
202
+ title: Caching
203
+ - local: optimization/memory
204
+ title: Reduce memory usage
205
+ - local: optimization/speed-memory-optims
206
+ title: Compile and offloading quantized models
207
+ - local: optimization/pruna
208
+ title: Pruna
209
+ - local: optimization/xformers
210
+ title: xFormers
211
+ - local: optimization/tome
212
+ title: Token merging
213
+ - local: optimization/deepcache
214
+ title: DeepCache
215
+ - local: optimization/tgate
216
+ title: TGATE
217
+ - local: optimization/xdit
218
+ title: xDiT
219
+ - local: optimization/para_attn
220
+ title: ParaAttention
221
+ - sections:
222
+ - local: using-diffusers/stable_diffusion_jax_how_to
223
+ title: JAX/Flax
224
+ - local: optimization/onnx
225
+ title: ONNX
226
+ - local: optimization/open_vino
227
+ title: OpenVINO
228
+ - local: optimization/coreml
229
+ title: Core ML
230
+ title: Optimized model formats
231
+ - sections:
232
+ - local: optimization/mps
233
+ title: Metal Performance Shaders (MPS)
234
+ - local: optimization/habana
235
+ title: Intel Gaudi
236
+ - local: optimization/neuron
237
+ title: AWS Neuron
238
+ title: Optimized hardware
239
+ title: Accelerate inference and reduce memory
240
+ - sections:
241
+ - local: conceptual/philosophy
242
+ title: Philosophy
243
+ - local: using-diffusers/controlling_generation
244
+ title: Controlled generation
245
+ - local: conceptual/contribution
246
+ title: How to contribute?
247
+ - local: conceptual/ethical_guidelines
248
+ title: Diffusers' Ethical Guidelines
249
+ - local: conceptual/evaluation
250
+ title: Evaluating Diffusion Models
251
+ title: Conceptual Guides
252
+ - sections:
253
+ - local: community_projects
254
+ title: Projects built with Diffusers
255
+ title: Community Projects
256
+ - sections:
257
+ - isExpanded: false
258
+ sections:
259
+ - local: api/configuration
260
+ title: Configuration
261
+ - local: api/logging
262
+ title: Logging
263
+ - local: api/outputs
264
+ title: Outputs
265
+ - local: api/quantization
266
+ title: Quantization
267
+ title: Main Classes
268
+ - isExpanded: false
269
+ sections:
270
+ - local: api/loaders/ip_adapter
271
+ title: IP-Adapter
272
+ - local: api/loaders/lora
273
+ title: LoRA
274
+ - local: api/loaders/single_file
275
+ title: Single files
276
+ - local: api/loaders/textual_inversion
277
+ title: Textual Inversion
278
+ - local: api/loaders/unet
279
+ title: UNet
280
+ - local: api/loaders/transformer_sd3
281
+ title: SD3Transformer2D
282
+ - local: api/loaders/peft
283
+ title: PEFT
284
+ title: Loaders
285
+ - isExpanded: false
286
+ sections:
287
+ - local: api/models/overview
288
+ title: Overview
289
+ - local: api/models/auto_model
290
+ title: AutoModel
291
+ - sections:
292
+ - local: api/models/controlnet
293
+ title: ControlNetModel
294
+ - local: api/models/controlnet_union
295
+ title: ControlNetUnionModel
296
+ - local: api/models/controlnet_flux
297
+ title: FluxControlNetModel
298
+ - local: api/models/controlnet_hunyuandit
299
+ title: HunyuanDiT2DControlNetModel
300
+ - local: api/models/controlnet_sana
301
+ title: SanaControlNetModel
302
+ - local: api/models/controlnet_sd3
303
+ title: SD3ControlNetModel
304
+ - local: api/models/controlnet_sparsectrl
305
+ title: SparseControlNetModel
306
+ title: ControlNets
307
+ - sections:
308
+ - local: api/models/allegro_transformer3d
309
+ title: AllegroTransformer3DModel
310
+ - local: api/models/aura_flow_transformer2d
311
+ title: AuraFlowTransformer2DModel
312
+ - local: api/models/chroma_transformer
313
+ title: ChromaTransformer2DModel
314
+ - local: api/models/cogvideox_transformer3d
315
+ title: CogVideoXTransformer3DModel
316
+ - local: api/models/cogview3plus_transformer2d
317
+ title: CogView3PlusTransformer2DModel
318
+ - local: api/models/cogview4_transformer2d
319
+ title: CogView4Transformer2DModel
320
+ - local: api/models/consisid_transformer3d
321
+ title: ConsisIDTransformer3DModel
322
+ - local: api/models/cosmos_transformer3d
323
+ title: CosmosTransformer3DModel
324
+ - local: api/models/dit_transformer2d
325
+ title: DiTTransformer2DModel
326
+ - local: api/models/easyanimate_transformer3d
327
+ title: EasyAnimateTransformer3DModel
328
+ - local: api/models/flux_transformer
329
+ title: FluxTransformer2DModel
330
+ - local: api/models/hidream_image_transformer
331
+ title: HiDreamImageTransformer2DModel
332
+ - local: api/models/hunyuan_transformer2d
333
+ title: HunyuanDiT2DModel
334
+ - local: api/models/hunyuan_video_transformer_3d
335
+ title: HunyuanVideoTransformer3DModel
336
+ - local: api/models/latte_transformer3d
337
+ title: LatteTransformer3DModel
338
+ - local: api/models/ltx_video_transformer3d
339
+ title: LTXVideoTransformer3DModel
340
+ - local: api/models/lumina2_transformer2d
341
+ title: Lumina2Transformer2DModel
342
+ - local: api/models/lumina_nextdit2d
343
+ title: LuminaNextDiT2DModel
344
+ - local: api/models/mochi_transformer3d
345
+ title: MochiTransformer3DModel
346
+ - local: api/models/omnigen_transformer
347
+ title: OmniGenTransformer2DModel
348
+ - local: api/models/pixart_transformer2d
349
+ title: PixArtTransformer2DModel
350
+ - local: api/models/prior_transformer
351
+ title: PriorTransformer
352
+ - local: api/models/sana_transformer2d
353
+ title: SanaTransformer2DModel
354
+ - local: api/models/sd3_transformer2d
355
+ title: SD3Transformer2DModel
356
+ - local: api/models/stable_audio_transformer
357
+ title: StableAudioDiTModel
358
+ - local: api/models/transformer2d
359
+ title: Transformer2DModel
360
+ - local: api/models/transformer_temporal
361
+ title: TransformerTemporalModel
362
+ - local: api/models/wan_transformer_3d
363
+ title: WanTransformer3DModel
364
+ title: Transformers
365
+ - sections:
366
+ - local: api/models/stable_cascade_unet
367
+ title: StableCascadeUNet
368
+ - local: api/models/unet
369
+ title: UNet1DModel
370
+ - local: api/models/unet2d-cond
371
+ title: UNet2DConditionModel
372
+ - local: api/models/unet2d
373
+ title: UNet2DModel
374
+ - local: api/models/unet3d-cond
375
+ title: UNet3DConditionModel
376
+ - local: api/models/unet-motion
377
+ title: UNetMotionModel
378
+ - local: api/models/uvit2d
379
+ title: UViT2DModel
380
+ title: UNets
381
+ - sections:
382
+ - local: api/models/asymmetricautoencoderkl
383
+ title: AsymmetricAutoencoderKL
384
+ - local: api/models/autoencoder_dc
385
+ title: AutoencoderDC
386
+ - local: api/models/autoencoderkl
387
+ title: AutoencoderKL
388
+ - local: api/models/autoencoderkl_allegro
389
+ title: AutoencoderKLAllegro
390
+ - local: api/models/autoencoderkl_cogvideox
391
+ title: AutoencoderKLCogVideoX
392
+ - local: api/models/autoencoderkl_cosmos
393
+ title: AutoencoderKLCosmos
394
+ - local: api/models/autoencoder_kl_hunyuan_video
395
+ title: AutoencoderKLHunyuanVideo
396
+ - local: api/models/autoencoderkl_ltx_video
397
+ title: AutoencoderKLLTXVideo
398
+ - local: api/models/autoencoderkl_magvit
399
+ title: AutoencoderKLMagvit
400
+ - local: api/models/autoencoderkl_mochi
401
+ title: AutoencoderKLMochi
402
+ - local: api/models/autoencoder_kl_wan
403
+ title: AutoencoderKLWan
404
+ - local: api/models/consistency_decoder_vae
405
+ title: ConsistencyDecoderVAE
406
+ - local: api/models/autoencoder_oobleck
407
+ title: Oobleck AutoEncoder
408
+ - local: api/models/autoencoder_tiny
409
+ title: Tiny AutoEncoder
410
+ - local: api/models/vq
411
+ title: VQModel
412
+ title: VAEs
413
+ title: Models
414
+ - isExpanded: false
415
+ sections:
416
+ - local: api/pipelines/overview
417
+ title: Overview
418
+ - local: api/pipelines/allegro
419
+ title: Allegro
420
+ - local: api/pipelines/amused
421
+ title: aMUSEd
422
+ - local: api/pipelines/animatediff
423
+ title: AnimateDiff
424
+ - local: api/pipelines/attend_and_excite
425
+ title: Attend-and-Excite
426
+ - local: api/pipelines/audioldm
427
+ title: AudioLDM
428
+ - local: api/pipelines/audioldm2
429
+ title: AudioLDM 2
430
+ - local: api/pipelines/aura_flow
431
+ title: AuraFlow
432
+ - local: api/pipelines/auto_pipeline
433
+ title: AutoPipeline
434
+ - local: api/pipelines/blip_diffusion
435
+ title: BLIP-Diffusion
436
+ - local: api/pipelines/chroma
437
+ title: Chroma
438
+ - local: api/pipelines/cogvideox
439
+ title: CogVideoX
440
+ - local: api/pipelines/cogview3
441
+ title: CogView3
442
+ - local: api/pipelines/cogview4
443
+ title: CogView4
444
+ - local: api/pipelines/consisid
445
+ title: ConsisID
446
+ - local: api/pipelines/consistency_models
447
+ title: Consistency Models
448
+ - local: api/pipelines/controlnet
449
+ title: ControlNet
450
+ - local: api/pipelines/controlnet_flux
451
+ title: ControlNet with Flux.1
452
+ - local: api/pipelines/controlnet_hunyuandit
453
+ title: ControlNet with Hunyuan-DiT
454
+ - local: api/pipelines/controlnet_sd3
455
+ title: ControlNet with Stable Diffusion 3
456
+ - local: api/pipelines/controlnet_sdxl
457
+ title: ControlNet with Stable Diffusion XL
458
+ - local: api/pipelines/controlnet_sana
459
+ title: ControlNet-Sana
460
+ - local: api/pipelines/controlnetxs
461
+ title: ControlNet-XS
462
+ - local: api/pipelines/controlnetxs_sdxl
463
+ title: ControlNet-XS with Stable Diffusion XL
464
+ - local: api/pipelines/controlnet_union
465
+ title: ControlNetUnion
466
+ - local: api/pipelines/cosmos
467
+ title: Cosmos
468
+ - local: api/pipelines/dance_diffusion
469
+ title: Dance Diffusion
470
+ - local: api/pipelines/ddim
471
+ title: DDIM
472
+ - local: api/pipelines/ddpm
473
+ title: DDPM
474
+ - local: api/pipelines/deepfloyd_if
475
+ title: DeepFloyd IF
476
+ - local: api/pipelines/diffedit
477
+ title: DiffEdit
478
+ - local: api/pipelines/dit
479
+ title: DiT
480
+ - local: api/pipelines/easyanimate
481
+ title: EasyAnimate
482
+ - local: api/pipelines/flux
483
+ title: Flux
484
+ - local: api/pipelines/control_flux_inpaint
485
+ title: FluxControlInpaint
486
+ - local: api/pipelines/framepack
487
+ title: Framepack
488
+ - local: api/pipelines/hidream
489
+ title: HiDream-I1
490
+ - local: api/pipelines/hunyuandit
491
+ title: Hunyuan-DiT
492
+ - local: api/pipelines/hunyuan_video
493
+ title: HunyuanVideo
494
+ - local: api/pipelines/i2vgenxl
495
+ title: I2VGen-XL
496
+ - local: api/pipelines/pix2pix
497
+ title: InstructPix2Pix
498
+ - local: api/pipelines/kandinsky
499
+ title: Kandinsky 2.1
500
+ - local: api/pipelines/kandinsky_v22
501
+ title: Kandinsky 2.2
502
+ - local: api/pipelines/kandinsky3
503
+ title: Kandinsky 3
504
+ - local: api/pipelines/kolors
505
+ title: Kolors
506
+ - local: api/pipelines/latent_consistency_models
507
+ title: Latent Consistency Models
508
+ - local: api/pipelines/latent_diffusion
509
+ title: Latent Diffusion
510
+ - local: api/pipelines/latte
511
+ title: Latte
512
+ - local: api/pipelines/ledits_pp
513
+ title: LEDITS++
514
+ - local: api/pipelines/ltx_video
515
+ title: LTXVideo
516
+ - local: api/pipelines/lumina2
517
+ title: Lumina 2.0
518
+ - local: api/pipelines/lumina
519
+ title: Lumina-T2X
520
+ - local: api/pipelines/marigold
521
+ title: Marigold
522
+ - local: api/pipelines/mochi
523
+ title: Mochi
524
+ - local: api/pipelines/panorama
525
+ title: MultiDiffusion
526
+ - local: api/pipelines/musicldm
527
+ title: MusicLDM
528
+ - local: api/pipelines/omnigen
529
+ title: OmniGen
530
+ - local: api/pipelines/pag
531
+ title: PAG
532
+ - local: api/pipelines/paint_by_example
533
+ title: Paint by Example
534
+ - local: api/pipelines/pia
535
+ title: Personalized Image Animator (PIA)
536
+ - local: api/pipelines/pixart
537
+ title: PixArt-α
538
+ - local: api/pipelines/pixart_sigma
539
+ title: PixArt-Σ
540
+ - local: api/pipelines/sana
541
+ title: Sana
542
+ - local: api/pipelines/sana_sprint
543
+ title: Sana Sprint
544
+ - local: api/pipelines/self_attention_guidance
545
+ title: Self-Attention Guidance
546
+ - local: api/pipelines/semantic_stable_diffusion
547
+ title: Semantic Guidance
548
+ - local: api/pipelines/shap_e
549
+ title: Shap-E
550
+ - local: api/pipelines/stable_audio
551
+ title: Stable Audio
552
+ - local: api/pipelines/stable_cascade
553
+ title: Stable Cascade
554
+ - sections:
555
+ - local: api/pipelines/stable_diffusion/overview
556
+ title: Overview
557
+ - local: api/pipelines/stable_diffusion/depth2img
558
+ title: Depth-to-image
559
+ - local: api/pipelines/stable_diffusion/gligen
560
+ title: GLIGEN (Grounded Language-to-Image Generation)
561
+ - local: api/pipelines/stable_diffusion/image_variation
562
+ title: Image variation
563
+ - local: api/pipelines/stable_diffusion/img2img
564
+ title: Image-to-image
565
+ - local: api/pipelines/stable_diffusion/svd
566
+ title: Image-to-video
567
+ - local: api/pipelines/stable_diffusion/inpaint
568
+ title: Inpainting
569
+ - local: api/pipelines/stable_diffusion/k_diffusion
570
+ title: K-Diffusion
571
+ - local: api/pipelines/stable_diffusion/latent_upscale
572
+ title: Latent upscaler
573
+ - local: api/pipelines/stable_diffusion/ldm3d_diffusion
574
+ title: LDM3D Text-to-(RGB, Depth), Text-to-(RGB-pano, Depth-pano), LDM3D Upscaler
575
+ - local: api/pipelines/stable_diffusion/stable_diffusion_safe
576
+ title: Safe Stable Diffusion
577
+ - local: api/pipelines/stable_diffusion/sdxl_turbo
578
+ title: SDXL Turbo
579
+ - local: api/pipelines/stable_diffusion/stable_diffusion_2
580
+ title: Stable Diffusion 2
581
+ - local: api/pipelines/stable_diffusion/stable_diffusion_3
582
+ title: Stable Diffusion 3
583
+ - local: api/pipelines/stable_diffusion/stable_diffusion_xl
584
+ title: Stable Diffusion XL
585
+ - local: api/pipelines/stable_diffusion/upscale
586
+ title: Super-resolution
587
+ - local: api/pipelines/stable_diffusion/adapter
588
+ title: T2I-Adapter
589
+ - local: api/pipelines/stable_diffusion/text2img
590
+ title: Text-to-image
591
+ title: Stable Diffusion
592
+ - local: api/pipelines/stable_unclip
593
+ title: Stable unCLIP
594
+ - local: api/pipelines/text_to_video
595
+ title: Text-to-video
596
+ - local: api/pipelines/text_to_video_zero
597
+ title: Text2Video-Zero
598
+ - local: api/pipelines/unclip
599
+ title: unCLIP
600
+ - local: api/pipelines/unidiffuser
601
+ title: UniDiffuser
602
+ - local: api/pipelines/value_guided_sampling
603
+ title: Value-guided sampling
604
+ - local: api/pipelines/visualcloze
605
+ title: VisualCloze
606
+ - local: api/pipelines/wan
607
+ title: Wan
608
+ - local: api/pipelines/wuerstchen
609
+ title: Wuerstchen
610
+ title: Pipelines
611
+ - isExpanded: false
612
+ sections:
613
+ - local: api/schedulers/overview
614
+ title: Overview
615
+ - local: api/schedulers/cm_stochastic_iterative
616
+ title: CMStochasticIterativeScheduler
617
+ - local: api/schedulers/ddim_cogvideox
618
+ title: CogVideoXDDIMScheduler
619
+ - local: api/schedulers/multistep_dpm_solver_cogvideox
620
+ title: CogVideoXDPMScheduler
621
+ - local: api/schedulers/consistency_decoder
622
+ title: ConsistencyDecoderScheduler
623
+ - local: api/schedulers/cosine_dpm
624
+ title: CosineDPMSolverMultistepScheduler
625
+ - local: api/schedulers/ddim_inverse
626
+ title: DDIMInverseScheduler
627
+ - local: api/schedulers/ddim
628
+ title: DDIMScheduler
629
+ - local: api/schedulers/ddpm
630
+ title: DDPMScheduler
631
+ - local: api/schedulers/deis
632
+ title: DEISMultistepScheduler
633
+ - local: api/schedulers/multistep_dpm_solver_inverse
634
+ title: DPMSolverMultistepInverse
635
+ - local: api/schedulers/multistep_dpm_solver
636
+ title: DPMSolverMultistepScheduler
637
+ - local: api/schedulers/dpm_sde
638
+ title: DPMSolverSDEScheduler
639
+ - local: api/schedulers/singlestep_dpm_solver
640
+ title: DPMSolverSinglestepScheduler
641
+ - local: api/schedulers/edm_multistep_dpm_solver
642
+ title: EDMDPMSolverMultistepScheduler
643
+ - local: api/schedulers/edm_euler
644
+ title: EDMEulerScheduler
645
+ - local: api/schedulers/euler_ancestral
646
+ title: EulerAncestralDiscreteScheduler
647
+ - local: api/schedulers/euler
648
+ title: EulerDiscreteScheduler
649
+ - local: api/schedulers/flow_match_euler_discrete
650
+ title: FlowMatchEulerDiscreteScheduler
651
+ - local: api/schedulers/flow_match_heun_discrete
652
+ title: FlowMatchHeunDiscreteScheduler
653
+ - local: api/schedulers/heun
654
+ title: HeunDiscreteScheduler
655
+ - local: api/schedulers/ipndm
656
+ title: IPNDMScheduler
657
+ - local: api/schedulers/stochastic_karras_ve
658
+ title: KarrasVeScheduler
659
+ - local: api/schedulers/dpm_discrete_ancestral
660
+ title: KDPM2AncestralDiscreteScheduler
661
+ - local: api/schedulers/dpm_discrete
662
+ title: KDPM2DiscreteScheduler
663
+ - local: api/schedulers/lcm
664
+ title: LCMScheduler
665
+ - local: api/schedulers/lms_discrete
666
+ title: LMSDiscreteScheduler
667
+ - local: api/schedulers/pndm
668
+ title: PNDMScheduler
669
+ - local: api/schedulers/repaint
670
+ title: RePaintScheduler
671
+ - local: api/schedulers/score_sde_ve
672
+ title: ScoreSdeVeScheduler
673
+ - local: api/schedulers/score_sde_vp
674
+ title: ScoreSdeVpScheduler
675
+ - local: api/schedulers/tcd
676
+ title: TCDScheduler
677
+ - local: api/schedulers/unipc
678
+ title: UniPCMultistepScheduler
679
+ - local: api/schedulers/vq_diffusion
680
+ title: VQDiffusionScheduler
681
+ title: Schedulers
682
+ - isExpanded: false
683
+ sections:
684
+ - local: api/internal_classes_overview
685
+ title: Overview
686
+ - local: api/attnprocessor
687
+ title: Attention Processor
688
+ - local: api/activations
689
+ title: Custom activation functions
690
+ - local: api/cache
691
+ title: Caching methods
692
+ - local: api/normalization
693
+ title: Custom normalization layers
694
+ - local: api/utilities
695
+ title: Utilities
696
+ - local: api/image_processor
697
+ title: VAE Image Processor
698
+ - local: api/video_processor
699
+ title: Video Processor
700
+ title: Internal classes
701
+ title: API
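The table of contents above follows the `_toctree.yml` shape used by Hugging Face docs: a leaf entry pairs a `local` doc path with a display `title`, and groups nest their children under `sections`. As a rough sketch (not part of the diffusers tooling; the sample data and helper name are hypothetical), a small recursive walker over that structure might look like:

```python
# Hypothetical walker over a toctree-like structure, mirroring the
# `local` / `title` / `sections` shape used by _toctree.yml.

def collect_leaves(nodes):
    """Recursively collect (local, title) pairs from a toctree-like list.

    Raises ValueError on a leaf entry missing either key, which is a
    simple way to catch malformed navigation entries.
    """
    leaves = []
    for node in nodes:
        if "sections" in node:
            # Group node: recurse into its children.
            leaves.extend(collect_leaves(node["sections"]))
        else:
            if "local" not in node or "title" not in node:
                raise ValueError(f"malformed toctree entry: {node!r}")
            leaves.append((node["local"], node["title"]))
    return leaves


# Small sample mimicking the nesting seen above.
toctree = [
    {
        "sections": [
            {"local": "using-diffusers/img2img", "title": "Image-to-image"},
            {"local": "using-diffusers/inpaint", "title": "Inpainting"},
        ],
        "title": "Generative tasks",
    },
    {"local": "community_projects", "title": "Projects built with Diffusers"},
]

print(collect_leaves(toctree))
```

In the real file the same structure is expressed in YAML; loading it with a YAML parser would yield exactly this kind of nested list-of-dicts.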
diffusers/docs/source/en/community_projects.md ADDED
@@ -0,0 +1,90 @@
+ <!--Copyright 2025 The HuggingFace Team. All rights reserved.
+
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations under the License.
+ -->
+
+ # Community Projects
+
+ Welcome to Community Projects. This space is dedicated to showcasing the incredible work and innovative applications created by our vibrant community using the `diffusers` library.
+
+ This section aims to:
+
+ - Highlight diverse and inspiring projects built with `diffusers`
+ - Foster knowledge sharing within our community
+ - Provide real-world examples of how `diffusers` can be leveraged
+
+ Happy exploring, and thank you for being part of the Diffusers community!
+
+ <table>
+ <tr>
+ <th>Project Name</th>
+ <th>Description</th>
+ </tr>
+ <tr style="border-top: 2px solid black">
+ <td><a href="https://github.com/carson-katri/dream-textures"> dream-textures </a></td>
+ <td>Stable Diffusion built-in to Blender</td>
+ </tr>
+ <tr style="border-top: 2px solid black">
+ <td><a href="https://github.com/megvii-research/HiDiffusion"> HiDiffusion </a></td>
+ <td>Increases the resolution and speed of your diffusion model by only adding a single line of code</td>
+ </tr>
+ <tr style="border-top: 2px solid black">
+ <td><a href="https://github.com/lllyasviel/IC-Light"> IC-Light </a></td>
+ <td>IC-Light is a project to manipulate the illumination of images</td>
+ </tr>
+ <tr style="border-top: 2px solid black">
+ <td><a href="https://github.com/InstantID/InstantID"> InstantID </a></td>
+ <td>InstantID: Zero-shot Identity-Preserving Generation in Seconds</td>
+ </tr>
+ <tr style="border-top: 2px solid black">
+ <td><a href="https://github.com/Sanster/IOPaint"> IOPaint </a></td>
+ <td>Image inpainting tool powered by SOTA AI models. Remove any unwanted objects, defects, or people from your pictures, or erase and replace (powered by Stable Diffusion) anything in your pictures.</td>
+ </tr>
+ <tr style="border-top: 2px solid black">
+ <td><a href="https://github.com/bmaltais/kohya_ss"> Kohya </a></td>
+ <td>Gradio GUI for Kohya's Stable Diffusion trainers</td>
+ </tr>
+ <tr style="border-top: 2px solid black">
+ <td><a href="https://github.com/magic-research/magic-animate"> MagicAnimate </a></td>
+ <td>MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model</td>
+ </tr>
+ <tr style="border-top: 2px solid black">
+ <td><a href="https://github.com/levihsu/OOTDiffusion"> OOTDiffusion </a></td>
+ <td>Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on</td>
+ </tr>
+ <tr style="border-top: 2px solid black">
+ <td><a href="https://github.com/vladmandic/automatic"> SD.Next </a></td>
+ <td>SD.Next: Advanced Implementation of Stable Diffusion and other Diffusion-based generative image models</td>
+ </tr>
+ <tr style="border-top: 2px solid black">
+ <td><a href="https://github.com/ashawkey/stable-dreamfusion"> stable-dreamfusion </a></td>
+ <td>Text-to-3D & Image-to-3D & Mesh Exportation with NeRF + Diffusion</td>
+ </tr>
+ <tr style="border-top: 2px solid black">
+ <td><a href="https://github.com/HVision-NKU/StoryDiffusion"> StoryDiffusion </a></td>
+ <td>StoryDiffusion can create a magic story by generating consistent images and videos.</td>
+ </tr>
+ <tr style="border-top: 2px solid black">
+ <td><a href="https://github.com/cumulo-autumn/StreamDiffusion"> StreamDiffusion </a></td>
+ <td>A Pipeline-Level Solution for Real-Time Interactive Generation</td>
+ </tr>
+ <tr style="border-top: 2px solid black">
+ <td><a href="https://github.com/Netwrck/stable-diffusion-server"> Stable Diffusion Server </a></td>
+ <td>A server configured for Inpainting/Generation/img2img with one stable diffusion model</td>
+ </tr>
+ <tr style="border-top: 2px solid black">
+ <td><a href="https://github.com/suzukimain/auto_diffusers"> Model Search </a></td>
+ <td>Search models on Civitai and Hugging Face</td>
+ </tr>
+ <tr style="border-top: 2px solid black">
+ <td><a href="https://github.com/beinsezii/skrample"> Skrample </a></td>
+ <td>Fully modular scheduler functions with first-class diffusers integration.</td>
+ </tr>
+ </table>
diffusers/docs/source/en/conceptual/contribution.md ADDED
@@ -0,0 +1,568 @@
1
+ <!--Copyright 2025 The HuggingFace Team. All rights reserved.
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4
+ the License. You may obtain a copy of the License at
5
+
6
+ http://www.apache.org/licenses/LICENSE-2.0
7
+
8
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10
+ specific language governing permissions and limitations under the License.
11
+ -->
12
+
13
+ # How to contribute to Diffusers 🧨
14
+
15
+ We ❤️ contributions from the open-source community! Everyone is welcome, and all types of participation –not just code– are valued and appreciated. Answering questions, helping others, reaching out, and improving the documentation are all immensely valuable to the community, so don't be afraid and get involved if you're up for it!
16
+
17
+ Everyone is encouraged to start by saying 👋 in our public Discord channel. We discuss the latest trends in diffusion models, ask questions, show off personal projects, help each other with contributions, or just hang out ☕. <a href="https://Discord.gg/G7tWnz98XR"><img alt="Join us on Discord" src="https://img.shields.io/discord/823813159592001537?color=5865F2&logo=discord&logoColor=white"></a>
18
+
19
+ Whichever way you choose to contribute, we strive to be part of an open, welcoming, and kind community. Please, read our [code of conduct](https://github.com/huggingface/diffusers/blob/main/CODE_OF_CONDUCT.md) and be mindful to respect it during your interactions. We also recommend you become familiar with the [ethical guidelines](https://huggingface.co/docs/diffusers/conceptual/ethical_guidelines) that guide our project and ask you to adhere to the same principles of transparency and responsibility.
20
+
21
+ We enormously value feedback from the community, so please do not be afraid to speak up if you believe you have valuable feedback that can help improve the library - every message, comment, issue, and pull request (PR) is read and considered.
22
+
23
+ ## Overview
24
+
25
+ You can contribute in many ways ranging from answering questions on issues and discussions to adding new diffusion models to the core library.
26
+
27
+ In the following, we give an overview of different ways to contribute, ranked by difficulty in ascending order. All of them are valuable to the community.
28
+
29
+ * 1. Asking and answering questions on [the Diffusers discussion forum](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers) or on [Discord](https://discord.gg/G7tWnz98XR).
30
+ * 2. Opening new issues on [the GitHub Issues tab](https://github.com/huggingface/diffusers/issues/new/choose) or new discussions on [the GitHub Discussions tab](https://github.com/huggingface/diffusers/discussions/new/choose).
31
+ * 3. Answering issues on [the GitHub Issues tab](https://github.com/huggingface/diffusers/issues) or discussions on [the GitHub Discussions tab](https://github.com/huggingface/diffusers/discussions).
32
+ * 4. Fix a simple issue, marked by the "Good first issue" label, see [here](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22).
33
+ * 5. Contribute to the [documentation](https://github.com/huggingface/diffusers/tree/main/docs/source).
34
+ * 6. Contribute a [Community Pipeline](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3Acommunity-examples).
35
+ * 7. Contribute to the [examples](https://github.com/huggingface/diffusers/tree/main/examples).
36
+ * 8. Fix a more difficult issue, marked by the "Good second issue" label, see [here](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22Good+second+issue%22).
37
+ * 9. Add a new pipeline, model, or scheduler, see ["New Pipeline/Model"](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+pipeline%2Fmodel%22) and ["New scheduler"](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+scheduler%22) issues. For this contribution, please have a look at [Design Philosophy](https://github.com/huggingface/diffusers/blob/main/PHILOSOPHY.md).
38
+
39
+ As said before, **all contributions are valuable to the community**.
40
+ In the following, we will explain each contribution a bit more in detail.
41
+
42
+ For all contributions 4 - 9, you will need to open a PR. It is explained in detail how to do so in [Opening a pull request](#how-to-open-a-pr).
43
+
44
### 1. Asking and answering questions on the Diffusers discussion forum or on the Diffusers Discord

Any question or comment related to the Diffusers library can be asked on the [discussion forum](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/) or on [Discord](https://discord.gg/G7tWnz98XR). Such questions and comments include (but are not limited to):
- Reports of training or inference experiments in an attempt to share knowledge
- Presentation of personal projects
- Questions about non-official training examples
- Project proposals
- General feedback
- Paper summaries
- Asking for help on personal projects that build on top of the Diffusers library
- General questions
- Ethical questions regarding diffusion models
- ...

Every question that is asked on the forum or on Discord actively encourages the community to publicly share knowledge and might very well help a beginner in the future who has the same question you're having. Please do pose any questions you might have.
In the same spirit, you are of immense help to the community by answering such questions because this way you are publicly documenting knowledge for everybody to learn from.

**Please** keep in mind that the more effort you put into asking or answering a question, the higher the quality of the publicly documented knowledge. In the same way, well-posed and well-answered questions create a high-quality knowledge database accessible to everybody, while badly posed questions or answers reduce the overall quality of the public knowledge database.
In short, a high-quality question or answer is *precise*, *concise*, *relevant*, *easy-to-understand*, *accessible*, and *well-formatted/well-posed*. For more information, please have a look through the [How to write a good issue](#how-to-write-a-good-issue) section.

**NOTE about channels**:
[*The forum*](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63) is much better indexed by search engines, such as Google. Posts are ranked by popularity rather than chronologically, so it's easier to look up questions and answers that were posted some time ago. In addition, questions and answers posted in the forum can easily be linked to.
In contrast, *Discord* has a chat-like format that invites fast back-and-forth communication. While it will most likely take less time for you to get an answer to your question on Discord, your question won't be visible anymore over time, and it's much harder to find information that was posted a while back. We therefore strongly recommend using the forum for high-quality questions and answers in an attempt to create long-lasting knowledge for the community. If discussions on Discord lead to very interesting answers and conclusions, we recommend posting the results on the forum to make the information more available to future readers.
### 2. Opening new issues on the GitHub issues tab

The 🧨 Diffusers library is robust and reliable thanks to the users who notify us of the problems they encounter. So thank you for reporting an issue.

Remember, GitHub issues are reserved for technical questions directly related to the Diffusers library, bug reports, feature requests, or feedback on the library design.

In a nutshell, this means that everything that is **not** related to the **code of the Diffusers library** (including the documentation) should **not** be asked on GitHub, but rather on either the [forum](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63) or [Discord](https://discord.gg/G7tWnz98XR).

**Please consider the following guidelines when opening a new issue**:
- Make sure you have searched whether your issue has already been reported (use the search bar on GitHub under Issues).
- Please never report a new issue on another (related) issue. If another issue is highly related, please open a new issue nevertheless and link to the related issue.
- Make sure your issue is written in English. Please use one of the great, free online translation services, such as [DeepL](https://www.deepl.com/translator), to translate from your native language to English if you are not comfortable in English.
- Check whether your issue might be solved by updating to the newest Diffusers version. Before posting your issue, please make sure that `python -c "import diffusers; print(diffusers.__version__)"` matches or is higher than the latest Diffusers version.
- Remember that the more effort you put into opening a new issue, the higher the quality of the answers you receive and the better the overall quality of the Diffusers issues.
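
The version check in the guideline above boils down to comparing `X.Y.Z` version strings. As a quick, self-contained illustration (the helper function below is hypothetical and not part of Diffusers; it assumes simple `X.Y.Z` version numbers):

```python
# Hypothetical helper (not part of Diffusers): compare two "X.Y.Z" version
# strings to decide whether the installed library is up to date.
def is_up_to_date(installed: str, latest: str) -> bool:
    parse = lambda version: tuple(int(part) for part in version.split("."))
    return parse(installed) >= parse(latest)

print(is_up_to_date("0.27.2", "0.31.0"))  # False -> please upgrade before reporting
print(is_up_to_date("0.31.0", "0.31.0"))  # True  -> fine to report the issue
```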

New issues usually include the following.

#### 2.1. Reproducible, minimal bug reports

A bug report should always have a reproducible code snippet and be as minimal and concise as possible.
In more detail, this means:
- Narrow the bug down as much as you can, **do not just dump your whole code file**.
- Format your code.
- Do not include any external libraries other than Diffusers and its dependencies.
- **Always** provide all necessary information about your environment; for this, you can run `diffusers-cli env` in your shell and copy-paste the displayed information into the issue.
- Explain the issue. If the reader doesn't know what the issue is and why it is an issue, they cannot solve it.
- **Always** make sure the reader can reproduce your issue with as little effort as possible. If your code snippet cannot be run because of missing libraries or undefined variables, the reader cannot help you. Make sure your reproducible code snippet is as minimal as possible and can be copy-pasted into a simple Python shell.
- If reproducing your issue requires a model and/or dataset, make sure the reader has access to that model or dataset. You can always upload your model or dataset to the [Hub](https://huggingface.co) to make it easily downloadable. Try to keep your model and dataset as small as possible, to make the reproduction of your issue as effortless as possible.

For more information, please have a look through the [How to write a good issue](#how-to-write-a-good-issue) section.

You can open a bug report [here](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=bug&projects=&template=bug-report.yml).

#### 2.2. Feature requests

A world-class feature request addresses the following points:

1. Motivation first:
* Is it related to a problem/frustration with the library? If so, please explain why. Providing a code snippet that demonstrates the problem is best.
* Is it related to something you would need for a project? We'd love to hear about it!
* Is it something you worked on and think could benefit the community? Awesome! Tell us what problem it solved for you.
2. Write a *full paragraph* describing the feature;
3. Provide a **code snippet** that demonstrates its future use;
4. In case this is related to a paper, please attach a link;
5. Attach any additional information (drawings, screenshots, etc.) you think may help.

You can open a feature request [here](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feature_request.md&title=).

#### 2.3 Feedback

Feedback about the library design and why it is good or not good helps the core maintainers immensely to build a user-friendly library. To understand the philosophy behind the current design, please have a look [here](https://huggingface.co/docs/diffusers/conceptual/philosophy). If you feel like a certain design choice does not fit with the current design philosophy, please explain why and how it should be changed. If a certain design choice follows the design philosophy too strictly, thereby restricting use cases, explain why and how it should be changed.
If a certain design choice is very useful for you, please also leave a note as this is great feedback for future design decisions.

You can open an issue about feedback [here](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feedback.md&title=).

#### 2.4 Technical questions

Technical questions are mainly about why certain code of the library was written in a certain way, or what a certain part of the code does. Please make sure to link to the code in question and provide details on why this part of the code is difficult to understand.

You can open an issue about a technical question [here](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=bug&template=bug-report.yml).

#### 2.5 Proposal to add a new model, scheduler, or pipeline

If the diffusion model community released a new model, pipeline, or scheduler that you would like to see in the Diffusers library, please provide the following information:

* A short description of the diffusion pipeline, model, or scheduler and a link to the paper or public release.
* A link to any of its open-source implementation(s).
* A link to the model weights if they are available.

If you are willing to contribute the model yourself, let us know so we can best guide you. Also, don't forget to tag the original author of the component (model, scheduler, pipeline, etc.) by their GitHub handle if you can find it.

You can open a request for a model/pipeline/scheduler [here](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=New+model%2Fpipeline%2Fscheduler&template=new-model-addition.yml).

### 3. Answering issues on the GitHub issues tab

Answering issues on GitHub might require some technical knowledge of Diffusers, but we encourage everybody to give it a try even if you are not 100% certain that your answer is correct.
Some tips to give a high-quality answer to an issue:
- Be as concise and minimal as possible.
- Stay on topic. An answer to the issue should concern the issue and only the issue.
- Provide links to code, papers, or other sources that prove or encourage your point.
- Answer in code. If a simple code snippet is the answer to the issue or shows how the issue can be solved, please provide a fully reproducible code snippet.

Also, many issues tend to be simply off-topic, duplicates of other issues, or irrelevant. It is of great help to the maintainers if you can answer such issues, encouraging the author of the issue to be more precise, providing the link to a duplicate issue, or redirecting them to [the forum](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63) or [Discord](https://discord.gg/G7tWnz98XR).

If you have verified that a reported bug is correct and requires a fix in the source code, please have a look at the next sections.

For all of the following contributions, you will need to open a PR. It is explained in detail how to do so in the [Opening a pull request](#how-to-open-a-pr) section.

### 4. Fixing a "Good first issue"

*Good first issues* are marked by the [Good first issue](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22) label. Usually, the issue already explains how a potential solution should look so that it is easier to fix.
If the issue hasn't been closed and you would like to try to fix this issue, you can just leave a message "I would like to try this issue.". There are usually three scenarios:
- a.) The issue description already proposes a fix. In this case and if the solution makes sense to you, you can open a PR or draft PR to fix it.
- b.) The issue description does not propose a fix. In this case, you can ask what a proposed fix could look like, and someone from the Diffusers team should answer shortly. If you have a good idea of how to fix it, feel free to directly open a PR.
- c.) There is already an open PR to fix the issue, but the issue hasn't been closed yet. If the PR has gone stale, you can simply open a new PR and link to the stale PR. PRs often go stale when the original contributor suddenly cannot find the time anymore to proceed. This often happens in open source and is very normal. In this case, the community will be very happy if you give it a new try and leverage the knowledge of the existing PR. If there is already a PR and it is active, you can help the author by giving suggestions, reviewing the PR, or even asking whether you can contribute to it.

### 5. Contribute to the documentation

A good library **always** has good documentation! The official documentation is often one of the first points of contact for new users of the library, and therefore contributing to the documentation is a **highly valuable contribution**.

Contributing to the documentation can take many forms:

- Correcting spelling or grammatical errors.
- Correcting incorrect formatting of docstrings. If you see that the official documentation is weirdly displayed or a link is broken, we would be very happy if you take some time to correct it.
- Correcting the shape or dimensions of a docstring input or output tensor.
- Clarifying documentation that is hard to understand or incorrect.
- Updating outdated code examples.
- Translating the documentation to another language.

Anything displayed on [the official Diffusers doc page](https://huggingface.co/docs/diffusers/index) is part of the official documentation and can be corrected or adjusted in the respective [documentation source](https://github.com/huggingface/diffusers/tree/main/docs/source).

Please have a look at [this page](https://github.com/huggingface/diffusers/tree/main/docs) on how to verify changes made to the documentation locally.

### 6. Contribute a community pipeline

> [!TIP]
> Read the [Community pipelines](../using-diffusers/custom_pipeline_overview#community-pipelines) guide to learn more about the difference between a GitHub and Hugging Face Hub community pipeline. If you're interested in why we have community pipelines, take a look at GitHub Issue [#841](https://github.com/huggingface/diffusers/issues/841) (basically, we can't maintain all the possible ways diffusion models can be used for inference, but we also don't want to prevent the community from building them).

Contributing a community pipeline is a great way to share your creativity and work with the community. It lets you build on top of the [`DiffusionPipeline`] so that anyone can load and use it by setting the `custom_pipeline` parameter. This section will walk you through how to create a simple pipeline where the UNet only does a single forward pass and calls the scheduler once (a "one-step" pipeline).

1. Create a one_step_unet.py file for your community pipeline. This file can use whatever packages you want as long as they're installed by the user. Make sure you only have one pipeline class that inherits from [`DiffusionPipeline`] to load model weights and the scheduler configuration from the Hub. Add a UNet and scheduler to the `__init__` function.

    You should also add the `register_modules` function to ensure your pipeline and its components can be saved with [`~DiffusionPipeline.save_pretrained`].

    ```py
    from diffusers import DiffusionPipeline
    import torch

    class UnetSchedulerOneForwardPipeline(DiffusionPipeline):
        def __init__(self, unet, scheduler):
            super().__init__()

            self.register_modules(unet=unet, scheduler=scheduler)
    ```

2. In the forward pass (which we recommend defining as `__call__`), you can add any feature you'd like. For the "one-step" pipeline, create a random image and call the UNet and scheduler once by setting `timestep=1`.

    ```py
    from diffusers import DiffusionPipeline
    import torch

    class UnetSchedulerOneForwardPipeline(DiffusionPipeline):
        def __init__(self, unet, scheduler):
            super().__init__()

            self.register_modules(unet=unet, scheduler=scheduler)

        def __call__(self):
            image = torch.randn(
                (1, self.unet.config.in_channels, self.unet.config.sample_size, self.unet.config.sample_size),
            )
            timestep = 1

            model_output = self.unet(image, timestep).sample
            scheduler_output = self.scheduler.step(model_output, timestep, image).prev_sample

            return scheduler_output
    ```

Now you can run the pipeline by passing a UNet and scheduler to it, or load pretrained weights if the pipeline structure is identical.

```py
from diffusers import DDPMScheduler, UNet2DModel

scheduler = DDPMScheduler()
unet = UNet2DModel()

pipeline = UnetSchedulerOneForwardPipeline(unet=unet, scheduler=scheduler)
output = pipeline()
# load pretrained weights
pipeline = UnetSchedulerOneForwardPipeline.from_pretrained("google/ddpm-cifar10-32", use_safetensors=True)
output = pipeline()
```

You can either share your pipeline as a GitHub community pipeline or a Hub community pipeline.

<hfoptions id="pipeline type">
<hfoption id="GitHub pipeline">

Share your GitHub pipeline by opening a pull request on the Diffusers [repository](https://github.com/huggingface/diffusers) and add the one_step_unet.py file to the [examples/community](https://github.com/huggingface/diffusers/tree/main/examples/community) subfolder.

</hfoption>
<hfoption id="Hub pipeline">

Share your Hub pipeline by creating a model repository on the Hub and uploading the one_step_unet.py file to it.

</hfoption>
</hfoptions>

### 7. Contribute to training examples

Diffusers examples are a collection of training scripts that reside in [examples](https://github.com/huggingface/diffusers/tree/main/examples).

We support two types of training examples:

- Official training examples
- Research training examples

Research training examples are located in [examples/research_projects](https://github.com/huggingface/diffusers/tree/main/examples/research_projects) whereas official training examples include all folders under [examples](https://github.com/huggingface/diffusers/tree/main/examples) except the `research_projects` and `community` folders.
The official training examples are maintained by the Diffusers' core maintainers whereas the research training examples are maintained by the community.
This is for the same reasons put forward in [6. Contribute a community pipeline](#6-contribute-a-community-pipeline) for official pipelines vs. community pipelines: it is not feasible for the core maintainers to maintain all possible training methods for diffusion models.
If the Diffusers core maintainers and the community consider a certain training paradigm to be too experimental or not popular enough, the corresponding training code should be put in the `research_projects` folder and maintained by the author.

Both official training and research examples consist of a directory that contains one or more training scripts, a `requirements.txt` file, and a `README.md` file. To use the training examples, you need to clone the repository:

```bash
git clone https://github.com/huggingface/diffusers
```

as well as install all additional dependencies required for training:

```bash
cd diffusers
pip install -r examples/<your-example-folder>/requirements.txt
```

Therefore, when adding an example, the `requirements.txt` file should define all pip dependencies required for your training example so that once they are installed, the user can run the example's training script. See, for example, the [DreamBooth `requirements.txt` file](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/requirements.txt).

Training examples of the Diffusers library should adhere to the following philosophy:
- All the code necessary to run the example should be found in a single Python file.
- One should be able to run the example from the command line with `python <your-example>.py --args`.
- Examples should be kept simple and serve as **an example** of how to use Diffusers for training. The purpose of example scripts is **not** to create state-of-the-art diffusion models, but rather to reproduce known training schemes without adding too much custom logic. As a byproduct, our examples also strive to serve as good educational materials.
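
To make the philosophy above concrete, here is a minimal, hypothetical sketch of the single-file, command-line-driven structure the training scripts follow. The flag names and defaults below are illustrative only, not taken from an actual example script; real scripts build on the Accelerate library on top of this skeleton.

```python
import argparse

# Illustrative skeleton only: the flag names and defaults are hypothetical,
# not copied from a real Diffusers example script.
def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Toy single-file training example")
    parser.add_argument("--learning_rate", type=float, default=1e-4)
    parser.add_argument("--max_train_steps", type=int, default=100)
    return parser.parse_args(argv)

def main(argv=None):
    args = parse_args(argv)
    # ... dataset loading, model setup, and the training loop would go here ...
    return args

if __name__ == "__main__":
    hyperparams = main(["--learning_rate", "5e-5"])
    print(hyperparams.learning_rate, hyperparams.max_train_steps)  # 5e-05 100
```

Run for real with `python <your-example>.py --learning_rate 5e-5` (pass no list to `main()` so `argparse` reads `sys.argv`).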

To contribute an example, it is highly recommended to look at already existing examples such as [dreambooth](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth.py) to get an idea of what they should look like.
We strongly advise contributors to make use of the [Accelerate library](https://github.com/huggingface/accelerate) as it's tightly integrated with Diffusers.
Once an example script works, please make sure to add a comprehensive `README.md` that states exactly how to use the example. This README should include:
- An example command on how to run the example script as shown [here](https://github.com/huggingface/diffusers/tree/main/examples/dreambooth#running-locally-with-pytorch).
- A link to some training results (logs, models, etc.) that show what the user can expect as shown [here](https://api.wandb.ai/report/patrickvonplaten/xm6cd5q5).
- If you are adding a non-official/research training example, **please don't forget** to add a sentence that you are maintaining this training example, which includes your git handle, as shown [here](https://github.com/huggingface/diffusers/tree/main/examples/research_projects/intel_opts#diffusers-examples-with-intel-optimizations).

If you are contributing to the official training examples, please also make sure to add a test to its folder such as [examples/dreambooth/test_dreambooth.py](https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/test_dreambooth.py). This is not necessary for non-official training examples.

### 8. Fixing a "Good second issue"

*Good second issues* are marked by the [Good second issue](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22Good+second+issue%22) label. Good second issues are usually more complicated to solve than [Good first issues](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22).
The issue description usually gives less guidance on how to fix the issue and requires a decent understanding of the library by the interested contributor.
If you are interested in tackling a good second issue, feel free to open a PR to fix it and link the PR to the issue. If you see that a PR has already been opened for this issue but did not get merged, have a look to understand why it wasn't merged and try to open an improved PR.
Good second issues are usually more difficult to get merged compared to good first issues, so don't hesitate to ask for help from the core maintainers. If your PR is almost finished, the core maintainers can also jump into your PR and commit to it in order to get it merged.

### 9. Adding pipelines, models, schedulers

Pipelines, models, and schedulers are the most important pieces of the Diffusers library.
They provide easy access to state-of-the-art diffusion technologies and thus allow the community to build powerful generative AI applications.

By adding a new model, pipeline, or scheduler you might enable a new powerful use case for any of the user interfaces relying on Diffusers, which can be of immense value for the whole generative AI ecosystem.

Diffusers has a couple of open feature requests for all three components - feel free to look through them if you don't know yet which specific component you would like to add:
- [Model or pipeline](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+pipeline%2Fmodel%22)
- [Scheduler](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22New+scheduler%22)

Before adding any of the three components, it is strongly recommended that you give the [Philosophy guide](philosophy) a read to better understand their design. Please be aware that we cannot merge model, scheduler, or pipeline additions that strongly diverge from our design philosophy, as that would lead to API inconsistencies. If you fundamentally disagree with a design choice, please open a [Feedback issue](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feedback.md&title=) instead so that it can be discussed whether a certain design pattern/design choice should be changed everywhere in the library and whether we should update our design philosophy. Consistency across the library is very important for us.

Please make sure to add links to the original codebase/paper to the PR and ideally also ping the original author directly on the PR so that they can follow the progress and potentially help with questions.

If you are unsure or stuck in the PR, don't hesitate to leave a message to ask for a first review or help.

#### Copied from mechanism

A unique and important feature to understand when adding any pipeline, model or scheduler code is the `# Copied from` mechanism. You'll see this all over the Diffusers codebase, and the reason we use it is to keep the codebase easy to understand and maintain. Marking code with the `# Copied from` mechanism forces the marked code to be identical to the code it was copied from. This makes it easy to update and propagate changes across many files whenever you run `make fix-copies`.

For example, in the code example below, [`~diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput`] is the original code and `AltDiffusionPipelineOutput` uses the `# Copied from` mechanism to copy it. The only difference is changing the class prefix from `Stable` to `Alt`.

```py
# Copied from diffusers.pipelines.stable_diffusion.pipeline_output.StableDiffusionPipelineOutput with Stable->Alt
class AltDiffusionPipelineOutput(BaseOutput):
    """
    Output class for Alt Diffusion pipelines.

    Args:
        images (`List[PIL.Image.Image]` or `np.ndarray`)
            List of denoised PIL images of length `batch_size` or NumPy array of shape `(batch_size, height, width,
            num_channels)`.
        nsfw_content_detected (`List[bool]`)
            List indicating whether the corresponding generated image contains "not-safe-for-work" (nsfw) content or
            `None` if safety checking could not be performed.
    """
```
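
Conceptually, the check behind the `Stable->Alt` marker above can be sketched as follows. This is a simplified, hypothetical illustration, not the actual `make fix-copies` implementation (which is more involved): apply the declared `Old->New` replacements to the source block and require the result to match the copy exactly.

```python
# Simplified, hypothetical sketch of what a "# Copied from ... with Old->New"
# marker enforces; the real `make fix-copies` tooling is more involved.
def copy_is_consistent(source: str, copy: str, replacements: str) -> bool:
    expected = source
    for rule in replacements.split(","):
        old, new = rule.strip().split("->")
        expected = expected.replace(old, new)
    return expected == copy

source = 'class StableDiffusionPipelineOutput(BaseOutput):\n    """Stable output."""\n'
copy = 'class AltDiffusionPipelineOutput(BaseOutput):\n    """Alt output."""\n'
print(copy_is_consistent(source, copy, "Stable->Alt"))  # True: the copy is in sync
```

If the copied block drifts out of sync, a check like this fails (and the real tooling rewrites the copy from the source).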

To learn more, read this section of the [~Don't~ Repeat Yourself*](https://huggingface.co/blog/transformers-design-philosophy#4-machine-learning-models-are-static) blog post.

## How to write a good issue

**The better your issue is written, the higher the chances that it will be quickly resolved.**

1. Make sure that you've used the correct template for your issue. You can pick between *Bug Report*, *Feature Request*, *Feedback about API Design*, *New model/pipeline/scheduler addition*, *Forum*, or a blank issue. Make sure to pick the correct one when opening [a new issue](https://github.com/huggingface/diffusers/issues/new/choose).
2. **Be precise**: Give your issue a fitting title. Try to formulate your issue description as simply as possible. The more precise you are when submitting an issue, the less time it takes to understand the issue and potentially solve it. Make sure to open an issue for one issue only and not for multiple issues. If you found multiple issues, simply open multiple issues. If your issue is a bug, try to be as precise as possible about what bug it is - you should not just write "Error in diffusers".
3. **Reproducibility**: No reproducible code snippet == no solution. If you encounter a bug, maintainers **have to be able to reproduce** it. Make sure that you include a code snippet that can be copy-pasted into a Python interpreter to reproduce the issue. Make sure that your code snippet works, *i.e.* that there are no missing imports or missing links to images, ... Your issue should contain an error message **and** a code snippet that can be copy-pasted without any changes to reproduce the exact same error message. If your issue is using local model weights or local data that cannot be accessed by the reader, the issue cannot be solved. If you cannot share your data or model, try to make a dummy model or dummy data.
4. **Minimalistic**: Try to help the reader as much as you can to understand the issue as quickly as possible by staying as concise as possible. Remove all code / all information that is irrelevant to the issue. If you have found a bug, try to create the easiest code example you can to demonstrate your issue; do not just dump your whole workflow into the issue as soon as you have found a bug. E.g., if you train a model and get an error at some point during the training, you should first try to understand what part of the training code is responsible for the error and try to reproduce it with a couple of lines. Try to use dummy data instead of full datasets.
5. Add links. If you are referring to a certain naming, method, or model, make sure to provide a link so that the reader can better understand what you mean. If you are referring to a specific PR or issue, make sure to link it to your issue. Do not assume that the reader knows what you are talking about. The more links you add to your issue, the better.
6. Formatting. Make sure to nicely format your issue by formatting code into Python code syntax, and error messages into normal code syntax. See the [official GitHub formatting docs](https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github/basic-writing-and-formatting-syntax) for more information.
7. Think of your issue not as a ticket to be solved, but rather as a beautiful entry to a well-written encyclopedia. Every added issue is a contribution to publicly available knowledge. By adding a nicely written issue, you not only make it easier for maintainers to solve your issue, but you help the whole community to better understand a certain aspect of the library.
+
386
+ ## How to write a good PR
387
+
388
+ 1. Be a chameleon. Understand existing design patterns and syntax and make sure your code additions flow seamlessly into the existing code base. Pull requests that significantly diverge from existing design patterns or user interfaces will not be merged.
389
+ 2. Be laser focused. A pull request should solve one problem and one problem only. Make sure to not fall into the trap of "also fixing another problem while we're adding it". It is much more difficult to review pull requests that solve multiple, unrelated problems at once.
390
+ 3. If helpful, try to add a code snippet that displays an example of how your addition can be used.
391
+ 4. The title of your pull request should be a summary of its contribution.
392
+ 5. If your pull request addresses an issue, please mention the issue number in
393
+ the pull request description to make sure they are linked (and people
394
+ consulting the issue know you are working on it);
395
+ 6. To indicate a work in progress please prefix the title with `[WIP]`. These
396
+ are useful to avoid duplicated work, and to differentiate it from PRs ready
397
+ to be merged;
398
+ 7. Try to formulate and format your text as explained in [How to write a good issue](#how-to-write-a-good-issue).
399
+ 8. Make sure existing tests pass;
400
+ 9. Add high-coverage tests. No quality testing = no merge.
401
+ - If you are adding new `@slow` tests, make sure they pass using
402
+ `RUN_SLOW=1 python -m pytest tests/test_my_new_model.py`.
403
+ CircleCI does not run the slow tests, but GitHub Actions does every night!
404
+ 10. All public methods must have informative docstrings that work nicely with markdown. See [`pipeline_latent_diffusion.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/latent_diffusion/pipeline_latent_diffusion.py) for an example.
405
+ 11. Due to the rapidly growing repository, it is important to make sure that no files that would significantly weigh down the repository are added. This includes images, videos, and other non-text files. We prefer to leverage a hf.co hosted `dataset` like
406
+ [`hf-internal-testing`](https://huggingface.co/hf-internal-testing) or [huggingface/documentation-images](https://huggingface.co/datasets/huggingface/documentation-images) to place these files.
407
+ If you are an external contributor, feel free to add the images to your PR and ask a Hugging Face member to migrate your images
408
+ to this dataset.
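To illustrate point 9 above, here is a minimal, self-contained sketch of how a `RUN_SLOW`-gated test can be structured. Note that the `slow` stand-in below is hypothetical; in Diffusers the real decorator lives in `diffusers.utils.testing_utils`:

```python
import os
import unittest


def slow(test_case):
    """Hypothetical stand-in for diffusers' `slow` decorator: skip the test
    unless the RUN_SLOW environment variable is set to a truthy value."""
    run_slow = os.getenv("RUN_SLOW", "0").lower() in ("1", "yes", "true")
    return unittest.skipUnless(run_slow, "test is slow")(test_case)


class MyNewModelTests(unittest.TestCase):
    def test_fast_shape_check(self):
        # Fast tests always run in CI.
        self.assertEqual(len([1, 2, 3]), 3)

    @slow
    def test_full_pipeline_outputs(self):
        # Heavy test: only runs when invoked as
        # `RUN_SLOW=1 python -m pytest tests/test_my_new_model.py`.
        self.assertTrue(True)
```

Running `python -m pytest` on such a file executes only the fast test; setting `RUN_SLOW=1` enables both.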
409
+
410
+ ## How to open a PR
411
+
412
+ Before writing code, we strongly advise you to search through the existing PRs or
413
+ issues to make sure that nobody is already working on the same thing. If you are
414
+ unsure, it is always a good idea to open an issue to get some feedback.
415
+
416
+ You will need basic `git` proficiency to be able to contribute to
417
+ 🧨 Diffusers. `git` is not the easiest tool to use but it has the greatest
418
+ manual. Type `git --help` in a shell and enjoy. If you prefer books, [Pro
419
+ Git](https://git-scm.com/book/en/v2) is a very good reference.
420
+
421
+ Follow these steps to start contributing ([supported Python versions](https://github.com/huggingface/diffusers/blob/83bc6c94eaeb6f7704a2a428931cf2d9ad973ae9/setup.py#L270)):
422
+
423
+ 1. Fork the [repository](https://github.com/huggingface/diffusers) by
424
+ clicking on the 'Fork' button on the repository's page. This creates a copy of the code
425
+ under your GitHub user account.
426
+
427
+ 2. Clone your fork to your local disk, and add the base repository as a remote:
428
+
429
+ ```bash
430
+ $ git clone git@github.com:<your GitHub handle>/diffusers.git
431
+ $ cd diffusers
432
+ $ git remote add upstream https://github.com/huggingface/diffusers.git
433
+ ```
434
+
435
+ 3. Create a new branch to hold your development changes:
436
+
437
+ ```bash
438
+ $ git checkout -b a-descriptive-name-for-my-changes
439
+ ```
440
+
441
+ **Do not** work on the `main` branch.
442
+
443
+ 4. Set up a development environment by running the following command in a virtual environment:
444
+
445
+ ```bash
446
+ $ pip install -e ".[dev]"
447
+ ```
448
+
449
+ If you have already cloned the repo, you might need to `git pull` to get the most recent changes in the
450
+ library.
451
+
452
+ 5. Develop the features on your branch.
453
+
454
+ As you work on the features, make sure that the test suite
455
+ passes. Run the tests impacted by your changes like this:
456
+
457
+ ```bash
458
+ $ pytest tests/<TEST_TO_RUN>.py
459
+ ```
460
+
461
+ Before you run the tests, please make sure you install the dependencies required for testing. You can do so
462
+ with this command:
463
+
464
+ ```bash
465
+ $ pip install -e ".[test]"
466
+ ```
467
+
468
+ You can also run the full test suite with the following command, but now that
469
+ Diffusers has grown a lot, it takes a beefy machine to produce results in a
470
+ decent amount of time:
471
+
472
+ ```bash
473
+ $ make test
474
+ ```
475
+
476
+ 🧨 Diffusers relies on `black` and `isort` to format its source code
477
+ consistently. After you make changes, apply automatic style corrections and code verifications
478
+ that can't be automated in one go with:
479
+
480
+ ```bash
481
+ $ make style
482
+ ```
483
+
484
+ 🧨 Diffusers also uses `ruff` and a few custom scripts to check for coding mistakes. Quality
485
+ control runs in CI, however, you can also run the same checks with:
486
+
487
+ ```bash
488
+ $ make quality
489
+ ```
490
+
491
+ Once you're happy with your changes, add changed files using `git add` and
492
+ make a commit with `git commit` to record your changes locally:
493
+
494
+ ```bash
495
+ $ git add modified_file.py
496
+ $ git commit -m "A descriptive message about your changes."
497
+ ```
498
+
499
+ It is a good idea to sync your copy of the code with the original
500
+ repository regularly. This way you can quickly account for changes:
501
+
502
+ ```bash
503
+ $ git pull upstream main
504
+ ```
505
+
506
+ Push the changes to your account using:
507
+
508
+ ```bash
509
+ $ git push -u origin a-descriptive-name-for-my-changes
510
+ ```
511
+
512
+ 6. Once you are satisfied, go to the
513
+ webpage of your fork on GitHub. Click on 'Pull request' to send your changes
514
+ to the project maintainers for review.
515
+
516
+ 7. It's OK if maintainers ask you for changes. It happens to core contributors
517
+ too! So everyone can see the changes in the Pull request, work in your local
518
+ branch and push the changes to your fork. They will automatically appear in
519
+ the pull request.
520
+
521
+ ### Tests
522
+
523
+ An extensive test suite is included to test the library behavior and several examples. Library tests can be found in
524
+ the [tests folder](https://github.com/huggingface/diffusers/tree/main/tests).
525
+
526
+ We like `pytest` and `pytest-xdist` because it's faster. From the root of the
527
+ repository, here's how to run tests with `pytest` for the library:
528
+
529
+ ```bash
530
+ $ python -m pytest -n auto --dist=loadfile -s -v ./tests/
531
+ ```
532
+
533
+ In fact, that's how `make test` is implemented!
534
+
535
+ You can specify a smaller set of tests in order to test only the feature
536
+ you're working on.
537
+
538
+ By default, slow tests are skipped. Set the `RUN_SLOW` environment variable to
539
+ `yes` to run them. This will download many gigabytes of models — make sure you
540
+ have enough disk space and a good Internet connection, or a lot of patience!
541
+
542
+ ```bash
543
+ $ RUN_SLOW=yes python -m pytest -n auto --dist=loadfile -s -v ./tests/
544
+ ```
545
+
546
+ `unittest` is fully supported; here's how to run tests with it:
547
+
548
+ ```bash
549
+ $ python -m unittest discover -s tests -t . -v
550
+ $ python -m unittest discover -s examples -t examples -v
551
+ ```
552
+
553
+ ### Syncing forked main with upstream (HuggingFace) main
554
+
555
+ To avoid pinging the upstream repository (which adds reference notes to each upstream PR and sends unnecessary notifications to the developers involved in those PRs),
556
+ please follow these steps when syncing the main branch of a forked repository:
557
+ 1. When possible, avoid syncing with the upstream using a branch and PR on the forked repository. Instead, merge directly into the forked main.
558
+ 2. If a PR is absolutely necessary, use the following steps after checking out your branch:
559
+ ```bash
560
+ $ git checkout -b your-branch-for-syncing
561
+ $ git pull --squash --no-commit upstream main
562
+ $ git commit -m '<your message without GitHub references>'
563
+ $ git push --set-upstream origin your-branch-for-syncing
564
+ ```
565
+
566
+ ### Style guide
567
+
568
+ For documentation strings, 🧨 Diffusers follows the [Google style](https://google.github.io/styleguide/pyguide.html).
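As a (hypothetical) illustration, a function documented in Google style might look like this:

```python
def rescale_noise(noise, scale=1.0):
    """Rescale a noise value by a constant factor.

    Args:
        noise (`float`):
            The noise value to rescale.
        scale (`float`, *optional*, defaults to `1.0`):
            The multiplicative factor applied to `noise`.

    Returns:
        `float`: The rescaled noise value.

    Example:
        >>> rescale_noise(2.0, scale=0.5)
        1.0
    """
    return noise * scale
```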
diffusers/docs/source/en/conceptual/ethical_guidelines.md ADDED
@@ -0,0 +1,63 @@
1
+ <!--Copyright 2025 The HuggingFace Team. All rights reserved.
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4
+ the License. You may obtain a copy of the License at
5
+
6
+ http://www.apache.org/licenses/LICENSE-2.0
7
+
8
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10
+ specific language governing permissions and limitations under the License.
11
+ -->
12
+
13
+ # 🧨 Diffusers’ Ethical Guidelines
14
+
15
+ ## Preamble
16
+
17
+ [Diffusers](https://huggingface.co/docs/diffusers/index) provides pre-trained diffusion models and serves as a modular toolbox for inference and training.
18
+
19
+ Given its real-world applications and potential negative impacts on society, we think it is important to provide the project with ethical guidelines to guide the development, users’ contributions, and usage of the Diffusers library.
20
+
21
+ The risks associated with using this technology are still being examined, but to name a few: copyright issues for artists; deep-fake exploitation; sexual content generation in inappropriate contexts; non-consensual impersonation; harmful social biases perpetuating the oppression of marginalized groups.
22
+ We will keep tracking risks and adapt the following guidelines based on the community's responsiveness and valuable feedback.
23
+
24
+
25
+ ## Scope
26
+
27
+ The Diffusers community will apply the following ethical guidelines to the project’s development and help coordinate how the community will integrate the contributions, especially concerning sensitive topics related to ethical concerns.
28
+
29
+
30
+ ## Ethical guidelines
31
+
32
+ The following ethical guidelines apply generally, but we will primarily implement them when dealing with ethically sensitive issues while making a technical choice. Furthermore, we commit to adapting those ethical principles over time following emerging harms related to the state of the art of the technology in question.
33
+
34
+ - **Transparency**: we are committed to being transparent in managing PRs, explaining our choices to users, and making technical decisions.
35
+
36
+ - **Consistency**: we are committed to guaranteeing our users the same level of attention in project management, keeping it technically stable and consistent.
37
+
38
+ - **Simplicity**: with a desire to make it easy to use and exploit the Diffusers library, we are committed to keeping the project’s goals lean and coherent.
39
+
40
+ - **Accessibility**: the Diffusers project helps lower the barrier to entry for contributors, who can help run the project even without technical expertise. Doing so makes research artifacts more accessible to the community.
41
+
42
+ - **Reproducibility**: we aim to be transparent about the reproducibility of upstream code, models, and datasets when made available through the Diffusers library.
43
+
44
+ - **Responsibility**: as a community and through teamwork, we hold a collective responsibility to our users by anticipating and mitigating this technology's potential risks and dangers.
45
+
46
+
47
+ ## Examples of implementations: Safety features and Mechanisms
48
+
49
+ The team works daily to make the technical and non-technical tools available to deal with the potential ethical and social risks associated with diffusion technology. Moreover, the community's input is invaluable in ensuring these features' implementation and raising awareness with us.
50
+
51
+ - [**Community tab**](https://huggingface.co/docs/hub/repositories-pull-requests-discussions): it enables the community to discuss and better collaborate on a project.
52
+
53
+ - **Bias exploration and evaluation**: the Hugging Face team provides a [space](https://huggingface.co/spaces/society-ethics/DiffusionBiasExplorer) to demonstrate the biases in Stable Diffusion interactively. In this sense, we support and encourage bias explorers and evaluations.
54
+
55
+ - **Encouraging safety in deployment**
56
+
57
+ - [**Safe Stable Diffusion**](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/stable_diffusion_safe): It mitigates the well-known issue that models like Stable Diffusion, trained on unfiltered web-crawled datasets, tend to suffer from inappropriate degeneration. Related paper: [Safe Latent Diffusion: Mitigating Inappropriate Degeneration in Diffusion Models](https://huggingface.co/papers/2211.05105).
58
+
59
+ - [**Safety Checker**](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/safety_checker.py): It compares a generated image, in embedding space, against a set of hard-coded harmful concepts and checks their class probabilities. The harmful concepts are intentionally hidden to prevent reverse engineering of the checker.
60
+
61
+ - **Staged releases on the Hub**: in particularly sensitive situations, access to some repositories should be restricted. This staged release is an intermediary step that allows the repository’s authors to have more control over its use.
62
+
63
+ - **Licensing**: [OpenRAILs](https://huggingface.co/blog/open_rail), a new type of licensing, allow us to ensure free access while including a set of restrictions that promote more responsible use.
diffusers/docs/source/en/conceptual/evaluation.md ADDED
@@ -0,0 +1,578 @@
1
+ <!--Copyright 2025 The HuggingFace Team. All rights reserved.
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4
+ the License. You may obtain a copy of the License at
5
+
6
+ http://www.apache.org/licenses/LICENSE-2.0
7
+
8
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10
+ specific language governing permissions and limitations under the License.
11
+ -->
12
+
13
+ # Evaluating Diffusion Models
14
+
15
+ <a target="_blank" href="https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/evaluation.ipynb">
16
+ <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
17
+ </a>
18
+
19
+ > [!TIP]
20
+ > This document has now grown outdated given the emergence of existing evaluation frameworks for diffusion models for image generation. Please check
21
+ > out works like [HEIM](https://crfm.stanford.edu/helm/heim/latest/), [T2I-Compbench](https://huggingface.co/papers/2307.06350),
22
+ > and [GenEval](https://huggingface.co/papers/2310.11513).
23
+
24
+ Evaluation of generative models like [Stable Diffusion](https://huggingface.co/docs/diffusers/stable_diffusion) is subjective in nature. But as practitioners and researchers, we often have to make careful choices amongst many different possibilities. So, when working with different generative models (like GANs, Diffusion, etc.), how do we choose one over the other?
25
+
26
+ Qualitative evaluation of such models can be error-prone and might incorrectly influence a decision.
27
+ However, quantitative metrics don't necessarily correspond to image quality. So, usually, a combination
28
+ of both qualitative and quantitative evaluations provides a stronger signal when choosing one model
29
+ over the other.
30
+
31
+ In this document, we provide a non-exhaustive overview of qualitative and quantitative methods to evaluate Diffusion models. For quantitative methods, we specifically focus on how to implement them alongside `diffusers`.
32
+
33
+ The methods shown in this document can also be used to evaluate different [noise schedulers](https://huggingface.co/docs/diffusers/main/en/api/schedulers/overview) keeping the underlying generation model fixed.
34
+
35
+ ## Scenarios
36
+
37
+ We cover Diffusion models with the following pipelines:
38
+
39
+ - Text-guided image generation (such as the [`StableDiffusionPipeline`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/text2img)).
40
+ - Text-guided image generation, additionally conditioned on an input image (such as the [`StableDiffusionImg2ImgPipeline`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/img2img) and [`StableDiffusionInstructPix2PixPipeline`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/pix2pix)).
41
+ - Class-conditioned image generation models (such as the [`DiTPipeline`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/dit)).
42
+
43
+ ## Qualitative Evaluation
44
+
45
+ Qualitative evaluation typically involves human assessment of generated images. Quality is measured across aspects such as compositionality, image-text alignment, and spatial relations. A common set of prompts lends a degree of uniformity to these subjective metrics.
46
+ DrawBench and PartiPrompts are prompt datasets used for qualitative benchmarking. DrawBench and PartiPrompts were introduced by [Imagen](https://imagen.research.google/) and [Parti](https://parti.research.google/) respectively.
47
+
48
+ From the [official Parti website](https://parti.research.google/):
49
+
50
+ > PartiPrompts (P2) is a rich set of over 1600 prompts in English that we release as part of this work. P2 can be used to measure model capabilities across various categories and challenge aspects.
51
+
52
+ ![parti-prompts](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/evaluation_diffusion_models/parti-prompts.png)
53
+
54
+ PartiPrompts has the following columns:
55
+
56
+ - Prompt
57
+ - Category of the prompt (such as “Abstract”, “World Knowledge”, etc.)
58
+ - Challenge reflecting the difficulty (such as “Basic”, “Complex”, “Writing & Symbols”, etc.)
59
+
60
+ These benchmarks allow for side-by-side human evaluation of different image generation models.
61
+
62
+ For this, the 🧨 Diffusers team has built **Open Parti Prompts**, which is a community-driven qualitative benchmark based on Parti Prompts to compare state-of-the-art open-source diffusion models:
63
+ - [Open Parti Prompts Game](https://huggingface.co/spaces/OpenGenAI/open-parti-prompts): For 10 parti prompts, 4 generated images are shown and the user selects the image that suits the prompt best.
64
+ - [Open Parti Prompts Leaderboard](https://huggingface.co/spaces/OpenGenAI/parti-prompts-leaderboard): The leaderboard comparing the currently best open-sourced diffusion models to each other.
65
+
66
+ To manually compare images, let’s see how we can use `diffusers` on a couple of PartiPrompts.
67
+
68
+ Below we show some prompts sampled across different challenges: Basic, Complex, Linguistic Structures, Imagination, and Writing & Symbols. Here we are using PartiPrompts as a [dataset](https://huggingface.co/datasets/nateraw/parti-prompts).
69
+
70
+ ```python
71
+ from datasets import load_dataset
72
+
73
+ # prompts = load_dataset("nateraw/parti-prompts", split="train")
74
+ # prompts = prompts.shuffle()
75
+ # sample_prompts = [prompts[i]["Prompt"] for i in range(5)]
76
+
77
+ # Fixing these sample prompts in the interest of reproducibility.
78
+ sample_prompts = [
79
+     "a corgi",
80
+     "a hot air balloon with a yin-yang symbol, with the moon visible in the daytime sky",
81
+     "a car with no windows",
82
+     "a cube made of porcupine",
83
+     'The saying "BE EXCELLENT TO EACH OTHER" written on a red brick wall with a graffiti image of a green alien wearing a tuxedo. A yellow fire hydrant is on a sidewalk in the foreground.',
84
+ ]
85
+ ```
86
+
87
+ Now we can use these prompts to generate some images using Stable Diffusion ([v1-4 checkpoint](https://huggingface.co/CompVis/stable-diffusion-v1-4)):
88
+
89
+ ```python
90
+ import torch
91
+
92
+ seed = 0
93
+ generator = torch.manual_seed(seed)
94
+
95
+ images = sd_pipeline(sample_prompts, num_images_per_prompt=1, generator=generator).images
96
+ ```
97
+
98
+ ![parti-prompts-14](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/evaluation_diffusion_models/parti-prompts-14.png)
99
+
100
+ We can also set `num_images_per_prompt` accordingly to compare different images for the same prompt. Running the same pipeline with a different checkpoint ([v1-5](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5)) yields:
101
+
102
+ ![parti-prompts-15](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/evaluation_diffusion_models/parti-prompts-15.png)
103
+
104
+ Once several images are generated from all the prompts using multiple models (under evaluation), these results are presented to human evaluators for scoring. For
105
+ more details on the DrawBench and PartiPrompts benchmarks, refer to their respective papers.
106
+
107
+ <Tip>
108
+
109
+ It is useful to look at some inference samples while a model is training to measure the
110
+ training progress. In our [training scripts](https://github.com/huggingface/diffusers/tree/main/examples/), we support this utility with additional support for
111
+ logging to TensorBoard and Weights & Biases.
112
+
113
+ </Tip>
114
+
115
+ ## Quantitative Evaluation
116
+
117
+ In this section, we will walk you through how to evaluate three different diffusion pipelines using:
118
+
119
+ - CLIP score
120
+ - CLIP directional similarity
121
+ - FID
122
+
123
+ ### Text-guided image generation
124
+
125
+ [CLIP score](https://huggingface.co/papers/2104.08718) measures the compatibility of image-caption pairs. Higher CLIP scores imply higher compatibility 🔼. The CLIP score is a quantitative measurement of the qualitative concept "compatibility". Image-caption pair compatibility can also be thought of as the semantic similarity between the image and the caption. CLIP score was found to have high correlation with human judgement.
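Concretely, the score is derived from the cosine similarity between the CLIP image and text embeddings, scaled by 100 and clipped below at 0 (the convention used by `torchmetrics`). A minimal numpy sketch of just that arithmetic, with toy vectors standing in for real CLIP embeddings:

```python
import numpy as np


def clip_score_from_embeddings(image_emb, text_emb):
    # L2-normalize both embeddings, then take max(100 * cosine_similarity, 0).
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_emb = text_emb / np.linalg.norm(text_emb)
    return max(100.0 * float(image_emb @ text_emb), 0.0)


# Identical directions give the maximal score of 100.0.
print(clip_score_from_embeddings(np.array([1.0, 0.0]), np.array([1.0, 0.0])))
```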
126
+
127
+ Let's first load a [`StableDiffusionPipeline`]:
128
+
129
+ ```python
130
+ from diffusers import StableDiffusionPipeline
131
+ import torch
132
+
133
+ model_ckpt = "CompVis/stable-diffusion-v1-4"
134
+ sd_pipeline = StableDiffusionPipeline.from_pretrained(model_ckpt, torch_dtype=torch.float16).to("cuda")
135
+ ```
136
+
137
+ Generate some images with multiple prompts:
138
+
139
+ ```python
140
+ prompts = [
141
+     "a photo of an astronaut riding a horse on mars",
142
+     "A high tech solarpunk utopia in the Amazon rainforest",
143
+     "A pikachu fine dining with a view to the Eiffel Tower",
144
+     "A mecha robot in a favela in expressionist style",
145
+     "an insect robot preparing a delicious meal",
146
+     "A small cabin on top of a snowy mountain in the style of Disney, artstation",
147
+ ]
148
+
149
+ images = sd_pipeline(prompts, num_images_per_prompt=1, output_type="np").images
150
+
151
+ print(images.shape)
152
+ # (6, 512, 512, 3)
153
+ ```
154
+
155
+ And then, we calculate the CLIP score.
156
+
157
+ ```python
158
+ from torchmetrics.functional.multimodal import clip_score
159
+ from functools import partial
160
+
161
+ clip_score_fn = partial(clip_score, model_name_or_path="openai/clip-vit-base-patch16")
162
+
163
+ def calculate_clip_score(images, prompts):
164
+     images_int = (images * 255).astype("uint8")
165
+     score = clip_score_fn(torch.from_numpy(images_int).permute(0, 3, 1, 2), prompts).detach()
166
+     return round(float(score), 4)
167
+
168
+ sd_clip_score = calculate_clip_score(images, prompts)
169
+ print(f"CLIP score: {sd_clip_score}")
170
+ # CLIP score: 35.7038
171
+ ```
172
+
173
+ In the above example, we generated one image per prompt. If we generated multiple images per prompt, we would have to take the average score from the generated images per prompt.
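For instance, with `num_images_per_prompt > 1` you could score each image against its own prompt and then average. A small sketch of just the averaging step, using made-up scores in place of real `clip_score_fn` outputs:

```python
import numpy as np

# Hypothetical per-image CLIP scores, shaped (num_prompts, num_images_per_prompt).
scores = np.array([
    [34.1, 36.3, 35.0],  # images generated for prompt 0
    [30.2, 31.8, 29.5],  # images generated for prompt 1
])

per_prompt_mean = scores.mean(axis=1)   # average over the images of each prompt
overall_score = per_prompt_mean.mean()  # single number summarizing the checkpoint
print(per_prompt_mean, round(float(overall_score), 4))
```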
174
+
175
+ Now, if we wanted to compare two checkpoints compatible with the [`StableDiffusionPipeline`] we should pass a generator while calling the pipeline. First, we generate images with a
176
+ fixed seed with the [v1-4 Stable Diffusion checkpoint](https://huggingface.co/CompVis/stable-diffusion-v1-4):
177
+
178
+ ```python
179
+ seed = 0
180
+ generator = torch.manual_seed(seed)
181
+
182
+ images = sd_pipeline(prompts, num_images_per_prompt=1, generator=generator, output_type="np").images
183
+ ```
184
+
185
+ Then we load the [v1-5 checkpoint](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5) to generate images:
186
+
187
+ ```python
188
+ model_ckpt_1_5 = "stable-diffusion-v1-5/stable-diffusion-v1-5"
189
+ sd_pipeline_1_5 = StableDiffusionPipeline.from_pretrained(model_ckpt_1_5, torch_dtype=torch.float16).to("cuda")
190
+
191
+ images_1_5 = sd_pipeline_1_5(prompts, num_images_per_prompt=1, generator=generator, output_type="np").images
192
+ ```
193
+
194
+ And finally, we compare their CLIP scores:
195
+
196
+ ```python
197
+ sd_clip_score_1_4 = calculate_clip_score(images, prompts)
198
+ print(f"CLIP Score with v-1-4: {sd_clip_score_1_4}")
199
+ # CLIP Score with v-1-4: 34.9102
200
+
201
+ sd_clip_score_1_5 = calculate_clip_score(images_1_5, prompts)
202
+ print(f"CLIP Score with v-1-5: {sd_clip_score_1_5}")
203
+ # CLIP Score with v-1-5: 36.2137
204
+ ```
205
+
206
+ It seems like the [v1-5](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5) checkpoint performs better than its predecessor. Note, however, that the number of prompts we used to compute the CLIP scores is quite low. For a more practical evaluation, this number should be much higher, and the prompts should be diverse.
207
+
208
+ <Tip warning={true}>
209
+
210
+ By construction, there are some limitations in this score. The captions in the training dataset
211
+ were crawled from the web and extracted from `alt` and similar tags associated with an image on the internet.
212
+ They are not necessarily representative of what a human being would use to describe an image. Hence we
213
+ had to "engineer" some prompts here.
214
+
215
+ </Tip>
216
+
217
+ ### Image-conditioned text-to-image generation
218
+
219
+ In this case, we condition the generation pipeline with an input image as well as a text prompt. Let's take the [`StableDiffusionInstructPix2PixPipeline`] as an example. It takes an edit instruction as an input prompt and an input image to be edited.
220
+
221
+ Here is one example:
222
+
223
+ ![edit-instruction](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/evaluation_diffusion_models/edit-instruction.png)
224
+
225
+ One strategy to evaluate such a model is to measure the consistency of the change between the two images (in [CLIP](https://huggingface.co/docs/transformers/model_doc/clip) space) with the change between the two image captions (as shown in [CLIP-Guided Domain Adaptation of Image Generators](https://huggingface.co/papers/2108.00946)). This is referred to as the "**CLIP directional similarity**".
226
+
227
+ - Caption 1 corresponds to the input image (image 1) that is to be edited.
228
+ - Caption 2 corresponds to the edited image (image 2). It should reflect the edit instruction.
229
+
230
+ Following is a pictorial overview:
231
+
232
+ ![edit-consistency](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/evaluation_diffusion_models/edit-consistency.png)
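In other words, CLIP directional similarity is the cosine similarity between the image-edit direction and the caption-edit direction. A small numpy sketch of the underlying arithmetic, assuming the four CLIP embeddings are already available (toy 2-D vectors are used here):

```python
import numpy as np


def directional_similarity(img_one, img_two, cap_one, cap_two):
    # Direction of the edit in image space and in caption space.
    img_dir = img_two - img_one
    cap_dir = cap_two - cap_one
    # Cosine similarity between the two edit directions.
    return float(img_dir @ cap_dir / (np.linalg.norm(img_dir) * np.linalg.norm(cap_dir)))


# Both edits move along the same axis, so the directions agree perfectly.
print(directional_similarity(
    np.array([0.0, 1.0]), np.array([1.0, 1.0]),
    np.array([0.0, 2.0]), np.array([2.0, 2.0]),
))
```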
233
+
234
+ We have prepared a mini dataset to implement this metric. Let's first load the dataset.
235
+
236
+ ```python
237
+ from datasets import load_dataset
238
+
239
+ dataset = load_dataset("sayakpaul/instructpix2pix-demo", split="train")
240
+ dataset.features
241
+ ```
242
+
243
+ ```bash
244
+ {'input': Value(dtype='string', id=None),
245
+  'edit': Value(dtype='string', id=None),
246
+  'output': Value(dtype='string', id=None),
247
+  'image': Image(decode=True, id=None)}
248
+ ```
249
+
250
+ Here we have:
251
+
252
+ - `input` is a caption corresponding to the `image`.
253
+ - `edit` denotes the edit instruction.
254
+ - `output` denotes the modified caption reflecting the `edit` instruction.
255
+
256
+ Let's take a look at a sample.
257
+
258
+ ```python
259
+ idx = 0
260
+ print(f"Original caption: {dataset[idx]['input']}")
261
+ print(f"Edit instruction: {dataset[idx]['edit']}")
262
+ print(f"Modified caption: {dataset[idx]['output']}")
263
+ ```
264
+
265
+ ```bash
266
+ Original caption: 2. FAROE ISLANDS: An archipelago of 18 mountainous isles in the North Atlantic Ocean between Norway and Iceland, the Faroe Islands has 'everything you could hope for', according to Big 7 Travel. It boasts 'crystal clear waterfalls, rocky cliffs that seem to jut out of nowhere and velvety green hills'
267
+ Edit instruction: make the isles all white marble
268
+ Modified caption: 2. WHITE MARBLE ISLANDS: An archipelago of 18 mountainous white marble isles in the North Atlantic Ocean between Norway and Iceland, the White Marble Islands has 'everything you could hope for', according to Big 7 Travel. It boasts 'crystal clear waterfalls, rocky cliffs that seem to jut out of nowhere and velvety green hills'
269
+ ```
270
+
271
+ And here is the image:
272
+
273
+ ```python
274
+ dataset[idx]["image"]
275
+ ```
276
+
277
+ ![edit-dataset](https://huggingface.co/datasets/diffusers/docs-images/resolve/main/evaluation_diffusion_models/edit-dataset.png)
278
+
279
+ We will first edit the images of our dataset with the edit instruction and compute the directional similarity.
280
+
281
+ Let's first load the [`StableDiffusionInstructPix2PixPipeline`]:
282
+
283
+ ```python
284
+ from diffusers import StableDiffusionInstructPix2PixPipeline
285
+
286
+ instruct_pix2pix_pipeline = StableDiffusionInstructPix2PixPipeline.from_pretrained(
287
+     "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
288
+ ).to("cuda")
289
+ ```
290
+
291
+ Now, we perform the edits:
292
+
293
+ ```python
294
+ import numpy as np
295
+
296
+
297
+ def edit_image(input_image, instruction):
298
+     image = instruct_pix2pix_pipeline(
299
+         instruction,
300
+         image=input_image,
301
+         output_type="np",
302
+         generator=generator,
303
+     ).images[0]
304
+     return image
305
+
306
+ input_images = []
307
+ original_captions = []
308
+ modified_captions = []
309
+ edited_images = []
310
+
311
+ for idx in range(len(dataset)):
312
+     input_image = dataset[idx]["image"]
313
+     edit_instruction = dataset[idx]["edit"]
314
+     edited_image = edit_image(input_image, edit_instruction)
315
+
316
+     input_images.append(np.array(input_image))
317
+     original_captions.append(dataset[idx]["input"])
318
+     modified_captions.append(dataset[idx]["output"])
319
+     edited_images.append(edited_image)
320
+ ```
321

To measure the directional similarity, we first load CLIP's image and text encoders:

```python
from transformers import (
    CLIPTokenizer,
    CLIPTextModelWithProjection,
    CLIPVisionModelWithProjection,
    CLIPImageProcessor,
)

clip_id = "openai/clip-vit-large-patch14"
tokenizer = CLIPTokenizer.from_pretrained(clip_id)
text_encoder = CLIPTextModelWithProjection.from_pretrained(clip_id).to("cuda")
image_processor = CLIPImageProcessor.from_pretrained(clip_id)
image_encoder = CLIPVisionModelWithProjection.from_pretrained(clip_id).to("cuda")
```

Notice that we are using a particular CLIP checkpoint, i.e., `openai/clip-vit-large-patch14`. This is because the Stable Diffusion pre-training was performed with this CLIP variant. For more details, refer to the [documentation](https://huggingface.co/docs/transformers/model_doc/clip).

Next, we prepare a PyTorch `nn.Module` to compute directional similarity:

```python
import torch.nn as nn
import torch.nn.functional as F


class DirectionalSimilarity(nn.Module):
    def __init__(self, tokenizer, text_encoder, image_processor, image_encoder):
        super().__init__()
        self.tokenizer = tokenizer
        self.text_encoder = text_encoder
        self.image_processor = image_processor
        self.image_encoder = image_encoder

    def preprocess_image(self, image):
        image = self.image_processor(image, return_tensors="pt")["pixel_values"]
        return {"pixel_values": image.to("cuda")}

    def tokenize_text(self, text):
        inputs = self.tokenizer(
            text,
            max_length=self.tokenizer.model_max_length,
            padding="max_length",
            truncation=True,
            return_tensors="pt",
        )
        return {"input_ids": inputs.input_ids.to("cuda")}

    def encode_image(self, image):
        preprocessed_image = self.preprocess_image(image)
        image_features = self.image_encoder(**preprocessed_image).image_embeds
        image_features = image_features / image_features.norm(dim=1, keepdim=True)
        return image_features

    def encode_text(self, text):
        tokenized_text = self.tokenize_text(text)
        text_features = self.text_encoder(**tokenized_text).text_embeds
        text_features = text_features / text_features.norm(dim=1, keepdim=True)
        return text_features

    def compute_directional_similarity(self, img_feat_one, img_feat_two, text_feat_one, text_feat_two):
        sim_direction = F.cosine_similarity(img_feat_two - img_feat_one, text_feat_two - text_feat_one)
        return sim_direction

    def forward(self, image_one, image_two, caption_one, caption_two):
        img_feat_one = self.encode_image(image_one)
        img_feat_two = self.encode_image(image_two)
        text_feat_one = self.encode_text(caption_one)
        text_feat_two = self.encode_text(caption_two)
        directional_similarity = self.compute_directional_similarity(
            img_feat_one, img_feat_two, text_feat_one, text_feat_two
        )
        return directional_similarity
```

Let's put `DirectionalSimilarity` to use now.

```python
dir_similarity = DirectionalSimilarity(tokenizer, text_encoder, image_processor, image_encoder)
scores = []

for i in range(len(input_images)):
    original_image = input_images[i]
    original_caption = original_captions[i]
    edited_image = edited_images[i]
    modified_caption = modified_captions[i]

    similarity_score = dir_similarity(original_image, edited_image, original_caption, modified_caption)
    scores.append(float(similarity_score.detach().cpu()))

print(f"CLIP directional similarity: {np.mean(scores)}")
# CLIP directional similarity: 0.0797976553440094
```

Like the CLIP score, the higher the CLIP directional similarity, the better it is.

It should be noted that the `StableDiffusionInstructPix2PixPipeline` exposes two arguments, `image_guidance_scale` and `guidance_scale`, that let you control the quality of the final edited image. We encourage you to experiment with these two arguments and see their impact on the directional similarity.

We can extend the idea of this metric to measure how similar the original image and edited version are. To do that, we can just do `F.cosine_similarity(img_feat_two, img_feat_one)`. For these kinds of edits, we would still want the primary semantics of the images to be preserved as much as possible, i.e., a high similarity score.
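As a sketch of that variant, with random unit-normalized tensors as hypothetical stand-ins for the two CLIP image embeddings (in the pipeline above these would come from `encode_image`):

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-ins for the CLIP image embeddings of the original and
# edited images; in practice they are produced by `encode_image`.
img_feat_one = F.normalize(torch.randn(1, 768), dim=-1)
img_feat_two = F.normalize(torch.randn(1, 768), dim=-1)

# Cosine similarity between the two image embeddings; values close to 1
# mean the edit preserved most of the original image's semantics.
image_similarity = F.cosine_similarity(img_feat_two, img_feat_one)
```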

We can use these metrics for similar pipelines such as the [`StableDiffusionPix2PixZeroPipeline`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/pix2pix_zero#diffusers.StableDiffusionPix2PixZeroPipeline).

<Tip>

Both the CLIP score and the CLIP directional similarity rely on the CLIP model, which can make the evaluations biased.

</Tip>

***Extending metrics like IS, FID (discussed later), or KID can be difficult*** when the model under evaluation was pre-trained on a large image-captioning dataset (such as the [LAION-5B dataset](https://laion.ai/blog/laion-5b/)). This is because underlying these metrics is an InceptionNet (pre-trained on the ImageNet-1k dataset) used for extracting intermediate image features. The pre-training dataset of Stable Diffusion may have limited overlap with the pre-training dataset of InceptionNet, so it is not a good candidate here for feature extraction.

***These metrics do, however, work well for class-conditioned models such as [DiT](https://huggingface.co/docs/diffusers/main/en/api/pipelines/dit), which was pre-trained conditioned on the ImageNet-1k classes.***

### Class-conditioned image generation

Class-conditioned generative models are usually pre-trained on a class-labeled dataset such as [ImageNet-1k](https://huggingface.co/datasets/imagenet-1k). Popular metrics for evaluating these models include Fréchet Inception Distance (FID), Kernel Inception Distance (KID), and Inception Score (IS). In this document, we focus on FID ([Heusel et al.](https://huggingface.co/papers/1706.08500)). We show how to compute it with the [`DiTPipeline`](https://huggingface.co/docs/diffusers/api/pipelines/dit), which uses the [DiT model](https://huggingface.co/papers/2212.09748) under the hood.

FID aims to measure how similar two datasets of images are. As per [this resource](https://mmgeneration.readthedocs.io/en/latest/quick_run.html#fid):

> Fréchet Inception Distance is a measure of similarity between two datasets of images. It was shown to correlate well with the human judgment of visual quality and is most often used to evaluate the quality of samples of Generative Adversarial Networks. FID is calculated by computing the Fréchet distance between two Gaussians fitted to feature representations of the Inception network.

These two datasets are essentially the dataset of real images and the dataset of fake images (generated images in our case). FID is usually calculated with two large datasets. However, for this document, we will work with two mini datasets.
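Concretely, the Fréchet distance between the two fitted Gaussians N(μ₁, Σ₁) and N(μ₂, Σ₂) is d² = ‖μ₁ − μ₂‖² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^(1/2)). A minimal NumPy/SciPy sketch of just this distance computation (the Inception feature extraction is assumed to have happened already; the function name is our own):

```python
import numpy as np
from scipy import linalg


def frechet_distance(feats_real, feats_fake):
    """Fréchet distance between Gaussians fitted to two (n_samples, dim) feature arrays."""
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    sigma1 = np.cov(feats_real, rowvar=False)
    sigma2 = np.cov(feats_fake, rowvar=False)
    diff = mu1 - mu2
    # Matrix square root of the product of the two covariance matrices.
    covmean = linalg.sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):  # discard tiny imaginary parts from numerical error
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```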

Let's first download a few images from the ImageNet-1k training set:

```python
from zipfile import ZipFile
import requests


def download(url, local_filepath):
    r = requests.get(url)
    with open(local_filepath, "wb") as f:
        f.write(r.content)
    return local_filepath

dummy_dataset_url = "https://hf.co/datasets/sayakpaul/sample-datasets/resolve/main/sample-imagenet-images.zip"
local_filepath = download(dummy_dataset_url, dummy_dataset_url.split("/")[-1])

with ZipFile(local_filepath, "r") as zipper:
    zipper.extractall(".")
```

```python
from PIL import Image
import os
import numpy as np

dataset_path = "sample-imagenet-images"
image_paths = sorted([os.path.join(dataset_path, x) for x in os.listdir(dataset_path)])

real_images = [np.array(Image.open(path).convert("RGB")) for path in image_paths]
```

These are 10 images from the following ImageNet-1k classes: "cassette_player", "chain_saw" (x2), "church", "gas_pump" (x3), "parachute" (x2), and "tench".

<p align="center">
    <img src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/evaluation_diffusion_models/real-images.png" alt="real-images"><br>
    <em>Real images.</em>
</p>

Now that the images are loaded, let's apply some lightweight pre-processing on them to use them for FID calculation.

```python
from torchvision.transforms import functional as F
import torch


def preprocess_image(image):
    image = torch.tensor(image).unsqueeze(0)
    image = image.permute(0, 3, 1, 2) / 255.0
    return F.center_crop(image, (256, 256))

real_images = torch.cat([preprocess_image(image) for image in real_images])
print(real_images.shape)
# torch.Size([10, 3, 256, 256])
```

We now load the [`DiTPipeline`](https://huggingface.co/docs/diffusers/api/pipelines/dit) to generate images conditioned on the above-mentioned classes.

```python
from diffusers import DiTPipeline, DPMSolverMultistepScheduler

dit_pipeline = DiTPipeline.from_pretrained("facebook/DiT-XL-2-256", torch_dtype=torch.float16)
dit_pipeline.scheduler = DPMSolverMultistepScheduler.from_config(dit_pipeline.scheduler.config)
dit_pipeline = dit_pipeline.to("cuda")

seed = 0
generator = torch.manual_seed(seed)


words = [
    "cassette player",
    "chainsaw",
    "chainsaw",
    "church",
    "gas pump",
    "gas pump",
    "gas pump",
    "parachute",
    "parachute",
    "tench",
]

class_ids = dit_pipeline.get_label_ids(words)
output = dit_pipeline(class_labels=class_ids, generator=generator, output_type="np")

fake_images = output.images
fake_images = torch.tensor(fake_images)
fake_images = fake_images.permute(0, 3, 1, 2)
print(fake_images.shape)
# torch.Size([10, 3, 256, 256])
```

Now, we can compute the FID using [`torchmetrics`](https://torchmetrics.readthedocs.io/).

```python
from torchmetrics.image.fid import FrechetInceptionDistance

fid = FrechetInceptionDistance(normalize=True)
fid.update(real_images, real=True)
fid.update(fake_images, real=False)

print(f"FID: {float(fid.compute())}")
# FID: 177.7147216796875
```

The lower the FID, the better it is. Several things can influence FID here:

- Number of images (both real and fake)
- Randomness induced in the diffusion process
- Number of inference steps in the diffusion process
- The scheduler being used in the diffusion process

For the last two points especially, it is good practice to run the evaluation across different seeds and inference-step counts, and then report an average result.
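As a sketch of that practice, where `compute_fid` is a hypothetical placeholder for re-running the generation and FID computation above (it is not a real API):

```python
import numpy as np

def compute_fid(seed, num_inference_steps):
    # Placeholder: in practice this would regenerate `fake_images` with the
    # given seed and step count and return `float(fid.compute())`.
    rng = np.random.default_rng(seed + num_inference_steps)
    return 175.0 + rng.normal(scale=2.0)

fid_scores = [
    compute_fid(seed, steps)
    for seed in (0, 1, 2)
    for steps in (25, 50)
]
print(f"mean FID: {np.mean(fid_scores):.2f} +/- {np.std(fid_scores):.2f}")
```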

<Tip warning={true}>

FID results tend to be fragile as they depend on a lot of factors:

* The specific Inception model used during computation.
* The implementation accuracy of the computation.
* The image format (not the same if we start from PNGs vs JPGs).

Keeping that in mind, FID is often most useful when comparing similar runs, but it is
hard to reproduce paper results unless the authors carefully disclose the FID
measurement code.

These points apply to other related metrics too, such as KID and IS.

</Tip>

As a final step, let's visually inspect the `fake_images`.

<p align="center">
    <img src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/evaluation_diffusion_models/fake-images.png" alt="fake-images"><br>
    <em>Fake images.</em>
</p>
diffusers/docs/source/en/conceptual/philosophy.md ADDED
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Philosophy

🧨 Diffusers provides **state-of-the-art** pretrained diffusion models across multiple modalities.
Its purpose is to serve as a **modular toolbox** for both inference and training.

We aim to build a library that stands the test of time and therefore take API design very seriously.

In a nutshell, Diffusers is built to be a natural extension of PyTorch. Therefore, most of our design choices are based on [PyTorch's Design Principles](https://pytorch.org/docs/stable/community/design.html#pytorch-design-philosophy). Let's go over the most important ones:

## Usability over Performance

- While Diffusers has many built-in performance-enhancing features (see [Memory and Speed](https://huggingface.co/docs/diffusers/optimization/fp16)), models are always loaded with the highest precision and lowest optimization. Therefore, by default diffusion pipelines are always instantiated on CPU with float32 precision if not otherwise defined by the user. This ensures usability across different platforms and accelerators and means that no complex installations are required to run the library.
- Diffusers aims to be a **lightweight** package and therefore has very few required dependencies, but many soft dependencies that can improve performance (such as `accelerate`, `safetensors`, `onnx`, etc.). We strive to keep the library as lightweight as possible so that it can be added without much concern as a dependency on other packages.
- Diffusers prefers simple, self-explanatory code over condensed, magic code. This means that short-hand code syntaxes such as lambda functions and advanced PyTorch operators are often not desired.

## Simple over easy

As PyTorch states, **explicit is better than implicit** and **simple is better than complex**. This design philosophy is reflected in multiple parts of the library:
- We follow PyTorch's API with methods like [`DiffusionPipeline.to`](https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.to) to let the user handle device management.
- Raising concise error messages is preferred to silently correcting erroneous input. Diffusers aims at teaching the user, rather than making the library as easy to use as possible.
- Complex model vs. scheduler logic is exposed instead of magically handled inside. Schedulers/Samplers are separated from diffusion models with minimal dependencies on each other. This forces the user to write the unrolled denoising loop. However, the separation allows for easier debugging and gives the user more control over adapting the denoising process or switching out diffusion models or schedulers.
- Separately trained components of the diffusion pipeline, *e.g.* the text encoder, the UNet, and the variational autoencoder, each have their own model class. This forces the user to handle the interaction between the different model components, and the serialization format separates the model components into different files. However, this allows for easier debugging and customization. DreamBooth or Textual Inversion training
is very simple thanks to Diffusers' ability to separate single components of the diffusion pipeline.

## Tweakable, contributor-friendly over abstraction

For large parts of the library, Diffusers adopts an important design principle of the [Transformers library](https://github.com/huggingface/transformers), which is to prefer copy-pasted code over hasty abstractions. This design principle is very opinionated and stands in stark contrast to popular design principles such as [Don't repeat yourself (DRY)](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself).
In short, just like Transformers does for modeling files, Diffusers prefers to keep an extremely low level of abstraction and very self-contained code for pipelines and schedulers.
Functions, long code blocks, and even classes can be copied across multiple files, which at first can look like a bad, sloppy design choice that makes the library unmaintainable.
**However**, this design has proven to be extremely successful for Transformers and makes a lot of sense for community-driven, open-source machine learning libraries because:
- Machine Learning is an extremely fast-moving field in which paradigms, model architectures, and algorithms are changing rapidly, which therefore makes it very difficult to define long-lasting code abstractions.
- Machine Learning practitioners like to be able to quickly tweak existing code for ideation and research and therefore prefer self-contained code over code that contains many abstractions.
- Open-source libraries rely on community contributions and therefore must build a library that is easy to contribute to. The more abstract the code, the more dependencies, the harder to read, and the harder to contribute to. Contributors simply stop contributing to very abstract libraries out of fear of breaking vital functionality. If contributing to a library cannot break other fundamental code, not only is it more inviting for potential new contributors, but it is also easier to review and contribute to multiple parts in parallel.

At Hugging Face, we call this design the **single-file policy**, which means that almost all of the code of a certain class should be written in a single, self-contained file. To read more about the philosophy, you can have a look
at [this blog post](https://huggingface.co/blog/transformers-design-philosophy).

In Diffusers, we follow this philosophy for both pipelines and schedulers, but only partly for diffusion models. The reason we don't follow this design fully for diffusion models is that almost all diffusion pipelines, such
as [DDPM](https://huggingface.co/docs/diffusers/api/pipelines/ddpm), [Stable Diffusion](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/overview#stable-diffusion-pipelines), [unCLIP (DALL·E 2)](https://huggingface.co/docs/diffusers/api/pipelines/unclip) and [Imagen](https://imagen.research.google/) all rely on the same diffusion model, the [UNet](https://huggingface.co/docs/diffusers/api/models/unet2d-cond).

Great, now you should have a general understanding of why 🧨 Diffusers is designed the way it is 🤗.
We try to apply these design principles consistently across the library. Nevertheless, there are some minor exceptions to the philosophy or some unlucky design choices. If you have feedback regarding the design, we would ❤️ to hear it [directly on GitHub](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feedback.md&title=).

## Design Philosophy in Details

Now, let's look a bit into the nitty-gritty details of the design philosophy. Diffusers essentially consists of three major classes: [pipelines](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines), [models](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models), and [schedulers](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers).
Let's walk through the design decisions for each class in more detail.
### Pipelines

Pipelines are designed to be easy to use (therefore do not follow [*Simple over easy*](#simple-over-easy) 100%), are not feature-complete, and should loosely be seen as examples of how to use [models](#models) and [schedulers](#schedulers) for inference.

The following design principles are followed:
- Pipelines follow the single-file policy. All pipelines can be found in individual directories under src/diffusers/pipelines. One pipeline folder corresponds to one diffusion paper/project/release. Multiple pipeline files can be gathered in one pipeline folder, as it’s done for [`src/diffusers/pipelines/stable-diffusion`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/stable_diffusion). If pipelines share similar functionality, one can make use of the [# Copied from mechanism](https://github.com/huggingface/diffusers/blob/125d783076e5bd9785beb05367a2d2566843a271/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py#L251).
- Pipelines all inherit from [`DiffusionPipeline`].
- Every pipeline consists of different model and scheduler components that are documented in the [`model_index.json` file](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5/blob/main/model_index.json), are accessible under the same name as attributes of the pipeline, and can be shared between pipelines with the [`DiffusionPipeline.components`](https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.components) function.
- Every pipeline should be loadable via the [`DiffusionPipeline.from_pretrained`](https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.from_pretrained) function.
- Pipelines should be used **only** for inference.
- Pipelines should be very readable, self-explanatory, and easy to tweak.
- Pipelines should be designed to build on top of each other and be easy to integrate into higher-level APIs.
- Pipelines are **not** intended to be feature-complete user interfaces. For feature-complete user interfaces one should rather have a look at [InvokeAI](https://github.com/invoke-ai/InvokeAI), [Diffuzers](https://github.com/abhishekkrthakur/diffuzers), and [lama-cleaner](https://github.com/Sanster/lama-cleaner).
- Every pipeline should have one and only one way to run it via a `__call__` method. The naming of the `__call__` arguments should be shared across all pipelines.
- Pipelines should be named after the task they are intended to solve.
- In almost all cases, novel diffusion pipelines shall be implemented in a new pipeline folder/file.

### Models

Models are designed as configurable toolboxes that are natural extensions of [PyTorch's Module class](https://pytorch.org/docs/stable/generated/torch.nn.Module.html). They only partly follow the **single-file policy**.

The following design principles are followed:
- Models correspond to **a type of model architecture**. *E.g.* the [`UNet2DConditionModel`] class is used for all UNet variations that expect 2D image inputs and are conditioned on some context.
- All models can be found in [`src/diffusers/models`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models) and every model architecture shall be defined in its own file, e.g. [`unets/unet_2d_condition.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unets/unet_2d_condition.py), [`transformers/transformer_2d.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/transformers/transformer_2d.py), etc.
- Models **do not** follow the single-file policy and should make use of smaller model building blocks, such as [`attention.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention.py), [`resnet.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/resnet.py), [`embeddings.py`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/embeddings.py), etc. **Note**: This is in stark contrast to Transformers' modeling files and shows that models do not really follow the single-file policy.
- Models intend to expose complexity, just like PyTorch's `Module` class, and give clear error messages.
- Models all inherit from `ModelMixin` and `ConfigMixin`.
- Models can be optimized for performance when doing so doesn’t demand major code changes, keeps backward compatibility, and gives a significant memory or compute gain.
- Models should by default have the highest precision and lowest performance setting.
- To integrate new model checkpoints whose general architecture can be classified as an architecture that already exists in Diffusers, the existing model architecture shall be adapted to make it work with the new checkpoint. One should only create a new file if the model architecture is fundamentally different.
- Models should be designed to be easily extendable to future changes. This can be achieved by limiting public function arguments, configuration arguments, and "foreseeing" future changes, *e.g.* it is usually better to add `string` "...type" arguments that can easily be extended to new future types instead of boolean `is_..._type` arguments. Only the minimum amount of changes shall be made to existing architectures to make a new model checkpoint work.
- The model design is a difficult trade-off between keeping code readable and concise and supporting many model checkpoints. For most parts of the modeling code, classes shall be adapted for new model checkpoints, while there are some exceptions where it is preferred to add new classes to make sure the code is kept concise and
readable long-term, such as [UNet blocks](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unets/unet_2d_blocks.py) and [Attention processors](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).

### Schedulers

Schedulers are responsible for guiding the denoising process during inference as well as for defining the noise schedule used in training. They are designed as individual classes with loadable configuration files and strongly follow the **single-file policy**.

The following design principles are followed:
- All schedulers are found in [`src/diffusers/schedulers`](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers).
- Schedulers are **not** allowed to import from large utils files and shall be kept very self-contained.
- One scheduler Python file corresponds to one scheduler algorithm (as might be defined in a paper).
- If schedulers share similar functionalities, we can make use of the `# Copied from` mechanism.
- Schedulers all inherit from `SchedulerMixin` and `ConfigMixin`.
- Schedulers can be easily swapped out with the [`ConfigMixin.from_config`](https://huggingface.co/docs/diffusers/main/en/api/configuration#diffusers.ConfigMixin.from_config) method as explained in detail [here](../using-diffusers/schedulers).
- Every scheduler has to have a `set_num_inference_steps` and a `step` function. `set_num_inference_steps(...)` has to be called before every denoising process, *i.e.* before `step(...)` is called.
- Every scheduler exposes the timesteps to be "looped over" via a `timesteps` attribute, which is an array of timesteps the model will be called upon.
- The `step(...)` function takes a predicted model output and the "current" sample (x_t) and returns the "previous", slightly more denoised sample (x_t-1).
- Given the complexity of diffusion schedulers, the `step` function does not expose all the complexity and can be a bit of a "black box".
- In almost all cases, novel schedulers shall be implemented in a new scheduling file.
diffusers/docs/source/en/index.md ADDED
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

<p align="center">
    <br>
    <img src="https://raw.githubusercontent.com/huggingface/diffusers/77aadfee6a891ab9fcfb780f87c693f7a5beeb8e/docs/source/imgs/diffusers_library.jpg" width="400"/>
    <br>
</p>

# Diffusers

🤗 Diffusers is the go-to library for state-of-the-art pretrained diffusion models for generating images, audio, and even 3D structures of molecules. Whether you're looking for a simple inference solution or want to train your own diffusion model, 🤗 Diffusers is a modular toolbox that supports both. Our library is designed with a focus on [usability over performance](conceptual/philosophy#usability-over-performance), [simple over easy](conceptual/philosophy#simple-over-easy), and [customizability over abstractions](conceptual/philosophy#tweakable-contributorfriendly-over-abstraction).

The library has three main components:

- State-of-the-art diffusion pipelines for inference with just a few lines of code. There are many pipelines in 🤗 Diffusers; check out the table in the pipeline [overview](api/pipelines/overview) for a complete list of available pipelines and the tasks they solve.
- Interchangeable [noise schedulers](api/schedulers/overview) for balancing trade-offs between generation speed and quality.
- Pretrained [models](api/models) that can be used as building blocks, and combined with schedulers, for creating your own end-to-end diffusion systems.

<div class="mt-10">
  <div class="w-full flex flex-col space-y-4 md:space-y-0 md:grid md:grid-cols-2 md:gap-y-4 md:gap-x-5">
    <a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./tutorials/tutorial_overview"
      ><div class="w-full text-center bg-gradient-to-br from-blue-400 to-blue-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Tutorials</div>
      <p class="text-gray-700">Learn the fundamental skills you need to start generating outputs, build your own diffusion system, and train a diffusion model. We recommend starting here if you're using 🤗 Diffusers for the first time!</p>
    </a>
    <a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./using-diffusers/loading_overview"
      ><div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">How-to guides</div>
      <p class="text-gray-700">Practical guides for helping you load pipelines, models, and schedulers. You'll also learn how to use pipelines for specific tasks, control how outputs are generated, optimize for inference speed, and apply different training techniques.</p>
    </a>
    <a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./conceptual/philosophy"
      ><div class="w-full text-center bg-gradient-to-br from-pink-400 to-pink-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Conceptual guides</div>
      <p class="text-gray-700">Understand why the library was designed the way it was, and learn more about the ethical guidelines and safety implementations for using the library.</p>
    </a>
    <a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./api/models/overview"
      ><div class="w-full text-center bg-gradient-to-br from-purple-400 to-purple-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">Reference</div>
      <p class="text-gray-700">Technical descriptions of how 🤗 Diffusers classes and methods work.</p>
    </a>
  </div>
</div>
diffusers/docs/source/en/installation.md ADDED
@@ -0,0 +1,194 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Installation

🤗 Diffusers is tested on Python 3.8+, PyTorch 1.7.0+, and Flax. Follow the installation instructions below for the deep learning library you are using:

- [PyTorch](https://pytorch.org/get-started/locally/) installation instructions
- [Flax](https://flax.readthedocs.io/en/latest/) installation instructions

## Install with pip

You should install 🤗 Diffusers in a [virtual environment](https://docs.python.org/3/library/venv.html).
If you're unfamiliar with Python virtual environments, take a look at this [guide](https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/).
A virtual environment makes it easier to manage different projects and avoid compatibility issues between dependencies.

Create a virtual environment with Python or [uv](https://docs.astral.sh/uv/) (refer to [Installation](https://docs.astral.sh/uv/getting-started/installation/) for installation instructions), a fast Rust-based Python package and project manager.

<hfoptions id="install">
<hfoption id="uv">

```bash
uv venv my-env
source my-env/bin/activate
```

</hfoption>
<hfoption id="Python">

```bash
python -m venv my-env
source my-env/bin/activate
```

</hfoption>
</hfoptions>

You should also install 🤗 Transformers because 🤗 Diffusers relies on its models.

<frameworkcontent>
<pt>

PyTorch only supports Python 3.8 - 3.11 on Windows. Install Diffusers with uv.

```bash
uv pip install diffusers["torch"] transformers
```

You can also install Diffusers with pip.

```bash
pip install diffusers["torch"] transformers
```

</pt>
<jax>

Install Diffusers with uv.

```bash
uv pip install diffusers["flax"] transformers
```

You can also install Diffusers with pip.

```bash
pip install diffusers["flax"] transformers
```

</jax>
</frameworkcontent>

## Install with conda

After activating your virtual environment, install 🤗 Diffusers with `conda` (the package is maintained by the community):

```bash
conda install -c conda-forge diffusers
```

## Install from source

Before installing 🤗 Diffusers from source, make sure you have PyTorch and 🤗 Accelerate installed.

To install 🤗 Accelerate:

```bash
pip install accelerate
```

Then install 🤗 Diffusers from source:

```bash
pip install git+https://github.com/huggingface/diffusers
```

This command installs the bleeding edge `main` version rather than the latest `stable` version.
The `main` version is useful for staying up-to-date with the latest developments, such as when a bug has been fixed since the last official release but a new release hasn't been rolled out yet.
However, this means the `main` version may not always be stable.
We strive to keep the `main` version operational, and most issues are usually resolved within a few hours or a day.
If you run into a problem, please open an [Issue](https://github.com/huggingface/diffusers/issues/new/choose) so we can fix it even sooner!

## Editable install

You will need an editable install if you'd like to:

* Use the `main` version of the source code.
* Contribute to 🤗 Diffusers and need to test changes in the code.

Clone the repository and install 🤗 Diffusers with the following commands:

```bash
git clone https://github.com/huggingface/diffusers.git
cd diffusers
```

<frameworkcontent>
<pt>
```bash
pip install -e ".[torch]"
```
</pt>
<jax>
```bash
pip install -e ".[flax]"
```
</jax>
</frameworkcontent>

These commands link the folder you cloned the repository to with your Python library paths.
Python will now look inside the folder you cloned to in addition to the normal library paths.
For example, if your Python packages are typically installed in `~/anaconda3/envs/main/lib/python3.10/site-packages/`, Python will also search the `~/diffusers/` folder you cloned to.

<Tip warning={true}>

You must keep the `diffusers` folder if you want to keep using the library.

</Tip>

Now you can easily update your clone to the latest version of 🤗 Diffusers with the following command:

```bash
cd ~/diffusers/
git pull
```

Your Python environment will find the `main` version of 🤗 Diffusers on the next run.

## Cache

Model weights and files are downloaded from the Hub to a cache, which is usually in your home directory. You can change the cache location by specifying the `HF_HOME` or `HUGGINGFACE_HUB_CACHE` environment variables or configuring the `cache_dir` parameter in methods like [`~DiffusionPipeline.from_pretrained`].

Cached files allow you to run 🤗 Diffusers offline. To prevent 🤗 Diffusers from connecting to the internet, set the `HF_HUB_OFFLINE` environment variable to `1` and 🤗 Diffusers will only load previously downloaded files in the cache.

```shell
export HF_HUB_OFFLINE=1
```

For more details about managing and cleaning the cache, take a look at the [caching](https://huggingface.co/docs/huggingface_hub/guides/manage-cache) guide.
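
You can check which directory these variables resolve to from Python. This is a minimal sketch of the documented default resolution order on Linux/macOS; `resolve_hf_cache_dir` is a hypothetical helper for illustration, not a diffusers or huggingface_hub API:

```python
import os

def resolve_hf_cache_dir() -> str:
    """Mirror the default Hub cache resolution: HUGGINGFACE_HUB_CACHE wins,
    then HF_HOME/hub, then ~/.cache/huggingface/hub."""
    explicit = os.environ.get("HUGGINGFACE_HUB_CACHE")
    if explicit:
        return explicit
    hf_home = os.environ.get(
        "HF_HOME", os.path.join(os.path.expanduser("~"), ".cache", "huggingface")
    )
    return os.path.join(hf_home, "hub")

print(resolve_hf_cache_dir())
```

If neither variable is set, this prints the default `~/.cache/huggingface/hub` location where downloaded weights end up.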

## Telemetry logging

Our library gathers telemetry information during [`~DiffusionPipeline.from_pretrained`] requests.
The data gathered includes the version of 🤗 Diffusers and PyTorch/Flax, the requested model or pipeline class,
and the path to a pretrained checkpoint if it is hosted on the Hugging Face Hub.
This usage data helps us debug issues and prioritize new features.
Telemetry is only sent when loading models and pipelines from the Hub,
and it is not collected if you're loading local files.

We understand that not everyone wants to share additional information, and we respect your privacy.
You can disable telemetry collection by setting the `HF_HUB_DISABLE_TELEMETRY` environment variable from your terminal:

On Linux/macOS:

```bash
export HF_HUB_DISABLE_TELEMETRY=1
```

On Windows:

```bash
set HF_HUB_DISABLE_TELEMETRY=1
```
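
If you prefer to opt out from code rather than the shell, a sketch of the same idea in Python; the important assumption is that the variable should be set before the library is imported, since the setting is typically read when the library is first loaded:

```python
import os

# Set this before `import diffusers` so telemetry is disabled for the whole session.
os.environ["HF_HUB_DISABLE_TELEMETRY"] = "1"

# import diffusers  # safe to import now
```
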
diffusers/docs/source/en/quicktour.md ADDED
@@ -0,0 +1,323 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

[[open-in-colab]]

# Quicktour

Diffusion models are trained to denoise random Gaussian noise step-by-step to generate a sample of interest, such as an image or audio. This has sparked a tremendous amount of interest in generative AI, and you have probably seen examples of diffusion-generated images on the internet. 🧨 Diffusers is a library aimed at making diffusion models widely accessible to everyone.

Whether you're a developer or an everyday user, this quicktour will introduce you to 🧨 Diffusers and help you get up and generating quickly! There are three main components of the library to know about:

* The [`DiffusionPipeline`] is a high-level end-to-end class designed to rapidly generate samples from pretrained diffusion models for inference.
* Popular pretrained [model](./api/models) architectures and modules that can be used as building blocks for creating diffusion systems.
* Many different [schedulers](./api/schedulers/overview) - algorithms that control how noise is added for training, and how to generate denoised images during inference.

The quicktour will show you how to use the [`DiffusionPipeline`] for inference, and then walk you through how to combine a model and scheduler to replicate what's happening inside the [`DiffusionPipeline`].

<Tip>

The quicktour is a simplified version of the introductory 🧨 Diffusers [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/diffusers_intro.ipynb) to help you get started quickly. If you want to learn more about 🧨 Diffusers' goal, design philosophy, and additional details about its core API, check out the notebook!

</Tip>

Before you begin, make sure you have all the necessary libraries installed:

```py
# uncomment to install the necessary libraries in Colab
#!pip install --upgrade diffusers accelerate transformers
```

- [🤗 Accelerate](https://huggingface.co/docs/accelerate/index) speeds up model loading for inference and training.
- [🤗 Transformers](https://huggingface.co/docs/transformers/index) is required to run the most popular diffusion models, such as [Stable Diffusion](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/overview).

## DiffusionPipeline

The [`DiffusionPipeline`] is the easiest way to use a pretrained diffusion system for inference. It is an end-to-end system containing the model and the scheduler. You can use the [`DiffusionPipeline`] out-of-the-box for many tasks. Take a look at the table below for some supported tasks, and for a complete list of supported tasks, check out the [🧨 Diffusers Summary](./api/pipelines/overview#diffusers-summary) table.

| **Task** | **Description** | **Pipeline** |
|----------|-----------------|--------------|
| Unconditional Image Generation | generate an image from Gaussian noise | [unconditional_image_generation](./using-diffusers/unconditional_image_generation) |
| Text-Guided Image Generation | generate an image given a text prompt | [conditional_image_generation](./using-diffusers/conditional_image_generation) |
| Text-Guided Image-to-Image Translation | adapt an image guided by a text prompt | [img2img](./using-diffusers/img2img) |
| Text-Guided Image-Inpainting | fill the masked part of an image given the image, the mask and a text prompt | [inpaint](./using-diffusers/inpaint) |
| Text-Guided Depth-to-Image Translation | adapt parts of an image guided by a text prompt while preserving structure via depth estimation | [depth2img](./using-diffusers/depth2img) |

Start by creating an instance of a [`DiffusionPipeline`] and specify which pipeline checkpoint you would like to download.
You can use the [`DiffusionPipeline`] for any [checkpoint](https://huggingface.co/models?library=diffusers&sort=downloads) stored on the Hugging Face Hub.
In this quicktour, you'll load the [`stable-diffusion-v1-5`](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5) checkpoint for text-to-image generation.

<Tip warning={true}>

For [Stable Diffusion](https://huggingface.co/CompVis/stable-diffusion) models, please carefully read the [license](https://huggingface.co/spaces/CompVis/stable-diffusion-license) first before running the model. 🧨 Diffusers implements a [`safety_checker`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/safety_checker.py) to prevent offensive or harmful content, but the model's improved image generation capabilities can still produce potentially harmful content.

</Tip>

Load the model with the [`~DiffusionPipeline.from_pretrained`] method:

```python
>>> from diffusers import DiffusionPipeline

>>> pipeline = DiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", use_safetensors=True)
```

The [`DiffusionPipeline`] downloads and caches all modeling, tokenization, and scheduling components. You'll see that the Stable Diffusion pipeline is composed of the [`UNet2DConditionModel`] and [`PNDMScheduler`] among other things:

```py
>>> pipeline
StableDiffusionPipeline {
  "_class_name": "StableDiffusionPipeline",
  "_diffusers_version": "0.21.4",
  ...,
  "scheduler": [
    "diffusers",
    "PNDMScheduler"
  ],
  ...,
  "unet": [
    "diffusers",
    "UNet2DConditionModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}
```

We strongly recommend running the pipeline on a GPU because the model consists of roughly 1.4 billion parameters.
You can move the generator object to a GPU, just like you would in PyTorch:

```python
>>> pipeline.to("cuda")
```

Now you can pass a text prompt to the `pipeline` to generate an image, and then access the denoised image. By default, the image output is wrapped in a [`PIL.Image`](https://pillow.readthedocs.io/en/stable/reference/Image.html?highlight=image#the-image-class) object.

```python
>>> image = pipeline("An image of a squirrel in Picasso style").images[0]
>>> image
```

<div class="flex justify-center">
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/image_of_squirrel_painting.png"/>
</div>

Save the image by calling `save`:

```python
>>> image.save("image_of_squirrel_painting.png")
```

### Local pipeline

You can also use the pipeline locally. The only difference is you need to download the weights first:

```bash
!git lfs install
!git clone https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5
```

Then load the saved weights into the pipeline:

```python
>>> pipeline = DiffusionPipeline.from_pretrained("./stable-diffusion-v1-5", use_safetensors=True)
```

Now, you can run the pipeline as you would in the section above.

### Swapping schedulers

Different schedulers come with different denoising speeds and quality trade-offs. The best way to find out which one works best for you is to try them out! One of the main features of 🧨 Diffusers is to allow you to easily switch between schedulers. For example, to replace the default [`PNDMScheduler`] with the [`EulerDiscreteScheduler`], load it with the [`~diffusers.ConfigMixin.from_config`] method:

```py
>>> from diffusers import EulerDiscreteScheduler

>>> pipeline = DiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", use_safetensors=True)
>>> pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config)
```

Try generating an image with the new scheduler and see if you notice a difference!

In the next section, you'll take a closer look at the components - the model and scheduler - that make up the [`DiffusionPipeline`] and learn how to use these components to generate an image of a cat.

## Models

Most models take a noisy sample, and at each timestep the model predicts the *noise residual*, the difference between a less noisy image and the input image (other models learn to predict the previous sample directly, or the velocity, also known as [`v-prediction`](https://github.com/huggingface/diffusers/blob/5e5ce13e2f89ac45a0066cb3f369462a3cf1d9ef/src/diffusers/schedulers/scheduling_ddim.py#L110)). You can mix and match models to create other diffusion systems.

Models are initialized with the [`~ModelMixin.from_pretrained`] method, which also locally caches the model weights so it is faster the next time you load the model. For the quicktour, you'll load the [`UNet2DModel`], a basic unconditional image generation model with a checkpoint trained on cat images:

```py
>>> from diffusers import UNet2DModel

>>> repo_id = "google/ddpm-cat-256"
>>> model = UNet2DModel.from_pretrained(repo_id, use_safetensors=True)
```

> [!TIP]
> Use the [`AutoModel`] API to automatically select a model class if you're unsure of which one to use.

To access the model parameters, call `model.config`:

```py
>>> model.config
```

The model configuration is a 🧊 frozen 🧊 dictionary, which means those parameters can't be changed after the model is created. This is intentional and ensures that the parameters used to define the model architecture at the start remain the same, while other parameters can still be adjusted during inference.

Some of the most important parameters are:

* `sample_size`: the height and width dimension of the input sample.
* `in_channels`: the number of input channels of the input sample.
* `down_block_types` and `up_block_types`: the type of down- and upsampling blocks used to create the UNet architecture.
* `block_out_channels`: the number of output channels of the downsampling blocks; also used in reverse order for the number of input channels of the upsampling blocks.
* `layers_per_block`: the number of ResNet blocks present in each UNet block.

To use the model for inference, create a random Gaussian noise tensor with the shape of the expected input. It should have a `batch` axis because the model can receive multiple random noises, a `channel` axis corresponding to the number of input channels, and a `sample_size` axis for the height and width of the image:

```py
>>> import torch

>>> torch.manual_seed(0)

>>> noisy_sample = torch.randn(1, model.config.in_channels, model.config.sample_size, model.config.sample_size)
>>> noisy_sample.shape
torch.Size([1, 3, 256, 256])
```

For inference, pass the noisy image and a `timestep` to the model. The `timestep` indicates how noisy the input image is, with more noise at the beginning and less at the end. This helps the model determine its position in the diffusion process, whether it is closer to the start or the end. Access the model output through the `sample` attribute:

```py
>>> with torch.no_grad():
...     noisy_residual = model(sample=noisy_sample, timestep=2).sample
```

To generate actual examples though, you'll need a scheduler to guide the denoising process. In the next section, you'll learn how to couple a model with a scheduler.

## Schedulers

Schedulers manage going from a noisy sample to a less noisy sample given the model output - in this case, it is the `noisy_residual`.

<Tip>

🧨 Diffusers is a toolbox for building diffusion systems. While the [`DiffusionPipeline`] is a convenient way to get started with a pre-built diffusion system, you can also choose your own model and scheduler components separately to build a custom diffusion system.

</Tip>

For the quicktour, you'll instantiate the [`DDPMScheduler`] with its [`~diffusers.SchedulerMixin.from_pretrained`] method:

```py
>>> from diffusers import DDPMScheduler

>>> scheduler = DDPMScheduler.from_pretrained(repo_id)
>>> scheduler
DDPMScheduler {
  "_class_name": "DDPMScheduler",
  "_diffusers_version": "0.21.4",
  "beta_end": 0.02,
  "beta_schedule": "linear",
  "beta_start": 0.0001,
  "clip_sample": true,
  "clip_sample_range": 1.0,
  "dynamic_thresholding_ratio": 0.995,
  "num_train_timesteps": 1000,
  "prediction_type": "epsilon",
  "sample_max_value": 1.0,
  "steps_offset": 0,
  "thresholding": false,
  "timestep_spacing": "leading",
  "trained_betas": null,
  "variance_type": "fixed_small"
}
```

<Tip>

💡 Unlike a model, a scheduler does not have trainable weights and is parameter-free!

</Tip>

Some of the most important parameters are:

* `num_train_timesteps`: the length of the denoising process or, in other words, the number of timesteps required to process random Gaussian noise into a data sample.
* `beta_schedule`: the type of noise schedule to use for inference and training.
* `beta_start` and `beta_end`: the start and end noise values for the noise schedule.
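
To make these three parameters concrete, here is a small NumPy sketch of the `linear` beta schedule shown in the config above and the cumulative signal term derived from it. This is an illustration of the math, not the scheduler's actual code:

```python
import numpy as np

num_train_timesteps = 1000
beta_start, beta_end = 0.0001, 0.02

# Linear noise schedule: beta grows from beta_start to beta_end over training timesteps.
betas = np.linspace(beta_start, beta_end, num_train_timesteps)
alphas = 1.0 - betas

# alphas_cumprod[t] is how much of the original signal survives at timestep t.
alphas_cumprod = np.cumprod(alphas)

# Early timesteps keep almost all signal; late timesteps are almost pure noise.
print(alphas_cumprod[0], alphas_cumprod[-1])
```

The printed values show the schedule's effect: near timestep 0 the sample is almost clean, while at timestep 999 almost no signal remains.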

To predict a slightly less noisy image, pass the model output, `timestep`, and current `sample` to the scheduler's [`~diffusers.DDPMScheduler.step`] method:

```py
>>> less_noisy_sample = scheduler.step(model_output=noisy_residual, timestep=2, sample=noisy_sample).prev_sample
>>> less_noisy_sample.shape
torch.Size([1, 3, 256, 256])
```
+
263
+ The `less_noisy_sample` can be passed to the next `timestep` where it'll get even less noisy! Let's bring it all together now and visualize the entire denoising process.
264
+
265
+ First, create a function that postprocesses and displays the denoised image as a `PIL.Image`:
266
+
267
+ ```py
268
+ >>> import PIL.Image
269
+ >>> import numpy as np
270
+
271
+
272
+ >>> def display_sample(sample, i):
273
+ ... image_processed = sample.cpu().permute(0, 2, 3, 1)
274
+ ... image_processed = (image_processed + 1.0) * 127.5
275
+ ... image_processed = image_processed.numpy().astype(np.uint8)
276
+
277
+ ... image_pil = PIL.Image.fromarray(image_processed[0])
278
+ ... display(f"Image at step {i}")
279
+ ... display(image_pil)
280
+ ```
281
+
282
+ To speed up the denoising process, move the input and model to a GPU:
283
+
284
+ ```py
285
+ >>> model.to("cuda")
286
+ >>> noisy_sample = noisy_sample.to("cuda")
287
+ ```
288
+
289
+ Now create a denoising loop that predicts the residual of the less noisy sample, and computes the less noisy sample with the scheduler:
290
+
291
+ ```py
292
+ >>> import tqdm
293
+
294
+ >>> sample = noisy_sample
295
+
296
+ >>> for i, t in enumerate(tqdm.tqdm(scheduler.timesteps)):
297
+ ... # 1. predict noise residual
298
+ ... with torch.no_grad():
299
+ ... residual = model(sample, t).sample
300
+
301
+ ... # 2. compute less noisy image and set x_t -> x_t-1
302
+ ... sample = scheduler.step(residual, t, sample).prev_sample
303
+
304
+ ... # 3. optionally look at image
305
+ ... if (i + 1) % 50 == 0:
306
+ ... display_sample(sample, i + 1)
307
+ ```
308
+
309
+ Sit back and watch as a cat is generated from nothing but noise! 😻
310
+
311
+ <div class="flex justify-center">
312
+ <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/diffusion-quicktour.png"/>
313
+ </div>
314
+
315
+ ## Next steps
316
+
317
+ Hopefully, you generated some cool images with 🧨 Diffusers in this quicktour! For your next steps, you can:
318
+
319
+ * Train or finetune a model to generate your own images in the [training](./tutorials/basic_training) tutorial.
320
+ * See example official and community [training or finetuning scripts](https://github.com/huggingface/diffusers/tree/main/examples#-diffusers-examples) for a variety of use cases.
321
+ * Learn more about loading, accessing, changing, and comparing schedulers in the [Using different Schedulers](./using-diffusers/schedulers) guide.
322
+ * Explore prompt engineering, speed and memory optimizations, and tips and tricks for generating higher-quality images with the [Stable Diffusion](./stable_diffusion) guide.
323
+ * Dive deeper into speeding up 🧨 Diffusers with guides on [optimized PyTorch on a GPU](./optimization/fp16), and inference guides for running [Stable Diffusion on Apple Silicon (M1/M2)](./optimization/mps) and [ONNX Runtime](./optimization/onnx).
diffusers/docs/source/en/stable_diffusion.md ADDED
@@ -0,0 +1,261 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Effective and efficient diffusion

[[open-in-colab]]

Getting the [`DiffusionPipeline`] to generate images in a certain style or include what you want can be tricky. Oftentimes, you have to run the [`DiffusionPipeline`] several times before you end up with an image you're happy with. But generating something out of nothing is a computationally intensive process, especially if you're running inference over and over again.

This is why it's important to get the most *computational* (speed) and *memory* (GPU VRAM) efficiency from the pipeline to reduce the time between inference cycles so you can iterate faster.

This tutorial walks you through how to generate faster and better with the [`DiffusionPipeline`].

Begin by loading the [`stable-diffusion-v1-5/stable-diffusion-v1-5`](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5) model:

```python
from diffusers import DiffusionPipeline

model_id = "stable-diffusion-v1-5/stable-diffusion-v1-5"
pipeline = DiffusionPipeline.from_pretrained(model_id, use_safetensors=True)
```

The example prompt you'll use is a portrait of an old warrior chief, but feel free to use your own prompt:

```python
prompt = "portrait photo of a old warrior chief"
```

## Speed

<Tip>

💡 If you don't have access to a GPU, you can use one for free from a GPU provider like [Colab](https://colab.research.google.com/)!

</Tip>

One of the simplest ways to speed up inference is to place the pipeline on a GPU the same way you would with any PyTorch module:

```python
pipeline = pipeline.to("cuda")
```

To make sure you can use the same image and improve on it, use a [`Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html) and set a seed for [reproducibility](./using-diffusers/reusing_seeds):

```python
import torch

generator = torch.Generator("cuda").manual_seed(0)
```

Now you can generate an image:

```python
image = pipeline(prompt, generator=generator).images[0]
image
```

<div class="flex justify-center">
    <img src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/stable_diffusion_101/sd_101_1.png">
</div>

This process took ~30 seconds on a T4 GPU (it might be faster if your allocated GPU is better than a T4). By default, the [`DiffusionPipeline`] runs inference with full `float32` precision for 50 inference steps. You can speed this up by switching to a lower precision like `float16` or running fewer inference steps.

Let's start by loading the model in `float16` and generating an image:

```python
import torch

pipeline = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16, use_safetensors=True)
pipeline = pipeline.to("cuda")
generator = torch.Generator("cuda").manual_seed(0)
image = pipeline(prompt, generator=generator).images[0]
image
```

<div class="flex justify-center">
    <img src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/stable_diffusion_101/sd_101_2.png">
</div>

This time, it only took ~11 seconds to generate the image, which is almost 3x faster than before!

<Tip>

💡 We strongly suggest always running your pipelines in `float16`, and so far, we've rarely seen any degradation in output quality.

</Tip>

+ Another option is to reduce the number of inference steps. Choosing a more efficient scheduler could help decrease the number of steps without sacrificing output quality. You can find which schedulers are compatible with the current model in the [`DiffusionPipeline`] by calling the `compatibles` method:
```python
pipeline.scheduler.compatibles
[
    diffusers.schedulers.scheduling_lms_discrete.LMSDiscreteScheduler,
    diffusers.schedulers.scheduling_unipc_multistep.UniPCMultistepScheduler,
    diffusers.schedulers.scheduling_k_dpm_2_discrete.KDPM2DiscreteScheduler,
    diffusers.schedulers.scheduling_deis_multistep.DEISMultistepScheduler,
    diffusers.schedulers.scheduling_euler_discrete.EulerDiscreteScheduler,
    diffusers.schedulers.scheduling_dpmsolver_multistep.DPMSolverMultistepScheduler,
    diffusers.schedulers.scheduling_ddpm.DDPMScheduler,
    diffusers.schedulers.scheduling_dpmsolver_singlestep.DPMSolverSinglestepScheduler,
    diffusers.schedulers.scheduling_k_dpm_2_ancestral_discrete.KDPM2AncestralDiscreteScheduler,
    diffusers.utils.dummy_torch_and_torchsde_objects.DPMSolverSDEScheduler,
    diffusers.schedulers.scheduling_heun_discrete.HeunDiscreteScheduler,
    diffusers.schedulers.scheduling_pndm.PNDMScheduler,
    diffusers.schedulers.scheduling_euler_ancestral_discrete.EulerAncestralDiscreteScheduler,
    diffusers.schedulers.scheduling_ddim.DDIMScheduler,
]
```
The Stable Diffusion model uses the [`PNDMScheduler`] by default, which usually requires ~50 inference steps, but more performant schedulers like the [`DPMSolverMultistepScheduler`] require only ~20 or 25 inference steps. Use the [`~ConfigMixin.from_config`] method to load a new scheduler:

```python
from diffusers import DPMSolverMultistepScheduler

pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)
```

Now set `num_inference_steps` to 20:

```python
generator = torch.Generator("cuda").manual_seed(0)
image = pipeline(prompt, generator=generator, num_inference_steps=20).images[0]
image
```

<div class="flex justify-center">
  <img src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/stable_diffusion_101/sd_101_3.png">
</div>

Great, you've managed to cut the inference time to just 4 seconds! ⚡️
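To verify speedups like these on your own hardware, a small timing helper is all you need. This is a hypothetical sketch (the `benchmark` helper and its signature are not part of Diffusers); pass it any zero-argument callable, such as `lambda: pipeline(prompt, num_inference_steps=20)`:

```python
import time

def benchmark(fn, runs=3):
    # Run the callable a few times and keep the best wall-clock time, since
    # the first call often pays one-off warmup costs (CUDA context creation,
    # memory allocation, kernel compilation).
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)
```

Comparing the result before and after swapping the scheduler makes the speedup concrete instead of anecdotal.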
## Memory

The other key to improving pipeline performance is consuming less memory, which indirectly implies more speed, since you're often trying to maximize the number of images generated per second. The easiest way to see how many images you can generate at once is to try out different batch sizes until you get an `OutOfMemoryError` (OOM).

Create a function that'll generate a batch of images from a list of prompts and `Generator`s. Make sure to assign each `Generator` a seed so you can reuse it if it produces a good result.

```python
def get_inputs(batch_size=1):
    generator = [torch.Generator("cuda").manual_seed(i) for i in range(batch_size)]
    prompts = batch_size * [prompt]
    num_inference_steps = 20

    return {"prompt": prompts, "generator": generator, "num_inference_steps": num_inference_steps}
```

Start with `batch_size=4` and see how much memory you've consumed:

```python
from diffusers.utils import make_image_grid

images = pipeline(**get_inputs(batch_size=4)).images
make_image_grid(images, 2, 2)
```
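The `make_image_grid` helper tiles the batch into a single rows × cols image. The layout arithmetic behind such a helper can be sketched in plain Python (`grid_positions` is a hypothetical name, shown only to illustrate the row-major paste order):

```python
def grid_positions(n_images, rows, cols, width, height):
    # Image i is pasted at column (i % cols) and row (i // cols),
    # so the batch fills the grid left to right, top to bottom.
    assert n_images == rows * cols, "grid must be exactly filled"
    return [((i % cols) * width, (i // cols) * height) for i in range(n_images)]
```

For `batch_size=4` with 512x512 images in a 2x2 grid, this yields the paste offsets (0, 0), (512, 0), (0, 512), and (512, 512).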
Unless you have a GPU with more vRAM, the code above probably returned an `OOM` error! Most of the memory is taken up by the cross-attention layers. Instead of running this operation in a batch, you can run it sequentially to save a significant amount of memory. All you have to do is configure the pipeline to use the [`~DiffusionPipeline.enable_attention_slicing`] function:

```python
pipeline.enable_attention_slicing()
```
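Attention slicing computes attention over one slice of the batch/head dimension at a time instead of all at once, trading a little speed for a much smaller peak memory footprint. A NumPy sketch of the idea (an illustration only, not the actual Diffusers implementation, which slices inside its attention processors):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Full attention materializes the whole (heads, seq, seq) score matrix.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def sliced_attention(q, k, v, slice_size=1):
    # Identical result, but only a (slice_size, seq, seq) score matrix is
    # ever resident in memory at once.
    out = np.empty_like(q)
    for i in range(0, q.shape[0], slice_size):
        s = slice(i, i + slice_size)
        out[s] = attention(q[s], k[s], v[s])
    return out
```

Because each head's attention is independent, slicing changes peak memory but not the output.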
Now try increasing the `batch_size` to 8!

```python
images = pipeline(**get_inputs(batch_size=8)).images
make_image_grid(images, rows=2, cols=4)
```

<div class="flex justify-center">
  <img src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/stable_diffusion_101/sd_101_5.png">
</div>

Whereas before you couldn't even generate a batch of 4 images, now you can generate a batch of 8 images at ~3.5 seconds per image! This is probably the fastest you can go on a T4 GPU without sacrificing quality.

## Quality

In the last two sections, you learned how to optimize the speed of your pipeline by using `fp16`, reducing the number of inference steps by using a more performant scheduler, and enabling attention slicing to reduce memory consumption. Now you're going to focus on how to improve the quality of generated images.

### Better checkpoints

The most obvious step is to use better checkpoints. The Stable Diffusion model is a good starting point, and since its official launch, several improved versions have also been released. However, using a newer version doesn't automatically mean you'll get better results. You'll still have to experiment with different checkpoints yourself, and do a little research (such as using [negative prompts](https://minimaxir.com/2022/11/stable-diffusion-negative-prompt/)) to get the best results.

As the field grows, there are more and more high-quality checkpoints finetuned to produce certain styles. Try exploring the [Hub](https://huggingface.co/models?library=diffusers&sort=downloads) and [Diffusers Gallery](https://huggingface.co/spaces/huggingface-projects/diffusers-gallery) to find one you're interested in!
### Better pipeline components

You can also try replacing the current pipeline components with a newer version. Let's try loading the latest [autoencoder](https://huggingface.co/stabilityai/stable-diffusion-2-1/tree/main/vae) from Stability AI into the pipeline and generating some images:

```python
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16).to("cuda")
pipeline.vae = vae
images = pipeline(**get_inputs(batch_size=8)).images
make_image_grid(images, rows=2, cols=4)
```

<div class="flex justify-center">
  <img src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/stable_diffusion_101/sd_101_6.png">
</div>

### Better prompt engineering

The text prompt you use to generate an image is super important, so much so that it is called *prompt engineering*. Some considerations to keep in mind during prompt engineering are:

- How are images similar to the one I want to generate stored on the internet?
- What additional detail can I give that steers the model towards the style I want?

With this in mind, let's improve the prompt to include color and higher quality details:

```python
prompt += ", tribal panther make up, blue on red, side profile, looking away, serious eyes"
prompt += " 50mm portrait photography, hard rim lighting photography--beta --ar 2:3 --beta --upbeta"
```

Generate a batch of images with the new prompt:

```python
images = pipeline(**get_inputs(batch_size=8)).images
make_image_grid(images, rows=2, cols=4)
```

<div class="flex justify-center">
  <img src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/stable_diffusion_101/sd_101_7.png">
</div>
Pretty impressive! Let's tweak the second image (generated by the `Generator` with a seed of `1`) a bit more by adding some text about the age of the subject:

```python
prompts = [
    "portrait photo of the oldest warrior chief, tribal panther make up, blue on red, side profile, looking away, serious eyes 50mm portrait photography, hard rim lighting photography--beta --ar 2:3 --beta --upbeta",
    "portrait photo of an old warrior chief, tribal panther make up, blue on red, side profile, looking away, serious eyes 50mm portrait photography, hard rim lighting photography--beta --ar 2:3 --beta --upbeta",
    "portrait photo of a warrior chief, tribal panther make up, blue on red, side profile, looking away, serious eyes 50mm portrait photography, hard rim lighting photography--beta --ar 2:3 --beta --upbeta",
    "portrait photo of a young warrior chief, tribal panther make up, blue on red, side profile, looking away, serious eyes 50mm portrait photography, hard rim lighting photography--beta --ar 2:3 --beta --upbeta",
]

generator = [torch.Generator("cuda").manual_seed(1) for _ in range(len(prompts))]
images = pipeline(prompt=prompts, generator=generator, num_inference_steps=25).images
make_image_grid(images, 2, 2)
```

<div class="flex justify-center">
  <img src="https://huggingface.co/datasets/diffusers/docs-images/resolve/main/stable_diffusion_101/sd_101_8.png">
</div>

## Next steps

In this tutorial, you learned how to optimize a [`DiffusionPipeline`] for computational and memory efficiency, as well as how to improve the quality of generated outputs. If you're interested in making your pipeline even faster, take a look at the following resources:

- Learn how [PyTorch 2.0](./optimization/fp16) and [`torch.compile`](https://pytorch.org/docs/stable/generated/torch.compile.html) can yield 5-300% faster inference speed. On an A100 GPU, inference can be up to 50% faster!
- If you can't use PyTorch 2, we recommend you install [xFormers](./optimization/xformers). Its memory-efficient attention mechanism works great with PyTorch 1.13.1 for faster speed and reduced memory consumption.
- Other optimization techniques, such as model offloading, are covered in [this guide](./optimization/fp16).
diffusers/docs/source/en/using-diffusers/conditional_image_generation.md ADDED
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Text-to-image

[[open-in-colab]]

When you think of diffusion models, text-to-image is usually one of the first things that come to mind. Text-to-image generates an image from a text description (for example, "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k") which is also known as a *prompt*.

From a very high level, a diffusion model takes a prompt and some random initial noise, and iteratively removes the noise to construct an image. The *denoising* process is guided by the prompt, and once the denoising process ends after a predetermined number of time steps, the image representation is decoded into an image.

<Tip>

Read the [How does Stable Diffusion work?](https://huggingface.co/blog/stable_diffusion#how-does-stable-diffusion-work) blog post to learn more about how a latent diffusion model works.

</Tip>

You can generate images from a prompt in 🤗 Diffusers in two steps:

1. Load a checkpoint into the [`AutoPipelineForText2Image`] class, which automatically detects the appropriate pipeline class to use based on the checkpoint:

```py
from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16"
).to("cuda")
```

2. Pass a prompt to the pipeline to generate an image:

```py
image = pipeline(
    "stained glass of darth vader, backlight, centered composition, masterpiece, photorealistic, 8k"
).images[0]
image
```

<div class="flex justify-center">
  <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/text2img-vader.png"/>
</div>
## Popular models

The most common text-to-image models are [Stable Diffusion v1.5](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5), [Stable Diffusion XL (SDXL)](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0), and [Kandinsky 2.2](https://huggingface.co/kandinsky-community/kandinsky-2-2-decoder). There are also ControlNet models or adapters that can be used with text-to-image models for more direct control in generating images. The results from each model are slightly different because of their architecture and training process, but no matter which model you choose, their usage is more or less the same. Let's use the same prompt for each model and compare their results.

### Stable Diffusion v1.5

[Stable Diffusion v1.5](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5) is a latent diffusion model initialized from [Stable Diffusion v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4), and finetuned for 595K steps on 512x512 images from the LAION-Aesthetics V2 dataset. You can use this model like:

```py
from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16"
).to("cuda")
generator = torch.Generator("cuda").manual_seed(31)
image = pipeline("Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", generator=generator).images[0]
image
```
### Stable Diffusion XL

SDXL is a much larger version of the previous Stable Diffusion models, and involves a two-stage model process that adds even more details to an image. It also includes some additional *micro-conditionings* to generate high-quality images with centered subjects. Take a look at the more comprehensive [SDXL](sdxl) guide to learn more about how to use it. In general, you can use SDXL like:

```py
from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16"
).to("cuda")
generator = torch.Generator("cuda").manual_seed(31)
image = pipeline("Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", generator=generator).images[0]
image
```
### Kandinsky 2.2

The Kandinsky model is a bit different from the Stable Diffusion models because it also uses an image prior model to create embeddings that are used to better align text and images in the diffusion model.

The easiest way to use Kandinsky 2.2 is:

```py
from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
).to("cuda")
generator = torch.Generator("cuda").manual_seed(31)
image = pipeline("Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", generator=generator).images[0]
image
```
### ControlNet

ControlNet models are auxiliary models or adapters that are finetuned on top of text-to-image models, such as [Stable Diffusion v1.5](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5). Using ControlNet models in combination with text-to-image models offers diverse options for more explicit control over how to generate an image. With ControlNet, you add an additional conditioning input image to the model. For example, if you provide an image of a human pose (usually represented as multiple keypoints that are connected into a skeleton) as a conditioning input, the model generates an image that follows the pose of the image. Check out the more in-depth [ControlNet](controlnet) guide to learn more about other conditioning inputs and how to use them.

In this example, let's condition the ControlNet with a human pose estimation image. Load the ControlNet model pretrained on human pose estimations:

```py
from diffusers import ControlNetModel, AutoPipelineForText2Image
from diffusers.utils import load_image
import torch

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_openpose", torch_dtype=torch.float16, variant="fp16"
).to("cuda")
pose_image = load_image("https://huggingface.co/lllyasviel/control_v11p_sd15_openpose/resolve/main/images/control.png")
```

Pass the `controlnet` to the [`AutoPipelineForText2Image`], and provide the prompt and pose estimation image:

```py
pipeline = AutoPipelineForText2Image.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16, variant="fp16"
).to("cuda")
generator = torch.Generator("cuda").manual_seed(31)
image = pipeline("Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", image=pose_image, generator=generator).images[0]
image
```
<div class="flex flex-row gap-4">
  <div class="flex-1">
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/text2img-1.png"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">Stable Diffusion v1.5</figcaption>
  </div>
  <div class="flex-1">
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/sdxl-text2img.png"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">Stable Diffusion XL</figcaption>
  </div>
  <div class="flex-1">
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/text2img-2.png"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">Kandinsky 2.2</figcaption>
  </div>
  <div class="flex-1">
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/text2img-3.png"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">ControlNet (pose conditioning)</figcaption>
  </div>
</div>

## Configure pipeline parameters

There are a number of parameters that can be configured in the pipeline that affect how an image is generated. You can change the image's output size, specify a negative prompt to improve image quality, and more. This section dives deeper into how to use these parameters.
### Height and width

The `height` and `width` parameters control the height and width (in pixels) of the generated image. By default, the Stable Diffusion v1.5 model outputs 512x512 images, but you can change this to any size that is a multiple of 8. For example, to create a rectangular image:

```py
from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16"
).to("cuda")
image = pipeline(
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", height=768, width=512
).images[0]
image
```

<div class="flex justify-center">
  <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/text2img-hw.png"/>
</div>

<Tip warning={true}>

Other models may have different default image sizes depending on the image sizes in the training dataset. For example, SDXL's default image size is 1024x1024 and using lower `height` and `width` values may result in lower quality images. Make sure you check the model's API reference first!

</Tip>
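Because dimensions must be divisible by 8, a request like 770 pixels has to be rounded before being passed to the pipeline. A hypothetical helper for snapping a requested size to a valid one (`snap_to_multiple` is not a Diffusers API, just an illustration of the arithmetic):

```python
def snap_to_multiple(value, multiple=8):
    # Round down to the nearest multiple of 8, but never below one full
    # multiple, since a zero-sized dimension is meaningless.
    return max(multiple, (value // multiple) * multiple)
```

For example, `snap_to_multiple(770)` returns 768, which is safe to pass as `height` or `width`.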

### Guidance scale

The `guidance_scale` parameter affects how much the prompt influences image generation. A lower value gives the model "creativity" to generate images that are more loosely related to the prompt. Higher `guidance_scale` values push the model to follow the prompt more closely, and if this value is too high, you may observe some artifacts in the generated image.

```py
from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
image = pipeline(
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k", guidance_scale=3.5
).images[0]
image
```

<div class="flex flex-row gap-4">
  <div class="flex-1">
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/text2img-guidance-scale-2.5.png"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">guidance_scale = 2.5</figcaption>
  </div>
  <div class="flex-1">
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/text2img-guidance-scale-7.5.png"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">guidance_scale = 7.5</figcaption>
  </div>
  <div class="flex-1">
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/text2img-guidance-scale-10.5.png"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">guidance_scale = 10.5</figcaption>
  </div>
</div>
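Under the hood, `guidance_scale` is typically applied as the classifier-free guidance blend of two noise predictions per step, one made with the prompt and one without. A sketch with plain Python lists standing in for tensors (illustration only; real pipelines do this on GPU tensors inside the denoising loop):

```python
def classifier_free_guidance(noise_uncond, noise_cond, guidance_scale):
    # Extrapolate from the unconditional prediction toward the conditional
    # one: a scale of 1.0 returns the conditional prediction unchanged,
    # while larger scales push further along the prompt direction.
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]
```

This is why very large scales can over-steer the update and produce artifacts.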

### Negative prompt

Just like how a prompt guides generation, a *negative prompt* steers the model away from things you don't want the model to generate. This is commonly used to improve overall image quality by removing poor or bad image features such as "low resolution" or "bad details". You can also use a negative prompt to remove or modify the content and style of an image.

```py
from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
image = pipeline(
    prompt="Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    negative_prompt="ugly, deformed, disfigured, poor details, bad anatomy",
).images[0]
image
```

<div class="flex flex-row gap-4">
  <div class="flex-1">
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/text2img-neg-prompt-1.png"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">negative_prompt = "ugly, deformed, disfigured, poor details, bad anatomy"</figcaption>
  </div>
  <div class="flex-1">
    <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/text2img-neg-prompt-2.png"/>
    <figcaption class="mt-2 text-center text-sm text-gray-500">negative_prompt = "astronaut"</figcaption>
  </div>
</div>

### Generator

A [`torch.Generator`](https://pytorch.org/docs/stable/generated/torch.Generator.html#generator) object enables reproducibility in a pipeline by setting a manual seed. You can use a `Generator` to generate batches of images and iteratively improve on an image generated from a seed as detailed in the [Improve image quality with deterministic generation](reusing_seeds) guide.

You can set a seed and `Generator` as shown below. Creating an image with a `Generator` should return the same result each time instead of randomly generating a new image.

```py
from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
generator = torch.Generator(device="cuda").manual_seed(30)
image = pipeline(
    "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k",
    generator=generator,
).images[0]
image
```
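The reproducibility contract here is the same one the standard library's `random.Random` follows, which makes it easy to demonstrate without a GPU: two generators created with the same seed produce identical draws, while different seeds diverge (a stdlib sketch with a hypothetical `draws` helper, not `torch.Generator` itself):

```python
import random

def draws(seed, n=3):
    # A fresh generator per call, mirroring how constructing a new
    # torch.Generator with a manual seed restarts the pseudo-random
    # sequence from the same internal state.
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]
```

This is why reusing the seed of a good result regenerates the same image.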

## Control image generation

There are several ways to exert more control over how an image is generated outside of configuring a pipeline's parameters, such as prompt weighting and ControlNet models.

### Prompt weighting

Prompt weighting is a technique for increasing or decreasing the importance of concepts in a prompt to emphasize or minimize certain features in an image. We recommend using the [Compel](https://github.com/damian0815/compel) library to help you generate the weighted prompt embeddings.

<Tip>

Learn how to create the prompt embeddings in the [Prompt weighting](weighted_prompts) guide. This example focuses on how to use the prompt embeddings in the pipeline.

</Tip>

Once you've created the embeddings, you can pass them to the `prompt_embeds` (and `negative_prompt_embeds` if you're using a negative prompt) parameter in the pipeline.

```py
from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
image = pipeline(
    prompt_embeds=prompt_embeds,  # generated from Compel
    negative_prompt_embeds=negative_prompt_embeds,  # generated from Compel
).images[0]
```

### ControlNet

As you saw in the [ControlNet](#controlnet) section, these models offer a more flexible and accurate way to generate images by incorporating an additional conditioning image input. Each ControlNet model is pretrained on a particular type of conditioning image to generate new images that resemble it. For example, if you take a ControlNet model pretrained on depth maps, you can give the model a depth map as a conditioning input and it'll generate an image that preserves the spatial information in it. This is quicker and easier than specifying the depth information in a prompt. You can even combine multiple conditioning inputs with a [MultiControlNet](controlnet#multicontrolnet)!

There are many types of conditioning inputs you can use, and 🤗 Diffusers supports ControlNet for Stable Diffusion and SDXL models. Take a look at the more comprehensive [ControlNet](controlnet) guide to learn how you can use these models.

## Optimize

Diffusion models are large, and the iterative nature of denoising an image is computationally expensive and intensive. But this doesn't mean you need access to powerful - or even many - GPUs to use them. There are many optimization techniques for running diffusion models on consumer and free-tier resources. For example, you can load model weights in half-precision to save GPU memory and increase speed, or offload model components to the CPU and move them to the GPU only when needed to save even more memory.

PyTorch 2.0 also supports a more memory-efficient attention mechanism called [*scaled dot product attention*](../optimization/fp16#scaled-dot-product-attention) that is automatically enabled if you're using PyTorch 2.0. You can combine this with [`torch.compile`](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html) to speed your code up even more:

```py
from diffusers import AutoPipelineForText2Image
import torch

pipeline = AutoPipelineForText2Image.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16, variant="fp16").to("cuda")
pipeline.unet = torch.compile(pipeline.unet, mode="reduce-overhead", fullgraph=True)
```

For more tips on how to optimize your code to save memory and speed up inference, read the [Accelerate inference](../optimization/fp16) and [Reduce memory usage](../optimization/memory) guides.
diffusers/docs/source/en/using-diffusers/consisid.md ADDED
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# ConsisID

[ConsisID](https://github.com/PKU-YuanGroup/ConsisID) is an identity-preserving text-to-video generation model that keeps the face consistent in the generated video through frequency decomposition. The main features of ConsisID are:

- Frequency decomposition: The characteristics of the DiT architecture are analyzed from the frequency domain perspective, and based on these characteristics, a reasonable control information injection method is designed.
- Consistency training strategy: A coarse-to-fine training strategy, dynamic masking loss, and dynamic cross-face loss further enhance the model's generalization ability and identity-preservation performance.
- Inference without finetuning: Previous methods required case-by-case finetuning of the input ID before inference, leading to significant time and computational costs. In contrast, ConsisID is tuning-free.

This guide will walk you through using ConsisID for various use cases.
## Load Model Checkpoints

Model weights may be stored in separate subfolders on the Hub or locally, in which case, you should use the [`~DiffusionPipeline.from_pretrained`] method.

```python
# !pip install consisid_eva_clip insightface facexlib
import torch
from diffusers import ConsisIDPipeline
from diffusers.pipelines.consisid.consisid_utils import prepare_face_models, process_face_embeddings_infer
from huggingface_hub import snapshot_download

# Download checkpoints
snapshot_download(repo_id="BestWishYsh/ConsisID-preview", local_dir="BestWishYsh/ConsisID-preview")

# Load the face helper models used to preprocess the input face image
face_helper_1, face_helper_2, face_clip_model, face_main_model, eva_transform_mean, eva_transform_std = prepare_face_models("BestWishYsh/ConsisID-preview", device="cuda", dtype=torch.bfloat16)

# Load the ConsisID base model
pipe = ConsisIDPipeline.from_pretrained("BestWishYsh/ConsisID-preview", torch_dtype=torch.bfloat16)
pipe.to("cuda")
```

## Identity-Preserving Text-to-Video

For identity-preserving text-to-video, pass a text prompt and an image containing a clear face (preferably half-body or full-body). By default, ConsisID generates a 720x480 video for the best results.

```python
from diffusers.utils import export_to_video

prompt = "The video captures a boy walking along a city street, filmed in black and white on a classic 35mm camera. His expression is thoughtful, his brow slightly furrowed as if he's lost in contemplation. The film grain adds a textured, timeless quality to the image, evoking a sense of nostalgia. Around him, the cityscape is filled with vintage buildings, cobblestone sidewalks, and softly blurred figures passing by, their outlines faint and indistinct. Streetlights cast a gentle glow, while shadows play across the boy's path, adding depth to the scene. The lighting highlights the boy's subtle smile, hinting at a fleeting moment of curiosity. The overall cinematic atmosphere, complete with classic film still aesthetics and dramatic contrasts, gives the scene an evocative and introspective feel."
image = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/consisid/consisid_input.png?download=true"

id_cond, id_vit_hidden, image, face_kps = process_face_embeddings_infer(face_helper_1, face_clip_model, face_helper_2, eva_transform_mean, eva_transform_std, face_main_model, "cuda", torch.bfloat16, image, is_align_face=True)

video = pipe(image=image, prompt=prompt, num_inference_steps=50, guidance_scale=6.0, use_dynamic_cfg=False, id_vit_hidden=id_vit_hidden, id_cond=id_cond, kps_cond=face_kps, generator=torch.Generator("cuda").manual_seed(42))
export_to_video(video.frames[0], "output.mp4", fps=8)
```

<table>
<tr>
<th style="text-align: center;">Face Image</th>
<th style="text-align: center;">Video</th>
<th style="text-align: center;">Description</th>
</tr>
<tr>
<td><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/consisid/consisid_image_0.png?download=true" style="height: auto; width: 600px;"></td>
<td><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/consisid/consisid_output_0.gif?download=true" style="height: auto; width: 2000px;"></td>
<td>The video, in a beautifully crafted animated style, features a confident woman riding a horse through a lush forest clearing. Her expression is focused yet serene as she adjusts her wide-brimmed hat with a practiced hand. She wears a flowy bohemian dress, which moves gracefully with the rhythm of the horse, the fabric flowing fluidly in the animated motion. The dappled sunlight filters through the trees, casting soft, painterly patterns on the forest floor. Her posture is poised, showing both control and elegance as she guides the horse with ease. The animation's gentle, fluid style adds a dreamlike quality to the scene, with the woman’s calm demeanor and the peaceful surroundings evoking a sense of freedom and harmony.</td>
</tr>
<tr>
<td><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/consisid/consisid_image_1.png?download=true" style="height: auto; width: 600px;"></td>
<td><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/consisid/consisid_output_1.gif?download=true" style="height: auto; width: 2000px;"></td>
<td>The video, in a captivating animated style, shows a woman standing in the center of a snowy forest, her eyes narrowed in concentration as she extends her hand forward. She is dressed in a deep blue cloak, her breath visible in the cold air, which is rendered with soft, ethereal strokes. A faint smile plays on her lips as she summons a wisp of ice magic, watching with focus as the surrounding trees and ground begin to shimmer and freeze, covered in delicate ice crystals. The animation’s fluid motion brings the magic to life, with the frost spreading outward in intricate, sparkling patterns. The environment is painted with soft, watercolor-like hues, enhancing the magical, dreamlike atmosphere. The overall mood is serene yet powerful, with the quiet winter air amplifying the delicate beauty of the frozen scene.</td>
</tr>
<tr>
<td><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/consisid/consisid_image_2.png?download=true" style="height: auto; width: 600px;"></td>
<td><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/consisid/consisid_output_2.gif?download=true" style="height: auto; width: 2000px;"></td>
<td>The animation features a whimsical portrait of a balloon seller standing in a gentle breeze, captured with soft, hazy brushstrokes that evoke the feel of a serene spring day. His face is framed by a gentle smile, his eyes squinting slightly against the sun, while a few wisps of hair flutter in the wind. He is dressed in a light, pastel-colored shirt, and the balloons around him sway with the wind, adding a sense of playfulness to the scene. The background blurs softly, with hints of a vibrant market or park, enhancing the light-hearted, yet tender mood of the moment.</td>
</tr>
<tr>
<td><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/consisid/consisid_image_3.png?download=true" style="height: auto; width: 600px;"></td>
<td><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/consisid/consisid_output_3.gif?download=true" style="height: auto; width: 2000px;"></td>
<td>The video captures a boy walking along a city street, filmed in black and white on a classic 35mm camera. His expression is thoughtful, his brow slightly furrowed as if he's lost in contemplation. The film grain adds a textured, timeless quality to the image, evoking a sense of nostalgia. Around him, the cityscape is filled with vintage buildings, cobblestone sidewalks, and softly blurred figures passing by, their outlines faint and indistinct. Streetlights cast a gentle glow, while shadows play across the boy's path, adding depth to the scene. The lighting highlights the boy's subtle smile, hinting at a fleeting moment of curiosity. The overall cinematic atmosphere, complete with classic film still aesthetics and dramatic contrasts, gives the scene an evocative and introspective feel.</td>
</tr>
<tr>
<td><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/consisid/consisid_image_4.png?download=true" style="height: auto; width: 600px;"></td>
<td><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/consisid/consisid_output_4.gif?download=true" style="height: auto; width: 2000px;"></td>
<td>The video features a baby wearing a bright superhero cape, standing confidently with arms raised in a powerful pose. The baby has a determined look on their face, with eyes wide and lips pursed in concentration, as if ready to take on a challenge. The setting appears playful, with colorful toys scattered around and a soft rug underfoot, while sunlight streams through a nearby window, highlighting the fluttering cape and adding to the impression of heroism. The overall atmosphere is lighthearted and fun, with the baby's expressions capturing a mix of innocence and an adorable attempt at bravery, as if truly ready to save the day.</td>
</tr>
</table>

## Resources

Learn more about ConsisID with the following resources.

- A [video](https://www.youtube.com/watch?v=PhlgC-bI5SQ) demonstrating ConsisID's main features.
- The research paper, [Identity-Preserving Text-to-Video Generation by Frequency Decomposition](https://hf.co/papers/2411.17440), for more details.
diffusers/docs/source/en/using-diffusers/controlling_generation.md ADDED
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Controlled generation

Controlling outputs generated by diffusion models has long been pursued by the community and is now an active research topic. In many popular diffusion models, subtle changes in inputs, both images and text prompts, can drastically change outputs. In an ideal world we want to be able to control how semantics are preserved and changed.

Most examples of preserving semantics reduce to being able to accurately map a change in input to a change in output. For example, adding an adjective to a subject in a prompt preserves the entire image, only modifying the changed subject. Or, image variation of a particular subject preserves the subject's pose.

Additionally, there are qualities of generated images that we would like to influence beyond semantic preservation. For example, in general, we would like our outputs to be of good quality, adhere to a particular style, or be realistic.

We will document some of the techniques `diffusers` supports to control generation of diffusion models. Much is cutting-edge research and can be quite nuanced. If something needs clarifying or you have a suggestion, don't hesitate to open a discussion on the [forum](https://discuss.huggingface.co/c/discussion-related-to-httpsgithubcomhuggingfacediffusers/63) or a [GitHub issue](https://github.com/huggingface/diffusers/issues).

We provide a high-level explanation of how the generation can be controlled as well as a snippet of the technical details. For more in-depth explanations of the technical details, the original papers, which are linked from the pipelines, are always the best resources.

Depending on the use case, one should choose a technique accordingly. In many cases, these techniques can be combined. For example, one can combine Textual Inversion with SEGA to provide more semantic guidance to the outputs generated using Textual Inversion.

Unless otherwise mentioned, these are techniques that work with existing models and don't require their own weights.

1. [InstructPix2Pix](#instruct-pix2pix)
2. [Pix2Pix Zero](#pix2pix-zero)
3. [Attend and Excite](#attend-and-excite)
4. [Semantic Guidance](#semantic-guidance-sega)
5. [Self-attention Guidance](#self-attention-guidance-sag)
6. [Depth2Image](#depth2image)
7. [MultiDiffusion Panorama](#multidiffusion-panorama)
8. [DreamBooth](#dreambooth)
9. [Textual Inversion](#textual-inversion)
10. [ControlNet](#controlnet)
11. [Prompt Weighting](#prompt-weighting)
12. [Custom Diffusion](#custom-diffusion)
13. [Model Editing](#model-editing)
14. [DiffEdit](#diffedit)
15. [T2I-Adapter](#t2i-adapter)
16. [FABRIC](#fabric)

For convenience, we provide a table to denote which methods are inference-only and which require fine-tuning/training.

| **Method** | **Inference only** | **Requires training /<br> fine-tuning** | **Comments** |
| :-------------------------------------------------: | :----------------: | :-------------------------------------: | :---------------------------------------------------------------------------------------------: |
| [InstructPix2Pix](#instruct-pix2pix) | ✅ | ❌ | Can additionally be<br>fine-tuned for better <br>performance on specific <br>edit instructions. |
| [Pix2Pix Zero](#pix2pix-zero) | ✅ | ❌ | |
| [Attend and Excite](#attend-and-excite) | ✅ | ❌ | |
| [Semantic Guidance](#semantic-guidance-sega) | ✅ | ❌ | |
| [Self-attention Guidance](#self-attention-guidance-sag) | ✅ | ❌ | |
| [Depth2Image](#depth2image) | ✅ | ❌ | |
| [MultiDiffusion Panorama](#multidiffusion-panorama) | ✅ | ❌ | |
| [DreamBooth](#dreambooth) | ❌ | ✅ | |
| [Textual Inversion](#textual-inversion) | ❌ | ✅ | |
| [ControlNet](#controlnet) | ✅ | ❌ | A ControlNet can be <br>trained/fine-tuned on<br>a custom conditioning. |
| [Prompt Weighting](#prompt-weighting) | ✅ | ❌ | |
| [Custom Diffusion](#custom-diffusion) | ❌ | ✅ | |
| [Model Editing](#model-editing) | ✅ | ❌ | |
| [DiffEdit](#diffedit) | ✅ | ❌ | |
| [T2I-Adapter](#t2i-adapter) | ✅ | ❌ | |
| [Fabric](#fabric) | ✅ | ❌ | |

## InstructPix2Pix

[Paper](https://huggingface.co/papers/2211.09800)

[InstructPix2Pix](../api/pipelines/pix2pix) is fine-tuned from Stable Diffusion to support editing input images. It takes as inputs an image and a prompt describing an edit, and it outputs the edited image.
InstructPix2Pix has been explicitly trained to work well with [InstructGPT](https://openai.com/blog/instruction-following/)-like prompts.

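Conceptually, InstructPix2Pix denoises with two guidance scales, one for the image condition and one for the edit instruction, so you can trade off faithfulness to the input image against strength of the edit. The following is a toy sketch of that guidance rule with scalars standing in for noise-prediction tensors (names are illustrative, not the pipeline's API):

```python
# Toy classifier-free guidance with separate image and text scales,
# following the form used by InstructPix2Pix (scalars stand in for tensors).
def guided_eps(eps_uncond, eps_img, eps_full, s_img, s_txt):
    # eps_uncond: prediction with no image and no instruction
    # eps_img:    prediction with the image condition only
    # eps_full:   prediction with image + edit instruction
    return (
        eps_uncond
        + s_img * (eps_img - eps_uncond)
        + s_txt * (eps_full - eps_img)
    )

# With both scales at 1.0 we recover the fully conditioned prediction.
eps_neutral = guided_eps(0.0, 0.4, 1.0, s_img=1.0, s_txt=1.0)
eps_strong_edit = guided_eps(0.0, 0.4, 1.0, s_img=1.5, s_txt=7.5)
```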
## Pix2Pix Zero

[Paper](https://huggingface.co/papers/2302.03027)

[Pix2Pix Zero](../api/pipelines/pix2pix_zero) allows modifying an image so that one concept or subject is translated to another one while preserving general image semantics.

The denoising process is guided from one conceptual embedding towards another conceptual embedding. The intermediate latents are optimized during the denoising process to push the attention maps towards reference attention maps. The reference attention maps are from the denoising process of the input image and are used to encourage semantic preservation.

Pix2Pix Zero can be used both to edit synthetic images as well as real images.

- To edit synthetic images, one first generates an image given a caption.
  Next, we generate image captions for the concept that shall be edited and for the new target concept. We can use a model like [Flan-T5](https://huggingface.co/docs/transformers/model_doc/flan-t5) for this purpose. Then, "mean" prompt embeddings for both the source and target concepts are created via the text encoder. Finally, the pix2pix-zero algorithm is used to edit the synthetic image.
- To edit a real image, one first generates an image caption using a model like [BLIP](https://huggingface.co/docs/transformers/model_doc/blip). Then one applies DDIM inversion on the prompt and image to generate "inverse" latents. Similar to before, "mean" prompt embeddings for both source and target concepts are created and finally the pix2pix-zero algorithm in combination with the "inverse" latents is used to edit the image.

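The "mean" prompt embeddings above define an edit direction: the difference between the averaged target-concept embeddings and the averaged source-concept embeddings. A toy sketch with short lists standing in for real text-encoder embeddings (all values here are illustrative):

```python
# Toy sketch: the edit direction is the difference between the mean
# embeddings of many source-concept and many target-concept captions.
def mean_embedding(embeddings):
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]

cat_embeds = [[1.0, 0.0], [0.8, 0.2]]   # captions mentioning "cat"
dog_embeds = [[0.0, 1.0], [0.2, 0.8]]   # the same captions with "dog"

direction = [t - s for s, t in zip(mean_embedding(cat_embeds), mean_embedding(dog_embeds))]

# Adding the direction to a prompt embedding steers "cat" towards "dog".
prompt_embed = [0.9, 0.1]
edited_embed = [p + d for p, d in zip(prompt_embed, direction)]
```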
<Tip>

Pix2Pix Zero is the first model that allows "zero-shot" image editing. This means that the model
can edit an image in less than a minute on a consumer GPU, as shown [here](../api/pipelines/pix2pix_zero#usage-example).

</Tip>

As mentioned above, Pix2Pix Zero includes optimizing the latents (and not any of the UNet, VAE, or the text encoder) to steer the generation toward a specific concept. This means that the overall
pipeline might require more memory than a standard [StableDiffusionPipeline](../api/pipelines/stable_diffusion/text2img).

<Tip>

An important distinction between methods like InstructPix2Pix and Pix2Pix Zero is that the former
involves fine-tuning the pre-trained weights while the latter does not. This means that you can
apply Pix2Pix Zero to any of the available Stable Diffusion models.

</Tip>

## Attend and Excite

[Paper](https://huggingface.co/papers/2301.13826)

[Attend and Excite](../api/pipelines/attend_and_excite) allows subjects in the prompt to be faithfully represented in the final image.

A set of token indices is given as input, corresponding to the subjects in the prompt that need to be present in the image. During denoising, each token index is guaranteed to have a minimum attention threshold for at least one patch of the image. The intermediate latents are iteratively optimized during the denoising process to strengthen the attention of the most neglected subject token until the attention threshold is passed for all subject tokens.

Like Pix2Pix Zero, Attend and Excite also involves a mini optimization loop (leaving the pre-trained weights untouched) in its pipeline and can require more memory than the usual [StableDiffusionPipeline](../api/pipelines/stable_diffusion/text2img).

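The core of that optimization can be sketched in a few lines: for each subject token, take its maximum attention over image patches, and drive a loss from the most neglected token. This is a simplified illustration with made-up attention values, not the pipeline's implementation (real attention maps come from the UNet cross-attention layers):

```python
# Toy sketch of the Attend-and-Excite objective: the loss is driven by
# the subject token whose maximum per-patch attention is lowest.
def attend_excite_loss(attn_maps, subject_tokens):
    # attn_maps[token] -> list of per-patch attention values
    max_attn = [max(attn_maps[t]) for t in subject_tokens]
    neglected = min(max_attn)          # most neglected subject token
    return max(0.0, 1.0 - neglected)   # zero once every token is attended

attn_maps = {
    "cat":  [0.1, 0.9, 0.2],  # well attended somewhere in the image
    "frog": [0.1, 0.2, 0.1],  # neglected -> drives the loss
}
loss = attend_excite_loss(attn_maps, ["cat", "frog"])
```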
## Semantic Guidance (SEGA)

[Paper](https://huggingface.co/papers/2301.12247)

[SEGA](../api/pipelines/semantic_stable_diffusion) allows applying or removing one or more concepts from an image. The strength of the concept can also be controlled. For example, the smile concept can be used to incrementally increase or decrease the smile of a portrait.

Similar to how classifier-free guidance provides guidance via empty prompt inputs, SEGA provides guidance on conceptual prompts. Multiple of these conceptual prompts can be applied simultaneously. Each conceptual prompt can either add or remove its concept depending on whether the guidance is applied positively or negatively.

Unlike Pix2Pix Zero or Attend and Excite, SEGA directly interacts with the diffusion process instead of performing any explicit gradient-based optimization.

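The composition of guidance terms can be sketched like this: start from ordinary classifier-free guidance, then add (or subtract) one extra direction per concept prompt. Scalars stand in for noise-prediction tensors, and this simplified form omits SEGA's warm-up and per-element thresholding:

```python
# Toy sketch of SEGA-style guidance composition. Each concept contributes
# a direction that is added (apply the concept) or subtracted (remove it).
def sega_eps(eps_uncond, eps_text, concepts, guidance_scale):
    eps = eps_uncond + guidance_scale * (eps_text - eps_uncond)
    for eps_concept, scale, positive in concepts:
        direction = eps_concept - eps_uncond
        eps += scale * direction if positive else -scale * direction
    return eps

# One concept applied positively (e.g. "smile"), one removed (e.g. "glasses").
eps = sega_eps(
    eps_uncond=0.0,
    eps_text=1.0,
    concepts=[(0.5, 2.0, True), (0.2, 1.0, False)],
    guidance_scale=7.5,
)
```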
## Self-attention Guidance (SAG)

[Paper](https://huggingface.co/papers/2210.00939)

[Self-attention Guidance](../api/pipelines/self_attention_guidance) improves the general quality of images.

SAG provides guidance from predictions not conditioned on high-frequency details towards fully conditioned images. The high-frequency details are extracted from the UNet self-attention maps.

## Depth2Image

[Project](https://huggingface.co/stabilityai/stable-diffusion-2-depth)

[Depth2Image](../api/pipelines/stable_diffusion/depth2img) is fine-tuned from Stable Diffusion to better preserve semantics for text-guided image variation.

It conditions on a monocular depth estimate of the original image.

## MultiDiffusion Panorama

[Paper](https://huggingface.co/papers/2302.08113)

[MultiDiffusion Panorama](../api/pipelines/panorama) defines a new generation process over a pre-trained diffusion model. This process binds together multiple diffusion generation methods that can be readily applied to generate high quality and diverse images. Results adhere to user-provided controls, such as desired aspect ratio (e.g., panorama), and spatial guiding signals, ranging from tight segmentation masks to bounding boxes.
MultiDiffusion Panorama makes it possible to generate high-quality images at arbitrary aspect ratios (e.g., panoramas).

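The fusion step at the heart of MultiDiffusion can be sketched in one dimension: denoise overlapping windows of a wide latent independently, then average each position over every window that covers it. This toy version uses 1D positions and a dummy "denoiser"; the real pipeline operates on 2D latent patches:

```python
# Toy sketch of MultiDiffusion's fusion step over overlapping windows.
def fuse_windows(width, window, stride, denoise):
    values = [0.0] * width
    counts = [0] * width
    for start in range(0, width - window + 1, stride):
        patch = denoise(start, window)  # per-window denoised values
        for i, v in enumerate(patch):
            values[start + i] += v
            counts[start + i] += 1
    # Average each position over all windows that covered it.
    return [v / c for v, c in zip(values, counts)]

# A dummy "denoiser" that returns the absolute position of each element,
# so the fused result should reproduce the positions exactly.
panorama = fuse_windows(8, window=4, stride=2,
                        denoise=lambda s, w: [float(s + i) for i in range(w)])
```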
## Fine-tuning your own models

In addition to pre-trained models, Diffusers has training scripts for fine-tuning models on user-provided data.

## DreamBooth

[Project](https://dreambooth.github.io/)

[DreamBooth](../training/dreambooth) fine-tunes a model to teach it about a new subject. For example, a few pictures of a person can be used to generate images of that person in different styles.

## Textual Inversion

[Paper](https://huggingface.co/papers/2208.01618)

[Textual Inversion](../training/text_inversion) fine-tunes a model to teach it about a new concept. For example, a few pictures of a style of artwork can be used to generate images in that style.

## ControlNet

[Paper](https://huggingface.co/papers/2302.05543)

[ControlNet](../api/pipelines/controlnet) is an auxiliary network which adds an extra condition.
There are 8 canonical pre-trained ControlNets trained on different conditionings such as edge detection, scribbles,
depth maps, and semantic segmentations.

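Structurally, a ControlNet feeds its conditioning into the frozen base UNet by adding each of its block outputs as a residual to the corresponding UNet block output. A toy sketch with lists of floats standing in for feature maps (in the real model the projections are zero-initialized convolutions, so the residuals start at zero and the base model is untouched at the beginning of training):

```python
# Toy sketch of ControlNet residual injection into a frozen UNet.
def add_controlnet_residuals(unet_features, control_residuals, conditioning_scale=1.0):
    return [
        [u + conditioning_scale * c for u, c in zip(u_block, c_block)]
        for u_block, c_block in zip(unet_features, control_residuals)
    ]

unet_features = [[1.0, 2.0], [3.0, 4.0]]      # per-block UNet outputs
control_residuals = [[0.1, 0.1], [0.2, 0.2]]  # zero at the start of training
fused = add_controlnet_residuals(unet_features, control_residuals, conditioning_scale=0.5)
```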
## Prompt Weighting

[Prompt weighting](../using-diffusers/weighted_prompts) is a simple technique that puts more attention weight on certain parts of the text
input.

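In its simplest form, this amounts to scaling the embeddings of the up- or down-weighted tokens before they reach the cross-attention layers. A toy sketch with 2-dimensional vectors standing in for real text-encoder embeddings (libraries that implement prompt weighting may additionally renormalize the result):

```python
# Toy sketch of prompt weighting: scale each token embedding by its weight.
def weight_embeddings(token_embeds, weights):
    return [[x * w for x in emb] for emb, w in zip(token_embeds, weights)]

token_embeds = [[0.5, 0.5], [0.3, 0.7]]          # e.g. tokens "red" and "cat"
weighted = weight_embeddings(token_embeds, [1.5, 1.0])  # emphasize "red"
```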
## Custom Diffusion

[Paper](https://huggingface.co/papers/2212.04488)

[Custom Diffusion](../training/custom_diffusion) only fine-tunes the cross-attention maps of a pre-trained
text-to-image diffusion model. It also allows for additionally performing Textual Inversion. It supports
multi-concept training by design. Like DreamBooth and Textual Inversion, Custom Diffusion is also used to
teach a pre-trained text-to-image diffusion model about new concepts to generate outputs involving the
concept(s) of interest.

## Model Editing

[Paper](https://huggingface.co/papers/2303.08084)

The [text-to-image model editing pipeline](../api/pipelines/model_editing) helps you mitigate some of the incorrect implicit assumptions a pre-trained text-to-image
diffusion model might make about the subjects present in the input prompt. For example, if you prompt Stable Diffusion to generate images for "A pack of roses", the roses in the generated images
are more likely to be red. This pipeline helps you change that assumption.

## DiffEdit

[Paper](https://huggingface.co/papers/2210.11427)

[DiffEdit](../api/pipelines/diffedit) allows for semantic editing of input images along with
input prompts while preserving the original input images as much as possible.

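DiffEdit infers the edit region automatically: where the noise predictions under the query and reference prompts disagree the most, the image likely needs to change. A toy sketch with one scalar per "pixel" standing in for tensors; the threshold value and prompts are illustrative:

```python
# Toy sketch of DiffEdit's mask inference from the disagreement between
# noise predictions under two prompts.
def diffedit_mask(eps_reference, eps_query, threshold=0.5):
    diffs = [abs(q - r) for r, q in zip(eps_reference, eps_query)]
    peak = max(diffs) or 1.0           # normalize by the largest difference
    return [1 if d / peak >= threshold else 0 for d in diffs]

eps_reference = [0.1, 0.1, 0.9, 0.8]   # e.g. prompt: "a bowl of fruit"
eps_query     = [0.1, 0.2, 0.1, 0.1]   # e.g. prompt: "a bowl of pears"
mask = diffedit_mask(eps_reference, eps_query)  # 1 = region to edit
```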
## T2I-Adapter

[Paper](https://huggingface.co/papers/2302.08453)

[T2I-Adapter](../api/pipelines/stable_diffusion/adapter) is an auxiliary network which adds an extra condition.
There are 8 canonical pre-trained adapters trained on different conditionings such as edge detection, sketch,
depth maps, and semantic segmentations.

209
+
210
+ ## Fabric
211
+
212
+ [Paper](https://huggingface.co/papers/2307.10159)
213
+
214
+ [Fabric](https://github.com/huggingface/diffusers/tree/442017ccc877279bcf24fbe92f92d3d0def191b6/examples/community#stable-diffusion-fabric-pipeline) is a training-free
215
+ approach applicable to a wide range of popular diffusion models, which exploits
216
+ the self-attention layer present in the most widely used architectures to condition
217
+ the diffusion process on a set of feedback images.
diffusers/docs/source/en/using-diffusers/depth2img.md ADDED
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Text-guided depth-to-image generation

[[open-in-colab]]

The [`StableDiffusionDepth2ImgPipeline`] lets you pass a text prompt and an initial image to condition the generation of new images. In addition, you can also pass a `depth_map` to preserve the image structure. If no `depth_map` is provided, the pipeline automatically predicts the depth via an integrated [depth-estimation model](https://github.com/isl-org/MiDaS).

Start by creating an instance of the [`StableDiffusionDepth2ImgPipeline`]:

```python
import torch
from diffusers import StableDiffusionDepth2ImgPipeline
from diffusers.utils import load_image, make_image_grid

pipeline = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth",
    torch_dtype=torch.float16,
    use_safetensors=True,
).to("cuda")
```

Now pass your prompt to the pipeline. You can also pass a `negative_prompt` to prevent certain words from guiding how an image is generated:

```python
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
init_image = load_image(url)
prompt = "two tigers"
negative_prompt = "bad, deformed, ugly, bad anatomy"
image = pipeline(prompt=prompt, image=init_image, negative_prompt=negative_prompt, strength=0.7).images[0]
make_image_grid([init_image, image], rows=1, cols=2)
```
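The `strength` parameter controls how much of the initial image is preserved: the pipeline skips the first part of the denoising schedule, so only roughly `strength * num_inference_steps` steps actually run. A sketch of that bookkeeping, mirroring the logic commonly used by diffusers img2img-style pipelines (exact details may differ between pipelines):

```python
# Sketch of how `strength` maps to the number of denoising steps that
# actually run in an img2img-style pipeline.
def effective_steps(num_inference_steps, strength):
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return num_inference_steps - t_start  # steps that actually run

steps = effective_steps(50, 0.7)  # strength=0.7 as in the example above
```

Lower `strength` values keep more of the input image's structure; `strength=1.0` ignores it entirely.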

| Input | Output |
|---|---|
| <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/coco-cats.png" width="500"/> | <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/depth2img-tigers.png" width="500"/> |
diffusers/docs/source/en/using-diffusers/ip_adapter.md ADDED
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# IP-Adapter

[IP-Adapter](https://huggingface.co/papers/2308.06721) is a lightweight adapter designed to integrate image-based guidance with text-to-image diffusion models. The adapter uses an image encoder to extract image features that are passed to newly added cross-attention layers in the UNet and fine-tuned. The original UNet model and the existing cross-attention layers corresponding to text features are frozen. Decoupling the cross-attention for image and text features enables more fine-grained and controllable generation.

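The decoupled cross-attention can be sketched with a toy example: text and image features get separate attention computations whose outputs are summed, with a scale factor controlling the image branch. Scalar "features" stand in for the real key/value projections, and the names below are illustrative, not the library's internals:

```python
# Toy sketch of decoupled cross-attention: one attention pass over text
# features, one over image features, summed with a scale on the image branch.
import math

def attention(query, keys, values):
    # Softmax attention with scalars standing in for vectors.
    scores = [query * k for k in keys]
    weights = [math.exp(s) for s in scores]
    total = sum(weights)
    return sum(w / total * v for w, v in zip(weights, values))

def decoupled_cross_attention(query, text_kv, image_kv, scale):
    text_out = attention(query, *text_kv)
    image_out = attention(query, *image_kv)
    return text_out + scale * image_out

text_kv = ([1.0, 2.0], [0.3, 0.6])   # (keys, values) from the text prompt
image_kv = ([1.0], [0.9])            # (keys, values) from the image prompt
out_text_only = decoupled_cross_attention(1.0, text_kv, image_kv, scale=0.0)
out_balanced = decoupled_cross_attention(1.0, text_kv, image_kv, scale=0.5)
```

Setting the scale to zero disables the image branch entirely, which is why the adapter can be toggled without touching the frozen text pathway.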
IP-Adapter files are typically ~100MB because they only contain the image embeddings. This means you need to load a model first, and then load the IP-Adapter with [`~loaders.IPAdapterMixin.load_ip_adapter`].

> [!TIP]
> IP-Adapters are available for many models such as [Flux](../api/pipelines/flux#ip-adapter) and [Stable Diffusion 3](../api/pipelines/stable_diffusion/stable_diffusion_3). The examples in this guide use Stable Diffusion and Stable Diffusion XL.

Use the [`~loaders.IPAdapterMixin.set_ip_adapter_scale`] method to scale the influence of the IP-Adapter during generation. A value of `1.0` means the model is only conditioned on the image prompt, and `0.5` typically produces balanced results between the text and image prompt.

```py
import torch
from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16
).to("cuda")
pipeline.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="sdxl_models",
    weight_name="ip-adapter_sdxl.bin"
)
pipeline.set_ip_adapter_scale(0.8)
```

Pass an image to `ip_adapter_image` along with a text prompt to generate an image.

```py
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_diner.png")
pipeline(
    prompt="a polar bear sitting in a chair drinking a milkshake",
    ip_adapter_image=image,
    negative_prompt="deformed, ugly, wrong proportion, low res, bad anatomy, worst quality, low quality",
).images[0]
```

<div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
  <figure>
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_diner.png" width="400" alt="IP-Adapter image"/>
    <figcaption style="text-align: center;">IP-Adapter image</figcaption>
  </figure>
  <figure>
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_diner_2.png" width="400" alt="generated image"/>
    <figcaption style="text-align: center;">generated image</figcaption>
  </figure>
</div>

Take a look at the examples below to learn how to use IP-Adapter for other tasks.

<hfoptions id="usage">
<hfoption id="image-to-image">

```py
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

pipeline = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16
).to("cuda")
pipeline.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="sdxl_models",
    weight_name="ip-adapter_sdxl.bin"
)
pipeline.set_ip_adapter_scale(0.8)

image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_bear_1.png")
ip_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_gummy.png")
pipeline(
    prompt="best quality, high quality",
    image=image,
    ip_adapter_image=ip_image,
    strength=0.5,
).images[0]
```
93
+
94
+ <div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
95
+ <figure>
96
+ <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_bear_1.png" width="300" alt="input image"/>
97
+ <figcaption style="text-align: center;">input image</figcaption>
98
+ </figure>
99
+ <figure>
100
+ <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_gummy.png" width="300" alt="IP-Adapter image"/>
101
+ <figcaption style="text-align: center;">IP-Adapter image</figcaption>
102
+ </figure>
103
+ <figure>
104
+ <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_bear_3.png" width="300" alt="generated image"/>
105
+ <figcaption style="text-align: center;">generated image</figcaption>
106
+ </figure>
107
+ </div>
108
+
109
+ </hfoption>
110
+ <hfoption id="inpainting">
111
+
112
+ ```py
113
+ import torch
114
+ from diffusers import AutoPipelineForInpainting
115
+ from diffusers.utils import load_image
116
+
117
+ pipeline = AutoPipelineForInpainting.from_pretrained(
118
+ "stabilityai/stable-diffusion-xl-base-1.0",
119
+ torch_dtype=torch.float16
120
+ ).to("cuda")
121
+ pipeline.load_ip_adapter(
122
+ "h94/IP-Adapter",
123
+ subfolder="sdxl_models",
124
+ weight_name="ip-adapter_sdxl.bin"
125
+ )
126
+ pipeline.set_ip_adapter_scale(0.6)
127
+
128
+ mask_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_mask.png")
129
+ image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_bear_1.png")
130
+ ip_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_gummy.png")
131
+ pipeline(
132
+ prompt="a cute gummy bear waving",
133
+ image=image,
134
+ mask_image=mask_image,
135
+ ip_adapter_image=ip_image,
136
+ ).images[0]
137
+ ```
138
+
139
+ <div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
140
+ <figure>
141
+ <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_bear_1.png" width="300" alt="input image"/>
142
+ <figcaption style="text-align: center;">input image</figcaption>
143
+ </figure>
144
+ <figure>
145
+ <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_gummy.png" width="300" alt="IP-Adapter image"/>
146
+ <figcaption style="text-align: center;">IP-Adapter image</figcaption>
147
+ </figure>
148
+ <figure>
149
+ <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_inpaint.png" width="300" alt="generated image"/>
150
+ <figcaption style="text-align: center;">generated image</figcaption>
151
+ </figure>
152
+ </div>
153
+
154
+ </hfoption>
155
+ <hfoption id="video">
156
+
157
+ The [`~DiffusionPipeline.enable_model_cpu_offload`] method is useful for reducing memory and it should be enabled **after** the IP-Adapter is loaded. Otherwise, the IP-Adapter's image encoder is also offloaded to the CPU and returns an error.
158
+
159
+ ```py
160
+ import torch
161
+ from diffusers import AnimateDiffPipeline, DDIMScheduler, MotionAdapter
162
+ from diffusers.utils import export_to_gif, load_image
164
+
165
+ adapter = MotionAdapter.from_pretrained(
166
+ "guoyww/animatediff-motion-adapter-v1-5-2",
167
+ torch_dtype=torch.float16
168
+ )
169
+ pipeline = AnimateDiffPipeline.from_pretrained(
170
+ "emilianJR/epiCRealism",
171
+ motion_adapter=adapter,
172
+ torch_dtype=torch.float16
173
+ )
174
+ scheduler = DDIMScheduler.from_pretrained(
175
+ "emilianJR/epiCRealism",
176
+ subfolder="scheduler",
177
+ clip_sample=False,
178
+ timestep_spacing="linspace",
179
+ beta_schedule="linear",
180
+ steps_offset=1,
181
+ )
182
+ pipeline.scheduler = scheduler
183
+ pipeline.enable_vae_slicing()
184
+ pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
185
+ pipeline.enable_model_cpu_offload()
186
+
187
+ ip_adapter_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_inpaint.png")
188
+ frames = pipeline(
189
+ prompt="A cute gummy bear waving",
190
+ negative_prompt="bad quality, worse quality, low resolution",
191
+ ip_adapter_image=ip_adapter_image,
192
+ num_frames=16,
193
+ guidance_scale=7.5,
194
+ num_inference_steps=50,
195
+ ).frames[0]
+ export_to_gif(frames, "gummy_bear.gif")
196
+ ```
197
+
198
+ <div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
199
+ <figure>
200
+ <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_inpaint.png" width="400" alt="IP-Adapter image"/>
201
+ <figcaption style="text-align: center;">IP-Adapter image</figcaption>
202
+ </figure>
203
+ <figure>
204
+ <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/gummy_bear.gif" width="400" alt="generated video"/>
205
+ <figcaption style="text-align: center;">generated video</figcaption>
206
+ </figure>
207
+ </div>
208
+
209
+ </hfoption>
210
+ </hfoptions>
211
+
212
+ ## Model variants
213
+
214
+ There are two variants of IP-Adapter, Plus and FaceID. The Plus variant uses patch embeddings and the ViT-H image encoder. The FaceID variant uses face embeddings generated by InsightFace.
215
+
216
+ <hfoptions id="ipadapter-variants">
217
+ <hfoption id="IP-Adapter Plus">
218
+
219
+ ```py
220
+ import torch
221
+ from diffusers import AutoPipelineForText2Image
+ from transformers import CLIPVisionModelWithProjection
222
+
223
+ image_encoder = CLIPVisionModelWithProjection.from_pretrained(
224
+ "h94/IP-Adapter",
225
+ subfolder="models/image_encoder",
226
+ torch_dtype=torch.float16
227
+ )
228
+
229
+ pipeline = AutoPipelineForText2Image.from_pretrained(
230
+ "stabilityai/stable-diffusion-xl-base-1.0",
231
+ image_encoder=image_encoder,
232
+ torch_dtype=torch.float16
233
+ ).to("cuda")
234
+
235
+ pipeline.load_ip_adapter(
236
+ "h94/IP-Adapter",
237
+ subfolder="sdxl_models",
238
+ weight_name="ip-adapter-plus_sdxl_vit-h.safetensors"
239
+ )
240
+ ```
241
+
242
+ </hfoption>
243
+ <hfoption id="IP-Adapter FaceID">
244
+
245
+ ```py
246
+ import torch
247
+ from diffusers import AutoPipelineForText2Image
248
+
249
+ pipeline = AutoPipelineForText2Image.from_pretrained(
250
+ "stabilityai/stable-diffusion-xl-base-1.0",
251
+ torch_dtype=torch.float16
252
+ ).to("cuda")
253
+
254
+ pipeline.load_ip_adapter(
255
+ "h94/IP-Adapter-FaceID",
256
+ subfolder=None,
257
+ weight_name="ip-adapter-faceid_sdxl.bin",
258
+ image_encoder_folder=None
259
+ )
260
+ ```
261
+
262
+ To use an IP-Adapter FaceID Plus model, load the CLIP image encoder with [`~transformers.CLIPVisionModelWithProjection`].
263
+
264
+ ```py
265
+ import torch
+ from diffusers import AutoPipelineForText2Image
+ from transformers import CLIPVisionModelWithProjection
266
+
267
+ image_encoder = CLIPVisionModelWithProjection.from_pretrained(
268
+ "laion/CLIP-ViT-H-14-laion2B-s32B-b79K",
269
+ torch_dtype=torch.float16,
270
+ )
271
+
272
+ pipeline = AutoPipelineForText2Image.from_pretrained(
273
+ "stable-diffusion-v1-5/stable-diffusion-v1-5",
274
+ image_encoder=image_encoder,
275
+ torch_dtype=torch.float16
276
+ ).to("cuda")
277
+
278
+ pipeline.load_ip_adapter(
279
+ "h94/IP-Adapter-FaceID",
280
+ subfolder=None,
281
+ weight_name="ip-adapter-faceid-plus_sd15.bin"
282
+ )
283
+ ```
284
+
285
+ </hfoption>
286
+ </hfoptions>
287
+
288
+ ## Image embeddings
289
+
290
+ The `prepare_ip_adapter_image_embeds` method generates image embeddings you can reuse when you're running the pipeline multiple times with the same images. Loading and encoding multiple images on every pipeline call is inefficient. It is more efficient to precompute the image embeddings ahead of time, save them to disk, and load them when you need them.
291
+
292
+ ```py
293
+ import torch
294
+ from diffusers import AutoPipelineForText2Image
+ from diffusers.utils import load_image
295
+
296
+ pipeline = AutoPipelineForText2Image.from_pretrained(
297
+ "stabilityai/stable-diffusion-xl-base-1.0",
298
+ torch_dtype=torch.float16
299
+ ).to("cuda")
300
+
301
+ image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_diner.png")
+ image_embeds = pipeline.prepare_ip_adapter_image_embeds(
302
+ ip_adapter_image=image,
303
+ ip_adapter_image_embeds=None,
304
+ device="cuda",
305
+ num_images_per_prompt=1,
306
+ do_classifier_free_guidance=True,
307
+ )
308
+
309
+ torch.save(image_embeds, "image_embeds.ipadpt")
310
+ ```
311
+
312
+ Reload the image embeddings by passing them to the `ip_adapter_image_embeds` parameter. Set `image_encoder_folder` to `None` because you don't need the image encoder anymore to generate the image embeddings.
313
+
314
+ > [!TIP]
315
+ > You can also load image embeddings from other sources such as ComfyUI.
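Embeddings loaded from elsewhere need to match the structure the pipeline expects: a list of tensors where, with classifier-free guidance enabled, each tensor stacks the negative and positive image embeddings along the first dimension. A minimal save/load sketch (the shape below is made up for illustration; the real shape depends on the model and adapter):

```py
import torch

# Illustrative stand-in for precomputed IP-Adapter image embeddings: a list
# with one tensor stacking the negative and positive embeddings along dim 0.
# The shape here is hypothetical, not the real SDXL embedding shape.
image_embeds = [torch.randn(2, 1, 1280)]

torch.save(image_embeds, "image_embeds.ipadpt")
reloaded = torch.load("image_embeds.ipadpt")
```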
316
+
317
+ ```py
318
+ pipeline.load_ip_adapter(
319
+ "h94/IP-Adapter",
320
+ subfolder="sdxl_models",
321
+ image_encoder_folder=None,
322
+ weight_name="ip-adapter_sdxl.bin"
323
+ )
324
+ pipeline.set_ip_adapter_scale(0.8)
325
+ generator = torch.Generator(device="cpu").manual_seed(0)
+ image_embeds = torch.load("image_embeds.ipadpt")
326
+ pipeline(
327
+ prompt="a polar bear sitting in a chair drinking a milkshake",
328
+ ip_adapter_image_embeds=image_embeds,
329
+ negative_prompt="deformed, ugly, wrong proportion, low res, bad anatomy, worst quality, low quality",
330
+ num_inference_steps=100,
331
+ generator=generator,
332
+ ).images[0]
333
+ ```
334
+
335
+ ## Masking
336
+
337
+ Binary masking enables assigning an IP-Adapter image to a specific area of the output image, making it useful for composing multiple IP-Adapter images. Each IP-Adapter image requires a binary mask.
338
+
339
+ Load the [`~image_processor.IPAdapterMaskProcessor`] to preprocess the image masks. For the best results, provide the output `height` and `width` to ensure masks with different aspect ratios are appropriately sized. If the input masks already match the aspect ratio of the generated image, you don't need to set the `height` and `width`.
340
+
341
+ ```py
342
+ import torch
343
+ from diffusers import AutoPipelineForText2Image
344
+ from diffusers.image_processor import IPAdapterMaskProcessor
345
+ from diffusers.utils import load_image
346
+
347
+ pipeline = AutoPipelineForText2Image.from_pretrained(
348
+ "stabilityai/stable-diffusion-xl-base-1.0",
349
+ torch_dtype=torch.float16
350
+ ).to("cuda")
351
+
352
+ mask1 = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_mask_mask1.png")
353
+ mask2 = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_mask_mask2.png")
354
+
355
+ processor = IPAdapterMaskProcessor()
356
+ masks = processor.preprocess([mask1, mask2], height=1024, width=1024)
357
+ ```
358
+
359
+ <div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
360
+ <figure>
361
+ <img src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/ip_mask_mask1.png" width="200" alt="mask 1"/>
362
+ <figcaption style="text-align: center;">mask 1</figcaption>
363
+ </figure>
364
+ <figure>
365
+ <img src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/ip_mask_mask2.png" width="200" alt="mask 2"/>
366
+ <figcaption style="text-align: center;">mask 2</figcaption>
367
+ </figure>
368
+ </div>
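If you don't have mask images on hand, binary masks can also be created programmatically. A minimal sketch with Pillow, assuming a simple left/right split (white marks the region each IP-Adapter image controls):

```py
from PIL import Image, ImageDraw

height, width = 1024, 1024

# Left half white: the region controlled by the first IP-Adapter image.
mask1 = Image.new("L", (width, height), 0)
ImageDraw.Draw(mask1).rectangle([0, 0, width // 2 - 1, height - 1], fill=255)

# Complementary right-half mask for the second IP-Adapter image.
mask2 = Image.new("L", (width, height), 0)
ImageDraw.Draw(mask2).rectangle([width // 2, 0, width - 1, height - 1], fill=255)
```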
369
+
370
+ Provide both the IP-Adapter images and their scales as a list. Pass the preprocessed masks to `cross_attention_kwargs` in the pipeline.
371
+
372
+ ```py
373
+ face_image1 = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_mask_girl1.png")
374
+ face_image2 = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_mask_girl2.png")
375
+
376
+ pipeline.load_ip_adapter(
377
+ "h94/IP-Adapter",
378
+ subfolder="sdxl_models",
379
+ weight_name=["ip-adapter-plus-face_sdxl_vit-h.safetensors"]
380
+ )
381
+ pipeline.set_ip_adapter_scale([[0.7, 0.7]])
382
+
383
+ ip_images = [[face_image1, face_image2]]
384
+ masks = [masks.reshape(1, masks.shape[0], masks.shape[2], masks.shape[3])]
385
+
386
+ pipeline(
387
+ prompt="2 girls",
388
+ ip_adapter_image=ip_images,
389
+ negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
390
+ cross_attention_kwargs={"ip_adapter_masks": masks}
391
+ ).images[0]
392
+ ```
393
+
394
+ <div style="display: flex; flex-direction: column; gap: 10px;">
395
+ <div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
396
+ <figure>
397
+ <img src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/ip_mask_girl1.png" width="400" alt="IP-Adapter image 1"/>
398
+ <figcaption style="text-align: center;">IP-Adapter image 1</figcaption>
399
+ </figure>
400
+ <figure>
401
+ <img src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/ip_mask_girl2.png" width="400" alt="IP-Adapter image 2"/>
402
+ <figcaption style="text-align: center;">IP-Adapter image 2</figcaption>
403
+ </figure>
404
+ </div>
405
+ <div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
406
+ <figure>
407
+ <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_attention_mask_result_seed_0.png" width="400" alt="Generated image with mask"/>
408
+ <figcaption style="text-align: center;">generated with mask</figcaption>
409
+ </figure>
410
+ <figure>
411
+ <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_no_attention_mask_result_seed_0.png" width="400" alt="Generated image without mask"/>
412
+ <figcaption style="text-align: center;">generated without mask</figcaption>
413
+ </figure>
414
+ </div>
415
+ </div>
416
+
417
+ ## Applications
418
+
419
+ The sections below cover some popular applications of IP-Adapter.
420
+
421
+ ### Face models
422
+
423
+ Generating faces and preserving their details can be challenging. To help generate more accurate faces, there are checkpoints specifically conditioned on images of cropped faces. You can find the face models in the [h94/IP-Adapter](https://huggingface.co/h94/IP-Adapter) or [h94/IP-Adapter-FaceID](https://huggingface.co/h94/IP-Adapter-FaceID) repositories. The FaceID checkpoints use FaceID embeddings from [InsightFace](https://github.com/deepinsight/insightface) instead of CLIP image embeddings.
424
+
425
+ We recommend using the [`DDIMScheduler`] or [`EulerDiscreteScheduler`] for face models.
426
+
427
+ <hfoptions id="usage">
428
+ <hfoption id="h94/IP-Adapter">
429
+
430
+ ```py
431
+ import torch
432
+ from diffusers import StableDiffusionPipeline, DDIMScheduler
433
+ from diffusers.utils import load_image
434
+
435
+ pipeline = StableDiffusionPipeline.from_pretrained(
436
+ "stable-diffusion-v1-5/stable-diffusion-v1-5",
437
+ torch_dtype=torch.float16,
438
+ ).to("cuda")
439
+ pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)
440
+ pipeline.load_ip_adapter(
441
+ "h94/IP-Adapter",
442
+ subfolder="models",
443
+ weight_name="ip-adapter-full-face_sd15.bin"
444
+ )
445
+
446
+ pipeline.set_ip_adapter_scale(0.5)
447
+ image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_einstein_base.png")
448
+
449
+ pipeline(
450
+ prompt="A photo of Einstein as a chef, wearing an apron, cooking in a French restaurant",
451
+ ip_adapter_image=image,
452
+ negative_prompt="lowres, bad anatomy, worst quality, low quality",
453
+ num_inference_steps=100,
454
+ ).images[0]
455
+ ```
456
+
457
+ <div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
458
+ <figure>
459
+ <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_einstein_base.png" width="400" alt="IP-Adapter image"/>
460
+ <figcaption style="text-align: center;">IP-Adapter image</figcaption>
461
+ </figure>
462
+ <figure>
463
+ <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_einstein.png" width="400" alt="generated image"/>
464
+ <figcaption style="text-align: center;">generated image</figcaption>
465
+ </figure>
466
+ </div>
467
+
468
+ </hfoption>
469
+ <hfoption id="h94/IP-Adapter-FaceID">
470
+
471
+ For FaceID models, extract the face embeddings and pass them as a list of tensors to `ip_adapter_image_embeds`.
472
+
473
+ ```py
474
+ # pip install insightface
+ import cv2
+ import numpy as np
475
+ import torch
476
+ from diffusers import StableDiffusionPipeline, DDIMScheduler
477
+ from diffusers.utils import load_image
478
+ from insightface.app import FaceAnalysis
479
+
480
+ pipeline = StableDiffusionPipeline.from_pretrained(
481
+ "stable-diffusion-v1-5/stable-diffusion-v1-5",
482
+ torch_dtype=torch.float16,
483
+ ).to("cuda")
484
+ pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)
485
+ pipeline.load_ip_adapter(
486
+ "h94/IP-Adapter-FaceID",
487
+ subfolder=None,
488
+ weight_name="ip-adapter-faceid_sd15.bin",
489
+ image_encoder_folder=None
490
+ )
491
+ pipeline.set_ip_adapter_scale(0.6)
492
+
493
+ image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_mask_girl1.png")
494
+
495
+ ref_images_embeds = []
496
+ app = FaceAnalysis(name="buffalo_l", providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
497
+ app.prepare(ctx_id=0, det_size=(640, 640))
498
+ image = cv2.cvtColor(np.asarray(image), cv2.COLOR_BGR2RGB)
499
+ faces = app.get(image)
500
+ image = torch.from_numpy(faces[0].normed_embedding)
501
+ ref_images_embeds.append(image.unsqueeze(0))
502
+ ref_images_embeds = torch.stack(ref_images_embeds, dim=0).unsqueeze(0)
503
+ neg_ref_images_embeds = torch.zeros_like(ref_images_embeds)
504
+ id_embeds = torch.cat([neg_ref_images_embeds, ref_images_embeds]).to(dtype=torch.float16, device="cuda")
505
+
506
+ pipeline(
507
+ prompt="A photo of a girl",
508
+ ip_adapter_image_embeds=[id_embeds],
509
+ negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
510
+ ).images[0]
511
+ ```
512
+
513
+ The IP-Adapter FaceID Plus and Plus v2 models require CLIP image embeddings. Prepare the face embeddings and then extract and pass the CLIP embeddings to the hidden image projection layers.
514
+
515
+ ```py
516
+ clip_embeds = pipeline.prepare_ip_adapter_image_embeds(
517
+ [ip_adapter_images], None, torch.device("cuda"), num_images, True)[0]
518
+
519
+ pipeline.unet.encoder_hid_proj.image_projection_layers[0].clip_embeds = clip_embeds.to(dtype=torch.float16)
520
+ # set to True if using IP-Adapter FaceID Plus v2
521
+ pipeline.unet.encoder_hid_proj.image_projection_layers[0].shortcut = False
522
+ ```
523
+
524
+ </hfoption>
525
+ </hfoptions>
526
+
527
+ ### Multiple IP-Adapters
528
+
529
+ Combine multiple IP-Adapters to generate images in more diverse styles. For example, you can use IP-Adapter Face to generate consistent faces and characters and IP-Adapter Plus to generate those faces in specific styles.
530
+
531
+ Load an image encoder with [`~transformers.CLIPVisionModelWithProjection`].
532
+
533
+ ```py
534
+ import torch
535
+ from diffusers import AutoPipelineForText2Image, DDIMScheduler
536
+ from transformers import CLIPVisionModelWithProjection
537
+ from diffusers.utils import load_image
538
+
539
+ image_encoder = CLIPVisionModelWithProjection.from_pretrained(
540
+ "h94/IP-Adapter",
541
+ subfolder="models/image_encoder",
542
+ torch_dtype=torch.float16,
543
+ )
544
+ ```
545
+
546
+ Load a base model, scheduler and the following IP-Adapters.
547
+
548
+ - [ip-adapter-plus_sdxl_vit-h](https://huggingface.co/h94/IP-Adapter#ip-adapter-for-sdxl-10) uses patch embeddings and a ViT-H image encoder
549
+ - [ip-adapter-plus-face_sdxl_vit-h](https://huggingface.co/h94/IP-Adapter#ip-adapter-for-sdxl-10) uses patch embeddings and a ViT-H image encoder but it is conditioned on images of cropped faces
550
+
551
+ ```py
552
+ pipeline = AutoPipelineForText2Image.from_pretrained(
553
+ "stabilityai/stable-diffusion-xl-base-1.0",
554
+ torch_dtype=torch.float16,
555
+ image_encoder=image_encoder,
556
+ )
557
+ pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)
558
+ pipeline.load_ip_adapter(
559
+ "h94/IP-Adapter",
560
+ subfolder="sdxl_models",
561
+ weight_name=["ip-adapter-plus_sdxl_vit-h.safetensors", "ip-adapter-plus-face_sdxl_vit-h.safetensors"]
562
+ )
563
+ pipeline.set_ip_adapter_scale([0.7, 0.3])
564
+ # enable_model_cpu_offload to reduce memory usage
565
+ pipeline.enable_model_cpu_offload()
566
+ ```
567
+
568
+ Load a face image and a folder containing images of the style you want to apply.
569
+
570
+ ```py
571
+ face_image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/women_input.png")
572
+ style_folder = "https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/style_ziggy"
573
+ style_images = [load_image(f"{style_folder}/img{i}.png") for i in range(10)]
574
+ ```
575
+
576
+ <div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
577
+ <figure>
578
+ <img src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/women_input.png" width="400" alt="Face image"/>
579
+ <figcaption style="text-align: center;">face image</figcaption>
580
+ </figure>
581
+ <figure>
582
+ <img src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/ip_style_grid.png" width="400" alt="Style images"/>
583
+ <figcaption style="text-align: center;">style images</figcaption>
584
+ </figure>
585
+ </div>
586
+
587
+ Pass style and face images as a list to `ip_adapter_image`.
588
+
589
+ ```py
590
+ generator = torch.Generator(device="cpu").manual_seed(0)
591
+
592
+ pipeline(
593
+ prompt="wonderwoman",
594
+ ip_adapter_image=[style_images, face_image],
595
+ negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
+ generator=generator,
596
+ ).images[0]
597
+ ```
598
+
599
+ <div style="display: flex; justify-content: center;">
600
+ <figure>
601
+ <img src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/ip_multi_out.png" width="400" alt="Generated image"/>
602
+ <figcaption style="text-align: center;">generated image</figcaption>
603
+ </figure>
604
+ </div>
605
+
606
+ ### Instant generation
607
+
608
+ [Latent Consistency Models (LCM)](../api/pipelines/latent_consistency_models) can generate images in 4 steps or fewer, unlike other diffusion models that require many more steps, making generation feel "instantaneous". IP-Adapters are compatible with LCMs for near-instant image generation.
609
+
610
+ Load the IP-Adapter weights and load the LoRA weights with [`~loaders.StableDiffusionLoraLoaderMixin.load_lora_weights`].
611
+
612
+ ```py
613
+ import torch
614
+ from diffusers import DiffusionPipeline, LCMScheduler
615
+ from diffusers.utils import load_image
616
+
617
+ pipeline = DiffusionPipeline.from_pretrained(
618
+ "sd-dreambooth-library/herge-style",
619
+ torch_dtype=torch.float16
620
+ )
621
+
622
+ pipeline.load_ip_adapter(
623
+ "h94/IP-Adapter",
624
+ subfolder="models",
625
+ weight_name="ip-adapter_sd15.bin"
626
+ )
627
+ pipeline.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")
628
+ pipeline.scheduler = LCMScheduler.from_config(pipeline.scheduler.config)
629
+ # enable_model_cpu_offload to reduce memory usage
630
+ pipeline.enable_model_cpu_offload()
631
+ ```
632
+
633
+ Try using a lower IP-Adapter scale to condition generation more on the style checkpoint, and remember to include the special token `herge_style` in your prompt to trigger it.
634
+
635
+ ```py
636
+ pipeline.set_ip_adapter_scale(0.4)
637
+
638
+ prompt = "herge_style woman in armor, best quality, high quality"
639
+
640
+ ip_adapter_image = load_image("https://user-images.githubusercontent.com/24734142/266492875-2d50d223-8475-44f0-a7c6-08b51cb53572.png")
641
+ pipeline(
642
+ prompt=prompt,
643
+ ip_adapter_image=ip_adapter_image,
644
+ num_inference_steps=4,
645
+ guidance_scale=1,
646
+ ).images[0]
647
+ ```
648
+
649
+ <div style="display: flex; justify-content: center;">
650
+ <figure>
651
+ <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/ip_adapter_herge.png" width="400" alt="Generated image"/>
652
+ <figcaption style="text-align: center;">generated image</figcaption>
653
+ </figure>
654
+ </div>
655
+
656
+ ### Structural control
657
+
658
+ For structural control, combine IP-Adapter with [ControlNet](../api/pipelines/controlnet) conditioned on depth maps, edge maps, pose estimations, and more.
659
+
660
+ The example below loads a [`ControlNetModel`] checkpoint conditioned on depth maps and combines it with an IP-Adapter.
661
+
662
+ ```py
663
+ import torch
664
+ from diffusers.utils import load_image
665
+ from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
666
+
667
+ controlnet = ControlNetModel.from_pretrained(
668
+ "lllyasviel/control_v11f1p_sd15_depth",
669
+ torch_dtype=torch.float16
670
+ )
671
+
672
+ pipeline = StableDiffusionControlNetPipeline.from_pretrained(
673
+ "stable-diffusion-v1-5/stable-diffusion-v1-5",
674
+ controlnet=controlnet,
675
+ torch_dtype=torch.float16
676
+ ).to("cuda")
677
+ pipeline.load_ip_adapter(
678
+ "h94/IP-Adapter",
679
+ subfolder="models",
680
+ weight_name="ip-adapter_sd15.bin"
681
+ )
682
+ ```
683
+
684
+ Pass the depth map and IP-Adapter image to the pipeline.
685
+
686
+ ```py
687
+ ip_adapter_image = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/statue.png")
+ depth_map = load_image("https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/depth.png")
+
+ pipeline(
688
+ prompt="best quality, high quality",
689
+ image=depth_map,
690
+ ip_adapter_image=ip_adapter_image,
691
+ negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality",
692
+ ).images[0]
693
+ ```
694
+
695
+ <div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
696
+ <figure>
697
+ <img src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/statue.png" width="300" alt="IP-Adapter image"/>
698
+ <figcaption style="text-align: center;">IP-Adapter image</figcaption>
699
+ </figure>
700
+ <figure>
701
+ <img src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/depth.png" width="300" alt="Depth map"/>
702
+ <figcaption style="text-align: center;">depth map</figcaption>
703
+ </figure>
704
+ <figure>
705
+ <img src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/ipa-controlnet-out.png" width="300" alt="Generated image"/>
706
+ <figcaption style="text-align: center;">generated image</figcaption>
707
+ </figure>
708
+ </div>
709
+
710
+ ### Style and layout control
711
+
712
+ For style and layout control, combine IP-Adapter with [InstantStyle](https://huggingface.co/papers/2404.02733). InstantStyle separates *style* (color, texture, overall feel) and *content* from each other. It only applies the style in style-specific blocks of the model to prevent it from distorting other areas of an image. This generates images with stronger and more consistent styles and better control over the layout.
713
+
714
+ The IP-Adapter is only activated for specific parts of the model. Use the [`~loaders.IPAdapterMixin.set_ip_adapter_scale`] method to scale the IP-Adapter's influence in different layers. The example below activates the IP-Adapter in the second layer of the model's down `block_2` and up `block_0`. Down `block_2` is where the IP-Adapter injects layout information, and up `block_0` is where it injects style.
715
+
716
+ ```py
717
+ import torch
718
+ from diffusers import AutoPipelineForText2Image
719
+ from diffusers.utils import load_image
720
+
721
+ pipeline = AutoPipelineForText2Image.from_pretrained(
722
+ "stabilityai/stable-diffusion-xl-base-1.0",
723
+ torch_dtype=torch.float16
724
+ ).to("cuda")
725
+ pipeline.load_ip_adapter(
726
+ "h94/IP-Adapter",
727
+ subfolder="sdxl_models",
728
+ weight_name="ip-adapter_sdxl.bin"
729
+ )
730
+
731
+ scale = {
732
+ "down": {"block_2": [0.0, 1.0]},
733
+ "up": {"block_0": [0.0, 1.0, 0.0]},
734
+ }
735
+ pipeline.set_ip_adapter_scale(scale)
736
+ ```
737
+
738
+ Load the style image and generate an image.
739
+
740
+ ```py
741
+ style_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg")
742
+
743
+ pipeline(
744
+ prompt="a cat, masterpiece, best quality, high quality",
745
+ ip_adapter_image=style_image,
746
+ negative_prompt="text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
747
+ guidance_scale=5,
748
+ ).images[0]
749
+ ```
750
+
751
+ <div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
752
+ <figure>
753
+ <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg" width="400" alt="Style image"/>
754
+ <figcaption style="text-align: center;">style image</figcaption>
755
+ </figure>
756
+ <figure>
757
+ <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/datasets/cat_style_layout.png" width="400" alt="Generated image"/>
758
+ <figcaption style="text-align: center;">generated image</figcaption>
759
+ </figure>
760
+ </div>
761
+
762
+ You can also insert the IP-Adapter in all the model layers, but this tends to generate images that focus too heavily on the image prompt and reduces the diversity of generated images. To avoid this, only activate the IP-Adapter in up `block_0`, the style layer.
763
+
764
+ > [!TIP]
765
+ > You don't need to specify all the layers in the `scale` dictionary. Layers not included are set to 0, which means the IP-Adapter is disabled.
766
+
767
+ ```py
768
+ scale = {
769
+ "up": {"block_0": [0.0, 1.0, 0.0]},
770
+ }
771
+ pipeline.set_ip_adapter_scale(scale)
772
+
773
+ pipeline(
774
+ prompt="a cat, masterpiece, best quality, high quality",
775
+ ip_adapter_image=style_image,
776
+ negative_prompt="text, watermark, lowres, low quality, worst quality, deformed, glitch, low contrast, noisy, saturation, blurry",
777
+ guidance_scale=5,
778
+ ).images[0]
779
+ ```
780
+
781
+ <div style="display: flex; gap: 10px; justify-content: space-around; align-items: flex-end;">
782
+ <figure>
783
+ <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/datasets/cat_style_only.png" width="400" alt="Generated image (style only)"/>
784
+ <figcaption style="text-align: center;">style-layer generated image</figcaption>
785
+ </figure>
786
+ <figure>
787
+ <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/datasets/cat_ip_adapter.png" width="400" alt="Generated image (IP-Adapter only)"/>
788
+ <figcaption style="text-align: center;">all layers generated image</figcaption>
789
+ </figure>
790
+ </div>
diffusers/docs/source/en/using-diffusers/loading.md ADDED
@@ -0,0 +1,583 @@
1
+ <!--Copyright 2025 The HuggingFace Team. All rights reserved.
+
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations under the License.
+ -->
+
+ # Load pipelines
+
+ [[open-in-colab]]
+
+ Diffusion systems consist of multiple components like parameterized models and schedulers that interact in complex ways. That is why we designed the [`DiffusionPipeline`] to wrap the complexity of the entire diffusion system into an easy-to-use API. At the same time, the [`DiffusionPipeline`] is entirely customizable so you can modify each component to build a diffusion system for your use case.
+
+ This guide will show you how to load:
+
+ - pipelines from the Hub and locally
+ - different components into a pipeline
+ - multiple pipelines without increasing memory usage
+ - checkpoint variants such as different floating point types or non-exponential moving average (non-EMA) weights
+
+ ## Load a pipeline
+
+ > [!TIP]
+ > Skip to the [DiffusionPipeline explained](#diffusionpipeline-explained) section if you're interested in how the [`DiffusionPipeline`] class works.
+
+ There are two ways to load a pipeline for a task:
+
+ 1. Load the generic [`DiffusionPipeline`] class and allow it to automatically detect the correct pipeline class from the checkpoint.
+ 2. Load a specific pipeline class for a specific task.
+
+ <hfoptions id="pipelines">
+ <hfoption id="generic pipeline">
+
+ The [`DiffusionPipeline`] class is a simple and generic way to load the latest trending diffusion model from the [Hub](https://huggingface.co/models?library=diffusers&sort=trending). It uses the [`~DiffusionPipeline.from_pretrained`] method to automatically detect the correct pipeline class for a task from the checkpoint, downloads and caches all the required configuration and weight files, and returns a pipeline ready for inference.
+
+ ```python
+ from diffusers import DiffusionPipeline
+
+ pipeline = DiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", use_safetensors=True)
+ ```
+
+ This same checkpoint can also be used for an image-to-image task. The [`DiffusionPipeline`] class can handle any task as long as you provide the appropriate inputs. For example, for an image-to-image task, you need to pass an initial image to the pipeline.
+
+ ```py
+ from diffusers import DiffusionPipeline
+ from diffusers.utils import load_image
+
+ pipeline = DiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", use_safetensors=True)
+
+ init_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/img2img-init.png")
+ prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
+ image = pipeline(prompt, image=init_image).images[0]
+ ```
+
+ </hfoption>
+ <hfoption id="specific pipeline">
+
+ Checkpoints can be loaded by their specific pipeline class if you already know it. For example, to load a Stable Diffusion model, use the [`StableDiffusionPipeline`] class.
+
+ ```python
+ from diffusers import StableDiffusionPipeline
+
+ pipeline = StableDiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", use_safetensors=True)
+ ```
+
+ This same checkpoint may also be used for another task like image-to-image. To differentiate what task you want to use the checkpoint for, you have to use the corresponding task-specific pipeline class. For example, to use the same checkpoint for image-to-image, use the [`StableDiffusionImg2ImgPipeline`] class.
+
+ ```py
+ from diffusers import StableDiffusionImg2ImgPipeline
+
+ pipeline = StableDiffusionImg2ImgPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", use_safetensors=True)
+ ```
+
+ </hfoption>
+ </hfoptions>
+
+ Use the Space below to gauge a pipeline's memory requirements before you download and load it, to see if it runs on your hardware.
+
+ <div class="block dark:hidden">
+     <iframe
+         src="https://diffusers-compute-pipeline-size.hf.space?__theme=light"
+         width="850"
+         height="1600"
+     ></iframe>
+ </div>
+ <div class="hidden dark:block">
+     <iframe
+         src="https://diffusers-compute-pipeline-size.hf.space?__theme=dark"
+         width="850"
+         height="1600"
+     ></iframe>
+ </div>
+
+ ### Specify component-specific data types
+
+ You can customize the data types for individual sub-models by passing a dictionary to the `torch_dtype` parameter. This allows you to load different components of a pipeline in different floating point precisions. For instance, to load the transformer with `torch.bfloat16` and all other components with `torch.float16`, pass a dictionary mapping component names to dtypes:
+
+ ```python
+ import torch
+ from diffusers import HunyuanVideoPipeline
+
+ pipe = HunyuanVideoPipeline.from_pretrained(
+     "hunyuanvideo-community/HunyuanVideo",
+     torch_dtype={"transformer": torch.bfloat16, "default": torch.float16},
+ )
+ print(pipe.transformer.dtype, pipe.vae.dtype)  # (torch.bfloat16, torch.float16)
+ ```
+
+ If a component is not explicitly specified in the dictionary and no `default` is provided, it will be loaded with `torch.float32`.
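The fallback order (explicit entry, then `default`, then `torch.float32`) can be sketched with a small hypothetical helper. This illustrates the resolution rule only; it is not the library's actual implementation:

```python
import torch

def resolve_dtype(component, dtype_map):
    # Explicit entry wins, then the "default" entry, then torch.float32.
    if component in dtype_map:
        return dtype_map[component]
    return dtype_map.get("default", torch.float32)

dtype_map = {"transformer": torch.bfloat16, "default": torch.float16}
print(resolve_dtype("transformer", dtype_map))  # torch.bfloat16
print(resolve_dtype("vae", dtype_map))          # torch.float16
print(resolve_dtype("vae", {"transformer": torch.bfloat16}))  # torch.float32
```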
114
+
115
+ ### Local pipeline
116
+
117
+ To load a pipeline locally, use [git-lfs](https://git-lfs.github.com/) to manually download a checkpoint to your local disk.
118
+
119
+ ```bash
120
+ git-lfs install
121
+ git clone https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5
122
+ ```
123
+
124
+ This creates a local folder, ./stable-diffusion-v1-5, on your disk and you should pass its path to [`~DiffusionPipeline.from_pretrained`].
125
+
126
+ ```python
127
+ from diffusers import DiffusionPipeline
128
+
129
+ stable_diffusion = DiffusionPipeline.from_pretrained("./stable-diffusion-v1-5", use_safetensors=True)
130
+ ```
131
+
132
+ The [`~DiffusionPipeline.from_pretrained`] method won't download files from the Hub when it detects a local path, but this also means it won't download and cache the latest changes to a checkpoint.
133
+
134
+ ## Customize a pipeline
135
+
136
+ You can customize a pipeline by loading different components into it. This is important because you can:
137
+
138
+ - change to a scheduler with faster generation speed or higher generation quality depending on your needs (call the `scheduler.compatibles` method on your pipeline to see compatible schedulers)
139
+ - change a default pipeline component to a newer and better performing one
140
+
141
+ For example, let's customize the default [stabilityai/stable-diffusion-xl-base-1.0](https://hf.co/stabilityai/stable-diffusion-xl-base-1.0) checkpoint with:
142
+
143
+ - The [`HeunDiscreteScheduler`] to generate higher quality images at the expense of slower generation speed. You must pass the `subfolder="scheduler"` parameter in [`~HeunDiscreteScheduler.from_pretrained`] to load the scheduler configuration into the correct [subfolder](https://hf.co/stabilityai/stable-diffusion-xl-base-1.0/tree/main/scheduler) of the pipeline repository.
144
+ - A more stable VAE that runs in fp16.
145
+
146
+ ```py
147
+ from diffusers import StableDiffusionXLPipeline, HeunDiscreteScheduler, AutoencoderKL
148
+ import torch
149
+
150
+ scheduler = HeunDiscreteScheduler.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", subfolder="scheduler")
151
+ vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16, use_safetensors=True)
152
+ ```
153
+
154
+ Now pass the new scheduler and VAE to the [`StableDiffusionXLPipeline`].
155
+
156
+ ```py
157
+ pipeline = StableDiffusionXLPipeline.from_pretrained(
158
+ "stabilityai/stable-diffusion-xl-base-1.0",
159
+ scheduler=scheduler,
160
+ vae=vae,
161
+ torch_dtype=torch.float16,
162
+ variant="fp16",
163
+ use_safetensors=True
164
+ ).to("cuda")
165
+ ```
166
+
167
+ ## Reuse a pipeline
168
+
169
+ When you load multiple pipelines that share the same model components, it makes sense to reuse the shared components instead of reloading everything into memory again, especially if your hardware is memory-constrained. For example:
170
+
171
+ 1. You generated an image with the [`StableDiffusionPipeline`] but you want to improve its quality with the [`StableDiffusionSAGPipeline`]. Both of these pipelines share the same pretrained model, so it'd be a waste of memory to load the same model twice.
172
+ 2. You want to add a model component, like a [`MotionAdapter`](../api/pipelines/animatediff#animatediffpipeline), to [`AnimateDiffPipeline`] which was instantiated from an existing [`StableDiffusionPipeline`]. Again, both pipelines share the same pretrained model, so it'd be a waste of memory to load an entirely new pipeline again.
173
+
174
+ With the [`DiffusionPipeline.from_pipe`] API, you can switch between multiple pipelines to take advantage of their different features without increasing memory-usage. It is similar to turning on and off a feature in your pipeline.
175
+
176
+ > [!TIP]
177
+ > To switch between tasks (rather than features), use the [`~DiffusionPipeline.from_pipe`] method with the [AutoPipeline](../api/pipelines/auto_pipeline) class, which automatically identifies the pipeline class based on the task (learn more in the [AutoPipeline](../tutorials/autopipeline) tutorial).
178
+
179
+ Let's start with a [`StableDiffusionPipeline`] and then reuse the loaded model components to create a [`StableDiffusionSAGPipeline`] to increase generation quality. You'll use the [`StableDiffusionPipeline`] with an [IP-Adapter](./ip_adapter) to generate a bear eating pizza.
180
+
181
+ ```python
182
+ from diffusers import DiffusionPipeline, StableDiffusionSAGPipeline
183
+ import torch
184
+ import gc
185
+ from diffusers.utils import load_image
186
+ from accelerate.utils import compute_module_sizes
187
+
188
+ image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/load_neg_embed.png")
189
+
190
+ pipe_sd = DiffusionPipeline.from_pretrained("SG161222/Realistic_Vision_V6.0_B1_noVAE", torch_dtype=torch.float16)
191
+ pipe_sd.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
192
+ pipe_sd.set_ip_adapter_scale(0.6)
193
+ pipe_sd.to("cuda")
194
+
195
+ generator = torch.Generator(device="cpu").manual_seed(33)
196
+ out_sd = pipe_sd(
197
+ prompt="bear eats pizza",
198
+ negative_prompt="wrong white balance, dark, sketches,worst quality,low quality",
199
+ ip_adapter_image=image,
200
+ num_inference_steps=50,
201
+ generator=generator,
202
+ ).images[0]
203
+ out_sd
204
+ ```
205
+
206
+ <div class="flex justify-center">
207
+ <img class="rounded-xl" src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/from_pipe_out_sd_0.png"/>
208
+ </div>
209
+
210
+ For reference, you can check how much memory this process consumed.
211
+
212
+ ```python
213
+ def bytes_to_giga_bytes(bytes):
214
+ return bytes / 1024 / 1024 / 1024
215
+ print(f"Max memory allocated: {bytes_to_giga_bytes(torch.cuda.max_memory_allocated())} GB")
216
+ "Max memory allocated: 4.406213283538818 GB"
217
+ ```
218
+
219
+ Now, reuse the same pipeline components from [`StableDiffusionPipeline`] in [`StableDiffusionSAGPipeline`] with the [`~DiffusionPipeline.from_pipe`] method.
220
+
221
+ > [!WARNING]
222
+ > Some pipeline methods may not function properly on new pipelines created with [`~DiffusionPipeline.from_pipe`]. For instance, the [`~DiffusionPipeline.enable_model_cpu_offload`] method installs hooks on the model components based on a unique offloading sequence for each pipeline. If the models are executed in a different order in the new pipeline, the CPU offloading may not work correctly.
223
+ >
224
+ > To ensure everything works as expected, we recommend re-applying a pipeline method on a new pipeline created with [`~DiffusionPipeline.from_pipe`].
225
+
226
+ ```python
227
+ pipe_sag = StableDiffusionSAGPipeline.from_pipe(
228
+ pipe_sd
229
+ )
230
+
231
+ generator = torch.Generator(device="cpu").manual_seed(33)
232
+ out_sag = pipe_sag(
233
+ prompt="bear eats pizza",
234
+ negative_prompt="wrong white balance, dark, sketches,worst quality,low quality",
235
+ ip_adapter_image=image,
236
+ num_inference_steps=50,
237
+ generator=generator,
238
+ guidance_scale=1.0,
239
+ sag_scale=0.75
240
+ ).images[0]
241
+ out_sag
242
+ ```
243
+
244
+ <div class="flex justify-center">
245
+ <img class="rounded-xl" src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/from_pipe_out_sag_1.png"/>
246
+ </div>
247
+
248
+ If you check the memory usage, you'll see it remains the same as before because [`StableDiffusionPipeline`] and [`StableDiffusionSAGPipeline`] are sharing the same pipeline components. This allows you to use them interchangeably without any additional memory overhead.
249
+
250
+ ```py
251
+ print(f"Max memory allocated: {bytes_to_giga_bytes(torch.cuda.max_memory_allocated())} GB")
252
+ "Max memory allocated: 4.406213283538818 GB"
253
+ ```
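Memory stays flat because [`~DiffusionPipeline.from_pipe`] hands the *same* component objects to the new pipeline instead of copying them. The sharing semantics can be sketched with hypothetical stand-in classes (not the actual Diffusers internals):

```python
class ToyPipeline:
    """Hypothetical stand-in for a pipeline holding model components."""

    def __init__(self, unet, vae):
        self.unet = unet
        self.vae = vae

    @classmethod
    def from_pipe(cls, other, **extra_components):
        # Reuse the existing component objects; nothing is duplicated.
        return cls(unet=other.unet, vae=other.vae, **extra_components)

class ToySAGPipeline(ToyPipeline):
    pass

pipe_a = ToyPipeline(unet=object(), vae=object())
pipe_b = ToySAGPipeline.from_pipe(pipe_a)

print(pipe_b.unet is pipe_a.unet)  # True: both pipelines point at one UNet
```

Because both objects reference one set of weights, mutating a shared component (for example, unloading an IP-Adapter) is visible from every pipeline created this way.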
+
+ Let's animate the image with the [`AnimateDiffPipeline`] and also add a [`MotionAdapter`] module to the pipeline. For the [`AnimateDiffPipeline`], you need to unload the IP-Adapter first and reload it *after* you've created your new pipeline (this only applies to the [`AnimateDiffPipeline`]).
+
+ ```py
+ from diffusers import AnimateDiffPipeline, MotionAdapter, DDIMScheduler
+ from diffusers.utils import export_to_gif
+
+ pipe_sag.unload_ip_adapter()
+ adapter = MotionAdapter.from_pretrained("guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16)
+
+ pipe_animate = AnimateDiffPipeline.from_pipe(pipe_sd, motion_adapter=adapter)
+ pipe_animate.scheduler = DDIMScheduler.from_config(pipe_animate.scheduler.config, beta_schedule="linear")
+ # load the IP-Adapter and LoRA weights again
+ pipe_animate.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
+ pipe_animate.load_lora_weights("guoyww/animatediff-motion-lora-zoom-out", adapter_name="zoom-out")
+ pipe_animate.to("cuda")
+
+ generator = torch.Generator(device="cpu").manual_seed(33)
+ pipe_animate.set_adapters("zoom-out", adapter_weights=0.75)
+ out = pipe_animate(
+     prompt="bear eats pizza",
+     num_frames=16,
+     num_inference_steps=50,
+     ip_adapter_image=image,
+     generator=generator,
+ ).frames[0]
+ export_to_gif(out, "out_animate.gif")
+ ```
+
+ <div class="flex justify-center">
+     <img class="rounded-xl" src="https://huggingface.co/datasets/YiYiXu/testing-images/resolve/main/from_pipe_out_animate_3.gif"/>
+ </div>
+
+ The [`AnimateDiffPipeline`] is more memory-intensive and consumes 15GB of memory (see the [Memory usage of from_pipe](#memory-usage-of-from_pipe) section to learn what this means for your memory usage).
+
+ ```py
+ print(f"Max memory allocated: {bytes_to_giga_bytes(torch.cuda.max_memory_allocated())} GB")
+ "Max memory allocated: 15.178664207458496 GB"
+ ```
+
+ ### Modify from_pipe components
+
+ Pipelines loaded with [`~DiffusionPipeline.from_pipe`] can be customized with different model components or methods. However, whenever you modify the *state* of the model components, it affects all the other pipelines that share the same components. For example, if you call [`~diffusers.loaders.IPAdapterMixin.unload_ip_adapter`] on the [`StableDiffusionSAGPipeline`], you won't be able to use IP-Adapter with the [`StableDiffusionPipeline`] because it's been removed from the shared components.
+
+ ```py
+ pipe_sag.unload_ip_adapter()
+
+ generator = torch.Generator(device="cpu").manual_seed(33)
+ out_sd = pipe_sd(
+     prompt="bear eats pizza",
+     negative_prompt="wrong white balance, dark, sketches,worst quality,low quality",
+     ip_adapter_image=image,
+     num_inference_steps=50,
+     generator=generator,
+ ).images[0]
+ "AttributeError: 'NoneType' object has no attribute 'image_projection_layers'"
+ ```
+
+ ### Memory usage of from_pipe
+
+ The memory requirement of loading multiple pipelines with [`~DiffusionPipeline.from_pipe`] is determined by the pipeline with the highest memory usage, regardless of the number of pipelines you create.
+
+ | Pipeline | Memory usage (GB) |
+ |---|---|
+ | StableDiffusionPipeline | 4.400 |
+ | StableDiffusionSAGPipeline | 4.400 |
+ | AnimateDiffPipeline | 15.178 |
+
+ The [`AnimateDiffPipeline`] has the highest memory requirement, so the *total memory usage* is based only on the [`AnimateDiffPipeline`]. Your memory usage will not increase if you create additional pipelines, as long as their memory requirements don't exceed that of the [`AnimateDiffPipeline`]. Each pipeline can be used interchangeably without any additional memory overhead.
+
+ ## Safety checker
+
+ Diffusers implements a [safety checker](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/safety_checker.py) for Stable Diffusion models, which can generate harmful content. The safety checker screens the generated output against known hardcoded not-safe-for-work (NSFW) content. If for whatever reason you'd like to disable the safety checker, pass `safety_checker=None` to the [`~DiffusionPipeline.from_pretrained`] method.
+
+ ```python
+ from diffusers import DiffusionPipeline
+
+ pipeline = DiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", safety_checker=None, use_safetensors=True)
+ """
+ You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide by the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend keeping the safety filter enabled in all public-facing circumstances, disabling it only for use cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
+ """
+ ```
+
+ ## Checkpoint variants
+
+ A checkpoint variant is usually a checkpoint whose weights are:
+
+ - Stored in a different floating point type, such as [torch.float16](https://pytorch.org/docs/stable/tensors.html#data-types), because it only requires half the bandwidth and storage to download. You can't use this variant if you're continuing training or using a CPU.
+ - Non-exponential moving average (non-EMA) weights, which shouldn't be used for inference. You should use this variant to continue finetuning a model.
+
+ > [!TIP]
+ > When the checkpoints have identical model structures, but they were trained on different datasets and with a different training setup, they should be stored in separate repositories. For example, [stabilityai/stable-diffusion-2](https://hf.co/stabilityai/stable-diffusion-2) and [stabilityai/stable-diffusion-2-1](https://hf.co/stabilityai/stable-diffusion-2-1) are stored in separate repositories.
+
+ Otherwise, a variant is **identical** to the original checkpoint. They have exactly the same serialization format (like [safetensors](./using_safetensors)), model structure, and their weights have identical tensor shapes.
+
+ | **checkpoint type** | **weight name** | **argument for loading weights** |
+ |---------------------|---------------------------------------------|----------------------------------|
+ | original | diffusion_pytorch_model.safetensors | |
+ | floating point | diffusion_pytorch_model.fp16.safetensors | `variant`, `torch_dtype` |
+ | non-EMA | diffusion_pytorch_model.non_ema.safetensors | `variant` |
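The table's naming convention — the variant name is inserted before the file extension — can be sketched with a hypothetical helper (Diffusers builds these paths internally; this function is for illustration only):

```python
def variant_filename(variant=None, base="diffusion_pytorch_model", ext="safetensors"):
    """Build a weight filename following the variant naming convention above."""
    if variant is None:
        return f"{base}.{ext}"
    return f"{base}.{variant}.{ext}"

print(variant_filename())           # diffusion_pytorch_model.safetensors
print(variant_filename("fp16"))     # diffusion_pytorch_model.fp16.safetensors
print(variant_filename("non_ema"))  # diffusion_pytorch_model.non_ema.safetensors
```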
+
+ There are two important arguments for loading variants:
+
+ - `torch_dtype` specifies the floating point precision of the loaded checkpoint. For example, if you want to save bandwidth by loading a fp16 variant, you should set `variant="fp16"` and `torch_dtype=torch.float16` to *convert the weights* to fp16. Otherwise, the fp16 weights are converted to the default fp32 precision.
+
+   If you only set `torch_dtype=torch.float16`, the default fp32 weights are downloaded first and then converted to fp16.
+
+ - `variant` specifies which files should be loaded from the repository. For example, if you want to load a non-EMA variant of a UNet from [stable-diffusion-v1-5/stable-diffusion-v1-5](https://hf.co/stable-diffusion-v1-5/stable-diffusion-v1-5/tree/main/unet), set `variant="non_ema"` to download the `non_ema` file.
+
+ <hfoptions id="variants">
+ <hfoption id="fp16">
+
+ ```py
+ from diffusers import DiffusionPipeline
+ import torch
+
+ pipeline = DiffusionPipeline.from_pretrained(
+     "stable-diffusion-v1-5/stable-diffusion-v1-5", variant="fp16", torch_dtype=torch.float16, use_safetensors=True
+ )
+ ```
+
+ </hfoption>
+ <hfoption id="non-EMA">
+
+ ```py
+ pipeline = DiffusionPipeline.from_pretrained(
+     "stable-diffusion-v1-5/stable-diffusion-v1-5", variant="non_ema", use_safetensors=True
+ )
+ ```
+
+ </hfoption>
+ </hfoptions>
+
+ Use the `variant` parameter in the [`DiffusionPipeline.save_pretrained`] method to save a checkpoint as a different floating point type or as a non-EMA variant. You should try to save a variant to the same folder as the original checkpoint, so you have the option of loading both from the same folder.
+
+ <hfoptions id="save">
+ <hfoption id="fp16">
+
+ ```python
+ from diffusers import DiffusionPipeline
+
+ pipeline.save_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", variant="fp16")
+ ```
+
+ </hfoption>
+ <hfoption id="non_ema">
+
+ ```py
+ pipeline.save_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", variant="non_ema")
+ ```
+
+ </hfoption>
+ </hfoptions>
+
+ If you don't save the variant to an existing folder, you must specify the `variant` argument; otherwise, it'll throw an `Exception` because it can't find the original checkpoint.
+
+ ```python
+ # 👎 this won't work
+ pipeline = DiffusionPipeline.from_pretrained(
+     "./stable-diffusion-v1-5", torch_dtype=torch.float16, use_safetensors=True
+ )
+ # 👍 this works
+ pipeline = DiffusionPipeline.from_pretrained(
+     "./stable-diffusion-v1-5", variant="fp16", torch_dtype=torch.float16, use_safetensors=True
+ )
+ ```
+
+ ## DiffusionPipeline explained
+
+ As a class method, [`DiffusionPipeline.from_pretrained`] is responsible for two things:
+
+ - Download the latest version of the folder structure required for inference and cache it. If the latest folder structure is available in the local cache, [`DiffusionPipeline.from_pretrained`] reuses the cache and won't redownload the files.
+ - Load the cached weights into the correct pipeline [class](../api/pipelines/overview#diffusers-summary) - retrieved from the `model_index.json` file - and return an instance of it.
+
+ The pipelines' underlying folder structure corresponds directly with their class instances. For example, the [`StableDiffusionPipeline`] corresponds to the folder structure in [`stable-diffusion-v1-5/stable-diffusion-v1-5`](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5).
+
+ ```python
+ from diffusers import DiffusionPipeline
+
+ repo_id = "stable-diffusion-v1-5/stable-diffusion-v1-5"
+ pipeline = DiffusionPipeline.from_pretrained(repo_id, use_safetensors=True)
+ print(pipeline)
+ ```
+
+ You'll see `pipeline` is an instance of [`StableDiffusionPipeline`], which consists of seven components:
+
+ - `"feature_extractor"`: a [`~transformers.CLIPImageProcessor`] from 🤗 Transformers.
+ - `"safety_checker"`: a [component](https://github.com/huggingface/diffusers/blob/e55687e1e15407f60f32242027b7bb8170e58266/src/diffusers/pipelines/stable_diffusion/safety_checker.py#L32) for screening against harmful content.
+ - `"scheduler"`: an instance of [`PNDMScheduler`].
+ - `"text_encoder"`: a [`~transformers.CLIPTextModel`] from 🤗 Transformers.
+ - `"tokenizer"`: a [`~transformers.CLIPTokenizer`] from 🤗 Transformers.
+ - `"unet"`: an instance of [`UNet2DConditionModel`].
+ - `"vae"`: an instance of [`AutoencoderKL`].
+
+ ```json
+ StableDiffusionPipeline {
+   "feature_extractor": [
+     "transformers",
+     "CLIPImageProcessor"
+   ],
+   "safety_checker": [
+     "stable_diffusion",
+     "StableDiffusionSafetyChecker"
+   ],
+   "scheduler": [
+     "diffusers",
+     "PNDMScheduler"
+   ],
+   "text_encoder": [
+     "transformers",
+     "CLIPTextModel"
+   ],
+   "tokenizer": [
+     "transformers",
+     "CLIPTokenizer"
+   ],
+   "unet": [
+     "diffusers",
+     "UNet2DConditionModel"
+   ],
+   "vae": [
+     "diffusers",
+     "AutoencoderKL"
+   ]
+ }
+ ```
+
+ Compare the components of the pipeline instance to the [`stable-diffusion-v1-5/stable-diffusion-v1-5`](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5/tree/main) folder structure, and you'll see there is a separate folder for each of the components in the repository:
+
+ ```
+ .
+ ├── feature_extractor
+ │   └── preprocessor_config.json
+ ├── model_index.json
+ ├── safety_checker
+ │   ├── config.json
+ │   ├── model.fp16.safetensors
+ │   ├── model.safetensors
+ │   ├── pytorch_model.bin
+ │   └── pytorch_model.fp16.bin
+ ├── scheduler
+ │   └── scheduler_config.json
+ ├── text_encoder
+ │   ├── config.json
+ │   ├── model.fp16.safetensors
+ │   ├── model.safetensors
+ │   ├── pytorch_model.bin
+ │   └── pytorch_model.fp16.bin
+ ├── tokenizer
+ │   ├── merges.txt
+ │   ├── special_tokens_map.json
+ │   ├── tokenizer_config.json
+ │   └── vocab.json
+ ├── unet
+ │   ├── config.json
+ │   ├── diffusion_pytorch_model.bin
+ │   ├── diffusion_pytorch_model.fp16.bin
+ │   ├── diffusion_pytorch_model.fp16.safetensors
+ │   ├── diffusion_pytorch_model.non_ema.bin
+ │   ├── diffusion_pytorch_model.non_ema.safetensors
+ │   └── diffusion_pytorch_model.safetensors
+ └── vae
+     ├── config.json
+     ├── diffusion_pytorch_model.bin
+     ├── diffusion_pytorch_model.fp16.bin
+     ├── diffusion_pytorch_model.fp16.safetensors
+     └── diffusion_pytorch_model.safetensors
+ ```
+
+ You can access each of the components of the pipeline as an attribute to view its configuration:
+
+ ```py
+ pipeline.tokenizer
+ CLIPTokenizer(
+     name_or_path="/root/.cache/huggingface/hub/models--runwayml--stable-diffusion-v1-5/snapshots/39593d5650112b4cc580433f6b0435385882d819/tokenizer",
+     vocab_size=49408,
+     model_max_length=77,
+     is_fast=False,
+     padding_side="right",
+     truncation_side="right",
+     special_tokens={
+         "bos_token": AddedToken("<|startoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True),
+         "eos_token": AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True),
+         "unk_token": AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True),
+         "pad_token": "<|endoftext|>",
+     },
+     clean_up_tokenization_spaces=True
+ )
+ ```
+
+ Every pipeline expects a [`model_index.json`](https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5/blob/main/model_index.json) file that tells the [`DiffusionPipeline`]:
+
+ - which pipeline class to load from `_class_name`
+ - which version of 🧨 Diffusers was used to create the model in `_diffusers_version`
+ - what components from which library are stored in the subfolders (`name` corresponds to the component and subfolder name, `library` corresponds to the name of the library to load the class from, and `class` corresponds to the class name)
+
+ ```json
+ {
+   "_class_name": "StableDiffusionPipeline",
+   "_diffusers_version": "0.6.0",
+   "feature_extractor": [
+     "transformers",
+     "CLIPImageProcessor"
+   ],
+   "safety_checker": [
+     "stable_diffusion",
+     "StableDiffusionSafetyChecker"
+   ],
+   "scheduler": [
+     "diffusers",
+     "PNDMScheduler"
+   ],
+   "text_encoder": [
+     "transformers",
+     "CLIPTextModel"
+   ],
+   "tokenizer": [
+     "transformers",
+     "CLIPTokenizer"
+   ],
+   "unet": [
+     "diffusers",
+     "UNet2DConditionModel"
+   ],
+   "vae": [
+     "diffusers",
+     "AutoencoderKL"
+   ]
+ }
+ ```
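To make the mapping concrete, here is a short sketch that parses a trimmed-down `model_index.json` (inlined below for illustration) and separates the pipeline metadata from the component entries:

```python
import json

# Trimmed-down model_index.json, inlined for illustration
raw = """
{
  "_class_name": "StableDiffusionPipeline",
  "_diffusers_version": "0.6.0",
  "unet": ["diffusers", "UNet2DConditionModel"],
  "vae": ["diffusers", "AutoencoderKL"]
}
"""
model_index = json.loads(raw)

pipeline_class = model_index["_class_name"]
# Keys without a leading underscore name a component subfolder;
# the value is [library, class name].
components = {k: v for k, v in model_index.items() if not k.startswith("_")}

print(pipeline_class)         # StableDiffusionPipeline
print(sorted(components))     # ['unet', 'vae']
print(components["unet"][1])  # UNet2DConditionModel
```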
diffusers/docs/source/en/using-diffusers/other-formats.md ADDED
@@ -0,0 +1,512 @@
+ <!--Copyright 2025 The HuggingFace Team. All rights reserved.
2
+
3
+ Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4
+ the License. You may obtain a copy of the License at
5
+
6
+ http://www.apache.org/licenses/LICENSE-2.0
7
+
8
+ Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9
+ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10
+ specific language governing permissions and limitations under the License.
11
+ -->
12
+
13
+ # Model files and layouts
14
+
15
+ [[open-in-colab]]
16
+
17
+ Diffusion models are saved in various file types and organized in different layouts. Diffusers stores model weights as safetensors files in *Diffusers-multifolder* layout and it also supports loading files (like safetensors and ckpt files) from a *single-file* layout which is commonly used in the diffusion ecosystem.
18
+
19
+ Each layout has its own benefits and use cases, and this guide will show you how to load the different files and layouts, and how to convert them.
20
+
21
+ ## Files
22
+
23
+ PyTorch model weights are typically saved with Python's [pickle](https://docs.python.org/3/library/pickle.html) utility as ckpt or bin files. However, pickle is not secure and pickled files may contain malicious code that can be executed. This vulnerability is a serious concern given the popularity of model sharing. To address this security issue, the [Safetensors](https://hf.co/docs/safetensors) library was developed as a secure alternative to pickle, which saves models as safetensors files.
24
+
25
+ ### safetensors
26
+
27
+ > [!TIP]
28
+ > Learn more about the design decisions and why safetensors files are preferred for saving and loading model weights in the [Safetensors audited as really safe and becoming the default](https://blog.eleuther.ai/safetensors-security-audit/) blog post.
29
+
30
+ [Safetensors](https://hf.co/docs/safetensors) is a safe and fast file format for securely storing and loading tensors. Safetensors restricts the header size to limit certain types of attacks, supports lazy loading (useful for distributed setups), and has generally faster loading speeds.
31
+
32
+ Make sure you have the [Safetensors](https://hf.co/docs/safetensors) library installed.
33
+
34
+ ```py
35
+ !pip install safetensors
36
+ ```
37
+
38
+ Safetensors stores weights in a safetensors file. Diffusers loads safetensors files by default if they're available and the Safetensors library is installed. There are two ways safetensors files can be organized:
39
+
40
+ 1. Diffusers-multifolder layout: there may be several separate safetensors files, one for each pipeline component (text encoder, UNet, VAE), organized in subfolders (check out the [stable-diffusion-v1-5/stable-diffusion-v1-5](https://hf.co/stable-diffusion-v1-5/stable-diffusion-v1-5/tree/main) repository as an example)
41
+ 2. single-file layout: all the model weights may be saved in a single file (check out the [WarriorMama777/OrangeMixs](https://hf.co/WarriorMama777/OrangeMixs/tree/main/Models/AbyssOrangeMix) repository as an example)
42
+
43
+ <hfoptions id="safetensors">
44
+ <hfoption id="multifolder">
45
+
46
+ Use the [`~DiffusionPipeline.from_pretrained`] method to load a model with safetensors files stored in multiple folders.
47
+
48
+ ```py
49
+ from diffusers import DiffusionPipeline
50
+
51
+ pipeline = DiffusionPipeline.from_pretrained(
52
+ "stable-diffusion-v1-5/stable-diffusion-v1-5",
53
+ use_safetensors=True
54
+ )
55
+ ```
56
+
57
+ </hfoption>
58
+ <hfoption id="single file">
59
+
60
+ Use the [`~loaders.FromSingleFileMixin.from_single_file`] method to load a model with all the weights stored in a single safetensors file.
61
+
62
+ ```py
63
+ from diffusers import StableDiffusionPipeline
64
+
65
+ pipeline = StableDiffusionPipeline.from_single_file(
66
+ "https://huggingface.co/WarriorMama777/OrangeMixs/blob/main/Models/AbyssOrangeMix/AbyssOrangeMix.safetensors"
67
+ )
68
+ ```
69
+
70
+ </hfoption>
71
+ </hfoptions>
72
+
73
+ #### LoRAs
74
+
75
+ [LoRAs](../tutorials/using_peft_for_inference) are lightweight checkpoints fine-tuned to generate images or video in a specific style. If you are using a checkpoint trained with a Diffusers training script, the LoRA configuration is automatically saved as metadata in a safetensors file. When the safetensors file is loaded, the metadata is parsed to correctly configure the LoRA and avoid missing or incorrect LoRA configurations.
76
+
77
+ The easiest way to inspect the metadata, if available, is by clicking on the Safetensors logo next to the weights.
78
+
79
+ <div class="flex justify-center">
80
+ <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/safetensors_lora.png"/>
81
+ </div>
82
+
83
+ For LoRAs that aren't trained with Diffusers, you can still save metadata with the `transformer_lora_adapter_metadata` and `text_encoder_lora_adapter_metadata` arguments in [`~loaders.FluxLoraLoaderMixin.save_lora_weights`] as long as it is a safetensors file.
84
+
85
+ ```py
86
+ import torch
87
+ from diffusers import FluxPipeline
88
+
89
+ pipeline = FluxPipeline.from_pretrained(
90
+ "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
91
+ ).to("cuda")
92
+ pipeline.load_lora_weights("linoyts/yarn_art_Flux_LoRA")
93
+ pipeline.save_lora_weights(
+ "yarn_art_lora",
94
+ transformer_lora_adapter_metadata={"r": 16, "lora_alpha": 16},
95
+ text_encoder_lora_adapter_metadata={"r": 8, "lora_alpha": 8}
96
+ )
97
+ ```
98
+
99
+ ### ckpt
100
+
101
+ > [!WARNING]
102
+ > Pickled files may be unsafe because they can be exploited to execute malicious code. It is recommended to use safetensors files instead where possible, or convert the weights to safetensors files.
103
+
104
+ PyTorch's [torch.save](https://pytorch.org/docs/stable/generated/torch.save.html) function uses Python's [pickle](https://docs.python.org/3/library/pickle.html) utility to serialize and save models. These files are saved as a ckpt file and they contain the entire model's weights.
105
+
106
+ Use the [`~loaders.FromSingleFileMixin.from_single_file`] method to directly load a ckpt file.
107
+
108
+ ```py
109
+ from diffusers import StableDiffusionPipeline
110
+
111
+ pipeline = StableDiffusionPipeline.from_single_file(
112
+ "https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5/blob/main/v1-5-pruned.ckpt"
113
+ )
114
+ ```
115
+
116
+ ## Storage layout
117
+
118
+ There are two ways model files are organized, either in a Diffusers-multifolder layout or in a single-file layout. The Diffusers-multifolder layout is the default, and each component file (text encoder, UNet, VAE) is stored in a separate subfolder. Diffusers also supports loading models from a single-file layout where all the components are bundled together.
119
+
120
+ ### Diffusers-multifolder
121
+
122
+ The Diffusers-multifolder layout is the default storage layout for Diffusers. Each component's (text encoder, UNet, VAE) weights are stored in a separate subfolder. The weights can be stored as safetensors or ckpt files.
123
+
124
+ <div class="flex flex-row gap-4">
125
+ <div class="flex-1">
126
+ <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/multifolder-layout.png"/>
127
+ <figcaption class="mt-2 text-center text-sm text-gray-500">multifolder layout</figcaption>
128
+ </div>
129
+ <div class="flex-1">
130
+ <img class="rounded-xl" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/multifolder-unet.png"/>
131
+ <figcaption class="mt-2 text-center text-sm text-gray-500">UNet subfolder</figcaption>
132
+ </div>
133
+ </div>
134
+
135
+ To load from Diffusers-multifolder layout, use the [`~DiffusionPipeline.from_pretrained`] method.
136
+
137
+ ```py
138
+ import torch
+ from diffusers import DiffusionPipeline
139
+
140
+ pipeline = DiffusionPipeline.from_pretrained(
141
+ "stabilityai/stable-diffusion-xl-base-1.0",
142
+ torch_dtype=torch.float16,
143
+ variant="fp16",
144
+ use_safetensors=True,
145
+ ).to("cuda")
146
+ ```
147
+
148
+ Benefits of using the Diffusers-multifolder layout include:
149
+
150
+ 1. Faster to load each component file individually or in parallel.
151
+ 2. Reduced memory usage because you only load the components you need. For example, models like [SDXL Turbo](https://hf.co/stabilityai/sdxl-turbo), [SDXL Lightning](https://hf.co/ByteDance/SDXL-Lightning), and [Hyper-SD](https://hf.co/ByteDance/Hyper-SD) have the same components except for the UNet. You can reuse their shared components with the [`~DiffusionPipeline.from_pipe`] method without consuming any additional memory (take a look at the [Reuse a pipeline](./loading#reuse-a-pipeline) guide) and only load the UNet. This way, you don't need to download redundant components and unnecessarily use more memory.
152
+
153
+ ```py
154
+ import torch
155
+ from diffusers import StableDiffusionXLPipeline, UNet2DConditionModel, EulerDiscreteScheduler
156
+
157
+ # download one model
158
+ sdxl_pipeline = StableDiffusionXLPipeline.from_pretrained(
159
+ "stabilityai/stable-diffusion-xl-base-1.0",
160
+ torch_dtype=torch.float16,
161
+ variant="fp16",
162
+ use_safetensors=True,
163
+ ).to("cuda")
164
+
165
+ # switch UNet for another model
166
+ unet = UNet2DConditionModel.from_pretrained(
167
+ "stabilityai/sdxl-turbo",
168
+ subfolder="unet",
169
+ torch_dtype=torch.float16,
170
+ variant="fp16",
171
+ use_safetensors=True
172
+ )
173
+ # reuse all the same components in new model except for the UNet
174
+ turbo_pipeline = StableDiffusionXLPipeline.from_pipe(
175
+ sdxl_pipeline, unet=unet,
176
+ ).to("cuda")
177
+ turbo_pipeline.scheduler = EulerDiscreteScheduler.from_config(
178
+ turbo_pipeline.scheduler.config,
179
+ timestep_spacing="trailing"
180
+ )
181
+ image = turbo_pipeline(
182
+ "an astronaut riding a unicorn on mars",
183
+ num_inference_steps=1,
184
+ guidance_scale=0.0,
185
+ ).images[0]
186
+ image
187
+ ```
188
+
189
+ 3. Reduced storage requirements because if a component, such as the SDXL [VAE](https://hf.co/madebyollin/sdxl-vae-fp16-fix), is shared across multiple models, you only need to download and store a single copy of it instead of downloading and storing it multiple times. For 10 SDXL models, this can save ~3.5GB of storage. The storage savings are even greater for newer models like PixArt Sigma, where the [text encoder](https://hf.co/PixArt-alpha/PixArt-Sigma-XL-2-1024-MS/tree/main/text_encoder) alone is ~19GB!
190
+ 4. Flexibility to replace a component in the model with a newer or better version.
191
+
192
+ ```py
193
+ import torch
+ from diffusers import DiffusionPipeline, AutoencoderKL
194
+
195
+ vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16, use_safetensors=True)
196
+ pipeline = DiffusionPipeline.from_pretrained(
197
+ "stabilityai/stable-diffusion-xl-base-1.0",
198
+ vae=vae,
199
+ torch_dtype=torch.float16,
200
+ variant="fp16",
201
+ use_safetensors=True,
202
+ ).to("cuda")
203
+ ```
204
+
205
+ 5. More visibility and information about a model's components, which are stored in a [config.json](https://hf.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/unet/config.json) file in each component subfolder.
206
+
207
+ ### Single-file
208
+
209
+ The single-file layout stores all the model weights in a single file. The weights for all the model components (text encoder, UNet, VAE) are kept together instead of separately in subfolders. This can be a safetensors or ckpt file.
210
+
211
+ <div class="flex justify-center">
212
+ <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/single-file-layout.png"/>
213
+ </div>
214
+
215
+ To load from a single-file layout, use the [`~loaders.FromSingleFileMixin.from_single_file`] method.
216
+
217
+ ```py
218
+ import torch
219
+ from diffusers import StableDiffusionXLPipeline
220
+
221
+ pipeline = StableDiffusionXLPipeline.from_single_file(
222
+ "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/sd_xl_base_1.0.safetensors",
223
+ torch_dtype=torch.float16,
224
+ variant="fp16",
225
+ use_safetensors=True,
226
+ ).to("cuda")
227
+ ```
228
+
229
+ Benefits of using a single-file layout include:
230
+
231
+ 1. Easy compatibility with diffusion interfaces such as [ComfyUI](https://github.com/comfyanonymous/ComfyUI) or [Automatic1111](https://github.com/AUTOMATIC1111/stable-diffusion-webui) which commonly use a single-file layout.
232
+ 2. Easier to manage (download and share) a single file.
233
+
234
+ ### DDUF
235
+
236
+ > [!WARNING]
237
+ > DDUF is an experimental file format and APIs related to it can change in the future.
238
+
239
+ DDUF (**D**DUF **D**iffusion **U**nified **F**ormat) is a file format designed to make storing, distributing, and using diffusion models much easier. Built on the ZIP file format, DDUF offers a standardized, efficient, and flexible way to package all parts of a diffusion model into a single, easy-to-manage file. It provides a balance between Diffusers multi-folder format and the widely popular single-file format.
240
+
241
+ Learn more details about DDUF on the Hugging Face Hub [documentation](https://huggingface.co/docs/hub/dduf).
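Because DDUF is built on the ZIP format (stored without compression), standard tooling can inspect an archive. The sketch below creates a toy archive mimicking a DDUF-style layout (the file names are illustrative, not a real DDUF checkpoint) and lists its contents without extracting anything.

```python
import zipfile

# Create a toy archive mimicking a DDUF layout; the spec requires
# uncompressed entries, so use ZIP_STORED
with zipfile.ZipFile("toy.dduf", "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("model_index.json", "{}")
    zf.writestr("vae/config.json", "{}")

# List the archived files without extracting them
with zipfile.ZipFile("toy.dduf") as zf:
    names = zf.namelist()

print(names)  # ['model_index.json', 'vae/config.json']
```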
242
+
243
+ Pass a checkpoint to the `dduf_file` parameter to load it in [`DiffusionPipeline`].
244
+
245
+ ```py
246
+ from diffusers import DiffusionPipeline
247
+ import torch
248
+
249
+ pipe = DiffusionPipeline.from_pretrained(
250
+ "DDUF/FLUX.1-dev-DDUF", dduf_file="FLUX.1-dev.dduf", torch_dtype=torch.bfloat16
251
+ ).to("cuda")
252
+ image = pipe(
253
+ "photo of a cat holding a sign that says Diffusers", num_inference_steps=50, guidance_scale=3.5
254
+ ).images[0]
255
+ image.save("cat.png")
256
+ ```
257
+
258
+ To save a pipeline as a `.dduf` checkpoint, use the [`~huggingface_hub.export_folder_as_dduf`] utility, which takes care of all the necessary file-level validations.
259
+
260
+ ```py
261
+ from huggingface_hub import export_folder_as_dduf
262
+ from diffusers import DiffusionPipeline
263
+ import torch
264
+
265
+ pipe = DiffusionPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
266
+
267
+ save_folder = "flux-dev"
268
+ pipe.save_pretrained("flux-dev")
269
+ export_folder_as_dduf("flux-dev.dduf", folder_path=save_folder)
+ ```
270
+
271
+ > [!TIP]
272
+ > Packaging and loading quantized checkpoints in the DDUF format is supported as long as they respect the multi-folder structure.
273
+
274
+ ## Convert layout and files
275
+
276
+ Diffusers provides many scripts and methods to convert storage layouts and file formats to enable broader support across the diffusion ecosystem.
277
+
278
+ Take a look at the [diffusers/scripts](https://github.com/huggingface/diffusers/tree/main/scripts) collection to find a script that fits your conversion needs.
279
+
280
+ > [!TIP]
281
+ > Scripts that have "`to_diffusers`" appended at the end mean they convert a model to the Diffusers-multifolder layout. Each script has its own specific set of arguments for configuring the conversion, so make sure you check what arguments are available!
282
+
283
+ For example, to convert a Stable Diffusion XL model stored in Diffusers-multifolder layout to a single-file layout, run the [convert_diffusers_to_original_sdxl.py](https://github.com/huggingface/diffusers/blob/main/scripts/convert_diffusers_to_original_sdxl.py) script. Provide the path to the model to convert, and the path to save the converted model to. You can optionally specify whether you want to save the model as a safetensors file and whether to save the model in half-precision.
284
+
285
+ ```bash
286
+ python convert_diffusers_to_original_sdxl.py --model_path path/to/model/to/convert --checkpoint_path path/to/save/model/to --use_safetensors
287
+ ```
288
+
289
+ You can also save a model to Diffusers-multifolder layout with the [`~DiffusionPipeline.save_pretrained`] method. This creates a directory for you if it doesn't already exist, and it saves the weights as safetensors files by default.
290
+
291
+ ```py
292
+ from diffusers import StableDiffusionXLPipeline
293
+
294
+ pipeline = StableDiffusionXLPipeline.from_single_file(
295
+ "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/sd_xl_base_1.0.safetensors",
296
+ )
297
+ pipeline.save_pretrained("sdxl-base-1.0")
298
+ ```
299
+
300
+ Lastly, there are also Spaces, such as [SD To Diffusers](https://hf.co/spaces/diffusers/sd-to-diffusers) and [SD-XL To Diffusers](https://hf.co/spaces/diffusers/sdxl-to-diffusers), that provide a more user-friendly interface for converting models to Diffusers-multifolder layout. This is the easiest and most convenient option for converting layouts, and it'll open a PR on your model repository with the converted files. However, this option is not as reliable as running a script, and the Space may fail for more complicated models.
301
+
302
+ ## Single-file layout usage
303
+
304
+ Now that you're familiar with the differences between the Diffusers-multifolder and single-file layout, this section shows you how to load models and pipeline components, customize configuration options for loading, and load local files with the [`~loaders.FromSingleFileMixin.from_single_file`] method.
305
+
306
+ ### Load a pipeline or model
307
+
308
+ Pass the file path of the pipeline or model to the [`~loaders.FromSingleFileMixin.from_single_file`] method to load it.
309
+
310
+ <hfoptions id="pipeline-model">
311
+ <hfoption id="pipeline">
312
+
313
+ ```py
314
+ from diffusers import StableDiffusionXLPipeline
315
+
316
+ ckpt_path = "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/sd_xl_base_1.0_0.9vae.safetensors"
317
+ pipeline = StableDiffusionXLPipeline.from_single_file(ckpt_path)
318
+ ```
319
+
320
+ </hfoption>
321
+ <hfoption id="model">
322
+
323
+ ```py
324
+ from diffusers import StableCascadeUNet
325
+
326
+ ckpt_path = "https://huggingface.co/stabilityai/stable-cascade/blob/main/stage_b_lite.safetensors"
327
+ model = StableCascadeUNet.from_single_file(ckpt_path)
328
+ ```
329
+
330
+ </hfoption>
331
+ </hfoptions>
332
+
333
+ Customize components in the pipeline by passing them directly to the [`~loaders.FromSingleFileMixin.from_single_file`] method. For example, you can use a different scheduler in a pipeline.
334
+
335
+ ```py
336
+ from diffusers import StableDiffusionXLPipeline, DDIMScheduler
337
+
338
+ ckpt_path = "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/sd_xl_base_1.0_0.9vae.safetensors"
339
+ scheduler = DDIMScheduler()
340
+ pipeline = StableDiffusionXLPipeline.from_single_file(ckpt_path, scheduler=scheduler)
341
+ ```
342
+
343
+ Or you could use a ControlNet model in the pipeline.
344
+
345
+ ```py
346
+ from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
347
+
348
+ ckpt_path = "https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5/blob/main/v1-5-pruned-emaonly.safetensors"
349
+ controlnet = ControlNetModel.from_pretrained("lllyasviel/control_v11p_sd15_canny")
350
+ pipeline = StableDiffusionControlNetPipeline.from_single_file(ckpt_path, controlnet=controlnet)
351
+ ```
352
+
353
+ ### Customize configuration options
354
+
355
+ Models have a configuration file that defines their attributes, like the number of inputs in a UNet. Pipeline configuration options are available in the pipeline's class. For example, if you look at the [`StableDiffusionXLInstructPix2PixPipeline`] class, there is an option to scale the image latents with the `is_cosxl_edit` parameter.
356
+
357
+ These configuration files can be found in the model's Hub repository or another location from which the configuration file originated (for example, a GitHub repository or locally on your device).
358
+
359
+ <hfoptions id="config-file">
360
+ <hfoption id="Hub configuration file">
361
+
362
+ > [!TIP]
363
+ > The [`~loaders.FromSingleFileMixin.from_single_file`] method automatically maps the checkpoint to the appropriate model repository, but there are cases where it is useful to use the `config` parameter. For example, if the model components in the checkpoint are different from the original checkpoint or if a checkpoint doesn't have the necessary metadata to correctly determine the configuration to use for the pipeline.
364
+
365
+ The [`~loaders.FromSingleFileMixin.from_single_file`] method automatically determines the configuration to use from the configuration file in the model repository. You could also explicitly specify the configuration to use by providing the repository id to the `config` parameter.
366
+
367
+ ```py
368
+ from diffusers import StableDiffusionXLPipeline
369
+
370
+ ckpt_path = "https://huggingface.co/segmind/SSD-1B/blob/main/SSD-1B.safetensors"
371
+ repo_id = "segmind/SSD-1B"
372
+
373
+ pipeline = StableDiffusionXLPipeline.from_single_file(ckpt_path, config=repo_id)
374
+ ```
375
+
376
+ The model loads the configuration file for the [UNet](https://huggingface.co/segmind/SSD-1B/blob/main/unet/config.json), [VAE](https://huggingface.co/segmind/SSD-1B/blob/main/vae/config.json), and [text encoder](https://huggingface.co/segmind/SSD-1B/blob/main/text_encoder/config.json) from their respective subfolders in the repository.
377
+
378
+ </hfoption>
379
+ <hfoption id="original configuration file">
380
+
381
+ The [`~loaders.FromSingleFileMixin.from_single_file`] method can also load the original configuration file of a pipeline that is stored elsewhere. Pass a local path or URL of the original configuration file to the `original_config` parameter.
382
+
383
+ ```py
384
+ from diffusers import StableDiffusionXLPipeline
385
+
386
+ ckpt_path = "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/sd_xl_base_1.0_0.9vae.safetensors"
387
+ original_config = "https://raw.githubusercontent.com/Stability-AI/generative-models/main/configs/inference/sd_xl_base.yaml"
388
+
389
+ pipeline = StableDiffusionXLPipeline.from_single_file(ckpt_path, original_config=original_config)
390
+ ```
391
+
392
+ > [!TIP]
393
+ > Diffusers attempts to infer the pipeline components based on the type signatures of the pipeline class when you use `original_config` with `local_files_only=True`, instead of fetching the configuration files from the model repository on the Hub. This prevents backward breaking changes in code that can't connect to the internet to fetch the necessary configuration files.
394
+ >
395
+ > This is not as reliable as providing a path to a local model repository with the `config` parameter, and might lead to errors during pipeline configuration. To avoid errors, run the pipeline with `local_files_only=False` once to download the appropriate pipeline configuration files to the local cache.
396
+
397
+ </hfoption>
398
+ </hfoptions>
399
+
400
+ While the configuration files specify the pipeline's or model's default parameters, you can override them by providing the parameters directly to the [`~loaders.FromSingleFileMixin.from_single_file`] method. Any parameter supported by the model or pipeline class can be configured in this way.
401
+
402
+ <hfoptions id="override">
403
+ <hfoption id="pipeline">
404
+
405
+ For example, to scale the image latents in [`StableDiffusionXLInstructPix2PixPipeline`] pass the `is_cosxl_edit` parameter.
406
+
407
+ ```python
408
+ from diffusers import StableDiffusionXLInstructPix2PixPipeline
409
+
410
+ ckpt_path = "https://huggingface.co/stabilityai/cosxl/blob/main/cosxl_edit.safetensors"
411
+ pipeline = StableDiffusionXLInstructPix2PixPipeline.from_single_file(ckpt_path, config="diffusers/sdxl-instructpix2pix-768", is_cosxl_edit=True)
412
+ ```
413
+
414
+ </hfoption>
415
+ <hfoption id="model">
416
+
417
+ For example, to upcast the attention dimensions in a [`UNet2DConditionModel`] pass the `upcast_attention` parameter.
418
+
419
+ ```python
420
+ from diffusers import UNet2DConditionModel
421
+
422
+ ckpt_path = "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/blob/main/sd_xl_base_1.0_0.9vae.safetensors"
423
+ model = UNet2DConditionModel.from_single_file(ckpt_path, upcast_attention=True)
424
+ ```
425
+
426
+ </hfoption>
427
+ </hfoptions>
428
+
429
+ ### Local files
430
+
431
+ In Diffusers>=v0.28.0, the [`~loaders.FromSingleFileMixin.from_single_file`] method attempts to configure a pipeline or model by inferring the model type from the keys in the checkpoint file. The inferred model type is used to determine the appropriate model repository on the Hugging Face Hub to configure the model or pipeline.
432
+
433
+ For example, any single file checkpoint based on the Stable Diffusion XL base model will use the [stabilityai/stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) model repository to configure the pipeline.
434
+
435
+ But if you're working in an environment with restricted internet access, you should download the configuration files with the [`~huggingface_hub.snapshot_download`] function, and the model checkpoint with the [`~huggingface_hub.hf_hub_download`] function. By default, these files are downloaded to the Hugging Face Hub [cache directory](https://huggingface.co/docs/huggingface_hub/en/guides/manage-cache), but you can specify a preferred directory to download the files to with the `local_dir` parameter.
436
+
437
+ Pass the configuration and checkpoint paths to the [`~loaders.FromSingleFileMixin.from_single_file`] method to load locally.
438
+
439
+ <hfoptions id="local">
440
+ <hfoption id="Hub cache directory">
441
+
442
+ ```python
443
+ from huggingface_hub import hf_hub_download, snapshot_download
+ from diffusers import StableDiffusionXLPipeline
444
+
445
+ my_local_checkpoint_path = hf_hub_download(
446
+ repo_id="segmind/SSD-1B",
447
+ filename="SSD-1B.safetensors"
448
+ )
449
+
450
+ my_local_config_path = snapshot_download(
451
+ repo_id="segmind/SSD-1B",
452
+ allow_patterns=["*.json", "**/*.json", "*.txt", "**/*.txt"]
453
+ )
454
+
455
+ pipeline = StableDiffusionXLPipeline.from_single_file(my_local_checkpoint_path, config=my_local_config_path, local_files_only=True)
456
+ ```
457
+
458
+ </hfoption>
459
+ <hfoption id="specific local directory">
460
+
461
+ ```python
462
+ from huggingface_hub import hf_hub_download, snapshot_download
+ from diffusers import StableDiffusionXLPipeline
463
+
464
+ my_local_checkpoint_path = hf_hub_download(
465
+ repo_id="segmind/SSD-1B",
466
+ filename="SSD-1B.safetensors",
467
+ local_dir="my_local_checkpoints"
468
+ )
469
+
470
+ my_local_config_path = snapshot_download(
471
+ repo_id="segmind/SSD-1B",
472
+ allow_patterns=["*.json", "**/*.json", "*.txt", "**/*.txt"],
473
+ local_dir="my_local_config"
474
+ )
475
+
476
+ pipeline = StableDiffusionXLPipeline.from_single_file(my_local_checkpoint_path, config=my_local_config_path, local_files_only=True)
477
+ ```
478
+
479
+ </hfoption>
480
+ </hfoptions>
481
+
482
+ #### Local files without symlink
483
+
484
+ > [!TIP]
485
+ > In huggingface_hub>=v0.23.0, the `local_dir_use_symlinks` argument isn't necessary for the [`~huggingface_hub.hf_hub_download`] and [`~huggingface_hub.snapshot_download`] functions.
486
+
487
+ The [`~loaders.FromSingleFileMixin.from_single_file`] method relies on the [huggingface_hub](https://hf.co/docs/huggingface_hub/index) caching mechanism to fetch and store checkpoints and configuration files for models and pipelines. If you're working with a file system that does not support symlinking, you should download the checkpoint file to a local directory first, and disable symlinking with the `local_dir_use_symlinks=False` parameter in the [`~huggingface_hub.hf_hub_download`] and [`~huggingface_hub.snapshot_download`] functions.
488
+
489
+ ```python
490
+ from huggingface_hub import hf_hub_download, snapshot_download
491
+
492
+ my_local_checkpoint_path = hf_hub_download(
493
+ repo_id="segmind/SSD-1B",
494
+ filename="SSD-1B.safetensors",
495
+ local_dir="my_local_checkpoints",
496
+ local_dir_use_symlinks=False
497
+ )
498
+ print("My local checkpoint: ", my_local_checkpoint_path)
499
+
500
+ my_local_config_path = snapshot_download(
501
+ repo_id="segmind/SSD-1B",
502
+ allow_patterns=["*.json", "**/*.json", "*.txt", "**/*.txt"],
503
+ local_dir_use_symlinks=False,
504
+ )
505
+ print("My local config: ", my_local_config_path)
506
+ ```
507
+
508
+ Then you can pass the local paths to the `pretrained_model_link_or_path` and `config` parameters.
509
+
510
+ ```python
511
+ pipeline = StableDiffusionXLPipeline.from_single_file(my_local_checkpoint_path, config=my_local_config_path, local_files_only=True)
512
+ ```