<meta charset="utf-8" /><meta http-equiv="content-security-policy" content=""><meta name="hf:doc:metadata" content="{&quot;local&quot;:&quot;pipelines&quot;,&quot;sections&quot;:[{&quot;local&quot;:&quot;diffusers-summary&quot;,&quot;title&quot;:&quot;🧨 Diffusers Summary&quot;},{&quot;local&quot;:&quot;pipelines-api&quot;,&quot;title&quot;:&quot;Pipelines API&quot;}],&quot;title&quot;:&quot;Pipelines&quot;}" data-svelte="svelte-1phssyn">
<link rel="modulepreload" href="/docs/diffusers/v0.18.2/en/_app/assets/pages/__layout.svelte-hf-doc-builder.css">
<link rel="modulepreload" href="/docs/diffusers/v0.18.2/en/_app/start-hf-doc-builder.js">
<link rel="modulepreload" href="/docs/diffusers/v0.18.2/en/_app/chunks/vendor-hf-doc-builder.js">
<link rel="modulepreload" href="/docs/diffusers/v0.18.2/en/_app/chunks/paths-hf-doc-builder.js">
<link rel="modulepreload" href="/docs/diffusers/v0.18.2/en/_app/pages/__layout.svelte-hf-doc-builder.js">
<link rel="modulepreload" href="/docs/diffusers/v0.18.2/en/_app/pages/api/pipelines/overview.mdx-hf-doc-builder.js">
<link rel="modulepreload" href="/docs/diffusers/v0.18.2/en/_app/chunks/IconCopyLink-hf-doc-builder.js">
<h1 class="relative group"><a id="pipelines" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#pipelines"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a>
<span>Pipelines
</span></h1>
<p>Pipelines provide a simple way to run state-of-the-art diffusion models in inference.
Most diffusion systems consist of multiple independently-trained models and highly adaptable scheduler
components - all of which are needed to have a functioning end-to-end diffusion system.</p>
<p>As an example, <a href="https://huggingface.co/blog/stable_diffusion" rel="nofollow">Stable Diffusion</a> consists of three independently trained models:</p>
<ul><li>an <a href="./api/models#vae">autoencoder</a></li>
<li>a <a href="./api/models#UNet2DConditionModel">conditional UNet</a></li>
<li>a <a href="https://huggingface.co/docs/transformers/v4.27.1/en/model_doc/clip#transformers.CLIPTextModel" rel="nofollow">CLIP text encoder</a></li></ul>
<p>as well as several other components:</p>
<ul><li>a <a href="./api/scheduler#pndm">scheduler</a></li>
<li>a <a href="https://huggingface.co/docs/transformers/v4.27.1/en/model_doc/clip#transformers.CLIPImageProcessor" rel="nofollow">CLIPImageProcessor</a></li>
<li>a <a href="./stable_diffusion#safety_checker">safety checker</a></li></ul>
<p>All of these components are necessary to run Stable Diffusion in inference, even though they were trained
or created independently from each other.</p>
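<p>To see how such components fit together, the following is a toy, pure-Python sketch of the control flow inside a diffusion pipeline's inference loop. The function names mirror the real components, but the bodies are stand-ins and the arithmetic is purely illustrative:</p>

```python
# Toy sketch of how Stable Diffusion's components cooperate in inference.
# Real pipelines use torch modules; plain functions stand in for each
# component here so the control flow is visible.

def text_encoder(prompt):
    # CLIP text encoder stand-in: prompt -> "embedding"
    return [float(ord(c)) for c in prompt]

def unet(latents, t, cond):
    # conditional UNet stand-in: predict "noise" for the current step
    return [x * 0.1 for x in latents]

def scheduler_step(noise, t, latents):
    # scheduler stand-in: remove the predicted noise from the latents
    return [l - n for l, n in zip(latents, noise)]

def vae_decode(latents):
    # autoencoder decoder stand-in: latents -> "image"
    return [round(x, 3) for x in latents]

def toy_pipeline(prompt, num_inference_steps=3):
    cond = text_encoder(prompt)          # encode the text prompt once
    latents = [1.0, 1.0, 1.0]            # would be random noise in practice
    for t in reversed(range(num_inference_steps)):
        noise = unet(latents, t, cond)   # model predicts noise ...
        latents = scheduler_step(noise, t, latents)  # ... scheduler denoises
    return vae_decode(latents)           # decode latents into an image

image = toy_pipeline("a photo of an astronaut")
print(image)  # each latent shrinks by 10% per step: 1.0 -> 0.9 -> 0.81 -> 0.729
```

The denoise-loop structure (encode once, then alternate model prediction and scheduler step, then decode) is exactly what real pipelines implement in their <code>__call__</code> method.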
<p>To that end, we strive to offer all open-sourced, state-of-the-art diffusion systems under a unified API.
More specifically, we strive to provide pipelines that:</p>
<ol><li>can load the officially published weights and yield 1-to-1 the same outputs as the original implementation according to the corresponding paper (<em>e.g.</em> <a href="https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/latent_diffusion" rel="nofollow">LDMTextToImagePipeline</a> uses the officially released weights of <a href="https://arxiv.org/abs/2112.10752" rel="nofollow">High-Resolution Image Synthesis with Latent Diffusion Models</a>),</li>
<li>have a simple user interface to run the model in inference (see the <a href="#pipelines-api">Pipelines API</a> section),</li>
<li>are easy to understand, with code that is self-explanatory and can be read alongside the official paper (see <a href="#pipelines-summary">Pipelines summary</a>),</li>
<li>can easily be contributed by the community (see the <a href="#contribution">Contribution</a> section).</li></ol>
<p><strong>Note</strong> that pipelines do not (and should not) offer any training functionality.
If you are looking for <em>official</em> training examples, please have a look at <a href="https://github.com/huggingface/diffusers/tree/main/examples" rel="nofollow">examples</a>.</p>
<h2 class="relative group"><a id="diffusers-summary" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#diffusers-summary"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a>
<span>🧨 Diffusers Summary
</span></h2>
<p>The following table summarizes all officially supported pipelines, their corresponding papers and tasks, and, where
available, a Colab notebook to try them out directly.</p>
<table><thead><tr><th>Pipeline</th>
<th>Paper</th>
<th align="center">Tasks</th>
<th align="center">Colab</th></tr></thead>
<tbody><tr><td><a href="./alt_diffusion">alt_diffusion</a></td>
<td><a href="https://arxiv.org/abs/2211.06679" rel="nofollow"><strong>AltDiffusion</strong></a></td>
<td align="center">Image-to-Image Text-Guided Generation</td>
<td align="center">-</td></tr>
<tr><td><a href="./audio_diffusion">audio_diffusion</a></td>
<td><a href="https://github.com/teticio/audio_diffusion.git" rel="nofollow"><strong>Audio Diffusion</strong></a></td>
<td align="center">Unconditional Audio Generation</td>
<td align="center"></td></tr>
<tr><td><a href="./api/pipelines/controlnet">controlnet</a></td>
<td><a href="https://arxiv.org/abs/2302.05543" rel="nofollow"><strong>ControlNet with Stable Diffusion</strong></a></td>
<td align="center">Image-to-Image Text-Guided Generation</td>
<td align="center"><a href="https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/controlnet.ipynb" rel="nofollow"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a></td></tr>
<tr><td><a href="./cycle_diffusion">cycle_diffusion</a></td>
<td><a href="https://arxiv.org/abs/2210.05559" rel="nofollow"><strong>Cycle Diffusion</strong></a></td>
<td align="center">Image-to-Image Text-Guided Generation</td>
<td align="center"></td></tr>
<tr><td><a href="./dance_diffusion">dance_diffusion</a></td>
<td><a href="https://github.com/williamberman/diffusers.git" rel="nofollow"><strong>Dance Diffusion</strong></a></td>
<td align="center">Unconditional Audio Generation</td>
<td align="center"></td></tr>
<tr><td><a href="./ddpm">ddpm</a></td>
<td><a href="https://arxiv.org/abs/2006.11239" rel="nofollow"><strong>Denoising Diffusion Probabilistic Models</strong></a></td>
<td align="center">Unconditional Image Generation</td>
<td align="center"></td></tr>
<tr><td><a href="./ddim">ddim</a></td>
<td><a href="https://arxiv.org/abs/2010.02502" rel="nofollow"><strong>Denoising Diffusion Implicit Models</strong></a></td>
<td align="center">Unconditional Image Generation</td>
<td align="center"></td></tr>
<tr><td><a href="./if">if</a></td>
<td><a href="https://github.com/deep-floyd/IF" rel="nofollow"><strong>IF</strong></a></td>
<td align="center">Image Generation</td>
<td align="center"><a href="https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/deepfloyd_if_free_tier_google_colab.ipynb" rel="nofollow"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a></td></tr>
<tr><td><a href="./if">if_img2img</a></td>
<td><a href="https://github.com/deep-floyd/IF" rel="nofollow"><strong>IF</strong></a></td>
<td align="center">Image-to-Image Generation</td>
<td align="center"><a href="https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/deepfloyd_if_free_tier_google_colab.ipynb" rel="nofollow"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a></td></tr>
<tr><td><a href="./if">if_inpainting</a></td>
<td><a href="https://github.com/deep-floyd/IF" rel="nofollow"><strong>IF</strong></a></td>
<td align="center">Image-to-Image Generation</td>
<td align="center"><a href="https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/deepfloyd_if_free_tier_google_colab.ipynb" rel="nofollow"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a></td></tr>
<tr><td><a href="./kandinsky">kandinsky</a></td>
<td><strong>Kandinsky</strong></td>
<td align="center">Text-to-Image Generation</td>
<td align="center"></td></tr>
<tr><td><a href="./kandinsky">kandinsky_inpaint</a></td>
<td><strong>Kandinsky</strong></td>
<td align="center">Image-to-Image Generation</td>
<td align="center"></td></tr>
<tr><td><a href="./kandinsky">kandinsky_img2img</a></td>
<td><strong>Kandinsky</strong></td>
<td align="center">Image-to-Image Generation</td>
<td align="center"></td></tr>
<tr><td><a href="./latent_diffusion">latent_diffusion</a></td>
<td><a href="https://arxiv.org/abs/2112.10752" rel="nofollow"><strong>High-Resolution Image Synthesis with Latent Diffusion Models</strong></a></td>
<td align="center">Text-to-Image Generation</td>
<td align="center"></td></tr>
<tr><td><a href="./latent_diffusion">latent_diffusion</a></td>
<td><a href="https://arxiv.org/abs/2112.10752" rel="nofollow"><strong>High-Resolution Image Synthesis with Latent Diffusion Models</strong></a></td>
<td align="center">Super Resolution Image-to-Image</td>
<td align="center"></td></tr>
<tr><td><a href="./latent_diffusion_uncond">latent_diffusion_uncond</a></td>
<td><a href="https://arxiv.org/abs/2112.10752" rel="nofollow"><strong>High-Resolution Image Synthesis with Latent Diffusion Models</strong></a></td>
<td align="center">Unconditional Image Generation</td>
<td align="center"></td></tr>
<tr><td><a href="./paint_by_example">paint_by_example</a></td>
<td><a href="https://arxiv.org/abs/2211.13227" rel="nofollow"><strong>Paint by Example: Exemplar-based Image Editing with Diffusion Models</strong></a></td>
<td align="center">Image-Guided Image Inpainting</td>
<td align="center"></td></tr>
<tr><td><a href="./paradigms">paradigms</a></td>
<td><a href="https://arxiv.org/abs/2305.16317" rel="nofollow"><strong>Parallel Sampling of Diffusion Models</strong></a></td>
<td align="center">Text-to-Image Generation</td>
<td align="center"></td></tr>
<tr><td><a href="./pndm">pndm</a></td>
<td><a href="https://arxiv.org/abs/2202.09778" rel="nofollow"><strong>Pseudo Numerical Methods for Diffusion Models on Manifolds</strong></a></td>
<td align="center">Unconditional Image Generation</td>
<td align="center"></td></tr>
<tr><td><a href="./score_sde_ve">score_sde_ve</a></td>
<td><a href="https://openreview.net/forum?id=PxTIG12RRHS" rel="nofollow"><strong>Score-Based Generative Modeling through Stochastic Differential Equations</strong></a></td>
<td align="center">Unconditional Image Generation</td>
<td align="center"></td></tr>
<tr><td><a href="./score_sde_vp">score_sde_vp</a></td>
<td><a href="https://openreview.net/forum?id=PxTIG12RRHS" rel="nofollow"><strong>Score-Based Generative Modeling through Stochastic Differential Equations</strong></a></td>
<td align="center">Unconditional Image Generation</td>
<td align="center"></td></tr>
<tr><td><a href="./semantic_stable_diffusion">semantic_stable_diffusion</a></td>
<td><a href="https://arxiv.org/abs/2301.12247" rel="nofollow"><strong>SEGA: Instructing Diffusion using Semantic Dimensions</strong></a></td>
<td align="center">Text-to-Image Generation</td>
<td align="center"></td></tr>
<tr><td><a href="./stable_diffusion/text2img">stable_diffusion_text2img</a></td>
<td><a href="https://stability.ai/blog/stable-diffusion-public-release" rel="nofollow"><strong>Stable Diffusion</strong></a></td>
<td align="center">Text-to-Image Generation</td>
<td align="center"><a href="https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb" rel="nofollow"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a></td></tr>
<tr><td><a href="./stable_diffusion/img2img">stable_diffusion_img2img</a></td>
<td><a href="https://stability.ai/blog/stable-diffusion-public-release" rel="nofollow"><strong>Stable Diffusion</strong></a></td>
<td align="center">Image-to-Image Text-Guided Generation</td>
<td align="center"><a href="https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/image_2_image_using_diffusers.ipynb" rel="nofollow"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a></td></tr>
<tr><td><a href="./stable_diffusion/inpaint">stable_diffusion_inpaint</a></td>
<td><a href="https://stability.ai/blog/stable-diffusion-public-release" rel="nofollow"><strong>Stable Diffusion</strong></a></td>
<td align="center">Text-Guided Image Inpainting</td>
<td align="center"><a href="https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/in_painting_with_stable_diffusion_using_diffusers.ipynb" rel="nofollow"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a></td></tr>
<tr><td><a href="./stable_diffusion/panorama">stable_diffusion_panorama</a></td>
<td><a href="https://arxiv.org/abs/2302.08113" rel="nofollow"><strong>MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation</strong></a></td>
<td align="center">Text-Guided Panorama View Generation</td>
<td align="center"></td></tr>
<tr><td><a href="./stable_diffusion/pix2pix">stable_diffusion_pix2pix</a></td>
<td><a href="https://arxiv.org/abs/2211.09800" rel="nofollow"><strong>InstructPix2Pix: Learning to Follow Image Editing Instructions</strong></a></td>
<td align="center">Text-Based Image Editing</td>
<td align="center"></td></tr>
<tr><td><a href="./stable_diffusion/pix2pix_zero">stable_diffusion_pix2pix_zero</a></td>
<td><a href="https://arxiv.org/abs/2302.03027" rel="nofollow"><strong>Zero-shot Image-to-Image Translation</strong></a></td>
<td align="center">Text-Based Image Editing</td>
<td align="center"></td></tr>
<tr><td><a href="./stable_diffusion/attend_and_excite">stable_diffusion_attend_and_excite</a></td>
<td><a href="https://arxiv.org/abs/2301.13826" rel="nofollow"><strong>Attend and Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models</strong></a></td>
<td align="center">Text-to-Image Generation</td>
<td align="center"></td></tr>
<tr><td><a href="./stable_diffusion/self_attention_guidance">stable_diffusion_self_attention_guidance</a></td>
<td><a href="https://arxiv.org/abs/2210.00939" rel="nofollow"><strong>Self-Attention Guidance</strong></a></td>
<td align="center">Text-to-Image Generation</td>
<td align="center"></td></tr>
<tr><td><a href="./stable_diffusion/image_variation">stable_diffusion_image_variation</a></td>
<td><a href="https://github.com/LambdaLabsML/lambda-diffusers#stable-diffusion-image-variations" rel="nofollow"><strong>Stable Diffusion Image Variations</strong></a></td>
<td align="center">Image-to-Image Generation</td>
<td align="center"></td></tr>
<tr><td><a href="./stable_diffusion/latent_upscale">stable_diffusion_latent_upscale</a></td>
<td><a href="https://twitter.com/StabilityAI/status/1590531958815064065" rel="nofollow"><strong>Stable Diffusion Latent Upscaler</strong></a></td>
<td align="center">Text-Guided Super Resolution Image-to-Image</td>
<td align="center"></td></tr>
<tr><td><a href="./stable_diffusion/stable_diffusion_2">stable_diffusion_2</a></td>
<td><a href="https://stability.ai/blog/stable-diffusion-v2-release" rel="nofollow"><strong>Stable Diffusion 2</strong></a></td>
<td align="center">Text-Guided Image Inpainting</td>
<td align="center"></td></tr>
<tr><td><a href="./stable_diffusion/stable_diffusion_2">stable_diffusion_2</a></td>
<td><a href="https://stability.ai/blog/stable-diffusion-v2-release" rel="nofollow"><strong>Stable Diffusion 2</strong></a></td>
<td align="center">Depth-to-Image Text-Guided Generation</td>
<td align="center"></td></tr>
<tr><td><a href="./stable_diffusion/stable_diffusion_2">stable_diffusion_2</a></td>
<td><a href="https://stability.ai/blog/stable-diffusion-v2-release" rel="nofollow"><strong>Stable Diffusion 2</strong></a></td>
<td align="center">Text-Guided Super Resolution Image-to-Image</td>
<td align="center"></td></tr>
<tr><td><a href="./stable_diffusion_safe">stable_diffusion_safe</a></td>
<td><a href="https://arxiv.org/abs/2211.05105" rel="nofollow"><strong>Safe Stable Diffusion</strong></a></td>
<td align="center">Text-Guided Generation</td>
<td align="center"><a href="https://colab.research.google.com/github/ml-research/safe-latent-diffusion/blob/main/examples/Safe%20Latent%20Diffusion.ipynb" rel="nofollow"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"></a></td></tr>
<tr><td><a href="./stable_unclip">stable_unclip</a></td>
<td><strong>Stable unCLIP</strong></td>
<td align="center">Text-to-Image Generation</td>
<td align="center"></td></tr>
<tr><td><a href="./stable_unclip">stable_unclip</a></td>
<td><strong>Stable unCLIP</strong></td>
<td align="center">Image-to-Image Text-Guided Generation</td>
<td align="center"></td></tr>
<tr><td><a href="./stochastic_karras_ve">stochastic_karras_ve</a></td>
<td><a href="https://arxiv.org/abs/2206.00364" rel="nofollow"><strong>Elucidating the Design Space of Diffusion-Based Generative Models</strong></a></td>
<td align="center">Unconditional Image Generation</td>
<td align="center"></td></tr>
<tr><td><a href="./api/pipelines/text_to_video">text_to_video_sd</a></td>
<td><a href="https://modelscope.cn/models/damo/text-to-video-synthesis/summary" rel="nofollow"><strong>Modelscope’s Text-to-video-synthesis Model in Open Domain</strong></a></td>
<td align="center">Text-to-Video Generation</td>
<td align="center"></td></tr>
<tr><td><a href="./unclip">unclip</a></td>
<td><a href="https://arxiv.org/abs/2204.06125" rel="nofollow"><strong>Hierarchical Text-Conditional Image Generation with CLIP Latents</strong></a></td>
<td align="center">Text-to-Image Generation</td>
<td align="center"></td></tr>
<tr><td><a href="./versatile_diffusion">versatile_diffusion</a></td>
<td><a href="https://arxiv.org/abs/2211.08332" rel="nofollow"><strong>Versatile Diffusion: Text, Images and Variations All in One Diffusion Model</strong></a></td>
<td align="center">Text-to-Image Generation</td>
<td align="center"></td></tr>
<tr><td><a href="./versatile_diffusion">versatile_diffusion</a></td>
<td><a href="https://arxiv.org/abs/2211.08332" rel="nofollow"><strong>Versatile Diffusion: Text, Images and Variations All in One Diffusion Model</strong></a></td>
<td align="center">Image Variations Generation</td>
<td align="center"></td></tr>
<tr><td><a href="./versatile_diffusion">versatile_diffusion</a></td>
<td><a href="https://arxiv.org/abs/2211.08332" rel="nofollow"><strong>Versatile Diffusion: Text, Images and Variations All in One Diffusion Model</strong></a></td>
<td align="center">Dual Image and Text Guided Generation</td>
<td align="center"></td></tr>
<tr><td><a href="./vq_diffusion">vq_diffusion</a></td>
<td><a href="https://arxiv.org/abs/2111.14822" rel="nofollow"><strong>Vector Quantized Diffusion Model for Text-to-Image Synthesis</strong></a></td>
<td align="center">Text-to-Image Generation</td>
<td align="center"></td></tr>
<tr><td><a href="./text_to_video_zero">text_to_video_zero</a></td>
<td><a href="https://arxiv.org/abs/2303.13439" rel="nofollow"><strong>Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators</strong></a></td>
<td align="center">Text-to-Video Generation</td>
<td align="center"></td></tr></tbody></table>
<p><strong>Note</strong>: Pipelines are simple examples of how to use the diffusion systems as described in the corresponding papers.</p>
<p>However, most of them can be adapted to use different schedulers or even different model components. Some pipeline examples are shown in the <a href="#examples">Examples</a> below.</p>
<h2 class="relative group"><a id="pipelines-api" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#pipelines-api"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a>
<span>Pipelines API
</span></h2>
<p>Diffusion models often consist of multiple independently trained models or other pre-existing components.</p>
<p>Each model has been trained independently on a different task, and the scheduler can easily be swapped out for a different one.
During inference, however, we want to be able to easily load all components and use them together - even if one component, <em>e.g.</em> CLIP’s text encoder, originates from a different library, such as <a href="https://github.com/huggingface/transformers" rel="nofollow">Transformers</a>. To that end, all pipelines provide the following functionality:</p>
<ul><li>a <a href="../diffusion_pipeline"><code>from_pretrained</code> method</a> that accepts a Hugging Face Hub repository id, <em>e.g.</em> <a href="https://huggingface.co/runwayml/stable-diffusion-v1-5" rel="nofollow">runwayml/stable-diffusion-v1-5</a>, or a path to a local directory, <em>e.g.</em>
<code>./stable-diffusion</code>. To correctly retrieve which models and components should be loaded, a <code>model_index.json</code> file must be provided, <em>e.g.</em> <a href="https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/model_index.json" rel="nofollow">runwayml/stable-diffusion-v1-5/model_index.json</a>, which defines all components that should be
loaded into the pipeline. More specifically, each model/component is defined in the format <code>&lt;name&gt;: [&quot;&lt;library&gt;&quot;, &quot;&lt;class name&gt;&quot;]</code>, where <code>&lt;name&gt;</code> is the attribute name given to the loaded instance of <code>&lt;class name&gt;</code>, which can be found in the library or pipeline folder called <code>&quot;&lt;library&gt;&quot;</code>.</li>
<li>a <a href="../diffusion_pipeline"><code>save_pretrained</code> method</a> that accepts a local path, <em>e.g.</em> <code>./stable-diffusion</code>, under which all models/components of the pipeline will be saved. For each component/model, a folder is created inside the local path that is named after the given attribute name, <em>e.g.</em> <code>./stable-diffusion/unet</code>.
In addition, a <code>model_index.json</code> file is created at the root of the local path, <em>e.g.</em> <code>./stable-diffusion/model_index.json</code>, so that the complete pipeline can be instantiated again
from the local path.</li>
<li><a href="../diffusion_pipeline"><code>to</code></a> which accepts a <code>string</code> or <code>torch.device</code> to move all models that are of type <code>torch.nn.Module</code> to the passed device. The behavior is fully analogous to <a href="https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.to" rel="nofollow">PyTorch’s <code>to</code> method</a>.</li>
<li>a <code>__call__</code> method to use the pipeline in inference. <code>__call__</code> defines the inference logic of the pipeline and should ideally encompass all aspects of it, from pre-processing to forwarding tensors through the different models and schedulers, as well as post-processing. The API of the <code>__call__</code> method can vary strongly from pipeline to pipeline. <em>E.g.</em> a text-to-image pipeline, such as <a href="./stable_diffusion"><code>StableDiffusionPipeline</code></a>, should accept, among other inputs, the text prompt used to generate the image. A pure image generation pipeline, such as <a href="https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/ddpm" rel="nofollow">DDPMPipeline</a>, on the other hand, can be run without providing any inputs. To better understand which inputs can be adapted for
each pipeline, look directly into the respective pipeline.</li></ul>
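<p>The <code>&lt;name&gt;: [&quot;&lt;library&gt;&quot;, &quot;&lt;class name&gt;&quot;]</code> mapping can be illustrated with a short, self-contained sketch. The excerpt below is a trimmed, illustrative version of a typical Stable Diffusion <code>model_index.json</code>, not the complete file:</p>

```python
import json

# Trimmed, illustrative excerpt of a Stable Diffusion model_index.json
# (the real file lists more components, e.g. the tokenizer and safety checker).
model_index = json.loads("""
{
  "_class_name": "StableDiffusionPipeline",
  "text_encoder": ["transformers", "CLIPTextModel"],
  "unet": ["diffusers", "UNet2DConditionModel"],
  "vae": ["diffusers", "AutoencoderKL"],
  "scheduler": ["diffusers", "PNDMScheduler"]
}
""")

# Keys starting with "_" are pipeline metadata; every other entry maps an
# attribute <name> to the [<library>, <class name>] pair that
# from_pretrained resolves and loads.
components = {name: spec for name, spec in model_index.items()
              if not name.startswith("_")}
for name, (library, class_name) in components.items():
    print(f"pipe.{name} <- {library}.{class_name}")
```

After loading, each component is reachable as an attribute of the pipeline under exactly these names, <em>e.g.</em> <code>pipe.unet</code> or <code>pipe.scheduler</code>.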
<p><strong>Note</strong>: All pipelines have PyTorch’s autograd disabled because the <code>__call__</code> method is decorated with <a href="https://pytorch.org/docs/stable/generated/torch.no_grad.html" rel="nofollow"><code>torch.no_grad</code></a>; pipelines should
not be used for training. If you want to keep gradients during the forward pass, we recommend writing your own pipeline; see also our <a href="https://github.com/huggingface/diffusers/tree/main/examples/community" rel="nofollow">community examples</a>.</p>
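<p>Putting the four API points above together, typical end-to-end usage looks roughly like the following sketch. It is not executed here since it downloads several GB of weights; the repository id and device choice are examples:</p>

```python
def run_stable_diffusion(prompt: str):
    """Sketch of typical pipeline usage, mirroring the API points above."""
    import torch
    from diffusers import StableDiffusionPipeline

    # from_pretrained: resolves and loads every component listed in the
    # repository's model_index.json
    pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

    # to: moves all torch.nn.Module components to the target device
    pipe = pipe.to("cuda" if torch.cuda.is_available() else "cpu")

    # __call__: runs the full inference loop (autograd disabled via torch.no_grad)
    image = pipe(prompt).images[0]

    # save_pretrained: writes each component to its own subfolder plus a
    # model_index.json at the root, so the pipeline can be reloaded locally
    pipe.save_pretrained("./stable-diffusion")
    return image
```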
<script type="module" data-hydrate="shz8d3">
import { start } from "/docs/diffusers/v0.18.2/en/_app/start-hf-doc-builder.js";
start({
target: document.querySelector('[data-hydrate="shz8d3"]').parentNode,
paths: {"base":"/docs/diffusers/v0.18.2/en","assets":"/docs/diffusers/v0.18.2/en"},
session: {},
route: false,
spa: false,
trailing_slash: "never",
hydrate: {
status: 200,
error: null,
nodes: [
import("/docs/diffusers/v0.18.2/en/_app/pages/__layout.svelte-hf-doc-builder.js"),
import("/docs/diffusers/v0.18.2/en/_app/pages/api/pipelines/overview.mdx-hf-doc-builder.js")
],
params: {}
}
});
</script>
