<meta charset="utf-8" /><meta name="hf:doc:metadata" content="{&quot;title&quot;:&quot;设计哲学&quot;,&quot;local&quot;:&quot;设计哲学&quot;,&quot;sections&quot;:[{&quot;title&quot;:&quot;可用性优先于性能&quot;,&quot;local&quot;:&quot;可用性优先于性能&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2},{&quot;title&quot;:&quot;简洁优于简易&quot;,&quot;local&quot;:&quot;简洁优于简易&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2},{&quot;title&quot;:&quot;可定制与贡献友好优于抽象&quot;,&quot;local&quot;:&quot;可定制与贡献友好优于抽象&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2},{&quot;title&quot;:&quot;设计哲学细节&quot;,&quot;local&quot;:&quot;设计哲学细节&quot;,&quot;sections&quot;:[{&quot;title&quot;:&quot;流程(Pipelines)&quot;,&quot;local&quot;:&quot;流程pipelines&quot;,&quot;sections&quot;:[],&quot;depth&quot;:3},{&quot;title&quot;:&quot;模型&quot;,&quot;local&quot;:&quot;模型&quot;,&quot;sections&quot;:[],&quot;depth&quot;:3},{&quot;title&quot;:&quot;调度器(Schedulers)&quot;,&quot;local&quot;:&quot;调度器schedulers&quot;,&quot;sections&quot;:[],&quot;depth&quot;:3}],&quot;depth&quot;:2}],&quot;depth&quot;:1}">
<link href="/docs/diffusers/pr_11739/zh/_app/immutable/assets/0.e3b0c442.css" rel="modulepreload">
<link rel="modulepreload" href="/docs/diffusers/pr_11739/zh/_app/immutable/entry/start.95a8faef.js">
<link rel="modulepreload" href="/docs/diffusers/pr_11739/zh/_app/immutable/chunks/scheduler.e4ff9b64.js">
<link rel="modulepreload" href="/docs/diffusers/pr_11739/zh/_app/immutable/chunks/singletons.0a6f1d19.js">
<link rel="modulepreload" href="/docs/diffusers/pr_11739/zh/_app/immutable/chunks/index.f9be34a7.js">
<link rel="modulepreload" href="/docs/diffusers/pr_11739/zh/_app/immutable/chunks/paths.37f6e25a.js">
<link rel="modulepreload" href="/docs/diffusers/pr_11739/zh/_app/immutable/entry/app.a988cdaf.js">
<link rel="modulepreload" href="/docs/diffusers/pr_11739/zh/_app/immutable/chunks/preload-helper.3e2c3f46.js">
<link rel="modulepreload" href="/docs/diffusers/pr_11739/zh/_app/immutable/chunks/index.09f1bca0.js">
<link rel="modulepreload" href="/docs/diffusers/pr_11739/zh/_app/immutable/nodes/0.0ec3fec6.js">
<link rel="modulepreload" href="/docs/diffusers/pr_11739/zh/_app/immutable/chunks/each.e59479a4.js">
<link rel="modulepreload" href="/docs/diffusers/pr_11739/zh/_app/immutable/nodes/6.d696a00b.js">
<link rel="modulepreload" href="/docs/diffusers/pr_11739/zh/_app/immutable/chunks/MermaidChart.svelte_svelte_type_style_lang.f5199cd9.js"><!-- HEAD_svelte-u9bgzb_START --><!-- HEAD_svelte-u9bgzb_END --> <p></p>
<h1 class="relative group"><a id="设计哲学" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#设计哲学"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Design Philosophy</span></h1> <p data-svelte-h="svelte-2ee83w">🧨 Diffusers provides <strong>state-of-the-art</strong> pretrained diffusion models across multiple modalities.
Its purpose is to serve as a <strong>modular toolbox</strong> for both inference and training.</p> <p data-svelte-h="svelte-ez3bid">We aim to build a library that stands the test of time, and therefore take API design very seriously.</p> <p data-svelte-h="svelte-bzsbip">In a nutshell, Diffusers is built to be a natural extension of PyTorch. Therefore, most of our design choices are based on the <a href="https://pytorch.org/docs/stable/community/design.html#pytorch-design-philosophy" rel="nofollow">PyTorch design principles</a>. The core principles are as follows:</p> <h2 class="relative group"><a id="可用性优先于性能" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#可用性优先于性能"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Usability over Performance</span></h2> <ul data-svelte-h="svelte-131cj8o"><li>While Diffusers has many built-in performance-enhancing features (see <a href="https://huggingface.co/docs/diffusers/optimization/fp16" rel="nofollow">Memory and Speed</a>), models are always loaded with the highest precision and the lowest level of optimization. Therefore, unless the user specifies otherwise, diffusion pipelines are instantiated on CPU in float32 precision by default. This ensures usability across platforms and accelerators, and means that no complex installation is required to run the library.</li> <li>Diffusers aims to be a <strong>lightweight</strong> package with very few required dependencies, but many optional dependencies that can improve performance (such as <code>accelerate</code>, <code>safetensors</code>, <code>onnx</code>, etc.). We strive to keep the library lightweight so that it can easily be added as a dependency of other packages.</li> <li>Diffusers prefers simple, self-explanatory code over condensed "magic" code. This means that shorthand syntax such as lambda functions and advanced PyTorch operators are usually avoided.</li></ul> <h2 class="relative group"><a id="简洁优于简易" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute
with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#简洁优于简易"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Simple over Easy</span></h2> <p data-svelte-h="svelte-856wcm">As PyTorch states, <strong>explicit is better than implicit</strong> and <strong>simple is better than complex</strong>. This philosophy is reflected in several parts of the library:</p> <ul data-svelte-h="svelte-16a0e83"><li>We follow PyTorch's API design, for example using <a href="https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.to" rel="nofollow"><code>DiffusionPipeline.to</code></a> to let users manage devices themselves.</li> <li>Raising clear error messages is preferred to silently correcting erroneous input. Diffusers aims to teach the user rather than merely make the library as easy to use as possible.</li> <li>Complex interaction logic between models and schedulers is exposed rather than handled magically inside. Schedulers/samplers are separated from diffusion models with minimal dependencies on each other, which forces the user to write an unrolled denoising loop. However, the separation makes debugging easier and gives the user more control over adapting the denoising process or switching out models or schedulers.</li> <li>Separately trained components of a diffusion pipeline (such as the text encoder, the UNet, and the variational autoencoder) each have their own model class. This requires the user to handle the interaction between the components, and the serialization format stores the components in separate files. However, it makes debugging and customization easier. Thanks to this separation, DreamBooth or Textual Inversion training becomes very simple.</li></ul> <h2 class="relative group"><a id="可定制与贡献友好优于抽象" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#可定制与贡献友好优于抽象"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet"
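viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Tweakable, Contributor-Friendly over Abstraction</span></h2> <p data-svelte-h="svelte-1p7f16p">For large parts of the library, Diffusers adopts an important design principle of the <a href="https://github.com/huggingface/transformers" rel="nofollow">Transformers library</a>: prefer duplicated code over hasty abstractions. This principle stands in stark contrast to the <a href="https://en.wikipedia.org/wiki/Don%27t_repeat_yourself" rel="nofollow">DRY principle</a>.</p> <p data-svelte-h="svelte-46p1u">In short, just as Transformers does for modeling files, Diffusers keeps a very low level of abstraction and highly self-contained code for pipelines and schedulers. Functions, long code blocks, and even classes may be duplicated across multiple files, which at first can look like bad, sloppy design. However, this design has proven extremely successful for Transformers and makes a lot of sense for community-driven, open-source machine learning libraries:</p> <ul data-svelte-h="svelte-7h7hv0"><li>Machine learning is an extremely fast-moving field in which paradigms, model architectures, and algorithms change rapidly, which makes it difficult to define long-lasting code abstractions.</li> <li>ML practitioners often need to quickly modify existing code for research and therefore prefer self-contained code over code with many layers of abstraction.</li> <li>Open-source libraries rely on community contributions and must therefore build a codebase that is easy to contribute to. The more abstract the code, the more complex its dependencies, and the harder it is to read, the harder it is to contribute to. Overly abstract libraries scare contributors away. If a contribution cannot break core functionality, it is not only more inviting for new contributors but also easier to review and modify in parallel.</li></ul> <p data-svelte-h="svelte-1flexkj">At Hugging Face, this design is called the <strong>single-file policy</strong>, meaning that almost all of the code of a given class should be written in a single, self-contained file. For more on the philosophy, see <a href="https://huggingface.co/blog/transformers-design-philosophy" rel="nofollow">this blog post</a>.</p> <p data-svelte-h="svelte-593bid">Diffusers follows this philosophy completely for pipelines and schedulers, but only partly for diffusion models. The reason is that most diffusion pipelines (such as <a href="https://huggingface.co/docs/diffusers/api/pipelines/ddpm" rel="nofollow">DDPM</a>, <a href="https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/overview#stable-diffusion-pipelines" rel="nofollow">Stable Diffusion</a>, <a href="https://huggingface.co/docs/diffusers/api/pipelines/unclip" rel="nofollow">unCLIP (DALL·E 2)</a>, and <a href="https://imagen.research.google/"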
rel="nofollow">Imagen</a>) all rely on the same diffusion model, the <a href="https://huggingface.co/docs/diffusers/api/models/unet2d-cond" rel="nofollow">UNet</a>.</p> <p data-svelte-h="svelte-9tx02f">You should now have a good idea of the design principles behind 🧨 Diffusers 🤗. We try to apply these principles consistently across the library, but there are still a few exceptions and some less-than-ideal design choices. If you have feedback, we would ❤️ to hear it on <a href="https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feedback.md&title=" rel="nofollow">GitHub</a>.</p> <h2 class="relative group"><a id="设计哲学细节" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#设计哲学细节"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Design Philosophy in Detail</span></h2> <p data-svelte-h="svelte-108eoeg">Now let's look at the design details. Diffusers consists of three main classes: <a href="https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines" rel="nofollow">pipelines</a>, <a href="https://github.com/huggingface/diffusers/tree/main/src/diffusers/models" rel="nofollow">models</a>, and <a href="https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers" rel="nofollow">schedulers</a>. The design decisions for each class are described below.</p> <h3 class="relative group"><a id="流程pipelines" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0
with-hover:group-hover:opacity-100 with-hover:right-full" href="#流程pipelines"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Pipelines</span></h3> <p data-svelte-h="svelte-1l2p2pp">Pipelines are designed to be easy to use (and therefore do not fully follow <a href="#%E7%AE%80%E6%B4%81%E4%BC%98%E4%BA%8E%E7%AE%80%E6%98%93"><em>Simple over Easy</em></a>), are not required to be feature complete, and should be seen as examples of how to use <a href="#%E6%A8%A1%E5%9E%8B">models</a> and <a href="#%E8%B0%83%E5%BA%A6%E5%99%A8schedulers">schedulers</a> for inference.</p> <p data-svelte-h="svelte-hkd8r">The following design principles are followed:</p> <ul data-svelte-h="svelte-csdnjg"><li>Pipelines follow the single-file policy. All pipelines live in individual directories under src/diffusers/pipelines. One pipeline folder corresponds to one diffusion paper/project/release. A folder such as <a href="https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/stable_diffusion" rel="nofollow"><code>src/diffusers/pipelines/stable_diffusion</code></a> can contain multiple pipeline files. If pipelines share similar functionality, the <a href="https://github.com/huggingface/diffusers/blob/125d783076e5bd9785beb05367a2d2566843a271/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py#L251" rel="nofollow"># Copied from mechanism</a> can be used.</li> <li>All pipelines inherit from <code>DiffusionPipeline</code>.</li> <li>Every pipeline consists of different model and scheduler components, which are documented in the <a href="https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5/blob/main/model_index.json" rel="nofollow"><code>model_index.json</code> file</a>, are accessible as attributes of the same name, and, via <a
href="https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.components" rel="nofollow"><code>DiffusionPipeline.components</code></a>, can be shared between pipelines.</li> <li>Every pipeline should be loadable via <a href="https://huggingface.co/docs/diffusers/main/en/api/diffusion_pipeline#diffusers.DiffusionPipeline.from_pretrained" rel="nofollow"><code>DiffusionPipeline.from_pretrained</code></a>.</li> <li>Pipelines should be used <strong>only</strong> for inference.</li> <li>Pipeline code should be very readable, self-explanatory, and easy to modify.</li> <li>Pipelines should be designed to build on top of each other and to be easy to integrate into higher-level APIs.</li> <li>Pipelines are <strong>not</strong> intended to be feature-complete user interfaces. For a full-featured UI, see <a href="https://github.com/invoke-ai/InvokeAI" rel="nofollow">InvokeAI</a>, <a href="https://github.com/abhishekkrthakur/diffuzers" rel="nofollow">Diffuzers</a>, or <a href="https://github.com/Sanster/lama-cleaner" rel="nofollow">lama-cleaner</a>.</li> <li>Every pipeline should be run through one and only one <code>__call__</code> method, and argument names should be consistent across pipelines.</li> <li>Pipelines should be named after the task they are intended to solve.</li> <li>In almost all cases, new diffusion pipelines should be implemented in a new folder/file.</li></ul> <h3 class="relative group"><a id="模型" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#模型"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Models</span></h3> <p data-svelte-h="svelte-1cgqpug">Models are designed as configurable toolboxes, with the <a
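href="https://pytorch.org/docs/stable/generated/torch.nn.Module.html" rel="nofollow">PyTorch Module class</a> as the class they naturally extend. They only partly follow the <strong>single-file policy</strong>.</p> <p data-svelte-h="svelte-hkd8r">The following design principles are followed:</p> <ul data-svelte-h="svelte-18rj72v"><li>Models correspond to <strong>a type of model architecture</strong>. For example, the <code>UNet2DConditionModel</code> class is used for all UNet variants that expect 2D image inputs and are conditioned on some context.</li> <li>All models live in <a href="https://github.com/huggingface/diffusers/tree/main/src/diffusers/models" rel="nofollow"><code>src/diffusers/models</code></a>, and every architecture should be defined in its own file, e.g. <a href="https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unets/unet_2d_condition.py" rel="nofollow"><code>unets/unet_2d_condition.py</code></a>, <a href="https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/transformers/transformer_2d.py" rel="nofollow"><code>transformers/transformer_2d.py</code></a>, etc.</li> <li>Models do <strong>not</strong> follow the single-file policy and should make use of smaller building-block modules such as <a href="https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention.py" rel="nofollow"><code>attention.py</code></a>, <a href="https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/resnet.py" rel="nofollow"><code>resnet.py</code></a>, <a href="https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/embeddings.py" rel="nofollow"><code>embeddings.py</code></a>, etc. <strong>Note</strong>: This is in stark contrast to the modeling files in Transformers and shows that models do not fully follow the single-file policy.</li> <li>Models are intended to expose complexity (just like PyTorch's <code>Module</code> class) and to give clear error messages.</li> <li>All models inherit from <code>ModelMixin</code> and <code>ConfigMixin</code>.</li> <li>Models can be optimized for performance when doing so does not require major code changes, keeps backward compatibility, and gives a significant gain in memory or compute efficiency.</li> <li>Models should by default have the highest precision and the lowest performance setting.</li> <li>If a new model checkpoint can be classified as an existing architecture, the existing architecture should be adapted rather than a new file added. A new file should be created only if the architecture is fundamentally different.</li> <li>Models should be designed to be easy to extend in the future. This can be achieved by limiting public function arguments and configuration arguments, and by "anticipating" future changes. For example, prefer extendable <code>string</code>-typed arguments over boolean <code>is_..._type</code> arguments. Changes to existing architectures should be kept to a minimum.</li> <li>Model design is a trade-off between code readability and support for many checkpoints. In most cases existing classes should be adapted, with some exceptions (such as <a href="https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/unets/unet_2d_blocks.py" rel="nofollow">UNet blocks</a> and <a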
href="https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py" rel="nofollow">attention processors</a>) where new classes should be added to keep the code readable in the long term.</li></ul> <h3 class="relative group"><a id="调度器schedulers" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#调度器schedulers"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Schedulers</span></h3> <p data-svelte-h="svelte-195pti2">Schedulers are responsible for guiding the denoising process during inference and for defining the noise schedule used in training. They are designed as standalone classes with loadable configurations and strictly follow the <strong>single-file policy</strong>.</p> <p data-svelte-h="svelte-hkd8r">The following design principles are followed:</p> <ul data-svelte-h="svelte-17yyxkz"><li>All schedulers live in <a href="https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers" rel="nofollow"><code>src/diffusers/schedulers</code></a>.</li> <li>Schedulers are <strong>not</strong> allowed to import from large utility files and must be kept highly self-contained.</li> <li>One scheduler Python file corresponds to one algorithm (for example, as defined in a paper).</li> <li>If schedulers share similar functionality, the <code># Copied from</code> mechanism can be used.</li> <li>All schedulers inherit from <code>SchedulerMixin</code> and <code>ConfigMixin</code>.</li> <li>Schedulers can be easily swapped out via <a href="https://huggingface.co/docs/diffusers/main/en/api/configuration#diffusers.ConfigMixin.from_config" rel="nofollow"><code>ConfigMixin.from_config</code></a> (see <a href="../using-diffusers/schedulers">here</a> for details).</li>
<li>Every scheduler must have a <code>set_num_inference_steps</code> and a <code>step</code> function. <code>set_num_inference_steps(...)</code> must be called before every denoising process, i.e. before <code>step(...)</code> is called.</li> <li>Every scheduler exposes the timesteps to be "looped over" via a <code>timesteps</code> attribute, which is an array of the timesteps at which the model will be called.</li> <li>The <code>step(...)</code> function takes a predicted model output and the "current" sample (x_t) and returns the "previous", slightly more denoised sample (x_t-1).</li> <li>Given the complexity of diffusion schedulers, the <code>step</code> function does not expose all of the details and can be treated as a "black box".</li> <li>In almost all cases, new schedulers should be implemented in a new file.</li></ul> <p></p>
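<p>The scheduler contract listed above (a setup call, a <code>timesteps</code> attribute to loop over, and a <code>step</code> function) can be illustrated with a minimal, dependency-free sketch. Everything in it is an assumption made for illustration: <code>ToyScheduler</code>, <code>toy_model</code>, and <code>denoise</code> are hypothetical names, the update rule is a toy one, and none of it is the Diffusers implementation. The setup method is named as on this page; the library's own scheduler classes name the corresponding method <code>set_timesteps</code>.</p>

```python
# Toy sketch of the scheduler contract; not the Diffusers implementation.
# ToyScheduler, toy_model and denoise are hypothetical names.


class ToyScheduler:
    """Minimal stand-in exposing set_num_inference_steps, timesteps and step."""

    def __init__(self, num_train_timesteps=1000):
        self.num_train_timesteps = num_train_timesteps
        self.timesteps = []  # filled in by set_num_inference_steps

    def set_num_inference_steps(self, num_inference_steps):
        # Evenly spaced timesteps, ordered from most noisy to least noisy.
        stride = self.num_train_timesteps // num_inference_steps
        self.timesteps = [i * stride for i in reversed(range(num_inference_steps))]

    def step(self, model_output, timestep, sample):
        # Fail loudly rather than silently guessing a schedule.
        if timestep not in self.timesteps:
            raise ValueError(
                f"timestep {timestep} is not scheduled; "
                "call set_num_inference_steps(...) first"
            )
        # Toy update rule (an assumption): remove half of the predicted noise.
        return [x - 0.5 * eps for x, eps in zip(sample, model_output)]


def toy_model(sample, timestep):
    # Hypothetical "model": predicts that the whole sample is noise.
    return list(sample)


def denoise(sample, scheduler, num_inference_steps):
    # The unrolled denoising loop that the user writes explicitly.
    scheduler.set_num_inference_steps(num_inference_steps)
    for t in scheduler.timesteps:
        model_output = toy_model(sample, t)
        sample = scheduler.step(model_output, t, sample)
    return sample


scheduler = ToyScheduler()
result = denoise([1.0, -2.0], scheduler, num_inference_steps=4)
print(scheduler.timesteps)
print(result)
```

<p>Because the model and the scheduler only meet inside the loop, either one can be swapped without touching the other, which is the trade-off the <em>Simple over Easy</em> section describes.</p>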
<script>
  {
    __sveltekit_pa8cr6 = {
      assets: "/docs/diffusers/pr_11739/zh",
      base: "/docs/diffusers/pr_11739/zh",
      env: {}
    };
    const element = document.currentScript.parentElement;
    const data = [null,null];
    Promise.all([
      import("/docs/diffusers/pr_11739/zh/_app/immutable/entry/start.95a8faef.js"),
      import("/docs/diffusers/pr_11739/zh/_app/immutable/entry/app.a988cdaf.js")
    ]).then(([kit, app]) => {
      kit.start(app, element, {
        node_ids: [0, 6],
        data,
        form: null,
        error: null
      });
    });
  }
</script>