| # DDIMScheduler | |
| [Denoising Diffusion Implicit Models](https://huggingface.co/papers/2010.02502) (DDIM) by Jiaming Song, Chenlin Meng and Stefano Ermon. | |
| The abstract from the paper is: | |
| *Denoising diffusion probabilistic models (DDPMs) have achieved high quality image generation without adversarial training, yet they require simulating a Markov chain for many steps to produce a sample. | |
| To accelerate sampling, we present denoising diffusion implicit models (DDIMs), a more efficient class of iterative implicit probabilistic models | |
| with the same training procedure as DDPMs. In DDPMs, the generative process is defined as the reverse of a Markovian diffusion process. | |
| We construct a class of non-Markovian diffusion processes that lead to the same training objective, but whose reverse process can be much faster to sample from. | |
| We empirically demonstrate that DDIMs can produce high quality samples 10× to 50× faster in terms of wall-clock time compared to DDPMs, allow us to trade off computation for sample quality, and can perform semantically meaningful image interpolation directly in the latent space.* | |
| The original codebase of this paper can be found at [ermongroup/ddim](https://github.com/ermongroup/ddim), and you can contact the author on [tsong.me](https://tsong.me/). | |
| ## Tips | |
| The paper [Common Diffusion Noise Schedules and Sample Steps are Flawed](https://huggingface.co/papers/2305.08891) claims that a mismatch between the training and inference settings leads to suboptimal inference generation results for Stable Diffusion. To fix this, the authors propose: | |
| > [!WARNING] | |
| > 🧪 This is an experimental feature! | |
| 1. rescale the noise schedule to enforce zero terminal signal-to-noise ratio (SNR) | |
| ```py | |
| pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config, rescale_betas_zero_snr=True) | |
| ``` | |
| 2. train a model with `v_prediction` (add the following argument to the [train_text_to_image.py](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image.py) or [train_text_to_image_lora.py](https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image_lora.py) scripts) | |
| ```bash | |
| --prediction_type="v_prediction" | |
| ``` | |
| 3. change the sampler to always start from the last timestep | |
| ```py | |
| pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config, timestep_spacing="trailing") | |
| ``` | |
| 4. rescale classifier-free guidance to prevent over-exposure | |
| ```py | |
| image = pipe(prompt, guidance_rescale=0.7).images[0] | |
| ``` | |
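As a rough illustration of step 4, the sketch below applies classifier-free guidance and then rescales the result as described in the paper. It is not the pipeline's code: the function name `rescale_guidance` and the use of flat Python lists instead of tensors are assumptions made to keep the example self-contained.

```python
import statistics


def rescale_guidance(noise_uncond, noise_text, guidance_scale, guidance_rescale=0.7):
    """Hypothetical helper: classifier-free guidance followed by std rescaling.

    Inputs are flat lists of floats standing in for one sample's noise predictions.
    """
    # Standard classifier-free guidance.
    noise_cfg = [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]

    # Match the standard deviation of the text-conditioned prediction.
    std_text = statistics.pstdev(noise_text)
    std_cfg = statistics.pstdev(noise_cfg)
    rescaled = [x * std_text / std_cfg for x in noise_cfg]

    # Blend the rescaled and original guided predictions.
    return [
        guidance_rescale * r + (1.0 - guidance_rescale) * x
        for r, x in zip(rescaled, noise_cfg)
    ]
```

With `guidance_rescale=0.0` this reduces to plain classifier-free guidance, and with `guidance_rescale=1.0` the guided prediction takes on the standard deviation of the text-conditioned prediction.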
| For example: | |
| ```py | |
| from diffusers import DiffusionPipeline, DDIMScheduler | |
| import torch | |
| pipe = DiffusionPipeline.from_pretrained("ptx0/pseudo-journey-v2", torch_dtype=torch.float16) | |
| pipe.scheduler = DDIMScheduler.from_config( | |
| pipe.scheduler.config, rescale_betas_zero_snr=True, timestep_spacing="trailing" | |
| ) | |
| pipe.to("cuda") | |
| prompt = "A lion in galaxies, spirals, nebulae, stars, smoke, iridescent, intricate detail, octane render, 8k" | |
| image = pipe(prompt, guidance_rescale=0.7).images[0] | |
| image | |
| ``` | |
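To make step 1 more concrete, here is a minimal sketch of rescaling a beta schedule to zero terminal SNR, following the algorithm given in the paper. The helper names `linear_betas` and `rescale_zero_terminal_snr` are hypothetical, and plain Python lists are used in place of tensors; the scheduler's own implementation may differ in detail.

```python
import math


def linear_betas(n=1000, beta_start=0.0001, beta_end=0.02):
    """Hypothetical helper: a linear beta schedule with n steps."""
    return [beta_start + (beta_end - beta_start) * i / (n - 1) for i in range(n)]


def rescale_zero_terminal_snr(betas):
    """Rescale betas so the cumulative alpha product is exactly 0 at the last step."""
    # Cumulative product of alphas = 1 - betas.
    alphas_bar, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alphas_bar.append(prod)
    sqrt_ab = [math.sqrt(a) for a in alphas_bar]

    first, last = sqrt_ab[0], sqrt_ab[-1]
    # Shift so the last value becomes zero, then scale so the first value is unchanged.
    sqrt_ab = [(s - last) * first / (first - last) for s in sqrt_ab]

    # Convert the rescaled cumulative products back to per-step betas.
    alphas_bar = [s * s for s in sqrt_ab]
    alphas = [alphas_bar[0]] + [
        alphas_bar[i] / alphas_bar[i - 1] for i in range(1, len(alphas_bar))
    ]
    return [1.0 - a for a in alphas]


betas = linear_betas()
rescaled = rescale_zero_terminal_snr(betas)
```

After rescaling, the final beta equals 1, so the last timestep carries no signal, which is why the model should then be sampled starting from that last timestep (step 3).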
| ## DDIMScheduler[[diffusers.DDIMScheduler]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class diffusers.DDIMScheduler</name><anchor>diffusers.DDIMScheduler</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/schedulers/scheduling_ddim.py#L131</source><parameters>[{"name": "num_train_timesteps", "val": ": int = 1000"}, {"name": "beta_start", "val": ": float = 0.0001"}, {"name": "beta_end", "val": ": float = 0.02"}, {"name": "beta_schedule", "val": ": str = 'linear'"}, {"name": "trained_betas", "val": ": typing.Union[numpy.ndarray, typing.List[float], NoneType] = None"}, {"name": "clip_sample", "val": ": bool = True"}, {"name": "set_alpha_to_one", "val": ": bool = True"}, {"name": "steps_offset", "val": ": int = 0"}, {"name": "prediction_type", "val": ": str = 'epsilon'"}, {"name": "thresholding", "val": ": bool = False"}, {"name": "dynamic_thresholding_ratio", "val": ": float = 0.995"}, {"name": "clip_sample_range", "val": ": float = 1.0"}, {"name": "sample_max_value", "val": ": float = 1.0"}, {"name": "timestep_spacing", "val": ": str = 'leading'"}, {"name": "rescale_betas_zero_snr", "val": ": bool = False"}]</parameters><paramsdesc>- **num_train_timesteps** (`int`, defaults to 1000) -- | |
| The number of diffusion steps to train the model. | |
| - **beta_start** (`float`, defaults to 0.0001) -- | |
| The starting `beta` value of inference. | |
| - **beta_end** (`float`, defaults to 0.02) -- | |
| The final `beta` value. | |
| - **beta_schedule** (`str`, defaults to `"linear"`) -- | |
| The beta schedule, a mapping from a beta range to a sequence of betas for stepping the model. Choose from | |
| `linear`, `scaled_linear`, or `squaredcos_cap_v2`. | |
| - **trained_betas** (`np.ndarray`, *optional*) -- | |
| Pass an array of betas directly to the constructor to bypass `beta_start` and `beta_end`. | |
| - **clip_sample** (`bool`, defaults to `True`) -- | |
| Clip the predicted sample for numerical stability. | |
| - **clip_sample_range** (`float`, defaults to 1.0) -- | |
| The maximum magnitude for sample clipping. Valid only when `clip_sample=True`. | |
| - **set_alpha_to_one** (`bool`, defaults to `True`) -- | |
| Each diffusion step uses the alphas product value at that step and at the previous one. For the final step | |
| there is no previous alpha. When this option is `True` the previous alpha product is fixed to `1`, | |
| otherwise it uses the alpha value at step 0. | |
| - **steps_offset** (`int`, defaults to 0) -- | |
| An offset added to the inference steps, as required by some model families. | |
| - **prediction_type** (`str`, defaults to `epsilon`, *optional*) -- | |
| Prediction type of the scheduler function; can be `epsilon` (predicts the noise of the diffusion process), | |
| `sample` (directly predicts the denoised sample) or `v_prediction` (see section 2.4 of the [Imagen | |
| Video](https://imagen.research.google/video/paper.pdf) paper). | |
| - **thresholding** (`bool`, defaults to `False`) -- | |
| Whether to use the "dynamic thresholding" method. This is unsuitable for latent-space diffusion models such | |
| as Stable Diffusion. | |
| - **dynamic_thresholding_ratio** (`float`, defaults to 0.995) -- | |
| The ratio for the dynamic thresholding method. Valid only when `thresholding=True`. | |
| - **sample_max_value** (`float`, defaults to 1.0) -- | |
| The threshold value for dynamic thresholding. Valid only when `thresholding=True`. | |
| - **timestep_spacing** (`str`, defaults to `"leading"`) -- | |
| The way the timesteps should be scaled. Refer to Table 2 of the [Common Diffusion Noise Schedules and | |
| Sample Steps are Flawed](https://huggingface.co/papers/2305.08891) paper for more information. | |
| - **rescale_betas_zero_snr** (`bool`, defaults to `False`) -- | |
| Whether to rescale the betas to have zero terminal SNR. This enables the model to generate very bright and | |
| dark samples instead of limiting it to samples with medium brightness. Loosely related to | |
| [`--offset_noise`](https://github.com/huggingface/diffusers/blob/74fd735eb073eb1d774b1ab4154a0876eb82f055/examples/dreambooth/train_dreambooth.py#L506).</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| `DDIMScheduler` extends the denoising procedure introduced in denoising diffusion probabilistic models (DDPMs) with | |
| non-Markovian guidance. | |
| This model inherits from [SchedulerMixin](/docs/diffusers/pr_12229/en/api/schedulers/overview#diffusers.SchedulerMixin) and [ConfigMixin](/docs/diffusers/pr_12229/en/api/configuration#diffusers.ConfigMixin). Check the superclass documentation for the generic | |
| methods the library implements for all schedulers such as loading and saving. | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>scale_model_input</name><anchor>diffusers.DDIMScheduler.scale_model_input</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/schedulers/scheduling_ddim.py#L236</source><parameters>[{"name": "sample", "val": ": Tensor"}, {"name": "timestep", "val": ": typing.Optional[int] = None"}]</parameters><paramsdesc>- **sample** (`torch.Tensor`) -- | |
| The input sample. | |
| - **timestep** (`int`, *optional*) -- | |
| The current timestep in the diffusion chain.</paramsdesc><paramgroups>0</paramgroups><rettype>`torch.Tensor`</rettype><retdesc>A scaled input sample.</retdesc></docstring> | |
| Ensures interchangeability with schedulers that need to scale the denoising model input depending on the | |
| current timestep. | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>set_timesteps</name><anchor>diffusers.DDIMScheduler.set_timesteps</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/schedulers/scheduling_ddim.py#L297</source><parameters>[{"name": "num_inference_steps", "val": ": int"}, {"name": "device", "val": ": typing.Union[str, torch.device] = None"}]</parameters><paramsdesc>- **num_inference_steps** (`int`) -- | |
| The number of diffusion steps used when generating samples with a pre-trained model.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Sets the discrete timesteps used for the diffusion chain (to be run before inference). | |
| </div> | |
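The effect of `timestep_spacing` on the timesteps chosen here can be sketched in plain Python. This is an approximation of the `"leading"` and `"trailing"` options under the assumption that they follow the paper's Table 2; the function names are hypothetical and the library's rounding may differ in edge cases.

```python
def leading_timesteps(num_train_timesteps, num_inference_steps, steps_offset=0):
    """Hypothetical helper: evenly spaced timesteps anchored at timestep 0."""
    step_ratio = num_train_timesteps // num_inference_steps
    return [i * step_ratio + steps_offset for i in reversed(range(num_inference_steps))]


def trailing_timesteps(num_train_timesteps, num_inference_steps):
    """Hypothetical helper: evenly spaced timesteps anchored at the last training timestep."""
    step_ratio = num_train_timesteps / num_inference_steps
    return [
        round(num_train_timesteps - i * step_ratio) - 1
        for i in range(num_inference_steps)
    ]


print(leading_timesteps(1000, 4))   # [750, 500, 250, 0]
print(trailing_timesteps(1000, 4))  # [999, 749, 499, 249]
```

Only the trailing variant includes the final training timestep (999 here), which is what a schedule rescaled to zero terminal SNR needs at inference.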
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>step</name><anchor>diffusers.DDIMScheduler.step</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/schedulers/scheduling_ddim.py#L342</source><parameters>[{"name": "model_output", "val": ": Tensor"}, {"name": "timestep", "val": ": int"}, {"name": "sample", "val": ": Tensor"}, {"name": "eta", "val": ": float = 0.0"}, {"name": "use_clipped_model_output", "val": ": bool = False"}, {"name": "generator", "val": " = None"}, {"name": "variance_noise", "val": ": typing.Optional[torch.Tensor] = None"}, {"name": "return_dict", "val": ": bool = True"}]</parameters><paramsdesc>- **model_output** (`torch.Tensor`) -- | |
| The direct output from the learned diffusion model. | |
| - **timestep** (`int`) -- | |
| The current discrete timestep in the diffusion chain. | |
| - **sample** (`torch.Tensor`) -- | |
| A current instance of a sample created by the diffusion process. | |
| - **eta** (`float`, defaults to 0.0) -- | |
| The weight of the noise added at each diffusion step. Corresponds to η in the DDIM paper, where 0 gives deterministic sampling. | |
| - **use_clipped_model_output** (`bool`, defaults to `False`) -- | |
| If `True`, computes a "corrected" `model_output` from the clipped predicted original sample. Necessary | |
| because the predicted original sample is clipped to [-1, 1] when `self.config.clip_sample` is `True`. If no | |
| clipping has happened, the "corrected" `model_output` coincides with the one provided as input and | |
| `use_clipped_model_output` has no effect. | |
| - **generator** (`torch.Generator`, *optional*) -- | |
| A random number generator. | |
| - **variance_noise** (`torch.Tensor`, *optional*) -- | |
| Alternative to generating noise with `generator` by directly providing the noise for the variance | |
| itself. Useful for methods such as `CycleDiffusion`. | |
| - **return_dict** (`bool`, *optional*, defaults to `True`) -- | |
| Whether or not to return a [DDIMSchedulerOutput](/docs/diffusers/pr_12229/en/api/schedulers/ddim#diffusers.schedulers.scheduling_ddim.DDIMSchedulerOutput) or `tuple`.</paramsdesc><paramgroups>0</paramgroups><rettype>[DDIMSchedulerOutput](/docs/diffusers/pr_12229/en/api/schedulers/ddim#diffusers.schedulers.scheduling_ddim.DDIMSchedulerOutput) or `tuple`</rettype><retdesc>If return_dict is `True`, [DDIMSchedulerOutput](/docs/diffusers/pr_12229/en/api/schedulers/ddim#diffusers.schedulers.scheduling_ddim.DDIMSchedulerOutput) is returned, otherwise a | |
| tuple is returned where the first element is the sample tensor.</retdesc></docstring> | |
| Predict the sample from the previous timestep by reversing the SDE. This function propagates the diffusion | |
| process from the learned model outputs (most often the predicted noise). | |
| </div></div> | |
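For intuition, the update performed by `step` can be sketched on scalars for the `epsilon` prediction type, using the sampling equation from the DDIM paper. This is a simplified illustration and not the scheduler's code: the function name `ddim_step`, the argument names, and the omission of clipping and thresholding are all assumptions.

```python
import math


def ddim_step(x_t, eps, alpha_bar_t, alpha_bar_prev, eta=0.0, z=0.0):
    """Hypothetical scalar DDIM update for an epsilon-predicting model.

    alpha_bar_t and alpha_bar_prev are the cumulative alpha products at the
    current and previous timesteps; z is a standard normal draw, used only if eta > 0.
    """
    # Estimate the clean sample x_0 from the noisy sample and the predicted noise.
    pred_x0 = (x_t - math.sqrt(1.0 - alpha_bar_t) * eps) / math.sqrt(alpha_bar_t)

    # Noise scale; eta = 0 makes the update deterministic.
    sigma = (
        eta
        * math.sqrt((1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t))
        * math.sqrt(1.0 - alpha_bar_t / alpha_bar_prev)
    )

    # Component pointing back toward x_t, then combine all three terms.
    direction = math.sqrt(1.0 - alpha_bar_prev - sigma**2) * eps
    x_prev = math.sqrt(alpha_bar_prev) * pred_x0 + direction + sigma * z
    return x_prev, pred_x0
```

The two returned values play the roles of `prev_sample` and `pred_original_sample` in the output class below. With `eta=0.0` and a perfect noise prediction, the update maps a sample noised to level `alpha_bar_t` onto the same sample noised to level `alpha_bar_prev`.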
| ## DDIMSchedulerOutput[[diffusers.schedulers.scheduling_ddim.DDIMSchedulerOutput]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class diffusers.schedulers.scheduling_ddim.DDIMSchedulerOutput</name><anchor>diffusers.schedulers.scheduling_ddim.DDIMSchedulerOutput</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/schedulers/scheduling_ddim.py#L33</source><parameters>[{"name": "prev_sample", "val": ": Tensor"}, {"name": "pred_original_sample", "val": ": typing.Optional[torch.Tensor] = None"}]</parameters><paramsdesc>- **prev_sample** (`torch.Tensor` of shape `(batch_size, num_channels, height, width)` for images) -- | |
| Computed sample `(x_{t-1})` of previous timestep. `prev_sample` should be used as next model input in the | |
| denoising loop. | |
| - **pred_original_sample** (`torch.Tensor` of shape `(batch_size, num_channels, height, width)` for images) -- | |
| The predicted denoised sample `(x_{0})` based on the model output from the current timestep. | |
| `pred_original_sample` can be used to preview progress or for guidance.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Output class for the scheduler's `step` function output. | |
| </div> | |