# VQDiffusionScheduler

`VQDiffusionScheduler` converts the transformer model's output into a sample for the unnoised image at the previous diffusion timestep. It was introduced in [Vector Quantized Diffusion Model for Text-to-Image Synthesis](https://huggingface.co/papers/2111.14822) by Shuyang Gu, Dong Chen, Jianmin Bao, Fang Wen, Bo Zhang, Dongdong Chen, Lu Yuan, Baining Guo.

The abstract from the paper is:

*We present the vector quantized diffusion (VQ-Diffusion) model for text-to-image generation. This method is based on a vector quantized variational autoencoder (VQ-VAE) whose latent space is modeled by a conditional variant of the recently developed Denoising Diffusion Probabilistic Model (DDPM). We find that this latent-space method is well-suited for text-to-image generation tasks because it not only eliminates the unidirectional bias with existing methods but also allows us to incorporate a mask-and-replace diffusion strategy to avoid the accumulation of errors, which is a serious problem with existing methods. Our experiments show that the VQ-Diffusion produces significantly better text-to-image generation results when compared with conventional autoregressive (AR) models with similar numbers of parameters. Compared with previous GAN-based text-to-image methods, our VQ-Diffusion can handle more complex scenes and improve the synthesized image quality by a large margin. Finally, we show that the image generation computation in our method can be made highly efficient by reparameterization. With traditional AR methods, the text-to-image generation time increases linearly with the output image resolution and hence is quite time consuming even for normal size images. The VQ-Diffusion allows us to achieve a better trade-off between quality and speed. Our experiments indicate that the VQ-Diffusion model with the reparameterization is fifteen times faster than traditional AR methods while achieving a better image quality.*
## VQDiffusionScheduler[[diffusers.VQDiffusionScheduler]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class diffusers.VQDiffusionScheduler</name><anchor>diffusers.VQDiffusionScheduler</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/schedulers/scheduling_vq_diffusion.py#L106</source><parameters>[{"name": "num_vec_classes", "val": ": int"}, {"name": "num_train_timesteps", "val": ": int = 100"}, {"name": "alpha_cum_start", "val": ": float = 0.99999"}, {"name": "alpha_cum_end", "val": ": float = 9e-06"}, {"name": "gamma_cum_start", "val": ": float = 9e-06"}, {"name": "gamma_cum_end", "val": ": float = 0.99999"}]</parameters><paramsdesc>- **num_vec_classes** (`int`) --
The number of classes of the vector embeddings of the latent pixels, including the class for the masked
latent pixel.
- **num_train_timesteps** (`int`, defaults to 100) --
The number of diffusion steps used to train the model.
- **alpha_cum_start** (`float`, defaults to 0.99999) --
The starting cumulative alpha value.
- **alpha_cum_end** (`float`, defaults to 9e-06) --
The ending cumulative alpha value.
- **gamma_cum_start** (`float`, defaults to 9e-06) --
The starting cumulative gamma value.
- **gamma_cum_end** (`float`, defaults to 0.99999) --
The ending cumulative gamma value.</paramsdesc><paramgroups>0</paramgroups></docstring>
A scheduler for vector quantized diffusion.
This model inherits from [SchedulerMixin](/docs/diffusers/pr_12595/en/api/schedulers/overview#diffusers.SchedulerMixin) and [ConfigMixin](/docs/diffusers/pr_12595/en/api/configuration#diffusers.ConfigMixin). Check the superclass documentation for the generic
methods the library implements for all schedulers, such as loading and saving.
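The four `*_cum_*` defaults define schedules that move in opposite directions: the cumulative alpha starts near 1 and decays toward 0 over the `num_train_timesteps` training steps, while the cumulative gamma (the probability that a pixel has been masked) grows from near 0 toward 1, so late timesteps are almost fully masked. A minimal sketch of such a schedule, assuming simple linear interpolation between the endpoints (the scheduler's exact internal interpolation may differ):

```python
import numpy as np

def cumulative_schedule(start: float, end: float, num_train_timesteps: int) -> np.ndarray:
    """Linearly interpolate a cumulative value from `start` at t=1 to `end` at t=T."""
    t = np.arange(1, num_train_timesteps + 1)
    return start + (end - start) * (t - 1) / (num_train_timesteps - 1)

# Defaults from VQDiffusionScheduler: alphas decay, gammas (mask probability) grow.
alpha_cum = cumulative_schedule(0.99999, 9e-06, 100)
gamma_cum = cumulative_schedule(9e-06, 0.99999, 100)
```

With the default endpoints, `alpha_cum` runs from 0.99999 down to 9e-06 and `gamma_cum` mirrors it in the opposite direction.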
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>log_Q_t_transitioning_to_known_class</name><anchor>diffusers.VQDiffusionScheduler.log_Q_t_transitioning_to_known_class</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/schedulers/scheduling_vq_diffusion.py#L356</source><parameters>[{"name": "t", "val": ": torch.int32"}, {"name": "x_t", "val": ": LongTensor"}, {"name": "log_onehot_x_t", "val": ": Tensor"}, {"name": "cumulative", "val": ": bool"}]</parameters><paramsdesc>- **t** (`torch.long`) --
The timestep that determines which transition matrix is used.
- **x_t** (`torch.LongTensor` of shape `(batch size, num latent pixels)`) --
The classes of each latent pixel at time `t`.
- **log_onehot_x_t** (`torch.Tensor` of shape `(batch size, num classes, num latent pixels)`) --
The log one-hot vectors of `x_t`.
- **cumulative** (`bool`) --
If `cumulative` is `False`, the single-step transition matrix `t-1`->`t` is used. If `cumulative` is
`True`, the cumulative transition matrix `0`->`t` is used.</paramsdesc><paramgroups>0</paramgroups><rettype>`torch.Tensor` of shape `(batch size, num classes - 1, num latent pixels)`</rettype><retdesc>Each _column_ of the returned matrix is a _row_ of log probabilities of the complete probability
transition matrix.
When non-cumulative, the result has `self.num_classes - 1` rows because the initial latent pixel cannot be
masked.
Where:
- `q_n` is the probability distribution for the forward process of the `n`th latent pixel.
- `C_0` is a class of a latent pixel embedding.
- `C_k` is the class of the masked latent pixel.
Non-cumulative result (omitting logarithms):
```
q_0(x_t | x_{t-1} = C_0) ... q_n(x_t | x_{t-1} = C_0)
         .    .    .
         .    .    .
         .    .    .
q_0(x_t | x_{t-1} = C_k) ... q_n(x_t | x_{t-1} = C_k)
```
Cumulative result (omitting logarithms):
```
q_0_cumulative(x_t | x_0 = C_0)     ... q_n_cumulative(x_t | x_0 = C_0)
         .    .    .
         .    .    .
         .    .    .
q_0_cumulative(x_t | x_0 = C_{k-1}) ... q_n_cumulative(x_t | x_0 = C_{k-1})
```</retdesc></docstring>
Calculates the log probabilities of the rows from the (cumulative or non-cumulative) transition matrix for each
latent pixel in `x_t`.
</div>
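The single-step and cumulative matrices this method indexes are related by matrix multiplication: composing the per-step transitions `0`->`1`->...->`t` yields the cumulative transition `0`->`t`. A toy NumPy sketch of a mask-and-replace transition matrix and its composition (the `alpha`/`gamma` values and the row/column orientation here are illustrative assumptions, not the scheduler's internals):

```python
import numpy as np

def transition_matrix(alpha: float, gamma: float, num_classes: int) -> np.ndarray:
    """One-step mask-and-replace transition matrix Q_t.

    Row i, column j holds q(x_t = j | x_{t-1} = i). The last class is the
    absorbing [MASK] token: with probability `gamma` a pixel becomes masked,
    with probability `alpha` it keeps its class, and the remainder is spread
    uniformly over the non-mask classes.
    """
    k = num_classes - 1                   # number of non-mask classes
    beta = (1.0 - alpha - gamma) / k      # uniform replacement probability
    Q = np.full((num_classes, num_classes), beta)
    np.fill_diagonal(Q, alpha + beta)
    Q[:, -1] = gamma                      # transitions into the mask class
    Q[-1, :] = 0.0
    Q[-1, -1] = 1.0                       # the mask class is absorbing
    return Q

Q1 = transition_matrix(alpha=0.9, gamma=0.05, num_classes=4)
Q2 = transition_matrix(alpha=0.8, gamma=0.10, num_classes=4)
Q_cum = Q1 @ Q2                           # cumulative transition 0 -> t
```

Each row of `Q1`, `Q2`, and `Q_cum` is a probability distribution, and the mask class stays absorbing under composition.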
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>q_posterior</name><anchor>diffusers.VQDiffusionScheduler.q_posterior</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/schedulers/scheduling_vq_diffusion.py#L245</source><parameters>[{"name": "log_p_x_0", "val": ""}, {"name": "x_t", "val": ""}, {"name": "t", "val": ""}]</parameters><paramsdesc>- **log_p_x_0** (`torch.Tensor` of shape `(batch size, num classes - 1, num latent pixels)`) --
The log probabilities for the predicted classes of the initial latent pixels. Does not include a
prediction for the masked class because the initial unnoised image cannot be masked.
- **x_t** (`torch.LongTensor` of shape `(batch size, num latent pixels)`) --
The classes of each latent pixel at time `t`.
- **t** (`torch.long`) --
The timestep that determines which transition matrix is used.</paramsdesc><paramgroups>0</paramgroups><rettype>`torch.Tensor` of shape `(batch size, num classes, num latent pixels)`</rettype><retdesc>The log probabilities for the predicted classes of the image at timestep `t-1`.</retdesc></docstring>
<ExampleCodeBlock anchor="diffusers.VQDiffusionScheduler.q_posterior.example">
Calculates the log probabilities for the predicted classes of the image at timestep `t-1`:
```
p(x_{t-1} | x_t) = sum( q(x_t | x_{t-1}) * q(x_{t-1} | x_0) * p(x_0) / q(x_t | x_0) )
```
</ExampleCodeBlock>
</div>
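The marginalization in that formula can be checked on toy data: for each candidate class of `x_{t-1}`, sum the Bayes-rule posterior over every possible `x_0`, weighted by the model's prediction `p(x_0)`. A sketch with random stochastic matrices standing in for the real transition matrices (the actual implementation works in log space for numerical stability):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_stochastic(n: int) -> np.ndarray:
    """A random row-stochastic matrix: row i holds a distribution over columns."""
    m = rng.random((n, n))
    return m / m.sum(axis=1, keepdims=True)

K = 3
Q_t = random_stochastic(K)         # q(x_t = j | x_{t-1} = i) in row i, column j
Q_cum_prev = random_stochastic(K)  # q(x_{t-1} = j | x_0 = i)
Q_cum = Q_cum_prev @ Q_t           # q(x_t = j | x_0 = i) by composition
p_x0 = rng.dirichlet(np.ones(K))   # model's predicted distribution over x_0

x_t = 1  # observed class of the latent pixel at time t

# p(x_{t-1} = j | x_t) = sum_{x_0} q(x_t | x_{t-1}=j) q(x_{t-1}=j | x_0) p(x_0) / q(x_t | x_0)
p_prev = np.zeros(K)
for j in range(K):
    for c in range(K):
        p_prev[j] += Q_t[j, x_t] * Q_cum_prev[c, j] * p_x0[c] / Q_cum[c, x_t]
```

Because the inner Bayes-rule term is a proper posterior for every fixed `x_0`, the mixture `p_prev` is itself a valid distribution over the classes of `x_{t-1}`.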
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>set_timesteps</name><anchor>diffusers.VQDiffusionScheduler.set_timesteps</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/schedulers/scheduling_vq_diffusion.py#L178</source><parameters>[{"name": "num_inference_steps", "val": ": int"}, {"name": "device", "val": ": typing.Union[str, torch.device] = None"}]</parameters><paramsdesc>- **num_inference_steps** (`int`) --
The number of diffusion steps used when generating samples with a pre-trained model.
- **device** (`str` or `torch.device`, *optional*) --
The device to which the timesteps and diffusion process parameters (alpha, beta, gamma) should be
moved.</paramsdesc><paramgroups>0</paramgroups></docstring>
Sets the discrete timesteps used for the diffusion chain (to be run before inference).
</div>
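For this scheduler the inference timesteps run from most to least noisy, i.e. from the highest timestep down to 0. A sketch of what `set_timesteps(10)` plausibly produces (the exact construction is an assumption based on the description above):

```python
import numpy as np

num_inference_steps = 10
# Denoising visits timesteps in reverse order: T-1, T-2, ..., 0.
timesteps = np.arange(num_inference_steps - 1, -1, -1)
print(timesteps)  # [9 8 7 6 5 4 3 2 1 0]
```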
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>step</name><anchor>diffusers.VQDiffusionScheduler.step</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/schedulers/scheduling_vq_diffusion.py#L200</source><parameters>[{"name": "model_output", "val": ": Tensor"}, {"name": "timestep", "val": ": torch.int64"}, {"name": "sample", "val": ": LongTensor"}, {"name": "generator", "val": ": typing.Optional[torch._C.Generator] = None"}, {"name": "return_dict", "val": ": bool = True"}]</parameters><paramsdesc>- **log_p_x_0** (`torch.Tensor` of shape `(batch size, num classes - 1, num latent pixels)`) --
The log probabilities for the predicted classes of the initial latent pixels. Does not include a
prediction for the masked class because the initial unnoised image cannot be masked.
- **t** (`torch.long`) --
The timestep that determines which transition matrices are used.
- **x_t** (`torch.LongTensor` of shape `(batch size, num latent pixels)`) --
The classes of each latent pixel at time `t`.
- **generator** (`torch.Generator` or `None`) --
A random number generator for the noise applied to `p(x_{t-1} | x_t)` before it is sampled from.
- **return_dict** (`bool`, *optional*, defaults to `True`) --
Whether or not to return a [VQDiffusionSchedulerOutput](/docs/diffusers/pr_12595/en/api/schedulers/vq_diffusion#diffusers.schedulers.scheduling_vq_diffusion.VQDiffusionSchedulerOutput) or
`tuple`.</paramsdesc><paramgroups>0</paramgroups><rettype>[VQDiffusionSchedulerOutput](/docs/diffusers/pr_12595/en/api/schedulers/vq_diffusion#diffusers.schedulers.scheduling_vq_diffusion.VQDiffusionSchedulerOutput) or `tuple`</rettype><retdesc>If `return_dict` is `True`, [VQDiffusionSchedulerOutput](/docs/diffusers/pr_12595/en/api/schedulers/vq_diffusion#diffusers.schedulers.scheduling_vq_diffusion.VQDiffusionSchedulerOutput) is
returned, otherwise a tuple is returned where the first element is the sample tensor.</retdesc></docstring>
Predicts the sample at the previous timestep by the reverse transition distribution. See
[q_posterior()](/docs/diffusers/pr_12595/en/api/schedulers/vq_diffusion#diffusers.VQDiffusionScheduler.q_posterior) for more details about how the distribution is computed.
</div></div>
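After `q_posterior()` produces the log probabilities for timestep `t-1`, a class must still be drawn from them per latent pixel; the `generator` argument controls the noise used for that draw. One standard way to sample a categorical from log probabilities is the Gumbel-max trick, sketched here in NumPy (an illustration of the idea, not necessarily the scheduler's exact sampling code):

```python
import numpy as np

rng = np.random.default_rng(0)

def log_sample_categorical(log_probs: np.ndarray) -> np.ndarray:
    """Sample class indices from per-pixel log probabilities via the Gumbel-max trick.

    `log_probs` has shape (num_classes, num_latent_pixels); the result has
    shape (num_latent_pixels,), like the scheduler's `prev_sample`.
    """
    # Adding Gumbel noise and taking the argmax samples from the categorical
    # distribution defined by the log probabilities.
    gumbel = -np.log(-np.log(rng.uniform(1e-30, 1.0, size=log_probs.shape)))
    return np.argmax(log_probs + gumbel, axis=0)

num_classes, num_pixels = 8, 16
logits = rng.normal(size=(num_classes, num_pixels))
log_probs = logits - np.log(np.exp(logits).sum(axis=0, keepdims=True))
prev_sample = log_sample_categorical(log_probs)
```

Seeding the generator makes the draw reproducible, which is what passing a `torch.Generator` to `step()` achieves in the real API.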
## VQDiffusionSchedulerOutput[[diffusers.schedulers.scheduling_vq_diffusion.VQDiffusionSchedulerOutput]]
<div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8">
<docstring><name>class diffusers.schedulers.scheduling_vq_diffusion.VQDiffusionSchedulerOutput</name><anchor>diffusers.schedulers.scheduling_vq_diffusion.VQDiffusionSchedulerOutput</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/schedulers/scheduling_vq_diffusion.py#L28</source><parameters>[{"name": "prev_sample", "val": ": LongTensor"}]</parameters><paramsdesc>- **prev_sample** (`torch.LongTensor` of shape `(batch size, num latent pixels)`) --
Computed sample `x_{t-1}` of the previous timestep. `prev_sample` should be used as the next model input in
the denoising loop.</paramsdesc><paramgroups>0</paramgroups></docstring>
Output class for the scheduler's step function.
</div>
<EditOnGithub source="https://github.com/huggingface/diffusers/blob/main/docs/source/en/api/schedulers/vq_diffusion.md" />