| # AutoencoderKLAllegro | |
| The 3D variational autoencoder (VAE) model with KL loss used in [Allegro](https://github.com/rhymes-ai/Allegro) was introduced in [Allegro: Open the Black Box of Commercial-Level Video Generation Model](https://huggingface.co/papers/2410.15458) by RhymesAI. | |
| The model can be loaded with the following code snippet. | |
| ```python | |
| import torch | |
| from diffusers import AutoencoderKLAllegro | |
| vae = AutoencoderKLAllegro.from_pretrained("rhymes-ai/Allegro", subfolder="vae", torch_dtype=torch.float32).to("cuda") | |
| ``` | |
| ## AutoencoderKLAllegro[[diffusers.AutoencoderKLAllegro]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class diffusers.AutoencoderKLAllegro</name><anchor>diffusers.AutoencoderKLAllegro</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/models/autoencoders/autoencoder_kl_allegro.py#L677</source><parameters>[{"name": "in_channels", "val": ": int = 3"}, {"name": "out_channels", "val": ": int = 3"}, {"name": "down_block_types", "val": ": typing.Tuple[str, ...] = ('AllegroDownBlock3D', 'AllegroDownBlock3D', 'AllegroDownBlock3D', 'AllegroDownBlock3D')"}, {"name": "up_block_types", "val": ": typing.Tuple[str, ...] = ('AllegroUpBlock3D', 'AllegroUpBlock3D', 'AllegroUpBlock3D', 'AllegroUpBlock3D')"}, {"name": "block_out_channels", "val": ": typing.Tuple[int, ...] = (128, 256, 512, 512)"}, {"name": "temporal_downsample_blocks", "val": ": typing.Tuple[bool, ...] = (True, True, False, False)"}, {"name": "temporal_upsample_blocks", "val": ": typing.Tuple[bool, ...] = (False, True, True, False)"}, {"name": "latent_channels", "val": ": int = 4"}, {"name": "layers_per_block", "val": ": int = 2"}, {"name": "act_fn", "val": ": str = 'silu'"}, {"name": "norm_num_groups", "val": ": int = 32"}, {"name": "temporal_compression_ratio", "val": ": float = 4"}, {"name": "sample_size", "val": ": int = 320"}, {"name": "scaling_factor", "val": ": float = 0.13"}, {"name": "force_upcast", "val": ": bool = True"}]</parameters><paramsdesc>- **in_channels** (int, defaults to `3`) -- | |
| Number of channels in the input image. | |
| - **out_channels** (int, defaults to `3`) -- | |
| Number of channels in the output. | |
| - **down_block_types** (`Tuple[str, ...]`, defaults to `("AllegroDownBlock3D", "AllegroDownBlock3D", "AllegroDownBlock3D", "AllegroDownBlock3D")`) -- | |
| Tuple of strings denoting which types of down blocks to use. | |
| - **up_block_types** (`Tuple[str, ...]`, defaults to `("AllegroUpBlock3D", "AllegroUpBlock3D", "AllegroUpBlock3D", "AllegroUpBlock3D")`) -- | |
| Tuple of strings denoting which types of up blocks to use. | |
| - **block_out_channels** (`Tuple[int, ...]`, defaults to `(128, 256, 512, 512)`) -- | |
| Tuple of integers denoting number of output channels in each block. | |
| - **temporal_downsample_blocks** (`Tuple[bool, ...]`, defaults to `(True, True, False, False)`) -- | |
| Tuple of booleans denoting which blocks to enable temporal downsampling in. | |
| - **temporal_upsample_blocks** (`Tuple[bool, ...]`, defaults to `(False, True, True, False)`) -- | |
| Tuple of booleans denoting which blocks to enable temporal upsampling in. | |
| - **latent_channels** (`int`, defaults to `4`) -- | |
| Number of channels in latents. | |
| - **layers_per_block** (`int`, defaults to `2`) -- | |
| Number of ResNet, attention, or temporal convolution layers per down/up block. | |
| - **act_fn** (`str`, defaults to `"silu"`) -- | |
| The activation function to use. | |
| - **norm_num_groups** (`int`, defaults to `32`) -- | |
| Number of groups to use in normalization layers. | |
| - **temporal_compression_ratio** (`int`, defaults to `4`) -- | |
| Ratio by which the temporal dimension of samples is compressed. | |
| - **sample_size** (`int`, defaults to `320`) -- | |
| Default latent size. | |
| - **scaling_factor** (`float`, defaults to `0.13`) -- | |
| The component-wise standard deviation of the trained latent space computed using the first batch of the | |
| training set. This is used to scale the latent space to have unit variance when training the diffusion | |
| model. The latents are scaled with the formula `z = z * scaling_factor` before being passed to the | |
| diffusion model. When decoding, the latents are scaled back to the original scale with the formula: `z = 1 | |
| / scaling_factor * z`. For more details, refer to sections 4.3.2 and D.1 of the [High-Resolution Image | |
| Synthesis with Latent Diffusion Models](https://huggingface.co/papers/2112.10752) paper. | |
| - **force_upcast** (`bool`, defaults to `True`) -- | |
| If enabled, forces the VAE to run in float32 for high-resolution pipelines such as SD-XL. The VAE can be | |
| fine-tuned or trained to a lower range without losing too much precision, in which case `force_upcast` | |
| can be set to `False`; see https://huggingface.co/madebyollin/sdxl-vae-fp16-fix</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| A VAE model with KL loss for encoding videos into latents and decoding latent representations into videos. Used in | |
| [Allegro](https://github.com/rhymes-ai/Allegro). | |
| This model inherits from [ModelMixin](/docs/diffusers/pr_12595/en/api/models/overview#diffusers.ModelMixin). Check the superclass documentation for its generic methods implemented | |
| for all models (such as downloading or saving). | |
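The `scaling_factor` description above gives two formulas: `z = z * scaling_factor` before the diffusion model and `z = 1 / scaling_factor * z` before decoding. The following is a minimal sketch of that round trip. Plain Python lists stand in for latent tensors, the helper names `scale_latents` and `unscale_latents` are hypothetical, and the latent values are made up for illustration.

```python
# Sketch of the latent scaling convention from the `scaling_factor` description.
# Plain Python floats stand in for latent tensors; no model is loaded.

SCALING_FACTOR = 0.13  # default value from the class signature


def scale_latents(latents, scaling_factor=SCALING_FACTOR):
    """Scale raw VAE latents before passing them to the diffusion model."""
    return [z * scaling_factor for z in latents]


def unscale_latents(latents, scaling_factor=SCALING_FACTOR):
    """Undo the scaling before decoding: z = 1 / scaling_factor * z."""
    return [z / scaling_factor for z in latents]


raw = [7.5, -3.0, 0.0, 12.25]        # made-up latent values (assumption)
scaled = scale_latents(raw)          # what the diffusion model would see
restored = unscale_latents(scaled)   # what the decoder would receive
```

With real tensors the same two multiplications would apply element-wise; the sketch only illustrates that the two formulas are inverses of each other.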
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>wrapper</name><anchor>diffusers.AutoencoderKLAllegro.decode</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/utils/accelerate_utils.py#L43</source><parameters>[{"name": "*args", "val": ""}, {"name": "**kwargs", "val": ""}]</parameters></docstring> | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>wrapper</name><anchor>diffusers.AutoencoderKLAllegro.encode</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/utils/accelerate_utils.py#L43</source><parameters>[{"name": "*args", "val": ""}, {"name": "**kwargs", "val": ""}]</parameters></docstring> | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>forward</name><anchor>diffusers.AutoencoderKLAllegro.forward</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/models/autoencoders/autoencoder_kl_allegro.py#L1042</source><parameters>[{"name": "sample", "val": ": Tensor"}, {"name": "sample_posterior", "val": ": bool = False"}, {"name": "return_dict", "val": ": bool = True"}, {"name": "generator", "val": ": typing.Optional[torch._C.Generator] = None"}]</parameters><paramsdesc>- **sample** (`torch.Tensor`) -- Input sample. | |
| - **sample_posterior** (`bool`, *optional*, defaults to `False`) -- | |
| Whether to sample from the posterior. | |
| - **return_dict** (`bool`, *optional*, defaults to `True`) -- | |
| Whether or not to return a `DecoderOutput` instead of a plain tuple. | |
| - **generator** (`torch.Generator`, *optional*) -- | |
| PyTorch random number generator.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| </div></div> | |
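The default `temporal_downsample_blocks` of `(True, True, False, False)` and the default `temporal_compression_ratio` of `4` appear to be linked. The sketch below shows one way to read that relationship and to estimate a latent shape. It rests on two assumptions that this page does not state: each enabled temporal block halves the frame count, and every block except the last halves height and width. The helper names and the example input size are hypothetical.

```python
# Rough latent-shape arithmetic from the default configuration values.
# Both halving rules below are ASSUMPTIONS, not taken from the documentation.

TEMPORAL_DOWNSAMPLE_BLOCKS = (True, True, False, False)  # default from the signature
BLOCK_OUT_CHANNELS = (128, 256, 512, 512)                # default from the signature
LATENT_CHANNELS = 4                                      # default from the signature


def temporal_factor(temporal_downsample_blocks):
    """ASSUMPTION: every enabled block halves the number of frames."""
    return 2 ** sum(temporal_downsample_blocks)


def spatial_factor(block_out_channels):
    """ASSUMPTION: every block except the last halves height and width."""
    return 2 ** (len(block_out_channels) - 1)


def latent_shape(batch, frames, height, width):
    """Estimated latent shape as (batch, channels, frames, height, width)."""
    t = temporal_factor(TEMPORAL_DOWNSAMPLE_BLOCKS)
    s = spatial_factor(BLOCK_OUT_CHANNELS)
    return (batch, LATENT_CHANNELS, frames // t, height // s, width // s)


t_factor = temporal_factor(TEMPORAL_DOWNSAMPLE_BLOCKS)
example_shape = latent_shape(batch=1, frames=88, height=720, width=1280)
```

Under these assumptions the two enabled temporal blocks give a factor of 4, which agrees with the documented `temporal_compression_ratio`.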
| ## AutoencoderKLOutput[[diffusers.models.modeling_outputs.AutoencoderKLOutput]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class diffusers.models.modeling_outputs.AutoencoderKLOutput</name><anchor>diffusers.models.modeling_outputs.AutoencoderKLOutput</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/models/modeling_outputs.py#L7</source><parameters>[{"name": "latent_dist", "val": ": DiagonalGaussianDistribution"}]</parameters><paramsdesc>- **latent_dist** (`DiagonalGaussianDistribution`) -- | |
| Encoded outputs of `Encoder` represented as the mean and logvar of `DiagonalGaussianDistribution`. | |
| `DiagonalGaussianDistribution` allows for sampling latents from the distribution.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Output of AutoencoderKL encoding method. | |
| </div> | |
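The `latent_dist` field is described as a mean and logvar from which latents can be sampled. The sketch below illustrates that idea with a hypothetical stand-in class, `DiagonalGaussianSketch`, written in plain Python. It is not the library's `DiagonalGaussianDistribution`, which may differ in details such as clamping the log-variance or operating on tensors.

```python
import math
import random


class DiagonalGaussianSketch:
    """Hypothetical stand-in for a diagonal Gaussian over latent values."""

    def __init__(self, mean, logvar):
        self.mean = list(mean)
        self.logvar = list(logvar)
        # Element-wise standard deviation: std = exp(0.5 * logvar)
        self.std = [math.exp(0.5 * lv) for lv in self.logvar]

    def sample(self, rng=None):
        """Reparameterised draw: mean + std * eps, with eps ~ N(0, 1)."""
        if rng is None:
            rng = random.Random()
        return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(self.mean, self.std)]

    def mode(self):
        """The most likely value of a Gaussian is its mean."""
        return list(self.mean)


posterior = DiagonalGaussianSketch(mean=[0.5, -1.0], logvar=[0.0, -2.0])
deterministic = posterior.mode()                  # no randomness involved
stochastic = posterior.sample(random.Random(0))   # seeded for reproducibility
```

The `sample_posterior` and `generator` arguments of `forward` plausibly map onto this choice between a seeded random draw and the deterministic mode, though that mapping is an assumption here.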
| ## DecoderOutput[[diffusers.models.autoencoders.vae.DecoderOutput]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class diffusers.models.autoencoders.vae.DecoderOutput</name><anchor>diffusers.models.autoencoders.vae.DecoderOutput</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/models/autoencoders/vae.py#L47</source><parameters>[{"name": "sample", "val": ": Tensor"}, {"name": "commit_loss", "val": ": typing.Optional[torch.FloatTensor] = None"}]</parameters><paramsdesc>- **sample** (`torch.Tensor` of shape `(batch_size, num_channels, height, width)`) -- | |
| The decoded output sample from the last layer of the model.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Output of decoding method. | |
| </div> | |