| # AutoencoderKLMagvit | |
| The 3D variational autoencoder (VAE) model with KL loss used in [EasyAnimate](https://github.com/aigc-apps/EasyAnimate) was introduced by Alibaba PAI. | |
| The model can be loaded with the following code snippet. | |
| ```python | |
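| import torch | |
| from diffusers import AutoencoderKLMagvit | |
| # Load only the VAE weights from the "vae" subfolder of the checkpoint, in half precision. | |
| vae = AutoencoderKLMagvit.from_pretrained("alibaba-pai/EasyAnimateV5.1-12b-zh", subfolder="vae", torch_dtype=torch.float16).to("cuda") | |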
| ``` | |
| ## AutoencoderKLMagvit[[diffusers.AutoencoderKLMagvit]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class diffusers.AutoencoderKLMagvit</name><anchor>diffusers.AutoencoderKLMagvit</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/models/autoencoders/autoencoder_kl_magvit.py#L666</source><parameters>[{"name": "in_channels", "val": ": int = 3"}, {"name": "latent_channels", "val": ": int = 16"}, {"name": "out_channels", "val": ": int = 3"}, {"name": "block_out_channels", "val": ": typing.Tuple[int, ...] = [128, 256, 512, 512]"}, {"name": "down_block_types", "val": ": typing.Tuple[str, ...] = ['SpatialDownBlock3D', 'SpatialTemporalDownBlock3D', 'SpatialTemporalDownBlock3D', 'SpatialTemporalDownBlock3D']"}, {"name": "up_block_types", "val": ": typing.Tuple[str, ...] = ['SpatialUpBlock3D', 'SpatialTemporalUpBlock3D', 'SpatialTemporalUpBlock3D', 'SpatialTemporalUpBlock3D']"}, {"name": "layers_per_block", "val": ": int = 2"}, {"name": "act_fn", "val": ": str = 'silu'"}, {"name": "norm_num_groups", "val": ": int = 32"}, {"name": "scaling_factor", "val": ": float = 0.7125"}, {"name": "spatial_group_norm", "val": ": bool = True"}]</parameters></docstring> | |
| A VAE model with KL loss for encoding images into latents and decoding latent representations into images. This | |
| model is used in [EasyAnimate](https://huggingface.co/papers/2405.18991). | |
| This model inherits from [ModelMixin](/docs/diffusers/pr_12595/en/api/models/overview#diffusers.ModelMixin). Check the superclass documentation for its generic methods, which are implemented | |
| for all models (such as downloading or saving). | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>wrapper</name><anchor>diffusers.AutoencoderKLMagvit.decode</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/utils/accelerate_utils.py#L43</source><parameters>[{"name": "*args", "val": ""}, {"name": "**kwargs", "val": ""}]</parameters></docstring> | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>wrapper</name><anchor>diffusers.AutoencoderKLMagvit.encode</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/utils/accelerate_utils.py#L43</source><parameters>[{"name": "*args", "val": ""}, {"name": "**kwargs", "val": ""}]</parameters></docstring> | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>enable_tiling</name><anchor>diffusers.AutoencoderKLMagvit.enable_tiling</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/models/autoencoders/autoencoder_kl_magvit.py#L772</source><parameters>[{"name": "tile_sample_min_height", "val": ": typing.Optional[int] = None"}, {"name": "tile_sample_min_width", "val": ": typing.Optional[int] = None"}, {"name": "tile_sample_min_num_frames", "val": ": typing.Optional[int] = None"}, {"name": "tile_sample_stride_height", "val": ": typing.Optional[float] = None"}, {"name": "tile_sample_stride_width", "val": ": typing.Optional[float] = None"}, {"name": "tile_sample_stride_num_frames", "val": ": typing.Optional[float] = None"}]</parameters><paramsdesc>- **tile_sample_min_height** (`int`, *optional*) -- | |
| The minimum height required for a sample to be separated into tiles across the height dimension. | |
| - **tile_sample_min_width** (`int`, *optional*) -- | |
| The minimum width required for a sample to be separated into tiles across the width dimension. | |
| - **tile_sample_stride_height** (`int`, *optional*) -- | |
| The stride between two consecutive vertical tiles. This is to ensure that there are no tiling | |
| artifacts produced across the height dimension. | |
| - **tile_sample_stride_width** (`int`, *optional*) -- | |
| The stride between two consecutive horizontal tiles. This is to ensure that there are no tiling | |
| artifacts produced across the width dimension.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Enable tiled VAE encoding and decoding. When this option is enabled, the VAE splits the input tensor into tiles | |
| and computes encoding and decoding in several steps. This reduces peak memory usage and allows processing | |
| larger inputs. | |
| </div> | |
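The relationship between the minimum tile size and the stride can be illustrated with a small, library-independent sketch. It is not the code used by `AutoencoderKLMagvit`; the helper name `tile_spans` and the values 512 and 448 are assumptions chosen only for illustration. The sketch shows that a sample no larger than the minimum tile size is left untiled, and that a stride smaller than the tile size makes neighbouring tiles overlap.

```python
from typing import List, Tuple


def tile_spans(size: int, tile_min: int, stride: int) -> List[Tuple[int, int]]:
    """Return (start, end) spans covering `size` pixels along one axis.

    Hypothetical helper for illustration only, not part of diffusers.
    """
    if stride <= 0 or stride > tile_min:
        raise ValueError("stride must be in (0, tile_min] so tiles leave no gaps")
    if size <= tile_min:
        # The sample is not larger than the minimum tile size: no tiling on this axis.
        return [(0, size)]
    spans = []
    for start in range(0, size, stride):
        # Each tile is at most `tile_min` long and is clipped at the sample border.
        spans.append((start, min(start + tile_min, size)))
    return spans


# Illustrative values, not the library defaults.
spans = tile_spans(size=1024, tile_min=512, stride=448)
overlap = 512 - 448  # pixels shared by two neighbouring full-size tiles
print(spans, overlap)
```

With these assumed values, neighbouring full-size tiles share `tile_min - stride` pixels, which is the region a tiled implementation can blend to hide seams.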
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>forward</name><anchor>diffusers.AutoencoderKLMagvit.forward</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/models/autoencoders/autoencoder_kl_magvit.py#L1047</source><parameters>[{"name": "sample", "val": ": Tensor"}, {"name": "sample_posterior", "val": ": bool = False"}, {"name": "return_dict", "val": ": bool = True"}, {"name": "generator", "val": ": typing.Optional[torch._C.Generator] = None"}]</parameters><paramsdesc>- **sample** (`torch.Tensor`) -- Input sample. | |
| - **sample_posterior** (`bool`, *optional*, defaults to `False`) -- | |
| Whether to sample from the posterior. | |
| - **return_dict** (`bool`, *optional*, defaults to `True`) -- | |
| Whether or not to return a `DecoderOutput` instead of a plain tuple.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| </div></div> | |
| ## AutoencoderKLOutput[[diffusers.models.modeling_outputs.AutoencoderKLOutput]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class diffusers.models.modeling_outputs.AutoencoderKLOutput</name><anchor>diffusers.models.modeling_outputs.AutoencoderKLOutput</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/models/modeling_outputs.py#L7</source><parameters>[{"name": "latent_dist", "val": ": DiagonalGaussianDistribution"}]</parameters><paramsdesc>- **latent_dist** (`DiagonalGaussianDistribution`) -- | |
| Encoded outputs of `Encoder` represented as the mean and logvar of `DiagonalGaussianDistribution`. | |
| `DiagonalGaussianDistribution` allows for sampling latents from the distribution.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Output of AutoencoderKL encoding method. | |
| </div> | |
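As a rough illustration of what a distribution parameterised by a mean and a log-variance provides, the sketch below draws a sample with the reparameterisation trick and computes the KL divergence to a standard normal. It uses plain Python lists instead of tensors, and the function names are hypothetical; it is not the `DiagonalGaussianDistribution` implementation.

```python
import math
import random
from typing import List


def sample_diag_gaussian(mean: List[float], logvar: List[float], rng: random.Random) -> List[float]:
    """Draw z = mean + std * eps with eps ~ N(0, 1), where std = exp(0.5 * logvar)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mean, logvar)]


def kl_to_standard_normal(mean: List[float], logvar: List[float]) -> float:
    """KL(N(mean, exp(logvar)) || N(0, 1)), summed over all dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mean, logvar))


rng = random.Random(0)
mean, logvar = [0.5, -1.0], [0.0, -2.0]
z = sample_diag_gaussian(mean, logvar, rng)  # stochastic latent
mode = list(mean)                            # deterministic latent: the Gaussian's mode is its mean
kl = kl_to_standard_normal(mean, logvar)
print(z, mode, kl)
```

Using the mean instead of a random draw gives a deterministic latent, which corresponds to not sampling from the posterior.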
| ## DecoderOutput[[diffusers.models.autoencoders.vae.DecoderOutput]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class diffusers.models.autoencoders.vae.DecoderOutput</name><anchor>diffusers.models.autoencoders.vae.DecoderOutput</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12595/src/diffusers/models/autoencoders/vae.py#L47</source><parameters>[{"name": "sample", "val": ": Tensor"}, {"name": "commit_loss", "val": ": typing.Optional[torch.FloatTensor] = None"}]</parameters><paramsdesc>- **sample** (`torch.Tensor` of shape `(batch_size, num_channels, height, width)`) -- | |
| The decoded output sample from the last layer of the model.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Output of decoding method. | |
| </div> | |