| # AutoencoderKLMochi | |
| The 3D variational autoencoder (VAE) model with KL loss used in [Mochi](https://github.com/genmoai/models) was introduced in [Mochi 1 Preview](https://huggingface.co/genmo/mochi-1-preview) by Genmo. | |
| The model can be loaded with the following code snippet. | |
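| ```python | |
| import torch | |
| from diffusers import AutoencoderKLMochi | |
| # Load the Mochi VAE weights in full precision and move the model to the GPU. | |
| vae = AutoencoderKLMochi.from_pretrained("genmo/mochi-1-preview", subfolder="vae", torch_dtype=torch.float32).to("cuda") | |
| ``` | |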
| ## AutoencoderKLMochi[[diffusers.AutoencoderKLMochi]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class diffusers.AutoencoderKLMochi</name><anchor>diffusers.AutoencoderKLMochi</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/models/autoencoders/autoencoder_kl_mochi.py#L660</source><parameters>[{"name": "in_channels", "val": ": int = 15"}, {"name": "out_channels", "val": ": int = 3"}, {"name": "encoder_block_out_channels", "val": ": typing.Tuple[int] = (64, 128, 256, 384)"}, {"name": "decoder_block_out_channels", "val": ": typing.Tuple[int] = (128, 256, 512, 768)"}, {"name": "latent_channels", "val": ": int = 12"}, {"name": "layers_per_block", "val": ": typing.Tuple[int, ...] = (3, 3, 4, 6, 3)"}, {"name": "act_fn", "val": ": str = 'silu'"}, {"name": "temporal_expansions", "val": ": typing.Tuple[int, ...] = (1, 2, 3)"}, {"name": "spatial_expansions", "val": ": typing.Tuple[int, ...] = (2, 2, 2)"}, {"name": "add_attention_block", "val": ": typing.Tuple[bool, ...] = (False, True, True, True, True)"}, {"name": "latents_mean", "val": ": typing.Tuple[float, ...] = (-0.06730895953510081, -0.038011381506090416, -0.07477820912866141, -0.05565264470995561, 0.012767231469026969, -0.04703542746246419, 0.043896967884726704, -0.09346305707025976, -0.09918314763016893, -0.008729793427399178, -0.011931556316503654, -0.0321993391887285)"}, {"name": "latents_std", "val": ": typing.Tuple[float, ...] = (0.9263795028493863, 0.9248894543193766, 0.9393059390890617, 0.959253732819592, 0.8244560132752793, 0.917259975397747, 0.9294154431013696, 1.3720942357788521, 0.881393668867029, 0.9168315692124348, 0.9185249279345552, 0.9274757570805041)"}, {"name": "scaling_factor", "val": ": float = 1.0"}]</parameters><paramsdesc>- **in_channels** (int, *optional*, defaults to 15) -- Number of channels in the input image. | |
| - **out_channels** (int, *optional*, defaults to 3) -- Number of channels in the output. | |
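| - **encoder_block_out_channels** (`Tuple[int]`, *optional*, defaults to `(64, 128, 256, 384)`) -- | |
| Tuple of encoder block output channels. | |
| - **decoder_block_out_channels** (`Tuple[int]`, *optional*, defaults to `(128, 256, 512, 768)`) -- | |
| Tuple of decoder block output channels. | |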
| - **act_fn** (`str`, *optional*, defaults to `"silu"`) -- The activation function to use. | |
| - **scaling_factor** (`float`, *optional*, defaults to `1.0`) -- | |
| The component-wise standard deviation of the trained latent space computed using the first batch of the | |
| training set. This is used to scale the latent space to have unit variance when training the diffusion | |
| model. The latents are scaled with the formula `z = z * scaling_factor` before being passed to the | |
| diffusion model. When decoding, the latents are scaled back to the original scale with the formula: `z = 1 | |
| / scaling_factor * z`. For more details, refer to sections 4.3.2 and D.1 of the [High-Resolution Image | |
| Synthesis with Latent Diffusion Models](https://huggingface.co/papers/2112.10752) paper.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| A VAE model with KL loss for encoding images into latents and decoding latent representations into images. Used in | |
| [Mochi 1 preview](https://github.com/genmoai/models). | |
| This model inherits from [ModelMixin](/docs/diffusers/pr_12229/en/api/models/overview#diffusers.ModelMixin). Check the superclass documentation for its generic methods implemented | |
| for all models (such as downloading or saving). | |
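The `scaling_factor` formulas in the parameter description can be illustrated with a minimal pure-Python sketch. The helper names `scale_latents` and `unscale_latents` and the factor `0.5` are hypothetical and chosen only for illustration; they are not part of the diffusers API, and the class default for `scaling_factor` is `1.0`.

```python
def scale_latents(z, scaling_factor):
    """Apply z = z * scaling_factor, as done before the diffusion model sees the latents."""
    return [value * scaling_factor for value in z]


def unscale_latents(z, scaling_factor):
    """Apply z = 1 / scaling_factor * z, as done before decoding."""
    return [value / scaling_factor for value in z]


latents = [0.5, -1.25, 2.0]
factor = 0.5  # illustrative value only; AutoencoderKLMochi defaults to 1.0

scaled = scale_latents(latents, factor)
restored = unscale_latents(scaled, factor)
```

Scaling and then unscaling with the same factor returns the original latents, which is why the same `scaling_factor` has to be used on both sides of the diffusion model.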
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>wrapper</name><anchor>diffusers.AutoencoderKLMochi.decode</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/utils/accelerate_utils.py#L43</source><parameters>[{"name": "*args", "val": ""}, {"name": "**kwargs", "val": ""}]</parameters></docstring> | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>disable_slicing</name><anchor>diffusers.AutoencoderKLMochi.disable_slicing</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/models/autoencoders/autoencoder_kl_mochi.py#L835</source><parameters>[]</parameters></docstring> | |
| Disable sliced VAE decoding. If `enable_slicing` was previously enabled, this method will go back to computing | |
| decoding in one step. | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>disable_tiling</name><anchor>diffusers.AutoencoderKLMochi.disable_tiling</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/models/autoencoders/autoencoder_kl_mochi.py#L821</source><parameters>[]</parameters></docstring> | |
| Disable tiled VAE decoding. If `enable_tiling` was previously enabled, this method will go back to computing | |
| decoding in one step. | |
| </div> | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>enable_slicing</name><anchor>diffusers.AutoencoderKLMochi.enable_slicing</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/models/autoencoders/autoencoder_kl_mochi.py#L828</source><parameters>[]</parameters></docstring> | |
| Enable sliced VAE decoding. When this option is enabled, the VAE will split the input tensor in slices to | |
| compute decoding in several steps. This is useful to save some memory and allow larger batch sizes. | |
| </div> | |
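The idea behind sliced decoding can be sketched without any model. The example below assumes that slices are taken along the batch dimension, one sample at a time; `decode_in_slices` and `toy_decode` are hypothetical names, and `toy_decode` is only a stand-in for a real decoder.

```python
def decode_in_slices(decode_fn, batch):
    """Decode each batch item on its own, then reassemble the results in order."""
    outputs = []
    for item in batch:
        # Only one item is processed at a time, so peak memory tracks a
        # single sample rather than the whole batch.
        outputs.extend(decode_fn([item]))
    return outputs


def toy_decode(batch):
    # Hypothetical stand-in for a decoder: doubles every value.
    return [[2 * value for value in item] for item in batch]


batch = [[1, 2], [3, 4], [5, 6]]
full = toy_decode(batch)
sliced = decode_in_slices(toy_decode, batch)
```

Both paths produce the same result; slicing trades a single large call for several smaller ones.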
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>enable_tiling</name><anchor>diffusers.AutoencoderKLMochi.enable_tiling</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/models/autoencoders/autoencoder_kl_mochi.py#L791</source><parameters>[{"name": "tile_sample_min_height", "val": ": typing.Optional[int] = None"}, {"name": "tile_sample_min_width", "val": ": typing.Optional[int] = None"}, {"name": "tile_sample_stride_height", "val": ": typing.Optional[float] = None"}, {"name": "tile_sample_stride_width", "val": ": typing.Optional[float] = None"}]</parameters><paramsdesc>- **tile_sample_min_height** (`int`, *optional*) -- | |
| The minimum height required for a sample to be separated into tiles across the height dimension. | |
| - **tile_sample_min_width** (`int`, *optional*) -- | |
| The minimum width required for a sample to be separated into tiles across the width dimension. | |
| - **tile_sample_stride_height** (`int`, *optional*) -- | |
| The stride between two consecutive vertical tiles. This is to ensure that there are no tiling | |
| artifacts produced across the height dimension. | |
| - **tile_sample_stride_width** (`int`, *optional*) -- | |
| The stride between two consecutive horizontal tiles. This is to ensure that there are no tiling | |
| artifacts produced across the width dimension.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Enable tiled VAE decoding. When this option is enabled, the VAE will split the input tensor into tiles to | |
| compute decoding and encoding in several steps. This saves a large amount of memory and allows | |
| processing larger images. | |
| </div> | |
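How the minimum tile size and the stride interact can be shown with a simplified one-axis sketch. The function `tile_windows` and the numbers used here are assumptions for illustration, not the library's implementation: tiles start every `stride` pixels, each spans at most `tile_min` pixels, and neighbouring tiles therefore overlap by `tile_min - stride` pixels.

```python
def tile_windows(size, tile_min, stride):
    """Return (start, end) windows that cover `size` pixels along one axis."""
    if size <= tile_min:
        # Samples no larger than the minimum tile size are processed whole.
        return [(0, size)]
    return [(start, min(start + tile_min, size)) for start in range(0, size, stride)]


# Illustrative values: a 480-pixel axis, 256-pixel tiles, 192-pixel stride.
windows = tile_windows(480, 256, 192)
overlap = 256 - 192
small = tile_windows(200, 256, 192)
```

A stride smaller than the tile size is what creates the overlap region in which neighbouring tiles can be reconciled.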
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>tiled_decode</name><anchor>diffusers.AutoencoderKLMochi.tiled_decode</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/models/autoencoders/autoencoder_kl_mochi.py#L1037</source><parameters>[{"name": "z", "val": ": Tensor"}, {"name": "return_dict", "val": ": bool = True"}]</parameters><paramsdesc>- **z** (`torch.Tensor`) -- Input batch of latent vectors. | |
| - **return_dict** (`bool`, *optional*, defaults to `True`) -- | |
| Whether or not to return a `~models.vae.DecoderOutput` instead of a plain tuple.</paramsdesc><paramgroups>0</paramgroups><rettype>`~models.vae.DecoderOutput` or `tuple`</rettype><retdesc>If return_dict is True, a `~models.vae.DecoderOutput` is returned, otherwise a plain `tuple` is | |
| returned.</retdesc></docstring> | |
| Decode a batch of images using a tiled decoder. | |
| </div> | |
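This page does not describe how overlapping tiles are stitched back together. One common approach, shown here purely as an assumption, is a linear cross-fade over the overlap region. The helper `blend_1d` is hypothetical and works on plain lists along a single axis.

```python
def blend_1d(left, right, blend_extent):
    """Cross-fade the tail of `left` into the head of `right` over `blend_extent` items."""
    blend_extent = min(blend_extent, len(left), len(right))
    blended = list(right)
    for i in range(blend_extent):
        # The weight of the right tile grows linearly across the overlap.
        weight = i / blend_extent
        blended[i] = left[-blend_extent + i] * (1 - weight) + right[i] * weight
    return blended


left_tile = [0.0, 0.0, 0.0, 0.0]
right_tile = [4.0, 4.0, 4.0, 4.0]
seam = blend_1d(left_tile, right_tile, 4)
```

The hard step from `0.0` to `4.0` becomes a gradual ramp, which is the effect that suppresses visible seams between tiles.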
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>tiled_encode</name><anchor>diffusers.AutoencoderKLMochi.tiled_encode</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/models/autoencoders/autoencoder_kl_mochi.py#L980</source><parameters>[{"name": "x", "val": ": Tensor"}]</parameters><paramsdesc>- **x** (`torch.Tensor`) -- Input batch of videos.</paramsdesc><paramgroups>0</paramgroups><rettype>`torch.Tensor`</rettype><retdesc>The latent representation of the encoded videos.</retdesc></docstring> | |
| Encode a batch of videos using a tiled encoder. | |
| </div></div> | |
| ## DecoderOutput[[diffusers.models.autoencoders.vae.DecoderOutput]] | |
| <div class="docstring border-l-2 border-t-2 pl-4 pt-3.5 border-gray-100 rounded-tl-xl mb-6 mt-8"> | |
| <docstring><name>class diffusers.models.autoencoders.vae.DecoderOutput</name><anchor>diffusers.models.autoencoders.vae.DecoderOutput</anchor><source>https://github.com/huggingface/diffusers/blob/vr_12229/src/diffusers/models/autoencoders/vae.py#L47</source><parameters>[{"name": "sample", "val": ": Tensor"}, {"name": "commit_loss", "val": ": typing.Optional[torch.FloatTensor] = None"}]</parameters><paramsdesc>- **sample** (`torch.Tensor` of shape `(batch_size, num_channels, height, width)`) -- | |
| The decoded output sample from the last layer of the model.</paramsdesc><paramgroups>0</paramgroups></docstring> | |
| Output of decoding method. | |
| </div> | |
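The `return_dict` convention used by `tiled_decode` can be sketched with a small stand-in. `SketchDecoderOutput` and `toy_decode` are hypothetical names that model only the `sample` field; they are not the real `DecoderOutput` class or decoder.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class SketchDecoderOutput:
    """Hypothetical stand-in for DecoderOutput; only models the `sample` field."""

    sample: List[float]


def toy_decode(
    z: List[float], return_dict: bool = True
) -> Union[SketchDecoderOutput, Tuple[List[float]]]:
    sample = [2.0 * value for value in z]  # placeholder for real decoding
    if not return_dict:
        # Plain tuple: the decoded sample is the first element.
        return (sample,)
    return SketchDecoderOutput(sample=sample)


as_object = toy_decode([1.0, 2.0])
as_tuple = toy_decode([1.0, 2.0], return_dict=False)
```

With `return_dict=True` the result is read through the `sample` attribute; with `return_dict=False` it is read by indexing the tuple.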