| --- |
| license: openrail |
| tags: |
| - stable-diffusion |
| - vae |
| - autoencoder-kl |
| - ferrotorch |
| --- |
| |
| # `ferrotorch/sd-v1-5-vae-decoder` |
|
|
| Stable Diffusion 1.5 VAE decoder, mirrored from the `vae/` subfolder of `runwayml/stable-diffusion-v1-5`. The model is `post_quant_conv` (Conv2d 4->4, k=1) followed by the Decoder: `conv_in` 4->512, a UNetMidBlock2D with single-head attention at 512 channels, four UpDecoderBlock2D blocks with 3 resnets each and nearest-neighbor 2x upsampling on all but the last block, then GroupNorm (32 groups) + SiLU + `conv_out` 128->3. This is the ~50M-parameter decoder slice of AutoencoderKL, licensed under CreativeML Open RAIL-M. The mirror is pinned decoder-only: encoder and `quant_conv` keys are dropped. It serves as the real-artifact baseline for SD VAE decoder parity against diffusers (#1150). |
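
To make the layout above concrete, here is a minimal sketch that traces channel count and spatial size through the decoder stages. It assumes the diffusers convention of walking `block_out_channels` in reverse and upsampling on every up block except the last; `trace_decoder_shapes` and the stage labels are hypothetical names for illustration and are not part of ferrotorch or diffusers.

```rust
/// Trace (stage label, channels, spatial side) through the decoder.
/// Illustrative only: assumes `block_out_channels` is non-empty and that
/// every up block except the last doubles the spatial resolution.
fn trace_decoder_shapes(
    block_out_channels: &[usize],
    latent_side: usize,
) -> Vec<(String, usize, usize)> {
    // The decoder walks the channel list from widest to narrowest.
    let reversed: Vec<usize> = block_out_channels.iter().rev().copied().collect();
    let mut stages = Vec::new();
    let mut side = latent_side;

    // conv_in lifts the 4 latent channels to the widest width; the mid
    // block keeps both the width and the resolution.
    stages.push(("conv_in + mid_block".to_string(), reversed[0], side));

    let last = reversed.len() - 1;
    for (i, &out_ch) in reversed.iter().enumerate() {
        if i != last {
            side *= 2; // nearest-neighbor 2x upsample
        }
        stages.push((format!("up_block_{}", i), out_ch, side));
    }

    // Final norm + activation + conv_out maps to 3 image channels.
    stages.push(("conv_out".to_string(), 3, side));
    stages
}
```

With `block_out_channels = [128, 256, 512, 512]` and a 64x64 latent, three of the four up blocks double the resolution, which is how a `[1, 4, 64, 64]` latent becomes a `[1, 3, 512, 512]` image.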
| |
| ## Provenance |
| |
| * Upstream: `runwayml/stable-diffusion-v1-5` (subfolder `vae/`), |
| license `openrail` (CreativeML Open RAIL-M). |
| * Conversion script: |
| [`ferrotorch/scripts/pin_pretrained_diffusion_weights.py`](https://github.com/dollspace-gay/ferrotorch/blob/main/scripts/pin_pretrained_diffusion_weights.py). |
| * Ferrotorch issue: <https://github.com/dollspace-gay/ferrotorch/issues/1150>. |
| * SHA-256 of `model.safetensors` (this file is pinned in |
| `ferrotorch-hub/src/registry.rs`): `5210b518f8d4e829355197aa79855c206678e91d13467a580123222c75c5a131`. |
| * Number of trainable parameters in the decoder slice: |
| **49,490,199**. |
| * Config snapshot: |
| block_out_channels=[128, 256, 512, 512], |
| layers_per_block=2, |
| norm_num_groups=32, |
| sample_size=512, |
| latent_channels=4, |
| scaling_factor=0.18215, |
| act_fn='silu'. |
| * Non-decoder keys dropped from the upstream checkpoint (this |
| mirror is decoder-only): 108 total, first few: |
| `['encoder.conv_in.bias', 'encoder.conv_in.weight', 'encoder.conv_norm_out.bias']`. |
| |
| ## Value-parity probe |
| |
| Two extra files are uploaded so the ferrotorch-side harness can |
| reproduce the parity verdict without re-running the upstream |
| `AutoencoderKL.decode`: |
| |
| * `_value_parity_latent.bin` — deterministic latent |
| `torch.manual_seed(42); torch.randn(1, 4, 64, 64) * 0.18215`, |
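| float32, shape `[1, 4, 64, 64]`. This latent is in the scaled |
| space the SD pipeline works in. In diffusers the pipeline divides |
| by `scaling_factor` before calling `vae.decode`; |
| `AutoencoderKL.decode` applies no scaling of its own, so the |
| reference image below is the decode of this tensor exactly as stored. |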
| * `_value_parity_image.bin` — float32 decoded image |
| `[1, 3, 512, 512]` from |
| `AutoencoderKL.decode(latent, return_dict=False)[0]` on |
| float32 weights in eval mode. Same dump format as every other |
| ferrotorch artifact: |
| `[u32 ndim][u32 × ndim shape][f32 × prod(shape)]` little-endian. |
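
The dump layout described above can be parsed with the standard library alone. The following is a minimal sketch, not the ferrotorch harness itself: `Dump`, `read_dump`, and `max_abs_diff` are hypothetical names introduced here, and the only thing taken from this card is the byte layout `[u32 ndim][u32 × ndim shape][f32 × prod(shape)]` in little-endian order.

```rust
use std::io::{self, Read};

/// Parsed dump: dimension sizes plus the flat f32 payload.
#[derive(Debug, PartialEq)]
struct Dump {
    shape: Vec<usize>,
    data: Vec<f32>,
}

/// Read one little-endian u32 from the stream.
fn read_u32_le<R: Read>(r: &mut R) -> io::Result<u32> {
    let mut buf = [0u8; 4];
    r.read_exact(&mut buf)?;
    Ok(u32::from_le_bytes(buf))
}

/// Parse `[u32 ndim][u32 x ndim shape][f32 x prod(shape)]`, little-endian.
fn read_dump<R: Read>(mut r: R) -> io::Result<Dump> {
    let ndim = read_u32_le(&mut r)? as usize;
    let mut shape = Vec::new();
    for _ in 0..ndim {
        shape.push(read_u32_le(&mut r)? as usize);
    }

    // Payload size in bytes, rejecting shapes that overflow usize.
    let byte_len = shape
        .iter()
        .try_fold(1usize, |acc, &d| acc.checked_mul(d))
        .and_then(|n| n.checked_mul(4))
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "shape overflows usize"))?;

    let mut bytes = vec![0u8; byte_len];
    r.read_exact(&mut bytes)?; // fails on truncated files
    let data = bytes
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect();
    Ok(Dump { shape, data })
}

/// Largest element-wise absolute difference over the common prefix of two buffers.
fn max_abs_diff(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).fold(0.0, f32::max)
}
```

A parity check along these lines would open `_value_parity_image.bin` with `std::fs::File::open`, pass it to `read_dump`, confirm the shape is `[1, 3, 512, 512]`, and compare the payload against the locally decoded image with `max_abs_diff`. The acceptable tolerance is not specified here and is left to the harness.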
|
|
| ## How to load |
|
|
| ```rust |
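| // The `?` operators below assume an enclosing function that returns a compatible `Result`. |
| use ferrotorch_diffusion::{VaeDecoderConfig, load_vae_decoder}; |
| use ferrotorch_hub::{HubCache, hf_download_model}; |
| |
| // Download the `main` revision of this repo through the hub cache. |
| let cache = HubCache::with_default_dir(); |
| let repo_dir = hf_download_model("ferrotorch/sd-v1-5-vae-decoder", "main", &cache)?; |
| // Parse the decoder config shipped next to the weights. |
| let cfg = VaeDecoderConfig::from_file(&repo_dir.join("config.json"))?; |
| // Load the weights as f32 in non-strict mode; the second tuple element is unused here. |
| let (decoder, _drop_report) = load_vae_decoder::<f32>( |
| &repo_dir.join("model.safetensors"), |
| cfg, |
| /* strict = */ false, |
| )?; |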
| ``` |
|
|
| ## Upstream license |
|
|
| Stable Diffusion v1.5 is distributed under the CreativeML Open RAIL-M license. The decoder slice mirrored here inherits that license — see https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/LICENSE for the full terms. |
|
|