ferrotorch/sd-v1-5-vae-decoder

Stable Diffusion 1.5 VAE decoder (runwayml/stable-diffusion-v1-5, vae/ subfolder). post_quant_conv (Conv2d 4->4, k=1) + Decoder (conv_in 4->512, UNetMidBlock2D with 1-head attention at 512ch, 4× UpDecoderBlock2D with 3 resnets each and nearest-2x upsample on all but the last block, GroupNorm32 + SiLU + conv_out 128->3). ~50M-param decoder slice of AutoencoderKL. RAIL-M licensed. Pinned decoder-only: encoder and quant_conv keys are dropped from this mirror. Real-artifact baseline for SD VAE decoder parity vs diffusers (#1150).
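
As a rough illustration of the layout described above, the following sketch traces channel counts and spatial sizes from a 64x64 latent to the 512x512 image. The names StageShape and trace_decoder_shapes are hypothetical helpers written for this card, not part of ferrotorch, and the block ordering (up blocks walk block_out_channels in reverse) is an assumption taken from the diffusers decoder layout.

```rust
/// Channel count and spatial size after one decoder stage.
#[derive(Debug, Clone, Copy, PartialEq)]
struct StageShape {
    channels: usize,
    height: usize,
    width: usize,
}

/// Trace activation shapes through the decoder.
/// `block_out_channels` is in encoder order, as in the config snapshot.
/// Returned stages: conv_in/mid block, one entry per up block, then conv_out.
fn trace_decoder_shapes(
    block_out_channels: &[usize],
    latent_hw: (usize, usize),
) -> Vec<StageShape> {
    let (mut h, mut w) = latent_hw;
    let mut shapes = Vec::new();

    // conv_in and the mid block both run at the widest channel count.
    let widest = *block_out_channels.last().expect("at least one block");
    shapes.push(StageShape { channels: widest, height: h, width: w });

    // Up blocks: every block except the last doubles the resolution.
    let n = block_out_channels.len();
    for (i, &ch) in block_out_channels.iter().rev().enumerate() {
        if i + 1 < n {
            h *= 2;
            w *= 2;
        }
        shapes.push(StageShape { channels: ch, height: h, width: w });
    }

    // conv_out maps the narrowest channel count to 3 RGB channels.
    shapes.push(StageShape { channels: 3, height: h, width: w });
    shapes
}
```

Calling trace_decoder_shapes(&[128, 256, 512, 512], (64, 64)) yields six stages, starting at 512 channels on a 64x64 grid and ending at 3 channels on a 512x512 grid, which matches the [1, 3, 512, 512] probe image shape.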

Provenance

  • Upstream: runwayml/stable-diffusion-v1-5 (subfolder vae/), license tag openrail.
  • Conversion script: ferrotorch/scripts/pin_pretrained_diffusion_weights.py.
  • Ferrotorch issue: https://github.com/dollspace-gay/ferrotorch/issues/1150.
  • SHA-256 of model.safetensors (this file is pinned in ferrotorch-hub/src/registry.rs): 5210b518f8d4e829355197aa79855c206678e91d13467a580123222c75c5a131.
  • Number of trainable parameters in the decoder slice: 49,490,199.
  • Config snapshot: block_out_channels=[128, 256, 512, 512], layers_per_block=2, norm_num_groups=32, sample_size=512, latent_channels=4, scaling_factor=0.18215, act_fn='silu'.
  • Non-decoder keys dropped from the upstream checkpoint (this mirror is decoder-only): 108 total, first few: ['encoder.conv_in.bias', 'encoder.conv_in.weight', 'encoder.conv_norm_out.bias'].
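
The parameter count above can be cross-checked from the config snapshot alone. The sketch below is a back-of-the-envelope tally, not the conversion script's method. It assumes the diffusers AutoencoderKL decoder layout: every convolution and linear layer carries a bias, each GroupNorm has a weight and bias per channel, a resnet adds a 1x1 conv shortcut only when its input and output channel counts differ, and the attention block has four biased linear projections (q, k, v, out) plus a GroupNorm. All function names are hypothetical.

```rust
/// Conv2d with square kernel `k` and a bias term.
fn conv2d(c_in: u64, c_out: u64, k: u64) -> u64 {
    c_in * c_out * k * k + c_out
}

/// GroupNorm affine parameters: one weight and one bias per channel.
fn group_norm(c: u64) -> u64 {
    2 * c
}

/// Resnet block: norm -> conv3x3 -> norm -> conv3x3, plus a 1x1 shortcut
/// conv when the channel count changes (assumed diffusers behaviour).
fn resnet(c_in: u64, c_out: u64) -> u64 {
    let shortcut = if c_in != c_out { conv2d(c_in, c_out, 1) } else { 0 };
    group_norm(c_in) + conv2d(c_in, c_out, 3) + group_norm(c_out) + conv2d(c_out, c_out, 3) + shortcut
}

/// Single-head attention: GroupNorm plus q, k, v, out linear layers with bias.
fn attention(c: u64) -> u64 {
    group_norm(c) + 4 * (c * c + c)
}

/// Total parameters of post_quant_conv + Decoder.
fn decoder_param_count(block_out_channels: &[u64], layers_per_block: u64, latent_channels: u64) -> u64 {
    let widest = *block_out_channels.last().expect("at least one block");
    let mut total = conv2d(latent_channels, latent_channels, 1); // post_quant_conv
    total += conv2d(latent_channels, widest, 3); // conv_in
    total += 2 * resnet(widest, widest) + attention(widest); // mid block

    let n = block_out_channels.len();
    let mut prev = widest;
    for (i, &ch) in block_out_channels.iter().rev().enumerate() {
        // layers_per_block + 1 resnets per up block; only the first changes width.
        total += resnet(prev, ch) + layers_per_block * resnet(ch, ch);
        if i + 1 < n {
            total += conv2d(ch, ch, 3); // conv after the nearest-2x upsample
        }
        prev = ch;
    }

    total + group_norm(prev) + conv2d(prev, 3, 3) // conv_norm_out + conv_out
}
```

With block_out_channels=[128, 256, 512, 512], layers_per_block=2 and latent_channels=4, decoder_param_count returns 49_490_199, in agreement with the figure listed in the provenance.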

Value-parity probe

Two extra files are uploaded so the ferrotorch-side harness can reproduce the parity verdict without re-running the upstream AutoencoderKL.decode:

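  • _value_parity_latent.bin: deterministic latent torch.manual_seed(42); torch.randn(1, 4, 64, 64) * 0.18215, float32, shape [1, 4, 64, 64]. This latent is in the scaled space (already multiplied by scaling_factor). In diffusers, the SD pipeline divides by scaling_factor before calling vae.decode; AutoencoderKL.decode does not apply that division itself, and the probe passes this latent to decode as stored.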
  • _value_parity_image.bin: float32 decoded image [1, 3, 512, 512] from AutoencoderKL.decode(latent, return_dict=False)[0] on float32 weights in eval mode. Same dump format as every other ferrotorch artifact: [u32 ndim][u32 × ndim shape][f32 × prod(shape)] little-endian.
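
For readers who want to inspect the probe files without the ferrotorch harness, the dump format above can be read with the standard library alone. This is a minimal sketch; parse_dump and max_abs_diff are hypothetical names, not ferrotorch APIs, and no parity tolerance is implied since this card does not state one.

```rust
/// Parse a tensor dump laid out as
/// [u32 ndim][u32 x ndim shape][f32 x prod(shape)], all little-endian.
/// Returns the shape and the flat data, or an error message on malformed input.
fn parse_dump(bytes: &[u8]) -> Result<(Vec<usize>, Vec<f32>), String> {
    // Read one little-endian u32 at byte offset `off`, failing on truncation.
    let read_u32 = |off: usize| -> Result<u32, String> {
        let c = bytes
            .get(off..off + 4)
            .ok_or_else(|| format!("truncated header at byte {}", off))?;
        Ok(u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
    };

    let ndim = read_u32(0)? as usize;
    let mut shape = Vec::new();
    for i in 0..ndim {
        shape.push(read_u32(4 + 4 * i)? as usize);
    }

    // The payload must hold exactly prod(shape) f32 values.
    let numel: usize = shape.iter().product();
    let payload = &bytes[4 + 4 * ndim..];
    if payload.len() % 4 != 0 || payload.len() / 4 != numel {
        return Err(format!(
            "expected {} f32 values, found {} payload bytes",
            numel,
            payload.len()
        ));
    }

    let data = payload
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect();
    Ok((shape, data))
}

/// Largest element-wise absolute difference, or None if the lengths differ.
fn max_abs_diff(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    Some(a.iter().zip(b).map(|(x, y)| (x - y).abs()).fold(0.0, f32::max))
}
```

A typical use would be to read _value_parity_image.bin with std::fs::read, pass the bytes to parse_dump, check that the shape is [1, 3, 512, 512], and then compare the data against a locally decoded image with max_abs_diff.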

How to load

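// Run inside a function that returns a Result, since the snippet uses `?`.
use ferrotorch_diffusion::{VaeDecoderConfig, load_vae_decoder};
use ferrotorch_hub::{HubCache, hf_download_model};

// Download the pinned repo via the hub cache.
let cache = HubCache::with_default_dir();
let repo_dir = hf_download_model("ferrotorch/sd-v1-5-vae-decoder", "main", &cache)?;

// Read the decoder config shipped alongside the weights.
let cfg = VaeDecoderConfig::from_file(&repo_dir.join("config.json"))?;

// Load the decoder as f32; the second tuple element is the drop report.
let (decoder, _drop_report) = load_vae_decoder::<f32>(
    &repo_dir.join("model.safetensors"),
    cfg,
    /* strict = */ false,
)?;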

Upstream license

Stable Diffusion v1.5 is distributed under the CreativeML Open RAIL-M license. The decoder slice mirrored here inherits that license; see https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/LICENSE for the full terms.
