---
license: openrail
tags:
  - stable-diffusion
  - vae
  - autoencoder-kl
  - ferrotorch
---

# `ferrotorch/sd-v1-5-vae-decoder`

Decoder half of the Stable Diffusion 1.5 VAE (`runwayml/stable-diffusion-v1-5`, `vae/` subfolder): a ~50M-parameter slice of `AutoencoderKL`, RAIL-M licensed.

The slice is `post_quant_conv` (Conv2d 4->4, k=1) followed by the `Decoder`:

* `conv_in` 4->512;
* `UNetMidBlock2D` with 1-head attention at 512 channels;
* 4× `UpDecoderBlock2D`, each with 3 resnets, with nearest-neighbor 2x upsampling on all but the last block;
* GroupNorm (32 groups) + SiLU + `conv_out` 128->3.

This mirror is pinned decoder-only: the encoder and `quant_conv` keys are dropped. It is the real-artifact baseline for SD VAE decoder parity against diffusers (#1150).
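
The shape arithmetic implied by this architecture can be sketched as follows. This is an illustration only, not ferrotorch code: `StageShape` and `trace_up_blocks` are hypothetical names, and the sketch assumes the up blocks walk `block_out_channels` in reverse order, which is consistent with `conv_in` ending at 512 channels and `conv_out` starting from 128.

```rust
/// Activation shape after one up block: channels plus spatial size.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct StageShape {
    pub channels: usize,
    pub height: usize,
    pub width: usize,
}

/// Traces activation shapes through the decoder's up blocks.
///
/// `block_out_channels` is given in config order; the decoder is assumed to
/// visit it in reverse. Every block except the last doubles height and width.
pub fn trace_up_blocks(
    block_out_channels: &[usize],
    latent_hw: (usize, usize),
) -> Vec<StageShape> {
    let n = block_out_channels.len();
    let (mut height, mut width) = latent_hw;
    let mut stages = Vec::with_capacity(n);
    for (i, &channels) in block_out_channels.iter().rev().enumerate() {
        let is_last = i + 1 == n;
        if !is_last {
            // Nearest-neighbor 2x upsample at the end of the block.
            height *= 2;
            width *= 2;
        }
        stages.push(StageShape { channels, height, width });
    }
    stages
}
```

Called as `trace_up_blocks(&[128, 256, 512, 512], (64, 64))`, the last stage has 128 channels at 512×512, which `conv_out` then maps to 3 channels. Three 2x upsamples give the overall 8x factor between a 64×64 latent and a 512×512 image.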

## Provenance

* Upstream: `runwayml/stable-diffusion-v1-5` (subfolder `vae/`),
  openrail.
* Conversion script:
  [`ferrotorch/scripts/pin_pretrained_diffusion_weights.py`](https://github.com/dollspace-gay/ferrotorch/blob/main/scripts/pin_pretrained_diffusion_weights.py).
* Ferrotorch issue: <https://github.com/dollspace-gay/ferrotorch/issues/1150>.
* SHA-256 of `model.safetensors` (this file is pinned in
  `ferrotorch-hub/src/registry.rs`): `5210b518f8d4e829355197aa79855c206678e91d13467a580123222c75c5a131`.
* Number of trainable parameters in the decoder slice:
  **49,490,199**.
* Config snapshot:
  block_out_channels=[128, 256, 512, 512],
  layers_per_block=2,
  norm_num_groups=32,
  sample_size=512,
  latent_channels=4,
  scaling_factor=0.18215,
  act_fn='silu'.
* Non-decoder keys dropped from the upstream checkpoint (this
  mirror is decoder-only): 108 total, first few:
  `['encoder.conv_in.bias', 'encoder.conv_in.weight', 'encoder.conv_norm_out.bias']`.
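
A minimal sketch of the decoder-only split described above, assuming the upstream tensor names start with `decoder.`, `post_quant_conv.`, `encoder.`, or `quant_conv.`. The helpers `is_decoder_key` and `partition_keys` are hypothetical and are not taken from the conversion script, which may select keys differently.

```rust
/// Returns true for tensor names assumed to belong to the decoder slice.
pub fn is_decoder_key(name: &str) -> bool {
    // "quant_conv." does not match "post_quant_conv." because the check is
    // a prefix test, so the encoder-side quant_conv is still rejected.
    name.starts_with("decoder.") || name.starts_with("post_quant_conv.")
}

/// Splits tensor names into (kept, dropped), preserving input order.
pub fn partition_keys<'a>(names: &[&'a str]) -> (Vec<&'a str>, Vec<&'a str>) {
    names.iter().copied().partition(|name| is_decoder_key(name))
}
```

With this rule, `encoder.conv_in.bias` lands in the dropped list, in line with the sample of dropped keys shown above.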

## Value-parity probe

Two extra files are uploaded so the ferrotorch-side harness can
reproduce the parity verdict without re-running the upstream
`AutoencoderKL.decode`:

* `_value_parity_latent.bin` — deterministic latent
  `torch.manual_seed(42); torch.randn(1, 4, 64, 64) * 0.18215`,
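  float32, shape `[1, 4, 64, 64]`. Multiplying by `0.18215`
  (`scaling_factor`) gives it the scale of the latents the SD
  pipeline works with. In diffusers the pipeline divides by
  `scaling_factor` before calling `vae.decode`;
  `AutoencoderKL.decode` does not divide internally.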
* `_value_parity_image.bin` — float32 decoded image
  `[1, 3, 512, 512]` from
  `AutoencoderKL.decode(latent, return_dict=False)[0]` on
  float32 weights in eval mode. Same dump format as every other
  ferrotorch artifact:
  `[u32 ndim][u32 × ndim shape][f32 × prod(shape)]` little-endian.
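
The dump format above is simple enough to read with the standard library alone. The following is a sketch, not the ferrotorch harness: `parse_dump` and `max_abs_diff` are hypothetical helper names, and no pass/fail tolerance is implied because the card does not state one.

```rust
/// Parses a tensor dump laid out as
/// `[u32 ndim][u32 × ndim shape][f32 × prod(shape)]`, all little-endian.
/// Returns the shape and the flat element buffer.
pub fn parse_dump(bytes: &[u8]) -> Result<(Vec<usize>, Vec<f32>), String> {
    // Fetch the 4-byte word at word index `i`, or fail if the input is short.
    let word = |i: usize| -> Result<[u8; 4], String> {
        match bytes.get(i * 4..i * 4 + 4) {
            Some(c) => Ok([c[0], c[1], c[2], c[3]]),
            None => Err(format!("input truncated at word {}", i)),
        }
    };

    let ndim = u32::from_le_bytes(word(0)?) as usize;
    let mut shape = Vec::new();
    for d in 0..ndim {
        shape.push(u32::from_le_bytes(word(1 + d)?) as usize);
    }

    // The payload must hold exactly prod(shape) f32 values.
    let count: usize = shape.iter().product();
    let header_words = 1 + ndim;
    let expected = (header_words + count) * 4;
    if bytes.len() != expected {
        return Err(format!("expected {} bytes, got {}", expected, bytes.len()));
    }

    let data: Vec<f32> = bytes[header_words * 4..]
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect();
    Ok((shape, data))
}

/// Largest absolute element-wise difference, or `None` on a length mismatch.
/// `f32::max` skips NaN operands, so NaN differences do not surface here.
pub fn max_abs_diff(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    Some(a.iter().zip(b).map(|(x, y)| (x - y).abs()).fold(0.0, f32::max))
}
```

Under this layout the latent file would be 20 header bytes plus 4 × 16,384 payload bytes (65,556 in total), and the image file 20 plus 4 × 786,432 (3,145,748 in total). A harness could parse `_value_parity_image.bin`, run its own decode on the parsed latent, and report `max_abs_diff` between the two buffers.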

## How to load

```rust
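use ferrotorch_diffusion::{VaeDecoderConfig, load_vae_decoder};
use ferrotorch_hub::{HubCache, hf_download_model};

// Download the repo at revision "main" through the hub cache.
let cache = HubCache::with_default_dir();
let repo_dir = hf_download_model("ferrotorch/sd-v1-5-vae-decoder", "main", &cache)?;

// Read the decoder config stored alongside the weights.
let cfg = VaeDecoderConfig::from_file(&repo_dir.join("config.json"))?;

// Build the f32 decoder from the pinned safetensors file.
// The second tuple element (the drop report) is ignored here.
let (decoder, _drop_report) = load_vae_decoder::<f32>(
    &repo_dir.join("model.safetensors"),
    cfg,
    /* strict = */ false,
)?;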
```

## Upstream license

Stable Diffusion v1.5 is distributed under the CreativeML Open RAIL-M license. The decoder slice mirrored here inherits that license; see <https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/LICENSE> for the full terms.