# ferrotorch/sd-v1-5-clip-text-encoder
Stable Diffusion 1.5 CLIP text encoder (runwayml/stable-diffusion-v1-5, text_encoder/ subfolder; the text tower of openai/clip-vit-large-patch14). 12 transformer layers, hidden_size=768, intermediate_size=3072, num_attention_heads=12, max_position_embeddings=77, vocab_size=49408, hidden_act=quick_gelu, layer_norm_eps=1e-5. Causal self-attention. ~123M-param text conditioner. RAIL-M licensed. Real-artifact baseline for SD CLIP text encoder parity vs transformers (#1152).
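The `quick_gelu` activation named in the config is the fast GELU approximation CLIP was trained with, `x * sigmoid(1.702 * x)`. A minimal sketch (the free function is illustrative, not a ferrotorch API):

```rust
/// QuickGELU, the activation CLIP's MLP blocks were trained with:
/// quick_gelu(x) = x * sigmoid(1.702 * x).
/// Close to exact GELU but cheaper; the two diverge slightly away from 0,
/// so a parity baseline must use the same variant as upstream.
fn quick_gelu(x: f32) -> f32 {
    let sigmoid = 1.0 / (1.0 + (-1.702 * x).exp());
    x * sigmoid
}
```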
## Provenance
- Upstream: `runwayml/stable-diffusion-v1-5` (subfolder `text_encoder/`), openrail.
- Conversion script: `ferrotorch/scripts/pin_pretrained_diffusion_weights.py`.
- Ferrotorch issue: https://github.com/dollspace-gay/ferrotorch/issues/1152.
- SHA-256 of `model.safetensors` (pinned in `ferrotorch-hub/src/registry.rs`): `52de4b2426c9e31a63dadec5d111f766af7304b1ab205872b060c274727861de`.
- Number of trainable parameters in the text encoder: 123,060,480.
- Config snapshot: hidden_size=768, intermediate_size=3072, num_attention_heads=12, num_hidden_layers=12, max_position_embeddings=77, vocab_size=49408, hidden_act='quick_gelu', layer_norm_eps=1e-05.
- Dropped upstream int64 buffer keys (not parameters on either side): `['text_model.embeddings.position_ids']`.
## Value-parity probe
Two extra files are uploaded so the ferrotorch-side harness can reproduce the parity verdict without re-running the upstream CLIPTextModel:

- `_value_parity_input_ids.bin`: pre-tokenized input ids for the fixed prompt "a photograph of an astronaut riding a horse", padded to `[1, 77]` with the CLIP pad/eos token. Stored as f32 (every CLIP-BPE id fits in 24 bits, so the cast is lossless). Shipped so the Rust side does not need a tokenizer on the parity hot path.
- `_value_parity_last_hidden_state.bin`: float32 `last_hidden_state` of shape `[1, 77, 768]` from `CLIPTextModel(input_ids=input_ids, return_dict=True).last_hidden_state`, computed on float32 weights in eval mode. Same dump format as every other ferrotorch artifact: `[u32 ndim][u32 × ndim shape][f32 × prod(shape)]`, little-endian.
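That dump layout is trivial to parse. A minimal sketch of a reader (`parse_dump` is an illustrative helper, not the harness's actual entry point), assuming the `[u32 ndim][u32 × ndim shape][f32 payload]` little-endian layout described above:

```rust
use std::convert::TryInto;

/// Parse a ferrotorch tensor dump: [u32 ndim][u32 × ndim shape]
/// [f32 × prod(shape)], all little-endian. Returns (shape, data).
fn parse_dump(bytes: &[u8]) -> Result<(Vec<u32>, Vec<f32>), String> {
    let read_u32 = |off: usize| -> Result<u32, String> {
        bytes
            .get(off..off + 4)
            .map(|s| u32::from_le_bytes(s.try_into().unwrap()))
            .ok_or_else(|| "truncated header".to_string())
    };
    let ndim = read_u32(0)? as usize;
    let mut shape = Vec::with_capacity(ndim);
    for i in 0..ndim {
        shape.push(read_u32(4 + 4 * i)?);
    }
    // Element count is the product of the dims; payload follows the header.
    let n: usize = shape.iter().map(|&d| d as usize).product();
    let data_off = 4 + 4 * ndim;
    let payload = bytes
        .get(data_off..data_off + 4 * n)
        .ok_or_else(|| "truncated payload".to_string())?;
    let data = payload
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes(c.try_into().unwrap()))
        .collect();
    Ok((shape, data))
}
```

For `_value_parity_last_hidden_state.bin` this would yield shape `[1, 77, 768]` and 59,136 floats.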
## How to load

```rust
use ferrotorch_diffusion::{ClipTextConfig, load_clip_text_encoder};
use ferrotorch_hub::{HubCache, hf_download_model};

let cache = HubCache::with_default_dir();
let repo_dir = hf_download_model("ferrotorch/sd-v1-5-clip-text-encoder", "main", &cache)?;
let cfg = ClipTextConfig::from_file(&repo_dir.join("config.json"))?;
let (encoder, _drop_report) = load_clip_text_encoder::<f32>(
    &repo_dir.join("model.safetensors"),
    cfg,
    /* strict = */ false,
)?;
let ids: Vec<u32> = /* CLIP-BPE tokenized prompt, length max_position_embeddings */;
let last_hidden_state = encoder.forward_from_ids(&ids)?; // [1, 77, 768]
```
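With the reference dump loaded, a parity verdict reduces to an elementwise comparison of the two tensors. A sketch of that kind of check (the helper name and tolerance are illustrative; the actual harness's gating logic may differ):

```rust
/// Largest absolute elementwise difference between a ferrotorch output
/// and the pinned upstream reference tensor (both flattened row-major).
fn max_abs_diff(ours: &[f32], reference: &[f32]) -> f32 {
    assert_eq!(ours.len(), reference.len(), "shape mismatch");
    ours.iter()
        .zip(reference)
        .map(|(a, b)| (a - b).abs())
        .fold(0.0_f32, f32::max)
}
```

A harness would then assert `max_abs_diff(&ours, &reference) <= tol` for a tolerance appropriate to an f32 forward pass.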
## Upstream license
Stable Diffusion v1.5 is distributed under the CreativeML Open RAIL-M license. The text-encoder slice mirrored here inherits that license; see https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/LICENSE for the full terms.