Wan 2.2 shared encoders: VAE + UMT5 + CLIP-vision
Organization: WindstormLabs
Used by: SceneMachine
SceneMachine is a sub-project of Windstorm Labs. This repo hosts shared AI/ML infrastructure used by SceneMachine and reusable by future Windstorm Labs sub-projects.
What this is
Shared encoder weights used across the Wan 2.2 T2V, I2V, and Animate stacks: the Wan 2.1 VAE (still used by Wan 2.2), the UMT5-xxl text encoder, the SigLIP vision encoder (patch14-384, used for I2V), and CLIP-ViT-H (required by Animate's face_adapter, because loading SigLIP there triggers a LayerNorm shape mismatch).
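The stack-to-encoder mapping above can be expressed as a small lookup table, with a guard that fails early instead of surfacing the LayerNorm mismatch at load time. This is a minimal sketch: the helper names (`files_for`, `check_vision_encoder`) are hypothetical, and two details are assumptions not stated in this README: that T2V loads no vision encoder, and that all three stacks load both the VAE and UMT5.

```python
# Hypothetical helper, not a ComfyUI or Wan API. Mapping follows this README;
# ASSUMPTION: T2V uses no vision encoder, and every stack loads VAE + UMT5.
VAE = "wan_2.1_vae.safetensors"
TEXT_ENCODER = "umt5_xxl_bf16_from_pth.safetensors"
SIGLIP_VISION = "sigclip_vision_patch14_384.safetensors"
CLIP_VISION_H = "clip_vision_h.safetensors"

STACK_FILES = {
    "t2v": [VAE, TEXT_ENCODER],
    "i2v": [VAE, TEXT_ENCODER, SIGLIP_VISION],
    "animate": [VAE, TEXT_ENCODER, CLIP_VISION_H],
}


def files_for(stack: str) -> list:
    """Return the files from this repo that a stack needs (case-insensitive)."""
    try:
        return list(STACK_FILES[stack.lower()])
    except KeyError:
        raise ValueError(
            f"unknown stack {stack!r}; expected one of {sorted(STACK_FILES)}"
        ) from None


def check_vision_encoder(stack: str, vision_file: str) -> None:
    """Reject SigLIP (or anything else) for Animate, whose face_adapter needs CLIP-ViT-H."""
    if stack.lower() == "animate" and vision_file != CLIP_VISION_H:
        raise ValueError(
            f"Animate's face_adapter requires {CLIP_VISION_H}, got {vision_file}"
        )
```

For example, `files_for("animate")` lists `clip_vision_h.safetensors`, while `check_vision_encoder("animate", SIGLIP_VISION)` raises before any weights are loaded.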
Upstream source
Primary distribution: https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged + https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged (CLIP-ViT-H)
This repo is a mirror. The upstream license terms apply unchanged.
Primary license owner(s): Alibaba / Google T5 / Google SigLIP / OpenCLIP-laion
Files
| Filename | Size |
|---|---|
| umt5_xxl_bf16_from_pth.safetensors | 11.36 GB |
| clip_vision_h.safetensors | 1.26 GB |
| sigclip_vision_patch14_384.safetensors | 0.86 GB |
| wan_2.1_vae.safetensors | 0.25 GB |
Total: 13.74 GB
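Before pulling the full 13.74 GB, it can help to confirm the target disk has room. The sketch below uses only the standard library (`shutil.disk_usage`). Two assumptions: "GB" is treated as 10**9 bytes (the table does not say GB vs GiB), and the stated total is presumably computed from unrounded sizes, since the rounded per-file values sum to 13.73 GB. The function names are hypothetical.

```python
import shutil

# Per-file sizes from the Files table, in GB.
# ASSUMPTION: 1 GB = 10**9 bytes (the table does not specify GB vs GiB).
FILE_SIZES_GB = {
    "umt5_xxl_bf16_from_pth.safetensors": 11.36,
    "clip_vision_h.safetensors": 1.26,
    "sigclip_vision_patch14_384.safetensors": 0.86,
    "wan_2.1_vae.safetensors": 0.25,
}
STATED_TOTAL_GB = 13.74  # table total; rounded per-file sizes sum to 13.73


def required_bytes(files=None) -> int:
    """Bytes needed for the given files, or for the whole repo if files is None."""
    if files is None:
        return round(STATED_TOTAL_GB * 10**9)
    return round(sum(FILE_SIZES_GB[name] for name in files) * 10**9)


def has_room(target_dir: str, files=None) -> bool:
    """True if the filesystem holding target_dir has enough free space."""
    return shutil.disk_usage(target_dir).free >= required_bytes(files)
```

Passing a subset (for example, only the VAE and UMT5 files) lets a T2V-only setup check for less than the full total.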
Related repos
- The full SceneMachine model stack: search the SceneMachine HF collection.
- Shared encoders and Wan VAE: WindstormLabs/wan22-encoders.
- Speed LoRAs: WindstormLabs/wan22-loras.
Provenance
Mirror created 2026-05-13 from a local working copy on the SceneMachine development rig. File hashes are preserved via HF's content-addressed storage.
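Because the files are stored content-addressed, a downloaded copy can be checked against the SHA-256 that the Hugging Face file page shows for each LFS-stored file. A minimal standard-library sketch, assuming you copy that expected hash by hand; `sha256_of` and `verify` are hypothetical names. Reading in chunks keeps the 11 GB UMT5 file out of memory.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks (default 1 MiB) to bound memory use."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_sha256: str) -> bool:
    """Compare a local file with an expected hex digest (case and whitespace ignored)."""
    return sha256_of(path) == expected_sha256.strip().lower()
```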
Repo and README generated by Claude Code during a SceneMachine CTO session.