Wan 2.2 shared encoders — VAE + UMT5 + CLIP-vision

Organization: WindstormLabs
Used by: SceneMachine

SceneMachine is a sub-project of Windstorm Labs. This repo hosts shared AI/ML infrastructure used by SceneMachine and reusable by future Windstorm Labs sub-projects.

What this is

Shared encoder weights used across the Wan 2.2 T2V / I2V / Animate stacks: the Wan 2.1 VAE (still used by Wan 2.2), the UMT5-xxl text encoder, the SigLIP vision encoder (patch14-384, for I2V), and CLIP-ViT-H. CLIP-ViT-H is REQUIRED by Animate's face_adapter; loading SigLIP there triggers a LayerNorm shape mismatch.
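The pairing of stacks and encoders described above can be written down as a small lookup. This is only a sketch: the filenames come from the Files section of this README, while the `encoders_for` helper and the stack keys are hypothetical names, and the assumption that T2V needs no vision encoder is an inference from the description rather than something this repo states.

```python
# Filenames as listed in the Files section of this README.
VAE = "wan_2.1_vae.safetensors"
TEXT_ENCODER = "umt5_xxl_bf16_from_pth.safetensors"
SIGLIP_VISION = "sigclip_vision_patch14_384.safetensors"
CLIP_VISION_H = "clip_vision_h.safetensors"

# Assumption: every stack loads the VAE and the text encoder, and only the
# vision encoder differs. "No vision encoder for T2V" is inferred, not stated.
VISION_ENCODER = {
    "t2v": None,
    "i2v": SIGLIP_VISION,
    "animate": CLIP_VISION_H,  # SigLIP here hits the LayerNorm shape mismatch
}


def encoders_for(stack: str) -> list[str]:
    """Return the encoder files a given stack is expected to load."""
    key = stack.lower()
    if key not in VISION_ENCODER:
        raise ValueError(f"unknown stack: {stack!r}")
    files = [VAE, TEXT_ENCODER]
    vision = VISION_ENCODER[key]
    if vision is not None:
        files.append(vision)
    return files
```

Keeping the Animate entry explicit makes it harder to wire SigLIP into the face_adapter path by accident.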

Upstream source

Primary distribution: https://huggingface.co/Comfy-Org/Wan_2.2_ComfyUI_Repackaged and, for CLIP-ViT-H, https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged

This repo is a mirror. License terms of the upstream apply unchanged.

Primary license owner(s): Alibaba / Google T5 / Google SigLIP / OpenCLIP-laion

Files

Filename	Size
`umt5_xxl_bf16_from_pth.safetensors`	11.36 GB
`clip_vision_h.safetensors`	1.26 GB
`sigclip_vision_patch14_384.safetensors`	0.86 GB
`wan_2.1_vae.safetensors`	0.25 GB

Total: 13.74 GB
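One way to pull a single file from the list above is `hf_hub_download` from the `huggingface_hub` package. The sketch below assumes this repo's id is `WindstormLabs/wan22-encoders` (the name given under Related repos) and that the files sit at the repo root; the `fetch` helper and the default target directory are hypothetical.

```python
from pathlib import Path

# Assumption: repo id taken from the "Related repos" section of this README.
REPO_ID = "WindstormLabs/wan22-encoders"

# Filenames as listed in the Files table; assumed to be at the repo root.
FILES = [
    "umt5_xxl_bf16_from_pth.safetensors",
    "clip_vision_h.safetensors",
    "sigclip_vision_patch14_384.safetensors",
    "wan_2.1_vae.safetensors",
]


def fetch(filename: str, local_dir: str = "models/wan22-encoders") -> Path:
    """Download one listed file and return its local path."""
    if filename not in FILES:
        raise ValueError(f"{filename!r} is not listed in this repo")
    # Imported lazily so the module loads without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return Path(
        hf_hub_download(repo_id=REPO_ID, filename=filename, local_dir=local_dir)
    )
```

Calling `fetch("wan_2.1_vae.safetensors")` first is a cheap way to confirm access before starting the 11.36 GB text encoder download.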

Related repos

The full SceneMachine model stack: search the SceneMachine HF collection.
Shared encoders and Wan VAE: WindstormLabs/wan22-encoders.
Speed LoRAs: WindstormLabs/wan22-loras.

Provenance

Mirror created 2026-05-13 from a local working copy on the SceneMachine development rig. File hashes are preserved by HF's content-addressed storage.
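To check a downloaded copy against the mirror, a local SHA-256 can be compared with the digest Hugging Face displays for a large file. This is a minimal sketch: `sha256_of` and `matches` are hypothetical helper names, and the expected digest has to be copied by hand from the file's page on the Hub, since this README does not list any.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so multi-GB weights never sit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """Compare a local file's digest with a hex digest copied from the Hub."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Reading in 1 MiB chunks keeps memory use flat regardless of file size.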

🤖 Repo and README generated by Claude Code during a SceneMachine CTO session.
