---
license: apache-2.0
base_model:
  - Wan-AI/Wan2.2-I2V-14B-480P
tags:
  - facecam
  - portrait-video
  - camera-control
  - wan2.2
  - diffusion
  - safetensors
  - comfyui
pipeline_tag: image-to-video
---
# FaceCam: Merged bf16 Checkpoints

**Portrait Video Camera Control via Scale-Aware Conditioning**

🏔️ **CVPR 2026** 🏔️ | [Paper (arXiv 2603.05506)](https://arxiv.org/abs/2603.05506) | [Code](https://github.com/weijielyu/FaceCam) | [Project Page](https://www.wlyu.me/FaceCam/) | [Original Weights](https://huggingface.co/wlyu/FaceCam)

> Weijie Lyu, Ming-Hsuan Yang, Zhixin Shu
> University of California, Merced · Adobe Research
## What's in This Repo

Pre-merged, single-file bf16 safetensors converted from the upstream sharded checkpoints at [`wlyu/FaceCam`](https://huggingface.co/wlyu/FaceCam). Each file is a **partial fine-tune** checkpoint: it contains only the self-attention and patch-embedding weights (~402 keys) and is applied on top of a base **Wan 2.2 14B I2V** model, so it cannot be used on its own.

| File | Description | Size |
|------|-------------|------|
| `facecam_wan2.2_14b_high_bf16.safetensors` | High-noise stage DiT (camera trajectory) | ~7.9 GB |
| `facecam_wan2.2_14b_low_bf16.safetensors` | Low-noise stage DiT (detail refinement) | ~7.9 GB |
| `gaussians.ply` | 3D Gaussian head proxy for camera conditioning | ~43 MB |
| `face_landmarker_v2_with_blendshapes.task` | MediaPipe face landmarker for conditioning extraction | ~3.6 MB |
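
The "partial fine-tune" idea can be illustrated with a minimal sketch, which is not the loader used by upstream FaceCam or by ComfyUI-FFMPEGA. It uses plain dictionaries in place of tensors, and the key names are hypothetical placeholders rather than the real parameter names. The point is only that keys present in the FaceCam file replace the matching base weights, while everything else stays as in the base model.

```python
from typing import Dict, List, Tuple


def apply_partial_checkpoint(
    base: Dict[str, object], patch: Dict[str, object]
) -> Tuple[Dict[str, object], List[str]]:
    """Overlay a partial checkpoint on a copy of the base state dict.

    Returns the merged dict and the list of patch keys that do not exist
    in the base model (these are skipped, not merged).
    """
    merged = dict(base)  # shallow copy, so the base dict is left untouched
    unexpected = [key for key in patch if key not in base]
    for key, value in patch.items():
        if key in base:
            merged[key] = value
    return merged, unexpected


# Toy stand-ins: strings instead of tensors, hypothetical key names.
base = {
    "patch_embedding.weight": "base",
    "blocks.0.self_attn.q.weight": "base",
    "blocks.0.ffn.0.weight": "base",
}
patch = {
    "patch_embedding.weight": "facecam",
    "blocks.0.self_attn.q.weight": "facecam",
}

merged, unexpected = apply_partial_checkpoint(base, patch)
for name, origin in merged.items():
    print(f"{name}: {origin}")
```

In a real setup the two dictionaries would be state dicts loaded from the base model and from one of the `facecam_wan2.2_14b_*_bf16.safetensors` files. A non-empty `unexpected` list would suggest that the base model and the checkpoint do not match.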
## Usage with ComfyUI-FFMPEGA

These weights are used by the **FaceCam** node in [ComfyUI-FFMPEGA](https://github.com/AEmotionStudio/ComfyUI-FFMPEGA).

1. Place `facecam_wan2.2_14b_*_bf16.safetensors` in `ComfyUI/models/diffusion_models/`
2. Place `gaussians.ply` and `face_landmarker_v2_with_blendshapes.task` in `ComfyUI/models/facecam/`
3. Load a Wan 2.2 14B I2V base model (e.g. GGUF Q4_K_M) via "Load Diffusion Model"
4. Connect the base model and the FaceCam checkpoint to the FaceCam node
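
To check steps 1 and 2 before opening ComfyUI, a small helper along these lines can list any files that are not where the steps above expect them. It is a convenience sketch, not part of ComfyUI-FFMPEGA. The function name `missing_facecam_files` is made up for illustration, and the default root `ComfyUI` assumes the script is run from the directory that contains the ComfyUI folder.

```python
from pathlib import Path
from typing import Dict, List

# Relative folders and file names taken from the placement steps above.
EXPECTED_FILES: Dict[str, List[str]] = {
    "models/diffusion_models": [
        "facecam_wan2.2_14b_high_bf16.safetensors",
        "facecam_wan2.2_14b_low_bf16.safetensors",
    ],
    "models/facecam": [
        "gaussians.ply",
        "face_landmarker_v2_with_blendshapes.task",
    ],
}


def missing_facecam_files(comfy_root: str = "ComfyUI") -> List[Path]:
    """Return the expected FaceCam files that do not exist under comfy_root."""
    root = Path(comfy_root)
    missing = []
    for subdir, names in EXPECTED_FILES.items():
        for name in names:
            path = root / subdir / name
            if not path.is_file():
                missing.append(path)
    return missing


if __name__ == "__main__":
    problems = missing_facecam_files("ComfyUI")
    if problems:
        for path in problems:
            print(f"missing: {path}")
    else:
        print("all FaceCam files found")
```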
## Pipeline

FaceCam generates portrait videos with precise camera control from a single input video:

1. **Face-centered crop** of the input video
2. **3D Gaussian proxy rendering** for camera trajectory conditioning (via `gaussians.ply`)
3. **MediaPipe face landmark extraction** from the rendered proxy → `camera_cond`
4. **VAE-encode** `video_cond` + `camera_cond`
5. **Wan 2.2 DiT inference** with FaceCam conditioning:
   - Two-stage denoising: HIGH model (90%) → LOW model (10%)
   - Temporal concat: `[noise_latents | video_cond_latents]`
   - Channel concat: `[camera_cond_latents | i2v_y]`
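
The bookkeeping behind step 5 can be sketched with shapes alone. This is a rough illustration under several assumptions, not the actual FaceCam inference code. It assumes latents are laid out as `(C, T, H, W)`. The concrete sizes below are hypothetical. The 90% / 10% figures are read here as shares of the sampling steps, although they might instead refer to a noise-level boundary. The helpers `concat_shape` and `split_steps` are illustrative names.

```python
from typing import Tuple

Shape = Tuple[int, ...]


def concat_shape(a: Shape, b: Shape, axis: int) -> Shape:
    """Shape of concatenating two arrays along a non-negative axis index.

    All other dimensions must match.
    """
    if len(a) != len(b):
        raise ValueError("rank mismatch")
    for i, (x, y) in enumerate(zip(a, b)):
        if i != axis and x != y:
            raise ValueError(f"dimension {i} differs: {x} vs {y}")
    return a[:axis] + (a[axis] + b[axis],) + a[axis + 1:]


def split_steps(total_steps: int, high_fraction: float = 0.9) -> Tuple[int, int]:
    """Split sampling steps between the HIGH and LOW models.

    Assumption: the 90/10 split refers to the number of steps.
    """
    high = round(total_steps * high_fraction)
    return high, total_steps - high


# Hypothetical latent shapes in (C, T, H, W) layout.
noise_latents = (16, 21, 60, 104)
video_cond_latents = (16, 21, 60, 104)
camera_cond_latents = (16, 21, 60, 104)
i2v_y = (20, 21, 60, 104)

# Temporal concat: [noise_latents | video_cond_latents] grows the T axis.
print(concat_shape(noise_latents, video_cond_latents, axis=1))

# Channel concat: [camera_cond_latents | i2v_y] grows the C axis.
print(concat_shape(camera_cond_latents, i2v_y, axis=0))

# Steps handled by the HIGH model, then by the LOW model.
print(split_steps(20))
```

The two concatenations are shown independently. How the real pipeline aligns the frame counts of the temporally extended latents and the channel-wise conditioning is not described here, so the sketch does not try to combine them.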
## Citation

```bibtex
@misc{facecam,
  title   = {FaceCam: Portrait Video Camera Control via Scale-Aware Conditioning},
  author  = {Weijie Lyu and Ming-Hsuan Yang and Zhixin Shu},
  year    = {2026},
  eprint  = {2603.05506},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url     = {https://arxiv.org/abs/2603.05506},
}
```
## License

These model weights are released under the [Apache License 2.0](./LICENSE), matching the upstream [FaceCam](https://github.com/weijielyu/FaceCam) repository license.

## Acknowledgements

- [FaceCam](https://github.com/weijielyu/FaceCam) by Weijie Lyu, Ming-Hsuan Yang, Zhixin Shu
- [Wan 2.2](https://arxiv.org/abs/2503.20314) by Wan-AI
- [MediaPipe](https://developers.google.com/mediapipe) by Google