FaceCam — Merged bf16 Checkpoints
Portrait Video Camera Control via Scale-Aware Conditioning
🏔️ CVPR 2026 🏔️ | Paper (arXiv 2603.05506) | Code | Project Page | Original Weights
Weijie Lyu, Ming-Hsuan Yang, Zhixin Shu
University of California, Merced · Adobe Research
What's in This Repo
This repo contains pre-merged, single-file bf16 safetensors converted from the upstream sharded checkpoints at wlyu/FaceCam. Each file is a partial fine-tune checkpoint (self-attention and patch embedding layers only, ~402 keys) that is applied on top of a base Wan 2.2 14B I2V model.
| File | Description | Size |
|---|---|---|
| `facecam_wan2.2_14b_high_bf16.safetensors` | High-noise stage DiT (camera trajectory) | ~7.9 GB |
| `facecam_wan2.2_14b_low_bf16.safetensors` | Low-noise stage DiT (detail refinement) | ~7.9 GB |
| `gaussians.ply` | 3D Gaussian head proxy for camera conditioning | ~43 MB |
| `face_landmarker_v2_with_blendshapes.task` | MediaPipe face landmarker for conditioning extraction | ~3.6 MB |
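Because the FaceCam files hold only a subset of the model's keys, using them amounts to overlaying those keys onto the base model's state dict. The sketch below illustrates that idea with plain dictionaries; it is not the loader used by any particular tool. The function name `overlay_partial_checkpoint` and the key names in the toy example are hypothetical, and it assumes the partial checkpoint uses the same key names as the base model. In practice the two mappings would hold tensors, for example as returned by `safetensors.torch.load_file`.

```python
from typing import Dict, Mapping, TypeVar

T = TypeVar("T")


def overlay_partial_checkpoint(base: Mapping[str, T], patch: Mapping[str, T]) -> Dict[str, T]:
    """Return a copy of `base` in which every key present in `patch` is overridden.

    Assumption: patch keys use the same names as the base model's keys.
    A key that is missing from `base` is treated as an error so that naming
    mismatches are caught early instead of being silently ignored.
    """
    unknown = sorted(k for k in patch if k not in base)
    if unknown:
        raise KeyError(f"{len(unknown)} patch key(s) not found in base, e.g. {unknown[0]!r}")
    merged = dict(base)   # shallow copy, the base mapping is left untouched
    merged.update(patch)  # fine-tuned entries replace the base entries
    return merged


# Toy example with hypothetical key names; strings stand in for tensors.
base = {
    "blocks.0.self_attn.q.weight": "base_q",
    "blocks.0.ffn.0.weight": "base_ffn",
    "patch_embedding.weight": "base_patch_embed",
}
patch = {
    "blocks.0.self_attn.q.weight": "facecam_q",
    "patch_embedding.weight": "facecam_patch_embed",
}
merged = overlay_partial_checkpoint(base, patch)
```

Keys that the partial checkpoint does not cover (the feed-forward weight in the toy example) keep their base values.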
Usage with ComfyUI-FFMPEGA
These weights are used by the FaceCam node in ComfyUI-FFMPEGA.
- Place `facecam_wan2.2_14b_*_bf16.safetensors` in `ComfyUI/models/diffusion_models/`
- Place `gaussians.ply` and `face_landmarker_v2_with_blendshapes.task` in `ComfyUI/models/facecam/`
- Load a Wan 2.2 14B I2V base model (e.g. GGUF Q4_K_M) via "Load Diffusion Model"
- Connect it to the FaceCam node along with the FaceCam checkpoint
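The file placement in the first two steps can be scripted. The following is a minimal sketch that copies the files from a download folder into the ComfyUI model folders named above. The function name `install_facecam_files` and both of its arguments are hypothetical, and the script assumes all four files were downloaded into a single directory.

```python
import shutil
from pathlib import Path

# Destination folders (relative to the ComfyUI root), taken from the steps above.
DESTINATIONS = {
    "facecam_wan2.2_14b_*_bf16.safetensors": "models/diffusion_models",
    "gaussians.ply": "models/facecam",
    "face_landmarker_v2_with_blendshapes.task": "models/facecam",
}


def install_facecam_files(download_dir, comfyui_root):
    """Copy FaceCam files from `download_dir` into the ComfyUI model folders.

    Returns the list of destination paths that were written. Files that are
    not present in `download_dir` are skipped.
    """
    download_dir = Path(download_dir)
    comfyui_root = Path(comfyui_root)
    copied = []
    for pattern, rel_dest in DESTINATIONS.items():
        dest_dir = comfyui_root / rel_dest
        for src in sorted(download_dir.glob(pattern)):
            dest_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest_dir / src.name)
            copied.append(dest_dir / src.name)
    return copied
```

A call such as `install_facecam_files("/path/to/downloads", "/path/to/ComfyUI")` (placeholder paths) returns the list of copied files, which makes it easy to confirm that both DiT checkpoints and both auxiliary files were found.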
Pipeline
FaceCam generates portrait videos with precise camera control from a single input video:
- Face-centered crop of input video
- 3D Gaussian proxy rendering for camera trajectory conditioning (via `gaussians.ply`)
- MediaPipe face landmark extraction from proxy → `camera_cond`
- VAE-encode `video_cond` + `camera_cond`
- Wan 2.2 DiT inference with FaceCam conditioning:
  - Two-stage denoising: HIGH model (90%) → LOW model (10%)
  - Temporal concat: `[noise_latents | video_cond_latents]`
  - Channel concat: `[camera_cond_latents | i2v_y]`
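To make the last step more concrete, here is a small sketch of the bookkeeping it implies: splitting the sampling schedule between the two models and tracking how the two concatenations change tensor shapes. It rests on several assumptions that the list above does not state: that the 90%/10% figures are shares of the sampling steps, that latents use a `(C, T, H, W)` layout, and that both concatenations join tensors of otherwise identical shape. The helper names and every size in the example are placeholders, not values taken from FaceCam.

```python
def split_steps(total_steps, high_fraction=0.9):
    """Split a sampling schedule into (high_noise_steps, low_noise_steps).

    Assumption: the HIGH/LOW percentages are shares of the step count.
    """
    high = round(total_steps * high_fraction)
    return high, total_steps - high


def concat_shape(a, b, axis):
    """Shape that results from concatenating shapes `a` and `b` along `axis`."""
    if len(a) != len(b):
        raise ValueError("shapes must have the same number of dimensions")
    for i, (x, y) in enumerate(zip(a, b)):
        if i != axis and x != y:
            raise ValueError(f"dimension {i} differs: {x} vs {y}")
    out = list(a)
    out[axis] += b[axis]
    return tuple(out)


# Assumed latent layout: (channels, frames, height, width).
CHANNEL_AXIS, TIME_AXIS = 0, 1

# Placeholder shapes, for illustration only.
noise_latents = (16, 21, 60, 60)
video_cond_latents = (16, 21, 60, 60)
camera_cond_latents = (16, 21, 60, 60)
i2v_y = (20, 21, 60, 60)

# Temporal concat: the frame axis grows, channels stay the same.
temporal = concat_shape(noise_latents, video_cond_latents, TIME_AXIS)

# Channel concat: the channel axis grows, the frame count stays the same.
channel = concat_shape(camera_cond_latents, i2v_y, CHANNEL_AXIS)

high_steps, low_steps = split_steps(20)
```

With these placeholder sizes, `temporal` has twice as many frames as `noise_latents`, `channel` has 36 channels, and a 20-step schedule is split into 18 HIGH steps and 2 LOW steps.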
Citation
@misc{facecam,
title = {FaceCam: Portrait Video Camera Control via Scale-Aware Conditioning},
author = {Weijie Lyu and Ming-Hsuan Yang and Zhixin Shu},
year = {2026},
eprint = {2603.05506},
archivePrefix = {arXiv},
primaryClass = {cs.CV},
url = {https://arxiv.org/abs/2603.05506},
}
License
These model weights are released under the Apache License 2.0, matching the upstream FaceCam repository license.
Acknowledgements