FaceCam — Merged bf16 Checkpoints

Portrait Video Camera Control via Scale-Aware Conditioning

🏔️ CVPR 2026 🏔️ | Paper (arXiv 2603.05506) | Code | Project Page | Original Weights

Weijie Lyu, Ming-Hsuan Yang, Zhixin Shu
University of California, Merced · Adobe Research

What's in This Repo

Pre-merged, single-file bf16 safetensors converted from the upstream sharded checkpoints at wlyu/FaceCam. These are partial fine-tune checkpoints (self-attention + patch embedding layers only, ~402 keys each) that are applied on top of a base Wan 2.2 14B I2V model.
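Because these checkpoints are partial fine-tunes, loading amounts to overlaying the ~402 fine-tuned tensors onto the base Wan 2.2 state dict. A minimal sketch of that patching step (the function name and toy keys below are illustrative, not the upstream loader):

```python
def apply_partial_finetune(base_sd: dict, patch_sd: dict) -> tuple[dict, int]:
    """Overlay partial fine-tune tensors onto a base model state dict.

    Keys present in patch_sd replace the corresponding base tensors;
    every other base tensor is left untouched.
    """
    merged = dict(base_sd)  # shallow copy; tensors are shared, not cloned
    replaced = 0
    for key, tensor in patch_sd.items():
        if key not in merged:
            # A missing key usually signals a base-model version mismatch.
            raise KeyError(f"fine-tune key not found in base model: {key}")
        merged[key] = tensor
        replaced += 1
    return merged, replaced
```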

| File | Description | Size |
|---|---|---|
| facecam_wan2.2_14b_high_bf16.safetensors | High-noise stage DiT (camera trajectory) | ~7.9 GB |
| facecam_wan2.2_14b_low_bf16.safetensors | Low-noise stage DiT (detail refinement) | ~7.9 GB |
| gaussians.ply | 3D Gaussian head proxy for camera conditioning | ~43 MB |
| face_landmarker_v2_with_blendshapes.task | MediaPipe face landmarker for conditioning extraction | ~3.6 MB |

Usage with ComfyUI-FFMPEGA

These weights are used by the FaceCam node in ComfyUI-FFMPEGA.

  1. Place facecam_wan2.2_14b_*_bf16.safetensors in ComfyUI/models/diffusion_models/
  2. Place gaussians.ply and face_landmarker_v2_with_blendshapes.task in ComfyUI/models/facecam/
  3. Load a Wan 2.2 14B I2V base model (e.g. GGUF Q4_K_M) via "Load Diffusion Model"
  4. Connect to the FaceCam node along with the FaceCam checkpoint
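The placement rules in steps 1–2 can be expressed as a small helper that maps each downloaded file to its ComfyUI subdirectory. This is a convenience sketch only; the FaceCam node does not require it:

```python
from pathlib import Path

# Destination subdirectories inside the ComfyUI models folder, per the steps above.
DESTINATIONS = {
    "facecam_wan2.2_14b_high_bf16.safetensors": "diffusion_models",
    "facecam_wan2.2_14b_low_bf16.safetensors": "diffusion_models",
    "gaussians.ply": "facecam",
    "face_landmarker_v2_with_blendshapes.task": "facecam",
}

def destination_for(filename: str, comfyui_root: str = "ComfyUI") -> Path:
    """Return the expected install path for a FaceCam asset."""
    try:
        subdir = DESTINATIONS[filename]
    except KeyError:
        raise ValueError(f"not a FaceCam asset: {filename}") from None
    return Path(comfyui_root) / "models" / subdir / filename
```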

Pipeline

FaceCam generates portrait videos with precise camera control from a single input video:

  1. Face-centered crop of input video
  2. 3D Gaussian proxy rendering for camera trajectory conditioning (via gaussians.ply)
  3. MediaPipe face landmark extraction from proxy → camera_cond
  4. VAE-encode video_cond + camera_cond
  5. Wan 2.2 DiT inference with FaceCam conditioning:
    • Two-stage denoising: HIGH model (90%) → LOW model (10%)
    • Temporal concat: [noise_latents | video_cond_latents]
    • Channel concat: [camera_cond_latents | i2v_y]
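The conditioning layout in steps 4–5 can be illustrated with dummy latents. Shapes, channel counts, and the helper names below are illustrative (the real values come from the Wan 2.2 VAE and DiT); only the concatenation axes and the 90/10 step split mirror the description above:

```python
import numpy as np

def assemble_conditioning(noise, video_cond, camera_cond, i2v_y):
    """Assemble FaceCam DiT inputs from latent tensors shaped (C, T, H, W)."""
    # Temporal concat: noise latents followed by VAE-encoded video latents.
    x = np.concatenate([noise, video_cond], axis=1)   # (C, T_noise + T_vid, H, W)
    # Channel concat: camera conditioning stacked on the I2V conditioning y.
    y = np.concatenate([camera_cond, i2v_y], axis=0)  # (C_cam + C_y, T, H, W)
    return x, y

def split_denoise_steps(total_steps: int, high_frac: float = 0.9) -> tuple[int, int]:
    """Split sampling steps between the HIGH and LOW DiTs (90% / 10%)."""
    high = round(total_steps * high_frac)
    return high, total_steps - high
```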

Citation

@misc{facecam,
  title   = {FaceCam: Portrait Video Camera Control via Scale-Aware Conditioning},
  author  = {Weijie Lyu and Ming-Hsuan Yang and Zhixin Shu},
  year    = {2026},
  eprint  = {2603.05506},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url     = {https://arxiv.org/abs/2603.05506},
}

License

These model weights are released under the Apache License 2.0, matching the upstream FaceCam repository license.
