---
license: apache-2.0
base_model:
  - Wan-AI/Wan2.2-I2V-14B-480P
tags:
  - facecam
  - portrait-video
  - camera-control
  - wan2.2
  - diffusion
  - safetensors
  - comfyui
pipeline_tag: image-to-video
---

# FaceCam — Merged bf16 Checkpoints

**Portrait Video Camera Control via Scale-Aware Conditioning**

🏔️ CVPR 2026 🏔️ | [Paper (arXiv 2603.05506)](https://arxiv.org/abs/2603.05506) | Code | Project Page | Original Weights

Weijie Lyu, Ming-Hsuan Yang, Zhixin Shu  
University of California, Merced · Adobe Research

## What's in This Repo

Pre-merged single-file bf16 `safetensors` converted from the upstream sharded checkpoints at `wlyu/FaceCam`. These are partial fine-tune checkpoints (self-attention and patch-embedding layers only, ~402 keys each) that patch on top of a base Wan 2.2 14B I2V model.
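Conceptually, "patching on top of" the base model means overwriting only the matching keys in the full base state dict while leaving every other tensor untouched. A minimal sketch, with plain dicts standing in for tensor state dicts (the key names are illustrative, not the actual Wan 2.2 layout):

```python
def apply_partial_checkpoint(base_sd: dict, patch_sd: dict):
    """Overlay a partial fine-tune onto a full base state dict.

    Only keys present in the patch are replaced; every other tensor
    keeps its base weights. Patch keys with no counterpart in the
    base are returned separately instead of being silently added.
    """
    merged = dict(base_sd)
    unmatched = []
    for key, value in patch_sd.items():
        if key in merged:
            merged[key] = value
        else:
            unmatched.append(key)
    return merged, unmatched

# Illustrative key names only, not the real Wan 2.2 schema.
base = {
    "blocks.0.self_attn.q.weight": "base_q",
    "blocks.0.ffn.0.weight": "base_ffn",
    "patch_embedding.weight": "base_pe",
}
patch = {
    "blocks.0.self_attn.q.weight": "ft_q",
    "patch_embedding.weight": "ft_pe",
}

merged, unmatched = apply_partial_checkpoint(base, patch)
```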

| File | Description | Size |
| --- | --- | --- |
| `facecam_wan2.2_14b_high_bf16.safetensors` | High-noise stage DiT (camera trajectory) | ~7.9 GB |
| `facecam_wan2.2_14b_low_bf16.safetensors` | Low-noise stage DiT (detail refinement) | ~7.9 GB |
| `gaussians.ply` | 3D Gaussian head proxy for camera conditioning | ~43 MB |
| `face_landmarker_v2_with_blendshapes.task` | MediaPipe face landmarker for conditioning extraction | ~3.6 MB |

## Usage with ComfyUI-FFMPEGA

These weights are used by the FaceCam node in ComfyUI-FFMPEGA.

1. Place `facecam_wan2.2_14b_*_bf16.safetensors` in `ComfyUI/models/diffusion_models/`
2. Place `gaussians.ply` and `face_landmarker_v2_with_blendshapes.task` in `ComfyUI/models/facecam/`
3. Load a Wan 2.2 14B I2V base model (e.g. a GGUF Q4_K_M quantization) via "Load Diffusion Model"
4. Connect it to the FaceCam node along with the FaceCam checkpoint
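A quick way to confirm the files from steps 1–2 landed in the right places is to check the expected layout on disk. A sketch, assuming a ComfyUI install at `~/ComfyUI` (adjust the root to your setup):

```python
from pathlib import Path

COMFY = Path.home() / "ComfyUI"  # assumed install root

EXPECTED = {
    COMFY / "models" / "diffusion_models": [
        "facecam_wan2.2_14b_high_bf16.safetensors",
        "facecam_wan2.2_14b_low_bf16.safetensors",
    ],
    COMFY / "models" / "facecam": [
        "gaussians.ply",
        "face_landmarker_v2_with_blendshapes.task",
    ],
}


def missing_files(expected=EXPECTED):
    """Return the expected files that are not present on disk."""
    return [d / name for d, names in expected.items()
            for name in names if not (d / name).is_file()]


if __name__ == "__main__":
    for path in missing_files():
        print(f"missing: {path}")
```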

## Pipeline

FaceCam generates portrait videos with precise camera control from a single input video:

1. Face-centered crop of the input video
2. 3D Gaussian proxy rendering for camera-trajectory conditioning (via `gaussians.ply`)
3. MediaPipe face-landmark extraction from the proxy → `camera_cond`
4. VAE-encode `video_cond` + `camera_cond`
5. Wan 2.2 DiT inference with FaceCam conditioning:
   - Two-stage denoising: HIGH model (first ~90% of steps) → LOW model (final ~10%)
   - Temporal concat: `[noise_latents | video_cond_latents]`
   - Channel concat: `[camera_cond_latents | i2v_y]`
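The two-stage schedule and the two concatenations in step 5 can be sketched as follows. Python lists stand in for latent tensors and plain callables for the two DiTs; the 90/10 boundary and variable names mirror the description above, not the node's exact implementation:

```python
def facecam_denoise(noise_latents, video_cond_latents,
                    camera_cond_latents, i2v_y,
                    high_model, low_model, steps=20, high_frac=0.9):
    """Two-stage denoising over concatenated FaceCam latents (sketch)."""
    # Temporal concat: conditioning frames appended along the frame axis.
    x = noise_latents + video_cond_latents
    # Channel concat: camera conditioning alongside the I2V image latent.
    cond = camera_cond_latents + i2v_y
    boundary = round(steps * high_frac)  # HIGH model covers the first ~90%
    for t in range(steps):
        model = high_model if t < boundary else low_model
        x = model(x, cond, t)
    return x
```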

## Citation

```bibtex
@misc{facecam,
  title   = {FaceCam: Portrait Video Camera Control via Scale-Aware Conditioning},
  author  = {Weijie Lyu and Ming-Hsuan Yang and Zhixin Shu},
  year    = {2026},
  eprint  = {2603.05506},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url     = {https://arxiv.org/abs/2603.05506},
}
```

## License

These model weights are released under the Apache License 2.0, matching the license of the upstream FaceCam repository.

## Acknowledgements