Upload README.md with huggingface_hub
README.md
ADDED
|
@@ -0,0 +1,80 @@
| 1 |
+
---
|
| 2 |
+
license: apache-2.0
|
| 3 |
+
base_model:
|
| 4 |
+
- Wan-AI/Wan2.2-I2V-14B-480P
|
| 5 |
+
tags:
|
| 6 |
+
- facecam
|
| 7 |
+
- portrait-video
|
| 8 |
+
- camera-control
|
| 9 |
+
- wan2.2
|
| 10 |
+
- diffusion
|
| 11 |
+
- safetensors
|
| 12 |
+
- comfyui
|
| 13 |
+
pipeline_tag: image-to-video
|
| 14 |
+
---
|
| 15 |
+
|
| 16 |
+
# FaceCam — Merged bf16 Checkpoints
|
| 17 |
+
|
| 18 |
+
**Portrait Video Camera Control via Scale-Aware Conditioning**
|
| 19 |
+
|
| 20 |
+
🏔️ **CVPR 2026** 🏔️ | [Paper (arXiv 2603.05506)](https://arxiv.org/abs/2603.05506) | [Code](https://github.com/weijielyu/FaceCam) | [Project Page](https://www.wlyu.me/FaceCam/) | [Original Weights](https://huggingface.co/wlyu/FaceCam)
|
| 21 |
+
|
| 22 |
+
> Weijie Lyu, Ming-Hsuan Yang, Zhixin Shu
|
| 23 |
+
> University of California, Merced · Adobe Research
|
| 24 |
+
|
| 25 |
+
## What's in This Repo
|
| 26 |
+
|
| 27 |
+
Pre-merged, single-file bf16 safetensors checkpoints converted from the upstream sharded checkpoints at [`wlyu/FaceCam`](https://huggingface.co/wlyu/FaceCam). Each file is a **partial fine-tune** checkpoint: it contains only the self-attention and patch-embedding weights (~402 keys per file) and must be applied on top of a base **Wan 2.2 14B I2V** model.
|
| 28 |
+
|
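Since each checkpoint covers only a subset of the model, loading amounts to overlaying its tensors on the base model's state dict. The following is a minimal sketch of that idea, not the loader used by any particular tool: `apply_partial_checkpoint` is a hypothetical helper, the key names in the usage note are illustrative, and plain Python values stand in for tensors. It does not check tensor shapes, which a real loader may need to do.

```python
def apply_partial_checkpoint(base_state, patch_state, strict=True):
    """Return a new state dict with patch tensors overlaid on the base.

    base_state / patch_state: mappings from parameter name to tensor.
    strict: if True, fail when the patch names a key the base lacks.
    """
    unknown = [key for key in patch_state if key not in base_state]
    if strict and unknown:
        raise KeyError(
            f"{len(unknown)} patch key(s) not found in base model, "
            f"e.g. {unknown[0]!r}"
        )

    merged = dict(base_state)  # shallow copy; the base dict is not modified
    for key, tensor in patch_state.items():
        if key in base_state:
            merged[key] = tensor  # fine-tuned weight replaces the base weight
    return merged
```

For example, with `base = {"blocks.0.self_attn.q.weight": w0, "blocks.0.ffn.0.weight": w1}` and a patch that holds only the first key, the merged dict takes the attention weight from the patch and keeps the feed-forward weight from the base.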
| 29 |
+
| File | Description | Size |
|
| 30 |
+
|------|-------------|------|
|
| 31 |
+
| `facecam_wan2.2_14b_high_bf16.safetensors` | High-noise stage DiT (camera trajectory) | ~7.9 GB |
|
| 32 |
+
| `facecam_wan2.2_14b_low_bf16.safetensors` | Low-noise stage DiT (detail refinement) | ~7.9 GB |
|
| 33 |
+
| `gaussians.ply` | 3D Gaussian head proxy for camera conditioning | ~43 MB |
|
| 34 |
+
| `face_landmarker_v2_with_blendshapes.task` | MediaPipe face landmarker for conditioning extraction | ~3.6 MB |
|
| 35 |
+
|
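To confirm what a downloaded checkpoint contains without loading several gigabytes of weights, the safetensors header can be read on its own. This sketch relies on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header); `read_safetensors_header` and `summarize` are hypothetical helper names, not part of the `safetensors` library.

```python
import json
import struct
from pathlib import Path


def read_safetensors_header(path):
    """Return the tensor table of a .safetensors file without loading data."""
    with Path(path).open("rb") as fh:
        # First 8 bytes: unsigned 64-bit little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", fh.read(8))
        header = json.loads(fh.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header


def summarize(header):
    """Return (tensor count, {dtype: count}) for a parsed header."""
    by_dtype = {}
    for info in header.values():
        by_dtype[info["dtype"]] = by_dtype.get(info["dtype"], 0) + 1
    return len(header), by_dtype
```

Calling `summarize(read_safetensors_header("facecam_wan2.2_14b_high_bf16.safetensors"))` should, if the description above holds, report roughly 402 tensors stored as `BF16`.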
| 36 |
+
## Usage with ComfyUI-FFMPEGA
|
| 37 |
+
|
| 38 |
+
These weights are used by the **FaceCam** node in [ComfyUI-FFMPEGA](https://github.com/AEmotionStudio/ComfyUI-FFMPEGA).
|
| 39 |
+
|
| 40 |
+
1. Place `facecam_wan2.2_14b_*_bf16.safetensors` in `ComfyUI/models/diffusion_models/`
|
| 41 |
+
2. Place `gaussians.ply` and `face_landmarker_v2_with_blendshapes.task` in `ComfyUI/models/facecam/`
|
| 42 |
+
3. Load a Wan 2.2 14B I2V base model (e.g. GGUF Q4_K_M) via "Load Diffusion Model"
|
| 43 |
+
4. Connect the loaded base model to the FaceCam node, together with the FaceCam checkpoint
|
| 44 |
+
|
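A quick way to verify steps 1 and 2 is to check that each file sits in the expected folder. The sketch below is only a convenience check under the directory layout listed above; `missing_files` is a hypothetical helper and the ComfyUI root path is whatever your installation uses.

```python
from pathlib import Path

# Expected locations relative to the ComfyUI root, taken from the steps above.
EXPECTED = {
    "models/diffusion_models": [
        "facecam_wan2.2_14b_high_bf16.safetensors",
        "facecam_wan2.2_14b_low_bf16.safetensors",
    ],
    "models/facecam": [
        "gaussians.ply",
        "face_landmarker_v2_with_blendshapes.task",
    ],
}


def missing_files(comfyui_root):
    """Return the relative paths of expected files that are not present."""
    root = Path(comfyui_root)
    return [
        f"{subdir}/{name}"
        for subdir, names in EXPECTED.items()
        for name in names
        if not (root / subdir / name).is_file()
    ]
```

An empty list from `missing_files("ComfyUI")` means all four files are in place; otherwise the list names what still needs to be copied.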
| 45 |
+
## Pipeline
|
| 46 |
+
|
| 47 |
+
FaceCam generates portrait videos with precise camera control from a single input video:
|
| 48 |
+
|
| 49 |
+
1. **Face-centered crop** of the input video
|
| 50 |
+
2. **3D Gaussian proxy rendering** for camera trajectory conditioning (via `gaussians.ply`)
|
| 51 |
+
3. **MediaPipe face landmark extraction** from the rendered proxy, producing `camera_cond`
|
| 52 |
+
4. **VAE-encode** `video_cond` + `camera_cond`
|
| 53 |
+
5. **Wan 2.2 DiT inference** with FaceCam conditioning:
|
| 54 |
+
- Two-stage denoising: HIGH model (90%) → LOW model (10%)
|
| 55 |
+
- Temporal concat: `[noise_latents | video_cond_latents]`
|
| 56 |
+
- Channel concat: `[camera_cond_latents | i2v_y]`
|
| 57 |
+
|
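The two-stage split and the two concatenations in step 5 can be illustrated with plain arithmetic. This is a sketch under stated assumptions, not the FaceCam implementation: it assumes the 90% / 10% figures refer to shares of the denoising steps, it assumes a `(channels, frames, height, width)` latent layout, and `split_steps` and `concat_shape` are hypothetical helpers.

```python
def split_steps(num_steps, high_fraction=0.9):
    """Assign step indices to the HIGH model first, then the LOW model."""
    if num_steps < 2:
        raise ValueError("need at least one step per stage")
    high = round(num_steps * high_fraction)
    high = min(max(high, 1), num_steps - 1)  # keep both stages non-empty
    return list(range(high)), list(range(high, num_steps))


def concat_shape(a, b, axis):
    """Shape produced by concatenating tensors of shapes a and b along axis."""
    if len(a) != len(b):
        raise ValueError("rank mismatch")
    for i, (x, y) in enumerate(zip(a, b)):
        if i != axis and x != y:
            raise ValueError(f"dimension {i} differs: {x} vs {y}")
    out = list(a)
    out[axis] += b[axis]
    return tuple(out)


# Assumed axis indices for a (channels, frames, height, width) layout.
CHANNEL_AXIS, TEMPORAL_AXIS = 0, 1
```

With 50 steps, `split_steps(50)` gives the first 45 steps to the HIGH model and the last 5 to the LOW model. For illustrative shapes only, `concat_shape((16, 21, 60, 104), (16, 21, 60, 104), TEMPORAL_AXIS)` doubles the frame dimension, while the same call with `CHANNEL_AXIS` doubles the channel dimension; the actual latent sizes depend on the input resolution and frame count.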
| 58 |
+
## Citation
|
| 59 |
+
|
| 60 |
+
```bibtex
|
| 61 |
+
@misc{facecam,
|
| 62 |
+
title = {FaceCam: Portrait Video Camera Control via Scale-Aware Conditioning},
|
| 63 |
+
author = {Weijie Lyu and Ming-Hsuan Yang and Zhixin Shu},
|
| 64 |
+
year = {2026},
|
| 65 |
+
eprint = {2603.05506},
|
| 66 |
+
archivePrefix = {arXiv},
|
| 67 |
+
primaryClass = {cs.CV},
|
| 68 |
+
url = {https://arxiv.org/abs/2603.05506},
|
| 69 |
+
}
|
| 70 |
+
```
|
| 71 |
+
|
| 72 |
+
## License
|
| 73 |
+
|
| 74 |
+
These model weights are released under the [Apache License 2.0](./LICENSE), matching the upstream [FaceCam](https://github.com/weijielyu/FaceCam) repository license.
|
| 75 |
+
|
| 76 |
+
## Acknowledgements
|
| 77 |
+
|
| 78 |
+
- [FaceCam](https://github.com/weijielyu/FaceCam) by Weijie Lyu, Ming-Hsuan Yang, Zhixin Shu
|
| 79 |
+
- [Wan 2.2](https://arxiv.org/abs/2503.20314) by Wan-AI
|
| 80 |
+
- [MediaPipe](https://developers.google.com/mediapipe) by Google
|