---
license: apache-2.0
base_model:
- Wan-AI/Wan2.2-I2V-14B-480P
tags:
- facecam
- portrait-video
- camera-control
- wan2.2
- diffusion
- safetensors
- comfyui
pipeline_tag: image-to-video
---

# FaceCam — Merged bf16 Checkpoints

**Portrait Video Camera Control via Scale-Aware Conditioning**

🏔️ **CVPR 2026** 🏔️ | [Paper (arXiv 2603.05506)](https://arxiv.org/abs/2603.05506) | [Code](https://github.com/weijielyu/FaceCam) | [Project Page](https://www.wlyu.me/FaceCam/) | [Original Weights](https://huggingface.co/wlyu/FaceCam)

> Weijie Lyu, Ming-Hsuan Yang, Zhixin Shu
> University of California, Merced · Adobe Research

## What's in This Repo

Pre-merged, single-file bf16 safetensors files converted from the sharded upstream checkpoints at [`wlyu/FaceCam`](https://huggingface.co/wlyu/FaceCam). They are **partial fine-tune** checkpoints: each file holds only the self-attention and patch-embedding weights (~402 keys) and must be applied on top of a base **Wan 2.2 14B I2V** model.
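
Conceptually, applying a partial checkpoint means overlaying its tensors onto the base model's full state dict and leaving every other weight untouched. The sketch below illustrates that overlay on plain dictionaries, using shape tuples as stand-ins for tensors. It is not the FaceCam node's loader: `overlay_partial_checkpoint` is a hypothetical helper, and the Wan-style key names it checks for (`blocks.<i>.self_attn.*`, `patch_embedding.*`) are assumptions about how the weights are named.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]  # stand-in for a tensor: only its shape is tracked


def is_expected_patch_key(key: str) -> bool:
    # Assumed Wan-style naming: self-attention weights live under
    # "blocks.<i>.self_attn.*" and the patch embedding under "patch_embedding.*".
    return ".self_attn." in key or key.startswith("patch_embedding.")


def overlay_partial_checkpoint(
    base: Dict[str, Shape], patch: Dict[str, Shape]
) -> Tuple[Dict[str, Shape], List[str]]:
    """Hypothetical helper: return (merged, reshaped_keys).

    Patch entries replace base entries; all other base entries are kept as-is.
    """
    unexpected = [k for k in patch if not is_expected_patch_key(k)]
    if unexpected:
        raise ValueError(f"patch touches keys outside self-attention/patch embedding: {unexpected}")
    missing = [k for k in patch if k not in base]
    if missing:
        raise KeyError(f"patch keys not present in the base model: {missing}")
    # Report (rather than reject) keys whose shape differs from the base.
    reshaped = [k for k, shape in patch.items() if base[k] != shape]
    merged = dict(base)  # copy, so the caller's base dict is not modified
    merged.update(patch)
    return merged, reshaped


# Toy example with made-up key names and shapes.
base = {
    "patch_embedding.weight": (8, 4),
    "blocks.0.self_attn.q.weight": (8, 8),
    "blocks.0.ffn.0.weight": (16, 8),
}
patch = {"patch_embedding.weight": (8, 6), "blocks.0.self_attn.q.weight": (8, 8)}
merged, reshaped = overlay_partial_checkpoint(base, patch)
print(reshaped)  # ['patch_embedding.weight']
```

Shape changes are reported rather than rejected because a fine-tuned patch embedding could plausibly accept extra conditioning channels; that is an assumption, so inspect the real shapes before relying on it. A real loader would operate on tensors, for example ones returned by `safetensors.torch.load_file`.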

| File | Description | Size |
|------|-------------|------|
| `facecam_wan2.2_14b_high_bf16.safetensors` | High-noise stage DiT (camera trajectory) | ~7.9 GB |
| `facecam_wan2.2_14b_low_bf16.safetensors` | Low-noise stage DiT (detail refinement) | ~7.9 GB |
| `gaussians.ply` | 3D Gaussian head proxy for camera conditioning | ~43 MB |
| `face_landmarker_v2_with_blendshapes.task` | MediaPipe face landmarker for conditioning extraction | ~3.6 MB |
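
Because a safetensors file starts with a small JSON header, the key count and dtypes of the checkpoints above can be checked without loading ~7.9 GB of weights. The standard-library sketch below relies on the safetensors layout (an 8-byte little-endian header length, then a JSON object mapping tensor names to `dtype`, `shape`, and `data_offsets`, plus an optional `__metadata__` entry); the function names are illustrative, not part of any library.

```python
import json
import struct
from collections import Counter
from pathlib import Path
from typing import Any, Dict


def read_safetensors_header(path) -> Dict[str, Any]:
    """Parse only the JSON header of a .safetensors file; tensor data is not read."""
    with Path(path).open("rb") as f:
        # The file starts with the header size as a little-endian unsigned 64-bit int.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)  # optional string metadata, not a tensor
    return header


def summarize_checkpoint(path) -> Dict[str, Any]:
    """Count tensors, tally dtypes, and total the tensor payload size in bytes."""
    header = read_safetensors_header(path)
    dtypes = Counter(info["dtype"] for info in header.values())
    data_bytes = sum(end - start for start, end in (info["data_offsets"] for info in header.values()))
    return {"num_keys": len(header), "dtypes": dict(dtypes), "data_bytes": data_bytes}
```

For example, `summarize_checkpoint("facecam_wan2.2_14b_high_bf16.safetensors")` should report roughly 402 keys, all `BF16`, if the file matches the description above.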

## Usage with ComfyUI-FFMPEGA

These weights are used by the **FaceCam** node in [ComfyUI-FFMPEGA](https://github.com/AEmotionStudio/ComfyUI-FFMPEGA).

1. Place `facecam_wan2.2_14b_*_bf16.safetensors` in `ComfyUI/models/diffusion_models/`.
2. Place `gaussians.ply` and `face_landmarker_v2_with_blendshapes.task` in `ComfyUI/models/facecam/`.
3. Load a Wan 2.2 14B I2V base model (e.g. GGUF Q4_K_M) via "Load Diffusion Model".
4. Connect the base model and the FaceCam checkpoints to the FaceCam node.
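
Steps 1 and 2 can be scripted. This is a minimal standard-library sketch: `install_facecam_files` and its `move` flag are hypothetical, the destination folders come from the steps above, and it assumes the four files were already downloaded into a single folder.

```python
import shutil
from pathlib import Path
from typing import Dict, List

# File-name patterns from this repo mapped to the ComfyUI sub-folders in steps 1 and 2.
DESTINATIONS: Dict[str, str] = {
    "facecam_wan2.2_14b_*_bf16.safetensors": "models/diffusion_models",
    "gaussians.ply": "models/facecam",
    "face_landmarker_v2_with_blendshapes.task": "models/facecam",
}


def install_facecam_files(download_dir, comfyui_root, move: bool = False) -> List[Path]:
    """Copy (or move) the FaceCam files into a ComfyUI tree; return the destination paths."""
    src, root = Path(download_dir), Path(comfyui_root)
    placed: List[Path] = []
    for pattern, subdir in DESTINATIONS.items():
        target_dir = root / subdir
        target_dir.mkdir(parents=True, exist_ok=True)  # create models/facecam if missing
        for file in sorted(src.glob(pattern)):
            dest = target_dir / file.name
            if move:
                shutil.move(str(file), str(dest))
            else:
                shutil.copy2(file, dest)  # copy2 also preserves timestamps
            placed.append(dest)
    return placed
```

Copying is the default so an interrupted run leaves the download intact; with two ~7.9 GB checkpoints, `move=True` avoids doubling disk usage.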

## Pipeline

FaceCam generates portrait videos with precise camera control from a single input video:

1. **Face-centered crop** of the input video
2. **3D Gaussian proxy rendering** for camera trajectory conditioning (via `gaussians.ply`)
3. **MediaPipe face landmark extraction** from the proxy → `camera_cond`
4. **VAE-encode** `video_cond` + `camera_cond`
5. **Wan 2.2 DiT inference** with FaceCam conditioning:
   - Two-stage denoising: HIGH-noise model (90%) → LOW-noise model (10%)
   - Temporal concat: `[noise_latents | video_cond_latents]`
   - Channel concat: `[camera_cond_latents | i2v_y]`
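
The README does not specify how the face-centered crop in step 1 is computed. One plausible approach, sketched here under that caveat, takes normalized face landmarks (MediaPipe reports x and y normalized by frame width and height), builds a square around their bounding box enlarged by a margin, and shifts it back inside the frame. `face_centered_crop_box` and its `scale` default are hypothetical, not FaceCam's values.

```python
from typing import Iterable, Tuple


def face_centered_crop_box(
    landmarks: Iterable[Tuple[float, float]],
    frame_w: int,
    frame_h: int,
    scale: float = 1.8,
) -> Tuple[int, int, int, int]:
    """Hypothetical helper: square (left, top, right, bottom) pixel box around a face."""
    pts = list(landmarks)
    if not pts:
        raise ValueError("no landmarks given")
    # Landmarks are assumed normalized by frame width/height, as MediaPipe reports them.
    xs = [x * frame_w for x, _ in pts]
    ys = [y * frame_h for _, y in pts]
    cx, cy = (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2
    # Square side: the larger bounding-box side times a margin, capped to fit the frame.
    side = min(scale * max(max(xs) - min(xs), max(ys) - min(ys)), frame_w, frame_h)
    # Center on the face, then shift (not shrink) the box back inside the frame.
    left = min(max(cx - side / 2, 0), frame_w - side)
    top = min(max(cy - side / 2, 0), frame_h - side)
    return round(left), round(top), round(left + side), round(top + side)


print(face_centered_crop_box([(0.4, 0.4), (0.6, 0.6)], 1000, 1000))  # (320, 320, 680, 680)
```

Shifting instead of shrinking keeps the crop size tied to the face size, so a face near the frame edge is not zoomed in more than one in the center.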
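
Steps 4 and 5 involve two different concatenations. The sketch below only does the shape bookkeeping, assuming a `(channels, frames, height, width)` latent layout and a Wan 2.1-style VAE with 16 latent channels, 4x temporal compression (keeping the first frame), and 8x spatial compression. These VAE factors, the example resolution, and the `i2v_y` channel count are assumptions; the helpers are hypothetical and say nothing about how FaceCam combines the two results afterwards.

```python
from typing import Tuple

Shape = Tuple[int, int, int, int]  # (channels, frames, height, width)
CHANNEL_AXIS, TIME_AXIS = 0, 1


def wan_latent_shape(frames: int, height: int, width: int, z_dim: int = 16) -> Shape:
    """Latent shape under an assumed Wan 2.1-style VAE (4x temporal, 8x spatial)."""
    if (frames - 1) % 4 or height % 8 or width % 8:
        raise ValueError("expected frames = 4k + 1 and height/width divisible by 8")
    return (z_dim, (frames - 1) // 4 + 1, height // 8, width // 8)


def concat_shape(a: Shape, b: Shape, axis: int) -> Shape:
    """Shape of concatenating a and b along `axis`; every other axis must match."""
    for i, (da, db) in enumerate(zip(a, b)):
        if i != axis and da != db:
            raise ValueError(f"axis {i} mismatch: {da} vs {db}")
    out = list(a)
    out[axis] = a[axis] + b[axis]
    return tuple(out)


# Example with an assumed 81-frame, 480x832 clip.
noise = wan_latent_shape(81, 480, 832)  # (16, 21, 60, 104)
video_cond = wan_latent_shape(81, 480, 832)
print(concat_shape(noise, video_cond, TIME_AXIS))  # (16, 42, 60, 104)

camera_cond = wan_latent_shape(81, 480, 832)
I2V_Y_CHANNELS = 20  # assumption: Wan 2.1-style I2V mask (4) + image latent (16)
i2v_y = (I2V_Y_CHANNELS,) + camera_cond[1:]
print(concat_shape(camera_cond, i2v_y, CHANNEL_AXIS))  # (36, 21, 60, 104)
```

The temporal concat doubles the frame axis while the channel concat widens the channel axis, which is why the two operations need matching sizes on different axes.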
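
The 90%/10% split in step 5 can be read as a timestep boundary, similar in spirit to how Wan 2.2 switches between its high-noise and low-noise experts. The sketch below assumes that reading (timesteps on a 0 to 1000 scale, high-noise checkpoint while `t >= 0.9 * 1000`); the FaceCam node may instead split by step count, and `assign_experts` is a hypothetical helper.

```python
from typing import List, Sequence


def assign_experts(
    timesteps: Sequence[float],
    boundary: float = 0.9,
    num_train_timesteps: int = 1000,
) -> List[str]:
    """Hypothetical helper: label each sampling timestep "high" or "low".

    Timesteps at or above boundary * num_train_timesteps go to the high-noise
    checkpoint; the remaining (less noisy) ones go to the low-noise checkpoint.
    """
    threshold = boundary * num_train_timesteps
    return ["high" if t >= threshold else "low" for t in timesteps]


# On an unshifted, linearly spaced 20-step schedule, only the first 3 steps
# (t = 1000, 950, 900) land at or above the 900 boundary.
schedule = [1000 - 50 * i for i in range(20)]
labels = assign_experts(schedule)
print(labels.count("high"), labels.count("low"))  # 3 17
```

The example shows why the interpretation matters: a 90% timestep boundary and a 90% share of sampling steps can produce very different splits, depending on the noise schedule.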

## Citation

```bibtex
@misc{facecam,
  title         = {FaceCam: Portrait Video Camera Control via Scale-Aware Conditioning},
  author        = {Weijie Lyu and Ming-Hsuan Yang and Zhixin Shu},
  year          = {2026},
  eprint        = {2603.05506},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2603.05506},
}
```

## License

These model weights are released under the [Apache License 2.0](./LICENSE), matching the upstream [FaceCam](https://github.com/weijielyu/FaceCam) repository license.

## Acknowledgements

- [FaceCam](https://github.com/weijielyu/FaceCam) by Weijie Lyu, Ming-Hsuan Yang, Zhixin Shu
- [Wan 2.2](https://arxiv.org/abs/2503.20314) by Wan-AI
- [MediaPipe](https://developers.google.com/mediapipe) by Google