---
license: apache-2.0
tags:
- portrait-animation
- real-time
- diffusion
pipeline_tag: image-to-video
library_name: diffusers
---


### ⏬ Download weights
Option 1: Download the pre-trained weights of the base models and other components ([sd-image-variations-diffusers](https://huggingface.co/lambdalabs/sd-image-variations-diffusers) and [sd-vae-ft-mse](https://huggingface.co/stabilityai/sd-vae-ft-mse)) automatically by running:
    
```bash
python tools/download_weights.py
```
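If that script is unavailable in your environment, the two base models can also be fetched directly. A minimal sketch using `huggingface_hub` (an assumption for illustration — the actual `tools/download_weights.py` may work differently):

```python
# Hedged sketch: fetch the two base models with huggingface_hub.
# The repo's tools/download_weights.py may use a different mechanism.
from huggingface_hub import snapshot_download

for repo_id, local_dir in [
    ("lambdalabs/sd-image-variations-diffusers", "pretrained_weights/sd-image-variations-diffusers"),
    ("stabilityai/sd-vae-ft-mse", "pretrained_weights/sd-vae-ft-mse"),
]:
    snapshot_download(repo_id=repo_id, local_dir=local_dir)
```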

Option 2: Download the pre-trained weights into the `./pretrained_weights` folder manually from one of the URLs below:
    
<a href='https://drive.google.com/drive/folders/1GOhDBKIeowkMpBnKhGB8jgEhJt_--vbT?usp=drive_link'><img src='https://img.shields.io/badge/Google%20Drive-5B8DEF?style=for-the-badge&logo=googledrive&logoColor=white'></a> <a href='https://pan.baidu.com/s/1DCv4NvUy_z7Gj2xCGqRMkQ?pwd=gj64'><img src='https://img.shields.io/badge/Baidu%20Netdisk-3E4A89?style=for-the-badge&logo=baidu&logoColor=white'></a> <a href='https://modelscope.cn/models/huaichang/SuperCam'><img src='https://img.shields.io/badge/ModelScope-624AFF?style=for-the-badge&logo=alibabacloud&logoColor=white'></a> <a href='https://huggingface.co/huaichang/SuperCam'><img src='https://img.shields.io/badge/HuggingFace-E67E22?style=for-the-badge&logo=huggingface&logoColor=white'></a>

In either case, the weights should end up organized as follows:
```
pretrained_weights
β”œβ”€β”€ onnx
β”‚   β”œβ”€β”€ unet_opt
β”‚   β”‚   β”œβ”€β”€ unet_opt.onnx
β”‚   β”‚   └── unet_opt.onnx.data
β”‚   └── unet
β”œβ”€β”€ SuperCam
β”‚   β”œβ”€β”€ denoising_unet.pth
β”‚   β”œβ”€β”€ motion_encoder.pth
β”‚   β”œβ”€β”€ motion_extractor.pth
β”‚   β”œβ”€β”€ pose_guider.pth
β”‚   β”œβ”€β”€ reference_unet.pth
β”‚   └── temporal_module.pth
β”œβ”€β”€ sd-vae-ft-mse
β”‚   β”œβ”€β”€ diffusion_pytorch_model.bin
β”‚   └── config.json
β”œβ”€β”€ sd-image-variations-diffusers
β”‚   β”œβ”€β”€ image_encoder
β”‚   β”‚   β”œβ”€β”€ pytorch_model.bin
β”‚   β”‚   └── config.json
β”‚   β”œβ”€β”€ unet
β”‚   β”‚   β”œβ”€β”€ diffusion_pytorch_model.bin
β”‚   β”‚   └── config.json
β”‚   └── model_index.json
└── tensorrt
    └── unet_work.engine
```
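Before running inference, it can be worth sanity-checking the layout. A convenience sketch (not part of the repo; the file list is taken from the tree above):

```python
# Sanity-check that the SuperCam checkpoints and base weights are in place.
from pathlib import Path

ROOT = Path("pretrained_weights")
EXPECTED = [
    "SuperCam/denoising_unet.pth",
    "SuperCam/motion_encoder.pth",
    "SuperCam/motion_extractor.pth",
    "SuperCam/pose_guider.pth",
    "SuperCam/reference_unet.pth",
    "SuperCam/temporal_module.pth",
    "sd-vae-ft-mse/diffusion_pytorch_model.bin",
    "sd-image-variations-diffusers/image_encoder/pytorch_model.bin",
    "sd-image-variations-diffusers/unet/diffusion_pytorch_model.bin",
]
missing = [p for p in EXPECTED if not (ROOT / p).exists()]
if missing:
    raise FileNotFoundError(f"Missing weights: {missing}")
print("All expected weights are in place.")
```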

### 🎞️ Offline Inference
```bash
python inference_offline.py
```
⚠️ Note for RTX 50-Series (Blackwell) users: xformers is not yet fully compatible with this architecture. To avoid crashes, disable it by running:
```bash
python inference_offline.py --use_xformers False
```
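For context, a `--use_xformers` flag most likely just gates the standard diffusers memory-efficient-attention call. A minimal sketch of how such a flag is typically wired (the helper below is illustrative, not the repo's actual code):

```python
# Illustrative helper: enable xformers attention only when requested.
# On RTX 50-Series (Blackwell) GPUs, call with use_xformers=False until
# xformers ships kernels for the new architecture.
def maybe_enable_xformers(unet, use_xformers: bool) -> None:
    if use_xformers:
        # Standard diffusers API on UNet models and pipelines; when skipped,
        # the model falls back to default PyTorch attention.
        unet.enable_xformers_memory_efficient_attention()
```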

### πŸ“Έ Online Inference
#### πŸ“¦ Setup Web UI
```bash
# install Node.js 18+
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.1/install.sh | bash
nvm install 18

cd webcam
source start.sh
```

#### 🏎️ Acceleration (Optional)
Converting the model to TensorRT can significantly speed up inference (~2x ⚡️). Building the engine may take about 20 minutes, depending on your device. Note that TensorRT optimizations may introduce slight variations or a small drop in output quality.
```bash
pip install -r requirements_trt.txt

python torch2trt.py
```
*The provided TensorRT engine was built on an H100. Because TensorRT engines are specific to the GPU and TensorRT version they were built with, we recommend that all users (including H100 users) re-run `python torch2trt.py` locally to ensure compatibility.*
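For reference, the core of such a conversion with the TensorRT Python API looks roughly like this (a sketch, not the repo's actual `torch2trt.py`; the ONNX and engine paths are taken from the weights tree above, and a real script would also set optimization profiles for any dynamic input shapes):

```python
# Sketch: build an FP16 TensorRT engine from the exported ONNX UNet.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

# parse_from_file also resolves the external unet_opt.onnx.data sidecar
if not parser.parse_from_file("pretrained_weights/onnx/unet_opt/unet_opt.onnx"):
    raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # ~2x speedup, slight quality variations

engine = builder.build_serialized_network(network, config)
with open("pretrained_weights/tensorrt/unet_work.engine", "wb") as f:
    f.write(engine)
```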

#### ▢️ Start Streaming
```bash
# --acceleration takes one of: none (use this on RTX 50-Series), xformers, tensorrt
python inference_online.py --acceleration tensorrt
```
Then open `http://0.0.0.0:7860` in your browser. (*If `http://0.0.0.0:7860` does not work, try `http://localhost:7860` instead.*)

**How to use**: Upload Image ➑️ Fuse Reference ➑️ Start Animation ➑️ Enjoy! πŸŽ‰


**Regarding latency**: Latency varies with your device's computing power. You can try the following methods to reduce it:

1. Lower the "Driving FPS" setting in the WebUI to reduce the computational workload.
2. Increase the frame-buffer multiplier in [`webcam/util.py#L73`](https://github.com/GVCLab/SuperCam/blob/6953d1a8b409f360a3ee1d7325093622b29f1e22/webcam/util.py#L73) (e.g., set it to `num_frames_needed * 4` or higher) to better match your device's inference speed, as sketched below.
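Conceptually, that multiplier sets how deep the server's frame buffer is relative to what playback needs. A hypothetical illustration of such a policy (only the `num_frames_needed * 4` expression mirrors the linked `util.py` — everything else is invented for illustration):

```python
# Hypothetical buffer policy; only `num_frames_needed * 4` mirrors util.py.
def frames_to_generate(buffered: int, num_frames_needed: int, multiplier: int = 4) -> int:
    """Top up the buffer whenever it falls below the target depth.

    A larger multiplier keeps more frames queued, which smooths playback
    when inference is slower than the driving FPS, at the cost of a deeper
    (and therefore laggier) pipeline on fast devices.
    """
    target = num_frames_needed * multiplier
    return max(0, target - buffered)
```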


## ⭐ Citation
If you find SuperCam useful for your research, please cite our work using the following BibTeX:
```bibtex
@article{li2025SuperCam,
  title={SuperCam! Expressive Portrait Image Animation for Live Streaming},
  author={Li, Zhiyuan and Pun, Chi-Man and Fang, Chen and Wang, Jue and Cun, Xiaodong},
  journal={arXiv preprint arXiv:2512.11253},
  year={2025}
}
```

## ❀️ Acknowledgement
This code is mainly built upon [Moore-AnimateAnyone](https://github.com/MooreThreads/Moore-AnimateAnyone), [X-NeMo](https://byteaigc.github.io/X-Portrait2/), [StreamDiffusion](https://github.com/cumulo-autumn/StreamDiffusion), [RAIN](https://pscgylotti.github.io/pages/RAIN/), and [LivePortrait](https://github.com/KlingTeam/LivePortrait). Thanks to the authors for their invaluable contributions.