---
license: apache-2.0
tags:
- portrait-animation
- real-time
- diffusion
pipeline_tag: image-to-video
library_name: diffusers
---

### ⬇️ Download weights
Option 1: Download the pre-trained weights of the base models and other components ([sd-image-variations-diffusers](https://huggingface.co/lambdalabs/sd-image-variations-diffusers) and [sd-vae-ft-mse](https://huggingface.co/stabilityai/sd-vae-ft-mse)). You can run the following command to download the weights automatically:

```bash
python tools/download_weights.py
```

Option 2: Download the pre-trained weights into the `./pretrained_weights` folder from one of the links below:

<a href='https://drive.google.com/drive/folders/1GOhDBKIeowkMpBnKhGB8jgEhJt_--vbT?usp=drive_link'><img src='https://img.shields.io/badge/Google%20Drive-5B8DEF?style=for-the-badge&logo=googledrive&logoColor=white'></a> <a href='https://pan.baidu.com/s/1DCv4NvUy_z7Gj2xCGqRMkQ?pwd=gj64'><img src='https://img.shields.io/badge/Baidu%20Netdisk-3E4A89?style=for-the-badge&logo=baidu&logoColor=white'></a> <a href='https://modelscope.cn/models/huaichang/SuperCam'><img src='https://img.shields.io/badge/ModelScope-624AFF?style=for-the-badge&logo=alibabacloud&logoColor=white'></a> <a href='https://huggingface.co/huaichang/SuperCam'><img src='https://img.shields.io/badge/HuggingFace-E67E22?style=for-the-badge&logo=huggingface&logoColor=white'></a>

Finally, organize the weights as follows:
```
pretrained_weights
├── onnx
│   ├── unet_opt
│   │   ├── unet_opt.onnx
│   │   └── unet_opt.onnx.data
│   └── unet
├── SuperCam
│   ├── denoising_unet.pth
│   ├── motion_encoder.pth
│   ├── motion_extractor.pth
│   ├── pose_guider.pth
│   ├── reference_unet.pth
│   └── temporal_module.pth
├── sd-vae-ft-mse
│   ├── diffusion_pytorch_model.bin
│   └── config.json
├── sd-image-variations-diffusers
│   ├── image_encoder
│   │   ├── pytorch_model.bin
│   │   └── config.json
│   ├── unet
│   │   ├── diffusion_pytorch_model.bin
│   │   └── config.json
│   └── model_index.json
└── tensorrt
    └── unet_work.engine
```
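
If you want to double-check the layout before running inference, a small script along these lines can help. It is only a sketch and not part of the repository: `find_missing_weights` and `EXPECTED_FILES` are hypothetical names, and the ONNX and TensorRT entries are left out on the assumption that they are only needed for the optional acceleration path.

```python
from pathlib import Path

# Relative paths copied from the directory tree above.
# Assumption: the onnx/ and tensorrt/ entries are only needed for the
# optional TensorRT acceleration, so they are not checked here.
EXPECTED_FILES = [
    "SuperCam/denoising_unet.pth",
    "SuperCam/motion_encoder.pth",
    "SuperCam/motion_extractor.pth",
    "SuperCam/pose_guider.pth",
    "SuperCam/reference_unet.pth",
    "SuperCam/temporal_module.pth",
    "sd-vae-ft-mse/diffusion_pytorch_model.bin",
    "sd-vae-ft-mse/config.json",
    "sd-image-variations-diffusers/image_encoder/pytorch_model.bin",
    "sd-image-variations-diffusers/image_encoder/config.json",
    "sd-image-variations-diffusers/unet/diffusion_pytorch_model.bin",
    "sd-image-variations-diffusers/unet/config.json",
    "sd-image-variations-diffusers/model_index.json",
]


def find_missing_weights(root="./pretrained_weights"):
    """Return the expected files that are not present under `root`."""
    root_path = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root_path / rel).is_file()]


if __name__ == "__main__":
    missing = find_missing_weights()
    if missing:
        print("Missing weight files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected weight files are present.")
```

Run it from the repository root so that the relative `./pretrained_weights` path resolves correctly.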

### Offline Inference
```
python inference_offline.py
```
⚠️ Note for RTX 50-Series (Blackwell) users: xformers is not yet fully compatible with the new architecture. To avoid crashes, disable it by running:
```
python inference_offline.py --use_xformers False
```
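
If you launch inference from a script, the choice between the two commands above could be automated roughly as follows. This is a sketch under assumptions, not part of the repository: both helper names are hypothetical, and the check assumes that RTX 50-Series cards report CUDA compute capability 12.x, which you should verify for your own device. Only `torch.cuda.is_available()` and `torch.cuda.get_device_capability()` are real PyTorch calls.

```python
def detect_compute_capability():
    """Return (major, minor) for the current CUDA device, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability()


def offline_inference_command(compute_capability):
    """Build the offline inference command for a given compute capability."""
    command = ["python", "inference_offline.py"]
    if compute_capability is None:
        return command
    major, _minor = compute_capability
    # Assumption: RTX 50-Series (Blackwell) GPUs report capability 12.x.
    if major >= 12:
        command += ["--use_xformers", "False"]
    return command


if __name__ == "__main__":
    print(" ".join(offline_inference_command(detect_compute_capability())))
```

The script only prints the command it would run, so you can inspect it before executing anything.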

### Online Inference
#### 📦 Setup Web UI
```
# install Node.js 18+
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.1/install.sh | bash
nvm install 18

cd webcam
source start.sh
```

#### Acceleration (Optional)
Converting the model to TensorRT can significantly speed up inference (~2x ⚡️). Building the engine may take about 20 minutes, depending on your device. Note that TensorRT optimizations may cause slight variations or a small drop in output quality.
```
pip install -r requirements_trt.txt

python torch2trt.py
```
*The provided TensorRT engine was built on an `H100`. We recommend that all users (including H100 users) re-run `python torch2trt.py` locally to ensure the best compatibility.*

#### ▶️ Start Streaming
```
# Pick one value for --acceleration: none (for RTX 50-Series), xformers, or tensorrt
python inference_online.py --acceleration none
```
Then open `http://0.0.0.0:7860` in your browser. (If `http://0.0.0.0:7860` does not work, try `http://localhost:7860`.)

**How to use**: Upload Image ➡️ Fuse Reference ➡️ Start Animation ➡️ Enjoy!


**Regarding Latency**: Latency varies with your device's computing power. You can try the following methods to reduce it:

1. Lower the "Driving FPS" setting in the WebUI to reduce the computational workload.
2. Increase the multiplier (e.g., set it to `num_frames_needed * 4` or higher) to better match your device's inference speed: https://github.com/GVCLab/SuperCam/blob/6953d1a8b409f360a3ee1d7325093622b29f1e22/webcam/util.py#L73
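
To see why lowering the "Driving FPS" helps, it can be useful to compare it with the frame rate your device can actually sustain. The sketch below is plain arithmetic with hypothetical function names and illustrative numbers, not measurements, and it does not model the WebUI's own frame handling: if one frame takes `seconds_per_frame` to process, anything above `1 / seconds_per_frame` frames per second accumulates as backlog.

```python
def max_sustainable_fps(seconds_per_frame):
    """Highest driving FPS the model can keep up with at this per-frame cost."""
    if seconds_per_frame <= 0:
        raise ValueError("seconds_per_frame must be positive")
    return 1.0 / seconds_per_frame


def backlog_after(driving_fps, seconds_per_frame, duration_s):
    """Frames left waiting after `duration_s` seconds if no frames are dropped."""
    surplus_per_second = driving_fps - max_sustainable_fps(seconds_per_frame)
    return max(0.0, surplus_per_second * duration_s)


if __name__ == "__main__":
    # Illustrative value only: assume 50 ms of inference per frame.
    cost = 0.05
    print(f"sustainable FPS: {max_sustainable_fps(cost):.1f}")
    print(f"backlog after 10 s at 25 FPS: {backlog_after(25, cost, 10):.0f} frames")
    print(f"backlog after 10 s at 15 FPS: {backlog_after(15, cost, 10):.0f} frames")
```

Measure the per-frame time on your own device and set the "Driving FPS" at or below the sustainable value.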

## Citation
If you find SuperCam useful for your research, please consider citing our work using the following BibTeX:
```bibtex
@article{li2025SuperCam,
  title={SuperCam! Expressive Portrait Image Animation for Live Streaming},
  author={Li, Zhiyuan and Pun, Chi-Man and Fang, Chen and Wang, Jue and Cun, Xiaodong},
  journal={arXiv preprint arXiv:2512.11253},
  year={2025}
}
```

## ❤️ Acknowledgement
This code is mainly built upon [Moore-AnimateAnyone](https://github.com/MooreThreads/Moore-AnimateAnyone), [X-NeMo](https://byteaigc.github.io/X-Portrait2/), [StreamDiffusion](https://github.com/cumulo-autumn/StreamDiffusion), [RAIN](https://pscgylotti.github.io/pages/RAIN/) and [LivePortrait](https://github.com/KlingTeam/LivePortrait). We thank the authors for their invaluable contributions.