---
license: apache-2.0
tags:
- portrait-animation
- real-time
- diffusion
pipeline_tag: image-to-video
library_name: diffusers
---

### ⏬ Download weights

Option 1: Download pre-trained weights of base models and other components ([sd-image-variations-diffusers](https://huggingface.co/lambdalabs/sd-image-variations-diffusers) and [sd-vae-ft-mse](https://huggingface.co/stabilityai/sd-vae-ft-mse)). You can run the following command to download weights automatically:

```bash
python tools/download_weights.py
```

Option 2: Download pre-trained weights into the `./pretrained_weights` folder from one of the below URLs:

Finally, these weights should be organized as follows:

```
pretrained_weights
├── onnx
│   ├── unet_opt
│   │   ├── unet_opt.onnx
│   │   └── unet_opt.onnx.data
│   └── unet
├── SuperCam
│   ├── denoising_unet.pth
│   ├── motion_encoder.pth
│   ├── motion_extractor.pth
│   ├── pose_guider.pth
│   ├── reference_unet.pth
│   └── temporal_module.pth
├── sd-vae-ft-mse
│   ├── diffusion_pytorch_model.bin
│   └── config.json
├── sd-image-variations-diffusers
│   ├── image_encoder
│   │   ├── pytorch_model.bin
│   │   └── config.json
│   ├── unet
│   │   ├── diffusion_pytorch_model.bin
│   │   └── config.json
│   └── model_index.json
└── tensorrt
    └── unet_work.engine
```

### 🎞️ Offline Inference

```
python inference_offline.py
```

⚠️ Note for RTX 50-Series (Blackwell) Users: xformers is not yet fully compatible with the new architecture. To avoid crashes, please disable it by running:

```
python inference_offline.py --use_xformers False
```

### 📸 Online Inference

#### 📦 Setup Web UI

```
# install Node.js 18+
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.1/install.sh | bash
nvm install 18

cd webcam
source start.sh
```

#### 🏎️ Acceleration (Optional)

Converting the model to TensorRT can significantly speed up inference (~ 2x ⚡️). Building the engine may take about `20 minutes` depending on your device. Note that TensorRT optimizations may lead to slight variations or a small drop in output quality.

```
pip install -r requirements_trt.txt
python torch2trt.py
```

*The provided TensorRT model is from an `H100`.
We recommend `ALL users` (including H100 users) re-run `python torch2trt.py` locally to ensure best compatibility.*

#### ▶️ Start Streaming

```
# Choose one value for --acceleration: none (for RTX 50-Series), xformers, or tensorrt
python inference_online.py --acceleration xformers
```

Then open `http://0.0.0.0:7860` in your browser. (If `http://0.0.0.0:7860` does not work, try `http://localhost:7860`.)

**How to use**: Upload Image ➡️ Fuse Reference ➡️ Start Animation ➡️ Enjoy! 🎉

**Regarding Latency**: Latency varies depending on your device's computing power. You can try the following methods to optimize it:

1. Lower the "Driving FPS" setting in the WebUI to reduce the computational workload.
2. Increase the multiplier (e.g., set it to `num_frames_needed * 4` or higher) to better match your device's inference speed: https://github.com/GVCLab/SuperCam/blob/6953d1a8b409f360a3ee1d7325093622b29f1e22/webcam/util.py#L73

## ⭐ Citation

If you find SuperCam useful for your research, please cite our work using the following BibTeX:

```bibtex
@article{li2025SuperCam,
  title={SuperCam! Expressive Portrait Image Animation for Live Streaming},
  author={Li, Zhiyuan and Pun, Chi-Man and Fang, Chen and Wang, Jue and Cun, Xiaodong},
  journal={arXiv preprint arXiv:2512.11253},
  year={2025}
}
```

## ❤️ Acknowledgement

This code is mainly built upon [Moore-AnimateAnyone](https://github.com/MooreThreads/Moore-AnimateAnyone), [X-NeMo](https://byteaigc.github.io/X-Portrait2/), [StreamDiffusion](https://github.com/cumulo-autumn/StreamDiffusion), [RAIN](https://pscgylotti.github.io/pages/RAIN/) and [LivePortrait](https://github.com/KlingTeam/LivePortrait). We thank the authors for their invaluable contributions.
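
## 🧪 Checking the weights layout (sketch)

Before running inference, it can help to confirm that `./pretrained_weights` matches the directory tree given in the download section. The script below is a minimal sketch, not part of the repository: the helper name `missing_weights` and the split into required and optional files are assumptions made for illustration (the ONNX and TensorRT artifacts are assumed optional because acceleration is described as optional). The file paths themselves are copied from the tree above.

```python
from pathlib import Path

# Paths copied from the directory tree in the download section.
REQUIRED_FILES = [
    "SuperCam/denoising_unet.pth",
    "SuperCam/motion_encoder.pth",
    "SuperCam/motion_extractor.pth",
    "SuperCam/pose_guider.pth",
    "SuperCam/reference_unet.pth",
    "SuperCam/temporal_module.pth",
    "sd-vae-ft-mse/diffusion_pytorch_model.bin",
    "sd-vae-ft-mse/config.json",
    "sd-image-variations-diffusers/image_encoder/pytorch_model.bin",
    "sd-image-variations-diffusers/image_encoder/config.json",
    "sd-image-variations-diffusers/unet/diffusion_pytorch_model.bin",
    "sd-image-variations-diffusers/unet/config.json",
    "sd-image-variations-diffusers/model_index.json",
]

# Assumption: these are only needed for the optional TensorRT acceleration path.
OPTIONAL_FILES = [
    "onnx/unet_opt/unet_opt.onnx",
    "onnx/unet_opt/unet_opt.onnx.data",
    "tensorrt/unet_work.engine",
]


def missing_weights(root, expected=REQUIRED_FILES):
    """Return the entries of `expected` that are not regular files under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_weights("./pretrained_weights")
    if missing:
        print("Missing required files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All required weight files are present.")
    for rel in missing_weights("./pretrained_weights", OPTIONAL_FILES):
        print("Optional file not found: " + rel)
```

Run it from the repository root. A non-empty "Missing required files" list usually means a download was interrupted or a folder was unpacked one level too deep.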