Proteus-ID: ID-Consistent and Motion-Coherent Video Customization
Paper: arXiv:2506.23729
Authors: Guiyu Zhang1, Chen Shi1, Zijian Jiang1, Xunzhi Xiang2, Jingjing Qian1, Shaoshuai Shi3, Li Jiang†1
1 The Chinese University of Hong Kong, Shenzhen; 2 Nanjing University; 3 Voyager Research, Didi Chuxing
# 0. Clone the repo
git clone --depth=1 https://github.com/grenoble-zhang/Proteus-ID.git
cd Proteus-ID
# 1. Create conda environment
conda create -n proteusid python=3.11.0
conda activate proteusid
# 2. Install PyTorch (CUDA 12.6 build)
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126
# 3. Install the remaining pip dependencies
pip install -r requirements.txt
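After installation, it can help to confirm that the pinned versions actually landed in the active environment. The following is a minimal sketch using only the Python standard library; the helper name `check_pins` is hypothetical (it is not part of the repository), and the pins are copied from the install command above.

```python
from importlib import metadata

# Versions pinned by the install command above.
PINS = {"torch": "2.6.0", "torchvision": "0.21.0", "torchaudio": "2.6.0"}


def check_pins(pins=PINS):
    """Return {package: (expected, installed_or_None, ok)} for each pinned package."""
    report = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # CUDA wheels carry a local suffix such as "2.6.0+cu126";
        # compare only the part before the "+".
        ok = installed is not None and installed.split("+")[0] == expected
        report[name] = (expected, installed, ok)
    return report


if __name__ == "__main__":
    for name, (expected, installed, ok) in check_pins().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: expected {expected}, found {installed} -> {status}")
```

As an additional quick check, `python -c "import torch; print(torch.cuda.is_available())"` reports whether the installed build can see a GPU.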
cd util
python download_weights.py
python down_raft.py
Once ready, the weights will be organized in this format:
📦 ckpts/
├── 📂 face_encoder/
├── 📂 scheduler/
├── 📂 text_encoder/
├── 📂 tokenizer/
├── 📂 transformer/
├── 📂 vae/
├── 📄 configuration.json
└── 📄 model_index.json
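To confirm that the download finished with this layout, a small check along the following lines may be useful. This is a sketch, not part of the repository: the function name `missing_ckpt_entries` is hypothetical, and the default root `ckpts` is an assumption about where the download scripts place the files, so pass the actual path if it differs.

```python
from pathlib import Path

# Entries taken from the layout listed above.
EXPECTED_DIRS = ["face_encoder", "scheduler", "text_encoder",
                 "tokenizer", "transformer", "vae"]
EXPECTED_FILES = ["configuration.json", "model_index.json"]


def missing_ckpt_entries(root="ckpts"):
    """Return the expected entries absent under `root` (empty list if complete)."""
    root = Path(root)
    missing = [d + "/" for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    gaps = missing_ckpt_entries()
    if gaps:
        print("missing: " + ", ".join(gaps))
    else:
        print("ckpts/ looks complete")
```

The check only looks at names, not at file contents, so a partially downloaded weight file inside one of the folders would not be detected.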
# For single rank
bash train_single_rank.sh
# For multi rank
bash train_multi_rank.sh
python inference.py --img_file_path assets/example_images/1.png --json_file_path assets/example_images/1.json
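To run the same command over several examples, a loop like the one below could be used. It is only a sketch under stated assumptions: it assumes that each image in the examples folder has a JSON file with the same base name (as with `1.png` and `1.json` above), and the helper `build_inference_commands` is hypothetical. Only the two flags shown in the command above are used.

```python
import subprocess
import sys
from pathlib import Path


def build_inference_commands(example_dir="assets/example_images"):
    """Build one inference.py command per PNG that has a same-named JSON file."""
    commands = []
    for img in sorted(Path(example_dir).glob("*.png")):
        meta = img.with_suffix(".json")
        if not meta.is_file():
            continue  # skip images without a matching JSON file
        commands.append([
            sys.executable, "inference.py",
            "--img_file_path", str(img),
            "--json_file_path", str(meta),
        ])
    return commands


if __name__ == "__main__":
    # Run the examples one after another; stop at the first failure.
    for cmd in build_inference_commands():
        subprocess.run(cmd, check=True)
```

Run it from the repository root so that the relative paths to `inference.py` and the examples folder resolve.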
If you find our work useful in your research, please consider citing our paper:
@article{zhang2025proteus,
title={Proteus-ID: ID-Consistent and Motion-Coherent Video Customization},
author={Zhang, Guiyu and Shi, Chen and Jiang, Zijian and Xiang, Xunzhi and Qian, Jingjing and Shi, Shaoshuai and Jiang, Li},
journal={arXiv preprint arXiv:2506.23729},
year={2025}
}
Thanks to these excellent open-source works and models: CogVideoX, ConsisID, and diffusers.