| <div align ="center"> | |
| <h1> Proteus-ID </h1> | |
| <h3> Proteus-ID: ID-Consistent and Motion-Coherent Video Customization </h3> | |
| [Project Page](https://grenoble-zhang.github.io/Proteus-ID/) | |
| [arXiv](https://arxiv.org/abs/2506.23729) | |
| </div> | |
| Authors: [Guiyu Zhang](https://grenoble-zhang.github.io/)<sup>1</sup>, [Chen Shi](https://scholar.google.com.hk/citations?user=o-K_AoYAAAAJ&hl=en)<sup>1</sup>, Zijian Jiang<sup>1</sup>, Xunzhi Xiang<sup>2</sup>, Jingjing Qian<sup>1</sup>, [Shaoshuai Shi](https://shishaoshuai.com/)<sup>3</sup>, [Li Jiang†](https://llijiang.github.io/)<sup>1</sup> | |
| <sup>1</sup> The Chinese University of Hong Kong, Shenzhen <sup>2</sup> Nanjing University  | |
| <sup>3</sup> Voyager Research, Didi Chuxing | |
| ## TODO | |
| - [x] Release arXiv technical report | |
| - [x] Release full code | |
| - [ ] Release dataset (coming soon) | |
| ## Requirements and Installation | |
| ### Environment | |
| ```bash | |
| # 0. Clone the repo | |
| git clone --depth=1 https://github.com/grenoble-zhang/Proteus-ID.git | |
| cd Proteus-ID | |
| # 1. Create conda environment | |
| conda create -n proteusid python=3.11.0 | |
| conda activate proteusid | |
| # 2. Install PyTorch (CUDA 12.6 build) | |
| pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126 | |
| # 3. Install the remaining pip dependencies | |
| pip install -r requirements.txt | |
| ``` | |
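After installation, it can help to confirm that the pinned PyTorch packages are the ones actually present in the environment. The following is a minimal sketch, not part of the repository: the helper name `check_pins` is hypothetical, and the version pins are copied from the install command above. Wheels from the CUDA 12.6 index usually report a local suffix such as `+cu126`, so the comparison ignores everything after `+`.

```python
from importlib import metadata

# Pins copied from the pip install command above.
EXPECTED = {"torch": "2.6.0", "torchvision": "0.21.0", "torchaudio": "2.6.0"}


def check_pins(expected=EXPECTED):
    """Return {package: (expected_version, installed_version_or_None, ok)}."""
    report = {}
    for name, want in expected.items():
        try:
            have = metadata.version(name)
        except metadata.PackageNotFoundError:
            have = None  # package is not installed in this environment
        # Drop a local version suffix such as "+cu126" before comparing.
        ok = have is not None and have.split("+")[0] == want
        report[name] = (want, have, ok)
    return report


if __name__ == "__main__":
    for name, (want, have, ok) in check_pins().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{name}: expected {want}, found {have} -> {status}")
```

Run it inside the activated `proteusid` environment; any `MISMATCH` line suggests the install step should be repeated.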
| ### Download Model | |
| ```bash | |
| cd util | |
| python download_weights.py | |
| python down_raft.py | |
| ``` | |
| Once ready, the weights will be organized in this format: | |
| ``` | |
| ckpts/ | |
| ├── face_encoder/ | |
| ├── scheduler/ | |
| ├── text_encoder/ | |
| ├── tokenizer/ | |
| ├── transformer/ | |
| ├── vae/ | |
| ├── configuration.json | |
| └── model_index.json | |
| ``` | |
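To check that the download finished, the layout above can be compared against what is on disk. This is a small sketch under assumptions: the helper `missing_entries` is hypothetical, and the location of `ckpts/` relative to the repository root is not stated here, so the path is passed in explicitly.

```python
from pathlib import Path

# Entries taken from the directory listing above.
EXPECTED_DIRS = [
    "face_encoder", "scheduler", "text_encoder",
    "tokenizer", "transformer", "vae",
]
EXPECTED_FILES = ["configuration.json", "model_index.json"]


def missing_entries(ckpt_root):
    """Return the expected sub-directories and files absent under ckpt_root."""
    root = Path(ckpt_root)
    missing = [d + "/" for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing
```

For example, `missing_entries("ckpts")` returns an empty list when every listed entry exists; any names it returns point to a download that did not complete.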
| ## Training | |
| ```bash | |
| # For single rank | |
| bash train_single_rank.sh | |
| # For multi rank | |
| bash train_multi_rank.sh | |
| ``` | |
| ## Inference | |
| ```bash | |
| python inference.py --img_file_path assets/example_images/1.png --json_file_path assets/example_images/1.json | |
| ``` | |
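To run inference over several reference images, the same command can be generated once per image. The sketch below is an illustration only: `build_commands` is a hypothetical helper, and it assumes each `N.png` has a matching `N.json` in the same folder, a pairing inferred from the single example above. It uses only the two flags shown in that command.

```python
import sys
from pathlib import Path


def build_commands(image_dir, script="inference.py"):
    """Build one inference command per PNG that has a same-named JSON file."""
    commands = []
    for img in sorted(Path(image_dir).glob("*.png")):
        meta = img.with_suffix(".json")
        if not meta.is_file():
            continue  # skip images without a matching JSON file
        commands.append([
            sys.executable, script,
            "--img_file_path", str(img),
            "--json_file_path", str(meta),
        ])
    return commands


# Usage (from the repository root):
#   import subprocess
#   for cmd in build_commands("assets/example_images"):
#       subprocess.run(cmd, check=True)
```

Building the commands as argument lists rather than shell strings avoids quoting problems with paths that contain spaces.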
| ## BibTeX | |
| If you find our work useful in your research, please consider citing our paper: | |
| ```bibtex | |
| @article{zhang2025proteus, | |
| title={Proteus-ID: ID-Consistent and Motion-Coherent Video Customization}, | |
| author={Zhang, Guiyu and Shi, Chen and Jiang, Zijian and Xiang, Xunzhi and Qian, Jingjing and Shi, Shaoshuai and Jiang, Li}, | |
| journal={arXiv preprint arXiv:2506.23729}, | |
| year={2025} | |
| } | |
| ``` | |
| ## Acknowledgement | |
| Thanks to these excellent open-source works and models: [CogVideoX](https://github.com/THUDM/CogVideo); [ConsisID](https://github.com/PKU-YuanGroup/ConsisID); [diffusers](https://github.com/huggingface/diffusers). | |