<div align ="center">
<h1> Proteus-ID </h1>
<h3> Proteus-ID: ID-Consistent and Motion-Coherent Video Customization </h3>
<div align="center">
</div>
[Project Page](https://grenoble-zhang.github.io/Proteus-ID/)
[arXiv](https://arxiv.org/abs/2506.23729)
</div>
Authors: [Guiyu Zhang](https://grenoble-zhang.github.io/)<sup>1</sup>, [Chen Shi](https://scholar.google.com.hk/citations?user=o-K_AoYAAAAJ&hl=en)<sup>1</sup>, Zijian Jiang<sup>1</sup>, Xunzhi Xiang<sup>2</sup>, Jingjing Qian<sup>1</sup>, [Shaoshuai Shi](https://shishaoshuai.com/)<sup>3</sup>, [Li Jiang†](https://llijiang.github.io/)<sup>1</sup>
<sup>1</sup> The Chinese University of Hong Kong, Shenzhen <sup>2</sup> Nanjing University 
<sup>3</sup> Voyager Research, Didi Chuxing
## TODO
- [x] Release arXiv technical report
- [x] Release full code
- [ ] Release dataset (coming soon)
## 🛠️ Requirements and Installation
### Environment
```bash
# 0. Clone the repo
git clone --depth=1 https://github.com/grenoble-zhang/Proteus-ID.git
cd Proteus-ID
# 1. Create conda environment
conda create -n proteusid python=3.11.0
conda activate proteusid
# 2. Install PyTorch (CUDA 12.6 build)
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126
# 3. Install the remaining pip dependencies
pip install -r requirements.txt
```
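After installation, it can help to confirm that the pinned versions actually landed in the active environment. The following is a minimal sketch and not part of the repository: the helper name `check_pins` is hypothetical, the pins are copied from the install command above, and only the standard-library `importlib.metadata` module is used.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the pip command above.
PINS = {"torch": "2.6.0", "torchvision": "0.21.0", "torchaudio": "2.6.0"}


def check_pins(pins):
    """Return {package: problem} for every pin that is not satisfied."""
    problems = {}
    for name, wanted in pins.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        # CUDA wheels may carry a local suffix such as "+cu126";
        # compare only the public part of the version string.
        if found.split("+")[0] != wanted:
            problems[name] = f"found {found}, expected {wanted}"
    return problems


if __name__ == "__main__":
    issues = check_pins(PINS)
    for name, problem in issues.items():
        print(f"{name}: {problem}")
    if not issues:
        print("All pinned packages match.")
```

Run it inside the `proteusid` environment; an empty result means the three PyTorch packages match the versions pinned above.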
### Download Model
```bash
cd util
python download_weights.py
python down_raft.py
```
Once ready, the weights will be organized in this format:
```
📦 ckpts/
├── 📂 face_encoder/
├── 📂 scheduler/
├── 📂 text_encoder/
├── 📂 tokenizer/
├── 📂 transformer/
├── 📂 vae/
├── 📄 configuration.json
└── 📄 model_index.json
```
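A quick way to check that the download finished is to compare the directory against the layout above. This is a minimal sketch under stated assumptions: the function name `missing_entries` is hypothetical, and the location of `ckpts/` relative to your working directory is assumed, so adjust the path to wherever the download scripts placed it.

```python
from pathlib import Path

# Entries copied from the layout listed above.
EXPECTED_DIRS = ["face_encoder", "scheduler", "text_encoder",
                 "tokenizer", "transformer", "vae"]
EXPECTED_FILES = ["configuration.json", "model_index.json"]


def missing_entries(ckpt_root):
    """Return the expected sub-directories and files absent under ckpt_root."""
    root = Path(ckpt_root)
    missing = [f"{d}/" for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    # Assumption: "ckpts" is reachable from the current directory.
    gaps = missing_entries("ckpts")
    print("Layout looks complete." if not gaps else f"Missing: {', '.join(gaps)}")
```

The check only looks at names, not file contents, so it catches an interrupted download that skipped a component but not a corrupted weight file.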
## Training
```bash
# For single rank
bash train_single_rank.sh
# For multi rank
bash train_multi_rank.sh
```
## Inference
```bash
python inference.py --img_file_path assets/example_images/1.png --json_file_path assets/example_images/1.json
```
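To run several examples in a row, the command above can be wrapped in a small loop. The sketch below is not part of the repository: `build_commands` is a hypothetical helper, and it assumes that every image in the directory is a `.png` paired with a `.json` file of the same stem, as `1.png` and `1.json` are. Only the two flags shown above are used.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(example_dir):
    """Build one inference.py command per image with a same-named JSON file."""
    commands = []
    for img in sorted(Path(example_dir).glob("*.png")):
        meta = img.with_suffix(".json")
        if not meta.is_file():
            continue  # skip images without a matching JSON file
        commands.append([
            sys.executable, "inference.py",
            "--img_file_path", str(img),
            "--json_file_path", str(meta),
        ])
    return commands


if __name__ == "__main__":
    # Run from the repository root so that inference.py is found.
    for cmd in build_commands("assets/example_images"):
        subprocess.run(cmd, check=True)  # stop at the first failing run
```

Building the argument lists separately from running them makes it easy to print and inspect the commands before launching them.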
## BibTeX
If you find our work useful in your research, please consider citing our paper:
```bibtex
@article{zhang2025proteus,
title={Proteus-ID: ID-Consistent and Motion-Coherent Video Customization},
author={Zhang, Guiyu and Shi, Chen and Jiang, Zijian and Xiang, Xunzhi and Qian, Jingjing and Shi, Shaoshuai and Jiang, Li},
journal={arXiv preprint arXiv:2506.23729},
year={2025}
}
```
## Acknowledgement
Thanks to these excellent open-source works and models: [CogVideoX](https://github.com/THUDM/CogVideo); [ConsisID](https://github.com/PKU-YuanGroup/ConsisID); [diffusers](https://github.com/huggingface/diffusers).