Update README.md
README.md CHANGED

@@ -1,163 +1,92 @@
---
datasets:
- BestWishYsh/ConsisID-preview-Data
language:
- en
library_name: diffusers
license: apache-2.0
pipeline_tag: text-to-video
tags:
- IPT2V
base_model_relation: finetune
---

<div align="center">
<img src="https://github.com/PKU-YuanGroup/ConsisID/blob/main/asserts/ConsisID_logo.png?raw=true" width="150px">
</div>
<p align="center">
<a href="https://huggingface.co/spaces/BestWishYsh/ConsisID-preview-Space">🤗 Huggingface Space</a> |
<a href="https://pku-yuangroup.github.io/ConsisID">Page</a> |
<a href="https://github.com/PKU-YuanGroup/ConsisID">Github</a> |
<a href="https://arxiv.org/abs/2411.17440">arXiv</a> |
<a href="https://huggingface.co/datasets/BestWishYsh/ConsisID-preview-Data">Dataset</a>
</p>
<p align="center">
<h5 align="center"> If you like our project, please give us a star ⭐ on GitHub for the latest update. </h5>
</p>

[Demo video](https://www.youtube.com/watch?v=PhlgC-bI5SQ), or you can click <a href="https://github.com/SHYuanBest/shyuanbest_media/raw/refs/heads/main/ConsisID/showcase_videos.mp4">here</a> to watch the video.

## Quick Start

```bash
pip install git+https://github.com/huggingface/diffusers.git
```

```python
import torch
from diffusers import ConsisIDPipeline
from diffusers.pipelines.consisid.consisid_utils import prepare_face_models, process_face_embeddings_infer
from diffusers.utils import export_to_video
from huggingface_hub import snapshot_download

snapshot_download(repo_id="BestWishYsh/ConsisID-preview", local_dir="BestWishYsh/ConsisID-preview")
face_helper_1, face_helper_2, face_clip_model, face_main_model, eva_transform_mean, eva_transform_std = (
    prepare_face_models("BestWishYsh/ConsisID-preview", device="cuda", dtype=torch.bfloat16)
)
pipe = ConsisIDPipeline.from_pretrained("BestWishYsh/ConsisID-preview", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# ConsisID works well with long, well-described prompts. Make sure the face in the image is clearly visible (e.g., preferably half-body or full-body).
prompt = "The video captures a boy walking along a city street, filmed in black and white on a classic 35mm camera. His expression is thoughtful, his brow slightly furrowed as if he's lost in contemplation. The film grain adds a textured, timeless quality to the image, evoking a sense of nostalgia. Around him, the cityscape is filled with vintage buildings, cobblestone sidewalks, and softly blurred figures passing by, their outlines faint and indistinct. Streetlights cast a gentle glow, while shadows play across the boy's path, adding depth to the scene. The lighting highlights the boy's subtle smile, hinting at a fleeting moment of curiosity. The overall cinematic atmosphere, complete with classic film still aesthetics and dramatic contrasts, gives the scene an evocative and introspective feel."
image = "https://github.com/PKU-YuanGroup/ConsisID/blob/main/asserts/example_images/2.png?raw=true"

id_cond, id_vit_hidden, image, face_kps = process_face_embeddings_infer(
    face_helper_1,
    face_clip_model,
    face_helper_2,
    eva_transform_mean,
    eva_transform_std,
    face_main_model,
    "cuda",
    torch.bfloat16,
    image,
    is_align_face=True,
)

video = pipe(
    image=image,
    prompt=prompt,
    num_inference_steps=50,
    guidance_scale=6.0,
    use_dynamic_cfg=False,
    id_vit_hidden=id_vit_hidden,
    id_cond=id_cond,
    kps_cond=face_kps,
    generator=torch.Generator("cuda").manual_seed(42),
)
export_to_video(video.frames[0], "output.mp4", fps=8)
```
ConsisID has high requirements for prompt quality. You can use [GPT-4o](https://chatgpt.com/) to refine the input text prompt; an example is as follows (original prompt: "a man is playing guitar."):

```bash
The video features a man standing next to an airplane, engaged in a conversation on his cell phone. He is wearing sunglasses and a black top, and he appears to be talking seriously. The airplane has a green stripe running along its side, and there is a large engine visible behind him. The man seems to be standing near the entrance of the airplane, possibly preparing to board or just having disembarked. The setting suggests that he might be at an airport or a private airfield. The overall atmosphere of the video is professional and focused, with the man's attire and the presence of the airplane indicating a business or travel context.
```
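The refinement step can also be scripted. The helper below is an illustrative sketch only: the function name and the instruction wording are our assumptions, not a template shipped with the repo.

```python
def build_refinement_request(short_prompt: str) -> str:
    """Wrap a short text-to-video prompt in a refinement instruction
    for an LLM such as GPT-4o. Illustrative sketch: the instruction
    text below is an assumption, not the project's official template."""
    return (
        "Rewrite the following text-to-video prompt into a single long, "
        "detailed paragraph describing the subject, setting, lighting, "
        "camera style, and atmosphere, while keeping the original intent: "
        f'"{short_prompt}"'
    )

print(build_refinement_request("a man is playing guitar."))
```

The returned string can then be sent to any chat-completion API; the model's reply replaces the short prompt at inference time.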
| Feature (overlay the previous) | Max Memory Allocated | Max Memory Reserved |
| :----------------------------- | :------------------- | :------------------ |
| - | 37 GB | 44 GB |
| enable_model_cpu_offload | 22 GB | 25 GB |
| enable_sequential_cpu_offload | 16 GB | 22 GB |
| vae.enable_slicing | 16 GB | 22 GB |
| vae.enable_tiling | 5 GB | 7 GB |

```python
# turn on if you don't have multiple GPUs or enough GPU memory (such as an H100)
pipe.enable_model_cpu_offload()
pipe.enable_sequential_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()
```
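As a minimal sketch, the options in the table can be wrapped in a small helper. This helper is hypothetical (not part of diffusers or this repo), and it deliberately picks only one CPU-offload mode, since in diffusers the model-level and sequential offload strategies are alternatives rather than additive.

```python
def apply_memory_optimizations(pipe, offload="sequential",
                               vae_slicing=True, vae_tiling=True):
    """Apply diffusers memory savers to a loaded pipeline.

    Hypothetical helper: `offload` selects one CPU-offload strategy
    ("model" or "sequential"); VAE slicing/tiling are independent toggles.
    """
    if offload == "model":
        pipe.enable_model_cpu_offload()
    elif offload == "sequential":
        pipe.enable_sequential_cpu_offload()
    if vae_slicing:
        pipe.vae.enable_slicing()
    if vae_tiling:
        pipe.vae.enable_tiling()
    return pipe
```

With all VAE options on and sequential offload, peak usage drops to roughly the last row of the table, at the cost of slower inference.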
## Description

## ✏️ Citation

If you find our paper and code useful in your research, please consider giving a star and citation.

```bibtex
@article{yuan2025identity,
  title={Identity-Preserving Text-to-Video Generation by Frequency Decomposition},
  author={Yuan, Shenghai and Huang, Jinfa and He, Xianyi and Ge, Yunyang and Shi, Yujun and Chen, Liuhan and Luo, Jiebo and Yuan, Li},
  journal={arXiv preprint arXiv:2411.17440},
  year={2025}
}
```

## Contributors

<a href="https://github.com/PKU-YuanGroup/ConsisID/graphs/contributors">
  <img src="https://contrib.rocks/image?repo=PKU-YuanGroup/ConsisID&anon=true" />
</a>
<div align="center">
<h1> Proteus-ID </h1>
<h3> Proteus-ID: ID-Consistent and Motion-Coherent Video Customization </h3>

[Project Page](https://grenoble-zhang.github.io/Proteus-ID/)
[arXiv Paper](https://arxiv.org/abs/2506.23729)

</div>
Authors: [Guiyu Zhang](https://grenoble-zhang.github.io/)<sup>1</sup>, [Chen Shi](https://scholar.google.com.hk/citations?user=o-K_AoYAAAAJ&hl=en)<sup>1</sup>, Zijian Jiang<sup>1</sup>, Xunzhi Xiang<sup>2</sup>, Jingjing Qian<sup>1</sup>, [Shaoshuai Shi](https://shishaoshuai.com/)<sup>3</sup>, [Li Jiang†](https://llijiang.github.io/)<sup>1</sup>

<sup>1</sup> The Chinese University of Hong Kong, Shenzhen <sup>2</sup> Nanjing University
<sup>3</sup> Voyager Research, Didi Chuxing

## TODO
- [x] Release arXiv technical report
- [x] Release full codes
- [ ] Release dataset (coming soon)

## 🛠️ Requirements and Installation

### Environment
```bash
# 0. Clone the repo
git clone --depth=1 https://github.com/grenoble-zhang/Proteus-ID.git
cd Proteus-ID

# 1. Create conda environment
conda create -n proteusid python=3.11.0
conda activate proteusid

# 2. Install PyTorch and other dependencies
# CUDA 12.6
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126

# 3. Install pip dependencies
pip install -r requirements.txt
```
### Download Model

```bash
cd util
python download_weights.py
python down_raft.py
```

Once ready, the weights will be organized in this format:

```
📦 ckpts/
├── 📁 face_encoder/
├── 📁 scheduler/
├── 📁 text_encoder/
├── 📁 tokenizer/
├── 📁 transformer/
├── 📁 vae/
├── 📄 configuration.json
└── 📄 model_index.json
```
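Before loading, it can help to sanity-check that the downloads produced the layout above. The small script below is our own sketch, not one of the repo's utilities:

```python
from pathlib import Path

# Entries expected under ckpts/ per the layout shown above.
REQUIRED = ["face_encoder", "scheduler", "text_encoder", "tokenizer",
            "transformer", "vae", "configuration.json", "model_index.json"]

def missing_weights(root="ckpts"):
    """Return the expected entries that are absent under `root`."""
    root = Path(root)
    return [name for name in REQUIRED if not (root / name).exists()]

if __name__ == "__main__":
    missing = missing_weights()
    print("OK" if not missing else f"Missing: {missing}")
```

If anything is reported missing, rerun the download scripts before training or inference.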
## 🏋️ Training

```bash
# For single rank
bash train_single_rank.sh
# For multi rank
bash train_multi_rank.sh
```
## 🎬 Inference

```bash
python inference.py --img_file_path assets/example_images/1.png --json_file_path assets/example_images/1.json
```
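The single-image command generalizes to a loop over all bundled examples. The helper below is a sketch under the assumption (matching the command above) that each `*.png` in the examples folder has a same-stem `*.json` prompt file:

```python
from pathlib import Path

def inference_commands(image_dir="assets/example_images"):
    """Build one inference.py command per example image, assuming a
    same-stem JSON prompt file sits next to each PNG (as in `1.png`/`1.json`)."""
    cmds = []
    for img in sorted(Path(image_dir).glob("*.png")):
        cmds.append(
            f"python inference.py --img_file_path {img} "
            f"--json_file_path {img.with_suffix('.json')}"
        )
    return cmds
```

Each returned string can be run as-is from the repo root, e.g. via `subprocess.run(cmd, shell=True)`.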
## BibTeX

If you find our work useful in your research, please consider citing our paper:

```bibtex
@article{zhang2025proteus,
  title={Proteus-ID: ID-Consistent and Motion-Coherent Video Customization},
  author={Zhang, Guiyu and Shi, Chen and Jiang, Zijian and Xiang, Xunzhi and Qian, Jingjing and Shi, Shaoshuai and Jiang, Li},
  journal={arXiv preprint arXiv:2506.23729},
  year={2025}
}
```
## Acknowledgement

Thanks to these excellent open-source works and models: [CogVideoX](https://github.com/THUDM/CogVideo); [ConsisID](https://github.com/PKU-YuanGroup/ConsisID); [diffusers](https://github.com/huggingface/diffusers).