csbhr committed on
Commit 874f1ed · verified
1 Parent(s): 16b37c7

Update README.md

Files changed (1)
  1. README.md +142 -3
README.md CHANGED
@@ -1,3 +1,142 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ license: apache-2.0
3
+ ---
4
+
5
+ <div align="center">
6
+
7
+ <h1>
8
+ Vivid-VR:<br>
9
+ Distilling Concepts from Diffusion Transformer for Photorealistic Video Restoration
10
+ </h1>
11
+
12
+ <div>
13
+ <a href='https://csbhr.github.io/' target='_blank'>Haoran Bai</a>,&emsp;
14
+ <a href='https://github.com/chenxx89' target='_blank'>Xiaoxu Chen</a>,&emsp;
15
+ <a href='https://ieeexplore.ieee.org/author/37088928879' target='_blank'>Canqian Yang</a>,&emsp;
16
+ <a href='https://github.com/HeZongyao' target='_blank'>Zongyao He</a>,&emsp;
17
+ <a href='https://scholar.google.com/citations?user=brmDxnsAAAAJ&hl=zh-CN' target='_blank'>Sibin Deng</a>,&emsp;
18
+ <a href='https://scholar.google.com/citations?user=NpTmcKEAAAAJ&hl=en' target='_blank'>Ying Chen<sup>∗</sup></a>
19
+ </div>
20
+ <div>
21
+ Alibaba Group - Taobao & Tmall Group
22
+ </div>
23
+ <div>
24
+ * Corresponding author
25
+ </div>
26
+
27
+ <a href='#' target='_blank'>Paper (<span style='color:red;'>Coming soon!</span>)</a> |
28
+ <a href='https://csbhr.github.io/projects/vivid-vr/' target='_blank'>Project Page</a>
29
+
30
+
31
+ <div style="width: 100%; text-align: center; margin:auto;">
32
+ <img style="width:100%" src="assets/teaser.png">
33
+ </div>
34
+
35
+ For more quantitative and visual results, check out our <a href="https://csbhr.github.io/projects/vivid-vr/" target="_blank">[project page]</a>.
36
+
37
+ ---
38
+ </div>
39
+
40
+
41
+ ## 🎬 Overview
42
+ ![overall_structure](assets/framework.png)
43
+
44
+ ## 🔧 Dependencies and Installation
45
+ 1. Clone Repo
46
+ ```bash
47
+ git clone https://github.com/csbhr/Vivid-VR.git
48
+ cd Vivid-VR
49
+ ```
50
+
51
+ 2. Create Conda Environment and Install Dependencies
52
+ ```bash
53
+ # create new conda env
54
+ conda create -n Vivid-VR python=3.10
55
+ conda activate Vivid-VR
56
+
57
+ # install pytorch
58
+ pip install torch==2.2.1 torchvision==0.17.1 torchaudio==2.2.1 --index-url https://download.pytorch.org/whl/cu121
59
+
60
+ # install python dependencies
61
+ pip install -r requirements.txt
62
+
63
+ # install easyocr [Optional, for text fix]
64
+ pip install easyocr
65
+ pip install numpy==1.26.4 # installing easyocr may pull in numpy 2.x, which causes conflicts, so pin numpy back to 1.26.4
66
+ ```
67
+
68
+ 3. Download Models
69
+
70
+ - [**Required**] Download CogVideoX1.5-5B checkpoints from [[huggingface]](https://huggingface.co/zai-org/CogVideoX1.5-5B).
71
+ - [**Required**] Download cogvlm2-llama3-caption checkpoints from [[huggingface]](https://huggingface.co/zai-org/cogvlm2-llama3-caption).
72
+ - [**Required**] Download Vivid-VR checkpoints from [[huggingface]](https://huggingface.co/csbhr/Vivid-VR).
73
+ - [**Optional, for text fix**] Download easyocr checkpoints [[english_g2]](https://github.com/JaidedAI/EasyOCR/releases/download/v1.3/english_g2.zip) [[zh_sim_g2]](https://github.com/JaidedAI/EasyOCR/releases/download/v1.3/zh_sim_g2.zip) [[craft_mlt_25k]](https://github.com/JaidedAI/EasyOCR/releases/download/pre-v1.1.6/craft_mlt_25k.zip).
74
+ - [**Optional, for text fix**] Download Real-ESRGAN checkpoints [[RealESRGAN_x2plus]](https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.1/RealESRGAN_x2plus.pth).
75
+ - Put them under the `./ckpts` folder, unzipping the easyocr archives first.
76
+
77
+ The [`ckpts`](./ckpts) directory structure should be arranged as:
78
+
79
+ ```
80
+ ├── ckpts
81
+ │   ├── CogVideoX1.5-5B
82
+ │   │   ├── ...
83
+ │   ├── cogvlm2-llama3-caption
84
+ │   │   ├── ...
85
+ │   ├── Vivid-VR
86
+ │   │   ├── controlnet
87
+ │   │   ├── config.json
88
+ │   │   ├── diffusion_pytorch_model.safetensors
89
+ │   │   ├── connectors.pt
90
+ │   │   ├── control_feat_proj.pt
91
+ │   │   ├── control_patch_embed.pt
92
+ │   ├── easyocr
93
+ │   │   ├── craft_mlt_25k.pth
94
+ │   │   ├── english_g2.pth
95
+ │   │   ├── zh_sim_g2.pth
96
+ │   ├── RealESRGAN
97
+ │   │   ├── RealESRGAN_x2plus.pth
98
+ ```
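The layout above can be checked before running inference. The following is a minimal sketch, not part of the repository: the helper name `missing_entries` and the `REQUIRED`/`OPTIONAL` lists are assumptions made for illustration, built only from the entries named in the tree. The location of `config.json` and `diffusion_pytorch_model.safetensors` is left out because the listing does not make their nesting unambiguous.

```python
from pathlib import Path

# Entries marked [Required] in the download list (paths relative to ./ckpts).
REQUIRED = [
    "CogVideoX1.5-5B",
    "cogvlm2-llama3-caption",
    "Vivid-VR/controlnet",
    "Vivid-VR/connectors.pt",
    "Vivid-VR/control_feat_proj.pt",
    "Vivid-VR/control_patch_embed.pt",
]

# Entries only needed for the optional text fix.
OPTIONAL = [
    "easyocr/craft_mlt_25k.pth",
    "easyocr/english_g2.pth",
    "easyocr/zh_sim_g2.pth",
    "RealESRGAN/RealESRGAN_x2plus.pth",
]


def missing_entries(ckpt_dir, relative_paths):
    """Return the relative paths that do not exist under ckpt_dir."""
    root = Path(ckpt_dir)
    return [rel for rel in relative_paths if not (root / rel).exists()]


if __name__ == "__main__":
    for label, paths in (("required", REQUIRED), ("optional", OPTIONAL)):
        for rel in missing_entries("./ckpts", paths):
            print(f"missing {label} entry: ckpts/{rel}")
```

Run from the repository root, the script prints one line per missing entry and nothing when the layout is complete.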
99
+
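Because installing easyocr may replace the pinned numpy, it can help to confirm the versions from step 2 after installation. The sketch below is an illustration only: the function names `installed_versions` and `mismatches` are hypothetical, and the pins are copied from the install commands above. It assumes that wheels from the cu121 index report a local suffix such as `+cu121`, which is stripped before comparing.

```python
from importlib import metadata

# Version pins copied from the installation commands in step 2.
EXPECTED = {
    "torch": "2.2.1",
    "torchvision": "0.17.1",
    "torchaudio": "2.2.1",
    "numpy": "1.26.4",
}


def installed_versions(names):
    """Map each package name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def mismatches(installed, expected=EXPECTED):
    """Return (name, wanted, found) for every package that misses its pin."""
    problems = []
    for name, wanted in expected.items():
        found = installed.get(name)
        # Drop a local version suffix such as "+cu121" before comparing.
        base = found.split("+", 1)[0] if found else None
        if base != wanted:
            problems.append((name, wanted, found))
    return problems


if __name__ == "__main__":
    for name, wanted, found in mismatches(installed_versions(EXPECTED)):
        print(f"{name}: expected {wanted}, found {found}")
```

If numpy shows up in the output after installing easyocr, re-running `pip install numpy==1.26.4` restores the pin.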
100
+
101
+ ## ☕️ Quick Inference
102
+
103
+ Run the following command to try it out. The last three flags are optional: with `--upscale=0`, the short side of the output videos will be 1024; with `--textfix`, the text regions are replaced by the output of Real-ESRGAN; with `--save_images`, the video frames are saved as well.
104
+
105
+ ```shell
106
+ python VRDiT/inference.py \
107
+   --ckpt_dir=./ckpts \
108
+   --cogvideox_ckpt_path=./ckpts/CogVideoX1.5-5B \
109
+   --cogvlm2_ckpt_path=./ckpts/cogvlm2-llama3-caption \
110
+   --input_dir=/dir/to/input/videos \
111
+   --output_dir=/dir/to/output/videos \
112
+   --upscale=0 \
113
+   --textfix \
114
+   --save_images
115
+
116
+ ```
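When the optional flags change from run to run, the command can be assembled programmatically instead of edited by hand. This is a rough sketch under stated assumptions: `build_inference_command` is a hypothetical helper, not something the repository provides, and it only uses the script path and flags shown in the command above.

```python
import shlex


def build_inference_command(input_dir, output_dir, ckpt_dir="./ckpts",
                            upscale=None, textfix=False, save_images=False):
    """Assemble the argument list for VRDiT/inference.py (hypothetical helper)."""
    cmd = [
        "python", "VRDiT/inference.py",
        f"--ckpt_dir={ckpt_dir}",
        f"--cogvideox_ckpt_path={ckpt_dir}/CogVideoX1.5-5B",
        f"--cogvlm2_ckpt_path={ckpt_dir}/cogvlm2-llama3-caption",
        f"--input_dir={input_dir}",
        f"--output_dir={output_dir}",
    ]
    # Optional flags are appended only when requested.
    if upscale is not None:
        cmd.append(f"--upscale={upscale}")
    if textfix:
        cmd.append("--textfix")
    if save_images:
        cmd.append("--save_images")
    return cmd


if __name__ == "__main__":
    cmd = build_inference_command(
        "/dir/to/input/videos", "/dir/to/output/videos", upscale=0, textfix=True
    )
    # Print a copy-pasteable shell command.
    print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues for paths that contain spaces.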
117
+
118
+
119
+ ## 📧 Citation
120
+
121
+ If you find our repo useful for your research, please consider citing it:
122
+
123
+ ```bibtex
124
+ @misc{bai2025vividvr,
125
+ title={Vivid-VR: Distilling Concepts from Diffusion Transformer for Photorealistic Video Restoration},
126
+ author={Haoran Bai and Xiaoxu Chen and Canqian Yang and Zongyao He and Sibin Deng and Ying Chen},
127
+ year={2025},
128
+ url={https://github.com/csbhr/Vivid-VR}
129
+ }
130
+ ```
131
+
132
+
133
+ ## 📄 License
134
+ - This repo is built on [diffusers v0.31.0](https://github.com/huggingface/diffusers/tree/v0.31.0), which is distributed under the terms of the [Apache License 2.0](https://github.com/huggingface/diffusers/blob/main/LICENSE).
135
+ - CogVideoX1.5-5B models are distributed under the terms of the [CogVideoX License](https://huggingface.co/zai-org/CogVideoX1.5-5B/blob/main/LICENSE).
136
+ - cogvlm2-llama3-caption models are distributed under the terms of the [CogVLM2 License](https://modelscope.cn/models/ZhipuAI/cogvlm2-video-llama3-base/file/view/master?fileName=LICENSE&status=0) and [LLAMA3 License](https://modelscope.cn/models/ZhipuAI/cogvlm2-video-llama3-base/file/view/master?fileName=LLAMA3_LICENSE&status=0).
137
+ - Real-ESRGAN models are distributed under the terms of the [BSD 3-Clause License](https://github.com/xinntao/Real-ESRGAN/blob/master/LICENSE).
138
+ - easyocr models are distributed under the terms of the [JAIDED.AI Terms and Conditions](https://www.jaided.ai/terms/).
139
+
140
+
141
+
142
+