XiangpengYang commited on
Commit
d3fc679
·
1 Parent(s): beb2ec7

clear readme

Files changed (1)
  1. README.md +1 -211
README.md CHANGED
@@ -9,214 +9,4 @@ app_file: app.py
pinned: false
license: apache-2.0
short_description: Unified Video Editing with Temporal Reasoner
---

<div align="center">

<h1 style="margin: 0; font-size: 2.4em;">
Unified Video Editing with Temporal Reasoner
</h1>

<h4 style="margin: 15px 0; color: #2c3e50;">
👁️ See &rarr; 🧠 Reason &rarr; ✏️ Edit
</h4>

<h4 style="margin: 15px 0; color: #2c3e50;">
🚀 A Chain-of-Frames video editing method that enables temporal reasoning and 4&times; video-length extrapolation with just 50k training pairs!
</h4>

[![Hugging Face Daily Paper](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Daily%20Paper-yellow)](https://huggingface.co/papers/2512.07469)
[![arXiv](https://img.shields.io/badge/arXiv-2512.07469-b31b1b.svg)](https://arxiv.org/abs/2512.07469)
[![Project Page](https://img.shields.io/badge/Project-Page-green)](https://videocof.github.io)
[![Hugging Face Model](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-yellow)](https://huggingface.co/XiangpengYang/VideoCoF)
![visitors](https://visitor-badge.laobi.icu/badge?page_id=videocof.VideoCoF&left_color=green&right_color=red)

</div>

<div align="center">
<b>
<a href="https://scholar.google.com/citations?user=reiIeYMAAAAJ">Xiangpeng Yang</a><sup>1</sup>,
<a href="https://horizonwind2004.github.io/">Ji Xie</a><sup>2</sup>,
<a href="https://scholar.google.com/citations?user=OvfI_HMAAAAJ">Yiyuan Yang</a><sup>1</sup>,
<a href="https://scholar.google.com/citations?user=zfeWd6gAAAAJ">Yan Huang</a><sup>1</sup>,
<a href="https://scholar.google.com/citations?user=sCuACdkAAAAJ">Min Xu</a><sup>1</sup>,
<a href="https://scholar.google.com/citations?user=sCuACdkAAAAJ">Qiang Wu</a><sup>1</sup>
</b>
<br>
<span style="font-size: 1em; color: #555;"><sup>1</sup>University of Technology Sydney, <sup>2</sup>Zhejiang University</span>
</div>

<br>

## 💿 Introduction

https://github.com/user-attachments/assets/26f7d347-3d6c-43cf-9645-6eb5906f6ad6

## 🔥 News

- **2025.12.09**: Paper available on arXiv.
- **2025.12.08**: Released the inference code and the videocof-50k weights.
- **2025.12.06**: 🔥 Project Page and README updated!


## 📑 Table of Contents

- [🔧 Quick Start](#-quick-start)
- [🏆 Model Zoo](#-model-zoo)
- [🍭 Results](#-results)
- [🚧 TODO](#-todo)
- [🙏 Acknowledgments](#-acknowledgments)
- [📜 License](#-license)
- [📮 Contact](#-contact)
- [📄 Citation](#-citation)

## 🔧 Quick Start

1. **Clone the repository:**

```bash
git clone https://github.com/videocof/VideoCoF.git
cd VideoCoF
```

2. **Install dependencies:**

```bash
# 1. Create and activate a conda environment
conda create -n videocof python=3.10
conda activate videocof

# 2. Install PyTorch (choose the version compatible with your CUDA)
# For standard GPUs (CUDA 12.1):
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121

# For Hopper GPUs (e.g., H100/H800) requiring fast inference:
# pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu128

# 3. Install other dependencies
pip install -r requirements.txt
```

**Note on Flash Attention:**
We recommend **FlashAttention-3** (currently in beta) for optimal performance, especially on NVIDIA H100/H800 GPUs.
If you are using these GPUs, please follow the [official FlashAttention-3 installation guide](https://github.com/Dao-AILab/flash-attention?tab=readme-ov-file#flashattention-3-beta-release) after installing a compatible PyTorch version (e.g., PyTorch 2.8 + CUDA 12.8).

3. **Download Models:**

**Wan-2.1-T2V-14B Pretrained Weights:**

```bash
git lfs install
git clone https://huggingface.co/Wan-AI/Wan2.1-T2V-14B

# Or using huggingface-cli:
# hf download Wan-AI/Wan2.1-T2V-14B --local-dir Wan2.1-T2V-14B
```

**VideoCoF Checkpoint:**

```bash
git lfs install
git clone https://huggingface.co/XiangpengYang/VideoCoF videocof_weight

# Or using huggingface-cli:
# hf download XiangpengYang/VideoCoF --local-dir videocof_weight
```

4. **Inference:**

For single inference tasks:

```bash
# Object Removal
sh scripts/obj_rem.sh

# Object Addition
sh scripts/obj_add.sh

# Local Style Transfer
sh scripts/local_style.sh
```

For parallel inference:

```bash
sh scripts/parallel_infer.sh
```

## 🏆 Model Zoo

Our models are available on Hugging Face:

| Model Name | Description | Link |
|------------|-------------|------|
| VideoCoF-Base | Base model trained on 50k video pairs | [Hugging Face](https://huggingface.co/XiangpengYang/VideoCoF) |

## 🍭 Results

### Why Do We Need Reasoning Before Editing?
![](assets/motivation_v2.gif)

Current video editing methods typically follow two paths:
1. **Expert models**: Rely on external masks for precision but sacrifice unification.
2. **Unified in-context learning models**: Mask-free, but often struggle with spatial accuracy due to the lack of explicit cues.

**VideoCoF** bridges this gap by predicting reasoning tokens before generating the target video tokens.
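The "see, reason, edit" ordering described above can be sketched as a toy generation loop. This is an illustrative sketch only: the `model.generate` interface and token names here are hypothetical, and the actual VideoCoF model is a video diffusion transformer, not this autoregressive toy.

```python
def chain_of_frames_edit(source_tokens, instruction_tokens, model):
    """Toy sketch of the see -> reason -> edit ordering (not the real model)."""
    # See: condition on the source video and the edit instruction.
    context = source_tokens + instruction_tokens
    # Reason: first predict reasoning tokens that localize the intended edit.
    reasoning_tokens = model.generate(context, kind="reasoning")
    # Edit: generate target video tokens conditioned on the reasoning tokens.
    target_tokens = model.generate(context + reasoning_tokens, kind="target")
    return target_tokens
```

The point is purely the ordering: reasoning tokens are emitted before any target tokens, so the edit has an explicit spatial cue to condition on, rather than relying on implicit in-context localization.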

### Key Capabilities

1. **Seeing, Reasoning, Editing**: VideoCoF adopts a "seeing, reasoning, editing" approach, ensuring edits are applied accurately to the intended targets.
2. **Length Extrapolation**: Trained on only **50k** video pairs (33 frames each), VideoCoF demonstrates robust multi-shot editing and length generalization (e.g., 4&times; length extrapolation).
3. **Diverse Editing Tasks**: Supports fine-grained, spatially aware editing at the instance and part level: Object Removal, Object Addition, Object Swap, and Local Style Transfer.
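As a back-of-envelope reading of the 4&times; figure, assuming frame counts follow the common 4k+1 pattern of causal video VAEs with 4&times; temporal compression (an assumption for illustration, not stated in this README):

```python
# Hypothetical arithmetic: training clips are 33 frames, a 4k+1 count common
# for causal video VAEs. Extrapolating the latent sequence 4x then lands on
# another 4k+1 pixel-frame count.
TRAIN_FRAMES = 33
train_latents = (TRAIN_FRAMES - 1) // 4 + 1           # 9 latent frames
extrapolated_latents = 4 * (train_latents - 1) + 1    # 33 latent frames at 4x
extrapolated_frames = 4 * (extrapolated_latents - 1) + 1
print(extrapolated_frames)  # 129
```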

### Gallery Highlights

> Please refer to our [Project Page](https://videocof.github.io) for the full gallery.

* **Object Removal**: Remove people or objects based on text prompts.
* **Object Addition**: Add elements like animals, objects, or people.
* **Object Swap**: Change specific attributes or objects.
* **Local Style Transfer**: Modify textures, materials, or colors.

## 🚧 TODO

- [x] Release paper.
- [x] Release inference code and weights.
- [ ] Release training code.
- [ ] Release training data.
- [ ] Add Hugging Face demo.

## 🙏 Acknowledgments

We thank the authors of related works and the open-source projects [VideoX-Fun](https://github.com/aigc-apps/VideoX-Fun) and [Wan](https://github.com/Wan-Video/Wan2.1) for their contributions.

## 📜 License

This project is licensed under the [Apache License 2.0](LICENSE).

## 📮 Contact

For any questions, feel free to reach out to the author, Xiangpeng Yang ([@knightyxp](https://github.com/knightyxp)), at knightyxp@gmail.com or Xiangpeng.Yang@student.uts.edu.au.

## 📄 Citation

If you find this work useful for your research, please consider citing:

```bibtex
@article{yang2025videocof,
  title={Unified Video Editing with Temporal Reasoner},
  author={Yang, Xiangpeng and Xie, Ji and Yang, Yiyuan and Huang, Yan and Xu, Min and Wu, Qiang},
  journal={arXiv preprint arXiv:2512.07469},
  year={2025}
}
```

<div align="center">
⭐ **If you find this project helpful, please consider giving it a star!** ⭐
</div>

## ⭐️ Star History

[![Star History Chart](https://api.star-history.com/svg?repos=knightyxp/VideoCoF&type=Date&legend=top-left)](https://star-history.com/#knightyxp/VideoCoF&Date)
 