leoisufa committed (verified)
Commit f9908c5 · 1 Parent(s): 3175213

Upload folder using huggingface_hub
.gitattributes CHANGED
@@ -1 +1,7 @@
  diffusion_pytorch_model.safetensors filter=lfs diff=lfs merge=lfs -text
+ assets/gifs/output_small.gif filter=lfs diff=lfs merge=lfs -text
+ assets/glasses.mp4 filter=lfs diff=lfs merge=lfs -text
+ assets/images/Picture[[:space:]]1.png filter=lfs diff=lfs merge=lfs -text
+ assets/images/demo.jpg filter=lfs diff=lfs merge=lfs -text
+ assets/kid.mp4 filter=lfs diff=lfs merge=lfs -text
+ assets/woman.mp4 filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,26 +1,69 @@
- # **ICVE: In-Context Learning with Unpaired Clips for Instruction-based Video Editing**
- *Arxiv 2025*
-
- **Xinyao Liao**<sup>1,2</sup>, **Xianfang Zeng**<sup>2</sup>, **Ziye Song**<sup>1</sup>, **Zhoujie Fu**<sup>1,2</sup>, **Gang Yu**<sup>2*</sup>, **Guosheng Lin**<sup>1*</sup>
- <sup>1</sup> Nanyang Technological University <sup>2</sup> StepFun
-
- **Project Leader:** *Xianfang Zeng*
- **Corresponding Authors:** *Gang Yu, Guosheng Lin*
-
- PyTorch implementation of the paper:
- [In-Context Learning with Unpaired Clips for Instruction-based Video Editing](https://arxiv.org/)

  ## 🧩 Overview
- ICVE proposes a low-cost pretraining strategy for instruction-based video editing via in-context learning from unpaired clips. Built upon [HunyuanVideoT2V](https://github.com/Tencent-Hunyuan/HunyuanVideo), it first learns editing concepts from about 1M unpaired videos, then fine-tunes on <150K paired editing data for improved instruction alignment and visual quality — enabling general editing operations guided by natural language.

- ## 🎥 Demo
  <p align="center">
- <a href="https://youtu.be/fmjmOWqQo88">
- <img src="https://img.youtube.com/vi/fmjmOWqQo88/0.jpg"
  alt="ICVE Demo Video"
  width="80%"
- style="max-width:900px;">
  </a>
  </p>

  ## 🛠️ Dependencies and Installation
@@ -54,15 +97,15 @@ python -m pip install ninja
  python -m pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.6.3
  ```

- ## 🧱 Download Pretrained Models
  1. **HunyuanVideo Pretrained Weights**
  Follow the official HunyuanVideo instructions here:
  👉 [Download Pretrained Models](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/ckpts/README.md)
  and place the downloaded weights into the `ckpts/` directory as shown above.
  2. **ICVE Checkpoint**
- Download the our model weights from
- 👉 [Hugging Face](https://huggingface.co/leoisufa/ICVE)
- and place them in the `checkpoint/` directory.

  The folder structure of this project should look like this after setup:
  ```shell
@@ -103,16 +146,19 @@ python sample_video.py \
  --save-path ./results
  ```

  ## 🔗 BibTeX
- If you find [ICEV](https://arxiv.org/) useful for your research and applications, please cite using this BibTeX:
  ```BibTeX
- @article{liao2025icve,
- title = {In-Context Learning with Unpaired Clips for Instruction-based Video Editing},
- author = {Liao, Xinyao and Zeng, Xianfang and Song, Ziye and Fu, Zhoujie and Yu, Gang and Lin, Guosheng},
- journal = {arXiv preprint arXiv:25xx.xxxx},
- year = {2025}
  }
- ```
-
- ## 🙏 Acknowledgements
- This work builds upon the open-source efforts of [HunyuanVideo](https://github.com/Tencent-Hunyuan/HunyuanVideo) and [FastVideo](https://github.com/hao-ai-lab/FastVideo).
 
+ <div align="center">
+ <!-- Project Title -->
+ <h1>
+ ICVE: In-Context Learning with Unpaired Clips for<br>
+ Instruction-based Video Editing
+ </h1>
+ <!-- Project Badges -->
+ <p>
+ <a href="https://arxiv.org/abs/2510.14648">
+ <img src="https://img.shields.io/badge/arXiv-2510.14648-b31b1b.svg" alt="arXiv"/>
+ </a>
+ <a href="https://huggingface.co/leoisufa/ICVE">
+ <img src="https://img.shields.io/badge/HuggingFace-Model-yellow.svg" alt="HuggingFace"/>
+ </a>
+ </p>
+ </div>
+
+ <div align="center">
+ <strong>Xinyao Liao<sup>1,2</sup></strong>,
+ <strong>Xianfang Zeng<sup>2</sup></strong>,
+ <strong>Ziye Song<sup>1</sup></strong>,
+ <strong>Zhoujie Fu<sup>1,2</sup></strong>,
+ <strong>Gang Yu<sup>2*</sup></strong>,
+ <strong>Guosheng Lin<sup>1*</sup></strong>
+ <br><br>
+ <b>
+ <sup>1</sup> Nanyang Technological University
+ <a href="#">
+ <img src="assets/images/Picture 1.png" alt="NTU Logo"
+ style="margin-bottom: -4px; height: 22px;">
+ </a>
+ &nbsp;&nbsp;
+ <sup>2</sup> StepFun
+ <a href="#">
+ <img src="assets/images/Picture 2.png" alt="StepFun Logo"
+ style="margin-bottom: -4px; height: 22px;">
+ </a>
+ </b>
+ </div>
+
+ <div align="center">
+ <img src="assets/gifs/output_small.gif"
+ alt="Demo GIF"
+ width="100%"
+ style="max-width:900px;">
+ </div>
+
+ **Star us if you find this project useful! ⭐**
+
+ ## 🎉 Updates
+ - [10/2025] 🔥 [Model checkpoints](https://huggingface.co/leoisufa/ICVE) are released!
+ - [10/2025] 🔥 [Codebase](https://github.com/leoisufa/ICVE) is released!

  ## 🧩 Overview
+ ICVE proposes a low-cost pretraining strategy for instruction-based video editing via in-context learning from unpaired clips. Built upon [HunyuanVideoT2V](https://github.com/Tencent-Hunyuan/HunyuanVideo), it first learns editing concepts from about **1M** unpaired videos, then fine-tunes on **<150K** paired editing data for improved instruction alignment and visual quality — enabling general editing operations guided by natural language.

+ ## 🎥 Video Demo
  <p align="center">
+ <a href="https://youtu.be/ZPXPMJUJnwU" target="_blank">
+ <img src="https://img.youtube.com/vi/ZPXPMJUJnwU/maxresdefault.jpg"
  alt="ICVE Demo Video"
  width="80%"
+ style="max-width:900px; border-radius:10px; box-shadow:0 0 10px rgba(0,0,0,0.15);">
  </a>
+ <br>
+ <em>Click the image above to watch the full video on YouTube 🎬</em>
  </p>

  ## 🛠️ Dependencies and Installation
 
  python -m pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.6.3
  ```

+ ## 🧱 Download Models
  1. **HunyuanVideo Pretrained Weights**
  Follow the official HunyuanVideo instructions here:
  👉 [Download Pretrained Models](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/ckpts/README.md)
  and place the downloaded weights into the `ckpts/` directory as shown above.
  2. **ICVE Checkpoint**
+ Download our model weights from
+ 👉 [Hugging Face](https://huggingface.co/leoisufa/ICVE)
+ and place them in the `checkpoint/` directory.

  The folder structure of this project should look like this after setup:
  ```shell
  --save-path ./results
  ```

+ ## 🙏 Acknowledgements
+ We thank the following prior art for their excellent open-source work:
+ - [HunyuanVideo](https://github.com/Tencent-Hunyuan/HunyuanVideo)
+ - [FastVideo](https://github.com/hao-ai-lab/FastVideo)
+ - [VACE](https://github.com/ali-vilab/VACE)
+
  ## 🔗 BibTeX
+ If you find [ICVE](https://arxiv.org/abs/2510.14648) useful for your research and applications, please cite using this BibTeX:
  ```BibTeX
+ @article{liao2025icve,
+ title={In-Context Learning with Unpaired Clips for Instruction-based Video Editing},
+ author={Xinyao Liao and Xianfang Zeng and Ziye Song and Zhoujie Fu and Gang Yu and Guosheng Lin},
+ journal={arXiv preprint arXiv:2510.14648},
+ year={2025}
  }
+ ```
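
The "ICVE Checkpoint" step in the README diff above can also be scripted. A minimal sketch, assuming the `huggingface_hub` CLI is installed and the `checkpoint/` layout the README describes:

```shell
# Sketch: fetch the ICVE weights into checkpoint/, as the README instructs.
# Assumes the huggingface_hub CLI is available (pip install -U "huggingface_hub[cli]").
huggingface-cli download leoisufa/ICVE --local-dir checkpoint/
```

The HunyuanVideo base weights still need to be placed under `ckpts/` separately, per the official instructions linked in the README.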
 
 
 
assets/gifs/output_small.gif ADDED

Git LFS Details

  • SHA256: 2520a98120d51f762ea11a32a9bc9bc58e9ccd93f8815b51e1e72d5e175135eb
  • Pointer size: 132 Bytes
  • Size of remote file: 6.63 MB
assets/glasses.mp4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:be4b3b4dce8edcbe03e580e30a5bea7e186ff682fa919300f32f2dac4e572923
+ size 194680
assets/images/Picture 1.png ADDED

Git LFS Details

  • SHA256: f4206d56c91c000a4a5e38367a15081b1663367a308bf07b085a60fe669a9d9f
  • Pointer size: 131 Bytes
  • Size of remote file: 143 kB
assets/images/Picture 2.png ADDED
assets/images/demo.jpg ADDED

Git LFS Details

  • SHA256: 9cc47ec05387959495bebe9e4c75db39acc669e47ac184b359590e1391a2d1b4
  • Pointer size: 132 Bytes
  • Size of remote file: 1.36 MB
assets/kid.mp4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:349af366331630344b73bb196a4ed7e4fb1e8d84732e37cc9eb942fee2eaa096
+ size 430271
assets/woman.mp4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:008337b6b3f1c853055def928ccade6f5178048e997baf794208053bbe1c1e4b
+ size 187613