leoisufa committed (verified)
Commit f9908c5 · 1 Parent(s): 3175213

Upload folder using huggingface_hub
.gitattributes CHANGED
@@ -1 +1,7 @@
  diffusion_pytorch_model.safetensors filter=lfs diff=lfs merge=lfs -text
+ assets/gifs/output_small.gif filter=lfs diff=lfs merge=lfs -text
+ assets/glasses.mp4 filter=lfs diff=lfs merge=lfs -text
+ assets/images/Picture[[:space:]]1.png filter=lfs diff=lfs merge=lfs -text
+ assets/images/demo.jpg filter=lfs diff=lfs merge=lfs -text
+ assets/kid.mp4 filter=lfs diff=lfs merge=lfs -text
+ assets/woman.mp4 filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,26 +1,69 @@
- # **ICVE: In-Context Learning with Unpaired Clips for Instruction-based Video Editing**
- *Arxiv 2025*
-
- **Xinyao Liao**<sup>1,2</sup>, **Xianfang Zeng**<sup>2</sup>, **Ziye Song**<sup>1</sup>, **Zhoujie Fu**<sup>1,2</sup>, **Gang Yu**<sup>2*</sup>, **Guosheng Lin**<sup>1*</sup>
- <sup>1</sup> Nanyang Technological University <sup>2</sup> StepFun
-
- **Project Leader:** *Xianfang Zeng*
- **Corresponding Authors:** *Gang Yu, Guosheng Lin*
-
- PyTorch implementation of the paper:
- [In-Context Learning with Unpaired Clips for Instruction-based Video Editing](https://arxiv.org/)

  ## 🧩 Overview
- ICVE proposes a low-cost pretraining strategy for instruction-based video editing via in-context learning from unpaired clips. Built upon [HunyuanVideoT2V](https://github.com/Tencent-Hunyuan/HunyuanVideo), it first learns editing concepts from about 1M unpaired videos, then fine-tunes on <150K paired editing data for improved instruction alignment and visual quality — enabling general editing operations guided by natural language.

- ## 🎥 Demo
  <p align="center">
- <a href="https://youtu.be/fmjmOWqQo88">
- <img src="https://img.youtube.com/vi/fmjmOWqQo88/0.jpg"
  alt="ICVE Demo Video"
  width="80%"
- style="max-width:900px;">
  </a>
  </p>

  ## 🛠️ Dependencies and Installation
@@ -54,15 +97,15 @@ python -m pip install ninja
  python -m pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.6.3
  ```

- ## 🧱 Download Pretrained Models
  1. **HunyuanVideo Pretrained Weights**
  Follow the official HunyuanVideo instructions here:
  👉 [Download Pretrained Models](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/ckpts/README.md)
  and place the downloaded weights into the `ckpts/` directory as shown above.
  2. **ICVE Checkpoint**
- Download the our model weights from
- 👉 [Hugging Face](https://huggingface.co/leoisufa/ICVE)
- and place them in the `checkpoint/` directory.

  The folder structure of this project should look like this after setup:
  ```shell
@@ -103,16 +146,19 @@ python sample_video.py \
  --save-path ./results
  ```

  ## 🔗 BibTeX
- If you find [ICEV](https://arxiv.org/) useful for your research and applications, please cite using this BibTeX:
  ```BibTeX
- @article{liao2025icve,
- title = {In-Context Learning with Unpaired Clips for Instruction-based Video Editing},
- author = {Liao, Xinyao and Zeng, Xianfang and Song, Ziye and Fu, Zhoujie and Yu, Gang and Lin, Guosheng},
- journal = {arXiv preprint arXiv:25xx.xxxx},
- year = {2025}
  }
- ```
-
- ## 🙏 Acknowledgements
- This work builds upon the open-source efforts of [HunyuanVideo](https://github.com/Tencent-Hunyuan/HunyuanVideo) and [FastVideo](https://github.com/hao-ai-lab/FastVideo).
 
+ <div align="center">
+ <!-- Project Title -->
+ <h1>
+ ICVE: In-Context Learning with Unpaired Clips for<br>
+ Instruction-based Video Editing
+ </h1>
+ <!-- Project Badges -->
+ <p>
+ <a href="https://arxiv.org/abs/2510.14648">
+ <img src="https://img.shields.io/badge/arXiv-2510.14648-b31b1b.svg" alt="arXiv"/>
+ </a>
+ <a href="https://huggingface.co/leoisufa/ICVE">
+ <img src="https://img.shields.io/badge/HuggingFace-Model-yellow.svg" alt="HuggingFace"/>
+ </a>
+ </p>
+ </div>
+
+ <div align="center">
+ <strong>Xinyao Liao<sup>1,2</sup></strong>,
+ <strong>Xianfang Zeng<sup>2</sup></strong>,
+ <strong>Ziye Song<sup>1</sup></strong>,
+ <strong>Zhoujie Fu<sup>1,2</sup></strong>,
+ <strong>Gang Yu<sup>2*</sup></strong>,
+ <strong>Guosheng Lin<sup>1*</sup></strong>
+ <br><br>
+ <b>
+ <sup>1</sup> Nanyang Technological University
+ <a href="#">
+ <img src="assets/images/Picture 1.png" alt="NTU Logo"
+ style="margin-bottom: -4px; height: 22px;">
+ </a>
+ &nbsp;&nbsp;
+ <sup>2</sup> StepFun
+ <a href="#">
+ <img src="assets/images/Picture 2.png" alt="StepFun Logo"
+ style="margin-bottom: -4px; height: 22px;">
+ </a>
+ </b>
+ </div>
+
+ <div align="center">
+ <img src="assets/gifs/output_small.gif"
+ alt="Demo GIF"
+ width="100%"
+ style="max-width:900px;">
+ </div>
+
+ **Star us if you find this project useful! ⭐**
+
+ ## 🎉 Updates
+ - [10/2025] 🔥 [Model checkpoints](https://huggingface.co/leoisufa/ICVE) are released!
+ - [10/2025] 🔥 [Codebase](https://github.com/leoisufa/ICVE) is released!

  ## 🧩 Overview
+ ICVE proposes a low-cost pretraining strategy for instruction-based video editing via in-context learning from unpaired clips. Built upon [HunyuanVideoT2V](https://github.com/Tencent-Hunyuan/HunyuanVideo), it first learns editing concepts from about **1M** unpaired videos, then fine-tunes on **<150K** paired editing data for improved instruction alignment and visual quality — enabling general editing operations guided by natural language.

+ ## 🎥 Video Demo
  <p align="center">
+ <a href="https://youtu.be/ZPXPMJUJnwU" target="_blank">
+ <img src="https://img.youtube.com/vi/ZPXPMJUJnwU/maxresdefault.jpg"
  alt="ICVE Demo Video"
  width="80%"
+ style="max-width:900px; border-radius:10px; box-shadow:0 0 10px rgba(0,0,0,0.15);">
  </a>
+ <br>
+ <em>Click the image above to watch the full video on YouTube 🎬</em>
  </p>

  ## 🛠️ Dependencies and Installation
 
  python -m pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.6.3
  ```

+ ## 🧱 Download Models
  1. **HunyuanVideo Pretrained Weights**
  Follow the official HunyuanVideo instructions here:
  👉 [Download Pretrained Models](https://github.com/Tencent-Hunyuan/HunyuanVideo/blob/main/ckpts/README.md)
  and place the downloaded weights into the `ckpts/` directory as shown above.
  2. **ICVE Checkpoint**
+ Download our model weights from
+ 👉 [Hugging Face](https://huggingface.co/leoisufa/ICVE)
+ and place them in the `checkpoint/` directory.

  The folder structure of this project should look like this after setup:
  ```shell
  --save-path ./results
  ```

+ ## 🙏 Acknowledgements
+ We thank the following prior art for their excellent open-source work:
+ - [HunyuanVideo](https://github.com/Tencent-Hunyuan/HunyuanVideo)
+ - [FastVideo](https://github.com/hao-ai-lab/FastVideo)
+ - [VACE](https://github.com/ali-vilab/VACE)
+
  ## 🔗 BibTeX
+ If you find [ICVE](https://arxiv.org/abs/2510.14648) useful for your research and applications, please cite using this BibTeX:
  ```BibTeX
+ @article{liao2025icve,
+ title={In-Context Learning with Unpaired Clips for Instruction-based Video Editing},
+ author={Xinyao Liao and Xianfang Zeng and Ziye Song and Zhoujie Fu and Gang Yu and Guosheng Lin},
+ journal={arXiv preprint arXiv:2510.14648},
+ year={2025}
  }
+ ```
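
The "ICVE Checkpoint" step in the README diff above can also be scripted. A minimal sketch, assuming the `huggingface_hub` CLI is installed and the `checkpoint/` layout the README describes:

```shell
# Sketch: fetch the ICVE weights into checkpoint/, as the README instructs.
# Assumes the huggingface_hub CLI is available (pip install -U "huggingface_hub[cli]").
huggingface-cli download leoisufa/ICVE --local-dir checkpoint/
```

The HunyuanVideo base weights still need to be placed under `ckpts/` separately, per the official instructions linked in the README.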
 
 
 
assets/gifs/output_small.gif ADDED

Git LFS Details

  • SHA256: 2520a98120d51f762ea11a32a9bc9bc58e9ccd93f8815b51e1e72d5e175135eb
  • Pointer size: 132 Bytes
  • Size of remote file: 6.63 MB
assets/glasses.mp4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:be4b3b4dce8edcbe03e580e30a5bea7e186ff682fa919300f32f2dac4e572923
+ size 194680
assets/images/Picture 1.png ADDED

Git LFS Details

  • SHA256: f4206d56c91c000a4a5e38367a15081b1663367a308bf07b085a60fe669a9d9f
  • Pointer size: 131 Bytes
  • Size of remote file: 143 kB
assets/images/Picture 2.png ADDED
assets/images/demo.jpg ADDED

Git LFS Details

  • SHA256: 9cc47ec05387959495bebe9e4c75db39acc669e47ac184b359590e1391a2d1b4
  • Pointer size: 132 Bytes
  • Size of remote file: 1.36 MB
assets/kid.mp4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:349af366331630344b73bb196a4ed7e4fb1e8d84732e37cc9eb942fee2eaa096
+ size 430271
assets/woman.mp4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:008337b6b3f1c853055def928ccade6f5178048e997baf794208053bbe1c1e4b
+ size 187613