---
license: apache-2.0
base_model:
- Wan-AI/Wan2.1-T2V-1.3B
tags:
- Video-Text-to-Video
- Video-to-Video
- Video
- Video Edit
---
# ReCo
<p align="center">
<img src="assets/title_imgs.png" width="400"/>
</p>
<p align="center">
πŸ–₯️ <a href="https://github.com/HiDream-ai/ReCo">GitHub</a> &nbsp;&nbsp; | &nbsp;&nbsp; 🌐 <a href="https://zhw-zhang.github.io/ReCo-page/"><b>Project Page</b></a> &nbsp;&nbsp; | &nbsp;&nbsp; πŸ€— <a href="https://huggingface.co/datasets/HiDream-ai/ReCo-Data">ReCo-Data</a> &nbsp;&nbsp; | &nbsp;&nbsp; πŸ“ˆ <a href="https://huggingface.co/datasets/HiDream-ai/ReCo-Bench">ReCo-Bench</a> &nbsp;&nbsp; | &nbsp;&nbsp; πŸ€— <a href="https://huggingface.co/HiDream-ai/ReCo">ReCo-Models</a> &nbsp;&nbsp; | &nbsp;&nbsp; πŸ“– <a href="https://arxiv.org/abs/2512.17650">Paper</a>
<br>
[**ReCo: Region-Constraint In-Context Generation for Instructional Video Editing**](https://zhw-zhang.github.io/ReCo-page/) <br>
πŸ”† If you find ReCo useful, please give this repo a ⭐; stars are important to open-source projects. Thanks!
We will gradually release the following resources:
* **ReCo training dataset:** ReCo-Data
* **Evaluation code:** ReCo-Bench
* **Model weights, inference code, and training code**
## Video Demos
<div align="center">
<video controls autoplay loop muted playsinline src="https://github.com/user-attachments/assets/ba530e6f-e13b-4d04-ad60-b95277cc38ce"></video>
<p><em>Examples of different video editing tasks by our ReCo.</em></p>
</div>
## πŸ”₯ Updates
- βœ… **\[2025.12.22\]** Uploaded our arXiv paper.
- βœ… **\[2025.12.23\]** Released ReCo-Data and usage code.
- βœ… **\[2025.12.23\]** Released ReCo-Bench and evaluation code.
- βœ… **\[2026.01.16\]** Released ReCo model weights (Preview) and inference code.
- βœ… **\[2026.01.16\]** Uploaded raw [video object masks](https://huggingface.co/datasets/HiDream-ai/ReCo-Data/tree/main/video_masks) to ReCo-Data.
- ⬜ Release training code.
## πŸ“Š ReCo-Data Preparation
**ReCo-Data** is a large-scale, high-quality video editing dataset consisting of **500K+ instruction–video pairs**, covering four video editing tasks: **object addition (add)**, **object removal (remove)**, **object replacement (replace)**, and **video stylization (style)**.
<p align="center">
<img src="assets/statistic.png" width="800"/>
</p>
### Downloading ReCo-Data
Please download each task of ReCo-Data into the `./ReCo-Data` directory by running:
```bash
bash ./tools/download_dataset.sh
```
Before downloading the full dataset, you may first browse the
**[visualization examples](https://huggingface.co/datasets/HiDream-ai/ReCo-Data/blob/main/examples.tar)**.
These examples are collected by **randomly sampling 50 instances from each task**
(add, remove, replace, and style), **without any manual curation or cherry-picking**,
and are intended to help users quickly inspect and assess the overall data quality.
Note: The examples are formatted for visualization convenience and do not strictly follow the dataset format.
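The uncurated sampling described above can be sketched as follows. This is a minimal illustration, not the actual script used to build the archive, and the instance IDs are hypothetical:

```python
import random

def sample_examples(task_to_items, n=50, seed=0):
    """Randomly draw n instances per task, with no manual curation."""
    rng = random.Random(seed)
    return {task: rng.sample(items, n) for task, items in task_to_items.items()}

# Hypothetical pools of instance IDs for the four ReCo tasks.
pools = {task: [f"{task}_{i:04d}" for i in range(500)]
         for task in ("add", "remove", "replace", "style")}
examples = sample_examples(pools, n=50)
```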
### Directory Structure
After downloading, please ensure that the dataset follows the directory structure below:
<details open>
<summary>ReCo-Data directory structure</summary>

```text
ReCo-Data/
β”œβ”€β”€ add/
β”‚ β”œβ”€β”€ add_data_configs.json
β”‚ β”œβ”€β”€ src_videos/
β”‚ β”‚ β”œβ”€β”€ video1.mp4
β”‚ β”‚ β”œβ”€β”€ video2.mp4
β”‚ β”‚ └── ...
β”‚ └── tar_videos/
β”‚ β”œβ”€β”€ video1.mp4
β”‚ β”œβ”€β”€ video2.mp4
β”‚ └── ...
β”œβ”€β”€ remove/
β”‚ β”œβ”€β”€ remove_data_configs.json
β”‚ β”œβ”€β”€ src_videos/
β”‚ └── tar_videos/
β”œβ”€β”€ replace/
β”‚ β”œβ”€β”€ replace_data_configs.json
β”‚ β”œβ”€β”€ src_videos/
β”‚ └── tar_videos/
└── style/
β”œβ”€β”€ style_data_configs.json
β”œβ”€β”€ src_videos/
β”‚ β”œβ”€β”€ video1.mp4
β”‚ └── ...
└── tar_videos/
β”œβ”€β”€ video1-a_Van_Gogh_style.mp4
└── ...
```
</details>
### Testing and Visualization
After downloading the dataset, you can directly test and visualize samples from **any single task** using the following script
(taking the **replace** task as an example):
```bash
python reco_data_test_single.py \
--json_path ./ReCo-Data/replace/replace_data_configs.json \
--video_folder ./ReCo-Data \
--debug
```
### Mixed Task Loading
You can also load a **mixed dataset** composed of the four tasks (**add**, **remove**, **replace**, and **style**) with arbitrary ratios by running:
```bash
python reco_data_test_mix_data.py \
--json_folder ./ReCo-Data \
--video_folder ./ReCo-Data \
--debug
```
### Notes
* `src_videos/` contains the original source videos.
* `tar_videos/` contains the edited target videos corresponding to each instruction.
* `*_data_configs.json` stores the instruction–video mappings and metadata for each task.
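The notes above imply a straightforward pairing scheme; here is a minimal loading sketch. The config keys (`src_video`, `tar_video`, `instruction`) are an illustrative assumption, so check the actual `*_data_configs.json` fields after downloading:

```python
import json
import tempfile
from pathlib import Path

def load_task_pairs(task_root):
    """Pair each source video with its edited target and instruction."""
    task_root = Path(task_root)
    config_path = next(task_root.glob("*_data_configs.json"))
    entries = json.loads(config_path.read_text())
    return [
        (task_root / "src_videos" / e["src_video"],  # original video
         task_root / "tar_videos" / e["tar_video"],  # edited video
         e["instruction"])                           # editing instruction
        for e in entries
    ]

# Tiny self-contained demo using the assumed schema.
root = Path(tempfile.mkdtemp()) / "add"
(root / "src_videos").mkdir(parents=True)
(root / "tar_videos").mkdir()
(root / "add_data_configs.json").write_text(json.dumps(
    [{"src_video": "video1.mp4", "tar_video": "video1.mp4",
      "instruction": "add a red balloon"}]))
pairs = load_task_pairs(root)
```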
## πŸ“ˆ Evaluation
### VLLM-based Evaluation Benchmark
<details>
<summary>ReCo-Bench details</summary>

Traditional video generation metrics often struggle to accurately assess the fidelity and quality of video editing results. Inspired by recent image editing evaluation protocols, we propose a **VLLM-based evaluation benchmark** to comprehensively and effectively evaluate video editing quality.
We collect **480 video–instruction pairs** as the evaluation set, evenly distributed across four tasks: **object addition**, **object removal**, **object replacement**, and **video stylization** (120 pairs per task). All source videos are collected from the **Pexels** video platform.
For local editing tasks (add, remove, and replace), we utilize **Gemini-2.5-Flash-Thinking** to automatically generate diverse editing instructions conditioned on video content. For video stylization, we randomly select **10 source videos** and apply **12 distinct styles** to each, resulting in **120 stylization evaluation pairs**.
</details>
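The stylization split described above is a simple cross product; a sketch (video and style names are hypothetical):

```python
from itertools import product

src_videos = [f"pexels_{i:02d}.mp4" for i in range(10)]  # 10 source videos
styles = [f"style_{j:02d}" for j in range(12)]           # 12 distinct styles

# 10 videos x 12 styles = 120 stylization pairs, which together with
# 3 x 120 local-editing pairs gives the 480-pair evaluation set.
style_pairs = list(product(src_videos, styles))
```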
---
### Downloading ReCo-Bench
Please download **ReCo-Bench** into the `./ReCo-Bench` directory by running:
```bash
bash ./tools/download_ReCo-Bench.sh
```
---
### Usage
After downloading the benchmark, you can directly start the evaluation using:
```bash
cd tools
bash run_eval_via_gemini.sh
```
<details>
<summary>This script performs the evaluation in two stages:</summary>

#### Step 1: Per-dimension Evaluation with Gemini
In the first stage, **Gemini-2.5-Flash-Thinking** is used as a VLLM evaluator to score each edited video across multiple evaluation dimensions.
Key arguments used in this step include:
* `--edited_video_folder`: Path to the folder containing the edited (target) videos generated by the model.
* `--src_video_folder`: Path to the folder containing the original source videos.
* `--base_txt_folder`: Path to the folder containing task-specific instruction configuration files.
* `--task_name`: Name of the evaluation task, one of `{add, remove, replace, style}`.
This step outputs per-video, per-dimension evaluation results in JSON format.
#### Step 2: Final Score Aggregation
After all four tasks have been fully evaluated, the second stage aggregates the evaluation results and computes the final scores.
* `--json_folder`: Path to the JSON output folder generated in Step 1
(default: `all_results/gemini_results`)
* `--base_txt_folder`: Path to the instruction configuration folder
This step produces the final benchmark scores for each task as well as the overall performance.
</details>
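A minimal sketch of the Step 2 aggregation, assuming one JSON file per task holding a list of `{"video": ..., "scores": {dimension: value}}` records; the real output format of `run_eval_via_gemini.sh` may differ:

```python
import json
import tempfile
from pathlib import Path
from statistics import mean

def aggregate_scores(json_folder):
    """Average per-video, per-dimension scores into one score per task,
    then average the task scores into an overall score."""
    task_scores = {}
    for path in sorted(Path(json_folder).glob("*.json")):
        records = json.loads(path.read_text())
        per_video = [mean(r["scores"].values()) for r in records]
        task_scores[path.stem] = mean(per_video)
    return task_scores, mean(task_scores.values())

# Demo with a single fabricated result file.
folder = Path(tempfile.mkdtemp())
(folder / "replace.json").write_text(json.dumps(
    [{"video": "v1.mp4", "scores": {"fidelity": 4.0, "consistency": 2.0}}]))
per_task, overall = aggregate_scores(folder)
```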
## πŸƒ Inference
### 1. Environment Preparation
Create and activate the specialized Conda environment:
```bash
conda create -n reco python=3.11 -y
conda activate reco
pip install -r requirements.txt
```
### 2. Model Weights Setup
You need to prepare both the base model and our specific checkpoints.
<table>
<thead>
<tr>
<th width="25%" align="center">Model</th>
<th width="25%" align="center">Source</th>
<th align="left">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center"><b>Wan-2.1-VACE-1.3B</b></td>
<td align="center"><a href="https://huggingface.co/Wan-AI/Wan2.1-VACE-1.3B">πŸ€— Hugging Face</a></td>
<td>Base VACE weights. Place in <code>./Wan-AI</code></td>
</tr>
<tr>
<td align="center"><b>ReCo</b></td>
<td align="center"><a href="https://huggingface.co/HiDream-ai/ReCo">πŸ€— Hugging Face</a></td>
<td>Our ReCo Preview checkpoint. Place in <code>all_ckpts/</code>. We will release improved checkpoints progressively.</td>
</tr>
</tbody>
</table>
**Organize the files as follows:**
```text
.
β”œβ”€β”€ Wan-AI/
β”œβ”€β”€ all_ckpts/
β”‚ └── 2026_01_16_v1_release_preview.ckpt
β”œβ”€β”€ assets/
└── inference_reco_single.py
```
### 3. Running Inference
We provide a bash script that automates execution of the different tasks (Replace, Remove, Style, Add, and Propagation). Run the following command:
```bash
bash infer_server_single.sh
```
To run a specific task manually or customize the execution, use the python command directly:
```bash
python inference_reco_single.py \
--task_name replace \
--test_txt_file_name assets/replace_test.txt \
--lora_ckpt all_ckpts/2026_01_16_v1_release_preview.ckpt
```
### 4. Key Arguments Explained
| Argument | Type | Default | Description |
| --- | --- | --- | --- |
| `test_txt_file_name` | `str` | `assets/...` | Path to the `.txt` file containing test prompts/configs. |
| `task_name` | `str` | `replace` | Task type: `remove`, `replace`, `add`, `style`. Use the `_wf` suffix (e.g., `remove_wf`) for **Propagation tasks** given the first frame. |
| `base_video_folder` | `str` | `assets/test_videos` | Directory containing the source videos. |
| `base_wan_folder` | `str` | `./Wan-AI` | Path to the pre-trained Wan-AI model weights. |
| `lora_ckpt` | `str` | `all_ckpts/...` | Path to the specific LoRA checkpoint file. |
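The argument table above can be mirrored by an `argparse` sketch. This is illustrative only; the actual `inference_reco_single.py` may define more options or different defaults, and which `_wf` variants exist beyond `remove_wf` is an assumption:

```python
import argparse

# Illustrative parser mirroring the argument table above.
parser = argparse.ArgumentParser(description="ReCo inference (sketch)")
parser.add_argument("--task_name", type=str, default="replace",
                    choices=["add", "remove", "replace", "style",
                             # "_wf" suffix = Propagation given the first frame
                             "add_wf", "remove_wf", "replace_wf", "style_wf"])
parser.add_argument("--test_txt_file_name", type=str,
                    default="assets/replace_test.txt")
parser.add_argument("--base_video_folder", type=str,
                    default="assets/test_videos")
parser.add_argument("--base_wan_folder", type=str, default="./Wan-AI")
parser.add_argument("--lora_ckpt", type=str,
                    default="all_ckpts/2026_01_16_v1_release_preview.ckpt")

# e.g. a Propagation run with the preview checkpoint
args = parser.parse_args([
    "--task_name", "remove_wf",
    "--lora_ckpt", "all_ckpts/2026_01_16_v1_release_preview.ckpt",
])
```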
## πŸš€ Training
Will be released soon.
## 🌟 Star and Citation
If you find our work helpful for your research, please consider giving this repository a ⭐ star and citing our work.
```
@article{reco,
  title={{ReCo: Region-Constraint In-Context Generation for Instructional Video Editing}},
author={Zhongwei Zhang and Fuchen Long and Wei Li and Zhaofan Qiu and Wu Liu and Ting Yao and Tao Mei},
journal={arXiv preprint arXiv:2512.17650},
year={2025}
}
```
## πŸ’– Acknowledgement
<span id="acknowledgement"></span>
Our code is inspired by several works, including [WAN](https://github.com/Wan-Video/Wan2.1), [ObjectClear](https://github.com/zjx0101/ObjectClear) (a strong object remover), [VACE](https://github.com/ali-vilab/VACE), and [Flux-Kontext-dev](https://github.com/black-forest-labs/flux). Thanks to all the contributors!