---
license: apache-2.0
base_model:
- Wan-AI/Wan2.1-T2V-1.3B
tags:
- Video-Text-to-Video
- Video-to-Video
- Video
- Video Edit
---

# ReCo

[**ReCo: Region-Constraint In-Context Generation for Instructional Video Editing**](https://zhw-zhang.github.io/ReCo-page/)
GitHub | Project Page | ReCo-Data | ReCo-Bench | ReCo-Models | Paper
*Examples of different video editing tasks by our ReCo.*
### Downloading ReCo-Data
Please download each task of ReCo-Data into the `./ReCo-Data` directory by running:
```bash
bash ./tools/download_dataset.sh
```
Before downloading the full dataset, you may first browse the
**[visualization examples](https://huggingface.co/datasets/HiDream-ai/ReCo-Data/blob/main/examples.tar)**.
These examples are collected by **randomly sampling 50 instances from each task**
(add, remove, replace, and style), **without any manual curation or cherry-picking**,
and are intended to help users quickly inspect and assess the overall data quality.
Note: The examples are formatted for visualization convenience and do not strictly follow the dataset format.
### Directory Structure
After downloading, please ensure that the dataset follows the directory structure below:
```text
ReCo-Data/
├── add/
│   ├── add_data_configs.json
│   ├── src_videos/
│   │   ├── video1.mp4
│   │   ├── video2.mp4
│   │   └── ...
│   └── tar_videos/
│       ├── video1.mp4
│       ├── video2.mp4
│       └── ...
├── remove/
│   ├── remove_data_configs.json
│   ├── src_videos/
│   └── tar_videos/
├── replace/
│   ├── replace_data_configs.json
│   ├── src_videos/
│   └── tar_videos/
└── style/
    ├── style_data_configs.json
    ├── src_videos/
    │   ├── video1.mp4
    │   └── ...
    └── tar_videos/
        ├── video1-a_Van_Gogh_style.mp4
        └── ...
```
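After downloading, a quick sanity check of the layout can be scripted. This is only a sketch, and it checks just the per-task entries shown in the tree above (the config JSON and the two video folders), not the individual video files:

```python
from pathlib import Path

TASKS = ("add", "remove", "replace", "style")

def check_reco_data(root="ReCo-Data"):
    """Return the expected paths (per the tree above) that are missing."""
    base = Path(root)
    missing = []
    for task in TASKS:
        for rel in (f"{task}_data_configs.json", "src_videos", "tar_videos"):
            path = base / task / rel
            if not path.exists():
                missing.append(str(path))
    return missing

if __name__ == "__main__":
    problems = check_reco_data()
    print("OK" if not problems else "\n".join(problems))
```

An empty return value means the four task directories are laid out as expected.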
### ReCo-Bench Details
Traditional video generation metrics often struggle to accurately assess the fidelity and quality of video editing results. Inspired by recent image editing evaluation protocols, we propose a **VLLM-based evaluation benchmark** to comprehensively and effectively evaluate video editing quality.
We collect **480 video-instruction pairs** as the evaluation set, evenly distributed across four tasks: **object addition**, **object removal**, **object replacement**, and **video stylization** (120 pairs per task). All source videos are collected from the **Pexels** video platform.
For local editing tasks (add, remove, and replace), we utilize **Gemini-2.5-Flash-Thinking** to automatically generate diverse editing instructions conditioned on video content. For video stylization, we randomly select **10 source videos** and apply **12 distinct styles** to each, resulting in **120 stylization evaluation pairs**.
The evaluation script performs the evaluation in two stages:
#### Step 1: Per-dimension Evaluation with Gemini
In the first stage, **Gemini-2.5-Flash-Thinking** is used as a VLLM evaluator to score each edited video across multiple evaluation dimensions.
Key arguments used in this step include:
* `--edited_video_folder`: Path to the folder containing the edited (target) videos generated by the model.
* `--src_video_folder`: Path to the folder containing the original source videos.
* `--base_txt_folder`: Path to the folder containing task-specific instruction configuration files.
* `--task_name`: Name of the evaluation task, one of `{add, remove, replace, style}`.
This step outputs per-video, per-dimension evaluation results in JSON format.
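A Step-1 invocation might look like the following. The script name `eval_gemini.py` and the folder paths are placeholders (substitute the repository's actual evaluation script and your own paths); only the flags themselves are the documented arguments:

```shell
# Hypothetical script name and illustrative paths; the flags are the
# documented Step-1 arguments.
python eval_gemini.py \
    --edited_video_folder outputs/add_results \
    --src_video_folder ReCo-Bench/src_videos \
    --base_txt_folder ReCo-Bench/instructions \
    --task_name add
```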
#### Step 2: Final Score Aggregation
After all four tasks have been fully evaluated, the second stage aggregates the evaluation results and computes the final scores.
* `--json_folder`: Path to the JSON output folder generated in Step 1
(default: `all_results/gemini_results`)
* `--base_txt_folder`: Path to the instruction configuration folder
This step produces the final benchmark scores for each task as well as the overall performance.
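Assuming Step 1 writes one `{dimension: score}` JSON per video into per-task subfolders of `all_results/gemini_results` (adapt the layout and parsing to the actual Step-1 output), the aggregation can be sketched as:

```python
import json
from pathlib import Path
from statistics import mean

def aggregate_scores(json_folder="all_results/gemini_results"):
    """Average dimensions per video, then videos per task, then tasks overall.

    Assumed layout: <json_folder>/<task>/<video>.json, each file a
    {dimension_name: score} mapping.
    """
    per_task = {}
    for path in Path(json_folder).glob("*/*.json"):
        scores = json.loads(path.read_text())
        # One scalar per video: the mean over its evaluation dimensions.
        per_task.setdefault(path.parent.name, []).append(mean(scores.values()))
    task_scores = {task: mean(v) for task, v in sorted(per_task.items())}
    overall = mean(task_scores.values()) if task_scores else float("nan")
    return task_scores, overall
```

The design mirrors the two-stage protocol: per-video dimension scores are first collapsed to a per-task score, and the overall score is the unweighted mean across the four tasks.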
**Organize the files as follows:**
```text
.
├── Wan-AI/
├── all_ckpts/
│   └── 2026_01_16_v1_release_preview.ckpt
├── assets/
└── inference_reco_single.py
```
### 3. Running Inference
We provide a bash script that automates execution of the different tasks (Replace, Remove, Style, Add, and Propagation). Run the following command:
```bash
bash infer_server_single.sh
```
To run a specific task manually or customize the execution, use the python command directly:
```bash
python inference_reco_single.py \
--task_name replace \
--test_txt_file_name assets/replace_test.txt \
--lora_ckpt all_ckpts/2026_01_16_v1_release_preview.ckpt
```
### 4. Key Arguments Explained
| Argument | Type | Default | Description |
| --- | --- | --- | --- |
| `test_txt_file_name` | `str` | `assets/...` | Path to the `.txt` file containing test prompts/configs. |
| `task_name` | `str` | `replace` | Task type: `remove`, `replace`, `add`, `style`. Use the `_wf` suffix (e.g., `remove_wf`) for **Propagation tasks** given the first frame. |
| `base_video_folder` | `str` | `assets/test_videos` | Directory containing the source videos. |
| `base_wan_folder` | `str` | `./Wan-AI` | Path to the pre-trained Wan-AI model weights. |
| `lora_ckpt` | `str` | `all_ckpts/...` | Path to the specific LoRA checkpoint file. |
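As an illustration, a first-frame Propagation run for removal combines the `_wf` task suffix with the same flags shown above; the `.txt` file name here is illustrative, so point it at your own test file:

```shell
# Propagation given the first frame: append _wf to the task name.
# assets/remove_test.txt is an illustrative path, not a documented default.
python inference_reco_single.py \
    --task_name remove_wf \
    --test_txt_file_name assets/remove_test.txt \
    --base_video_folder assets/test_videos \
    --lora_ckpt all_ckpts/2026_01_16_v1_release_preview.ckpt
```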
## Training
Will be released soon.
## Star and Citation
If you find our work helpful for your research, please consider giving this repository a star ⭐ and citing our work.
```bibtex
@article{reco,
  title={{Region-Constraint In-Context Generation for Instructional Video Editing}},
  author={Zhongwei Zhang and Fuchen Long and Wei Li and Zhaofan Qiu and Wu Liu and Ting Yao and Tao Mei},
  journal={arXiv preprint arXiv:2512.17650},
  year={2025}
}
```
## Acknowledgement
Our code is inspired by several works, including [WAN](https://github.com/Wan-Video/Wan2.1), [ObjectClear](https://github.com/zjx0101/ObjectClear) (a strong object remover), [VACE](https://github.com/ali-vilab/VACE), and [Flux-Kontext-dev](https://github.com/black-forest-labs/flux). Thanks to all the contributors!
| Model | Source | Description |
| --- | --- | --- |
| Wan-2.1-VACE-1.3B | Hugging Face | Base VACE weights. Place in `./Wan-AI`. |
| ReCo | Hugging Face | Our ReCo Preview checkpoint. Place in `all_ckpts/`. We will update better ckpts progressively afterward. |