---
license: apache-2.0
base_model:
- Wan-AI/Wan2.1-T2V-1.3B
tags:
- Video-Text-to-Video
- Video-to-Video
- Video
- Video Edit
---

# ReCo

🖥️ GitHub &nbsp;|&nbsp; 🌐 Project Page &nbsp;|&nbsp; 🤗 ReCo-Data &nbsp;|&nbsp; 📈 ReCo-Bench &nbsp;|&nbsp; 🤗 ReCo-Models &nbsp;|&nbsp; 📖 Paper
[**ReCo: Region-Constraint In-Context Generation for Instructional Video Editing**](https://zhw-zhang.github.io/ReCo-page/)

🔆 If you find ReCo useful, please give this repo a ⭐; stars matter a lot for open-source projects. Thanks!

We will gradually release the following resources:

* **ReCo training dataset:** ReCo-Data
* **Evaluation code:** ReCo-Bench
* **Model weights, inference code, and training code**

## Video Demos

Examples of different video editing tasks performed by ReCo.

## 🔥 Updates

- ✅ **\[2025.12.22\]** Uploaded our arXiv paper.
- ✅ **\[2025.12.23\]** Released ReCo-Data and usage code.
- ✅ **\[2025.12.23\]** Released ReCo-Bench and evaluation code.
- ✅ **\[2026.01.16\]** Released ReCo model weights (preview) and inference code.
- ✅ **\[2026.01.16\]** Uploaded raw [video object masks](https://huggingface.co/datasets/HiDream-ai/ReCo-Data/tree/main/video_masks) to ReCo-Data.
- ⬜ Release training code.

## 📊 ReCo-Data Preparation

**ReCo-Data** is a large-scale, high-quality video editing dataset of **500K+ instruction–video pairs**, covering four video editing tasks: **object addition (add)**, **object removal (remove)**, **object replacement (replace)**, and **video stylization (style)**.

### Downloading ReCo-Data

Download each task of ReCo-Data into the `./ReCo-Data` directory by running:

```bash
bash ./tools/download_dataset.sh
```

Before downloading the full dataset, you may first browse the **[visualization examples](https://huggingface.co/datasets/HiDream-ai/ReCo-Data/blob/main/examples.tar)**. These examples were collected by **randomly sampling 50 instances from each task** (add, remove, replace, and style), **without any manual curation or cherry-picking**, and are intended to help users quickly inspect and assess the overall data quality.

Note: the examples are formatted for visualization convenience and do not strictly follow the dataset format.

### Directory Structure

After downloading, please ensure that the dataset follows the directory structure below:

```text
ReCo-Data/
├── add/
│   ├── add_data_configs.json
│   ├── src_videos/
│   │   ├── video1.mp4
│   │   ├── video2.mp4
│   │   └── ...
│   └── tar_videos/
│       ├── video1.mp4
│       ├── video2.mp4
│       └── ...
├── remove/
│   ├── remove_data_configs.json
│   ├── src_videos/
│   └── tar_videos/
├── replace/
│   ├── replace_data_configs.json
│   ├── src_videos/
│   └── tar_videos/
└── style/
    ├── style_data_configs.json
    ├── src_videos/
    │   ├── video1.mp4
    │   └── ...
    └── tar_videos/
        ├── video1-a_Van_Gogh_style.mp4
        └── ...
```
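Under this layout, each `*_data_configs.json` pairs an editing instruction with a source/target video. As a minimal sketch of how one entry might be resolved to file paths (the field names `video_name`, `target_name`, and `instruction` are assumptions for illustration; check the actual config schema):

```python
import os

def resolve_pair(root, task, entry):
    """Resolve one (hypothetical) config entry to its source/target video paths."""
    src = os.path.join(root, task, "src_videos", entry["video_name"])
    tar = os.path.join(root, task, "tar_videos", entry["target_name"])
    return src, tar

# Hypothetical entry, for illustration only.
entry = {
    "video_name": "video1.mp4",
    "target_name": "video1-a_Van_Gogh_style.mp4",
    "instruction": "Repaint the whole video in a Van Gogh style.",
}
src, tar = resolve_pair("ReCo-Data", "style", entry)
print(src)  # e.g. ReCo-Data/style/src_videos/video1.mp4
```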
### Testing and Visualization

After downloading the dataset, you can directly test and visualize samples from **any single task** using the following script (taking the **replace** task as an example):

```bash
python reco_data_test_single.py \
    --json_path ./ReCo-Data/replace/replace_data_configs.json \
    --video_folder ./ReCo-Data \
    --debug
```

### Mixed Task Loading

You can also load a **mixed dataset** composed of the four tasks (**add**, **remove**, **replace**, and **style**) with arbitrary ratios by running:

```bash
python reco_data_test_mix_data.py \
    --json_folder ./ReCo-Data \
    --video_folder ./ReCo-Data \
    --debug
```

### Notes

* `src_videos/` contains the original source videos.
* `tar_videos/` contains the edited target videos corresponding to each instruction.
* `*_data_configs.json` stores the instruction–video mappings and metadata for each task.

## 📈 Evaluation

### VLLM-based Evaluation Benchmark
Traditional video generation metrics often struggle to accurately assess the fidelity and quality of video editing results. Inspired by recent image editing evaluation protocols, we propose a **VLLM-based evaluation benchmark** to comprehensively and effectively evaluate video editing quality.

We collect **480 video–instruction pairs** as the evaluation set, evenly distributed across four tasks: **object addition**, **object removal**, **object replacement**, and **video stylization** (120 pairs per task). All source videos are collected from the **Pexels** video platform. For the local editing tasks (add, remove, and replace), we utilize **Gemini-2.5-Flash-Thinking** to automatically generate diverse editing instructions conditioned on the video content. For video stylization, we randomly select **10 source videos** and apply **12 distinct styles** to each, resulting in **120 stylization evaluation pairs**.
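The composition above is internally consistent: four tasks at 120 pairs each give 480 pairs, and 10 stylization source videos times 12 styles give exactly the 120 style pairs. As a quick check:

```python
tasks = ["add", "remove", "replace", "style"]
pairs_per_task = 120
total_pairs = pairs_per_task * len(tasks)  # 4 tasks x 120 pairs
style_pairs = 10 * 12                      # 10 source videos x 12 styles
print(total_pairs, style_pairs)  # 480 120
```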
---

### Downloading ReCo-Bench

Download **ReCo-Bench** into the `./ReCo-Bench` directory by running:

```bash
bash ./tools/download_ReCo-Bench.sh
```

---

### Usage

After downloading the benchmark, you can directly start the evaluation using:

```bash
cd tools
bash run_eval_via_gemini.sh
```
This script performs the evaluation in two stages:

#### Step 1: Per-dimension Evaluation with Gemini

In the first stage, **Gemini-2.5-Flash-Thinking** is used as a VLLM evaluator to score each edited video across multiple evaluation dimensions. Key arguments for this step:

* `--edited_video_folder`: Path to the folder containing the edited (target) videos generated by the model.
* `--src_video_folder`: Path to the folder containing the original source videos.
* `--base_txt_folder`: Path to the folder containing task-specific instruction configuration files.
* `--task_name`: Name of the evaluation task, one of `{add, remove, replace, style}`.

This step outputs per-video, per-dimension evaluation results in JSON format.

#### Step 2: Final Score Aggregation

After all four tasks have been fully evaluated, the second stage aggregates the evaluation results and computes the final scores.

* `--json_folder`: Path to the JSON output folder generated in Step 1 (default: `all_results/gemini_results`).
* `--base_txt_folder`: Path to the instruction configuration folder.

This step produces the final benchmark scores for each task as well as the overall performance.
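Conceptually, Step 2 averages the per-dimension scores across all per-video JSON files. Here is a minimal sketch of that aggregation (the flat `{dimension: score}` file schema is an assumption; the actual Step 1 output may carry extra metadata):

```python
import json
import statistics
import tempfile
from pathlib import Path

def aggregate_scores(json_folder):
    """Average every evaluation dimension across all per-video JSON files."""
    per_dim = {}
    for path in Path(json_folder).glob("*.json"):
        for dim, score in json.loads(path.read_text()).items():
            per_dim.setdefault(dim, []).append(float(score))
    return {dim: statistics.mean(vals) for dim, vals in per_dim.items()}

# Tiny demo with two fake per-video result files.
tmp = Path(tempfile.mkdtemp())
(tmp / "video1.json").write_text(json.dumps({"fidelity": 4.0, "quality": 3.0}))
(tmp / "video2.json").write_text(json.dumps({"fidelity": 2.0, "quality": 5.0}))
print(aggregate_scores(tmp))  # fidelity -> 3.0, quality -> 4.0
```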
## πŸƒ Inference ### 1. Environment Preparation Create and activate the specialized Conda environment: ```bash conda create -n reco python=3.11 -y conda activate reco pip install -r requirements.txt ``` ### 2. Model Weights Setup You need to prepare both the base model and our specific checkpoints.
| Model | Source | Description |
| --- | --- | --- |
| Wan-2.1-VACE-1.3B | 🤗 Hugging Face | Base VACE weights. Place in `./Wan-AI`. |
| ReCo | 🤗 Hugging Face | Our ReCo preview checkpoint. Place in `all_ckpts/`. We will release improved checkpoints progressively. |
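After downloading, you can sanity-check that everything is in place with a short script (a sketch; the paths mirror the directory layout shown in the next step):

```python
from pathlib import Path

# Expected files/directories, matching the layout in "Organize the files as follows".
EXPECTED = [
    "Wan-AI",
    "all_ckpts/2026_01_16_v1_release_preview.ckpt",
    "assets",
    "inference_reco_single.py",
]

def missing_files(root="."):
    """Return the expected paths that are not present under root."""
    return [p for p in EXPECTED if not (Path(root) / p).exists()]

if __name__ == "__main__":
    missing = missing_files()
    print("All weights in place!" if not missing else f"Missing: {missing}")
```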
**Organize the files as follows:**

```text
.
├── Wan-AI/
├── all_ckpts/
│   └── 2026_01_16_v1_release_preview.ckpt
├── assets/
└── inference_reco_single.py
```

### 3. Running Inference

We provide a bash script that automates the execution of the different tasks (Replace, Remove, Style, Add, and Propagation). Run:

```bash
bash infer_server_single.sh
```

To run a specific task manually or customize the execution, use the Python command directly:

```bash
python inference_reco_single.py \
    --task_name replace \
    --test_txt_file_name assets/replace_test.txt \
    --lora_ckpt all_ckpts/2026_01_16_v1_release_preview.ckpt
```

### 4. Key Arguments Explained

| Argument | Type | Default | Description |
| --- | --- | --- | --- |
| `test_txt_file_name` | `str` | `assets/...` | Path to the `.txt` file containing test prompts/configs. |
| `task_name` | `str` | `replace` | Task type: `remove`, `replace`, `add`, `style`. Use the `_wf` suffix (e.g., `remove_wf`) for **propagation tasks** given the first frame. |
| `base_video_folder` | `str` | `assets/test_videos` | Directory containing the source videos. |
| `base_wan_folder` | `str` | `./Wan-AI` | Path to the pre-trained Wan-AI model weights. |
| `lora_ckpt` | `str` | `all_ckpts/...` | Path to the specific LoRA checkpoint file. |

## 🚀 Training

Will be released soon.

## 🌟 Star and Citation

If you find our work helpful for your research, please consider giving this repository a ⭐ and citing our work.
```
@article{reco,
  title={{Region-Constraint In-Context Generation for Instructional Video Editing}},
  author={Zhongwei Zhang and Fuchen Long and Wei Li and Zhaofan Qiu and Wu Liu and Ting Yao and Tao Mei},
  journal={arXiv preprint arXiv:2512.17650},
  year={2025}
}
```

## 💖 Acknowledgement

Our code is inspired by several works, including [WAN](https://github.com/Wan-Video/Wan2.1), [ObjectClear](https://github.com/zjx0101/ObjectClear) (a strong object remover), [VACE](https://github.com/ali-vilab/VACE), and [Flux-Kontext-dev](https://github.com/black-forest-labs/flux). Thanks to all the contributors!