---
license: cc-by-nc-4.0
pipeline_tag: video-to-video
library_name: diffusers
---

<div align="center">
  <h1>EffectErase: Joint Video Object Removal<br />and Insertion for High-Quality Effect Erasing</h1>
  <p><strong>CVPR 2026</strong></p>
  <p>
    <a href="https://www.yangfu.site/" target="_blank" rel="noreferrer">Yang Fu</a>
    &nbsp;·&nbsp;
    <a href="https://henghuiding.com/group/" target="_blank" rel="noreferrer">Yike Zheng</a>
    &nbsp;·&nbsp;
    <a href="https://github.com/oliviadzy" target="_blank" rel="noreferrer">Ziyun Dai</a>
    &nbsp;·&nbsp;
    <a href="https://henghuiding.com/" target="_blank" rel="noreferrer">Henghui Ding</a><span>†</span>
  </p>
  <p>
    Institute of Big Data, College of Computer Science and Artificial Intelligence, Fudan University, China
    <br />
    <span>† Corresponding author</span>
  </p>
  <p>
    <a href="https://henghuiding.com/EffectErase/" target="_blank" rel="noreferrer"><img src="https://img.shields.io/badge/🐳-Project%20Page-blue" alt="Project Page" /></a>
    <a href="https://cvpr.thecvf.com/virtual/2026/papers.html" target="_blank" rel="noreferrer"><img src="https://img.shields.io/badge/Paper-CVPR%202026-green" alt="Paper" /></a>
    <a href="https://github.com/FudanCVL/EffectErase" target="_blank" rel="noreferrer"><img src="https://img.shields.io/badge/GitHub-FudanCVL%2FEffectErase-181717?logo=github" alt="GitHub" /></a>
    <a href="http://arxiv.org/" target="_blank" rel="noreferrer"><img src="https://img.shields.io/badge/arXiv-EffectErase-red" alt="arXiv" /></a>
    <a href="https://huggingface.co/datasets/FudanCVL/EffectErase" target="_blank" rel="noreferrer"><img src="https://img.shields.io/badge/%F0%9F%A4%97%20Dataset-Hugging%20Face-yellow" alt="Dataset" /></a>
  </p>
</div>

This repository provides the checkpoint `EffectErase.ckpt` for **EffectErase**.

<img src="assets/teaser.gif" alt="teaser" />

## Abstract

Video object removal aims to eliminate dynamic target objects and their visual effects, such as deformation, shadows, and reflections, while restoring seamless backgrounds. Recent diffusion-based video inpainting and object removal methods can remove the objects themselves but often struggle to erase these effects and to synthesize coherent backgrounds. Beyond method limitations, progress is further hampered by the lack of a comprehensive dataset that systematically captures common object effects across varied environments for training and evaluation. To address this, we introduce **VOR** (**V**ideo **O**bject **R**emoval), a large-scale dataset of diverse paired videos, each pair consisting of one video in which the target object is present with its effects and a counterpart in which the object and its effects are absent, together with corresponding object masks. VOR contains 60k high-quality video pairs from captured and synthetic sources, covers five effect types, and spans a wide range of object categories as well as complex, dynamic multi-object scenes. Building on VOR, we propose ***EffectErase***, an effect-aware video object removal method that treats video object insertion as the inverse auxiliary task within a reciprocal learning scheme. The model includes task-aware region guidance, which focuses learning on affected areas and enables flexible task switching, and an insertion–removal consistency objective, which encourages complementary behaviors and shared localization of effect regions and structural cues. Trained on VOR, EffectErase achieves superior performance in extensive experiments, delivering high-quality video object effect erasing across diverse scenarios.

## Quick Start

1. Setup repository and environment

   ```bash
   git clone git@github.com:FudanCVL/EffectErase.git
   cd EffectErase
   pip install -e .
   ```

2. Download weights

   ```bash
   hf download alibaba-pai/Wan2.1-Fun-1.3B-InP --local-dir Wan-AI/Wan2.1-Fun-1.3B-InP
   hf download FudanCVL/EffectErase EffectErase.ckpt --local-dir ./
   ```

3. Run the script

   ```bash
   bash script/test_remove.sh
   ```

   You can edit `script/test_remove.sh` and change these three paths to use your own data:

   - `--fg_bg_path`
   - `--mask_path`
   - `--output_path`

   `--mask_path` points to a mask video generated by SAM 2.1 (`sam2.1_hiera_b+`) and aligned with the input video given by `--fg_bg_path`.
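
   As an illustration, the invocation inside `script/test_remove.sh` could be adapted along these lines. This is a hypothetical sketch: the entry-point name `infer.py` and the file paths are placeholders, and only the three flags come from the script above.

   ```bash
   # Hypothetical sketch: replace infer.py and the paths with the actual
   # entry point and your own files; the three flags are from the script.
   python infer.py \
       --fg_bg_path ./data/my_video.mp4 \
       --mask_path ./data/my_video_mask.mp4 \
       --output_path ./outputs/my_video_removed.mp4
   ```

   The mask video should match the input video's resolution and frame count, since the script expects the two to be aligned.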

## BibTeX

If you find this work useful, please consider citing:

```bibtex
@inproceedings{fu2026effecterase,
  title={EffectErase: Joint Video Object Removal and Insertion for High-Quality Effect Erasing},
  author={Fu, Yang and Zheng, Yike and Dai, Ziyun and Ding, Henghui},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026}
}
```

## Contact

If you have any questions, please feel free to reach out at aleeyanger@gmail.com.