# E2FGVI (CVPR 2022)

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/towards-an-end-to-end-framework-for-flow/video-inpainting-on-davis)](https://paperswithcode.com/sota/video-inpainting-on-davis?p=towards-an-end-to-end-framework-for-flow) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/towards-an-end-to-end-framework-for-flow/video-inpainting-on-youtube-vos)](https://paperswithcode.com/sota/video-inpainting-on-youtube-vos?p=towards-an-end-to-end-framework-for-flow) ![Python 3.7](https://img.shields.io/badge/python-3.7-green.svg?style=plastic) ![pytorch 1.5.1](https://img.shields.io/badge/pytorch-1.5.1-green.svg?style=plastic)

English | [简体中文](README_zh-CN.md)

This repository contains the official implementation of the following paper:

> **Towards An End-to-End Framework for Flow-Guided Video Inpainting**
> Zhen Li#, Cheng-Ze Lu#, Jianhua Qin, Chun-Le Guo*, Ming-Ming Cheng
> IEEE/CVF Conference on Computer Vision and Pattern Recognition (**CVPR**), 2022
[[Paper](https://arxiv.org/abs/2204.02663)] [[Demo Video (YouTube)](https://www.youtube.com/watch?v=N--qC3T2wc4)] [[Demo Video (Bilibili)](https://www.bilibili.com/video/BV1Ta411n7eH?spm_id_from=333.999.0.0)] [[MindSpore Implementation](https://github.com/Dragoniss/minspore-phase2-E2FGVI)] [Project Page (TBD)] [Poster (TBD)]

You can try our Colab demo here: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/12rwY2gtG8jVWlNx9pjmmM8uGmh5ue18G?usp=sharing)

## :star: News

- *2022.05.15:* We release E2FGVI-HQ, which can handle videos of **arbitrary resolution**. Although trained only on 432x240 videos, this model generalizes well to much higher resolutions and also performs **better** than our original model on both PSNR and SSIM.

  :link: Download links: [[Google Drive](https://drive.google.com/file/d/10wGdKSUOie0XmCr8SQ2A2FeDe-mfn5w3/view?usp=sharing)] [[Baidu Disk](https://pan.baidu.com/s/1jfm1oFU1eIy-IRfuHP8YXw?pwd=ssb3)]

  :movie_camera: Demo video: [[YouTube](https://www.youtube.com/watch?v=N--qC3T2wc4)] [[Bilibili](https://www.bilibili.com/video/BV1Ta411n7eH?spm_id_from=333.999.0.0)]

- *2022.04.06:* Our code is publicly available.

## Demo

![teaser](./figs/teaser.gif)

### More examples:
- Coco
- Tennis
- Space
- Motocross
## Overview

![overall_structure](./figs/framework.png)

### :rocket: Highlights:

- **SOTA performance**: The proposed E2FGVI achieves significant improvements on all quantitative metrics in comparison with SOTA methods.
- **High efficiency**: Our method processes 432 × 240 videos at 0.12 seconds per frame on a Titan Xp GPU, which is nearly 15× faster than previous flow-based methods. It also has the lowest FLOPs among all compared SOTA methods.

## Work in Progress

- [ ] Update website page
- [ ] Hugging Face demo
- [ ] Efficient inference

## Dependencies and Installation

1. Clone Repo

   ```bash
   git clone https://github.com/MCG-NKU/E2FGVI.git
   ```

2. Create Conda Environment and Install Dependencies

   ```bash
   conda env create -f environment.yml
   conda activate e2fgvi
   ```

   - Python >= 3.7
   - PyTorch >= 1.5
   - CUDA >= 9.2
   - [mmcv-full](https://github.com/open-mmlab/mmcv#installation) (follow the linked instructions to install)

   If the `environment.yml` file does not work for you, please follow [this issue](https://github.com/MCG-NKU/E2FGVI/issues/3) to solve the problem.

## Get Started

### Prepare pretrained models

Before performing the following steps, please download our pretrained models first.
| Model | :link: Download Links | Support Arbitrary Resolution? | PSNR / SSIM / VFID (DAVIS) |
| :--- | :---: | :---: | :---: |
| E2FGVI | [Google Drive] [Baidu Disk] | :x: | 33.01 / 0.9721 / 0.116 |
| E2FGVI-HQ | [Google Drive] [Baidu Disk] | :o: | 33.06 / 0.9722 / 0.117 |
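After downloading and unzipping (next step), a quick shell check confirms the weights landed where `test.py` and `evaluate.py` expect them; the file names below are the ones listed in the `release_model` layout that follows:

```shell
# Sanity-check that the pretrained weights are in place (run from the repo root).
# File names are taken from the release_model layout described in this README.
for f in release_model/E2FGVI-CVPR22.pth \
         release_model/E2FGVI-HQ-CVPR22.pth \
         release_model/i3d_rgb_imagenet.pt; do
    if [ -f "$f" ]; then
        echo "found:   $f"
    else
        echo "missing: $f"
    fi
done
```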
Then, unzip the files and place the models in the `release_model` directory. The directory structure will be arranged as:

```
release_model
   |- E2FGVI-CVPR22.pth
   |- E2FGVI-HQ-CVPR22.pth
   |- i3d_rgb_imagenet.pt (for evaluating VFID metric)
   |- README.md
```

### Quick test

We provide two examples in the [`examples`](./examples) directory. Run the following commands to try them out:

```shell
# The first example (using split video frames)
python test.py --model e2fgvi (or e2fgvi_hq) --video examples/tennis --mask examples/tennis_mask --ckpt release_model/E2FGVI-CVPR22.pth (or release_model/E2FGVI-HQ-CVPR22.pth)

# The second example (using mp4 format video)
python test.py --model e2fgvi (or e2fgvi_hq) --video examples/schoolgirls.mp4 --mask examples/schoolgirls_mask --ckpt release_model/E2FGVI-CVPR22.pth (or release_model/E2FGVI-HQ-CVPR22.pth)
```

The inpainted video will be saved in the `results` directory. Please prepare your own **mp4 video** (or **split frames**) and **frame-wise masks** if you want to test more cases.

*Note:* E2FGVI always rescales the input video to a fixed resolution (432x240), while E2FGVI-HQ does not change the resolution of the input video. If you want to customize the output resolution, please use the `--set_size` flag and set the values of `--width` and `--height`.

Example:

```shell
# Use this command to output a 720p video
python test.py --model e2fgvi_hq --video <video_path> --mask <mask_path> --ckpt release_model/E2FGVI-HQ-CVPR22.pth --set_size --width 1280 --height 720
```

### Prepare dataset for training and evaluation
| Dataset | YouTube-VOS | DAVIS |
| :--- | :---: | :---: |
| Details | For training (3,471) and evaluation (508) | For evaluation (50 in 90) |
| Images | [Official Link] (Download train and test all frames) | [Official Link] (2017, 480p, TrainVal) |
| Masks | [Google Drive] [Baidu Disk] (For reproducing paper results) | |
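If you want to test your own videos (see Quick test above), the frame-wise masks use the same per-frame PNG naming as the `test_masks` folders (`00000.png`, `00001.png`, ...). Below is a minimal, hypothetical sketch for writing such masks with NumPy and Pillow; the output directory, frame count, and rectangular hole are placeholders, and you should verify against the provided example masks that white marks the region to inpaint:

```python
# Hypothetical mask generator: writes one binary PNG per frame using the
# 00000.png, 00001.png, ... naming used by the test_masks folders.
# Assumption: white (255) marks the region to be inpainted, the rest is black.
import os

import numpy as np
from PIL import Image


def write_masks(out_dir, num_frames, size=(432, 240), box=(100, 60, 200, 140)):
    """size is (width, height); box is (left, top, right, bottom) of the hole."""
    os.makedirs(out_dir, exist_ok=True)
    width, height = size
    left, top, right, bottom = box
    for i in range(num_frames):
        mask = np.zeros((height, width), dtype=np.uint8)
        mask[top:bottom, left:right] = 255  # white rectangle = hole to fill
        Image.fromarray(mask).save(os.path.join(out_dir, f"{i:05d}.png"))


# "examples/my_video_mask" is a placeholder path for your own test case.
write_masks("examples/my_video_mask", num_frames=10)
```

The resulting directory can then be passed to `test.py` via the `--mask` flag, alongside your own video.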
The training and test split files are provided in `datasets/<dataset_name>`. For each dataset, you should place `JPEGImages` into `datasets/<dataset_name>`. Then, run `sh datasets/zip_dir.sh` (**Note**: please edit the folder path accordingly) to compress each video in `datasets/<dataset_name>/JPEGImages`.

Unzip the downloaded mask files into `datasets`.

The `datasets` directory structure will be arranged as: (**Note**: please check it carefully)

```
datasets
   |- davis
      |- JPEGImages
         |- <video_name>.zip
         |- <video_name>.zip
      |- test_masks
         |- <video_name>
            |- 00000.png
            |- 00001.png
      |- train.json
      |- test.json
   |- youtube-vos
      |- JPEGImages
         |- <video_name>.zip
         |- <video_name>.zip
      |- test_masks
         |- <video_name>
            |- 00000.png
            |- 00001.png
      |- train.json
      |- test.json
   |- zip_dir.sh
```

### Evaluation

Run one of the following commands for evaluation:

```shell
# For evaluating E2FGVI model
python evaluate.py --model e2fgvi --dataset <dataset_name> --data_root datasets/ --ckpt release_model/E2FGVI-CVPR22.pth

# For evaluating E2FGVI-HQ model
python evaluate.py --model e2fgvi_hq --dataset <dataset_name> --data_root datasets/ --ckpt release_model/E2FGVI-HQ-CVPR22.pth
```

If you evaluate E2FGVI, you will get the scores reported in the paper. The scores of E2FGVI-HQ can be found in [Prepare pretrained models](https://github.com/MCG-NKU/E2FGVI#prepare-pretrained-models). The scores will also be saved in the `results/<model_name>_<dataset_name>` directory.

Please add `--save_results` for further [evaluating temporal warping error](https://github.com/phoenix104104/fast_blind_video_consistency#evaluation).

### Training

Our training configurations are provided in [`train_e2fgvi.json`](./configs/train_e2fgvi.json) (for E2FGVI) and [`train_e2fgvi_hq.json`](./configs/train_e2fgvi_hq.json) (for E2FGVI-HQ). Run one of the following commands for training:

```shell
# For training E2FGVI
python train.py -c configs/train_e2fgvi.json

# For training E2FGVI-HQ
python train.py -c configs/train_e2fgvi_hq.json
```

You can run the same command to resume your training.
The training loss can be monitored by running:

```shell
tensorboard --logdir release_model
```

You can follow [this pipeline](https://github.com/MCG-NKU/E2FGVI#evaluation) to evaluate your model.

## Results

### Quantitative results

![quantitative_results](./figs/quantitative_results.png)

## Citation

If you find our repo useful for your research, please consider citing our paper:

```bibtex
@inproceedings{liCvpr22vInpainting,
  title     = {Towards An End-to-End Framework for Flow-Guided Video Inpainting},
  author    = {Li, Zhen and Lu, Cheng-Ze and Qin, Jianhua and Guo, Chun-Le and Cheng, Ming-Ming},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2022}
}
```

## Contact

If you have any questions, please feel free to contact us via `zhenli1031ATgmail.com` or `czlu919AToutlook.com`.

## License

Licensed under a [Creative Commons Attribution-NonCommercial 4.0 International](https://creativecommons.org/licenses/by-nc/4.0/) license for non-commercial use only. Any commercial use should get formal permission first.

## Acknowledgement

This repository is maintained by [Zhen Li](https://paper99.github.io) and [Cheng-Ze Lu](https://github.com/LGYoung). This code is based on [STTN](https://github.com/researchmm/STTN), [FuseFormer](https://github.com/ruiliu-ai/FuseFormer), [Focal-Transformer](https://github.com/microsoft/Focal-Transformer), and [MMEditing](https://github.com/open-mmlab/mmediting).