---
license: cc-by-nc-4.0
task_categories:
- image-segmentation
tags:
- mirror-detection
- video-understanding
- video-mirror-detection
- scene-understanding
- pytorch
pretty_name: VMD-Net (Video Mirror Detection Network)
---

# VMD-Net: Video Mirror Detection Network

Pre-trained weights for **VMD-Net**, introduced in:

> **Learning to Detect Mirrors from Videos via Dual Correspondences**
> Jiaying Lin\*, Xin Tan\*, Rynson W. H. Lau
> CVPR 2023
> [Paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Lin_Learning_To_Detect_Mirrors_From_Videos_via_Dual_Correspondences_CVPR_2023_paper.pdf) · [Project Page](https://jiaying.link/cvpr2023-vmd/) · [Dataset (VMD-D)](https://huggingface.co/datasets/garrying/VMD-D)

## Model Summary

VMD-Net detects mirrors in video sequences by exploiting **dual correspondences**, both intra-frame (spatial) and inter-frame (temporal), via a Relation Attention module built on a DeepLabV3 encoder backbone. Because it also draws on neighboring frames, the model can handle frames where intra-frame mirror cues are weak or absent, and its segmentation masks stay consistent over time.

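To make "intra-frame" versus "inter-frame" correspondence concrete, here is a toy sketch of attention over per-position feature vectors in plain Python. It is an illustration of the general idea only, not the paper's Relation Attention module: the real module uses learned projections and its own fusion scheme, and the names `attend`, `frame_t`, `frame_next` and the additive fusion are assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(queries, keys_values):
    """For each query vector, return a softmax-weighted average of keys_values."""
    scale = math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys_values])
        out.append([
            sum(w * kv[d] for w, kv in zip(weights, keys_values))
            for d in range(len(q))
        ])
    return out


# Toy features: 3 spatial positions x 2 channels per frame (hypothetical values).
frame_t = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
frame_next = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]

intra = attend(frame_t, frame_t)      # spatial: positions within the same frame
inter = attend(frame_t, frame_next)   # temporal: positions in a neighboring frame
# Additive fusion is an assumption of this sketch, not the paper's design.
fused = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(intra, inter)]
print(len(fused), len(fused[0]))  # 3 2
```

When a frame carries little evidence of a mirror on its own, the inter-frame branch can still pull in features from a neighboring frame, which is the intuition behind using both correspondences.
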
The repository contains the following files:

| File | Description |
|------|-------------|
| `best.pth` | Best checkpoint (714 MB), saved as `{'model': state_dict, ...}` |
| `results/results.zip` | VMD-Net predictions on the VMD-D test set |
| `results/baseline_results.zip` | Baseline method predictions for comparison |

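The prediction archives can be compared against ground-truth masks with standard segmentation metrics. Below is a minimal sketch of two common ones, intersection over union (IoU) and mean absolute error (MAE), on toy masks stored as nested lists. Whether these match the paper's exact evaluation protocol is an assumption, and the helper names `iou` and `mae` are made up for this example; real use would first load the predicted and ground-truth masks from disk.

```python
def iou(pred, gt):
    """Intersection over union of two binary masks (nested lists of 0/1)."""
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Two empty masks agree perfectly; define IoU as 1.0 in that case.
    return inter / union if union else 1.0


def mae(pred, gt):
    """Mean absolute error between a prediction in [0, 1] and a binary mask."""
    total = 0.0
    count = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            total += abs(p - g)
            count += 1
    if count == 0:
        raise ValueError("masks must contain at least one pixel")
    return total / count


# Toy 2x3 masks: 1 marks a mirror pixel.
pred = [[1, 1, 0],
        [0, 1, 0]]
gt = [[1, 0, 0],
      [0, 1, 1]]

print(iou(pred, gt))  # 0.5 (2 overlapping pixels out of 4 in the union)
print(mae(pred, gt))  # 2 mismatched pixels out of 6
```
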
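## Loading the Weights

```python
import torch
from networks.VMD_network import VMD_Network  # from the code release

model = VMD_Network()
# Path assumes the download command below was run with --local-dir ./weights.
checkpoint = torch.load("weights/best.pth", map_location="cpu")
model.load_state_dict(checkpoint["model"])  # weights are stored under the "model" key
model.eval()  # switch layers such as dropout and batch norm to inference behavior
```
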
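If `load_state_dict` reports unexpected keys that all start with `module.`, the checkpoint was probably saved from a model wrapped in `torch.nn.DataParallel`. Whether `best.pth` has this prefix is not stated here, so treat the following as a fallback sketch; the helper name `strip_module_prefix` and the toy key names are hypothetical.

```python
def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading prefix removed from each key."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


# Toy stand-in for checkpoint["model"]; real values would be tensors.
toy = {"module.encoder.weight": 1, "module.encoder.bias": 2, "head.weight": 3}
cleaned = strip_module_prefix(toy)
print(sorted(cleaned))  # ['encoder.bias', 'encoder.weight', 'head.weight']
```

With the real checkpoint this would be used as `model.load_state_dict(strip_module_prefix(checkpoint["model"]))`.
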
Download the checkpoint:

```bash
huggingface-cli download garrying/VMD-Net best.pth --local-dir ./weights
```

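The same download can be done from Python with the `huggingface_hub` library. This is a small sketch that assumes `huggingface_hub` is installed; the wrapper name `download_checkpoint` is made up for this example, while the repository ID and filename are the ones used in the command above.

```python
def download_checkpoint(local_dir="./weights"):
    """Download best.pth from the garrying/VMD-Net repo and return its local path."""
    # Imported lazily so this file can be imported without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id="garrying/VMD-Net",
        filename="best.pth",
        local_dir=local_dir,
    )


# Usage (requires network access):
#   path = download_checkpoint()
#   checkpoint = torch.load(path, map_location="cpu")
```
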
## Training Dataset

This model was trained and evaluated on **VMD-D**, the first large-scale video mirror detection dataset:

- 14,987 frames from 269 videos with manually annotated binary masks
- Available at [garrying/VMD-D](https://huggingface.co/datasets/garrying/VMD-D)

## Citation

```bibtex
@InProceedings{Lin_2023_CVPR,
    author    = {Lin, Jiaying and Tan, Xin and Lau, Rynson W.H.},
    title     = {Learning To Detect Mirrors From Videos via Dual Correspondences},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {9109-9118}
}
```

## License

Non-commercial use only: [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).