+---
+license: cc-by-nc-4.0
+task_categories:
+  - image-segmentation
+tags:
+  - mirror-detection
+  - video-understanding
+  - video-mirror-detection
+  - scene-understanding
+  - pytorch
+pretty_name: VMD-Net (Video Mirror Detection Network)
+---
+# VMD-Net — Video Mirror Detection Network
+Pre-trained weights for **VMD-Net**, introduced in:
+> **Learning to Detect Mirrors from Videos via Dual Correspondences**
+> Jiaying Lin\*, Xin Tan\*, Rynson W. H. Lau
+> CVPR 2023
+> [Paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Lin_Learning_To_Detect_Mirrors_From_Videos_via_Dual_Correspondences_CVPR_2023_paper.pdf) · [Project Page](https://jiaying.link/cvpr2023-vmd/) · [Dataset (VMD-D)](https://huggingface.co/datasets/garrying/VMD-D)
+## Model Summary
+VMD-Net detects mirrors in video sequences by exploiting **dual correspondences**: intra-frame (spatial) correspondences within a single frame and inter-frame (temporal) correspondences across frames. Both are modeled by a Relation Attention module built on a DeepLabV3 encoder backbone. Because the temporal cues complement the spatial ones, the model can handle frames where intra-frame mirror cues are weak or absent, and it produces accurate, temporally consistent segmentation masks.
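
To make the two kinds of correspondence concrete, here is a minimal NumPy sketch of generic scaled dot-product attention applied once within a frame and once across two frames. It is not the authors' Relation Attention module, whose internals are not described here. The function name `attend`, the feature shapes, and the random features are all assumptions chosen for illustration.

```python
import numpy as np

def attend(query_feats, key_feats):
    """Generic scaled dot-product attention (illustration only).

    query_feats: (N, C) features at N positions of the current frame.
    key_feats:   (M, C) features at M positions of the frame attended to.
    Returns the (N, C) aggregated features and the (N, M) attention weights.
    """
    scale = np.sqrt(query_feats.shape[1])
    logits = query_feats @ key_feats.T / scale
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)  # each row sums to 1
    return weights @ key_feats, weights

rng = np.random.default_rng(0)
frame_t = rng.standard_normal((6, 4))     # hypothetical: 6 positions, 4 channels
frame_next = rng.standard_normal((6, 4))  # hypothetical neighbouring frame

# Intra-frame (spatial): positions attend to other positions of the same frame.
intra_feats, intra_weights = attend(frame_t, frame_t)
# Inter-frame (temporal): positions attend to positions of another frame.
inter_feats, inter_weights = attend(frame_t, frame_next)
```

When the current frame alone gives little evidence of a mirror, the inter-frame weights still let each position draw on features from a neighbouring frame, which is the intuition behind combining the two.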
+| File | Description |
+|------|-------------|
+| `best.pth` | Best checkpoint (714 MB), saved as `{'model': state_dict, ...}` |
+| `results/results.zip` | VMD-Net predictions on the VMD-D test set |
+| `results/baseline_results.zip` | Baseline method predictions for comparison |
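
The prediction archives can be scored against the ground-truth masks with standard segmentation metrics. The sketch below computes intersection over union (IoU) and mean absolute error (MAE) on arrays that are already loaded. The layout of the zip files is not described here, so file loading is left out. The 0.5 binarization threshold and the choice of these two metrics are assumptions and may not match the paper's evaluation protocol.

```python
import numpy as np

def iou(pred_mask, gt_mask):
    """Intersection over union of two boolean masks of equal shape."""
    pred_mask = np.asarray(pred_mask, dtype=bool)
    gt_mask = np.asarray(gt_mask, dtype=bool)
    union = np.logical_or(pred_mask, gt_mask).sum()
    if union == 0:  # both masks empty: treat as perfect agreement
        return 1.0
    return float(np.logical_and(pred_mask, gt_mask).sum() / union)

def mae(pred_prob, gt_mask):
    """Mean absolute error between a prediction in [0, 1] and a binary mask."""
    pred_prob = np.asarray(pred_prob, dtype=np.float64)
    gt = np.asarray(gt_mask, dtype=np.float64)
    return float(np.abs(pred_prob - gt).mean())

# Toy 2x3 example with made-up values (not taken from the released results).
gt = np.array([[1, 1, 0],
               [0, 0, 0]])
pred = np.array([[0.9, 0.2, 0.6],
                 [0.1, 0.0, 0.0]])

toy_iou = iou(pred >= 0.5, gt)  # assumed threshold of 0.5
toy_mae = mae(pred, gt)
print(toy_iou, toy_mae)
```

In the toy example, the thresholded prediction overlaps the ground truth in one pixel out of a three-pixel union, so the IoU is 1/3.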
+## Loading the Weights
+```python
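+import torch
+from networks.VMD_network import VMD_Network   # from the code release
+model = VMD_Network()
+# The path matches the --local-dir used by the download command (./weights).
+checkpoint = torch.load("weights/best.pth", map_location="cpu")
+model.load_state_dict(checkpoint["model"])   # weights are stored under the "model" key
+model.eval()   # switch to inference mode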
+```
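
If `load_state_dict` reports unexpected keys, one common cause is a checkpoint saved from a model wrapped in `torch.nn.DataParallel`, which prefixes every parameter name with `module.`. Whether `best.pth` carries that prefix is not stated here, so the helper below is a defensive sketch only. The name `extract_state_dict` and the toy parameter names are hypothetical.

```python
def extract_state_dict(checkpoint, key="model", prefix="module."):
    """Return the weights stored under `key`, without a leading `prefix`."""
    state_dict = checkpoint[key]
    return {
        (name[len(prefix):] if name.startswith(prefix) else name): tensor
        for name, tensor in state_dict.items()
    }

# Toy checkpoint with made-up parameter names; plain numbers stand in for tensors.
toy_checkpoint = {"model": {"module.encoder.weight": 1, "head.bias": 2}, "epoch": 3}
cleaned = extract_state_dict(toy_checkpoint)
print(cleaned)
```

With the real checkpoint, the call would be `model.load_state_dict(extract_state_dict(checkpoint))`. Names without the prefix pass through unchanged, so the helper is harmless when the prefix is absent.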
+Download the checkpoint:
+```bash
+huggingface-cli download garrying/VMD-Net best.pth --local-dir ./weights
+```
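
The same download can be scripted from Python with `hf_hub_download` from the `huggingface_hub` package, assuming that package is installed. The wrapper name `download_checkpoint` is hypothetical. The repository ID, filename, and target directory are taken from the command above.

```python
REPO_ID = "garrying/VMD-Net"
FILENAME = "best.pth"

def download_checkpoint(local_dir="./weights"):
    """Download best.pth from the Hub and return its local path."""
    # Imported lazily so that defining this helper does not require the package.
    from huggingface_hub import hf_hub_download
    return hf_hub_download(repo_id=REPO_ID, filename=FILENAME, local_dir=local_dir)
```

Calling `download_checkpoint()` needs network access and returns the path of the downloaded file, which can be passed directly to `torch.load`.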
+## Training Dataset
+This model was trained and evaluated on **VMD-D**, the first large-scale video mirror detection dataset:
+- 14,987 frames from 269 videos with manually annotated binary masks
+- Available at [garrying/VMD-D](https://huggingface.co/datasets/garrying/VMD-D)
+## Citation
+```bibtex
+@InProceedings{Lin_2023_CVPR,
+  author    = {Lin, Jiaying and Tan, Xin and Lau, Rynson W.H.},
+  title     = {Learning To Detect Mirrors From Videos via Dual Correspondences},
+  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
+  month     = {June},
+  year      = {2023},
+  pages     = {9109-9118}
+}
+```
+## License
+Non-commercial use only — [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).