Commit 8198939 by garrying · verified · 1 Parent(s): 78a2573

Upload README.md with huggingface_hub

  1. README.md +71 -0
README.md ADDED
@@ -0,0 +1,71 @@
+ ---
+ license: cc-by-nc-4.0
+ task_categories:
+ - image-segmentation
+ tags:
+ - mirror-detection
+ - video-understanding
+ - video-mirror-detection
+ - scene-understanding
+ - pytorch
+ pretty_name: VMD-Net (Video Mirror Detection Network)
+ ---
+
+ # VMD-Net: Video Mirror Detection Network
+
+ Pre-trained weights for **VMD-Net**, introduced in:
+
+ > **Learning to Detect Mirrors from Videos via Dual Correspondences**
+ > Jiaying Lin\*, Xin Tan\*, Rynson W. H. Lau
+ > CVPR 2023
+ > [Paper](https://openaccess.thecvf.com/content/CVPR2023/papers/Lin_Learning_To_Detect_Mirrors_From_Videos_via_Dual_Correspondences_CVPR_2023_paper.pdf) · [Project Page](https://jiaying.link/cvpr2023-vmd/) · [Dataset (VMD-D)](https://huggingface.co/datasets/garrying/VMD-D)
+
+ ## Model Summary
+
+ VMD-Net detects mirrors in video sequences by exploiting **dual correspondences**, both intra-frame (spatial) and inter-frame (temporal), via a Relation Attention module built on a DeepLabV3 encoder backbone. This design lets the model handle frames where intra-frame mirror cues are weak or absent, producing accurate and temporally consistent segmentation masks.
+
+ | File | Description |
+ |------|-------------|
+ | `best.pth` | Best checkpoint (714 MB), saved as `{'model': state_dict, ...}` |
+ | `results/results.zip` | VMD-Net predictions on the VMD-D test set |
+ | `results/baseline_results.zip` | Baseline method predictions for comparison |
+
+ ## Loading the Weights
+
+ ```python
+ import torch
+ from networks.VMD_network import VMD_Network  # from the code release
+
+ model = VMD_Network()
+ # Load on CPU first; the path assumes the download command below was used.
+ checkpoint = torch.load("./weights/best.pth", map_location="cpu")
+ # The checkpoint is a dict; the weights live under the "model" key.
+ model.load_state_dict(checkpoint["model"])
+ model.eval()  # switch to inference mode
+ ```
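The file table describes `best.pth` as a dict of the form `{'model': state_dict, ...}`. A minimal sketch of a helper that unwraps such a checkpoint before it is passed to `load_state_dict` follows. The helper name `extract_state_dict` is hypothetical, and the handling of a `module.` key prefix (as left by `torch.nn.DataParallel`) is an assumption; the README does not say whether the released weights carry that prefix.

```python
from collections import OrderedDict
from typing import Any, Mapping


def extract_state_dict(checkpoint: Mapping[str, Any], key: str = "model") -> OrderedDict:
    """Return a flat state dict from a wrapped or bare checkpoint (hypothetical helper)."""
    # Wrapped layout from the file table: {'model': state_dict, ...}.
    # Fall back to treating the checkpoint itself as the state dict.
    inner = checkpoint.get(key)
    state = inner if isinstance(inner, Mapping) else checkpoint

    cleaned = OrderedDict()
    for name, value in state.items():
        # Assumption: strip a leading "module." prefix if one is present.
        if name.startswith("module."):
            name = name[len("module."):]
        cleaned[name] = value
    return cleaned
```

With the snippet above, this would be used as `model.load_state_dict(extract_state_dict(checkpoint))`.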
+
+ Download the checkpoint:
+ ```bash
+ huggingface-cli download garrying/VMD-Net best.pth --local-dir ./weights
+ ```
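The same download can probably be done from Python with `hf_hub_download` from the `huggingface_hub` package. The sketch below mirrors the CLI command above; the wrapper name `download_checkpoint` is hypothetical, and it assumes `huggingface_hub` is installed and the repository is reachable.

```python
def download_checkpoint(local_dir: str = "./weights") -> str:
    """Fetch best.pth from garrying/VMD-Net and return the local file path."""
    # Imported lazily so defining this helper does not require the package.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id="garrying/VMD-Net",  # repository named in the CLI command
        filename="best.pth",
        local_dir=local_dir,
    )
```

The returned path can be passed straight to `torch.load`.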
+
+ ## Training Dataset
+
+ This model was trained and evaluated on **VMD-D**, the first large-scale video mirror detection dataset:
+ - 14,987 frames from 269 videos with manually annotated binary masks
+ - Available at [garrying/VMD-D](https://huggingface.co/datasets/garrying/VMD-D)
+
+ ## Citation
+
+ ```bibtex
+ @InProceedings{Lin_2023_CVPR,
+ author = {Lin, Jiaying and Tan, Xin and Lau, Rynson W.H.},
+ title = {Learning To Detect Mirrors From Videos via Dual Correspondences},
+ booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
+ month = {June},
+ year = {2023},
+ pages = {9109-9118}
+ }
+ ```
+
+ ## License
+
+ Non-commercial use only: [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).