sy000 committed
Commit ca42394 · verified · 1 Parent(s): d5ae4d8

Upload folder using huggingface_hub

Files changed (3)
  1. DA3_MOG_Sky_LogL2.ckpt +3 -0
  2. README.md +57 -0
  3. VGGT_MOG_LogL2.ckpt +3 -0
DA3_MOG_Sky_LogL2.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d442e856a6be8832aac72a0f17293ae98af75a5bd8d21adfbe337bd915e41d83
+ size 5423360059
README.md ADDED
@@ -0,0 +1,57 @@
+ ---
+ license: apache-2.0
+ library_name: pytorch
+ tags:
+ - depth-estimation
+ - 3d-reconstruction
+ - multi-view
+ - camera-pose
+ - gaussian-splatting
+ - depth-anything-3
+ - vggt
+ pipeline_tag: depth-estimation
+ ---
+
+ # MDA: Multi-view depth & geometry checkpoints
+
+ These are the official model checkpoints for the paper
+ **"Modeling Depth Ambiguity: A Mixture-Density Representation for Flying-Point-Free Depth Estimation"** (MDA).
+
+ 📄 [arXiv](https://arxiv.org/abs/2606.02552)  |  🌐 [Project page](https://biansy000.github.io/mda-site/)
+
+ MDA is a mixture-density depth representation that predicts several depth
+ hypotheses (with their probabilities) at every pixel instead of forcing a single
+ depth, which largely removes the *flying-point* artifacts at object boundaries
+ that plague feed-forward depth estimators. See the [Citation](#citation) section
+ to cite this work.
+
+ These two checkpoints are used for multi-view geometry prediction:
+ spatially consistent depth and camera pose from a set of input images. They are
+ built on two different backbones and trained with a Mixture-of-Gaussians (MoG)
+ depth head and a `logl2` objective.
+
+ | File | Backbone | Wrapper | `model_choice.py` name | Params |
+ |---|---|---|---|---|
+ | [`DA3_MOG_Sky_LogL2.ckpt`](./DA3_MOG_Sky_LogL2.ckpt) | DA3 Giant | `DA3Wrapper` | `mda_mog_sky_l2` | ~1.36 B |
+ | [`VGGT_MOG_LogL2.ckpt`](./VGGT_MOG_LogL2.ckpt) | VGGT-1B | `VGGTWrapper` | `vggt_mog_l2` | ~1.16 B |
+
+ Both are PyTorch Lightning checkpoints (`save_weights_only=True`, Lightning 2.5.6).
+ State-dict keys are prefixed `net.net.*` because the network is wrapped by a
+ Lightning module; strip the prefix and load the weights into the bare net. These are **research checkpoints** and are **not** loadable
+ through the standard `DepthAnything3.from_pretrained` HF API.
+
+ ## Citation
+
+ If you build on **MDA**, please cite:
+
+ ```bibtex
+ @misc{bian2026modeling,
+   title = {Modeling Depth Ambiguity: A Mixture-Density Representation for Flying-Point-Free Depth Estimation},
+   author = {Siyuan Bian and Congrong Xu and Jun Gao},
+   year = {2026},
+   eprint = {2606.02552},
+   archivePrefix = {arXiv},
+   primaryClass = {cs.CV},
+   url = {https://arxiv.org/abs/2606.02552}
+ }
+ ```
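
The README's loading note (strip the `net.net.*` prefix, then load into the bare net) could be sketched roughly as follows. This is a minimal illustration, not the authors' loading code: the helper names `strip_prefix` and `load_bare_state_dict` are hypothetical, and the decision to drop keys that lack the prefix is an assumption. The only facts taken from the README are the `net.net.` prefix and that the files are Lightning checkpoints, which store module weights under the `"state_dict"` key.

```python
from collections import OrderedDict
from typing import Any, Mapping


def strip_prefix(state_dict: Mapping[str, Any], prefix: str = "net.net.") -> "OrderedDict[str, Any]":
    """Return a copy of `state_dict` keeping only keys that start with
    `prefix`, with that prefix removed. Keys without the prefix are dropped
    (an assumption made for this sketch)."""
    stripped = OrderedDict()
    for key, value in state_dict.items():
        if key.startswith(prefix):
            stripped[key[len(prefix):]] = value
    return stripped


def load_bare_state_dict(ckpt_path: str) -> "OrderedDict[str, Any]":
    """Load a Lightning checkpoint and return weights keyed for the bare net."""
    import torch  # imported lazily so strip_prefix works without PyTorch

    # weights_only=False unpickles arbitrary objects: use it only on trusted files.
    ckpt = torch.load(ckpt_path, map_location="cpu", weights_only=False)
    # Lightning stores the module weights under the "state_dict" key.
    return strip_prefix(ckpt["state_dict"])
```

With a bare network `net` constructed by the project's own code (not shown here), usage would look like `net.load_state_dict(load_bare_state_dict("VGGT_MOG_LogL2.ckpt"))`.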
VGGT_MOG_LogL2.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d82e60abcf9f26dca7ac0056d5f5db893e6e6827f0dd7e242bcff93c079a4177
+ size 4632792671
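
The two `.ckpt` entries in this commit are Git LFS pointers rather than the weights themselves: each records a spec version, a SHA-256 object ID, and a size in bytes. As a rough sketch, a downloaded checkpoint could be checked against its pointer like this. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers written for illustration, and the check assumes the `oid` uses `sha256`, as both pointers above do.

```python
import hashlib
from typing import Dict, Union

Pointer = Dict[str, Union[str, int]]


def parse_lfs_pointer(text: str) -> Pointer:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: Pointer, chunk: int = 1 << 20) -> bool:
    """Stream the file at `path` and compare its byte count and SHA-256
    digest to the pointer (assumes pointer["algo"] == "sha256")."""
    hasher = hashlib.sha256()
    total = 0
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            hasher.update(block)
            total += len(block)
    return total == pointer["size"] and hasher.hexdigest() == pointer["digest"]


# Pointer text copied from the VGGT_MOG_LogL2.ckpt entry in this commit.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:d82e60abcf9f26dca7ac0056d5f5db893e6e6827f0dd7e242bcff93c079a4177
size 4632792671
"""
info = parse_lfs_pointer(POINTER)
```

Streaming in fixed-size chunks keeps memory use flat, which matters for files of this size (roughly 4.6 GB and 5.4 GB).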