---
license: apache-2.0
language:
- en
tags:
- depth-estimation
- depth-completion
- rgb-d
- computer-vision
- robotics
- 3d-vision
- pytorch
- vision-transformer
datasets:
- custom
library_name: pytorch
pipeline_tag: depth-estimation
---

# LingBot-Depth: Masked Depth Modeling for Spatial Perception

**LingBot-Depth** transforms incomplete and noisy depth sensor data into high-quality, metric-accurate 3D measurements. By jointly aligning RGB appearance and depth geometry in a unified latent space, our model serves as a powerful spatial perception foundation for robot learning and 3D vision applications.

## Available Models

| Model | HuggingFace Repository | Description |
|-------|------------------------|-------------|
| **LingBot-Depth** | [robbyant/lingbot-depth-pretrain-vitl-14](https://huggingface.co/robbyant/lingbot-depth-pretrain-vitl-14) | General-purpose depth refinement |
| **LingBot-Depth-DC** | [robbyant/lingbot-depth-postrain-dc-vitl14](https://huggingface.co/robbyant/lingbot-depth-postrain-dc-vitl14) | Optimized for sparse depth completion |

## Quick Start

```python
import torch
from mdm.model.v2 import MDMModel

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# For general depth refinement
model = MDMModel.from_pretrained('robbyant/lingbot-depth-pretrain-vitl-14').to(device)

# For sparse depth completion (e.g., SfM inputs)
model = MDMModel.from_pretrained('robbyant/lingbot-depth-postrain-dc-vitl14').to(device)
```

## Model Overview

### LingBot-Depth (Pretrained)

The general-purpose model, trained on 10M RGB-D samples, supports:
- Depth completion from RGB-D sensor inputs
- Depth refinement for noisy measurements
- Point cloud generation

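Point cloud generation from a completed depth map is standard pinhole unprojection. A minimal sketch (the intrinsics `fx`, `fy`, `cx`, `cy` below are placeholder values for illustration, not tied to any LingBot-Depth API):

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Unproject a metric depth map (H, W) into an (N, 3) point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop invalid (zero-depth) pixels

# Toy example: a 2x2 depth map with one missing (zero) pixel
depth = np.array([[1.0, 2.0], [0.0, 1.5]])
pts = depth_to_point_cloud(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
print(pts.shape)  # (3, 3): three valid pixels, xyz each
```

Because the model preserves metric scale, the resulting points are in real-world units (meters) and can feed directly into downstream robotics pipelines.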
### LingBot-Depth-DC (Depth Completion)

A post-trained variant optimized for sparse depth completion:
- Recovering dense depth from SfM/SLAM sparse points
- Handling extremely sparse inputs (<5% valid pixels)
- RGB-guided depth densification

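To get a feel for how sparse "<5% valid pixels" is, here is a small sketch that simulates an SfM-style sparse input by masking a dense depth map (illustrative only, using the common convention that 0 marks a missing measurement; not part of the LingBot-Depth API):

```python
import numpy as np

rng = np.random.default_rng(0)
h, w = 480, 640
dense_depth = rng.uniform(0.5, 5.0, size=(h, w))  # stand-in for ground truth

# Keep roughly 2% of pixels and zero out the rest (0 = missing)
valid_ratio = 0.02
mask = rng.random((h, w)) < valid_ratio
sparse_depth = np.where(mask, dense_depth, 0.0)

print(f"valid pixels: {mask.mean():.2%}")  # roughly 2%
```

An input like `sparse_depth`, paired with the RGB frame, is the regime this variant is tuned for.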
## Key Features

- **Masked Depth Modeling**: Self-supervised pre-training via depth reconstruction
- **Cross-Modal Attention**: Joint RGB-Depth alignment in unified latent space
- **Metric-Scale Preservation**: Maintains real-world measurements for downstream tasks

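The masked-depth-modeling idea can be sketched as MAE-style patch masking applied to the depth channel: hide most patches and train the model to reconstruct them. A conceptual illustration with made-up sizes (not the actual training code; the 75% mask ratio is an assumption borrowed from MAE):

```python
import numpy as np

rng = np.random.default_rng(0)
patch, h, w = 14, 224, 224           # 14x14 patches, matching a ViT/14 grid
depth = rng.uniform(0.5, 5.0, (h, w))

# Split the depth map into non-overlapping 14x14 patches
gh, gw = h // patch, w // patch      # 16x16 grid of patches
patches = depth.reshape(gh, patch, gw, patch).swapaxes(1, 2).reshape(gh * gw, -1)

# Mask 75% of patches; the encoder sees only the rest, and the
# model is trained to reconstruct the hidden depth values
num_mask = int(0.75 * len(patches))
masked_ids = rng.permutation(len(patches))[:num_mask]
visible = np.delete(patches, masked_ids, axis=0)

print(patches.shape, visible.shape)  # (256, 196) (64, 196)
```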
## Architecture

- **Encoder:** ViT-Large/14 (24 layers) with separated patch embeddings for RGB and depth
- **Decoder:** ConvStack decoder with hierarchical upsampling
- **Model size:** ~300M parameters

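The ~300M figure is consistent with standard ViT-Large hyper-parameters (hidden size 1024, 24 layers, MLP ratio 4). A back-of-the-envelope check for the encoder blocks alone, ignoring patch embeddings, norms, biases, and the decoder:

```python
d, layers, mlp_ratio = 1024, 24, 4

# Per transformer block: QKV + output projection (~4*d^2)
# plus the two MLP matrices (~2*mlp_ratio*d^2)
per_block = 4 * d * d + 2 * mlp_ratio * d * d
total = layers * per_block
print(f"{total / 1e6:.0f}M")  # 302M
```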
## Links

- **GitHub:** https://github.com/robbyant/lingbot-depth
- **Paper:** [Masked Depth Modeling for Spatial Perception](https://arxiv.org/abs/2601.xxxxx)
- **Project Page:** https://technology.robbyant.com/lingbot-depth

## Citation

```bibtex
@article{lingbot-depth2026,
  title={Masked Depth Modeling for Spatial Perception},
  author={Tan, Bin and Sun, Changjiang and Qin, Xiage and Adai, Hanat and Fu, Zelin and Zhou, Tianxiang and Zhang, Han and Xu, Yinghao and Zhu, Xing and Shen, Yujun and Xue, Nan},
  journal={arXiv preprint arXiv:2601.xxxxx},
  year={2026}
}
```

## License

Apache License 2.0

## Contact

- **Email:** tanbin.tan@antgroup.com, xuenan.xue@antgroup.com
- **Issues:** https://github.com/robbyant/lingbot-depth/issues