cherubicxn committed on
Commit 0431caf · verified · 1 Parent(s): f0241b9

Upload folder using huggingface_hub

Files changed (2)
  1. README.md +95 -3
  2. model.pt +3 -0
README.md CHANGED
@@ -1,3 +1,95 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ language:
+ - en
+ tags:
+ - depth-estimation
+ - depth-completion
+ - rgb-d
+ - computer-vision
+ - robotics
+ - 3d-vision
+ - pytorch
+ - vision-transformer
+ datasets:
+ - custom
+ metrics:
+ - rmse
+ - mae
+ library_name: pytorch
+ pipeline_tag: depth-estimation
+ ---
+
+ # LingBot-Depth (Pretrained)
+
+ **LingBot-Depth** transforms incomplete and noisy depth sensor data into high-quality, metric-accurate 3D measurements. This is the **general-purpose pretrained model** for depth refinement tasks.
+
+ ## Model Details
+
+ ### Model Description
+
+ LingBot-Depth employs a masked depth modeling (MDM) approach that treats missing depth measurements from RGB-D sensors not as noise, but as a natural masking signal that highlights geometric ambiguities. The model learns joint representations from RGB appearance context and valid depth observations, enabling robust depth reasoning under incomplete observations.
+
+ - **Developed by:** Bin Tan, Changjiang Sun, Xiage Qin, Hanat Adai, Zelin Fu, Tianxiang Zhou, Han Zhang, Yinghao Xu, Xing Zhu, Yujun Shen, Nan Xue
+ - **Model type:** Vision Transformer for depth completion and refinement
+ - **License:** Apache 2.0
+
+ ### Model Sources
+
+ - **Repository:** https://github.com/robbyant/lingbot-depth
+ - **Paper:** [Masked Depth Modeling for Spatial Perception](https://arxiv.org/abs/2601.xxxxx)
+ - **Project Page:** https://technology.robbyant.com/lingbot-depth
+
+ ### Related Models
+
+ | Model | Repository | Description |
+ |-------|------------|-------------|
+ | LingBot-Depth | [robbyant/lingbot-depth-pretrain-vitl-14](https://huggingface.co/robbyant/lingbot-depth-pretrain-vitl-14) | General-purpose depth refinement (this model) |
+ | LingBot-Depth-DC | [robbyant/lingbot-depth-postrain-dc-vitl14](https://huggingface.co/robbyant/lingbot-depth-postrain-dc-vitl14) | Optimized for sparse depth completion |
+
+ ## Uses
+
+ ### Direct Use
+
+ - **Depth Completion**: Filling missing regions in raw RGB-D sensor depth maps with metric accuracy
+ - **Depth Refinement**: Improving noisy depth measurements from consumer-grade depth cameras
+ - **Point Cloud Generation**: Producing clean 3D point clouds from RGB-D inputs
+
+ ### Downstream Use
+
+ - **Scene Reconstruction**: High-fidelity indoor mapping with strong depth priors
+ - **4D Point Tracking**: Accurate dynamic tracking in metric space for robot learning
+ - **Dexterous Manipulation**: Robust robotic grasping with precise geometric understanding
+ - **Monocular Depth Estimation**: As a pretrained backbone for depth estimation models
+ - **Stereo Matching**: As a depth prior for stereo matching networks (e.g., FoundationStereo)
+
+ ## Technical Specifications
+
+ ### Model Architecture
+
+ - **Encoder:** ViT-Large/14 (24 layers) with separate patch embeddings for RGB and depth
+ - **Decoder:** ConvStack decoder with hierarchical upsampling
+ - **Objective:** Masked depth modeling
+ - **Model size:** ~300M parameters
+
+ ### Software Requirements
+
+ - Python >= 3.9
+ - PyTorch >= 2.0.0
+ - xformers
+
+ ## Citation
+
+ ```bibtex
+ @article{lingbot-depth2026,
+ title={Masked Depth Modeling for Spatial Perception},
+ author={Tan, Bin and Sun, Changjiang and Qin, Xiage and Adai, Hanat and Fu, Zelin and Zhou, Tianxiang and Zhang, Han and Xu, Yinghao and Zhu, Xing and Shen, Yujun and Xue, Nan},
+ journal={arXiv preprint arXiv:2601.xxxxx},
+ year={2026}
+ }
+ ```
+
+ ## Model Card Contact
+
+ - **Email:** tanbin.tan@antgroup.com, xuenan.xue@antgroup.com
+ - **Issues:** https://github.com/robbyant/lingbot-depth/issues
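The model card added in this commit lists depth completion and point-cloud generation as direct uses, and it names RMSE and MAE as metrics. It does not include any code for these tasks. What follows is a minimal stdlib-only sketch of the surrounding bookkeeping, and it rests on several assumptions. Missing sensor depth is assumed to be encoded as 0 or NaN; these holes correspond to the "natural masking signal" the card describes. Depth is assumed to be in metres, and the camera is assumed to follow a pinhole model with intrinsics `fx`, `fy`, `cx`, `cy`. The names `valid_mask`, `masked_rmse_mae`, and `backproject` are hypothetical helpers and do not come from the lingbot-depth repository. The model's own inference API is not shown.

```python
import math
from typing import List, Tuple

# Hypothetical helpers for illustration; not part of the lingbot-depth codebase.
DepthMap = List[List[float]]  # row-major depth map, assumed to be in metres


def valid_mask(depth: DepthMap) -> List[List[bool]]:
    """True where the sensor returned a usable reading.

    Assumes missing depth is encoded as 0 or NaN; these holes play the role
    of the "natural mask" described in the model card.
    """
    return [[math.isfinite(d) and d > 0.0 for d in row] for row in depth]


def masked_rmse_mae(pred: DepthMap, gt: DepthMap,
                    mask: List[List[bool]]) -> Tuple[float, float]:
    """RMSE and MAE over pixels where mask is True (same unit as the depth)."""
    sq_sum = abs_sum = 0.0
    n = 0
    for p_row, g_row, m_row in zip(pred, gt, mask):
        for p, g, m in zip(p_row, g_row, m_row):
            if m:
                err = p - g
                sq_sum += err * err
                abs_sum += abs(err)
                n += 1
    if n == 0:
        raise ValueError("mask selects no pixels")
    return math.sqrt(sq_sum / n), abs_sum / n


def backproject(depth: DepthMap, fx: float, fy: float,
                cx: float, cy: float) -> List[Tuple[float, float, float]]:
    """Lift each valid pixel (u, v) with depth Z to a camera-frame point
    using the pinhole model X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if math.isfinite(z) and z > 0.0:  # skip holes, as in valid_mask
                points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points
```

Consider a ground-truth map of `[[1.0, 0.0], [nan, 2.0]]`. Here `valid_mask` keeps only the two finite, positive pixels, so a prediction is scored on those two pixels alone. Running `backproject` on the same map with `fx = fy = 1` and `cx = cy = 0.5` yields two camera-frame points. In practice, `pred` would be the refined depth produced by the model, but the card does not specify its output format. The helpers use plain lists for readability; a real pipeline would vectorize them over arrays or tensors.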
model.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6ab1da5822e4fea712202616d1f3b683ce4b2f7f82ea58fb3f5ebd7cfae9c0e0
+ size 1284841262
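The `model.pt` entry above is a Git LFS pointer, not the weights themselves. It records three things: the pointer spec version, the SHA-256 object ID of the real file, and that file's size in bytes (1,284,841,262, roughly 1.28 GB). Below is a minimal stdlib sketch for parsing such a pointer and checking a downloaded checkpoint against it. `parse_lfs_pointer`, `LfsPointer`, and `matches_pointer` are hypothetical helpers and are not part of `huggingface_hub` or Git LFS.

```python
import hashlib
import os
from dataclasses import dataclass

LFS_SPEC_V1 = "https://git-lfs.github.com/spec/v1"


@dataclass(frozen=True)
class LfsPointer:
    """Fields of a Git LFS v1 pointer (hypothetical helper type)."""
    oid_alg: str  # hash algorithm named in the oid, e.g. "sha256"
    oid: str      # hex digest of the real file contents
    size: int     # size of the real file in bytes


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse pointer text made of "key value" lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    if fields.get("version") != LFS_SPEC_V1:
        raise ValueError("not a Git LFS v1 pointer")
    if "oid" not in fields or "size" not in fields:
        raise ValueError("pointer is missing 'oid' or 'size'")
    alg, _, digest = fields["oid"].partition(":")
    return LfsPointer(oid_alg=alg, oid=digest, size=int(fields["size"]))


def matches_pointer(path: str, ptr: LfsPointer, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the pointer's size and SHA-256."""
    if ptr.oid_alg != "sha256":
        raise ValueError(f"unsupported oid algorithm: {ptr.oid_alg}")
    # Cheap size check first, so a truncated download fails fast.
    if os.path.getsize(path) != ptr.size:
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in chunks to avoid loading a ~1.28 GB checkpoint into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == ptr.oid
```

The size comparison runs before hashing because it costs almost nothing and catches the most common failure, a partial download. For an intact `model.pt`, passing the three pointer lines shown above through `parse_lfs_pointer` and then calling `matches_pointer` should return True.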