LingBot-Depth
LingBot-Depth transforms incomplete and noisy depth sensor data into high-quality, metric-accurate 3D measurements. By jointly aligning RGB appearance and depth geometry in a unified latent space, our model serves as a powerful spatial perception foundation for robot learning and 3D vision applications.
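To make "metric-accurate 3D measurements" concrete: a metric depth map can be back-projected into a 3D point cloud with a standard pinhole camera model. The snippet below is an illustration only; the intrinsics are made up for the example, and LingBot-Depth itself is not required to run it.

```python
# Illustration only: back-project a metric depth map into a 3D point cloud
# using a pinhole camera model. Intrinsics (fx, fy, cx, cy) are hypothetical.
import numpy as np

def depth_to_points(depth: np.ndarray, fx: float, fy: float,
                    cx: float, cy: float) -> np.ndarray:
    """Convert an HxW metric depth map to an HxWx3 array of camera-frame points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)

# Toy 4x4 depth map at a constant 2 m.
depth = np.full((4, 4), 2.0)
points = depth_to_points(depth, fx=10.0, fy=10.0, cx=2.0, cy=2.0)
print(points.shape)  # (4, 4, 3)
```

A refined depth map from the model could be fed through the same projection, given the intrinsics of the sensor that captured it.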
| Model | HuggingFace Repository | Description |
|---|---|---|
| LingBot-Depth | robbyant/lingbot-depth-pretrain-vitl-14 | General-purpose depth refinement |
| LingBot-Depth-DC | robbyant/lingbot-depth-postrain-dc-vitl14 | Optimized for sparse depth completion |
```python
import torch
from mdm.model.v2 import MDMModel

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# For general depth refinement
model = MDMModel.from_pretrained('robbyant/lingbot-depth-pretrain-vitl-14').to(device)

# For sparse depth completion (e.g., SfM inputs)
model = MDMModel.from_pretrained('robbyant/lingbot-depth-postrain-dc-vitl14').to(device)
```
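The card does not document the model's expected input format. As a hedged illustration of the kind of SfM-style sparse input the DC variant targets, the sketch below simulates sparsity by keeping a small random fraction of a dense depth map and zeroing the rest; the `sparsify` helper is hypothetical, not part of the library.

```python
# Sketch only: simulate a sparse SfM-style depth map by randomly keeping
# ~5% of pixels. `sparsify` is a hypothetical helper for this example.
import torch

def sparsify(dense_depth: torch.Tensor, keep_ratio: float = 0.05) -> torch.Tensor:
    """Randomly drop depth values; missing pixels are encoded as zeros."""
    mask = torch.rand_like(dense_depth) < keep_ratio
    return dense_depth * mask

dense = torch.full((1, 1, 480, 640), 3.0)  # synthetic dense depth at 3 m
sparse = sparsify(dense)
print(sparse.shape)  # torch.Size([1, 1, 480, 640])
```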
LingBot-Depth is the general-purpose model, trained on 10M RGB-D samples. LingBot-Depth-DC is a post-trained variant optimized for sparse depth completion.
```bibtex
@article{lingbot-depth2026,
  title={Masked Depth Modeling for Spatial Perception},
  author={Tan, Bin and Sun, Changjiang and Qin, Xiage and Adai, Hanat and Fu, Zelin and Zhou, Tianxiang and Zhang, Han and Xu, Yinghao and Zhu, Xing and Shen, Yujun and Xue, Nan},
  journal={arXiv preprint arXiv:2601.xxxxx},
  year={2026}
}
```
Apache License 2.0