LingBot-Depth: Masked Depth Modeling for Spatial Perception

LingBot-Depth transforms incomplete and noisy depth sensor data into high-quality, metric-accurate 3D measurements. By jointly aligning RGB appearance and depth geometry in a unified latent space, our model serves as a powerful spatial perception foundation for robot learning and 3D vision applications.

Available Models

Model             HuggingFace Repository                      Description
LingBot-Depth     robbyant/lingbot-depth-pretrain-vitl-14     General-purpose depth refinement
LingBot-Depth-DC  robbyant/lingbot-depth-postrain-dc-vitl14   Optimized for sparse depth completion

Quick Start

import torch
from mdm.model.v2 import MDMModel

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# For general depth refinement
model = MDMModel.from_pretrained('robbyant/lingbot-depth-pretrain-vitl-14').to(device)

# For sparse depth completion (e.g., SfM inputs)
model = MDMModel.from_pretrained('robbyant/lingbot-depth-postrain-dc-vitl14').to(device)
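
The card does not document the preprocessing or forward signature, so the following is only a sketch of plausible input tensors; the shapes, the zeros-mark-invalid convention, and the commented `model(rgb, depth)` call are all assumptions to be checked against the repository:

```python
import torch

# Hypothetical input preparation (shapes and conventions are assumptions).
rgb = torch.rand(1, 3, 224, 224)          # RGB image, values in [0, 1]
depth = torch.rand(1, 1, 224, 224) * 5.0  # metric depth in meters
depth[depth < 0.5] = 0.0                  # simulate missing measurements; zeros = invalid
valid_mask = depth > 0                    # where the sensor actually returned depth

# pred = model(rgb, depth)  # hypothetical call; consult the repo for the real API
```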

Model Overview

LingBot-Depth (Pretrained)

The general-purpose model, trained on 10M RGB-D samples, supports:

  • Depth completion from RGB-D sensor inputs
  • Depth refinement for noisy measurements
  • Point cloud generation
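
Point cloud generation from a refined metric depth map needs only the camera intrinsics. A minimal back-projection sketch using the standard pinhole model (independent of LingBot-Depth's own API):

```python
import numpy as np

def depth_to_pointcloud(depth, fx, fy, cx, cy):
    """Back-project a metric depth map (H, W) into an (N, 3) point cloud,
    skipping invalid (zero-depth) pixels."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0
    x = (u - cx) * depth / fx   # pinhole back-projection per pixel
    y = (v - cy) * depth / fy
    return np.stack([x[valid], y[valid], depth[valid]], axis=-1)

depth = np.full((4, 4), 2.0)  # toy 4x4 depth map: a flat wall 2 m away
pts = depth_to_pointcloud(depth, fx=2.0, fy=2.0, cx=2.0, cy=2.0)
```

Because the depth is metric, the resulting points are in real-world meters, which is what the model's metric-scale preservation is meant to guarantee for downstream use.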

LingBot-Depth-DC (Depth Completion)

Post-trained variant optimized for sparse depth completion:

  • Recovering dense depth from SfM/SLAM sparse points
  • Handling extremely sparse inputs (<5% valid pixels)
  • RGB-guided depth densification
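
To make the "<5% valid pixels" regime concrete, this sketch simulates an SfM-style sparse depth map by keeping ~2% of ground-truth pixels; zeros marking missing depth is a common convention but an assumption here:

```python
import numpy as np

rng = np.random.default_rng(0)
h, w = 480, 640
dense = rng.uniform(0.5, 10.0, size=(h, w))        # ground-truth metric depth
n_sparse = int(0.02 * h * w)                       # keep ~2% of pixels (< 5% regime)
idx = rng.choice(h * w, size=n_sparse, replace=False)
sparse = np.zeros(h * w)
sparse[idx] = dense.ravel()[idx]                   # scatter the surviving SfM points
sparse = sparse.reshape(h, w)                      # zeros mark missing depth
valid_ratio = (sparse > 0).mean()
```

A map like `sparse`, paired with the RGB image, is the kind of input the DC variant is post-trained to densify.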

Key Features

  • Masked Depth Modeling: Self-supervised pre-training via depth reconstruction
  • Cross-Modal Attention: Joint RGB-Depth alignment in unified latent space
  • Metric-Scale Preservation: Maintains real-world measurements for downstream tasks
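
Masked depth modeling hides patches of the depth input and trains the model to reconstruct them. A minimal sketch of the masking step, assuming 14x14 patches and a 60% mask ratio (both illustrative values, not confirmed hyperparameters):

```python
import numpy as np

rng = np.random.default_rng(0)
patch, h, w = 14, 224, 224
depth = rng.uniform(0.5, 10.0, size=(h, w))
gh, gw = h // patch, w // patch                 # 16 x 16 grid of patches
n_patches = gh * gw
n_masked = int(0.6 * n_patches)                 # illustrative 60% mask ratio
mask = np.zeros(n_patches, dtype=bool)
mask[rng.choice(n_patches, size=n_masked, replace=False)] = True
# Expand the per-patch mask to per-pixel resolution.
pixel_mask = np.kron(mask.reshape(gh, gw), np.ones((patch, patch), dtype=bool))
corrupted = np.where(pixel_mask, 0.0, depth)    # model input: depth with patches hidden
target = depth[pixel_mask]                      # reconstruction target
```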

Architecture

  • Encoder: ViT-Large/14 (24 layers) with separated patch embeddings for RGB and depth
  • Decoder: ConvStack decoder with hierarchical upsampling
  • Model size: ~300M parameters
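
The separated patch embeddings can be illustrated as two independent strided convolutions, one per modality, each projecting 14x14 patches to the ViT-Large width of 1024; concatenating the two token sequences is an assumption about how they are combined, not a confirmed detail:

```python
import torch
import torch.nn as nn

# Separate patch embeddings: each modality gets its own 14x14 projection.
embed_dim, patch = 1024, 14
rgb_embed = nn.Conv2d(3, embed_dim, kernel_size=patch, stride=patch)
depth_embed = nn.Conv2d(1, embed_dim, kernel_size=patch, stride=patch)

rgb = torch.randn(1, 3, 224, 224)
depth = torch.randn(1, 1, 224, 224)
rgb_tokens = rgb_embed(rgb).flatten(2).transpose(1, 2)      # (1, 256, 1024)
depth_tokens = depth_embed(depth).flatten(2).transpose(1, 2)
tokens = torch.cat([rgb_tokens, depth_tokens], dim=1)       # joint token sequence
```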

Citation

@article{lingbot-depth2026,
  title={Masked Depth Modeling for Spatial Perception},
  author={Tan, Bin and Sun, Changjiang and Qin, Xiage and Adai, Hanat and Fu, Zelin and Zhou, Tianxiang and Zhang, Han and Xu, Yinghao and Zhu, Xing and Shen, Yujun and Xue, Nan},
  journal={arXiv preprint arXiv:2601.xxxxx},
  year={2026}
}

License

Apache License 2.0
