LingBot-Depth-DC (Depth Completion)

LingBot-Depth-DC is a post-trained variant of LingBot-Depth optimized for sparse depth completion. The model recovers dense depth maps from highly sparse inputs such as SfM/SLAM point clouds.

Model Details

Model Description

This model builds upon the LingBot-Depth pretrained checkpoint with additional post-training focused on sparse depth completion scenarios. It is particularly effective for:

  • Recovering complete depth from sparse SfM/SLAM observations

  • Handling extremely sparse depth inputs (e.g., <5% valid pixels)

  • Scenarios where depth sensors are unavailable and only sparse geometric cues exist

  • Developed by: Bin Tan, Changjiang Sun, Xiage Qin, Hanat Adai, Zelin Fu, Tianxiang Zhou, Han Zhang, Yinghao Xu, Xing Zhu, Yujun Shen, Nan Xue

  • Model type: Vision Transformer for sparse depth completion

  • License: Apache 2.0

  • Finetuned from model: LingBot-Depth (pretrained)

Related Models

| Model | Hugging Face Model | ModelScope Model | Description |
|---|---|---|---|
| LingBot-Depth | robbyant/lingbot-depth-pretrain-vitl-14 | robbyant/lingbot-depth-pretrain-vitl-14 | General-purpose depth refinement |
| LingBot-Depth-DC | robbyant/lingbot-depth-postrain-dc-vitl14 | robbyant/lingbot-depth-postrain-dc-vitl14 | Optimized for sparse depth completion |

Uses

Direct Use

  • Sparse Depth Completion: Recovering dense depth from SfM/SLAM sparse point clouds
  • Extreme Sparsity Handling: Working with <5% valid depth pixels
  • RGB-guided Depth Densification: Using visual context to fill large missing regions
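The model card does not publish an inference API, so the sketch below only shows how a sparse input in the regime described above (<5% valid pixels) might be prepared; the final `model.complete(...)` call is a hypothetical placeholder, not the actual interface.

```python
import numpy as np

def make_sparse_depth(dense_depth: np.ndarray, valid_fraction: float = 0.03,
                      seed: int = 0) -> tuple[np.ndarray, np.ndarray]:
    """Simulate an SfM/SLAM-style sparse observation by keeping only a small
    random subset of pixels (matching the <5% valid regime above)."""
    rng = np.random.default_rng(seed)
    mask = rng.random(dense_depth.shape) < valid_fraction
    sparse = np.where(mask, dense_depth, 0.0)  # 0 marks missing depth
    return sparse, mask

# Toy example: a 480x640 synthetic depth map.
dense = np.linspace(0.5, 10.0, 480 * 640, dtype=np.float32).reshape(480, 640)
sparse, mask = make_sparse_depth(dense)
print(f"valid pixels: {mask.mean():.2%}")  # well under the 5% sparsity bound

# The completion call itself would look roughly like (hypothetical API):
#   dense_pred = model.complete(rgb, sparse, mask)
```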

Downstream Use

  • SLAM Enhancement: Densifying sparse SLAM outputs for better scene understanding
  • Novel View Synthesis: Providing dense geometry for view synthesis pipelines
  • 3D Reconstruction: Completing sparse depth for mesh reconstruction
  • Robotics Navigation: Dense depth from sparse sensor observations
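In SLAM and 3D-reconstruction pipelines like these, the sparse depth input is typically obtained by rasterizing the point cloud into the camera frame. A minimal sketch, assuming a standard pinhole camera model (the function name and intrinsics are illustrative, not part of this model's tooling):

```python
import numpy as np

def project_points_to_depth(points_cam: np.ndarray, K: np.ndarray,
                            h: int, w: int) -> np.ndarray:
    """Rasterize 3D points (camera frame, Nx3) into a sparse HxW depth map
    with a pinhole model K, keeping the nearest point per pixel (z-buffer)."""
    depth = np.zeros((h, w), dtype=np.float32)
    p = points_cam[points_cam[:, 2] > 0]       # keep points in front of camera
    uvw = (K @ p.T).T                          # homogeneous pixel coordinates
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, z = u[inside], v[inside], p[inside, 2]
    order = np.argsort(-z)                     # write far-to-near so near wins
    depth[v[order], u[order]] = z[order]
    return depth
```

The resulting map (zeros where no point projects) is exactly the kind of sparse input the completion model is meant to densify.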

Technical Specifications

Model Architecture

  • Encoder: ViT-Large/14 (24 layers) with separated patch embeddings for RGB and depth
  • Decoder: ConvStack decoder with hierarchical upsampling
  • Objective: Masked depth modeling optimized for sparse inputs
  • Model size: ~300M parameters
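The ~300M figure is consistent with a back-of-envelope count for a ViT-Large trunk (hidden size 1024, 24 blocks), ignoring patch embeddings, biases, norms, and the decoder:

```python
# Rough parameter count for the ViT-Large/14 trunk. Each transformer block has
# ~4*d^2 attention params (Q, K, V, output projections) and ~8*d^2 MLP params
# (two linear layers with 4x expansion).
d, layers = 1024, 24
per_block = 4 * d * d + 8 * d * d
total = layers * per_block
print(f"~{total / 1e6:.0f}M parameters in the trunk")  # ≈ 302M
```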

Software Requirements

  • Python >= 3.9
  • PyTorch >= 2.0.0
  • xformers
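A quick environment check against the requirements above might look like the following sketch (the `meets_min` helper is illustrative; the torch probe is optional and skipped if the package is absent):

```python
import sys

def meets_min(actual: tuple, minimum: tuple) -> bool:
    """Compare version tuples lexicographically, e.g. (3, 11) >= (3, 9)."""
    return actual >= minimum

assert meets_min(sys.version_info[:2], (3, 9)), "Python >= 3.9 required"

try:
    import torch
    major, minor = (int(x) for x in torch.__version__.split(".")[:2])
    assert meets_min((major, minor), (2, 0)), "PyTorch >= 2.0.0 required"
except ImportError:
    print("torch not installed; install PyTorch >= 2.0.0 and xformers")
```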

Citation

@article{lingbot-depth2026,
  title={Masked Depth Modeling for Spatial Perception},
  author={Tan, Bin and Sun, Changjiang and Qin, Xiage and Adai, Hanat and Fu, Zelin and Zhou, Tianxiang and Zhang, Han and Xu, Yinghao and Zhu, Xing and Shen, Yujun and Xue, Nan},
  journal={arXiv preprint arXiv:2601.17895},
  year={2026}
}
