Update 2026-05-18 (v1.0): Initial release

DepthVLM-4B

DepthVLM is a unified foundation model for both low-level dense geometry prediction and high-level multimodal understanding. It also runs inference substantially faster than existing VLM-based approaches such as DepthLM and Youtu-VL.

Highlights

  • Native dense metric depth estimation in VLMs
  • Unified multimodal understanding and geometry prediction
  • Full-resolution depth prediction with efficient inference
  • Supports both indoor and outdoor metric depth estimation
  • Improved 3D spatial reasoning capability

Paper

Unlocking Dense Metric Depth Estimation in VLMs (arXiv:2605.15876)

Usage

Please refer to the official repository for detailed instructions on:

  • Data preprocessing
  • Training
  • Evaluation
  • Inference and visualization

Repository: https://github.com/hanxunyu/DepthVLM
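
Before following the repository instructions, the checkpoint files usually need to be available locally. The snippet below is a minimal sketch of one way to fetch them, assuming the `huggingface_hub` package is installed and that the weights are hosted under the model id `JonnyYu828/DepthVLM-4B` shown on the model's Hub page. The helper name `download_weights` and the default target directory are illustrative choices, not part of the official repository, which may expect a different layout.

```python
from pathlib import Path

# Model id as shown on the Hugging Face Hub page for this model.
REPO_ID = "JonnyYu828/DepthVLM-4B"


def download_weights(local_dir: str = "checkpoints/DepthVLM-4B") -> Path:
    """Download all files of the model repo and return the local folder.

    Hypothetical helper: the name and default directory are assumptions
    made for this example only.
    """
    # Imported lazily so this module loads even without huggingface_hub.
    from huggingface_hub import snapshot_download

    # snapshot_download fetches every file in the repo (weights, config,
    # tokenizer files) and returns the path of the local copy.
    path = snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
    return Path(path)
```

Calling `download_weights()` once should leave the Safetensors files under `checkpoints/DepthVLM-4B`. The inference scripts in the official repository can then be pointed at that folder, following whatever argument names they define.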

Citation

If you find this work useful, please cite:

@article{yu2026unlocking,
  title={Unlocking Dense Metric Depth Estimation in VLMs},
  author={Hanxun Yu and Xuan Qu and Yuxin Wang and Jianke Zhu and Lei Ke},
  journal={arXiv preprint arXiv:2605.15876},
  year={2026}
}

Model size: 5B params (Safetensors, BF16)