6.46 MB
10 files
Updated 11 days ago

DepthVLM-Bench

DepthVLM-Bench is a metric depth estimation benchmark for vision-language models (VLMs) that covers both indoor and outdoor scenes. It provides diverse scenes with metric depth annotations in a single VLM-compatible format, so that large multimodal models can jointly learn dense geometry prediction and multimodal understanding.

Features

  • Unified indoor and outdoor metric depth estimation
  • VLM-compatible data format
  • Dense depth supervision for multimodal foundation models
  • Designed for scalable multimodal training

Paper

Unlocking Dense Metric Depth Estimation in VLMs

Usage

Please refer to the official repository for:

  • Data preprocessing
  • Evaluation scripts
  • Visualization examples

Repository: https://github.com/hanxunyu/DepthVLM
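
The official evaluation scripts live in the repository above and are the reference for any reported numbers. As a rough illustration only, the sketch below computes three metrics commonly used for metric depth estimation (absolute relative error, RMSE, and the δ < 1.25 accuracy). It is not taken from the DepthVLM code: the function name `depth_metrics`, the flat-list input layout, and the `min_depth`/`max_depth` validity range are all assumptions made for this example, and the benchmark may use different metrics or masking rules.

```python
import math


def depth_metrics(pred, gt, min_depth=1e-3, max_depth=80.0):
    """Score predicted metric depth (in meters) against ground truth.

    pred, gt: flat sequences of floats with the same length.
    min_depth, max_depth: hypothetical validity range (an assumption of
    this sketch); pixels whose ground truth lies outside it are ignored.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")

    # Keep only pixels with a valid ground-truth depth.
    pairs = [(p, g) for p, g in zip(pred, gt) if min_depth < g < max_depth]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    n = len(pairs)

    # Absolute relative error: mean of |pred - gt| / gt.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n

    # Root mean squared error, in meters.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)

    # Fraction of pixels with max(pred/gt, gt/pred) < 1.25.
    # Non-positive predictions are counted as misses.
    delta1 = sum(
        1 for p, g in pairs if p > 0 and max(p / g, g / p) < 1.25
    ) / n

    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1}


if __name__ == "__main__":
    # Toy values; the last pixel has no valid ground truth and is skipped.
    print(depth_metrics([2.0, 4.5, 10.0, 1.0], [2.0, 5.0, 5.0, 0.0]))
```

Because indoor and outdoor scenes span very different depth ranges, the validity range is exposed as a parameter rather than hard-coded; the defaults here are placeholders, not values from the benchmark.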

Citation

@article{yu2026unlocking,
  title={Unlocking Dense Metric Depth Estimation in VLMs},
  author={Hanxun Yu and Xuan Qu and Yuxin Wang and Jianke Zhu and Lei Ke},
  journal={arXiv preprint arXiv:2605.15876},
  year={2026}
}