license: apache-2.0
task_categories:
  - depth-estimation
tags:
  - depth-estimation
  - 3d-vision
  - multimodal
  - metric-depth
paper:
  - arxiv: 2605.15876

DepthVLM-Bench

DepthVLM-Bench is a metric depth estimation benchmark for vision-language models (VLMs) that covers both indoor and outdoor scenes. It provides diverse scenes with metric depth annotations in a single VLM-compatible format, so that large multimodal models can jointly learn dense geometry prediction and multimodal understanding.

Features

  • Unified indoor and outdoor metric depth estimation
  • VLM-compatible data format
  • Dense depth supervision for multimodal foundation models
  • Designed for scalable multimodal training

Paper

Unlocking Dense Metric Depth Estimation in VLMs (arXiv:2605.15876)

Usage

Please refer to the official repository for:

  • Data preprocessing
  • Evaluation scripts
  • Visualization examples

Repository: https://github.com/hanxunyu/DepthVLM
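
The official evaluation scripts live in the repository above and remain the reference. For orientation only, the sketch below shows how metric depth predictions are commonly scored against ground truth, using three standard measures: absolute relative error (AbsRel), root mean squared error (RMSE), and the δ < 1.25 accuracy. The function name `depth_metrics`, the valid-depth range (`min_depth`, `max_depth`), and the masking rule are assumptions made for illustration. They are not taken from the DepthVLM-Bench protocol, which may use different thresholds or metrics.

```python
import numpy as np


def depth_metrics(pred, gt, min_depth=1e-3, max_depth=80.0):
    """Score a metric depth prediction against ground truth (both in meters).

    Hypothetical helper: the depth range and masking rule are illustrative
    assumptions, not the benchmark's official settings.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if pred.shape != gt.shape:
        raise ValueError("pred and gt must have the same shape")

    # Only evaluate pixels whose ground-truth depth lies in the assumed range.
    valid = (gt > min_depth) & (gt < max_depth)
    if not valid.any():
        raise ValueError("no valid ground-truth pixels")

    # Clip predictions into the same range so the ratios below stay finite.
    p = np.clip(pred[valid], min_depth, max_depth)
    g = gt[valid]

    abs_rel = float(np.mean(np.abs(p - g) / g))    # mean |pred - gt| / gt
    rmse = float(np.sqrt(np.mean((p - g) ** 2)))   # root mean squared error
    ratio = np.maximum(p / g, g / p)               # symmetric ratio, always >= 1
    delta1 = float(np.mean(ratio < 1.25))          # fraction of pixels within 25%

    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1}
```

A call such as `depth_metrics(pred_depth, gt_depth)` returns a dictionary with the three scores. Lower is better for `abs_rel` and `rmse`, and higher is better for `delta1`. Because the metrics are computed in meters without any scale alignment, they reward correct absolute scale as well as correct relative structure.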

Citation

@article{yu2026unlocking,
  title={Unlocking Dense Metric Depth Estimation in VLMs},
  author={Hanxun Yu and Xuan Qu and Yuxin Wang and Jianke Zhu and Lei Ke},
  journal={arXiv preprint arXiv:2605.15876},
  year={2026}
}
