metadata
license: apache-2.0
task_categories:
- depth-estimation
tags:
- depth-estimation
- 3d-vision
- multimodal
- metric-depth
paper:
- arxiv: 2605.15876
DepthVLM-Bench
DepthVLM-Bench is a unified indoor-outdoor metric depth estimation benchmark designed for vision-language models (VLMs). The benchmark provides diverse indoor and outdoor scenes with metric depth annotations in a unified VLM-compatible format, enabling large multimodal models to jointly learn dense geometry prediction and multimodal understanding.
Features
- Unified indoor and outdoor metric depth estimation
- VLM-compatible data format
- Dense depth supervision for multimodal foundation models
- Designed for scalable multimodal training
Paper
Unlocking Dense Metric Depth Estimation in VLMs (arXiv:2605.15876)
Usage
Please refer to the official repository for:
- Data preprocessing
- Evaluation scripts
- Visualization examples
Repository: https://github.com/hanxunyu/DepthVLM
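The official evaluation scripts in the repository are the reference for reported numbers. For readers who want a rough idea of what metric depth evaluation typically involves, here is a minimal sketch of two metrics widely used in the depth estimation literature: absolute relative error (AbsRel) and threshold accuracy (the fraction of pixels with max(pred/gt, gt/pred) below 1.25). It is an assumption that these resemble what the benchmark reports; the function name `depth_metrics`, the flat-list input format, and the "ground truth of 0 means invalid" convention are all hypothetical and are not taken from the repository.

```python
from typing import Sequence, Tuple


def depth_metrics(
    pred: Sequence[float],
    gt: Sequence[float],
    threshold: float = 1.25,
    eps: float = 1e-6,
) -> Tuple[float, float]:
    """Return (abs_rel, delta) for flattened depth maps in meters.

    Hypothetical helper, not the benchmark's official evaluation code.
    Pixels whose ground truth is <= 0 are treated as missing and skipped.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")

    # Keep only pixels with valid ground truth; clamp predictions away
    # from zero so the ratio below is always defined.
    pairs = [(max(p, eps), g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no pixels with valid ground truth")

    # AbsRel: mean of |pred - gt| / gt over valid pixels.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / len(pairs)

    # Threshold accuracy: share of pixels whose ratio error is below threshold.
    hits = sum(1 for p, g in pairs if max(p / g, g / p) < threshold)
    delta = hits / len(pairs)

    return abs_rel, delta


if __name__ == "__main__":
    # Four pixels; the last one has no ground truth and is ignored.
    abs_rel, delta = depth_metrics([1.0, 2.0, 4.0, 3.0], [1.0, 4.0, 4.0, 0.0])
    print(f"AbsRel={abs_rel:.4f}  delta<1.25={delta:.4f}")
```

Both metrics operate on depths in absolute units (meters), which is what distinguishes metric depth evaluation from scale-invariant or relative depth evaluation. A real pipeline would also need to parse the depth values out of the VLM's output format, which this sketch does not attempt.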
Citation
@article{yu2026unlocking,
title={Unlocking Dense Metric Depth Estimation in VLMs},
author={Hanxun Yu and Xuan Qu and Yuxin Wang and Jianke Zhu and Lei Ke},
journal={arXiv preprint arXiv:2605.15876},
year={2026}
}