---
license: apache-2.0
task_categories:
- depth-estimation
tags:
- depth-estimation
- 3d-vision
- multimodal
- metric-depth
paper:
- arxiv: 2605.15876
---
# DepthVLM-Bench

DepthVLM-Bench is a unified indoor-outdoor metric depth estimation benchmark designed for vision-language models (VLMs). The benchmark provides diverse indoor and outdoor scenes with metric depth annotations in a unified VLM-compatible format, enabling large multimodal models to jointly learn dense geometry prediction and multimodal understanding.

## Features

- Unified indoor and outdoor metric depth estimation
- VLM-compatible data format
- Dense depth supervision for multimodal foundation models
- Designed for scalable multimodal training

## Paper

[Unlocking Dense Metric Depth Estimation in VLMs](https://arxiv.org/abs/2605.15876)

## Usage

Please refer to the official repository for:

- Data preprocessing
- Evaluation scripts
- Visualization examples

Repository: https://github.com/hanxunyu/DepthVLM
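
For a quick sanity check before running the official evaluation scripts, the sketch below computes two metrics commonly reported for metric depth estimation: absolute relative error (AbsRel) and threshold accuracy (δ < 1.25). It is not taken from the DepthVLM repository, and the benchmark's official metrics and valid-depth range may differ. The function name `depth_metrics` and the `min_depth` / `max_depth` defaults are assumptions chosen for illustration.

```python
def depth_metrics(pred, gt, min_depth=1e-3, max_depth=80.0):
    """Return AbsRel and delta<1.25 accuracy over pixels with valid ground truth.

    pred, gt: flat sequences of depths in meters, same length.
    min_depth, max_depth: hypothetical valid range for ground-truth depth;
    adjust to match the dataset being evaluated.
    """
    abs_rel_sum = 0.0
    inliers = 0
    count = 0
    for p, g in zip(pred, gt):
        # Skip pixels whose ground truth is missing or out of range.
        if not (min_depth < g < max_depth):
            continue
        # Clamp predictions so the ratio below is always defined.
        p = max(p, min_depth)
        abs_rel_sum += abs(p - g) / g
        # A pixel is an inlier when prediction and ground truth
        # agree within a factor of 1.25 in either direction.
        if max(p / g, g / p) < 1.25:
            inliers += 1
        count += 1
    if count == 0:
        raise ValueError("no pixels with valid ground-truth depth")
    return {"abs_rel": abs_rel_sum / count, "delta1": inliers / count}


if __name__ == "__main__":
    # Toy values only; the last pixel has no valid ground truth and is ignored.
    predicted = [1.0, 2.2, 12.0, 5.0]
    ground_truth = [1.0, 2.0, 8.0, 0.0]
    print(depth_metrics(predicted, ground_truth))
```

Lower AbsRel and higher δ < 1.25 indicate better agreement with the ground truth. Because both metrics are computed on raw metric values without scale alignment, they penalize predictions that have the right relative structure but the wrong absolute scale.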
## Citation

```bibtex
@article{yu2026unlocking,
  title={Unlocking Dense Metric Depth Estimation in VLMs},
  author={Hanxun Yu and Xuan Qu and Yuxin Wang and Jianke Zhu and Lei Ke},
  journal={arXiv preprint arXiv:2605.15876},
  year={2026}
}
```