Bucket: 6.46 MB, 10 files, updated 11 days ago
| Name | Size | Xet hash |
|---|---|---|
| README.md | 1.26 kB | b1fea33c |
| argoverse_pixel_depth_test.jsonl | 957 kB | fd702afc |
| ddad_pixel_depth_val.jsonl | 832 kB | 7ad582ea |
| eth3d_pixel_depth_all.jsonl | 545 kB | a801a689 |
| ibims1_pixel_depth_test.jsonl | 263 kB | a03d8d40 |
| nuscenes_pixel_depth_test.jsonl | 856 kB | d940b2b2 |
| nyuv2_pixel_depth_test.jsonl | 526 kB | 306a7c6e |
| scannetpp_pixel_depth_val.jsonl | 725 kB | 57ea227c |
| sunrgbd_pixel_depth_test.jsonl | 839 kB | 04891d95 |
| waymo_pixel_depth_test.jsonl | 909 kB | 4f4f692c |
DepthVLM-Bench
DepthVLM-Bench is a unified indoor-outdoor metric depth estimation benchmark designed for vision-language models (VLMs). The benchmark provides diverse indoor and outdoor scenes with metric depth annotations in a unified VLM-compatible format, enabling large multimodal models to jointly learn dense geometry prediction and multimodal understanding.
Features
- Unified indoor and outdoor metric depth estimation
- VLM-compatible data format
- Dense depth supervision for multimodal foundation models
- Designed for scalable multimodal training
Paper
Unlocking Dense Metric Depth Estimation in VLMs
Usage
Please refer to the official repository for:
- Data preprocessing
- Evaluation scripts
- Visualization examples
Repository: https://github.com/hanxunyu/DepthVLM
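Before running the official scripts, it can help to check what a downloaded split contains. The following is a minimal sketch, not part of the official tooling: it assumes a `.jsonl` file from this bucket has already been downloaded locally, that each non-empty line is one JSON object, and that file names follow the `<dataset>_pixel_depth_<split>.jsonl` pattern seen in the file listing. The helper names (`parse_name`, `read_jsonl`, `summarize`) are hypothetical, and no record schema is assumed; the field names are only reported as found.

```python
import json
from pathlib import Path


def parse_name(path):
    """Split '<dataset>_pixel_depth_<split>.jsonl' into (dataset, split).

    The naming pattern is inferred from the file listing, not documented.
    """
    dataset, sep, split = Path(path).stem.partition("_pixel_depth_")
    if not sep:
        raise ValueError(f"unexpected file name: {path}")
    return dataset, split


def read_jsonl(path):
    """Yield one parsed JSON value per non-empty line of the file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def summarize(path):
    """Return (record count, sorted top-level keys) without assuming a schema."""
    count, keys = 0, set()
    for record in read_jsonl(path):
        count += 1
        if isinstance(record, dict):
            keys.update(record)
    return count, sorted(keys)


if __name__ == "__main__":
    # Assumed local path; adjust to wherever the file was downloaded.
    path = Path("nyuv2_pixel_depth_test.jsonl")
    if path.exists():
        print(parse_name(path), summarize(path))
```

Reading line by line keeps memory use flat, which matters little at these file sizes but carries over to larger splits. For the actual field semantics and the evaluation protocol, the repository above remains the reference.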
Citation
@article{yu2026unlocking,
  title={Unlocking Dense Metric Depth Estimation in VLMs},
  author={Hanxun Yu and Xuan Qu and Yuxin Wang and Jianke Zhu and Lei Ke},
  journal={arXiv preprint arXiv:2605.15876},
  year={2026}
}
Last updated: May 24