joedorfman/DepthVLM-Bench-bucket

6.46 MB · 10 files · Updated 2 months ago

File	Size
README.md	1.26 kB
argoverse_pixel_depth_test.jsonl	957 kB
ddad_pixel_depth_val.jsonl	832 kB
eth3d_pixel_depth_all.jsonl	545 kB
ibims1_pixel_depth_test.jsonl	263 kB
nuscenes_pixel_depth_test.jsonl	856 kB
nyuv2_pixel_depth_test.jsonl	526 kB
scannetpp_pixel_depth_val.jsonl	725 kB
sunrgbd_pixel_depth_test.jsonl	839 kB
waymo_pixel_depth_test.jsonl	909 kB
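Each data file appears to follow the pattern `<dataset>_pixel_depth_<split>.jsonl`, and the `.jsonl` extension indicates JSON Lines (one JSON object per line). The sketch below assumes only those two conventions. It parses a file name into its source dataset and split, and it streams records as plain dicts. Because this README does not describe the record schema, the code relies on no field names. `FILE_PATTERN`, `parse_benchmark_filename`, and `iter_jsonl` are hypothetical helpers and are not part of the official repository.

```python
import json
import re
from pathlib import Path
from typing import Iterator

# Hypothetical pattern: assumes every data file is named
# "<dataset>_pixel_depth_<split>.jsonl", as in the listing above.
FILE_PATTERN = re.compile(
    r"^(?P<dataset>[a-z0-9]+)_pixel_depth_(?P<split>[a-z]+)\.jsonl$"
)


def parse_benchmark_filename(name: str) -> tuple[str, str]:
    """Return (dataset, split) parsed from a benchmark file name."""
    match = FILE_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name}")
    return match.group("dataset"), match.group("split")


def iter_jsonl(path: Path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line (JSON Lines format)."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines rather than failing on them
                yield json.loads(line)
```

As an example, you could loop over `sorted(Path(local_dir).glob("*_pixel_depth_*.jsonl"))` in a local copy of the bucket and call both helpers to get record counts per dataset and split. Here `local_dir` is an assumed download location. The regular expression rejects `README.md` and any other file that does not match the naming pattern, so non-data files are skipped.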

DepthVLM-Bench

DepthVLM-Bench is a metric depth estimation benchmark for vision-language models (VLMs) that covers both indoor and outdoor scenes. It provides metric depth annotations in a single VLM-compatible format, drawing on scenes from Argoverse, DDAD, ETH3D, iBims-1, nuScenes, NYUv2, ScanNet++, SUN RGB-D, and Waymo. With this format, large multimodal models can jointly learn dense geometry prediction and multimodal understanding.

Features

Unified indoor and outdoor metric depth estimation
VLM-compatible data format
Dense depth supervision for multimodal foundation models
Designed for scalable multimodal training

Paper

Unlocking Dense Metric Depth Estimation in VLMs

Usage

Please refer to the official repository for:

Data preprocessing
Evaluation scripts
Visualization examples

Repository: https://github.com/hanxunyu/DepthVLM
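The official evaluation scripts are in the repository above, and their exact metrics and filtering may differ from anything shown here. The function below is only an illustrative sketch. It computes three metrics commonly reported for metric depth estimation over paired predicted and ground-truth depth values in meters, and it skips entries with non-positive ground truth:

- absolute relative error (AbsRel)
- root-mean-square error (RMSE)
- δ1, the fraction of predictions within a factor of 1.25 of the ground truth

`depth_metrics` is a hypothetical name, and the pairing of predictions with ground truth is assumed to happen elsewhere.

```python
import math
from typing import Sequence

# Common threshold for the delta_1 accuracy metric in depth estimation.
DELTA1_THRESHOLD = 1.25


def depth_metrics(pred: Sequence[float], gt: Sequence[float]) -> dict[str, float]:
    """Compute AbsRel, RMSE and delta_1 over pairs with positive ground truth.

    Hypothetical helper: these are widely used metric depth metrics, not
    necessarily the ones implemented by the official DepthVLM scripts.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    # Keep only pairs with valid (positive) ground-truth depth.
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth depths")
    n = len(pairs)
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # A prediction counts as accurate if max(p/g, g/p) < 1.25;
    # non-positive predictions always count as inaccurate.
    delta1 = sum(
        1 for p, g in pairs if p > 0 and max(p / g, g / p) < DELTA1_THRESHOLD
    ) / n
    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1}
```

Lower AbsRel and RMSE are better, and higher δ1 is better. RMSE is reported in the same unit as the inputs.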

Citation

@article{yu2026unlocking,
  title={Unlocking Dense Metric Depth Estimation in VLMs},
  author={Hanxun Yu and Xuan Qu and Yuxin Wang and Jianke Zhu and Lei Ke},
  journal={arXiv preprint arXiv:2605.15876},
  year={2026}
}

Last updated: May 24