JonnyYu828/DepthVLM-4B

@@ -27,9 +27,9 @@ Unlocking Dense Metric Depth Estimation in VLMs
   <b>GitHub:</b> <a href="https://github.com/hanxunyu/DepthVLM">hanxunyu/DepthVLM</a> |
   <b>arXiv:</b> <a href="https://arxiv.org/abs/2605.15876">2605.15876</a>
   <br><br>
-  <a href="https://depthvlm.github.io/"><img src="https://img.shields.io/badge/Project-Page-green?logo=safari&logoColor=white" alt="Project Page"></a>
-  <a href="https://github.com/hanxunyu/DepthVLM"><img src="https://img.shields.io/badge/GitHub-Repo-blue?logo=github" alt="GitHub Badge"></a>
-  <a href="https://huggingface.co/JonnyYu828/DepthVLM-4B"><img src="https://img.shields.io/badge/HuggingFace-Model-yellow?logo=huggingface" alt="Hugging Face Model"></a>
+  <a href="https://depthvlm.github.io/"><img src="https://img.shields.io/badge/Project-Home Page-green?logo=safari&logoColor=white" alt="Project Home Page"></a>
+  <a href="https://github.com/hanxunyu/DepthVLM"><img src="https://img.shields.io/badge/GitHub-Repository-blue?logo=github" alt="GitHub Badge"></a>
+  <a href="https://huggingface.co/datasets/JonnyYu828/DepthVLM-Bench"><img src="https://img.shields.io/badge/HuggingFace-Benchmark-yellow?logo=huggingface" alt="Hugging Face Benchmark"></a>
   <a href="https://arxiv.org/abs/2605.15876"><img src="https://img.shields.io/badge/arXiv-2605.15876-b31b1b.svg?logo=arxiv&logoColor=red" alt="arXiv"></a>
 </h4>
@@ -38,8 +38,8 @@ Unlocking Dense Metric Depth Estimation in VLMs
 ## 📰 News
-* **2026.05** — Released DepthVLM-Bench.
-* **2026.05** — Released DepthVLM-4B.
+* **2026.05** — Released [DepthVLM-Bench](https://huggingface.co/datasets/JonnyYu828/DepthVLM-Bench).
+* **2026.05** — Released [DepthVLM-4B](https://huggingface.co/JonnyYu828/DepthVLM-4B).
 ---
@@ -47,7 +47,7 @@ Unlocking Dense Metric Depth Estimation in VLMs
 DepthVLM serves as **a unified foundation model for both low-level dense geometry prediction and high-level multimodal understanding**, while achieving substantially faster inference compared with existing VLM-based approaches such as DepthLM and Youtu-VL.
-By attaching a lightweight depth head to the LLM backbone and training under a unified vision-text supervision paradigm, DepthVLM transforms a single VLM into a native dense geometry predictor while preserving its multimodal capability.
+By attaching a lightweight depth head to the LLM backbone and adopting a two-stage supervision paradigm, DepthVLM transforms a single VLM into a native dense geometry predictor, while preserving its multimodal capabilities and enhancing its spatial reasoning.
 ### Key Characteristics