Depth Estimation
Transformers
Safetensors
English
qwen3_vl
image-text-to-text
vision-language-model
3d-vision
multimodal
How to use JonnyYu828/DepthVLM-4B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("depth-estimation", model="JonnyYu828/DepthVLM-4B")
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("JonnyYu828/DepthVLM-4B")
model = AutoModelForImageTextToText.from_pretrained("JonnyYu828/DepthVLM-4B")
```
Update README.md
README.md (changed)

```diff
@@ -49,7 +49,7 @@ DepthVLM serves as **a unified foundation model for both low-level dense geometr
 
 By attaching a lightweight depth head to the LLM backbone and adopting a two-stage supervision paradigm, DepthVLM transforms a single VLM into a native dense geometry predictor, while preserving its multimodal capabilities and enhancing its spatial reasoning.
 
-##
+## 🧠 Key Characteristics
 
 - **Native dense metric depth estimation in VLMs**: Directly predicts geometry within the VLM framework.
 
```
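The README says DepthVLM attaches a lightweight depth head to the LLM backbone but does not describe the head's architecture. What follows is a hypothetical NumPy sketch of such a head, not DepthVLM's actual design. It rests on several assumptions. The hidden states of the image tokens form a `grid_h × grid_w` patch grid. A two-layer MLP maps each token to one scalar. Softplus keeps the predicted depth non-negative. Nearest-neighbour upsampling lifts the patch grid to pixel resolution. `ToyDepthHead`, `mid_dim`, and `upsample` are illustrative names and do not come from the model card. The weights are random, so the outputs are not meaningful depths.

```python
import numpy as np


def softplus(x):
    # Numerically stable softplus, log(1 + exp(x)); output is non-negative
    return np.logaddexp(0.0, x)


class ToyDepthHead:
    """Hypothetical lightweight depth head (illustration only, not DepthVLM's).

    Maps LLM hidden states of image tokens to a dense, non-negative depth map.
    """

    def __init__(self, hidden_dim, mid_dim=64, upsample=4, seed=0):
        rng = np.random.default_rng(seed)
        # Two-layer MLP, randomly initialized; a real head would be trained
        self.w1 = rng.normal(0.0, hidden_dim ** -0.5, size=(hidden_dim, mid_dim))
        self.b1 = np.zeros(mid_dim)
        self.w2 = rng.normal(0.0, mid_dim ** -0.5, size=(mid_dim, 1))
        self.b2 = np.zeros(1)
        self.upsample = upsample  # assumed patch-to-pixel scale factor

    def __call__(self, image_hidden, grid_h, grid_w):
        # image_hidden: (grid_h * grid_w, hidden_dim), one row per image token
        if image_hidden.shape[0] != grid_h * grid_w:
            raise ValueError("number of image tokens must equal grid_h * grid_w")
        x = np.maximum(image_hidden @ self.w1 + self.b1, 0.0)  # ReLU
        logits = (x @ self.w2 + self.b2)[:, 0]                  # one scalar per patch
        depth = softplus(logits).reshape(grid_h, grid_w)        # per-patch depth
        # Nearest-neighbour upsampling from the patch grid to pixel resolution
        k = self.upsample
        return np.repeat(np.repeat(depth, k, axis=0), k, axis=1)


if __name__ == "__main__":
    hidden = np.random.default_rng(1).normal(size=(6 * 8, 256))
    head = ToyDepthHead(hidden_dim=256, upsample=4)
    depth_map = head(hidden, grid_h=6, grid_w=8)
    print(depth_map.shape)  # (24, 32)
```

The sketch keeps the head small, with one MLP and no decoder, so that it matches the "lightweight" wording. A trained head would more likely use learned upsampling or a convolutional refinement stage than the nearest-neighbour repeat used here.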