nielsr HF Staff committed on
Commit
2da79fe
·
verified ·
1 Parent(s): fad61d2

Improve model card: Add pipeline tag, paper link, project page, and sample usage


This PR significantly enhances the model card for the DVGT model by:
- Adding the `pipeline_tag: image-to-3d` to the metadata, improving discoverability on the Hugging Face Hub.
- Including direct links to the paper and the official project page.
- Integrating a comprehensive "Quick Start" section with both installation instructions and a runnable Python code snippet, directly sourced from the GitHub repository.
- Incorporating additional descriptive content like "Overview", "Experimental Results", "Acknowledgements", and "Citation" for richer documentation.
- Embedding key visuals (GIFs and images) from the project's GitHub repository to showcase the model's capabilities.

Please review and merge if these enhancements align with the model's documentation needs.

Files changed (1)
  1. README.md +95 -5
README.md CHANGED
@@ -1,12 +1,102 @@
  ---
  license: apache-2.0
  ---
  <div align="center">
  <h1>DVGT: Driving Visual Geometry Transformer</h1>
  </div>
- **DVGT**, a universal visual geometry transformer for autonomous driving, directly predicts metric-scaled global 3D point maps from a sequence of unposed multi-view images, eliminating the need for post-alignment with external data.

- ### 🚀 Model Usage
- This repository hosts the **pre-trained weights (checkpoints)** for the DVGT model.
- For source code, installation guides, and detailed documentation, please visit our GitHub repository:
- 👉 **[GitHub: wzzheng/DVGT](https://github.com/wzzheng/DVGT/blob/main)**
  ---
  license: apache-2.0
+ pipeline_tag: image-to-3d
  ---
+
  <div align="center">
  <h1>DVGT: Driving Visual Geometry Transformer</h1>
  </div>
+ **DVGT**, a universal visual geometry transformer for autonomous driving, directly predicts metric-scaled global 3D point maps from a sequence of unposed multi-view images, eliminating the need for post-alignment with external data. It adapts seamlessly to diverse vehicles and camera configurations.
+
+ <p align="center">
+ <img src="https://huggingface.co/RainyNight/DVGT/resolve/main/assets/demo.gif" width="100%">
+ </p>
+
+ [📚 Paper](https://huggingface.co/papers/2512.16919) | [🌐 Project Page](https://wzzheng.net/DVGT) | [💻 Code](https://github.com/wzzheng/DVGT)
+
+ ## Overview
+
+ DVGT proposes a universal framework for driving geometry perception. Unlike conventional driving models that are tightly coupled to specific sensor setups or require ground-truth poses, our model leverages spatial-temporal attention to process unposed image sequences directly. By decoding global geometry in the ego-coordinate system, DVGT achieves metric-scaled dense reconstruction without LiDAR alignment, offering a robust solution that adapts seamlessly to diverse vehicles and camera configurations.
+
+ <p align="center">
+ <img src="https://huggingface.co/RainyNight/DVGT/resolve/main/assets/teaser.png" width="100%">
+ </p>
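
The spatial-temporal attention idea described above can be sketched in a few lines. This is an illustrative toy only (plain single-head attention over flattened view/time tokens, with made-up shapes and no learned projections), not the actual DVGT architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_temporal_attention(tokens):
    """Single-head attention over all views and frames at once.

    tokens: (T, V, N, D) = (frames, camera views, patch tokens, channels).
    Flattening T, V, and N into one axis lets every token attend to every
    other token across both space and time.
    """
    T, V, N, D = tokens.shape
    x = tokens.reshape(T * V * N, D)
    attn = softmax(x @ x.T / np.sqrt(D))  # (TVN, TVN) attention weights
    out = attn @ x                        # aggregate across all views and frames
    return out.reshape(T, V, N, D)

rng = np.random.default_rng(0)
tokens = rng.standard_normal((2, 3, 4, 8))  # 2 frames, 3 cameras, 4 tokens each
out = spatial_temporal_attention(tokens)
print(out.shape)  # (2, 3, 4, 8)
```

Because attention is permutation-friendly, no camera poses are needed to mix information across views; the real model learns where each token "lives" from the data itself.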
+
+ ## Experimental Results
+ DVGT significantly outperforms existing models across diverse scenarios. As shown below, our method (red) demonstrates superior accuracy ($\delta < 1.25$ for ray depth estimation) on 3D scene reconstruction across all evaluated datasets.
+
+ <p align="center">
+ <img src="https://huggingface.co/RainyNight/DVGT/resolve/main/assets/experiments.jpg" alt="Radar Chart Performance" width="45%">
+ </p>
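
For readers unfamiliar with the metric, $\delta < 1.25$ is the standard depth-accuracy measure: the fraction of predictions whose ratio to ground truth (taken in the larger direction) stays below 1.25. A minimal sketch with toy values, not DVGT outputs:

```python
import numpy as np

def delta_accuracy(pred, gt, thresh=1.25):
    """Fraction of depths whose pred/gt ratio (either direction) is < thresh."""
    ratio = np.maximum(pred / gt, gt / pred)
    return float((ratio < thresh).mean())

pred = np.array([1.0, 2.0, 3.0, 10.0])  # predicted depths (toy values)
gt = np.array([1.1, 2.0, 4.0, 10.0])    # ground-truth depths (toy values)
print(delta_accuracy(pred, gt))  # 0.75 -- 3 of 4 predictions within the 1.25 ratio
```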
+
+ ## Quick Start
+
+ First, clone the repository and install the dependencies (torch, torchvision, numpy, Pillow, and huggingface_hub). We tested the code with CUDA 12.8, Python 3.11, and torch 2.8.0.
+
+ ```bash
+ git clone https://github.com/wzzheng/DVGT.git
+ cd DVGT
+
+ conda create -n dvgt python=3.11
+ conda activate dvgt
+
+ pip install -r requirements.txt
+ ```
+
+ Next, download the pretrained [checkpoint](https://huggingface.co/RainyNight/DVGT) and save it to the `./ckpt` directory.
+
+ Now, try the model with just a few lines of code:
+
+ ```python
+ import torch
+ from dvgt.models.dvgt import DVGT
+ from dvgt.utils.load_fn import load_and_preprocess_images
+ from iopath.common.file_io import g_pathmgr
+
+ checkpoint_path = 'path to your checkpoint'
+
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+ # bfloat16 is supported on Ampere GPUs (Compute Capability 8.0+); fall back otherwise.
+ dtype = torch.bfloat16 if device == "cuda" and torch.cuda.get_device_capability()[0] >= 8 else torch.float16
+
+ # Initialize the model and load the pretrained weights.
+ model = DVGT()
+ with g_pathmgr.open(checkpoint_path, "rb") as f:
+     checkpoint = torch.load(f, map_location="cpu")
+ model.load_state_dict(checkpoint)
+ model = model.to(device).eval()
+
+ # Load and preprocess example images (replace with your own image paths)
+ image_dir = 'examples/openscene_log-0104-scene-0007'
+ images = load_and_preprocess_images(image_dir, start_frame=16, end_frame=24).to(device)
+
+ with torch.no_grad():
+     with torch.amp.autocast(device, dtype=dtype):
+         # Predict attributes including cameras, depth maps, and point maps.
+         predictions = model(images)
+ ```
+
+ ## Acknowledgements
+ Our code is based on the following brilliant repositories:
+
+ [MoGe-2](https://github.com/microsoft/MoGe)
+ [CUT3R](https://github.com/CUT3R/CUT3R)
+ [Driv3R](https://github.com/Barrybarry-Smith/Driv3R)
+ [VGGT](https://github.com/facebookresearch/vggt)
+ [MapAnything](https://github.com/facebookresearch/map-anything)
+ [Pi3](https://github.com/yyfz/Pi3)
+
+ Many thanks to these authors!
+
+ ## Citation

+ If you find this project helpful, please consider citing the following paper:
+ ```
+ @article{zuo2025dvgt,
+   title={DVGT: Driving Visual Geometry Transformer},
+   author={Zuo, Sicheng and Xie, Zixun and Zheng, Wenzhao and Xu, Shaoqing and Li, Fang and Jiang, Shengyin and Chen, Long and Yang, Zhi-Xin and Lu, Jiwen},
+   journal={arXiv preprint arXiv:2512.16919},
+   year={2025}
+ }
+ ```