pkqbajng committed on
Commit 320bbfa · verified · 1 Parent(s): 9f3b8eb

Add ViGeo model card

Files changed (1)
  1. README.md +96 -0
README.md CHANGED
@@ -1,3 +1,99 @@
  ---
  license: cc-by-nc-4.0
+ library_name: pytorch
+ pipeline_tag: depth-estimation
+ tags:
+ - video-depth-estimation
+ - geometry-estimation
+ - camera-pose-estimation
+ - surface-normal-estimation
+ - visual-geometry
+ - vigeo
  ---
+
+ # ViGeo
+
+ ViGeo estimates scene geometry from video clips or single frames. It predicts
+ depth, 3D points, surface normals, and confidence, plus camera poses for
+ sequences.
+
+ The checkpoint in this repository is `vigeo.pt`.
+
+ ## Checkpoint Note
+
+ This repository currently provides a preliminary ViGeo checkpoint. It was
+ trained with a known issue in the loss implementation, which may cause minor
+ visualization artifacts in camera poses and distant regions. The checkpoint is
+ nonetheless consistent with the results reported in the paper and can be used
+ for dense geometry estimation.
+
+ We are preparing an updated checkpoint with a sky mask head and will release it
+ soon.
+
+ ## Installation
+
+ ```bash
+ conda create -n vigeo python=3.10 -y
+ conda activate vigeo
+
+ pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu126
+
+ git clone https://github.com/aigc3d/ViGeo.git
+ cd ViGeo
+ pip install -r requirements.txt
+ pip install -e .
+ ```
+
+ ## Quick Start
+
+ ```python
+ import torch
+
+ from vigeo import ViGeo
+ from utils import load_image_sequence
+
+ device = torch.device("cuda")
+ image_paths = ["path/to/imageA.png", "path/to/imageB.png", "path/to/imageC.png"]
+ images = load_image_sequence(image_paths).to(device)  # [T, 3, H, W], RGB in [0, 1]
+
+ model = ViGeo.from_pretrained("pkqbajng/ViGeo").to(device).eval()
+
+ with torch.inference_mode():
+     output = model.infer(images, mode="offline")
+
+ depth = output["depth_pred"]       # [T, 1, H, W]
+ points = output["points_pred"]     # [T, H, W, 3]
+ normals = output["normal_pred"]    # [T, H, W, 3], inward normals
+ normals_out = -normals             # outward normals for visualization/evaluation
+ poses = output["pose_pred"]        # [T, 3, 4], camera-to-world
+ confidence = output["conf_pred"]   # [T, 1, H, W]
+ ```
+
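The outputs listed above have different layouts: `conf_pred` is `[T, 1, H, W]` while `points_pred` is `[T, H, W, 3]`. The following is a minimal sketch of combining them to keep only confident points. It is not part of the ViGeo API: `filter_points_by_confidence` and the `threshold` value are hypothetical, and the sketch assumes that a higher confidence value means a more reliable prediction. The value range of `conf_pred` is not documented here, so the cutoff would need tuning. Torch outputs can be converted first with `.cpu().numpy()`.

```python
import numpy as np


def filter_points_by_confidence(points, confidence, threshold=0.5):
    """Return the 3D points whose confidence exceeds `threshold`.

    points:     [T, H, W, 3] array, e.g. output["points_pred"].cpu().numpy()
    confidence: [T, 1, H, W] array, e.g. output["conf_pred"].cpu().numpy()
    threshold:  hypothetical cutoff; assumes higher confidence is more reliable
    """
    points = np.asarray(points)
    confidence = np.asarray(confidence)
    # Drop the channel axis so the mask aligns with the [T, H, W] pixel grid.
    mask = confidence[:, 0] > threshold
    # Boolean indexing flattens the kept pixels into an [N, 3] point list.
    return points[mask]
```

The result is an `[N, 3]` array, which is the shape most point-cloud viewers and exporters expect.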
+ For batched input `[B, T, 3, H, W]`, tensor outputs keep the leading batch
+ dimension.
+
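Since `pose_pred` is documented as a `[T, 3, 4]` camera-to-world matrix, a pose can be applied to per-frame points as `p_world = R @ p_cam + t`. The sketch below shows this with NumPy. It rests on an assumption that is not stated in this README: that the points being transformed are expressed in each frame's own camera coordinates. If `points_pred` is already given in a shared frame, this step is unnecessary. The helper name `camera_to_world` is hypothetical.

```python
import numpy as np


def camera_to_world(points_cam, poses_c2w):
    """Map per-frame camera-space points into the world frame.

    points_cam: [T, H, W, 3] points, ASSUMED to be in each frame's camera frame
    poses_c2w:  [T, 3, 4] camera-to-world matrices [R | t]
    """
    points_cam = np.asarray(points_cam, dtype=np.float64)
    poses_c2w = np.asarray(poses_c2w, dtype=np.float64)
    rotation = poses_c2w[:, :, :3]    # [T, 3, 3]
    translation = poses_c2w[:, :, 3]  # [T, 3]
    # p_world = R @ p_cam + t, applied to every pixel of every frame.
    rotated = np.einsum("tij,thwj->thwi", rotation, points_cam)
    return rotated + translation[:, None, None, :]
```

For batched outputs with a leading `B` dimension, the same function can be applied per batch element.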
+ ViGeo uses a right-handed camera coordinate system with `(X, Y, Z) = (right,
+ down, forward)`. The raw `normal_pred` output follows the inward-normal
+ convention. Negate it (`normals = -normal_pred`) when outward normals are
+ needed for visualization or evaluation.
+
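As an illustration of the sign convention, here is a small sketch that flips the raw inward normals and maps them to an 8-bit image. The mapping from `[-1, 1]` to `[0, 255]` is a common way to color normals, not something this README prescribes, and `normals_to_rgb` is a hypothetical helper. With Z pointing forward, a surface facing the camera has an outward normal with negative Z, so its raw inward normal has positive Z.

```python
import numpy as np


def normals_to_rgb(normal_pred):
    """Convert raw inward normals [..., 3] to a uint8 RGB visualization."""
    # Flip the sign to obtain outward normals.
    outward = -np.asarray(normal_pred, dtype=np.float64)
    # Re-normalize defensively in case predictions are not exactly unit length.
    norm = np.linalg.norm(outward, axis=-1, keepdims=True)
    outward = outward / np.clip(norm, 1e-8, None)
    # Common (not ViGeo-specific) color mapping: [-1, 1] -> [0, 255].
    rgb = (outward + 1.0) * 0.5 * 255.0
    return np.clip(np.rint(rgb), 0, 255).astype(np.uint8)
```

The function accepts any leading shape, so both `[T, H, W, 3]` and batched `[B, T, H, W, 3]` arrays work.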
+ ## Inference Modes
+
+ ViGeo provides `offline`, `chunk`, and `online` inference modes. `offline`
+ processes the full input sequence at once and is preferred when the complete
+ video or image set is available. For long videos, use `chunk` or `online` mode
+ with cached context.
+
+ See the [ViGeo main branch README](https://github.com/aigc3d/ViGeo/tree/main#inference-modes)
+ for examples of all inference modes.
+
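To give a rough intuition for what processing a long video in pieces involves, the sketch below splits a list of frame paths into consecutive windows that share a few frames. This is a generic illustration only. It is not how ViGeo's `chunk` or `online` modes are implemented, those modes manage their cached context internally, and `split_into_chunks`, `chunk_size`, and `overlap` are hypothetical names. The linked README remains the reference for actual usage.

```python
def split_into_chunks(items, chunk_size, overlap=0):
    """Split a sequence into consecutive windows sharing `overlap` items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(items):
        chunks.append(items[start:start + chunk_size])
        if start + chunk_size >= len(items):
            break  # the last window already reaches the end of the sequence
        start += step
    return chunks
```

The shared frames play a role loosely similar to context carried between windows: each window after the first begins with frames the previous one has already seen.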
+ ## Links
+
+ - ViGeo project page: https://pkqbajng.github.io/ViGeo/
+ - Paper: https://arxiv.org/abs/2605.30060
+ - GitHub repository: https://github.com/aigc3d/ViGeo
+ - Corrected paper PDF (until the arXiv version is updated): https://github.com/aigc3d/ViGeo/blob/main/assets/paper.pdf
+
+ ## License
+
+ CC BY-NC 4.0.