Add ViGeo model card
README.md

---
license: cc-by-nc-4.0
library_name: pytorch
pipeline_tag: depth-estimation
tags:
- video-depth-estimation
- geometry-estimation
- camera-pose-estimation
- surface-normal-estimation
- visual-geometry
- vigeo
---

# ViGeo

ViGeo estimates scene geometry from video clips or single frames. It predicts
depth, 3D points, surface normals, and confidence, and, for sequences, camera
poses.

The checkpoint in this repository is `vigeo.pt`.

## Checkpoint Note

This repository currently provides a preliminary ViGeo checkpoint. It was
trained with a known issue in the loss implementation, which may cause minor
visualization artifacts in camera poses and distant regions. The checkpoint is
still consistent with the results reported in the paper and can be used for
dense geometry estimation.

We are preparing an updated checkpoint with a sky mask head and will release it
soon.

## Installation

```bash
conda create -n vigeo python=3.10 -y
conda activate vigeo

pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu126

git clone https://github.com/aigc3d/ViGeo.git
cd ViGeo
pip install -r requirements.txt
pip install -e .
```

## Quick Start

```python
import torch

from vigeo import ViGeo
from utils import load_image_sequence

device = torch.device("cuda")
image_paths = ["path/to/imageA.png", "path/to/imageB.png", "path/to/imageC.png"]
images = load_image_sequence(image_paths).to(device)  # [T, 3, H, W], RGB in [0, 1]

model = ViGeo.from_pretrained("pkqbajng/ViGeo").to(device).eval()

with torch.inference_mode():
    output = model.infer(images, mode="offline")

depth = output["depth_pred"]      # [T, 1, H, W]
points = output["points_pred"]    # [T, H, W, 3]
normals = output["normal_pred"]   # [T, H, W, 3], inward normals
normals_out = -normals            # outward normals for visualization/evaluation
poses = output["pose_pred"]       # [T, 3, 4], camera-to-world
confidence = output["conf_pred"]  # [T, 1, H, W]
```

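The normal and depth outputs above are usually inspected as images. The following is a minimal NumPy sketch of one common way to do that, not ViGeo's own visualization code. `normals_to_rgb` and `depth_to_gray` are hypothetical helper names, and both the `[-1, 1]` to `[0, 255]` color mapping and the min-max depth scaling are assumptions chosen for illustration.

```python
import numpy as np


def normals_to_rgb(normals_inward):
    """Flip inward normals [..., 3] outward and map [-1, 1] to uint8 RGB."""
    outward = -np.asarray(normals_inward, dtype=np.float32)
    rgb = (outward + 1.0) * 0.5  # [-1, 1] -> [0, 1]
    return (np.clip(rgb, 0.0, 1.0) * 255.0).round().astype(np.uint8)


def depth_to_gray(depth, eps=1e-8):
    """Min-max normalize one depth map [H, W] to uint8; non-finite pixels become 0."""
    depth = np.asarray(depth, dtype=np.float32)
    valid = np.isfinite(depth)
    if not valid.any():
        return np.zeros(depth.shape, dtype=np.uint8)
    lo, hi = depth[valid].min(), depth[valid].max()
    scaled = (depth - lo) / max(float(hi - lo), eps)
    scaled = np.where(valid, scaled, 0.0)  # mask out inf/nan pixels
    return (np.clip(scaled, 0.0, 1.0) * 255.0).round().astype(np.uint8)
```

With the Quick Start variables, the inputs could be obtained as `normals[0].cpu().numpy()` and `depth[0, 0].cpu().numpy()`.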
For batched input of shape `[B, T, 3, H, W]`, tensor outputs keep the leading
batch dimension. For example, `depth_pred` becomes `[B, T, 1, H, W]`.

ViGeo uses a right-handed camera coordinate system with `(X, Y, Z) = (right,
down, front)`. The raw `normal_pred` output follows the inward normal
convention. Use `normals = -normal_pred` when outward normals are needed for
visualization or evaluation.

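As an illustration of these conventions, the sketch below applies camera-to-world poses of shape `[T, 3, 4]` to per-frame points and converts `(right, down, front)` vectors to the `(right, up, back)` axes used by OpenGL-style viewers. It assumes the points are expressed in each frame's camera coordinates. The card does not state which frame `points_pred` uses, so check that before applying the transform. `camera_to_world` and `to_opengl_axes` are hypothetical helper names.

```python
import numpy as np


def camera_to_world(points_cam, poses_c2w):
    """Map camera-frame points [T, H, W, 3] to world space with [T, 3, 4] poses.

    Assumes each pose is [R | t] with p_world = R @ p_cam + t.
    """
    points_cam = np.asarray(points_cam, dtype=np.float64)
    poses_c2w = np.asarray(poses_c2w, dtype=np.float64)
    rot = poses_c2w[:, None, :, :3]          # [T, 1, 3, 3]
    trans = poses_c2w[:, None, None, :, 3]   # [T, 1, 1, 3]
    # Row-vector form of R @ p + t, broadcast over H and W.
    return points_cam @ np.swapaxes(rot, -1, -2) + trans


def to_opengl_axes(vectors):
    """Convert (right, down, front) vectors [..., 3] to (right, up, back)."""
    return np.asarray(vectors, dtype=np.float64) * np.array([1.0, -1.0, -1.0])
```

Flipping Y and Z together keeps the coordinate system right-handed, which is why both signs change rather than only one.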
## Inference Modes

ViGeo provides `offline`, `chunk`, and `online` inference modes. The `offline`
mode processes the full input sequence at once and is preferred when the
complete video or image set is available. For long videos, use `chunk` or
`online` mode with cached context.

See the [ViGeo main branch README](https://github.com/aigc3d/ViGeo/tree/main#inference-modes)
for examples of all inference modes.

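The guidance above can be turned into a small selection helper. This is only a sketch: `choose_mode` is a hypothetical function, the 64-frame threshold is a made-up placeholder rather than a documented limit, and treating `online` as the mode for frames that arrive one at a time is an assumption based on its name.

```python
def choose_mode(num_frames, streaming=False, max_offline_frames=64):
    """Pick a ViGeo inference mode string for a given input.

    max_offline_frames is a placeholder; tune it to the available GPU memory.
    """
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    if streaming:
        # Assumption: frames arrive incrementally, so use the online mode.
        return "online"
    if num_frames <= max_offline_frames:
        # The whole sequence is available and short enough to process at once.
        return "offline"
    # Long, fully available video: process it in chunks.
    return "chunk"
```

The returned string would then be passed as `mode=` to `model.infer`, as in the Quick Start call.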
## Links

- ViGeo project page: https://pkqbajng.github.io/ViGeo/
- Paper: https://arxiv.org/abs/2605.30060
- GitHub repository: https://github.com/aigc3d/ViGeo
- Corrected paper PDF (until the arXiv version is updated): https://github.com/aigc3d/ViGeo/blob/main/assets/paper.pdf

## License

CC BY-NC 4.0.