---
license: cc-by-nc-4.0
library_name: pytorch
pipeline_tag: depth-estimation
tags:
- video-depth-estimation
- geometry-estimation
- camera-pose-estimation
- surface-normal-estimation
- visual-geometry
- vigeo
---

# ViGeo

ViGeo estimates scene geometry from video clips or single-frame inputs. For each
frame it predicts depth, 3D points, surface normals, and confidence. For
multi-frame sequences it also predicts camera poses.

The checkpoint in this repository is `vigeo.pt`.

## Checkpoint Note

This repository currently provides a preliminary ViGeo checkpoint. It was
trained with a known issue in the loss implementation, which may cause minor
artifacts when visualizing camera poses and distant regions. The checkpoint
matches the results reported in the paper and can be used for dense geometry
estimation.

We are preparing an updated checkpoint with a sky mask head and will release it
soon.

## Installation

```bash
conda create -n vigeo python=3.10 -y
conda activate vigeo
pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu126
git clone https://github.com/aigc3d/ViGeo.git
cd ViGeo
pip install -r requirements.txt
pip install -e .
```
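
You can check that the installed PyTorch packages match the pins above with a small script like the following. This is a sketch and is not part of the ViGeo repository. `check_pins` and `PINNED` are hypothetical names, and the script uses only the standard library's `importlib.metadata`.

```python
# Hypothetical helper (not part of ViGeo): compare installed package versions
# against the pins used in the install commands.
from importlib.metadata import PackageNotFoundError, version

PINNED = {"torch": "2.7.1", "torchvision": "0.22.1", "torchaudio": "2.7.1"}


def check_pins(pins: dict[str, str]) -> dict[str, str]:
    """Return {package: problem} for every pin that is missing or mismatched."""
    problems = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        # CUDA wheels report local versions such as "2.7.1+cu126"; drop the suffix.
        if installed.split("+")[0] != expected:
            problems[name] = f"found {installed}, expected {expected}"
    return problems


if __name__ == "__main__":
    issues = check_pins(PINNED)
    for pkg, msg in issues.items():
        print(f"{pkg}: {msg}")
    if not issues:
        print("All pinned packages match.")
```

The script only checks package versions. Running `python -c "import torch; print(torch.cuda.is_available())"` additionally confirms that the CUDA build can see a GPU.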

## Quick Start

```python
import torch

from vigeo import ViGeo
from utils import load_image_sequence

device = torch.device("cuda")

image_paths = ["path/to/imageA.png", "path/to/imageB.png", "path/to/imageC.png"]
images = load_image_sequence(image_paths).to(device)  # [T, 3, H, W], RGB in [0, 1]

model = ViGeo.from_pretrained("pkqbajng/ViGeo").to(device).eval()

with torch.inference_mode():
    output = model.infer(images, mode="offline")

depth = output["depth_pred"]      # [T, 1, H, W]
points = output["points_pred"]    # [T, H, W, 3]
normals = output["normal_pred"]   # [T, H, W, 3], inward normals
normals_out = -normals            # outward normals for visualization/evaluation
poses = output["pose_pred"]       # [T, 3, 4], camera-to-world
confidence = output["conf_pred"]  # [T, 1, H, W]
```
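
Each `pose_pred` entry is a camera-to-world matrix `[R | t]`. The camera's position in world coordinates is therefore the translation column `t`, and the world-to-camera transform is `[R^T | -R^T t]`. The NumPy sketch below shows these conversions. The helper names are hypothetical, and the code assumes the rotation block of each pose is orthonormal.

```python
import numpy as np


def pose_to_homogeneous(c2w: np.ndarray) -> np.ndarray:
    """Pad [..., 3, 4] camera-to-world poses to [..., 4, 4] matrices."""
    bottom = np.zeros(c2w.shape[:-2] + (1, 4), dtype=c2w.dtype)
    bottom[..., 0, 3] = 1.0
    return np.concatenate([c2w, bottom], axis=-2)


def invert_c2w(c2w: np.ndarray) -> np.ndarray:
    """Return [..., 3, 4] world-to-camera poses [R^T | -R^T t] (assumes R is orthonormal)."""
    R = c2w[..., :3, :3]
    t = c2w[..., :3, 3:]  # keep as a column: [..., 3, 1]
    R_inv = np.swapaxes(R, -1, -2)
    return np.concatenate([R_inv, -R_inv @ t], axis=-1)


def camera_centers(c2w: np.ndarray) -> np.ndarray:
    """Camera positions in world coordinates are the translation column, [..., 3]."""
    return c2w[..., :3, 3]
```

To use these helpers with the Quick Start outputs, convert the tensor first, for example `poses_np = poses.float().cpu().numpy()`. The functions only index the last two axes, so they also accept batched `[B, T, 3, 4]` poses.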

For batched input `[B, T, 3, H, W]`, tensor outputs keep the leading batch
dimension (for example, `depth_pred` becomes `[B, T, 1, H, W]`).
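
With or without the leading batch dimension, the depth and confidence maps end in the same `(1, H, W)` axes. Per-pixel post-processing written over those trailing axes therefore works for both layouts. The sketch below masks low-confidence depth. It assumes that higher `conf_pred` values mean more reliable predictions and that the tensors have been converted to NumPy (for example with `.cpu().numpy()`). The quantile rule and `keep_ratio` are illustrative choices, not a ViGeo recommendation.

```python
import numpy as np


def mask_low_confidence(depth: np.ndarray, conf: np.ndarray, keep_ratio: float = 0.8) -> np.ndarray:
    """Set depth to NaN where confidence falls below a per-frame quantile.

    depth, conf: [..., 1, H, W]; any leading dims ([T] or [B, T]) are preserved.
    Assumes higher confidence means a more reliable prediction.
    """
    if depth.shape != conf.shape:
        raise ValueError(f"shape mismatch: {depth.shape} vs {conf.shape}")
    # One threshold per frame: quantile over the trailing (1, H, W) axes.
    thresh = np.quantile(conf, 1.0 - keep_ratio, axis=(-3, -2, -1), keepdims=True)
    return np.where(conf >= thresh, depth, np.nan)
```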

ViGeo uses a right-handed camera coordinate system with
`(X, Y, Z) = (right, down, forward)`. The raw `normal_pred` output follows the
inward-normal convention. Negate it (`normals_out = -output["normal_pred"]`)
when outward normals are needed for visualization or evaluation.
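
ViGeo's axes (right, down, forward) match the OpenCV camera convention. Many viewers, such as OpenGL-based tools, expect right, up, backward instead, which differs by flipping Y and Z. The sketch below flips inward normals to outward, converts camera-frame vectors between the two conventions, and maps normals to RGB using the common `(n + 1) / 2` rule. It assumes the vectors are expressed in ViGeo's camera frame. All function names are hypothetical.

```python
import numpy as np

# ViGeo / OpenCV-style camera axes: X right, Y down, Z forward.
# Flipping Y and Z (a 180-degree rotation about X) gives OpenGL-style axes:
# X right, Y up, Z backward.
CV_TO_GL = np.diag([1.0, -1.0, -1.0])


def outward_normals(normal_pred: np.ndarray) -> np.ndarray:
    """Convert inward normals [..., 3] to the outward convention."""
    return -normal_pred


def cv_to_gl(vectors: np.ndarray) -> np.ndarray:
    """Re-express [..., 3] camera-frame vectors or points in OpenGL-style axes."""
    return vectors @ CV_TO_GL.T


def normals_to_rgb(normals: np.ndarray) -> np.ndarray:
    """Map unit normals in [-1, 1] to uint8 RGB in [0, 255] for display."""
    rgb = (np.clip(normals, -1.0, 1.0) + 1.0) * 0.5
    return (rgb * 255.0).round().astype(np.uint8)
```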

## Inference Modes

ViGeo provides `offline`, `chunk`, and `online` inference modes. `offline`
processes the full input sequence at once and is preferred when the complete
video or image set is available. For long videos, use `chunk` or `online` mode,
which rely on cached context instead of processing all frames together.

See the [ViGeo main branch README](https://github.com/aigc3d/ViGeo/tree/main#inference-modes)
for examples of all inference modes.
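
The repository defines how `chunk` mode actually splits and caches a sequence, and this card does not describe it. As a rough illustration of the general idea only, the hypothetical helper below splits a long frame list into fixed-size windows that overlap by a few frames. Overlapping windows are one common way to share context between neighbouring chunks. This is not ViGeo's implementation.

```python
def overlapping_chunks(num_frames: int, chunk_size: int, overlap: int) -> list[tuple[int, int]]:
    """Return [start, end) frame ranges covering 0..num_frames, with `overlap`
    frames shared between consecutive ranges (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    if num_frames <= 0:
        return []
    stride = chunk_size - overlap
    ranges = []
    start = 0
    while True:
        end = min(start + chunk_size, num_frames)
        ranges.append((start, end))
        if end == num_frames:
            break
        start += stride
    return ranges
```

For example, `overlapping_chunks(10, 4, 1)` yields `[(0, 4), (3, 7), (6, 10)]`, and each range can index a slice such as `images[start:end]`. Use the repository's own `chunk` and `online` examples for actual inference.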

## Links

- ViGeo project page: https://pkqbajng.github.io/ViGeo/
- Paper: https://arxiv.org/abs/2605.30060
- GitHub repository: https://github.com/aigc3d/ViGeo
- Corrected paper PDF (until the arXiv version is updated): https://github.com/aigc3d/ViGeo/blob/main/assets/paper.pdf

## License

CC BY-NC 4.0.