ViGeo
ViGeo estimates scene geometry from video clips or single-frame inputs. It predicts per-pixel depth, 3D points, surface normals, and confidence, and additionally estimates camera poses for sequences.
The checkpoint in this repository is vigeo.pt.
Checkpoint Note
This repository currently provides a preliminary ViGeo checkpoint. It was trained with a known issue in the loss implementation, which may cause minor artifacts in visualized camera poses and in distant regions. Its results are consistent with those reported in the paper, and it can be used for dense geometry estimation.
We are preparing an updated checkpoint with a sky mask head and will release it soon.
Installation
```bash
conda create -n vigeo python=3.10 -y
conda activate vigeo
pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu126
git clone https://github.com/aigc3d/ViGeo.git
cd ViGeo
pip install -r requirements.txt
pip install -e .
```
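After installing, a quick sanity check can confirm that the pinned PyTorch wheels and the editable `vigeo` package are visible to Python. The script below is a hypothetical helper, not part of the repository. It uses only the standard library plus `torch.cuda.is_available()` and `torch.version.cuda`, and compares against the versions pinned in the install commands.

```python
import importlib
import importlib.util
from importlib import metadata

# Versions pinned by the install commands (hypothetical check, not shipped with ViGeo).
EXPECTED = {"torch": "2.7.1", "torchvision": "0.22.1", "torchaudio": "2.7.1"}


def check_environment(expected=EXPECTED):
    """Compare installed package versions with the pinned ones and report CUDA status."""
    report = {}
    for name, want in expected.items():
        try:
            have = metadata.version(name)
        except metadata.PackageNotFoundError:
            have = None
        # Wheels from the cu126 index report versions like "2.7.1+cu126"; compare the base part.
        ok = have is not None and have.split("+")[0] == want
        report[name] = {"installed": have, "ok": ok}
    try:
        torch = importlib.import_module("torch")
        report["cuda_available"] = bool(torch.cuda.is_available())
        report["torch_cuda"] = torch.version.cuda  # CUDA version the wheel was built with
    except ImportError:
        report["cuda_available"] = False
        report["torch_cuda"] = None
    # `pip install -e .` should make the package importable from anywhere in the env.
    report["vigeo_importable"] = importlib.util.find_spec("vigeo") is not None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Run it inside the activated `vigeo` environment. With the cu126 wheels, `torch_cuda` should read `12.6`, and `cuda_available` needs to be `True` for the Quick Start, which places the model on `torch.device("cuda")`.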
Quick Start
```python
import torch
from vigeo import ViGeo
from utils import load_image_sequence

device = torch.device("cuda")
image_paths = ["path/to/imageA.png", "path/to/imageB.png", "path/to/imageC.png"]
images = load_image_sequence(image_paths).to(device)  # [T, 3, H, W], RGB in [0, 1]

model = ViGeo.from_pretrained("pkqbajng/ViGeo").to(device).eval()

with torch.inference_mode():
    output = model.infer(images, mode="offline")

depth = output["depth_pred"]        # [T, 1, H, W]
points = output["points_pred"]      # [T, H, W, 3]
normals = output["normal_pred"]     # [T, H, W, 3], inward normals
normals_out = -normals              # outward normals for visualization/evaluation
poses = output["pose_pred"]         # [T, 3, 4], camera-to-world
confidence = output["conf_pred"]    # [T, 1, H, W]
```
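`pose_pred` holds camera-to-world extrinsics as `[T, 3, 4]` matrices `[R | t]`. A common next step is to pad them to 4x4, read off the camera centers, and invert them to world-to-camera. The NumPy sketch below does that; the helper names are hypothetical, and the closed-form inverse assumes `R` is a proper rotation. This README does not state whether `points_pred` is expressed in camera or world coordinates, so `camera_to_world` is only a generic helper for points known to lie in a camera frame.

```python
import numpy as np


def to_homogeneous(poses_3x4):
    """Pad [T, 3, 4] camera-to-world poses [R | t] to [T, 4, 4] homogeneous matrices."""
    num = poses_3x4.shape[0]
    bottom = np.tile(np.array([0.0, 0.0, 0.0, 1.0]), (num, 1, 1))  # [T, 1, 4]
    return np.concatenate([poses_3x4, bottom], axis=1)


def invert_rigid(c2w):
    """Invert [T, 4, 4] rigid transforms in closed form: inv([R | t]) = [R^T | -R^T t]."""
    R = c2w[:, :3, :3]
    t = c2w[:, :3, 3:]                  # [T, 3, 1]
    R_T = np.transpose(R, (0, 2, 1))
    w2c = np.zeros_like(c2w)
    w2c[:, :3, :3] = R_T
    w2c[:, :3, 3:] = -R_T @ t
    w2c[:, 3, 3] = 1.0
    return w2c


def camera_to_world(points_cam, c2w):
    """Map [N, 3] points from a camera's frame to world coordinates with its 4x4 camera-to-world pose."""
    return points_cam @ c2w[:3, :3].T + c2w[:3, 3]


# Toy poses standing in for pose_pred; with real output use poses.float().cpu().numpy().
poses_np = np.array([
    [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]],  # frame 0 at the world origin
    [[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0]],  # frame 1 moved 1 unit along +X (right)
], dtype=np.float64)

c2w = to_homogeneous(poses_np)
centers = c2w[:, :3, 3]              # camera centers in world coordinates
w2c = invert_rigid(c2w)
rel_0_from_1 = w2c[0] @ c2w[1]       # maps frame-1 camera coordinates into frame 0's
```

Keeping the poses as 4x4 matrices makes chaining transforms a single matrix product, as in `rel_0_from_1`.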
For batched input of shape [B, T, 3, H, W], tensor outputs keep the leading batch dimension.
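Single frames and unbatched sequences can be brought to the batched layout before calling the model, and the added dimensions stripped from the outputs afterwards. `as_batched` and `unbatch` are hypothetical helpers, not part of the `vigeo` package. They assume a single image can be passed as a one-frame sequence (T = 1), so check the main README if the model expects single-frame input differently. They use only `.ndim`, `.shape`, and `None` indexing, which NumPy arrays and torch tensors both support.

```python
import numpy as np


def as_batched(images):
    """Return (images as [B, T, 3, H, W], number of leading dims that were added).

    Accepts a single frame [3, H, W], a sequence [T, 3, H, W], or a batch [B, T, 3, H, W].
    """
    if images.ndim not in (3, 4, 5) or images.shape[-3] != 3:
        raise ValueError(f"expected [..., 3, H, W] with 3 to 5 dims, got {tuple(images.shape)}")
    added = 5 - images.ndim
    for _ in range(added):
        images = images[None]  # prepend a size-1 dimension
    return images, added


def unbatch(output, added):
    """Strip the leading dims that as_batched added from a [B, T, ...] output."""
    for _ in range(added):
        output = output[0]
    return output


# Usage with the Quick Start objects (hypothetical):
#   batch, added = as_batched(images)
#   output = model.infer(batch, mode="offline")
#   depth = unbatch(output["depth_pred"], added)
frame = np.zeros((3, 4, 5))            # one RGB frame, H=4, W=5
batch, added = as_batched(frame)       # shape (1, 1, 3, 4, 5), added == 2
```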
ViGeo uses a right-handed camera coordinate system with (X, Y, Z) = (right, down, front), the same axis convention as OpenCV. The raw normal_pred output follows the inward-normal convention; negate it (normals = -normal_pred) when outward normals are needed for visualization or evaluation.
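The convention can be checked on a toy example. This NumPy sketch assumes that "inward" means pointing into the surface (away from the viewer) and that `normal_pred` is expressed in the (right, down, front) camera frame; both are interpretations rather than statements from this README. The function names are hypothetical. `cv_to_gl` shows the usual Y/Z flip for viewers that expect an OpenGL-style (right, up, back) frame.

```python
import numpy as np


def outward_normals(normal_pred):
    """Flip inward normals ([..., 3]) to the outward convention."""
    return -normal_pred


def normals_to_rgb(normals):
    """Encode normals as RGB in [0, 1] via (n + 1) / 2, re-normalizing to unit length first."""
    length = np.linalg.norm(normals, axis=-1, keepdims=True)
    unit = normals / np.clip(length, 1e-8, None)
    return (unit + 1.0) / 2.0


def faces_camera(normals_out, points_cam):
    """True where an outward normal points back toward a camera at the origin (n . p < 0).

    Assumes points_cam is in the same (right, down, front) camera frame as the normals.
    """
    return np.sum(normals_out * points_cam, axis=-1) < 0


def cv_to_gl(vectors):
    """Convert (right, down, front) vectors to an OpenGL-style (right, up, back) frame."""
    return vectors * np.array([1.0, -1.0, -1.0])


# A wall straight ahead of the camera, 2 units away (assumed inward normal: +Z).
n_in = np.array([[0.0, 0.0, 1.0]])
n_out = outward_normals(n_in)          # [[0, 0, -1]], pointing back at the camera
rgb = normals_to_rgb(n_out)            # [[0.5, 0.5, 0.0]]
```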
Inference Modes
ViGeo provides three inference modes: offline, chunk, and online. Offline mode processes the full input sequence in a single pass and is preferred when the complete video or image set is available. For long videos, use chunk or online mode with cached context.
See the ViGeo main branch README for examples of all inference modes.
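ViGeo's chunk mode manages its own windowing and cached context; its actual parameters are documented in the main README. Purely as a generic illustration of the idea, the function below splits a long sequence into fixed-size chunks that share a few frames with their neighbor, which is one common way to keep consecutive predictions aligned. `chunk_windows`, `chunk_size`, and `overlap` are hypothetical names, not ViGeo options.

```python
def chunk_windows(num_frames, chunk_size, overlap):
    """Split frame indices [0, num_frames) into (start, end) windows of at most chunk_size
    frames, where consecutive windows share `overlap` frames."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if num_frames <= 0:
        return []
    stride = chunk_size - overlap
    windows = []
    start = 0
    while True:
        end = min(start + chunk_size, num_frames)
        windows.append((start, end))
        if end == num_frames:  # the last window reaches the end of the sequence
            break
        start += stride
    return windows


print(chunk_windows(10, 4, 1))  # [(0, 4), (3, 7), (6, 10)]
```

Larger overlaps give more shared frames for stitching at the cost of more total frames processed.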
Links
- ViGeo project page: https://pkqbajng.github.io/ViGeo/
- Paper: https://arxiv.org/abs/2605.30060
- GitHub repository: https://github.com/aigc3d/ViGeo
- Corrected paper PDF (use this until the arXiv version is updated): https://github.com/aigc3d/ViGeo/blob/main/assets/paper.pdf
License
CC BY-NC 4.0.