Paper: Towards Consistent Video Geometry Estimation (2605.30060)
VideoLDCM is the sparse-depth completion and refinement model used in the ViGeo
data refinement pipeline. Given an RGB image sequence and the matching sparse
depth maps, its videoldcm.infer interface runs MoGe, Poisson completion, and
VideoLDCM refinement in a single call.
The checkpoint in this repository is videoldcm.pt.
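To give a feel for what a Poisson completion step does, here is a minimal NumPy sketch. It is not the ViGeo implementation. It assumes that a dense guide depth (for example a monocular prediction such as the one MoGe provides) supplies the gradients, that the sparse depth supplies hard constraints, and that a missing sparse measurement is stored as 0 or NaN. The function name poisson_complete and the num_iters parameter are hypothetical.

```python
import numpy as np


def poisson_complete(sparse_depth, guide_depth, num_iters=500):
    """Dense depth that keeps the sparse samples and follows the guide's gradients.

    Solves laplace(d) = laplace(guide) with d fixed at the valid sparse pixels.
    Equivalently, the residual r = d - guide is harmonic, so it is obtained by
    Jacobi iteration with zero-flux borders. Both inputs are [H, W] arrays.
    """
    sparse = np.asarray(sparse_depth, dtype=np.float64)
    guide = np.asarray(guide_depth, dtype=np.float64)
    if sparse.ndim != 2 or sparse.shape != guide.shape:
        raise ValueError("expected two [H, W] arrays of the same shape")

    # Assumed convention: a pixel is measured when it is finite and positive.
    known = np.isfinite(sparse) & (sparse > 0)
    if not known.any():
        raise ValueError("no valid sparse depth samples")

    # Residual between the measurements and the guide at the known pixels.
    target = np.where(known, sparse - guide, 0.0)

    # Start from the mean residual so the iteration begins near the answer.
    residual = np.full(sparse.shape, target[known].mean())
    residual[known] = target[known]

    for _ in range(num_iters):
        # Edge padding makes an out-of-image neighbour equal to the pixel itself.
        padded = np.pad(residual, 1, mode="edge")
        neighbours = (
            padded[:-2, 1:-1] + padded[2:, 1:-1]
            + padded[1:-1, :-2] + padded[1:-1, 2:]
        ) / 4.0
        # Known pixels stay pinned to their measured residual.
        residual = np.where(known, target, neighbours)

    return guide + residual
```

Jacobi iteration is chosen here only because it is short and easy to read; a sparse linear solver would converge much faster on full-resolution frames.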
conda create -n vigeo python=3.10 -y
conda activate vigeo
pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu126
pip install xformers==0.0.31 --index-url https://download.pytorch.org/whl/cu126
git clone https://github.com/aigc3d/ViGeo.git
cd ViGeo
pip install -r requirements.txt
pip install -r requirements_refine.txt
pip install -e .
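After installing, it can help to confirm that the pinned packages are the ones actually present in the environment. The following is a small sketch using only the standard library; the helper name find_version_mismatches is hypothetical, and the pins are copied from the install commands above.

```python
from importlib import metadata

# Pins copied from the install commands above.
EXPECTED = {
    "torch": "2.7.1",
    "torchvision": "0.22.1",
    "torchaudio": "2.7.1",
    "xformers": "0.0.31",
}


def find_version_mismatches(expected, get_version=metadata.version):
    """Return {package: (expected, found)} for every pin that is not satisfied.

    `found` is None when the package is not installed. Local build suffixes
    such as "+cu126" are ignored when comparing.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_version_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found}")
```

An empty result means every pinned package matches. The check reads package metadata only, so it does not import torch or need a GPU.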
import torch
from videoldcm import videoldcm
from utils import load_depth_sequence, load_image_sequence
image_paths = ["path/to/image_000.png", "path/to/image_001.png"]
sparse_depth_paths = ["path/to/sparse_depth_000.npy", "path/to/sparse_depth_001.npy"]
device = torch.device("cuda")
image = load_image_sequence(image_paths).to(device) # [S, 3, H, W]
sparse_depth = load_depth_sequence(sparse_depth_paths).to(device) # [S, 1, H, W]
completion_model = videoldcm.from_pretrained("pkqbajng/VideoLDCM").eval().to(device)
with torch.inference_mode():
    output = completion_model.infer(image=image, sparse_depth=sparse_depth)
refined_depth = output["depth_pred"] # [S, 1, H, W]
points = output["points_pred"] # [S, H, W, 3]
confidence = output["conf_pred"] # [S, 1, H, W]
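As a quick sanity check on the outputs above, the sketch below measures how closely the refined depth agrees with the sparse input and masks low-confidence pixels. It rests on assumptions that this card does not state: that a missing sparse measurement is stored as 0 or NaN, that a higher conf_pred value means a more reliable pixel, and that 0.5 is a sensible cutoff. The helper names and the threshold are hypothetical.

```python
import numpy as np


def sparse_agreement(refined_depth, sparse_depth):
    """Mean absolute relative error of the refined depth at valid sparse pixels.

    Both arrays have shape [S, 1, H, W]. A sparse pixel counts as valid when it
    is finite and positive (assumed convention). Returns NaN if none is valid.
    """
    refined = np.asarray(refined_depth, dtype=np.float64)
    sparse = np.asarray(sparse_depth, dtype=np.float64)
    if refined.shape != sparse.shape:
        raise ValueError("refined_depth and sparse_depth must share a shape")
    valid = np.isfinite(sparse) & (sparse > 0)
    if not valid.any():
        return float("nan")
    rel_err = np.abs(refined[valid] - sparse[valid]) / sparse[valid]
    return float(rel_err.mean())


def mask_low_confidence(refined_depth, confidence, threshold=0.5):
    """Replace depth with NaN where confidence is below a hypothetical cutoff."""
    refined = np.asarray(refined_depth, dtype=np.float64)
    conf = np.asarray(confidence, dtype=np.float64)
    return np.where(conf >= threshold, refined, np.nan)
```

With the tensors from the snippet above, a call would look like sparse_agreement(refined_depth.cpu().numpy(), sparse_depth.cpu().numpy()).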
The infer method does not run the sparse-depth mismatch filter. For the explicit
data refinement pipeline, which includes mismatch filtering and Poisson
completion, see the README on the main branch of the ViGeo repository.
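For readers who want a rough idea of what a mismatch filter can look like before reading the ViGeo README, here is an illustrative sketch. It is an assumption, not the ViGeo filter: it drops sparse samples whose relative difference from a dense reference depth exceeds a tolerance. The function name, the rel_tol value, and the use of 0 to mark a missing measurement are all hypothetical.

```python
import numpy as np


def filter_mismatched_sparse_depth(sparse_depth, reference_depth, rel_tol=0.2):
    """Zero out sparse samples that disagree with a dense reference depth.

    Returns the filtered sparse depth and the boolean mask of kept samples.
    rel_tol is a hypothetical tolerance on |sparse - reference| / reference.
    """
    sparse = np.asarray(sparse_depth, dtype=np.float64)
    reference = np.asarray(reference_depth, dtype=np.float64)
    if sparse.shape != reference.shape:
        raise ValueError("sparse_depth and reference_depth must share a shape")

    # A sample is comparable only where both depths are finite and positive.
    valid = (
        np.isfinite(sparse) & (sparse > 0)
        & np.isfinite(reference) & (reference > 0)
    )

    # Relative error is evaluated only at comparable pixels.
    rel_err = np.zeros_like(sparse)
    rel_err[valid] = np.abs(sparse[valid] - reference[valid]) / reference[valid]

    keep = valid & (rel_err <= rel_tol)
    return np.where(keep, sparse, 0.0), keep
```

The filtered array could then be passed as sparse_depth in place of the raw measurements, provided the reference depth is expressed in the same scale as the sparse depth.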
Apache License 2.0.