MapVGGT: Map-grounded feed-forward 3DGS on a VGGT-Omega backbone (PRIVATE)

PRIVATE research artifact. Non-commercial, research-only. See License below before any use.

MapVGGT is a feed-forward novel-view-synthesis model for driving scenes: VGGT-Omega (1B) predicts per-pixel metric depth; each input pixel is lifted to a world-space 3D Gaussian (positions from depth + known camera poses); a per-pixel head predicts opacity/scale/rotation; the union is rendered with gsplat; a small 2D UNet refines the rendered image. MapGS components (HD-map-anchored tokens, scene-graph dynamics, map-depth / free-space losses) are included but were found neutral (see results).
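
The lifting step described above (depth plus known camera pose gives a world-space position per pixel) can be sketched as follows. This is a minimal NumPy illustration, not the code in mapvggt/model.py: it assumes a pinhole camera, z-depth along the optical axis, integer pixel centers, and a 4x4 camera-to-world pose. The function name `unproject_depth` and the toy intrinsics are hypothetical.

```python
import numpy as np

def unproject_depth(depth, K, cam_to_world):
    """Lift an (H, W) depth map to (H*W, 3) world-space points.

    Assumptions (not confirmed by the repo): depth is z-depth in meters,
    K is a 3x3 pinhole intrinsics matrix, cam_to_world is a 4x4 pose.
    """
    H, W = depth.shape
    # u indexes columns, v indexes rows; pixel centers at integer coordinates.
    u, v = np.meshgrid(np.arange(W, dtype=np.float64),
                       np.arange(H, dtype=np.float64))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    rays = pix @ np.linalg.inv(K).T        # camera-frame rays with z = 1
    pts_cam = rays * depth.reshape(-1, 1)  # scale each ray by its z-depth
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    return pts_cam @ R.T + t               # rotate and translate into the world frame

# Toy example: 3x5 image, principal point at pixel (u=2, v=1), camera shifted 1 m in x.
K = np.array([[100.0, 0.0, 2.0],
              [0.0, 100.0, 1.0],
              [0.0, 0.0, 1.0]])
pose = np.eye(4)
pose[:3, 3] = [1.0, 0.0, 0.0]
depth = np.full((3, 5), 10.0)
means = unproject_depth(depth, K, pose)    # one candidate Gaussian center per pixel
```

Each row of `means` would serve as one Gaussian center; opacity, scale, and rotation come from the per-pixel head and are not modeled here.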

Honest results (held-out-SCENE, segment-disjoint Waymo, 40 distinct scenes, 256×448, n_in=8)

| Model | PSNR (dB) | SSIM | Notes |
|---|---|---|---|
| VGGT-Omega + gentle finetune (backbone) | 21.7 | 0.66 | abl_base_best |
| + MAGT map tokens + scene-graph dynamics | 21.7 | 0.66 | abl_full_best; neutral (ablation) |
| + UNet render-refine | 22.67 | 0.689 | mapvggt_refine_best; headline |

Be candid about scope. This is a research/system artifact, not SOTA: 22.7 dB is roughly **5 dB below** published feed-forward driving NVS (DGGT 27.4, PointForward 28.5; both on different protocols, so the gap is indicative only). Clean ablations establish that the entire gain over a generic backbone comes from VGGT-Omega plus gentle backbone finetuning, and that the only additional lever that moved the metric is the UNet refine (+0.85 dB). HD-map tokens, scene-graph dynamics, higher resolution, multi-view color fusion, uncertainty-shaped covariance, and a skybox were all measured neutral on this metric (the image-space UNet subsumes them). The binding constraint is data scale (1/3 of Waymo, ~1157 clips; the model overfits around step 1000). Per-clip PSNR anti-correlates with view-extrapolation distance (r = -0.57): the model is strong on slow/overlapping scenes and weak on fast ego-motion / disocclusion.
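
For readers who want to reproduce this kind of per-clip analysis on their own outputs, the two quantities involved (PSNR and a Pearson correlation) can be sketched as below. This is an illustrative sketch, not the code in mapgs/ or scripts/eval_mapvggt.py: it assumes images scaled to [0, 1], and the helper names and toy numbers are hypothetical and synthetic, not measurements from this model.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D sequences."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

# Synthetic sanity checks (not results from this model):
# a uniform error of 0.1 on a [0, 1] image gives MSE = 0.01.
toy_psnr = psnr(np.zeros((4, 4, 3)), np.full((4, 4, 3), 0.1))
# a perfectly decreasing relationship gives r = -1.
toy_r = pearson_r([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

In practice one would pass a list of per-clip PSNR values and a list of per-clip extrapolation distances to `pearson_r`; how that distance is defined is not specified here and would have to follow the evaluation script.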

Contents

  • mapvggt/: model (model.py), heads (heads.py: MAGT map tokens, scene-graph dynamics), refine.py (RefineUNet). crosscolor.py / uncertainty.py are experimental, validated negative (kept for the record; not used in training).
  • mapgs/: data pipeline (unified clip format, Waymo/AV2 converters), HD-map, losses, metrics.
  • scripts/: train_mapvggt_refine.py (main trainer), train_mapvggt_full.py (map+dyn), eval_mapvggt.py (canonical loader + held-out eval), data-restore utilities.
  • checkpoints/: mapvggt_refine_best.safetensors (headline 22.67), abl_base_best, abl_full_best. Each ~4.6 GB and embeds the finetuned VGGT-Omega 1B backbone (keys model.vggt.*, model.head.*, unet.* for the refine ckpt).
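
To check which components a checkpoint contains, one option is to count tensor names under the key prefixes listed above. The sketch below is a hedged illustration: `count_by_prefix` and `checkpoint_keys` are hypothetical helpers, the example tensor names are made up, and reading the file assumes the safetensors package is installed. scripts/eval_mapvggt.py remains the canonical loader.

```python
from collections import Counter

# Key prefixes as listed for the refine checkpoint.
PREFIXES = ("model.vggt.", "model.head.", "unet.")

def count_by_prefix(keys, prefixes=PREFIXES):
    """Count tensor names per known prefix; anything else goes under 'other'."""
    counts = Counter()
    for key in keys:
        group = next((p for p in prefixes if key.startswith(p)), "other")
        counts[group] += 1
    return dict(counts)

def checkpoint_keys(path):
    """List tensor names in a .safetensors file without reading tensor data."""
    from safetensors import safe_open  # imported lazily; requires safetensors
    with safe_open(path, framework="pt") as f:
        return list(f.keys())

# Hypothetical tensor names, for illustration only.
example = ["model.vggt.a", "model.vggt.b", "model.head.w", "unet.c", "foo"]
summary = count_by_prefix(example)

# With a real file:
# summary = count_by_prefix(checkpoint_keys("checkpoints/mapvggt_refine_best.safetensors"))
```

For the ablation checkpoints, an empty `unet.` group would be the expected outcome, since the key list above mentions `unet.*` only for the refine checkpoint.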

NOT included (by design)

  • Base VGGT-Omega weights (vggt_omega_1b_512.pt): obtain from its FAIR-licensed source; set MAPVGGT_VGGT_CKPT. (Our refine ckpt already contains a finetuned copy of these weights.)
  • Training data: Waymo Open clips (its license forbids redistribution) and AV2 clips (regenerate with mapgs/data/convert/* from your own licensed copies).
  • Vendored clones (_vggt_omega_repo, _tokengs_repo); clone yourself and set VGGT_OMEGA_REPO.

Usage

export VGGT_OMEGA_REPO=/path/to/vggt-omega           # facebookresearch/vggt-omega clone
export MAPVGGT_VGGT_CKPT=/path/to/vggt_omega_1b_512.pt  # base weights (FAIR-licensed)
# eval the released checkpoint on a segment-disjoint Waymo val split:
python -m scripts.eval_mapvggt --ckpt checkpoints/mapvggt_refine_best.safetensors \
       --roots /path/to/data/unified/waymo

The refine checkpoint round-trips to 22.67±3.76 dB PSNR / 0.689 SSIM via scripts/eval_mapvggt.py.
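
Before launching the evaluation, a small pre-flight check on the two environment variables exported above can save a failed run. This is a sketch under assumptions: the helper `missing_paths` is hypothetical and not part of the repo, and it assumes VGGT_OMEGA_REPO points to a directory and MAPVGGT_VGGT_CKPT to a file. Whether the eval script strictly requires both is not confirmed here.

```python
import os

# Variable name -> expected kind of path (an assumption based on the usage above).
REQUIRED = {"VGGT_OMEGA_REPO": "dir", "MAPVGGT_VGGT_CKPT": "file"}

def missing_paths(env=os.environ):
    """Return a list of human-readable problems; empty if everything looks fine."""
    problems = []
    for name, kind in REQUIRED.items():
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif kind == "dir" and not os.path.isdir(value):
            problems.append(f"{name} is not a directory: {value}")
        elif kind == "file" and not os.path.isfile(value):
            problems.append(f"{name} is not a file: {value}")
    return problems

if __name__ == "__main__":
    for problem in missing_paths():
        print("pre-flight:", problem)
```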

License & provenance (read before use)

  • Derivative of VGGT-Omega (Meta FAIR), under the FAIR Noncommercial Research License. The checkpoints contain finetuned VGGT-Omega weights, so they inherit FAIR terms: non-commercial, research-only; do not redistribute. This repo is PRIVATE for that reason. ⚠️ Commercial use (incl. by a commercial org) is not permitted under FAIR terms.
  • Lineage: TokenGS (NVIDIA, research-only): earlier backbone, code under Apache-2.0; Depth-Anything-V2 (Apache-2.0); PointForward (scene-graph dynamics formulation).
  • Training data: Waymo Open Dataset (subject to Waymo terms, no redistribution) and Argoverse 2 (CC BY-NC-SA 4.0). MapGS code is the authors' own.

Reproducibility note: gsplat + bf16 make runs reproducible at the seed/config level, not bit-exact.
