MapVGGT: Map-grounded feed-forward 3DGS on a VGGT-Omega backbone (PRIVATE)
PRIVATE research artifact. Non-commercial, research-only. See License below before any use.
MapVGGT is a feed-forward novel-view-synthesis model for driving scenes:
VGGT-Omega (1B) predicts per-pixel metric depth; each input pixel is lifted to a
world-space 3D Gaussian (positions from depth + known camera poses); a per-pixel head
predicts opacity/scale/rotation; the union is rendered with gsplat; a small 2D UNet
refines the rendered image. MapGS components (HD-map-anchored tokens, scene-graph
dynamics, map-depth / free-space losses) are included but, as the results show, were found neutral.
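The lifting step described above (per-pixel depth plus known camera poses giving world-space Gaussian centres) can be sketched as follows. This is a minimal illustration, not the code in `mapvggt/`: it assumes a pinhole camera, z-depth along the optical axis, integer pixel coordinates, and a 4x4 camera-to-world matrix. The function name `unproject_depth_to_world` and the example intrinsics are hypothetical.

```python
import numpy as np

def unproject_depth_to_world(depth, K, cam_to_world):
    """Lift every pixel of a depth map to a world-space 3D point.

    Assumptions (not confirmed by the source): pinhole camera, depth is
    z-depth along the optical axis, pixel (u, v) sits at integer
    coordinates, and cam_to_world is a 4x4 rigid transform.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))            # each (h, w)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    rays = pix @ np.linalg.inv(K).T       # camera-frame rays with z = 1
    pts_cam = rays * depth[..., None]     # scale each ray by its z-depth
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    return pts_cam @ R.T + t              # (h, w, 3) candidate Gaussian centres

# Tiny example with made-up intrinsics and pose.
K = np.array([[100.0, 0.0, 2.0],
              [0.0, 100.0, 1.0],
              [0.0, 0.0, 1.0]])
pose = np.eye(4)
pose[:3, 3] = [1.0, 2.0, 3.0]             # camera sits at (1, 2, 3) in the world
depth = np.full((2, 4), 10.0)             # every pixel is 10 units away
points = unproject_depth_to_world(depth, K, pose)
```

In this example the pixel at the principal point (row 1, column 2) maps to the camera position shifted 10 units along the optical axis. One Gaussian per input pixel per view then gives `n_in * H * W` centres before any pruning.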
Honest results (held-out-SCENE, segment-disjoint Waymo, 40 distinct scenes, 256×448, n_in=8)
| model | PSNR | SSIM | notes |
|---|---|---|---|
| VGGT-Omega + gentle finetune (backbone) | 21.7 | 0.66 | abl_base_best |
| + MAGT map tokens + scene-graph dynamics | 21.7 | 0.66 | abl_full_best: neutral (ablation) |
| + UNet render-refine | 22.67 | 0.689 | mapvggt_refine_best: headline |
Be candid about scope. This is a research/system artifact, not SOTA: 22.7 dB is roughly
**5 dB below** published feed-forward driving NVS (DGGT 27.4, PointForward 28.5; both on
different protocols, so the gaps are indicative only). Clean ablations establish that the
entire gain over a generic backbone comes from VGGT-Omega plus gentle backbone finetuning;
the single extra lever that moved the metric is the UNet refine (+0.85 dB). HD-map tokens,
scene-graph dynamics, higher resolution, multi-view color fusion, uncertainty-shaped
covariance, and a skybox were all measured neutral on this metric (the image-space UNet
subsumes them). The binding constraint is data scale (1/3 of Waymo, ~1157 clips; the model
overfits at ~step 1000). Per-clip PSNR anti-correlates with view-extrapolation distance
(r=-0.57): the model is strong on slow/overlapping scenes and weak on fast ego-motion /
disocclusion.
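For readers who want to reproduce this kind of analysis on their own evaluation dumps, the sketch below shows the standard PSNR definition and a Pearson correlation between per-clip PSNR and extrapolation distance. The numbers are synthetic placeholders, not measurements from the 40 held-out scenes, and the function names are hypothetical. How `eval_mapvggt.py` aggregates PSNR (per image or per clip) is not stated here, so treat the aggregation as an assumption.

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    diff = pred.astype(np.float64) - target.astype(np.float64)
    mse = np.mean(diff ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D sequences."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Synthetic placeholder values, NOT results from this repository.
extrap_dist = [0.5, 1.0, 2.0, 4.0, 8.0]          # hypothetical per-clip distances
per_clip_psnr = [26.0, 24.5, 23.0, 21.0, 19.5]   # hypothetical per-clip PSNR (dB)
r = pearson_r(extrap_dist, per_clip_psnr)        # negative: PSNR falls with distance
```

A negative `r` on real per-clip data would reflect the same pattern reported above: clips that require more view extrapolation score lower.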
Contents
- `mapvggt/`: model (`model.py`), heads (`heads.py`: MAGT map tokens, scene-graph dynamics), `refine.py` (RefineUNet). `crosscolor.py` / `uncertainty.py` are experimental, validated negative (kept for the record; not used in training).
- `mapgs/`: data pipeline (unified clip format, Waymo/AV2 converters), HD-map, losses, metrics.
- `scripts/`: `train_mapvggt_refine.py` (main trainer), `train_mapvggt_full.py` (map+dyn), `eval_mapvggt.py` (canonical loader + held-out eval), data-restore utilities.
- `checkpoints/`: `mapvggt_refine_best.safetensors` (headline 22.67), `abl_base_best`, `abl_full_best`. Each is ~4.6 GB and embeds the finetuned VGGT-Omega 1B backbone (keys `model.vggt.*`, `model.head.*`, `unet.*` for the refine ckpt).
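A quick way to confirm that a checkpoint follows the key layout listed above is to count tensor names per prefix. The sketch below is illustrative only: `count_by_prefix` is a hypothetical helper, and the example key names are invented (only the three prefixes come from this README). The commented lines show how the same helper could be fed from the real file, assuming the `safetensors` package and PyTorch are installed.

```python
from collections import Counter

def count_by_prefix(keys, prefixes=("model.vggt.", "model.head.", "unet.")):
    """Count tensor names under each expected prefix; the rest go to 'other'."""
    counts = Counter()
    for key in keys:
        match = next((p for p in prefixes if key.startswith(p)), "other")
        counts[match] += 1
    return dict(counts)

# Hypothetical key names for illustration; real names come from the checkpoint.
example_keys = [
    "model.vggt.blocks.0.attn.qkv.weight",
    "model.vggt.blocks.0.attn.qkv.bias",
    "model.head.opacity.weight",
    "unet.down.0.conv.weight",
]
summary = count_by_prefix(example_keys)

# With the real file:
#   from safetensors import safe_open
#   path = "checkpoints/mapvggt_refine_best.safetensors"
#   with safe_open(path, framework="pt") as f:
#       summary = count_by_prefix(f.keys())
```

Any non-zero `other` count on a real checkpoint would suggest the file does not match the layout described here.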
NOT included (by design)
- Base VGGT-Omega weights (`vggt_omega_1b_512.pt`): obtain from its FAIR-licensed source; set `MAPVGGT_VGGT_CKPT`. (Our refine ckpt already contains a finetuned copy of these weights.)
- Training data: Waymo Open clips (its license forbids redistribution) and AV2 clips (regenerate with `mapgs/data/convert/*` from your own licensed copies).
- Vendored clones (`_vggt_omega_repo`, `_tokengs_repo`); clone yourself and set `VGGT_OMEGA_REPO`.
Usage
export VGGT_OMEGA_REPO=/path/to/vggt-omega # facebookresearch/vggt-omega clone
export MAPVGGT_VGGT_CKPT=/path/to/vggt_omega_1b_512.pt # base weights (FAIR-licensed)
# eval the released checkpoint on a segment-disjoint Waymo val split:
python -m scripts.eval_mapvggt --ckpt checkpoints/mapvggt_refine_best.safetensors \
--roots /path/to/data/unified/waymo
The refine checkpoint round-trips to 22.67 ± 3.76 dB PSNR / 0.689 SSIM via `scripts/eval_mapvggt.py`.
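Because the eval command depends on two environment variables, a small preflight check can catch a missing path before the 1B backbone starts loading. The helper below is a hypothetical sketch, not part of `scripts/`. It assumes `VGGT_OMEGA_REPO` should be a directory and `MAPVGGT_VGGT_CKPT` a file, as the export lines above suggest.

```python
import os
from pathlib import Path

def check_env(env=os.environ):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # Assumption: the repo variable names a directory, the ckpt variable a file.
    required = {"VGGT_OMEGA_REPO": Path.is_dir, "MAPVGGT_VGGT_CKPT": Path.is_file}
    for name, exists in required.items():
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not exists(Path(value)):
            problems.append(f"{name} points to a missing path: {value}")
    return problems

# Print any problems found in the current environment.
for problem in check_env():
    print("WARNING:", problem)
```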
License & provenance (read before use)
- Derivative of VGGT-Omega (Meta FAIR), under the FAIR Noncommercial Research License. The checkpoints contain finetuned VGGT-Omega weights, so they inherit FAIR terms: non-commercial, research-only; do not redistribute. This repo is PRIVATE for that reason. ⚠️ Commercial use (incl. by a commercial org) is not permitted under FAIR terms.
- Lineage: TokenGS (NVIDIA, research-only), the earlier backbone, with code under Apache-2.0; Depth-Anything-V2 (Apache-2.0); PointForward (scene-graph dynamics formulation).
- Training data: Waymo Open Dataset (subject to Waymo terms, no redistribution) and Argoverse 2 (CC BY-NC-SA 4.0). MapGS code is the authors' own.
Reproducibility note: gsplat + bf16 make runs reproducible at the seed/config level, not bit-exact.
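One general reason runs like these are not bit-exact is that floating-point addition is not associative, so a parallel reduction that sums in a different order can change the last bits. The sketch below shows the effect in plain Python and a tolerance-based comparison of two runs. The tolerance `tol_db` is a hypothetical choice, not a value used by this repository, and the sketch makes no claim about which specific kernels in gsplat are order-dependent.

```python
import math

# Same three values, two summation orders.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
bit_exact = (a == b)   # the two orders disagree in IEEE-754 double precision

def metrics_match(psnr_a, psnr_b, tol_db=0.05):
    """Compare two runs' PSNR with an absolute tolerance (hypothetical value)."""
    return math.isclose(psnr_a, psnr_b, abs_tol=tol_db)
```

Comparing reruns with a check like `metrics_match` rather than exact equality matches the seed/config-level reproducibility described above.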