ResilientMap — Stage 1 (Pixel-Gate Fusion), nuScenes old-split

Checkpoints and training artifacts for Stage 1 of the ResilientMap project: camera + satellite BEV fusion via a confidence-weighted pixel gate, trained on the nuScenes old split (full data).

This is the BEV-feature pre-training stage. The vectorized HD map decoder (MMQuery, Stage 2/3) is trained on top of these BEVs and is released separately.

Files

File	Size	Notes
`iter_55932.pth`	897 MB	Final checkpoint (latest, recommended)
`iter_37288.pth`	897 MB	Intermediate checkpoint (4th of 6 saves)
`satmaptracker_pixelgate_stage1_full.py`	23 KB	Full training/eval config
`20260417_174220.log`	862 KB	Training log (text)
`20260417_174220.log.json`	857 KB	Training log (JSON, per-iter metrics)
`tf_logs/`	—	TensorBoard event files

The other intermediate checkpoints (iter_9322, iter_18644, iter_27966, iter_46610) were not uploaded, to keep the repo lean. In the original work-dir, the `latest.pth` symlink pointed to `iter_55932.pth`.
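The per-iteration metrics in `20260417_174220.log.json` can be inspected without loading the model. The sketch below assumes the file follows the usual MMCV text-logger layout: one JSON object per line, with training rows marked by `"mode": "train"` and carrying an `"iter"` field. The field names and the helper functions `read_train_records` and `mean_metric` are illustrative assumptions, not something this repo documents.

```python
import json
from pathlib import Path


def read_train_records(log_path):
    """Return the training rows from a JSON-lines log file."""
    records = []
    for line in Path(log_path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        # Assumption: training rows have mode == "train" and an "iter" field;
        # header rows and validation rows are skipped.
        if entry.get("mode") == "train" and "iter" in entry:
            records.append(entry)
    return records


def mean_metric(records, key):
    """Average a numeric field over the records that contain it."""
    values = [r[key] for r in records if key in r]
    return sum(values) / len(values) if values else None
```

For example, `mean_metric(read_train_records("20260417_174220.log.json")[-100:], "loss")` would average the last 100 logged training losses, provided the log really has a `loss` field.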

Architecture (one paragraph)

A BEVFormer (ResNet-50 + FPN + TemporalSelfAttention) camera branch produces a (256, 50, 100) BEV. A separate ResNet-50 + FPN satellite encoder produces an ego-aligned satellite BEV at the same shape. Two frozen segmentation decoders output per-pixel confidences for each modality, and the BEVs are combined as

g(x,y)    = cam_conf(x,y) / (cam_conf(x,y) + sat_conf(x,y) + eps)
fused_bev = g * cam_bev + (1 - g) * sat_bev

Because the gate has no residual path, the fused stream falls back per pixel to whichever modality is more confident: g approaches 1 where the camera confidence dominates and 0 where the satellite confidence dominates.
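The gate formula above can be sketched in a few lines of NumPy. This is an illustration only, not the released implementation: it assumes each confidence is a single (H, W) map broadcast over the 256 channels, and the `eps` value and the function name `pixel_gate_fuse` are placeholders.

```python
import numpy as np


def pixel_gate_fuse(cam_bev, sat_bev, cam_conf, sat_conf, eps=1e-6):
    """Fuse two (C, H, W) BEVs with a per-pixel confidence gate.

    cam_conf and sat_conf are non-negative (H, W) maps; eps is an assumed value.
    """
    g = cam_conf / (cam_conf + sat_conf + eps)  # (H, W), values in [0, 1)
    g = g[None, :, :]                           # broadcast over channels
    return g * cam_bev + (1.0 - g) * sat_bev


# Toy tensors with the (256, 50, 100) BEV shape described above.
rng = np.random.default_rng(0)
cam_bev = rng.standard_normal((256, 50, 100))
sat_bev = rng.standard_normal((256, 50, 100))
cam_conf = rng.uniform(0.0, 1.0, size=(50, 100))
sat_conf = rng.uniform(0.0, 1.0, size=(50, 100))

fused = pixel_gate_fuse(cam_bev, sat_bev, cam_conf, sat_conf)

# With zero camera confidence the gate is 0 and the output equals sat_bev.
fallback = pixel_gate_fuse(cam_bev, sat_bev, np.zeros((50, 100)), sat_conf)
```

Since the two weights sum to 1 at every pixel, the fused feature is a convex combination of the two inputs, which is what lets it degrade to a single modality when the other confidence collapses.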

Headline numbers (nuScenes new split, 1/3 data, BEV mIoU; this checkpoint is the old-split, full-data counterpart, retrained with the same recipe)

Condition	Fused	Cam-only	Sat-only
Clean	0.512	0.452	0.403
6 cameras zeroed	0.381	0.013	0.403
Satellite zeroed	0.436	0.452	0.000
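To read the table as a robustness measure, it can help to compute how much of the clean fused mIoU survives each ablation, and how the degraded fused model compares with the surviving single-modality column. The short script below does this arithmetic on the values copied from the table; the dictionary keys are ad hoc names.

```python
# BEV mIoU values copied from the table above.
miou = {
    "clean":       {"fused": 0.512, "cam": 0.452, "sat": 0.403},
    "cams_zeroed": {"fused": 0.381, "cam": 0.013, "sat": 0.403},
    "sat_zeroed":  {"fused": 0.436, "cam": 0.452, "sat": 0.000},
}

clean_fused = miou["clean"]["fused"]

# Fraction of the clean fused mIoU retained when one modality is removed.
retained = {
    cond: miou[cond]["fused"] / clean_fused
    for cond in ("cams_zeroed", "sat_zeroed")
}

# Gap between the fused model and the surviving single-modality column.
gap_vs_sat = miou["cams_zeroed"]["fused"] - miou["cams_zeroed"]["sat"]
gap_vs_cam = miou["sat_zeroed"]["fused"] - miou["sat_zeroed"]["cam"]

for cond, frac in retained.items():
    print(f"{cond}: {frac:.1%} of clean fused mIoU retained")
print(f"fused minus sat-only, cameras zeroed: {gap_vs_sat:+.3f}")
print(f"fused minus cam-only, satellite zeroed: {gap_vs_cam:+.3f}")
```

By this arithmetic the fused model keeps about 74% of its clean mIoU with all six cameras zeroed and about 85% with the satellite zeroed. In both cases it lands slightly below the surviving single-modality column (by 0.022 and 0.016 mIoU respectively).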

Per-camera importance (change in BEV mIoU when a single camera is zeroed): FRONT -0.058 >> BACK -0.027 > FRONT_LEFT/RIGHT -0.018/-0.015 > BACK_LEFT/RIGHT -0.006.

Intended use

Research checkpoint for online HD map construction with multi-modal sensor redundancy. Not production-ready.

Citation

Citation entry will be added when the accompanying paper is released.
