TrianguLang: Geometry-Aware Semantic Consensus for Pose-Free 3D Localization

Paper: arXiv:2603.08096
Project Page: cwru-aism.github.io/triangulang
Code: github.com/bryceag11/triangulang

Bryce Grant, Aryeh Rothenberg, Atri Banerjee, Peng Wang — Case Western Reserve University

Overview

TrianguLang is a feed-forward, pose-free method for language-guided 3D localization from multi-view images. Given unposed images and a text query, it produces per-view segmentation masks and camera-relative 3D locations at ~10 FPS.

Checkpoints

Checkpoint               Description
mo_v11/best.pt           Multi-object (text + spatial), 230 scenes, 8 views, 100 epochs
fullscale_no_qp/best.pt  Single-object (text-only), 230 scenes, 100 epochs
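
The two checkpoints above can be selected and fetched with a small helper. This is a minimal sketch, not the project's own loading code: the repository id `bag100/triangulang` is an assumption taken from the hosting page, and `checkpoint_path` and `download_checkpoint` are hypothetical names introduced here for illustration. The only library call used is `hf_hub_download` from `huggingface_hub`, imported lazily so the selection logic works without it.

```python
from pathlib import Path

# Assumption: the checkpoints are hosted under this Hugging Face repo id.
REPO_ID = "bag100/triangulang"

# File names exactly as listed in the Checkpoints table.
CHECKPOINTS = {
    "multi_object": "mo_v11/best.pt",            # text + spatial queries
    "single_object": "fullscale_no_qp/best.pt",  # text-only queries
}


def checkpoint_path(multi_object: bool) -> str:
    """Return the repo-relative checkpoint file for the chosen setting."""
    return CHECKPOINTS["multi_object" if multi_object else "single_object"]


def download_checkpoint(multi_object: bool) -> Path:
    """Download the checkpoint and return its local path (hypothetical helper)."""
    # Lazy import so the module can be used without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    local = hf_hub_download(repo_id=REPO_ID, filename=checkpoint_path(multi_object))
    return Path(local)
```

Once downloaded, the file can be inspected with `torch.load(path, map_location="cpu")`. The layout of the saved object is not documented here, so check its keys before relying on them.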

Architecture

  • Frozen: SAM3 (841M) + DA3-NESTED-GIANT-LARGE (1.69B) = ~2.5B params
  • Trainable: GASA Decoder (~13.5M params)
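
As a quick check on the figures above, the snippet below adds up the stated parameter counts and computes how small the trainable share is. It only uses the numbers from the list; the variable names are illustrative.

```python
# Parameter counts as stated in the Architecture list, in millions.
FROZEN_M = {"SAM3": 841.0, "DA3-NESTED-GIANT-LARGE": 1690.0}
TRAINABLE_M = {"GASA Decoder": 13.5}

frozen_total = sum(FROZEN_M.values())        # 2531.0 M, i.e. ~2.5B
trainable_total = sum(TRAINABLE_M.values())  # 13.5 M
trainable_share = trainable_total / (frozen_total + trainable_total)

print(f"frozen: {frozen_total / 1000:.2f}B")          # frozen: 2.53B
print(f"trainable share: {trainable_share:.2%}")      # trainable share: 0.53%
```

Only about half a percent of the total parameters receive gradients, which is why training touches the GASA Decoder alone.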

Results (ScanNet++)

Setting                    mIoU   mAcc
Text-only (single-object)  62.4%  77.4%
Text-only + CRF            65.2%  -
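
For readers unfamiliar with the two columns, here is a minimal sketch of how mask metrics of this kind are commonly computed. It rests on assumptions that the table does not confirm: that mIoU averages per-query intersection-over-union, and that mAcc averages the fraction of ground-truth pixels recovered per query. The paper's exact evaluation protocol may differ. Masks are flat lists of 0/1 values, and all function names are illustrative.

```python
def iou(pred, gt):
    """Intersection over union of two binary masks given as flat 0/1 lists."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return inter / union if union else 1.0  # two empty masks count as a match


def accuracy(pred, gt):
    """Assumed definition: fraction of ground-truth pixels that were predicted."""
    positives = sum(1 for g in gt if g)
    hits = sum(1 for p, g in zip(pred, gt) if p and g)
    return hits / positives if positives else 1.0


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Two toy (prediction, ground truth) pairs, one per query.
pairs = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),  # IoU 1/2, accuracy 1/1
    ([0, 1, 1, 1], [0, 1, 1, 0]),  # IoU 2/3, accuracy 2/2
]
miou = mean(iou(p, g) for p, g in pairs)
macc = mean(accuracy(p, g) for p, g in pairs)
```

Under these definitions, over-segmentation lowers IoU but not accuracy, which is consistent with mAcc sitting above mIoU in the table.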

Citation

@article{grant2026triangulang,
  title={TrianguLang: Geometry-Aware Semantic Consensus for Pose-Free 3D Localization},
  author={Grant, Bryce and Rothenberg, Aryeh and Banerjee, Atri and Wang, Peng},
  journal={arXiv preprint arXiv:2603.08096},
  year={2026}
}