ViSTA-SLAM: Visual SLAM with Symmetric Two-view Association
Paper β’ 2509.01584 β’ Published β’ 8
Project THOR is ANIMA Wave-6's Tier-1 Foundation SLAM module, implementing the Symmetric Two-view Association (STA) frontend from the ViSTA-SLAM paper.
| Property | Value |
|---|---|
| Input | Two RGB frames, (B, 3, 224, 224) each |
| Output | Quaternion (B,4), Translation (B,3), Pointmap (B,H,W,3) |
| Parameters | ~35% fewer than SOTA SLAM frontends |
| Intrinsics | None required (intrinsic-free design) |
| Checkpoint epoch | 198 |
| Best val loss | 0.782216 |
The STA model uses a symmetric encoder that processes two consecutive RGB frames through shared weights, producing a relative pose (quaternion + translation) and a dense local pointmap.
A Sim(3) pose graph backend handles global consistency and scale-drift correction.
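A Sim(3) transform can be written as a 4x4 matrix [[s·R, t], [0, 1]]; composing two such transforms multiplies the scale factors, which is what lets a Sim(3) pose graph absorb scale drift that a pure SE(3) graph cannot. A minimal numpy sketch of this algebra (illustrative only, not the repository's backend API):

```python
import numpy as np

def sim3_matrix(s, R, t):
    """Build a 4x4 Sim(3) matrix [[s*R, t], [0, 1]]."""
    T = np.eye(4)
    T[:3, :3] = s * R
    T[:3, 3] = t
    return T

def sim3_scale(T):
    """Recover the scale factor (column norm of the s*R block)."""
    return np.linalg.norm(T[:3, 0])

# Two edges with different scales: composition multiplies the scales,
# so a pose-graph solver over Sim(3) can model and correct scale drift.
R = np.eye(3)
T_ab = sim3_matrix(1.1, R, np.array([1.0, 0.0, 0.0]))
T_bc = sim3_matrix(0.9, R, np.array([0.0, 1.0, 0.0]))
T_ac = T_ab @ T_bc
print(round(sim3_scale(T_ac), 4))  # 1.1 * 0.9 = 0.99
```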
```python
import torch

from anima_thor.models.sta_model import STAConfig, STAModel

# Load from this repository
config = STAConfig()
model = STAModel(config)
ckpt = torch.load("pytorch/thor_sta_v1.pth", map_location="cpu", weights_only=False)
model.load_state_dict(ckpt["model"])
model.eval()

# Inference
img_a = torch.randn(1, 3, 224, 224)  # current frame
img_b = torch.randn(1, 3, 224, 224)  # previous frame
with torch.no_grad():
    output = model(img_a, img_b)

print(output.quaternion.shape)   # (1, 4)
print(output.translation.shape)  # (1, 3)
print(output.pointmap.shape)     # (1, H, W, 3)
```
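The quaternion and translation outputs can be assembled into a 4x4 rigid-pose matrix with the standard unit-quaternion-to-rotation formula. A minimal sketch, assuming a (w, x, y, z) component order (check the repository's convention before relying on it):

```python
import numpy as np

def pose_matrix(q, t):
    """4x4 SE(3) matrix from a unit quaternion (w, x, y, z) and a translation."""
    w, x, y, z = q / np.linalg.norm(q)  # normalize defensively
    R = np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Identity quaternion with a unit translation along x
T = pose_matrix(np.array([1.0, 0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]))
print(np.allclose(T[:3, :3], np.eye(3)))  # True
```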
```python
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession(
    "onnx/thor_sta_v1.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

img_a = np.random.randn(1, 3, 224, 224).astype(np.float32)
img_b = np.random.randn(1, 3, 224, 224).astype(np.float32)

quaternion, translation, pointmap = sess.run(
    None, {"img_a": img_a, "img_b": img_b}
)
```
| Module | Dependency | Topic |
|---|---|---|
| BALDUR | Semantic mapping | Pointmap → voxel grid |
| HEIMDALL | Hierarchical planning | Pose stream @ 30 Hz |
| HERMOD | Exploration | Coverage map |
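For the BALDUR pointmap-to-voxel-grid hand-off, a simple occupancy voxelization is just flooring point coordinates by the voxel size and deduplicating. A hypothetical sketch (the voxel size and grid layout are assumptions, not BALDUR's actual interface):

```python
import numpy as np

def voxelize(pointmap, voxel_size=0.05):
    """Map an (H, W, 3) pointmap to the set of occupied voxel indices."""
    pts = pointmap.reshape(-1, 3)
    idx = np.floor(pts / voxel_size).astype(np.int64)
    return np.unique(idx, axis=0)  # one row per occupied voxel

pointmap = np.zeros((2, 2, 3))     # four points, all at the origin
pointmap[0, 0] = [0.10, 0.0, 0.0]  # move one point into a neighboring voxel
occupied = voxelize(pointmap)
print(len(occupied))  # 2 occupied voxels
```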
```
README.md                        # This file
pytorch/thor_sta_v1.pth          # PyTorch state dict
pytorch/thor_sta_v1.safetensors  # SafeTensors (if exported)
onnx/thor_sta_v1.onnx            # ONNX opset 17
tensorrt/thor_sta_v1_fp16.trt    # TensorRT FP16 (if exported)
tensorrt/thor_sta_v1_fp32.trt    # TensorRT FP32 (if exported)
configs/training.toml            # Training configuration
logs/training_history.json       # Epoch-by-epoch metrics
```
```bibtex
@article{zhang2025vistaslam,
  title   = {ViSTA-SLAM: Visual SLAM with Symmetric Two-view Association},
  author  = {Zhang, Ganlin and Qian, Shenhan and Wang, Xi and Cremers, Daniel},
  journal = {arXiv preprint arXiv:2509.01584},
  year    = {2025},
}
```
MIT License; see LICENSE.