BALDUR: Dynamic Visual SLAM

Part of the ANIMA Perception Suite by Robot Flow Labs.

Model Description

BALDUR enhances monocular visual SLAM for dynamic environments (crowds, traffic, moving robots). It combines a feed-forward depth + dynamic mask prediction network with patch-based bundle adjustment.

  • Paper: Dynamic Visual SLAM using a General 3D Prior (Dec 2025, arXiv:2512.06868)
  • Architecture: ResNet-18 encoder → dual U-Net decoders (depth + dynamic mask)
  • Parameters: 15.9M
  • Input: RGB image [B, 3, 480, 640]
  • Output: Depth map [B, 1, 480, 640] (metres) + Dynamic mask [B, 1, 480, 640] (probability)
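
The input contract above can be met with a few lines of NumPy. The following is a minimal sketch, assuming the frame is already 480x640 and stored as an HxWx3 uint8 RGB array. The helper name `preprocess_rgb` is hypothetical and not part of `anima_baldur`; the mean and standard deviation are the standard ImageNet statistics, which this card's usage example implies but does not spell out.

```python
import numpy as np

# Standard ImageNet channel statistics (RGB order, for pixel values scaled to [0, 1]).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess_rgb(frame):
    """Convert an HxWx3 uint8 RGB frame into a [1, 3, 480, 640] float32 array.

    Hypothetical helper: resizing is not handled here, so the frame must
    already be 480 pixels high and 640 pixels wide.
    """
    if frame.shape != (480, 640, 3):
        raise ValueError(f"expected shape (480, 640, 3), got {frame.shape}")
    scaled = frame.astype(np.float32) / 255.0             # [0, 255] -> [0, 1]
    normalized = (scaled - IMAGENET_MEAN) / IMAGENET_STD  # per-channel normalization
    chw = np.transpose(normalized, (2, 0, 1))             # HWC -> CHW
    return np.ascontiguousarray(chw[np.newaxis, ...])     # add batch dimension


frame = np.zeros((480, 640, 3), dtype=np.uint8)  # stand-in for a real camera frame
batch = preprocess_rgb(frame)
print(batch.shape, batch.dtype)
```

The resulting array can be handed to the PyTorch model via `torch.from_numpy(batch)`.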

Training

| Setting | Value |
| --- | --- |
| Dataset | TUM-RGBD Dynamic (3,784 frames, 4 sequences) |
| Split | 90/5/5 (train/val/test) |
| Best val_loss | 0.2680 (epoch 183) |
| Optimizer | AdamW (lr=0.0003, wd=0.05) |
| Scheduler | CosineAnnealingWarmRestarts (T_0=20) |
| Regularization | Dropout 0.3, heavy augmentation |
| Precision | bf16 mixed precision |
| Hardware | NVIDIA L4 (23 GB) |
| Training time | ~5.3 hours |
| Date | 2026-04-02 |
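
To make the scheduler setting concrete, the sketch below evaluates the cosine warm-restart rule in closed form. It assumes PyTorch's defaults for the values this card does not state (`T_mult=1`, `eta_min=0`) and that the scheduler is stepped once per epoch. The function name `lr_at_epoch` is illustrative only.

```python
import math

BASE_LR = 3e-4  # lr from the training settings
T_0 = 20        # restart period in epochs, from the training settings
ETA_MIN = 0.0   # assumption: PyTorch default, not stated in this card


def lr_at_epoch(epoch):
    """Learning rate under cosine annealing with warm restarts (T_mult = 1)."""
    t_cur = epoch % T_0  # epochs elapsed since the last restart
    cosine = 1.0 + math.cos(math.pi * t_cur / T_0)
    return ETA_MIN + 0.5 * (BASE_LR - ETA_MIN) * cosine


for epoch in (0, 10, 19, 20):
    print(epoch, f"{lr_at_epoch(epoch):.2e}")
```

Under these assumptions the rate decays from 3e-4 towards zero over each 20-epoch cycle and jumps back to 3e-4 at every restart.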

Exported Formats

| Format | File | Use Case |
| --- | --- | --- |
| PyTorch (.pth) | pytorch/baldur_v1.pth | Training, fine-tuning |
| SafeTensors | pytorch/baldur_v1.safetensors | Fast loading, safe |
| ONNX | onnx/baldur_v1.onnx | Cross-platform inference |
| TensorRT FP32 | tensorrt/baldur_v1_fp32.trt | Full precision inference |
| TensorRT FP16 | tensorrt/baldur_v1_fp16.trt | Edge deployment (Jetson/L4) |
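
For the ONNX export, a CPU inference call might look like the following. This is a sketch under several assumptions that this card does not confirm: that `onnxruntime` is installed, that the graph has a single input, that its outputs come back in the same order as the PyTorch model (depth, then dynamic mask), and that it accepts a batch size of 1. `run_onnx` is a hypothetical helper name.

```python
import numpy as np


def run_onnx(rgb, onnx_path="onnx/baldur_v1.onnx"):
    """Run one forward pass through the exported ONNX graph on CPU.

    rgb: float array of shape [B, 3, 480, 640], ImageNet-normalized.
    """
    if rgb.ndim != 4 or tuple(rgb.shape[1:]) != (3, 480, 640):
        raise ValueError(f"expected [B, 3, 480, 640], got {tuple(rgb.shape)}")

    # Imported lazily so this module can be loaded without onnxruntime installed.
    import onnxruntime as ort

    session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
    input_name = session.get_inputs()[0].name  # read the name instead of guessing it
    outputs = session.run(None, {input_name: rgb.astype(np.float32)})

    # Assumption: outputs follow the PyTorch return order (depth, dynamic_mask).
    depth, dynamic_mask = outputs[0], outputs[1]
    return depth, dynamic_mask
```

Reading the input name from the session avoids hard-coding a tensor name that the export may not use. If the outputs turn out to be ordered differently, `session.get_outputs()` lists their names.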

Usage

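import torch
from anima_baldur.models.feed_forward import FeedForwardNetwork

model = FeedForwardNetwork(pretrained=False, max_depth=10.0)
# map_location="cpu" lets the checkpoint load on machines without a GPU
state_dict = torch.load("pytorch/baldur_v1.pth", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()

# Inference (random tensor as a stand-in for an ImageNet-normalized RGB batch)
rgb = torch.randn(1, 3, 480, 640)
with torch.no_grad():  # no gradients needed at inference time
    depth, dynamic_mask = model(rgb)
# depth: [1, 1, 480, 640] in metres
# dynamic_mask: [1, 1, 480, 640] probability, 0 = static, 1 = dynamic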

Dynamic Mask

The dynamic mask marks pixels that belong to moving objects such as people and vehicles. It is trained in a self-supervised way: the model learns to mask regions where depth prediction is unreliable. To exclude dynamic regions from SLAM mapping, discard pixels where dynamic_mask > 0.5.
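
One way to apply the 0.5 threshold is to back-project only the static pixels into 3D before handing them to a mapper. The sketch below is an illustration, not BALDUR's own pipeline: `static_points` is a hypothetical helper, it assumes a pinhole camera model, and the intrinsics (`fx`, `fy`, `cx`, `cy`) must come from your own calibration.

```python
import numpy as np


def static_points(depth, dynamic_mask, fx, fy, cx, cy, threshold=0.5):
    """Back-project static pixels of one frame into an (N, 3) camera-frame point cloud.

    depth:        (H, W) array in metres
    dynamic_mask: (H, W) array of dynamic probabilities in [0, 1]
    """
    height, width = depth.shape
    v, u = np.mgrid[0:height, 0:width]  # v: row index, u: column index
    keep = (dynamic_mask <= threshold) & (depth > 0)  # static pixels with valid depth
    z = depth[keep]
    x = (u[keep] - cx) * z / fx  # pinhole back-projection
    y = (v[keep] - cy) * z / fy
    return np.stack([x, y, z], axis=1)


# Toy 2x2 frame: the top-right pixel is flagged dynamic and gets dropped.
depth = np.full((2, 2), 2.0, dtype=np.float32)
mask = np.array([[0.1, 0.9], [0.2, 0.3]], dtype=np.float32)
points = static_points(depth, mask, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
print(points.shape)
```

With the PyTorch outputs, pass `depth[0, 0].cpu().numpy()` and `dynamic_mask[0, 0].cpu().numpy()` to drop the batch and channel dimensions.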

License

Apache 2.0, Robot Flow Labs / AIFLOW LABS LIMITED

Citation

@article{baldur2025,
  title={Dynamic Visual SLAM using a General 3D Prior},
  author={Zhong, Xingguang and Jin, Liren and Popovi\'c, Marija and Behley, Jens and Stachniss, Cyrill},
  year={2025},
  journal={arXiv:2512.06868}
}