Dynamic Visual SLAM using a General 3D Prior
Paper • 2512.06868 • Published
Part of the ANIMA Perception Suite by Robot Flow Labs.
BALDUR enhances monocular visual SLAM for dynamic environments (crowds, traffic, moving robots). It combines a feed-forward depth + dynamic mask prediction network with patch-based bundle adjustment.
| Setting | Value |
|---|---|
| Dataset | TUM-RGBD Dynamic (3,784 frames, 4 sequences) |
| Split | 90/5/5 (train/val/test) |
| Best val_loss | 0.2680 (epoch 183) |
| Optimizer | AdamW (lr=0.0003, wd=0.05) |
| Scheduler | CosineAnnealingWarmRestarts (T_0=20) |
| Regularization | Dropout 0.3, heavy augmentation |
| Precision | bf16 mixed precision |
| Hardware | NVIDIA L4 (23GB) |
| Training time | ~5.3 hours |
| Date | 2026-04-02 |
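The warm-restart schedule above restarts the cosine decay every `T_0 = 20` epochs. A minimal sketch of the learning-rate curve it produces, assuming PyTorch's defaults of `T_mult = 1` and `eta_min = 0` (both assumptions, as the table only states `T_0`):

```python
import math

def cosine_warm_restarts(epoch, base_lr=3e-4, T_0=20, eta_min=0.0):
    """Learning rate at integer `epoch` for CosineAnnealingWarmRestarts
    with T_mult=1: the cosine decay restarts every T_0 epochs."""
    t_cur = epoch % T_0  # position within the current restart cycle
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * t_cur / T_0)) / 2

# At epochs 0, 20, 40, ... the rate jumps back to the base value 3e-4;
# halfway through a cycle (e.g. epoch 10) it has decayed to 1.5e-4.
```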
| Format | File | Use Case |
|---|---|---|
| PyTorch (.pth) | pytorch/baldur_v1.pth | Training, fine-tuning |
| SafeTensors | pytorch/baldur_v1.safetensors | Fast, safe loading |
| ONNX | onnx/baldur_v1.onnx | Cross-platform inference |
| TensorRT FP32 | tensorrt/baldur_v1_fp32.trt | Full-precision inference |
| TensorRT FP16 | tensorrt/baldur_v1_fp16.trt | Edge deployment (Jetson/L4) |
```python
import torch

from anima_baldur.models.feed_forward import FeedForwardNetwork

model = FeedForwardNetwork(pretrained=False, max_depth=10.0)
model.load_state_dict(torch.load("pytorch/baldur_v1.pth", map_location="cpu"))
model.eval()

# Inference
rgb = torch.randn(1, 3, 480, 640)  # ImageNet-normalized RGB, (B, C, H, W)
with torch.no_grad():
    depth, dynamic_mask = model(rgb)
# depth: metres; dynamic_mask: 0 = static, 1 = dynamic
```
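The example above assumes the input is already ImageNet-normalized. A preprocessing sketch using the standard ImageNet statistics (the expected uint8 input layout is an assumption inferred from the comment above, not stated by the model card):

```python
import torch

# Standard ImageNet channel statistics, broadcast over (B, C, H, W).
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

def preprocess(rgb_uint8):
    """Convert a (B, 3, H, W) uint8 RGB batch to ImageNet-normalized floats."""
    x = rgb_uint8.float() / 255.0  # scale to [0, 1]
    return (x - IMAGENET_MEAN) / IMAGENET_STD

# Example: one 480x640 frame, the resolution used in the inference snippet.
frame = torch.randint(0, 256, (1, 3, 480, 640), dtype=torch.uint8)
rgb = preprocess(frame)
```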
The dynamic mask identifies moving objects (people, vehicles) in the scene. It is trained self-supervised: the model learns to mask regions where depth prediction is unreliable, which in practice correspond to dynamic objects. Threshold the mask (`dynamic_mask > 0.5`) to exclude dynamic regions from SLAM mapping and bundle adjustment.
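The thresholding step can be sketched with `torch.where` on dummy tensors shaped like the model outputs (the zero sentinel for invalid depth is an assumption; use whatever value your SLAM backend treats as "no measurement"):

```python
import torch

depth = torch.tensor([[2.0, 4.5], [1.2, 9.9]])         # metres
dynamic_mask = torch.tensor([[0.1, 0.9], [0.7, 0.2]])  # 0 = static, 1 = dynamic

# Zero out depth in dynamic regions so SLAM mapping ignores them.
static_depth = torch.where(dynamic_mask > 0.5, torch.zeros_like(depth), depth)
# static_depth == [[2.0, 0.0], [0.0, 9.9]]
```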
Apache 2.0 © Robot Flow Labs / AIFLOW LABS LIMITED
```bibtex
@article{baldur2025,
  title   = {Dynamic Visual SLAM using a General 3D Prior},
  author  = {Zhong, Xingguang and Jin, Liren and Popovi\'c, Marija and Behley, Jens and Stachniss, Cyrill},
  year    = {2025},
  journal = {arXiv:2512.06868}
}
```