Thermal Image Refinement with Depth Estimation using Recurrent Networks for Monocular ORB-SLAM3
Part of the ANIMA Perception Suite by Robot Flow Labs.
Pretrained on the VIVID++ dataset (71,917 real thermal/depth paired frames, 24 sequences), then fine-tuned on robotflowlabs/anima-thermal-synthetic (26K COCO synthetic + 399 VIVID real pairs) with 8-GPU DDP and frozen BatchNorm.
| Model | Format | File | Size |
|---|---|---|---|
| v3_sol | PyTorch | pytorch/nott_v3_sol.pth | 8.2MB |
| v3_sol | SafeTensors | pytorch/nott_v3_sol.safetensors | 8.2MB |
| v3_sol | ONNX | onnx/nott_v3_sol.onnx | 8.2MB |
| v3_sol | TensorRT FP16 | tensorrt/nott_v3_sol_fp16.trt | 4.6MB |
| v3_sol | TensorRT FP32 | tensorrt/nott_v3_sol_fp32.trt | 9.2MB |
| v2 | PyTorch | pytorch/nott_v2.pth | 8.2MB |
| v2 | SafeTensors | pytorch/nott_v2.safetensors | 8.2MB |
| v2 | ONNX | onnx/nott_v2.onnx | 8.2MB |
| v2 | TensorRT FP16 | tensorrt/nott_v2_fp16.trt | 4.5MB |
| v2 | TensorRT FP32 | tensorrt/nott_v2_fp32.trt | 9.1MB |
T-RefNet — U-Net encoder-decoder with ConvGRU recurrent bottleneck (2M params, 8MB).
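The general shape of such an architecture can be sketched in PyTorch. Everything below (class names, layer sizes, two-level depth) is an illustrative assumption, not the released `ThermalRefinementNet`:

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """Convolutional GRU: gates computed with 3x3 convs instead of matmuls."""
    def __init__(self, channels):
        super().__init__()
        self.gates = nn.Conv2d(2 * channels, 2 * channels, 3, padding=1)  # update z, reset r
        self.cand = nn.Conv2d(2 * channels, channels, 3, padding=1)       # candidate state

    def forward(self, x, h):
        if h is None:
            h = torch.zeros_like(x)
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_tilde

class TinyRefineNet(nn.Module):
    """Two-level U-Net encoder-decoder with a ConvGRU bottleneck (sketch only)."""
    def __init__(self, in_ch=1, base=32):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, base, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(base, base * 2, 3, stride=2, padding=1), nn.ReLU())
        self.gru = ConvGRUCell(base * 2)
        self.up = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec = nn.Conv2d(base * 2, in_ch, 3, padding=1)  # skip-concat -> output

    def forward(self, x, h=None):
        s1 = self.enc1(x)            # full-resolution skip features
        s2 = self.enc2(s1)           # half-resolution bottleneck input
        h = self.gru(s2, h)          # temporal state carried across frames
        u = self.up(h)               # back to full resolution
        return self.dec(torch.cat([u, s1], dim=1)), h

net = TinyRefineNet()
frame = torch.randn(1, 1, 64, 80)
out, h = net(frame, None)    # first frame: no hidden state yet
out2, h = net(frame, h)      # later frames reuse the ConvGRU state
```

The recurrent bottleneck is what lets the network exploit temporal consistency across consecutive thermal frames rather than denoising each frame independently.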
Reference: Şahin, Pham, Dang, Yegenoglu, Kayacan, "Thermal Image Refinement with Depth Estimation using Recurrent Networks for Monocular ORB-SLAM3," arXiv:2603.14998, 2026.
```
VIVID++ (71K real thermal) → pretrain (lr=1e-3) → v2 (ARE=0.106)
        ↓
SOL synthetic (26K COCO + 399 real) → fine-tune (lr=1e-5, frozen BN, 8-GPU DDP) → v3_sol (ARE=0.174)
```
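The frozen-BatchNorm part of the fine-tuning stage can be sketched as follows; `freeze_batchnorm` is a hypothetical helper illustrating the common recipe, not code from this repository:

```python
import torch

def freeze_batchnorm(model: torch.nn.Module) -> None:
    """Keep BatchNorm layers in eval mode with gradients off, so the small
    per-GPU batches seen under DDP fine-tuning don't disturb the running
    statistics learned during pretraining."""
    for m in model.modules():
        if isinstance(m, torch.nn.modules.batchnorm._BatchNorm):
            m.eval()                        # use stored running mean/var
            for p in m.parameters():
                p.requires_grad_(False)     # freeze affine scale/shift too

# Toy model standing in for the real network.
net = torch.nn.Sequential(torch.nn.Conv2d(1, 4, 3), torch.nn.BatchNorm2d(4))
net.train()
freeze_batchnorm(net)

# Optimizer only sees trainable (non-BN) parameters, at the low fine-tune LR.
opt = torch.optim.Adam((p for p in net.parameters() if p.requires_grad), lr=1e-5)
```

Note that `model.train()` flips BatchNorm modules back into training mode, so the helper would need to be re-applied after every such call (e.g. at the start of each epoch).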
```python
import torch
from anima_nott.thermal_refinement import ThermalRefinementNet

# Instantiate the refinement network and load the fine-tuned checkpoint.
model = ThermalRefinementNet(in_channels=1, base_channels=32, num_levels=3, gru_layers=2)
state = torch.load("pytorch/nott_v3_sol.pth", weights_only=True)
model.load_state_dict(state)
model.eval()

# Single grayscale thermal frame, normalized to [0, 1].
thermal = torch.randn(1, 1, 256, 320)
refined, hidden = model(thermal)  # refined frame + recurrent hidden state
```
Intended use: low-cost thermal SLAM for GPS-denied, low-light UAV navigation with non-radiometric thermal cameras (e.g. the ~$150 FLIR Lepton 3.5). Targets: <0.4 m trajectory error at 25+ FPS on a Jetson Xavier.
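Non-radiometric cores emit uncalibrated 16-bit counts, so raw frames must be scaled to the [0, 1] range the model expects. Per-frame min-max scaling, shown below, is one common choice; it is an assumption here, not the documented preprocessing for this model:

```python
import numpy as np

def normalize_thermal(raw16: np.ndarray) -> np.ndarray:
    """Min-max normalize a raw 16-bit thermal frame to [0, 1].

    Suits non-radiometric sensors whose counts have no absolute
    temperature meaning; only relative contrast within a frame matters.
    """
    raw = raw16.astype(np.float32)
    lo, hi = raw.min(), raw.max()
    if hi - lo < 1e-6:              # flat frame: avoid divide-by-zero
        return np.zeros_like(raw)
    return (raw - lo) / (hi - lo)

# Simulated raw Lepton-style frame at the model's 256x320 input size.
frame = (np.random.rand(256, 320) * 65535).astype(np.uint16)
x = normalize_thermal(frame)        # float32, values in [0, 1]
```

A drawback of per-frame scaling is flicker when the scene's dynamic range changes between frames; a running min/max over a short window is a common mitigation.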
Apache-2.0 — Robot Flow Labs / AIFLOW LABS LIMITED