DEF-rtfdnet - RTFDNet: Fusion-Decoupling for Robust RGB-T Segmentation

Part of the ANIMA Perception Suite by Robot Flow Labs.

Paper

RTFDNet: Fusion-Decoupling for Robust RGB-T Segmentation (arXiv:2603.09149), Kunyu Tan and Mingjian Liang, 2026

Architecture

  • Backbone: BIMixVisionTransformer, a dual-stream SegFormer (MiT-B2, 50.4M params)
  • Fusion: EAEF_clip (CLIP-style cross-modal alignment) + gated feature fusion at 4 stages
  • Losses: CE + Modal CE + AKD (feature distillation) + RegionL1 (logit distillation)
  • Input: 6-channel tensor (RGB + Thermal), flexible resolution
  • Robustness: Degrades gracefully when one modality fails, falling back to RGB-only or thermal-only input
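
The 6-channel input and the single-modality failure cases listed above can be sketched as follows. This is a minimal illustration, not the project's own preprocessing: the channel order (RGB in channels 0-2, thermal in channels 3-5), the replication of a single-channel thermal image to three channels, and zero-filling as the way to simulate a failed sensor are all assumptions not stated in this card. The helper names `stack_rgbt` and `drop_modality` are hypothetical.

```python
import torch

# Assumed channel layout (not confirmed by the model card).
RGB_CHANNELS = slice(0, 3)
THERMAL_CHANNELS = slice(3, 6)


def stack_rgbt(rgb: torch.Tensor, thermal: torch.Tensor) -> torch.Tensor:
    """Concatenate RGB (B,3,H,W) and thermal (B,1,H,W) or (B,3,H,W) into (B,6,H,W)."""
    if thermal.shape[1] == 1:
        # Replicate the single thermal channel three times (assumption).
        thermal = thermal.expand(-1, 3, -1, -1)
    return torch.cat([rgb, thermal], dim=1)


def drop_modality(x: torch.Tensor, modality: str) -> torch.Tensor:
    """Return a copy of x with one modality zeroed out to mimic a failed sensor."""
    x = x.clone()  # leave the caller's tensor untouched
    if modality == "rgb":
        x[:, RGB_CHANNELS] = 0.0
    elif modality == "thermal":
        x[:, THERMAL_CHANNELS] = 0.0
    else:
        raise ValueError(f"unknown modality: {modality!r}")
    return x


rgb = torch.ones(2, 3, 4, 5)
thermal = torch.full((2, 1, 4, 5), 2.0)
x = stack_rgbt(rgb, thermal)                 # shape (2, 6, 4, 5)
thermal_only = drop_modality(x, "rgb")       # RGB channels zeroed
rgb_only = drop_modality(x, "thermal")       # thermal channels zeroed
```

Feeding `thermal_only` and `rgb_only` through the model alongside `x` gives a quick way to compare predictions under each failure case, provided the assumed channel layout matches the trained checkpoint.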

Results

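| Dataset | Classes | Best mIoU | Accuracy | Epochs |
|---------|---------|-----------|----------|--------|
| MFNet   | 9       | 0.929     | 97.9%    | 300    |
| PST900  | 5       | 0.836     | 99.5%    | 89     |
| FMB     | 14      | 0.684     | 93.1%    | 74+    |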
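
For readers unfamiliar with the two metrics, the sketch below shows how mean IoU and pixel accuracy are conventionally derived from a confusion matrix. The card does not state the exact evaluation protocol, so the details here are assumptions: "Accuracy" is taken to be overall pixel accuracy, pixels labeled with `ignore_index` are skipped, and classes absent from both prediction and ground truth are excluded from the mean.

```python
def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Build a num_classes x num_classes matrix; rows are ground truth, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixel, skipped (assumption)
        cm[t][p] += 1
    return cm


def miou_and_pixel_acc(cm):
    """Return (mean IoU over classes that occur, overall pixel accuracy)."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # class c pixels predicted as something else
        fp = sum(cm[r][c] for r in range(n)) - tp   # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:                               # skip classes that never occur
            ious.append(tp / denom)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[c][c] for c in range(n))
    miou = sum(ious) / len(ious) if ious else 0.0
    acc = correct / total if total else 0.0
    return miou, acc


# Flattened per-pixel labels for a toy 2-class example.
pred = [0, 0, 1, 1]
target = [0, 1, 1, 1]
miou, acc = miou_and_pixel_acc(confusion_matrix(pred, target, num_classes=2))
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean IoU is about 0.583 while pixel accuracy is 0.75. This shows why accuracy can sit well above mIoU in a results table.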

Model Variants

MFNet (9-class urban RGB-T)

| Format        | File                                          | Size     |
|---------------|-----------------------------------------------|----------|
| PyTorch       | mfnet_b2/pytorch/rtfdnet_mfnet_b2.pth         | 201.8 MB |
| SafeTensors   | mfnet_b2/pytorch/rtfdnet_mfnet_b2.safetensors | 201.6 MB |
| ONNX          | mfnet_b2/onnx/rtfdnet_mfnet_b2.onnx           | 340.2 MB |
| TensorRT FP16 | mfnet_b2/tensorrt/rtfdnet_mfnet_b2_fp16.trt   | 110.9 MB |
| TensorRT FP32 | mfnet_b2/tensorrt/rtfdnet_mfnet_b2_fp32.trt   | 211.6 MB |

PST900 (5-class indoor thermal)

| Format        | File                                            | Size     |
|---------------|-------------------------------------------------|----------|
| PyTorch       | pst900_b2/pytorch/rtfdnet_pst900_b2.pth         | 201.8 MB |
| SafeTensors   | pst900_b2/pytorch/rtfdnet_pst900_b2.safetensors | 201.6 MB |
| ONNX          | pst900_b2/onnx/rtfdnet_pst900_b2.onnx           | 319.3 MB |
| TensorRT FP16 | pst900_b2/tensorrt/rtfdnet_pst900_b2_fp16.trt   | 110.0 MB |
| TensorRT FP32 | pst900_b2/tensorrt/rtfdnet_pst900_b2_fp32.trt   | 210.4 MB |

FMB (14-class multi-modal benchmark)

| Format        | File                                      | Size     |
|---------------|-------------------------------------------|----------|
| PyTorch       | fmb_b2/pytorch/rtfdnet_fmb_b2.pth         | 201.8 MB |
| SafeTensors   | fmb_b2/pytorch/rtfdnet_fmb_b2.safetensors | 201.6 MB |
| ONNX          | fmb_b2/onnx/rtfdnet_fmb_b2.onnx           | 319.3 MB |
| TensorRT FP16 | fmb_b2/tensorrt/rtfdnet_fmb_b2_fp16.trt   | 110.0 MB |
| TensorRT FP32 | fmb_b2/tensorrt/rtfdnet_fmb_b2_fp32.trt   | 210.3 MB |

Usage

from def_rtfdnet.model import build_rtfdnet, load_pretrained_mit
import torch

# Build model
model = build_rtfdnet(variant='mit_b2', num_classes=9, channels=256)

# Load trained weights
ckpt = torch.load('mfnet_b2/checkpoints/best.pth', map_location='cpu')
model.load_state_dict(ckpt['model'])
model.eval()

# Inference: 6-channel input (RGB + Thermal); no_grad skips autograd bookkeeping
x = torch.randn(1, 6, 480, 640)
with torch.no_grad():
    logits = model.forward_inference(x)  # (1, 9, 480, 640)
pred = logits.argmax(dim=1)              # (1, 480, 640)

Training

  • Hardware: NVIDIA L4 (23 GB VRAM)
  • Optimizer: AdamW (lr=3e-5, head_lr_mult=10x)
  • Scheduler: Warmup cosine (5% warmup)
  • Batch size: 4 with gradient checkpointing
  • Precision: FP16 (AMP)
  • Config: See configs/ directory
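
The optimizer and scheduler settings above can be sketched in plain Python. This is an illustration under stated assumptions, not the training code from `configs/`: the warmup is assumed to be linear, the cosine phase is assumed to decay to zero, the schedule is assumed to be stepped once per optimizer step, and the `decode_head` prefix used to identify head parameters is hypothetical. Both function names are invented for this example.

```python
import math


def warmup_cosine_lr(step, total_steps, base_lr=3e-5, warmup_frac=0.05, min_lr=0.0):
    """Learning rate at a 0-indexed optimizer step: linear warmup, then cosine decay."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Linear ramp that reaches base_lr on the last warmup step.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def split_param_groups(named_params, base_lr=3e-5, head_lr_mult=10.0,
                       head_prefix="decode_head"):
    """Split (name, param) pairs into backbone and head groups with separate learning rates."""
    backbone, head = [], []
    for name, param in named_params:
        (head if name.startswith(head_prefix) else backbone).append(param)
    return [
        {"params": backbone, "lr": base_lr},
        {"params": head, "lr": base_lr * head_lr_mult},  # 10x multiplier for the head
    ]


# Toy demonstration with strings standing in for parameter tensors.
groups = split_param_groups([("backbone.block1.weight", "w0"),
                             ("decode_head.classifier.weight", "w1")])
schedule = [warmup_cosine_lr(s, total_steps=1000) for s in range(1001)]
```

With a real model, the list returned by `split_param_groups(model.named_parameters())` has the per-group dictionary shape that `torch.optim.AdamW` accepts, once `head_prefix` is adjusted to match the actual module names.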

Defense Applications

RGB-thermal fusion targets nighttime surveillance, through-smoke perception, and adverse-weather operations. Because RTFDNet is designed to tolerate modality degradation, segmentation quality degrades gracefully rather than collapsing when one sensor is jammed or obscured.

Citation

@article{tan2026rtfdnet,
  title={RTFDNet: Fusion-Decoupling for Robust RGB-T Segmentation},
  author={Tan, Kunyu and Liang, Mingjian},
  journal={arXiv preprint arXiv:2603.09149},
  year={2026}
}

License

Apache 2.0, Robot Flow Labs / AIFLOW LABS LIMITED

Built with ANIMA by Robot Flow Labs
