VIS-FORESTSIM: ANIMA Off-Road Segmentation Module
Part of the ANIMA Perception Suite by Robot Flow Labs.
Paper
ForestSim: A Synthetic Benchmark for Intelligent Vehicle Perception in Unstructured Forest Environments
- arXiv: 2603.27923
- Authors: Pragat Wagle, Zheng Chen, Lantao Liu
Architecture
DeepLabV3 with ResNet-50 backbone, trained on the ForestSim 24-class synthetic off-road segmentation dataset. 39.6M parameters.
Classes (24)
grass, tree, pole, water, sky, vehicle, container, asphalt, gravel, mulch, rockbed, log, bicycle, person, fence, bush, sign, rock, bridge, concrete, table, building, void, generic ground
Results
| Metric | Paper (m2) | Reproduced | Delta (pp) |
|---|---|---|---|
| mIoU | 61.87% | 61.42% | -0.45 |
| aAcc | 89.85% | 89.07% | -0.78 |
| mAcc | n/a | 70.15% | n/a |
Reproduced to within 0.5 mIoU points of the paper's result on a single NVIDIA L4 (23GB) with batch size 24.
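For readers unfamiliar with the three metrics in the table, the sketch below shows how they are commonly derived from a pixel-level confusion matrix. It is a minimal illustration, not the evaluation code used for the numbers above: the function name `segmentation_metrics` is hypothetical, and the handling of classes that never appear (skipped rather than counted as zero) is an assumption.

```python
def segmentation_metrics(conf):
    """Compute aAcc, mIoU and mAcc from a square confusion matrix.

    conf[i][j] is the number of pixels whose ground-truth class is i
    and whose predicted class is j.
    """
    n = len(conf)
    total = sum(sum(row) for row in conf)
    correct = sum(conf[i][i] for i in range(n))

    ious, accs = [], []
    for i in range(n):
        tp = conf[i][i]
        gt = sum(conf[i])                          # pixels labelled as class i
        pred = sum(conf[r][i] for r in range(n))   # pixels predicted as class i
        union = gt + pred - tp
        if union > 0:      # assumption: classes with no pixels at all are skipped
            ious.append(tp / union)
        if gt > 0:         # assumption: classes absent from ground truth are skipped
            accs.append(tp / gt)

    return {
        "aAcc": correct / total,          # overall pixel accuracy
        "mIoU": sum(ious) / len(ious),    # mean intersection-over-union
        "mAcc": sum(accs) / len(accs),    # mean per-class accuracy
    }


# Tiny two-class example
metrics = segmentation_metrics([[3, 1], [1, 5]])
```

Because mAcc averages per-class recall while aAcc weights every pixel equally, a large gap between the two (as in the table) usually indicates that rare classes are segmented less reliably than dominant ones.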
Exported Formats
| Format | File | Size | Use Case |
|---|---|---|---|
| PyTorch (.pth) | `pytorch/vis_forestsim_v1.pth` | 317MB | Training, fine-tuning |
| SafeTensors | `pytorch/vis_forestsim_v1.safetensors` | 159MB | Fast loading, safe |
| ONNX | `onnx/vis_forestsim_v1.onnx` | 159MB | Cross-platform inference |
| TensorRT FP16 | n/a | n/a | Generate on target hardware |
| TensorRT FP32 | n/a | n/a | Generate on target hardware |
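The file sizes in the table can be sanity-checked against the stated parameter count. The sketch below is plain arithmetic with a hypothetical helper, `weights_size_mb`, and assumes decimal megabytes and FP32 storage. The SafeTensors and ONNX files land close to the FP32 estimate; the `.pth` file being roughly twice as large would be consistent with it carrying extra training state (for example SGD momentum buffers), but that is an assumption, not something the table states.

```python
PARAMS = 39.6e6  # parameter count stated in the Architecture section


def weights_size_mb(num_params, bytes_per_param=4):
    """Approximate size of the raw weights in decimal megabytes."""
    return num_params * bytes_per_param / 1e6


fp32_mb = weights_size_mb(PARAMS)       # about 158 MB, close to the 159MB files
fp16_mb = weights_size_mb(PARAMS, 2)    # about 79 MB, a rough guide for FP16 weights
print(f"FP32: {fp32_mb:.1f} MB, FP16: {fp16_mb:.1f} MB")
```

The estimate ignores non-parameter buffers (such as batch-norm running statistics) and file-format overhead, which is why the real files are slightly larger.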
CUDA Kernels
Three custom CUDA kernels were built for this module (shared via anima-cuda-infra):
| Kernel | Speedup / Runtime | Purpose |
|---|---|---|
| `seg_argmax_colorize` | 2.6x speedup | Fused argmax + palette colorize |
| `terrain_roughness` | <0.1ms | Gradient-based traversability scoring |
| `mask_morphology` | 5ms | Erosion/dilation/majority vote cleanup |
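To make the "fused argmax + palette colorize" step concrete, here is a slow pure-Python reference of what such a kernel computes in a single pass per pixel. It is a sketch for checking outputs on tiny inputs, not the CUDA implementation; the function name `argmax_colorize` and the three-colour palette are hypothetical.

```python
def argmax_colorize(logits, palette):
    """Reference for a fused argmax + colorize pass.

    logits:  nested lists shaped [C][H][W] of per-class scores
    palette: list of C (r, g, b) tuples, one colour per class
    Returns (mask, rgb) where mask is [H][W] class indices and
    rgb is [H][W] colour tuples.
    """
    num_classes = len(logits)
    height, width = len(logits[0]), len(logits[0][0])
    mask, rgb = [], []
    for y in range(height):
        mask_row, rgb_row = [], []
        for x in range(width):
            # argmax over the class axis; ties resolve to the lowest index
            best = max(range(num_classes), key=lambda c: logits[c][y][x])
            mask_row.append(best)
            # colour lookup happens in the same pass, so no intermediate
            # mask has to be re-read from memory
            rgb_row.append(palette[best])
        mask.append(mask_row)
        rgb.append(rgb_row)
    return mask, rgb


# Hypothetical 3-class palette and a 1x2 image
palette = [(0, 128, 0), (0, 0, 255), (128, 128, 128)]
logits = [[[0.1, 2.0]], [[0.9, 0.5]], [[0.3, 1.0]]]
mask, rgb = argmax_colorize(logits, palette)
```

Fusing the two steps avoids writing the class-index mask out and reading it back in before the palette lookup, which is the usual motivation for a fused kernel of this kind.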
Usage
import torch
import torchvision.models.segmentation as seg

# Load the model and restore the trained weights
model = seg.deeplabv3_resnet50(num_classes=24)
ckpt = torch.load("pytorch/vis_forestsim_v1.pth", map_location="cpu")
model.load_state_dict(ckpt["model"])
model.eval()

# Inference on a dummy 512x512 RGB input
image = torch.randn(1, 3, 512, 512)
with torch.no_grad():
    logits = model(image)["out"]  # [1, 24, 512, 512] per-class scores
    mask = logits.argmax(1)       # [1, 512, 512] class index per pixel
Training
- Hardware: NVIDIA L4 (23GB VRAM)
- Batch size: 24 (auto-detected, 68% VRAM)
- Optimizer: SGD (lr=0.01, momentum=0.9, weight_decay=5e-4)
- Scheduler: Warmup (500 steps) + Cosine decay
- Iterations: 38,500 (early stopped from 40,000)
- Precision: FP16 mixed precision
- Dataset: ForestSim 24-class (1884 train / 209 test)
- Config: see `configs/paper_deeplabv3_r50.toml`
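The scheduler entry above can be sketched as a plain function of the step index. This is an illustration under stated assumptions rather than the training code: the warmup is taken to be linear, the cosine decay is assumed to run over the planned 40,000 iterations down to zero, and the name `lr_at` is hypothetical. The actual settings live in the config file.

```python
import math


def lr_at(step, base_lr=0.01, warmup_steps=500, total_steps=40_000, min_lr=0.0):
    """Learning rate at a given step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        # assumption: linear ramp that reaches base_lr on the last warmup step
        return base_lr * (step + 1) / warmup_steps
    # fraction of the decay phase completed, clamped to [0, 1]
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Learning rate at a few points, including the early-stopping step
for s in (0, 499, 500, 20_250, 38_500, 40_000):
    print(s, f"{lr_at(s):.6f}")
```

Under these assumptions the learning rate at step 38,500 is already very close to zero, so stopping there forgoes little of the planned schedule.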
Docker
docker compose -f docker-compose.serve.yml --profile serve up -d
curl localhost:8080/health
ROS2
Subscribes to `/camera/image_raw` and publishes:
- `/vis_forestsim/segmentation`: colorized 24-class mask
- `/vis_forestsim/traversability`: 6-class traversability mask
License
Apache 2.0 - Robot Flow Labs / AIFLOW LABS LIMITED