---
license: mit
language:
- en
tags:
- remote-sensing
- semantic-segmentation
- mamba
- state-space-model
- vmamba
- mambavision
- spatial-mamba
- pytorch
- benchmark
- loveda
- isprs-potsdam
- domain-adaptation
datasets:
- LoveDA
- ISPRS-Potsdam
pipeline_tag: image-segmentation
---

# Mamba-Segmentation

**Controlled Visual State-Space Backbone Benchmark with Domain-Shift & Boundary Analysis for Remote-Sensing Segmentation**

> *Accepted at IGARSS 2026*

One pipeline. One decoder. One loss. One schedule. **Five backbone families.** The only variable is the encoder — so the results finally mean something.

---

## What Is This?

Remote-sensing segmentation papers routinely change the backbone *and* the decoder *and* the loss *and* the training schedule all at once. The numbers tell you who tuned harder, not which backbone is better.

This repo fixes that. **One shared pipeline — swap the backbone — read the truth.**

| Component | Status |
|---|---|
| Encoder backbone | 🔀 **Swapped** per experiment — the ONLY variable |
| Decoder | 🔒 Fixed (lightweight U-Net, 256ch, MambaBlock2d) |
| Loss | 🔒 Fixed (Lovász-Softmax + Focal + Boundary) |
| Training schedule | 🔒 Fixed (50k iters, AdamW, poly LR decay) |
| Augmentations | 🔒 Fixed (random crop, flip, colour jitter) |
| Input resolution | 🔒 Fixed (512×512) |
| Feature interface | 🔒 Fixed ({F1–F4} at strides {4, 8, 16, 32}) |
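The fixed feature interface is what makes the swap possible: whatever the encoder is, the decoder only ever sees four pyramid features at strides 4, 8, 16, 32. The sketch below is a toy illustration of that contract only; `ToyBackbone` and its channel widths are made up and are not classes from this repository.

```python
# Illustrative sketch (not the repo's actual API): the contract every swappable
# encoder is expected to satisfy -- four feature maps at strides 4/8/16/32.
from typing import List

import torch
import torch.nn as nn


class ToyBackbone(nn.Module):
    """Hypothetical encoder stand-in that emits F1-F4 at strides 4, 8, 16, 32."""

    def __init__(self, channels=(64, 128, 256, 512)):
        super().__init__()
        strides = (4, 2, 2, 2)  # cumulative strides: 4, 8, 16, 32
        stages, in_ch = [], 3
        for out_ch, s in zip(channels, strides):
            stages.append(nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, stride=s, padding=1),
                nn.ReLU(),
            ))
            in_ch = out_ch
        self.stages = nn.ModuleList(stages)

    def forward(self, x: torch.Tensor) -> List[torch.Tensor]:
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats  # [F1 @ 1/4, F2 @ 1/8, F3 @ 1/16, F4 @ 1/32]


# The fixed decoder consumes exactly this list, so changing the encoder never
# touches the decoder, the loss, or the schedule.
f1, f2, f3, f4 = ToyBackbone()(torch.randn(1, 3, 512, 512))
print([f.shape[-1] for f in (f1, f2, f3, f4)])  # [128, 64, 32, 16]
```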
---

## Checkpoints in This Repository

All checkpoints are `best.pth` files (highest validation mIoU during training) stored with their original directory structure.

### LoveDA Experiments — `Comparison_Experiments/`

#### MambaVision (NVIDIA hybrid Mamba-Transformer)

| Checkpoint path | Training split |
|---|---|
| `Comparison_Experiments/mambavision_tiny_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/mambavision_tiny_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/mambavision_tiny_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/mambavision_tiny2_512/checkpoints/best.pth` | All→All (v2) |
| `Comparison_Experiments/mambavision_tiny2_ruraltrain_512/checkpoints/best.pth` | Rural→Urban (v2) |
| `Comparison_Experiments/mambavision_tiny2_urbantrain_512/checkpoints/best.pth` | Urban→Rural (v2) |
| `Comparison_Experiments/mambavision_small_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/mambavision_small_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/mambavision_small_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/mambavision_base_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/mambavision_base_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/mambavision_base_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/mambavision_large_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/mambavision_large_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/mambavision_large_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/mambavision_large2_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/mambavision_large2_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/mambavision_large2_urbantrain_512/checkpoints/best.pth` | Urban→Rural |

#### VMamba (cross-scan 2D selective SSM)

| Checkpoint path | Training split |
|---|---|
| `Comparison_Experiments/Vmamb_tiny_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/vmamba_tiny_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/vmamba_tiny_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/Vmamb_small_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/Vmamb_small_512_2/checkpoints/best.pth` | All→All (run 2) |
| `Comparison_Experiments/Vmamb_small_512_3/checkpoints/best.pth` | All→All (run 3) |
| `Comparison_Experiments/vmamba_small_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/vmamba_small_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/Vmamb_base_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/vmamba_base_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/vmamba_base_urbantrain_512/checkpoints/best.pth` | Urban→Rural |

#### VisionMamba / Vim (bidirectional Mamba)

| Checkpoint path | Training split |
|---|---|
| `Comparison_Experiments/VisionMamba_tiny_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/visionmamba_tiny_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/visionmamba_tiny_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/VisionMamba_small_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/visionmamba_small_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/visionmamba_small_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/VisionMamba_base_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/visionmamba_base_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/visionmamba_base_urbantrain_512/checkpoints/best.pth` | Urban→Rural |

#### Spatial-Mamba (spatially-aware SSM)

| Checkpoint path | Training split |
|---|---|
| `Comparison_Experiments/spatialmamba_tiny_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/spatialmamba_tiny_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/spatialmamba_tiny_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/spatialmamba_small_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/spatialmamba_small_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/spatialmamba_small_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/spatialmamba_base_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/spatialmamba_base_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/spatialmamba_base_urbantrain_512/checkpoints/best.pth` | Urban→Rural |

#### CNN & Transformer Baselines

| Checkpoint path | Model |
|---|---|
| `Comparison_Experiments/cnn_deeplabv3p_r50_512/checkpoints/best.pth` | DeepLabv3+ ResNet-50, All→All |
| `Comparison_Experiments/cnn_deeplabv3p_resnet50_ruraltrain_512/checkpoints/best.pth` | DeepLabv3+ ResNet-50, Rural→Urban |
| `Comparison_Experiments/cnn_deeplabv3p_resnet50_urbantrain_512/checkpoints/best.pth` | DeepLabv3+ ResNet-50, Urban→Rural |
| `Comparison_Experiments/cnn_unet_r50_512/checkpoints/best.pth` | U-Net ResNet-50, All→All |
| `Comparison_Experiments/transformer_unetformer_r18_512/checkpoints/best.pth` | UNetFormer ResNet-18, All→All |
| `Comparison_Experiments/transformerunetformer_resnet18_ruraltrain_512/checkpoints/best.pth` | UNetFormer ResNet-18, Rural→Urban |
| `Comparison_Experiments/transformerunetformer_resnet18_urbantrain_512/checkpoints/best.pth` | UNetFormer ResNet-18, Urban→Rural |

---

### ISPRS Potsdam Experiments — `Comparison_Experiments_ICPRS_potsdam/`

| Checkpoint path | Model |
|---|---|
| `Comparison_Experiments_ICPRS_potsdam/mambavision_tiny_512/checkpoints/best.pth` | MambaVision-Tiny |
| `Comparison_Experiments_ICPRS_potsdam/mambavision_tiny2_512/checkpoints/best.pth` | MambaVision-Tiny2 |
| `Comparison_Experiments_ICPRS_potsdam/mambavision_small_512/checkpoints/best.pth` | MambaVision-Small |
| `Comparison_Experiments_ICPRS_potsdam/mambavision_base_512/checkpoints/best.pth` | MambaVision-Base |
| `Comparison_Experiments_ICPRS_potsdam/mambavision_large_512/checkpoints/best.pth` | MambaVision-Large |
| `Comparison_Experiments_ICPRS_potsdam/mambavision_large2_512/checkpoints/best.pth` | MambaVision-Large2 |
| `Comparison_Experiments_ICPRS_potsdam/vmamba_tiny_512/checkpoints/best.pth` | VMamba-Tiny |
| `Comparison_Experiments_ICPRS_potsdam/vmamba_small_512/checkpoints/best.pth` | VMamba-Small |
| `Comparison_Experiments_ICPRS_potsdam/vmamba_base_512/checkpoints/best.pth` | VMamba-Base |
| `Comparison_Experiments_ICPRS_potsdam/spatialmamba_tiny_512/checkpoints/best.pth` | Spatial-Mamba-Tiny |
| `Comparison_Experiments_ICPRS_potsdam/spatialmamba_small_512/checkpoints/best.pth` | Spatial-Mamba-Small |
| `Comparison_Experiments_ICPRS_potsdam/spatialmamba_base_512/checkpoints/best.pth` | Spatial-Mamba-Base |
| `Comparison_Experiments_ICPRS_potsdam/cnn_deeplabv3p_r50_512/checkpoints/best.pth` | DeepLabv3+ ResNet-50 |
| `Comparison_Experiments_ICPRS_potsdam/transformer_unetformer_r18_512/checkpoints/best.pth` | UNetFormer ResNet-18 |

---

### ImageNet Backbone Weights — `weights/imagenet/`

| File | Description |
|---|---|
| `weights/imagenet/resnet50-11ad3fa6.pth` | ResNet-50 ImageNet-1K pretrained |
| `weights/imagenet/resnet18-f37072fd.pth` | ResNet-18 ImageNet-1K pretrained |
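Any checkpoint path listed above can be pulled individually from the Hub instead of cloning the whole repository. A minimal sketch with `huggingface_hub` follows; the `repo_id` is a placeholder, so substitute the id shown at the top of this model page.

```python
# Hedged sketch: download a single best.pth from this model repo.
# The repo_id below is a placeholder, not a verified id.
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="<owner>/Mamba-Segmentation",  # placeholder
    filename="Comparison_Experiments/vmamba_small_ruraltrain_512/checkpoints/best.pth",
)
# Checkpoints store the model weights under the 'model' key (see "How to Load a Checkpoint").
state_dict = torch.load(ckpt_path, map_location="cpu")["model"]
print(f"{len(state_dict)} parameter tensors loaded")
```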
---

## Results Summary

Every row shares the same decoder, loss, optimizer, schedule, and data splits. **The only variable is the encoder.**

### LoveDA

U→R = trained on Urban, evaluated on Rural; R→U = trained on Rural, evaluated on Urban.

| Backbone | mIoU (All→All) | mIoU (U→R) | mIoU (R→U) |
|---|---:|---:|---:|
| DeepLabv3+ ResNet-50 (CNN) | 43.01 | 30.36 | 39.98 |
| UNetFormer ResNet-18 (Transformer) | 48.61 | 34.56 | 44.84 |
| VMamba-Small **🥇** | **55.66** | **40.62** | 53.52 |
| MambaVision-Large | 55.25 | 38.53 | **54.01** |
| Spatial-Mamba-Base | 48.03 | 35.23 | 46.55 |

### ISPRS Potsdam

| Backbone | mIoU |
|---|---:|
| DeepLabv3+ ResNet-50 | 75.09 |
| UNetFormer ResNet-18 | 74.99 |
| VMamba-Small **🥇** | **77.59** |
| MambaVision-Large | 77.07 |
| Spatial-Mamba-Base | 70.00 |

**Key findings:**

- The strongest SSM backbones (VMamba, MambaVision) outperform the CNN and Transformer baselines by a clear margin under identical conditions (+7–12 mIoU on LoveDA).
- Scaling the encoder past VMamba-Small yields diminishing returns under a fixed decoder.
- Domain transfer is asymmetric across all backbone families (Rural→Urban consistently outperforms Urban→Rural by roughly 10–15 points) — a data distribution property, not a model property.
- Boundary accuracy collapses under domain shift while interior accuracy holds — every backbone, every family.
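The boundary finding is easy to probe on your own predictions: compare pixel accuracy inside a thin band around ground-truth class boundaries with accuracy everywhere else. The sketch below uses one common band-based formulation; it is not necessarily the exact boundary metric used in the paper, and `boundary_band` and `width` are illustrative choices.

```python
import numpy as np


def boundary_band(gt: np.ndarray, width: int = 3) -> np.ndarray:
    """True within `width` pixels of any class boundary in an (H, W) label map."""
    band = np.zeros_like(gt, dtype=bool)
    band[:-1, :] |= gt[:-1, :] != gt[1:, :]   # vertical label changes
    band[:, :-1] |= gt[:, :-1] != gt[:, 1:]   # horizontal label changes
    for _ in range(width - 1):                # grow by 1 px per step (4-neighbour dilation)
        grown = band.copy()
        grown[1:, :] |= band[:-1, :]
        grown[:-1, :] |= band[1:, :]
        grown[:, 1:] |= band[:, :-1]
        grown[:, :-1] |= band[:, 1:]
        band = grown
    return band


def boundary_vs_interior_acc(pred: np.ndarray, gt: np.ndarray, width: int = 3):
    """Pixel accuracy inside vs. outside the boundary band."""
    band = boundary_band(gt, width)
    correct = pred == gt
    return correct[band].mean(), correct[~band].mean()


# Toy example: errors concentrated near the class boundary drag the
# boundary accuracy down while the interior accuracy stays high.
gt = np.zeros((512, 512), dtype=np.int64)
gt[:, 256:] = 1
pred = gt.copy()
pred[:, 250:262] = 1 - pred[:, 250:262]
band_acc, interior_acc = boundary_vs_interior_acc(pred, gt)
print(f"boundary acc: {band_acc:.2f}  interior acc: {interior_acc:.2f}")
```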
---

## How to Load a Checkpoint

```python
import torch

# Example: load the MambaVision-Base best checkpoint for LoveDA All→All
ckpt = torch.load(
    "Comparison_Experiments/mambavision_base_512/checkpoints/best.pth",
    map_location="cpu",
)

# keys: 'model', 'optimizer', 'scheduler', 'iter', 'best_score'
model_state = ckpt["model"]
```

To build the full model and run inference, clone the code repository and follow its setup instructions:

```bash
git clone https://github.com/dineth18/Mamba-Segmentation
cd Mamba-Segmentation/MambaVision   # or VMamba/, spatial-mamba/, etc.
pip install -r requirements.txt

# Set your dataset paths (no need to edit config files)
export LOVEDA_ROOT=/path/to/LoveDA
export POTSDAM_ROOT=/path/to/ISPRS_Potsdam

python eval.py --checkpoint path/to/best.pth
```

---

## Citation

If this benchmark is useful for your research, please cite:

```bibtex
@inproceedings{wasalathilaka2026controlledbenchmark,
  title     = {A Controlled Benchmark of Visual State-Space Backbones with Domain-Shift and Boundary Analysis for Remote-Sensing Segmentation},
  author    = {Wasalathilaka, Nichula and Perea, Dineth and Samarakoon, Oshadha and Wijenayake, Buddhi and Godaliyadda, Roshan and Herath, Vijitha and Ekanayake, Parakrama},
  booktitle = {IGARSS 2026},
  year      = {2026}
}
```

---

## Acknowledgements

- [VMamba](https://github.com/MzeroMiko/VMamba) — Visual State Space Model
- [MambaVision](https://github.com/NVlabs/MambaVision) — NVIDIA hybrid Mamba-Transformer
- [Spatial-Mamba](https://github.com/EdwardChaworworrachat/SpatialMamba) — Spatially-aware Mamba
- [LoveDA](https://github.com/Junjue-Wang/LoveDA) — Land-cover domain adaptation dataset
- [ISPRS Potsdam](https://www.isprs.org/education/benchmarks/UrbanSemLab/) — Urban semantic labeling benchmark

Built at the **University of Peradeniya**.