---
license: mit
language:
- en
tags:
- remote-sensing
- semantic-segmentation
- mamba
- state-space-model
- vmamba
- mambavision
- spatial-mamba
- pytorch
- benchmark
- loveda
- isprs-potsdam
- domain-adaptation
datasets:
- LoveDA
- ISPRS-Potsdam
pipeline_tag: image-segmentation
---

# Mamba-Segmentation

**Controlled Visual State-Space Backbone Benchmark with Domain-Shift & Boundary Analysis for Remote-Sensing Segmentation**

> *Accepted at IGARSS 2026*

One pipeline. One decoder. One loss. One schedule. **Five backbone families.** The only variable is the encoder — so the results finally mean something.

---

## What Is This?

Remote-sensing segmentation papers routinely change the backbone *and* the decoder *and* the loss *and* the training schedule all at once. The numbers tell you who tuned harder, not which backbone is better.

This repo fixes that. **One shared pipeline — swap the backbone — read the truth.**
| Component | Status |
|---|---|
| Encoder backbone | 🔀 **Swapped** per experiment — the ONLY variable |
| Decoder | 🔒 Fixed (lightweight U-Net, 256 ch, MambaBlock2d) |
| Loss | 🔒 Fixed (Lovász-Softmax + Focal + Boundary) |
| Training schedule | 🔒 Fixed (50k iters, AdamW, poly LR decay) |
| Augmentations | 🔒 Fixed (random crop, flip, colour jitter) |
| Input resolution | 🔒 Fixed (512×512) |
| Feature interface | 🔒 Fixed ({F1–F4} at strides {4, 8, 16, 32}) |
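
The fixed feature interface is the contract that makes backbones interchangeable: every encoder must emit four feature maps {F1–F4} at strides {4, 8, 16, 32} relative to the 512×512 input. A minimal sketch of that contract in plain Python — function names are illustrative, not part of the repo, and shapes are reduced to (H, W) tuples rather than the real (B, C, H, W) tensors:

```python
STRIDES = (4, 8, 16, 32)  # fixed pyramid strides for {F1..F4}

def expected_shapes(input_hw=(512, 512), strides=STRIDES):
    """Spatial size of each pyramid level for a given input size."""
    h, w = input_hw
    return [(h // s, w // s) for s in strides]

def check_feature_interface(feature_hw, input_hw=(512, 512)):
    """Raise if a backbone's output shapes violate the fixed interface."""
    expected = expected_shapes(input_hw)
    if list(feature_hw) != expected:
        raise ValueError(f"expected {expected}, got {list(feature_hw)}")
    return True
```

For a 512×512 input this expects `[(128, 128), (64, 64), (32, 32), (16, 16)]`; any encoder satisfying the check can be dropped in front of the fixed decoder.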

---

## Checkpoints in This Repository

All checkpoints are `best.pth` files (highest validation mIoU during training) stored with their original directory structure.
### LoveDA Experiments — `Comparison_Experiments/`

#### MambaVision (NVIDIA hybrid Mamba-Transformer)
| Checkpoint path | Training split |
|---|---|
| `Comparison_Experiments/mambavision_tiny_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/mambavision_tiny_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/mambavision_tiny_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/mambavision_tiny2_512/checkpoints/best.pth` | All→All (v2) |
| `Comparison_Experiments/mambavision_tiny2_ruraltrain_512/checkpoints/best.pth` | Rural→Urban (v2) |
| `Comparison_Experiments/mambavision_tiny2_urbantrain_512/checkpoints/best.pth` | Urban→Rural (v2) |
| `Comparison_Experiments/mambavision_small_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/mambavision_small_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/mambavision_small_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/mambavision_base_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/mambavision_base_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/mambavision_base_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/mambavision_large_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/mambavision_large_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/mambavision_large_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/mambavision_large2_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/mambavision_large2_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/mambavision_large2_urbantrain_512/checkpoints/best.pth` | Urban→Rural |

#### VMamba (cross-scan 2D selective SSM)
| Checkpoint path | Training split |
|---|---|
| `Comparison_Experiments/Vmamb_tiny_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/vmamba_tiny_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/vmamba_tiny_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/Vmamb_small_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/Vmamb_small_512_2/checkpoints/best.pth` | All→All (run 2) |
| `Comparison_Experiments/Vmamb_small_512_3/checkpoints/best.pth` | All→All (run 3) |
| `Comparison_Experiments/vmamba_small_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/vmamba_small_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/Vmamb_base_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/vmamba_base_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/vmamba_base_urbantrain_512/checkpoints/best.pth` | Urban→Rural |

#### VisionMamba / Vim (bidirectional Mamba)
| Checkpoint path | Training split |
|---|---|
| `Comparison_Experiments/VisionMamba_tiny_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/visionmamba_tiny_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/visionmamba_tiny_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/VisionMamba_small_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/visionmamba_small_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/visionmamba_small_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/VisionMamba_base_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/visionmamba_base_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/visionmamba_base_urbantrain_512/checkpoints/best.pth` | Urban→Rural |

#### Spatial-Mamba (spatially-aware SSM)
| Checkpoint path | Training split |
|---|---|
| `Comparison_Experiments/spatialmamba_tiny_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/spatialmamba_tiny_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/spatialmamba_tiny_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/spatialmamba_small_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/spatialmamba_small_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/spatialmamba_small_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/spatialmamba_base_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/spatialmamba_base_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/spatialmamba_base_urbantrain_512/checkpoints/best.pth` | Urban→Rural |

#### CNN & Transformer Baselines
| Checkpoint path | Model |
|---|---|
| `Comparison_Experiments/cnn_deeplabv3p_r50_512/checkpoints/best.pth` | DeepLabv3+ ResNet-50, All→All |
| `Comparison_Experiments/cnn_deeplabv3p_resnet50_ruraltrain_512/checkpoints/best.pth` | DeepLabv3+ ResNet-50, Rural→Urban |
| `Comparison_Experiments/cnn_deeplabv3p_resnet50_urbantrain_512/checkpoints/best.pth` | DeepLabv3+ ResNet-50, Urban→Rural |
| `Comparison_Experiments/cnn_unet_r50_512/checkpoints/best.pth` | U-Net ResNet-50, All→All |
| `Comparison_Experiments/transformer_unetformer_r18_512/checkpoints/best.pth` | UNetFormer ResNet-18, All→All |
| `Comparison_Experiments/transformerunetformer_resnet18_ruraltrain_512/checkpoints/best.pth` | UNetFormer ResNet-18, Rural→Urban |
| `Comparison_Experiments/transformerunetformer_resnet18_urbantrain_512/checkpoints/best.pth` | UNetFormer ResNet-18, Urban→Rural |

---

### ISPRS Potsdam Experiments — `Comparison_Experiments_ICPRS_potsdam/`

| Checkpoint path | Model |
|---|---|
| `Comparison_Experiments_ICPRS_potsdam/mambavision_tiny_512/checkpoints/best.pth` | MambaVision-Tiny |
| `Comparison_Experiments_ICPRS_potsdam/mambavision_tiny2_512/checkpoints/best.pth` | MambaVision-Tiny2 |
| `Comparison_Experiments_ICPRS_potsdam/mambavision_small_512/checkpoints/best.pth` | MambaVision-Small |
| `Comparison_Experiments_ICPRS_potsdam/mambavision_base_512/checkpoints/best.pth` | MambaVision-Base |
| `Comparison_Experiments_ICPRS_potsdam/mambavision_large_512/checkpoints/best.pth` | MambaVision-Large |
| `Comparison_Experiments_ICPRS_potsdam/mambavision_large2_512/checkpoints/best.pth` | MambaVision-Large2 |
| `Comparison_Experiments_ICPRS_potsdam/vmamba_tiny_512/checkpoints/best.pth` | VMamba-Tiny |
| `Comparison_Experiments_ICPRS_potsdam/vmamba_small_512/checkpoints/best.pth` | VMamba-Small |
| `Comparison_Experiments_ICPRS_potsdam/vmamba_base_512/checkpoints/best.pth` | VMamba-Base |
| `Comparison_Experiments_ICPRS_potsdam/spatialmamba_tiny_512/checkpoints/best.pth` | Spatial-Mamba-Tiny |
| `Comparison_Experiments_ICPRS_potsdam/spatialmamba_small_512/checkpoints/best.pth` | Spatial-Mamba-Small |
| `Comparison_Experiments_ICPRS_potsdam/spatialmamba_base_512/checkpoints/best.pth` | Spatial-Mamba-Base |
| `Comparison_Experiments_ICPRS_potsdam/cnn_deeplabv3p_r50_512/checkpoints/best.pth` | DeepLabv3+ ResNet-50 |
| `Comparison_Experiments_ICPRS_potsdam/transformer_unetformer_r18_512/checkpoints/best.pth` | UNetFormer ResNet-18 |

---

### ImageNet Backbone Weights — `weights/imagenet/`

| File | Description |
|---|---|
| `weights/imagenet/resnet50-11ad3fa6.pth` | ResNet-50 ImageNet-1K pretrained |
| `weights/imagenet/resnet18-f37072fd.pth` | ResNet-18 ImageNet-1K pretrained |

---

## Results Summary

Every row shares the same decoder, loss, optimizer, schedule, and data splits. **The only variable is the encoder.**

### LoveDA

| Backbone | mIoU (All→All) | mIoU (U→R) | mIoU (R→U) |
|---|---:|---:|---:|
| DeepLabv3+ ResNet-50 (CNN) | 43.01 | 30.36 | 39.98 |
| UNetFormer ResNet-18 (Transformer) | 48.61 | 34.56 | 44.84 |
| VMamba-Small **🥇** | **55.66** | **40.62** | 53.52 |
| MambaVision-Large | 55.25 | 38.53 | **54.01** |
| Spatial-Mamba-Base | 48.03 | 35.23 | 46.55 |

### ISPRS Potsdam

| Backbone | mIoU |
|---|---:|
| DeepLabv3+ ResNet-50 | 75.09 |
| UNetFormer ResNet-18 | 74.99 |
| VMamba-Small **🥇** | **77.59** |
| MambaVision-Large | 77.07 |
| Spatial-Mamba-Base | 70.00 |

**Key findings:**
- The strongest SSM backbones (VMamba-Small, MambaVision-Large) outperform the CNN and Transformer baselines by +7–12 mIoU on LoveDA under identical conditions; Spatial-Mamba trails the Transformer baseline on both datasets.
- Scaling the encoder past VMamba-Small yields diminishing returns under a fixed decoder.
- Domain transfer is asymmetric across all backbone families (Rural→Urban consistently outperforms Urban→Rural by roughly 10–15 points) — a data-distribution property, not a model property.
- Boundary accuracy collapses under domain shift while interior accuracy holds — every backbone, every family.
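
All the scores above are mean intersection-over-union: per-class IoU computed from a confusion matrix, then averaged over classes. A minimal pure-Python sketch of that metric (the function name is illustrative; the repo's own evaluation code may differ in details such as ignored classes):

```python
def miou(confusion):
    """Mean IoU from a square confusion matrix.

    Rows are ground-truth classes, columns are predicted classes.
    Per class c: IoU_c = TP_c / (TP_c + FP_c + FN_c).
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                  # true c, predicted other
        fp = sum(row[c] for row in confusion) - tp   # predicted c, actually other
        denom = tp + fp + fn
        if denom:                                    # skip classes absent everywhere
            ious.append(tp / denom)
    return sum(ious) / len(ious)
```

For example, a two-class confusion matrix `[[3, 1], [1, 3]]` gives IoU 3/5 for each class, so an mIoU of 0.6.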

---

## How to Load a Checkpoint

```python
import torch

# Example: load the MambaVision-Base best checkpoint for LoveDA All→All
ckpt = torch.load(
    "Comparison_Experiments/mambavision_base_512/checkpoints/best.pth",
    map_location="cpu",
)
# keys: 'model', 'optimizer', 'scheduler', 'iter', 'best_score'
model_state = ckpt["model"]
```
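
If a checkpoint was saved from a `DataParallel`/`DistributedDataParallel` wrapper, every key in `ckpt["model"]` may carry a `module.` prefix that plain `load_state_dict` rejects. Whether these checkpoints do is an assumption on our part, so a defensive strip is a common pattern (the helper name is ours, not the repo's):

```python
def strip_prefix(state_dict, prefix="module."):
    """Remove a uniform key prefix (e.g. from DDP) if present on every key;
    otherwise return the state dict unchanged."""
    if state_dict and all(k.startswith(prefix) for k in state_dict):
        return {k[len(prefix):]: v for k, v in state_dict.items()}
    return state_dict

# model.load_state_dict(strip_prefix(ckpt["model"]))
```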

To build the full model and run inference, clone the code repository and follow the setup instructions:

```bash
git clone https://github.com/dineth18/Mamba-Segmentation
cd Mamba-Segmentation/MambaVision  # or VMamba/, spatial-mamba/, etc.
pip install -r requirements.txt

# Set your dataset path (no need to edit config files)
export LOVEDA_ROOT=/path/to/LoveDA
export POTSDAM_ROOT=/path/to/ISPRS_Potsdam

python eval.py --checkpoint path/to/best.pth
```

---

## Citation

If this benchmark is useful for your research, please cite:

```bibtex
@inproceedings{wasalathilaka2026controlledbenchmark,
  title={A Controlled Benchmark of Visual State-Space Backbones with
         Domain-Shift and Boundary Analysis for Remote-Sensing Segmentation},
  author={Wasalathilaka, Nichula and Perea, Dineth and Samarakoon, Oshadha
          and Wijenayake, Buddhi and Godaliyadda, Roshan and Herath, Vijitha
          and Ekanayake, Parakrama},
  booktitle={IGARSS 2026},
  year={2026}
}
```

---

## Acknowledgements

- [VMamba](https://github.com/MzeroMiko/VMamba) — Visual State Space Model
- [MambaVision](https://github.com/NVlabs/MambaVision) — NVIDIA hybrid Mamba-Transformer
- [Spatial-Mamba](https://github.com/EdwardChasel/Spatial-Mamba) — Spatially-aware Mamba
- [LoveDA](https://github.com/Junjue-Wang/LoveDA) — Land-cover domain adaptation dataset
- [ISPRS Potsdam](https://www.isprs.org/education/benchmarks/UrbanSemLab/) — Urban semantic labeling benchmark

Built at the **University of Peradeniya**.
|