---
license: mit
language:
- en
tags:
- remote-sensing
- semantic-segmentation
- mamba
- state-space-model
- vmamba
- mambavision
- spatial-mamba
- pytorch
- benchmark
- loveda
- isprs-potsdam
- domain-adaptation
datasets:
- LoveDA
- ISPRS-Potsdam
pipeline_tag: image-segmentation
---
# Mamba-Segmentation
**Controlled Visual State-Space Backbone Benchmark with Domain-Shift & Boundary Analysis for Remote-Sensing Segmentation**
> *Accepted at IGARSS 2026*
One pipeline. One decoder. One loss. One schedule. **Five backbone families.** The only variable is the encoder — so the results finally mean something.
---
## What Is This?
Remote-sensing segmentation papers routinely change the backbone *and* the decoder *and* the loss *and* the training schedule all at once. The numbers tell you who tuned harder, not which backbone is better.
This repo fixes that. **One shared pipeline — swap the backbone — read the truth.**
| Component | Status |
|---|---|
| Encoder backbone | 🔀 **Swapped** per experiment — the ONLY variable |
| Decoder | 🔒 Fixed (lightweight U-Net, 256ch, MambaBlock2d) |
| Loss | 🔒 Fixed (Lovász-Softmax + Focal + Boundary) |
| Training schedule | 🔒 Fixed (50k iters, AdamW, poly LR decay) |
| Augmentations | 🔒 Fixed (random crop, flip, colour jitter) |
| Input resolution | 🔒 Fixed (512×512) |
| Feature interface | 🔒 Fixed ({F1–F4} at strides {4, 8, 16, 32}; see the sketch below) |
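
To make the fixed interface concrete, here is a minimal sketch of the contract every encoder has to satisfy. The class name and channel widths are illustrative, not this repo's actual API: any backbone must emit four feature maps at strides 4, 8, 16, 32 of the 512×512 input, and the same decoder consumes them regardless of which encoder produced them.

```python
import torch
import torch.nn as nn

class DummyBackbone(nn.Module):
    """Illustrative stand-in: an encoder must emit {F1-F4} at strides {4, 8, 16, 32}."""
    def __init__(self, channels=(96, 192, 384, 768)):
        super().__init__()
        # First stage downsamples by 4, each later stage by a further 2.
        self.stages = nn.ModuleList(
            nn.Conv2d(3 if i == 0 else channels[i - 1], c,
                      kernel_size=3, stride=4 if i == 0 else 2, padding=1)
            for i, c in enumerate(channels)
        )

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats  # F1..F4 at strides 4, 8, 16, 32

x = torch.randn(1, 3, 512, 512)
for i, f in enumerate(DummyBackbone()(x), start=1):
    print(f"F{i}: {tuple(f.shape)}")
# F1: (1, 96, 128, 128) ... F4: (1, 768, 16, 16)
```

Swapping MambaVision for VMamba then means replacing only this module; everything downstream of the {F1–F4} pyramid stays identical.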
---
## Checkpoints in This Repository
All checkpoints are `best.pth` files (highest validation mIoU during training) stored with their original directory structure.
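
To fetch a single checkpoint without cloning the whole repository, the standard `hf_hub_download` call works. Note that the `repo_id` below is an assumption inferred from where this card is hosted:

```python
from huggingface_hub import hf_hub_download

# repo_id is an assumption based on this model card's location on the Hub.
ckpt_path = hf_hub_download(
    repo_id="dineth18/Mamba-Segmentation",
    filename="Comparison_Experiments/mambavision_base_512/checkpoints/best.pth",
)
print(ckpt_path)  # local cache path to the downloaded best.pth
```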
### LoveDA Experiments — `Comparison_Experiments/`
#### MambaVision (NVIDIA hybrid Mamba-Transformer)
| Checkpoint path | Training split |
|---|---|
| `Comparison_Experiments/mambavision_tiny_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/mambavision_tiny_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/mambavision_tiny_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/mambavision_tiny2_512/checkpoints/best.pth` | All→All (v2) |
| `Comparison_Experiments/mambavision_tiny2_ruraltrain_512/checkpoints/best.pth` | Rural→Urban (v2) |
| `Comparison_Experiments/mambavision_tiny2_urbantrain_512/checkpoints/best.pth` | Urban→Rural (v2) |
| `Comparison_Experiments/mambavision_small_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/mambavision_small_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/mambavision_small_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/mambavision_base_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/mambavision_base_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/mambavision_base_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/mambavision_large_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/mambavision_large_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/mambavision_large_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/mambavision_large2_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/mambavision_large2_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/mambavision_large2_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
#### VMamba (cross-scan 2D selective SSM)
| Checkpoint path | Training split |
|---|---|
| `Comparison_Experiments/Vmamb_tiny_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/vmamba_tiny_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/vmamba_tiny_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/Vmamb_small_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/Vmamb_small_512_2/checkpoints/best.pth` | All→All (run 2) |
| `Comparison_Experiments/Vmamb_small_512_3/checkpoints/best.pth` | All→All (run 3) |
| `Comparison_Experiments/vmamba_small_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/vmamba_small_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/Vmamb_base_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/vmamba_base_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/vmamba_base_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
#### VisionMamba / Vim (bidirectional Mamba)
| Checkpoint path | Training split |
|---|---|
| `Comparison_Experiments/VisionMamba_tiny_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/visionmamba_tiny_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/visionmamba_tiny_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/VisionMamba_small_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/visionmamba_small_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/visionmamba_small_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/VisionMamba_base_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/visionmamba_base_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/visionmamba_base_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
#### Spatial-Mamba (spatially-aware SSM)
| Checkpoint path | Training split |
|---|---|
| `Comparison_Experiments/spatialmamba_tiny_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/spatialmamba_tiny_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/spatialmamba_tiny_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/spatialmamba_small_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/spatialmamba_small_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/spatialmamba_small_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/spatialmamba_base_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/spatialmamba_base_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/spatialmamba_base_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
#### CNN & Transformer Baselines
| Checkpoint path | Model |
|---|---|
| `Comparison_Experiments/cnn_deeplabv3p_r50_512/checkpoints/best.pth` | DeepLabv3+ ResNet-50, All→All |
| `Comparison_Experiments/cnn_deeplabv3p_resnet50_ruraltrain_512/checkpoints/best.pth` | DeepLabv3+ ResNet-50, Rural→Urban |
| `Comparison_Experiments/cnn_deeplabv3p_resnet50_urbantrain_512/checkpoints/best.pth` | DeepLabv3+ ResNet-50, Urban→Rural |
| `Comparison_Experiments/cnn_unet_r50_512/checkpoints/best.pth` | U-Net ResNet-50, All→All |
| `Comparison_Experiments/transformer_unetformer_r18_512/checkpoints/best.pth` | UNetFormer ResNet-18, All→All |
| `Comparison_Experiments/transformerunetformer_resnet18_ruraltrain_512/checkpoints/best.pth` | UNetFormer ResNet-18, Rural→Urban |
| `Comparison_Experiments/transformerunetformer_resnet18_urbantrain_512/checkpoints/best.pth` | UNetFormer ResNet-18, Urban→Rural |
---
### ISPRS Potsdam Experiments — `Comparison_Experiments_ICPRS_potsdam/`
| Checkpoint path | Model |
|---|---|
| `Comparison_Experiments_ICPRS_potsdam/mambavision_tiny_512/checkpoints/best.pth` | MambaVision-Tiny |
| `Comparison_Experiments_ICPRS_potsdam/mambavision_tiny2_512/checkpoints/best.pth` | MambaVision-Tiny2 |
| `Comparison_Experiments_ICPRS_potsdam/mambavision_small_512/checkpoints/best.pth` | MambaVision-Small |
| `Comparison_Experiments_ICPRS_potsdam/mambavision_base_512/checkpoints/best.pth` | MambaVision-Base |
| `Comparison_Experiments_ICPRS_potsdam/mambavision_large_512/checkpoints/best.pth` | MambaVision-Large |
| `Comparison_Experiments_ICPRS_potsdam/mambavision_large2_512/checkpoints/best.pth` | MambaVision-Large2 |
| `Comparison_Experiments_ICPRS_potsdam/vmamba_tiny_512/checkpoints/best.pth` | VMamba-Tiny |
| `Comparison_Experiments_ICPRS_potsdam/vmamba_small_512/checkpoints/best.pth` | VMamba-Small |
| `Comparison_Experiments_ICPRS_potsdam/vmamba_base_512/checkpoints/best.pth` | VMamba-Base |
| `Comparison_Experiments_ICPRS_potsdam/spatialmamba_tiny_512/checkpoints/best.pth` | Spatial-Mamba-Tiny |
| `Comparison_Experiments_ICPRS_potsdam/spatialmamba_small_512/checkpoints/best.pth` | Spatial-Mamba-Small |
| `Comparison_Experiments_ICPRS_potsdam/spatialmamba_base_512/checkpoints/best.pth` | Spatial-Mamba-Base |
| `Comparison_Experiments_ICPRS_potsdam/cnn_deeplabv3p_r50_512/checkpoints/best.pth` | DeepLabv3+ ResNet-50 |
| `Comparison_Experiments_ICPRS_potsdam/transformer_unetformer_r18_512/checkpoints/best.pth` | UNetFormer ResNet-18 |
---
### ImageNet Backbone Weights — `weights/imagenet/`
| File | Description |
|---|---|
| `weights/imagenet/resnet50-11ad3fa6.pth` | ResNet-50 ImageNet-1K pretrained |
| `weights/imagenet/resnet18-f37072fd.pth` | ResNet-18 ImageNet-1K pretrained |
---
## Results Summary
Every row shares the same decoder, loss, optimizer, schedule, and data splits. **The only variable is the encoder.**
### LoveDA
| Backbone | mIoU (All→All) | mIoU (U→R) | mIoU (R→U) |
|---|---:|---:|---:|
| DeepLabv3+ ResNet-50 (CNN) | 43.01 | 30.36 | 39.98 |
| UNetFormer ResNet-18 (Transformer) | 48.61 | 34.56 | 44.84 |
| VMamba-Small **🥇** | **55.66** | **40.62** | 53.52 |
| MambaVision-Large | 55.25 | 38.53 | **54.01** |
| Spatial-Mamba-Base | 48.03 | 35.23 | 46.55 |
### ISPRS Potsdam
| Backbone | mIoU |
|---|---:|
| DeepLabv3+ ResNet-50 | 75.09 |
| UNetFormer ResNet-18 | 74.99 |
| VMamba-Small **🥇** | **77.59** |
| MambaVision-Large | 77.07 |
| Spatial-Mamba-Base | 70.00 |
**Key findings:**
- The strongest SSM backbones outperform the CNN and Transformer baselines under identical conditions (+7 to +12 mIoU on LoveDA).
- Scaling the encoder past VMamba-Small yields diminishing returns under a fixed decoder.
- Domain transfer is asymmetric across all backbone families (Rural→Urban consistently outperforms Urban→Rural by 10–15 points) — a data distribution property, not a model property.
- Boundary accuracy collapses under domain shift while interior accuracy holds — every backbone, every family.
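
The last finding is typically quantified with a boundary-band metric. Below is a minimal sketch of one common formulation (boundary IoU over a thin erosion band), offered as an illustration rather than the paper's exact metric:

```python
import numpy as np
from scipy.ndimage import binary_erosion

def boundary_band(mask: np.ndarray, width: int = 3) -> np.ndarray:
    """Pixels within `width` of the mask boundary (mask minus its erosion)."""
    return mask & ~binary_erosion(mask, iterations=width)

def boundary_iou(pred: np.ndarray, gt: np.ndarray, width: int = 3) -> float:
    pb, gb = boundary_band(pred, width), boundary_band(gt, width)
    union = (pb | gb).sum()
    return (pb & gb).sum() / union if union else 1.0

# Toy example: a prediction shifted by 2 px keeps high interior overlap
# but loses much of its boundary IoU.
gt = np.zeros((64, 64), bool); gt[16:48, 16:48] = True
pred = np.zeros((64, 64), bool); pred[18:50, 18:50] = True
print(round(boundary_iou(pred, gt), 3))
```

A small spatial misalignment barely dents interior IoU while sharply reducing boundary IoU, which is the dissociation the findings above describe.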
---
## How to Load a Checkpoint
```python
import torch

# Example: load the MambaVision-Base best checkpoint for LoveDA All→All.
# weights_only=False is needed because the checkpoint stores optimizer and
# scheduler state alongside the weights (PyTorch >= 2.6 defaults to True).
ckpt = torch.load(
    "Comparison_Experiments/mambavision_base_512/checkpoints/best.pth",
    map_location="cpu",
    weights_only=False,
)

# keys: 'model', 'optimizer', 'scheduler', 'iter', 'best_score'
model_state = ckpt["model"]
```
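
Continuing from the snippet above: if a run was trained under (Distributed)DataParallel, state-dict keys may carry a `module.` prefix. Whether that applies to these checkpoints is an assumption, but stripping the prefix defensively is harmless either way:

```python
# Defensive cleanup, continuing from model_state above. Whether these
# checkpoints actually carry a "module." prefix is an assumption.
clean_state = {k.removeprefix("module."): v for k, v in model_state.items()}
# model.load_state_dict(clean_state)  # once the model is built (next step)
```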
To build the full model and run inference, clone the code repository and follow the setup instructions:
```bash
git clone https://github.com/dineth18/Mamba-Segmentation
cd Mamba-Segmentation/MambaVision # or VMamba/, spatial-mamba/, etc.
pip install -r requirements.txt
# Set your dataset path (no need to edit config files)
export LOVEDA_ROOT=/path/to/LoveDA
export POTSDAM_ROOT=/path/to/ISPRS_Potsdam
python eval.py --checkpoint path/to/best.pth
```
---
## Citation
If this benchmark is useful for your research, please cite:
```bibtex
@inproceedings{wasalathilaka2026controlledbenchmark,
  title     = {A Controlled Benchmark of Visual State-Space Backbones with
               Domain-Shift and Boundary Analysis for Remote-Sensing Segmentation},
  author    = {Wasalathilaka, Nichula and Perea, Dineth and Samarakoon, Oshadha
               and Wijenayake, Buddhi and Godaliyadda, Roshan and Herath, Vijitha
               and Ekanayake, Parakrama},
  booktitle = {IEEE International Geoscience and Remote Sensing Symposium (IGARSS)},
  year      = {2026}
}
```
---
## Acknowledgements
- [VMamba](https://github.com/MzeroMiko/VMamba) — Visual State Space Model
- [MambaVision](https://github.com/NVlabs/MambaVision) — NVIDIA hybrid Mamba-Transformer
- [Spatial-Mamba](https://github.com/EdwardChaworworrachat/SpatialMamba) — Spatially-aware Mamba
- [LoveDA](https://github.com/Junjue-Wang/LoveDA) — Land-cover domain adaptation dataset
- [ISPRS Potsdam](https://www.isprs.org/education/benchmarks/UrbanSemLab/) — Urban semantic labeling benchmark
Built at the **University of Peradeniya**.