---
license: mit
language:
  - en
tags:
  - remote-sensing
  - semantic-segmentation
  - mamba
  - state-space-model
  - vmamba
  - mambavision
  - spatial-mamba
  - pytorch
  - benchmark
  - loveda
  - isprs-potsdam
  - domain-adaptation
datasets:
  - LoveDA
  - ISPRS-Potsdam
pipeline_tag: image-segmentation
---

# Mamba-Segmentation

**Controlled Visual State-Space Backbone Benchmark with Domain-Shift & Boundary Analysis for Remote-Sensing Segmentation**

> *Accepted at IGARSS 2026*

One pipeline. One decoder. One loss. One schedule. **Five backbone families.** The only variable is the encoder — so the results finally mean something.

---

## What Is This?

Remote-sensing segmentation papers routinely change the backbone *and* the decoder *and* the loss *and* the training schedule all at once. The numbers tell you who tuned harder, not which backbone is better.

This repo fixes that. **One shared pipeline — swap the backbone — read the truth.**

| Component | Status |
|---|---|
| Encoder backbone | 🔀 **Swapped** per experiment — the ONLY variable |
| Decoder | 🔒 Fixed (lightweight U-Net, 256ch, MambaBlock2d) |
| Loss | 🔒 Fixed (Lovász-Softmax + Focal + Boundary) |
| Training schedule | 🔒 Fixed (50k iters, AdamW, poly LR decay) |
| Augmentations | 🔒 Fixed (random crop, flip, colour jitter) |
| Input resolution | 🔒 Fixed (512×512) |
| Feature interface | 🔒 Fixed ({F1–F4} at strides {4, 8, 16, 32}) |
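The fixed feature interface is the contract every backbone must satisfy: four feature maps at strides 4, 8, 16, and 32 relative to the 512×512 input. A minimal sketch of that contract (`ToyBackbone` and its channel widths are hypothetical stand-ins for illustration, not any of the benchmarked encoders):

```python
import torch
import torch.nn as nn


class ToyBackbone(nn.Module):
    """Hypothetical stand-in showing the fixed interface:
    forward() returns four feature maps F1-F4 at strides 4, 8, 16, 32."""

    def __init__(self, channels=(64, 128, 256, 512)):
        super().__init__()
        self.stages = nn.ModuleList()
        in_ch = 3
        for i, out_ch in enumerate(channels):
            # First stage downsamples by 4, each later stage by 2.
            s = 4 if i == 0 else 2
            self.stages.append(nn.Conv2d(in_ch, out_ch, kernel_size=s, stride=s))
            in_ch = out_ch

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        return feats  # spatial sizes 128, 64, 32, 16 for a 512x512 input


x = torch.randn(1, 3, 512, 512)
feats = ToyBackbone()(x)
print([f.shape[-1] for f in feats])  # [128, 64, 32, 16]
```

Any encoder that emits this pyramid plugs into the shared decoder unchanged, which is what keeps the comparison controlled.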

---

## Checkpoints in This Repository

All checkpoints are `best.pth` files (highest validation mIoU during training) stored with their original directory structure.

### LoveDA Experiments — `Comparison_Experiments/`

#### MambaVision (NVIDIA hybrid Mamba-Transformer)
| Checkpoint path | Training split |
|---|---|
| `Comparison_Experiments/mambavision_tiny_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/mambavision_tiny_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/mambavision_tiny_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/mambavision_tiny2_512/checkpoints/best.pth` | All→All (v2) |
| `Comparison_Experiments/mambavision_tiny2_ruraltrain_512/checkpoints/best.pth` | Rural→Urban (v2) |
| `Comparison_Experiments/mambavision_tiny2_urbantrain_512/checkpoints/best.pth` | Urban→Rural (v2) |
| `Comparison_Experiments/mambavision_small_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/mambavision_small_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/mambavision_small_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/mambavision_base_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/mambavision_base_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/mambavision_base_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/mambavision_large_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/mambavision_large_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/mambavision_large_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/mambavision_large2_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/mambavision_large2_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/mambavision_large2_urbantrain_512/checkpoints/best.pth` | Urban→Rural |

#### VMamba (cross-scan 2D selective SSM)
| Checkpoint path | Training split |
|---|---|
| `Comparison_Experiments/Vmamb_tiny_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/vmamba_tiny_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/vmamba_tiny_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/Vmamb_small_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/Vmamb_small_512_2/checkpoints/best.pth` | All→All (run 2) |
| `Comparison_Experiments/Vmamb_small_512_3/checkpoints/best.pth` | All→All (run 3) |
| `Comparison_Experiments/vmamba_small_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/vmamba_small_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/Vmamb_base_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/vmamba_base_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/vmamba_base_urbantrain_512/checkpoints/best.pth` | Urban→Rural |

#### VisionMamba / Vim (bidirectional Mamba)
| Checkpoint path | Training split |
|---|---|
| `Comparison_Experiments/VisionMamba_tiny_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/visionmamba_tiny_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/visionmamba_tiny_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/VisionMamba_small_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/visionmamba_small_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/visionmamba_small_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/VisionMamba_base_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/visionmamba_base_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/visionmamba_base_urbantrain_512/checkpoints/best.pth` | Urban→Rural |

#### Spatial-Mamba (spatially-aware SSM)
| Checkpoint path | Training split |
|---|---|
| `Comparison_Experiments/spatialmamba_tiny_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/spatialmamba_tiny_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/spatialmamba_tiny_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/spatialmamba_small_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/spatialmamba_small_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/spatialmamba_small_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
| `Comparison_Experiments/spatialmamba_base_512/checkpoints/best.pth` | All→All |
| `Comparison_Experiments/spatialmamba_base_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
| `Comparison_Experiments/spatialmamba_base_urbantrain_512/checkpoints/best.pth` | Urban→Rural |

#### CNN & Transformer Baselines
| Checkpoint path | Model |
|---|---|
| `Comparison_Experiments/cnn_deeplabv3p_r50_512/checkpoints/best.pth` | DeepLabv3+ ResNet-50, All→All |
| `Comparison_Experiments/cnn_deeplabv3p_resnet50_ruraltrain_512/checkpoints/best.pth` | DeepLabv3+ ResNet-50, Rural→Urban |
| `Comparison_Experiments/cnn_deeplabv3p_resnet50_urbantrain_512/checkpoints/best.pth` | DeepLabv3+ ResNet-50, Urban→Rural |
| `Comparison_Experiments/cnn_unet_r50_512/checkpoints/best.pth` | U-Net ResNet-50, All→All |
| `Comparison_Experiments/transformer_unetformer_r18_512/checkpoints/best.pth` | UNetFormer ResNet-18, All→All |
| `Comparison_Experiments/transformerunetformer_resnet18_ruraltrain_512/checkpoints/best.pth` | UNetFormer ResNet-18, Rural→Urban |
| `Comparison_Experiments/transformerunetformer_resnet18_urbantrain_512/checkpoints/best.pth` | UNetFormer ResNet-18, Urban→Rural |

---

### ISPRS Potsdam Experiments — `Comparison_Experiments_ICPRS_potsdam/`

| Checkpoint path | Model |
|---|---|
| `Comparison_Experiments_ICPRS_potsdam/mambavision_tiny_512/checkpoints/best.pth` | MambaVision-Tiny |
| `Comparison_Experiments_ICPRS_potsdam/mambavision_tiny2_512/checkpoints/best.pth` | MambaVision-Tiny2 |
| `Comparison_Experiments_ICPRS_potsdam/mambavision_small_512/checkpoints/best.pth` | MambaVision-Small |
| `Comparison_Experiments_ICPRS_potsdam/mambavision_base_512/checkpoints/best.pth` | MambaVision-Base |
| `Comparison_Experiments_ICPRS_potsdam/mambavision_large_512/checkpoints/best.pth` | MambaVision-Large |
| `Comparison_Experiments_ICPRS_potsdam/mambavision_large2_512/checkpoints/best.pth` | MambaVision-Large2 |
| `Comparison_Experiments_ICPRS_potsdam/vmamba_tiny_512/checkpoints/best.pth` | VMamba-Tiny |
| `Comparison_Experiments_ICPRS_potsdam/vmamba_small_512/checkpoints/best.pth` | VMamba-Small |
| `Comparison_Experiments_ICPRS_potsdam/vmamba_base_512/checkpoints/best.pth` | VMamba-Base |
| `Comparison_Experiments_ICPRS_potsdam/spatialmamba_tiny_512/checkpoints/best.pth` | Spatial-Mamba-Tiny |
| `Comparison_Experiments_ICPRS_potsdam/spatialmamba_small_512/checkpoints/best.pth` | Spatial-Mamba-Small |
| `Comparison_Experiments_ICPRS_potsdam/spatialmamba_base_512/checkpoints/best.pth` | Spatial-Mamba-Base |
| `Comparison_Experiments_ICPRS_potsdam/cnn_deeplabv3p_r50_512/checkpoints/best.pth` | DeepLabv3+ ResNet-50 |
| `Comparison_Experiments_ICPRS_potsdam/transformer_unetformer_r18_512/checkpoints/best.pth` | UNetFormer ResNet-18 |

---

### ImageNet Backbone Weights — `weights/imagenet/`

| File | Description |
|---|---|
| `weights/imagenet/resnet50-11ad3fa6.pth` | ResNet-50 ImageNet-1K pretrained |
| `weights/imagenet/resnet18-f37072fd.pth` | ResNet-18 ImageNet-1K pretrained |

---

## Results Summary

Every row shares the same decoder, loss, optimizer, schedule, and data splits. **The only variable is the encoder.**

### LoveDA

| Backbone | mIoU (All→All) | mIoU (U→R) | mIoU (R→U) |
|---|---:|---:|---:|
| DeepLabv3+ ResNet-50 (CNN) | 43.01 | 30.36 | 39.98 |
| UNetFormer ResNet-18 (Transformer) | 48.61 | 34.56 | 44.84 |
| VMamba-Small **🥇** | **55.66** | **40.62** | 53.52 |
| MambaVision-Large | 55.25 | 38.53 | **54.01** |
| Spatial-Mamba-Base | 48.03 | 35.23 | 46.55 |

### ISPRS Potsdam

| Backbone | mIoU |
|---|---:|
| DeepLabv3+ ResNet-50 | 75.09 |
| UNetFormer ResNet-18 | 74.99 |
| VMamba-Small **🥇** | **77.59** |
| MambaVision-Large | 77.07 |
| Spatial-Mamba-Base | 70.00 |

**Key findings:**
- SSMs outperform CNNs and Transformers by a significant margin under identical conditions (+7–12 mIoU on LoveDA).
- Scaling the encoder past VMamba-Small yields diminishing returns under a fixed decoder.
- Domain transfer is asymmetric across all backbone families (Rural→Urban consistently outperforms Urban→Rural by 10–15 points) — a data distribution property, not a model property.
- Boundary accuracy collapses under domain shift while interior accuracy holds — every backbone, every family.

---

## How to Load a Checkpoint

```python
import torch

# Example: load MambaVision-Base best checkpoint for LoveDA All→All
ckpt = torch.load(
    "Comparison_Experiments/mambavision_base_512/checkpoints/best.pth",
    map_location="cpu",
    weights_only=False,  # full training checkpoint; PyTorch >= 2.6 defaults to True
)
# keys: 'model', 'optimizer', 'scheduler', 'iter', 'best_score'
model_state = ckpt["model"]
```
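A runnable end-to-end sketch of restoring weights from that key layout, using a tiny `nn.Conv2d` as a hypothetical stand-in for the real segmentation network (build the actual model from the code repository):

```python
import torch
import torch.nn as nn

# Hypothetical tiny model standing in for the real segmentation network.
model = nn.Conv2d(3, 7, kernel_size=1)

# Simulate a checkpoint with the key layout described above.
torch.save(
    {"model": model.state_dict(), "iter": 50000, "best_score": 0.55},
    "best_demo.pth",
)

# weights_only=False is needed on PyTorch >= 2.6 for full training checkpoints.
ckpt = torch.load("best_demo.pth", map_location="cpu", weights_only=False)
state = ckpt["model"]

# Checkpoints saved from DataParallel/DDP runs may prefix keys with 'module.'.
state = {k.removeprefix("module."): v for k, v in state.items()}

missing, unexpected = model.load_state_dict(state, strict=False)
print(len(missing), len(unexpected))  # 0 0
```

`strict=False` reports mismatched keys instead of raising, which is convenient when checking a checkpoint against a freshly built model.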

To build the full model and run inference, clone the code repository and follow the setup instructions:

```bash
git clone https://github.com/dineth18/Mamba-Segmentation
cd Mamba-Segmentation/MambaVision   # or VMamba/, spatial-mamba/, etc.
pip install -r requirements.txt

# Set your dataset path (no need to edit config files)
export LOVEDA_ROOT=/path/to/LoveDA
export POTSDAM_ROOT=/path/to/ISPRS_Potsdam

python eval.py --checkpoint path/to/best.pth
```

---

## Citation

If this benchmark is useful for your research, please cite:

```bibtex
@inproceedings{wasalathilaka2026controlledbenchmark,
  title={A Controlled Benchmark of Visual State-Space Backbones with
         Domain-Shift and Boundary Analysis for Remote-Sensing Segmentation},
  author={Wasalathilaka, Nichula and Perea, Dineth and Samarakoon, Oshadha
          and Wijenayake, Buddhi and Godaliyadda, Roshan and Herath, Vijitha
          and Ekanayake, Parakrama},
  booktitle={IGARSS 2026},
  year={2026}
}
```

---

## Acknowledgements

- [VMamba](https://github.com/MzeroMiko/VMamba) — Visual State Space Model
- [MambaVision](https://github.com/NVlabs/MambaVision) — NVIDIA hybrid Mamba-Transformer
- [Spatial-Mamba](https://github.com/EdwardChaworworrachat/SpatialMamba) — Spatially-aware Mamba
- [LoveDA](https://github.com/Junjue-Wang/LoveDA) — Land-cover domain adaptation dataset
- [ISPRS Potsdam](https://www.isprs.org/education/benchmarks/UrbanSemLab/) — Urban semantic labeling benchmark

Built at the **University of Peradeniya**.