---
license: apache-2.0
datasets:
- IGNF/FLAIR-HUB
language:
- en
metrics:
- mean_iou
pipeline_tag: image-segmentation
---
# ISDNet - Standalone PyTorch Implementation

**ISDNet: Integrating Shallow and Deep Networks for Efficient Ultra-high Resolution Segmentation**

CVPR 2022 | [Paper](https://openaccess.thecvf.com/content/CVPR2022/html/Guo_ISDNet_Integrating_Shallow_and_Deep_Networks_for_Efficient_Ultra-High_Resolution_CVPR_2022_paper.html)

This is a **standalone PyTorch implementation** of ISDNet, without MMSegmentation dependencies, adapted from [ISDNet](https://github.com/cedricgsh/ISDNet).

## Features

- Pure PyTorch implementation (no MMSeg required for training/inference)
- Multi-GPU training with DistributedDataParallel
- FLAIR French land cover dataset support (15 classes)
- Modern Python packaging with `uv`
- Modular code structure

## Model Architecture

- **Backbone**: ResNet-18 (via timm)
- **Shallow Path**: STDC-like module
- **Deep Path**: ASPP with dilated convolutions
- **Fusion**: Feature pyramid with lateral connections
- **Parameters**: 17.76M
- **FLOPs**: 21.79G @ 512x512
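
As a rough illustration of what "dilated convolutions" buy the deep path, the sketch below computes how far a 3x3 kernel reaches at different dilation rates, and checks that padding equal to the dilation keeps the feature-map size unchanged. The rates `(1, 6, 12, 18)` and the 32x32 feature-map size are assumptions chosen for illustration, not values taken from this repository; the output-size formula is the standard one documented for `torch.nn.Conv2d`.

```python
def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Span in pixels covered by a dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def conv_output_size(size: int, kernel_size: int, dilation: int,
                     padding: int, stride: int = 1) -> int:
    """Output length of a convolution along one axis (Conv2d shape formula)."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# Hypothetical dilation rates and feature-map size, for illustration only.
rates = (1, 6, 12, 18)
feature_size = 32

# Reach of a 3x3 kernel at each rate.
spans = {d: effective_kernel(3, d) for d in rates}

# With padding == dilation, every 3x3 branch preserves the spatial size,
# so the branch outputs can be concatenated channel-wise.
sizes = {d: conv_output_size(feature_size, 3, d, padding=d) for d in rates}

print(spans)
print(sizes)
```

Larger rates widen the context seen by each output pixel without adding parameters, which is why the parallel branches can share the same 3x3 kernel size.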

## Installation

```bash
# With uv (recommended)
uv sync

# Or with pip
pip install -e .
```

### Dependencies

- Python >= 3.10
- PyTorch >= 2.0
- timm >= 0.9
- mmcv >= 2.0 (for ConvModule only)

## Project Structure

```
isdnet/
├── __init__.py
├── config.py           # Training configuration
├── models/
│   ├── __init__.py
│   ├── isdnet.py       # Main ISDNet model
│   ├── modules.py      # STDC blocks, Laplacian pyramid
│   └── heads.py        # ASPP, ISDHead, RefineASPPHead
├── datasets/
│   ├── __init__.py
│   └── flair.py        # FLAIR dataset class
└── utils/
    ├── __init__.py
    └── distributed.py  # DDP utilities
train.py                # Training script
inference.py            # Evaluation script
```

## Training

Multi-GPU training on the FLAIR dataset (4 GPUs in this example):

```bash
uv run torchrun --nproc_per_node=4 train.py
```

Single GPU:

```bash
uv run python train.py
```
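
Both commands launch the same `train.py`. One way a script can support both modes is to read the environment variables that `torchrun` sets for each worker (`RANK`, `LOCAL_RANK`, `WORLD_SIZE`) and fall back to single-process defaults when they are absent. The sketch below is an assumption about how this could be done, not the code in `isdnet/utils/distributed.py`; `DistInfo` and `read_dist_env` are hypothetical names.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DistInfo:
    rank: int        # global process index
    local_rank: int  # process index on this node, usually the GPU index
    world_size: int  # total number of processes

    @property
    def is_distributed(self) -> bool:
        return self.world_size > 1

    @property
    def is_main(self) -> bool:
        # Rank 0 is the usual place for logging and checkpoint writing.
        return self.rank == 0


def read_dist_env(env=None) -> DistInfo:
    """Read torchrun's variables, defaulting to a single process."""
    env = os.environ if env is None else env
    return DistInfo(
        rank=int(env.get("RANK", 0)),
        local_rank=int(env.get("LOCAL_RANK", 0)),
        world_size=int(env.get("WORLD_SIZE", 1)),
    )


single = read_dist_env({})
worker = read_dist_env({"RANK": "3", "LOCAL_RANK": "3", "WORLD_SIZE": "4"})
print(single.is_distributed, worker.is_distributed, worker.is_main)
```

A training script would typically use `local_rank` to pick its GPU and wrap the model in `DistributedDataParallel` only when `is_distributed` is true.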

## Inference

Evaluate on the test set, or pass `--split valid` to evaluate on the validation set:

```bash
uv run python inference.py --checkpoint isdnet_flair_best.pth
uv run python inference.py --checkpoint isdnet_flair_best.pth --split valid
```

Training configuration (in `isdnet/config.py`):
- Batch size: 16 per GPU (64 total on 4 GPUs)
- Learning rate: 1e-3
- Optimizer: SGD with momentum
- Scheduler: PolynomialLR
- Epochs: 80
- Crop size: 512x512
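
To make the numbers above concrete, the sketch below works out the total batch size and a polynomial learning-rate decay in plain Python. The closed form mirrors the one documented for `torch.optim.lr_scheduler.PolynomialLR`; the `power` value and whether the schedule advances per epoch or per iteration are not stated here, so `power=1.0` (the PyTorch default) and per-epoch stepping are assumptions.

```python
def effective_batch_size(per_gpu: int, num_gpus: int) -> int:
    """Total samples per optimizer step under data parallelism."""
    return per_gpu * num_gpus


def poly_lr(base_lr: float, step: int, total_steps: int,
            power: float = 1.0) -> float:
    """Polynomial decay from base_lr at step 0 to 0 at total_steps."""
    step = min(step, total_steps)
    return base_lr * (1.0 - step / total_steps) ** power


total_batch = effective_batch_size(per_gpu=16, num_gpus=4)

# Assumed: one scheduler step per epoch over the 80 training epochs.
lr_start = poly_lr(1e-3, step=0, total_steps=80)
lr_mid = poly_lr(1e-3, step=40, total_steps=80)
lr_end = poly_lr(1e-3, step=80, total_steps=80)

print(total_batch, lr_start, lr_mid, lr_end)
```

With a different GPU count the total batch size changes accordingly, which may call for retuning the learning rate.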

## Usage

```python
from isdnet import ISDNet, FLAIRDataset

# Create model
model = ISDNet(
    num_classes=15,
    backbone='resnet18',
    stdc_pretrain='STDCNet813M_73.91.tar'
).cuda()

# Training forward
# `images`: float tensor of shape (B, C, H, W) on the same device as the model
outputs = model(images, return_loss=True)
# Returns: out, out_deep, out_aux16, out_aux8, aux_out, losses_re, losses_fa

# Inference forward
predictions = model(images, return_loss=False)
# Returns: (B, num_classes, H, W) logits
```

## Results on FLAIR Dataset

| Metric | Value |
|--------|-------|
| Val mIoU | 59.82% |
| Test mIoU | 52.77% |
| Pixel Accuracy | 72.02% |

### Per-class IoU (Test)

| Class | IoU |
|-------|-----|
| water | 81.6% |
| vineyard | 74.7% |
| building | 72.0% |
| impervious | 66.5% |
| deciduous | 66.4% |
| greenhouse | 61.3% |
| bare soil | 56.2% |
| coniferous | 55.1% |
| agricultural | 53.3% |
| snow | 51.9% |
| pervious | 49.0% |
| herbaceous | 46.0% |
| plowed land | 33.4% |
| brushwood | 24.0% |
| swimming_pool | 0.0% |
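
As a quick consistency check, the test mIoU is the unweighted mean of the per-class IoUs. The short sketch below averages the rounded values from the table; it gives 52.76%, which matches the reported 52.77% up to rounding. It also shows that the mean over the 14 classes other than `swimming_pool` is 56.53%, so the single 0.0% class lowers the headline figure by several points. Variable names are illustrative only.

```python
# Per-class test IoU in percent, copied from the table above.
per_class_iou = {
    "water": 81.6,
    "vineyard": 74.7,
    "building": 72.0,
    "deciduous": 66.4,
    "impervious": 66.5,
    "greenhouse": 61.3,
    "bare soil": 56.2,
    "coniferous": 55.1,
    "agricultural": 53.3,
    "snow": 51.9,
    "pervious": 49.0,
    "herbaceous": 46.0,
    "plowed land": 33.4,
    "brushwood": 24.0,
    "swimming_pool": 0.0,
}

# mIoU: unweighted mean over all 15 classes.
mean_iou = sum(per_class_iou.values()) / len(per_class_iou)

# Same mean with the 0.0% class left out, for comparison.
nonzero = [v for v in per_class_iou.values() if v > 0.0]
mean_iou_nonzero = sum(nonzero) / len(nonzero)

print(f"{mean_iou:.2f}")          # 52.76
print(f"{mean_iou_nonzero:.2f}")  # 56.53
```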

## STDC Pretrained Weights

Download STDC pretrained weights: [STDCNet813M_73.91.tar](https://drive.google.com/file/d/1FfG-qRlGy-2BsVjN2ZcKTEG9wZeo3sdW/view)

## Citation

```bibtex
@inproceedings{guo2022isdnet,
  title={ISDNet: Integrating Shallow and Deep Networks for Efficient Ultra-High Resolution Segmentation},
  author={Guo, Shaohua and Liu, Liang and Gan, Zhenye and Wang, Yabiao and Zhang, Wuhao and Wang, Chengjie and Jiang, Guannan and Zhang, Wei and Yi, Ran and Ma, Lizhuang and others},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={4361--4370},
  year={2022}
}
```

## License

Apache-2.0