ISDNet - Standalone PyTorch Implementation
ISDNet: Integrating Shallow and Deep Networks for Efficient Ultra-high Resolution Segmentation
CVPR 2022 | Paper
This is a standalone PyTorch implementation of ISDNet, without MMSegmentation dependencies, adapted from the official ISDNet repository.
Features
- Pure PyTorch training/inference pipeline (no MMSegmentation dependency; mmcv is used only for `ConvModule`)
- Multi-GPU training with DistributedDataParallel
- FLAIR French land cover dataset support (15 classes)
- Modern Python packaging with uv
- Modular code structure
Model Architecture
- Backbone: ResNet-18 (via timm)
- Shallow Path: STDC-like module
- Deep Path: ASPP with dilated convolutions
- Fusion: Feature pyramid with lateral connections
- Parameters: 17.76M
- FLOPs: 21.79G @ 512x512
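The parameter count above can be checked with a short helper. This is a sketch: `demo` is a tiny stand-in module, not the actual ISDNet model.

```python
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    """Total number of trainable parameters."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Tiny stand-in module for illustration; the full ISDNet reports ~17.76M.
demo = nn.Sequential(nn.Conv2d(3, 64, kernel_size=3), nn.Linear(64, 15))
print(f"{count_parameters(demo) / 1e6:.2f}M parameters")
```

Applying the same helper to an instantiated `ISDNet` should reproduce the 17.76M figure.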
Installation
```sh
# With uv (recommended)
uv sync

# Or with pip
pip install -e .
```
Dependencies
- Python >= 3.10
- PyTorch >= 2.0
- timm >= 0.9
- mmcv >= 2.0 (for ConvModule only)
Project Structure
```
isdnet/
├── __init__.py
├── config.py            # Training configuration
├── models/
│   ├── __init__.py
│   ├── isdnet.py        # Main ISDNet model
│   ├── modules.py       # STDC blocks, Laplacian pyramid
│   └── heads.py         # ASPP, ISDHead, RefineASPPHead
├── datasets/
│   ├── __init__.py
│   └── flair.py         # FLAIR dataset class
└── utils/
    ├── __init__.py
    └── distributed.py   # DDP utilities
train.py                 # Training script
inference.py             # Evaluation script
```
Training
Multi-GPU training on the FLAIR dataset:

```sh
uv run torchrun --nproc_per_node=4 train.py
```

Single GPU:

```sh
uv run python train.py
```
Inference
Evaluate on the test set:

```sh
uv run python inference.py --checkpoint isdnet_flair_best.pth
```

Evaluate on the validation split:

```sh
uv run python inference.py --checkpoint isdnet_flair_best.pth --split valid
```
Training configuration (in isdnet/config.py):
- Batch size: 16 per GPU (64 total)
- Learning rate: 1e-3
- Optimizer: SGD with momentum
- Scheduler: PolynomialLR
- Epochs: 80
- Crop size: 512x512
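The optimizer and scheduler settings above can be sketched as follows. This is illustrative, not the actual `train.py`: the momentum value and the polynomial power are assumptions, and `params` stands in for `model.parameters()`.

```python
import torch
from torch.optim.lr_scheduler import PolynomialLR

# Stand-in parameter list; in train.py this would be model.parameters().
params = [torch.nn.Parameter(torch.zeros(10))]

# lr, optimizer type, scheduler, and epoch count follow the configuration
# above; momentum=0.9 and power=1.0 are illustrative assumptions.
optimizer = torch.optim.SGD(params, lr=1e-3, momentum=0.9)
scheduler = PolynomialLR(optimizer, total_iters=80, power=1.0)

for epoch in range(80):
    # ... one training epoch over the FLAIR dataset ...
    scheduler.step()
```

With `power=1.0` the learning rate decays linearly from 1e-3 to 0 over the 80 epochs.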
Usage
```python
from isdnet import ISDNet, FLAIRDataset

# Create model
model = ISDNet(
    num_classes=15,
    backbone='resnet18',
    stdc_pretrain='STDCNet813M_73.91.tar',
).cuda()

# Training forward
outputs = model(images, return_loss=True)
# Returns: out, out_deep, out_aux16, out_aux8, aux_out, losses_re, losses_fa

# Inference forward
predictions = model(images, return_loss=False)
# Returns: (B, num_classes, H, W) logits
```
Results on FLAIR Dataset
| Metric | Value |
|---|---|
| Val mIoU | 59.82% |
| Test mIoU | 52.77% |
| Pixel Accuracy | 72.02% |
Per-class IoU (Test)
| Class | IoU |
|---|---|
| water | 81.6% |
| vineyard | 74.7% |
| building | 72.0% |
| deciduous | 66.4% |
| impervious | 66.5% |
| greenhouse | 61.3% |
| bare soil | 56.2% |
| coniferous | 55.1% |
| agricultural | 53.3% |
| snow | 51.9% |
| pervious | 49.0% |
| herbaceous | 46.0% |
| plowed land | 33.4% |
| brushwood | 24.0% |
| swimming_pool | 0.0% |
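The per-class IoU and mIoU figures above follow the standard confusion-matrix definition: IoU = TP / (TP + FP + FN) per class, averaged over classes. A minimal sketch (the 3-class `conf` matrix is a toy example, not FLAIR data):

```python
import numpy as np

def iou_per_class(conf: np.ndarray) -> np.ndarray:
    """Per-class IoU from a (C, C) confusion matrix (rows = ground truth)."""
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp   # predicted as class c, but wrong
    fn = conf.sum(axis=1) - tp   # pixels of class c that were missed
    denom = tp + fp + fn
    return np.where(denom > 0, tp / np.maximum(denom, 1), 0.0)

# Toy 3-class confusion matrix:
conf = np.array([[5, 1, 0],
                 [1, 4, 0],
                 [0, 0, 2]])
ious = iou_per_class(conf)
miou = ious.mean()   # mIoU ≈ 0.794 for this toy matrix
```

Classes absent from both prediction and ground truth get IoU 0 here, which is one reason a rare class like swimming_pool can score 0.0%.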
STDC Pretrained Weights
Download the STDC pretrained weights (STDCNet813M_73.91.tar) and pass the file path via the `stdc_pretrain` argument when constructing the model.
Citation
```bibtex
@inproceedings{guo2022isdnet,
  title={ISDNet: Integrating Shallow and Deep Networks for Efficient Ultra-High Resolution Segmentation},
  author={Guo, Shaohua and Liu, Liang and Gan, Zhenye and Wang, Yabiao and Zhang, Wuhao and Wang, Chengjie and Jiang, Guannan and Zhang, Wei and Yi, Ran and Ma, Lizhuang and others},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={4361--4370},
  year={2022}
}
```
License
Apache-2.0