WHU Building Detection: EfficientNet-B4 + UNet++
A semantic segmentation model for building detection in high-resolution aerial imagery, trained on the WHU Building Dataset.
Model Description
| Property | Value |
|---|---|
| Architecture | UNet++ |
| Encoder | EfficientNet-B4 (ImageNet pretrained) |
| Framework | segmentation-models-pytorch (SMP) |
| Training Framework | PyTorch Lightning |
| Input | 3-channel RGB, 512x512 tiles |
| Output | 2-class mask (Background=0, Building=1) |
| Parameters | ~20.8M |
| Model Size | ~84 MB |
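The parameter count and model size in the table are consistent with weights stored as 32-bit floats. A quick arithmetic check, assuming 4 bytes per parameter and ignoring any checkpoint metadata (the helper name `estimate_size_mb` is made up for this sketch):

```python
# Rough size estimate, assuming float32 weights (4 bytes per parameter).
# The parameter count is the approximate figure from the table above.
def estimate_size_mb(num_params: float, bytes_per_param: int = 4) -> float:
    """Return the approximate weight size in megabytes (10**6 bytes)."""
    return num_params * bytes_per_param / 1e6


size_mb = estimate_size_mb(20.8e6)
print(f"Estimated size: {size_mb:.1f} MB")
```

The estimate lands slightly under the listed ~84 MB, which is plausible for a checkpoint that also stores buffers and key names.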
Performance
Evaluated on the WHU Building Dataset test split (1,228 tiles):
| Metric | Score |
|---|---|
| IoU | 0.9054 |
| Dice | 0.9503 |
| Best Val IoU | 0.9434 |
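For readers who want to reproduce these metrics on their own predictions, the following is a minimal sketch of IoU and Dice for the building class. It assumes masks are integer arrays with 1 marking buildings; the function name, the toy masks, and the empty-mask convention are illustrative choices, not taken from the evaluation code behind the table.

```python
import numpy as np


def iou_and_dice(pred: np.ndarray, target: np.ndarray) -> tuple[float, float]:
    """Compute IoU and Dice for the building class (label 1) of two masks."""
    pred_b = pred == 1
    target_b = target == 1
    intersection = np.logical_and(pred_b, target_b).sum()
    union = np.logical_or(pred_b, target_b).sum()
    total = pred_b.sum() + target_b.sum()
    # Convention chosen here: two empty masks count as a perfect match.
    iou = intersection / union if union > 0 else 1.0
    dice = 2 * intersection / total if total > 0 else 1.0
    return float(iou), float(dice)


# Toy 4x4 masks (hypothetical data, not from the WHU dataset).
target = np.array([[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]])
pred = np.array([[0, 0, 0, 0], [0, 1, 1, 1], [0, 1, 1, 0], [0, 0, 0, 0]])
iou, dice = iou_and_dice(pred, target)
print(f"IoU={iou:.4f}, Dice={dice:.4f}")

# For a single pair of masks, Dice = 2 * IoU / (1 + IoU).
assert abs(dice - 2 * iou / (1 + iou)) < 1e-9
```

Applying the same identity to the reported IoU gives 2 × 0.9054 / 1.9054, or about 0.950, in line with the reported Dice to within rounding. The identity holds exactly only when both metrics come from the same pooled pixel counts, which is assumed here.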
Training Details
- Dataset: WHU Building Dataset, 5,732 training tiles (512x512 RGB at 0.3 m resolution)
- Validation split: 20% of training data
- Optimizer: AdamW (lr=1e-4, weight_decay=1e-4)
- Loss: CrossEntropyLoss
- Epochs: 36 (early stopping, patience=10)
- Batch size: 16
- GPU: NVIDIA RTX 6000 Ada (48GB)
- Encoder weights: ImageNet pretrained
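The early-stopping rule listed above can be sketched in plain Python. This is an illustration of the general technique, not the training code: it assumes the monitored metric is validation IoU (higher is better), and the class and the metric curve are hypothetical. In an actual PyTorch Lightning run, the built-in `EarlyStopping` callback would normally play this role.

```python
class EarlyStopping:
    """Stop when a higher-is-better metric has not improved for `patience` epochs."""

    def __init__(self, patience: int = 10):
        self.patience = patience
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, value: float) -> bool:
        """Record one epoch's metric; return True if training should stop."""
        if value > self.best:
            self.best = value
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Hypothetical validation IoU curve: improves for 5 epochs, then plateaus.
val_iou = [0.80, 0.85, 0.88, 0.90, 0.91] + [0.905] * 20
stopper = EarlyStopping(patience=10)
stopped_at = None
for epoch, value in enumerate(val_iou, start=1):
    if stopper.step(value):
        stopped_at = epoch
        break
print(f"Stopped after epoch {stopped_at}, best val IoU {stopper.best:.3f}")
```

With this toy curve the best score occurs at epoch 5, so training halts once ten further epochs pass without improvement.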
Quick Start
Installation
```bash
pip install geoai-py timm segmentation-models-pytorch
```
Inference with GeoAI
```python
import geoai

# Run building detection on a GeoTIFF
geoai.timm_segmentation_from_hub(
    input_path="input_image.tif",
    output_path="building_prediction.tif",
    repo_id="giswqs/whu-building-unetplusplus-efficientnet-b4",
    window_size=512,
    overlap=256,
    batch_size=4,
)

# Vectorize to building footprints
gdf = geoai.orthogonalize(
    input_path="building_prediction.tif",
    output_path="building_footprints.geojson",
    epsilon=2.0,
)
```
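To give a feel for what `window_size=512` and `overlap=256` imply, here is a minimal sketch of how overlapping windows could tile a large image. It assumes a stride of `window_size - overlap` and a final window clamped to the image edge; GeoAI's actual tiling and blending logic may differ, and `window_origins` is a hypothetical helper, not part of the package.

```python
def window_origins(size: int, window: int = 512, overlap: int = 256) -> list[int]:
    """Return start offsets of windows covering one axis of length `size`."""
    if overlap >= window:
        raise ValueError("overlap must be smaller than window")
    if size <= window:
        return [0]  # a single window; padding would be needed if size < window
    stride = window - overlap
    origins = list(range(0, size - window + 1, stride))
    # Clamp a final window to the edge so the last pixels are covered.
    if origins[-1] + window < size:
        origins.append(size - window)
    return origins


# Hypothetical 1300 x 1000 pixel image.
xs = window_origins(1300)
ys = window_origins(1000)
print(xs)  # [0, 256, 512, 768, 788]
print(ys)  # [0, 256, 488]
print(len(xs) * len(ys), "windows")  # 15 windows
```

A 50% overlap means most pixels are predicted more than once, which helps suppress artifacts at tile borders at the cost of more forward passes.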
Manual Loading
```python
import json

import torch
import segmentation_models_pytorch as smp

# Load config
with open("config.json") as f:
    config = json.load(f)

# Create model
model = smp.UnetPlusPlus(
    encoder_name="efficientnet-b4",
    encoder_weights=None,
    in_channels=3,
    classes=2,
)

# Load weights
state_dict = torch.load("model.pth", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()
```
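Once the model is loaded, a tile has to be turned into a tensor and the two-channel output turned back into a mask. The sketch below shows one plausible version of both steps using NumPy only. The ImageNet normalization is an assumption based on the ImageNet-pretrained encoder and is not confirmed by this card; the function names and stand-in arrays are hypothetical.

```python
import numpy as np

# ImageNet channel statistics; using them here is an assumption based on the
# ImageNet-pretrained encoder, not something stated in the model card.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess(tile_hwc_uint8: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 uint8 RGB tile to a 1x3xHxW float32 array."""
    x = tile_hwc_uint8.astype(np.float32) / 255.0
    x = (x - IMAGENET_MEAN) / IMAGENET_STD
    return x.transpose(2, 0, 1)[np.newaxis]


def logits_to_mask(logits: np.ndarray) -> np.ndarray:
    """Convert 1x2xHxW logits to an HxW uint8 mask (0=background, 1=building)."""
    return logits.argmax(axis=1)[0].astype(np.uint8)


# Stand-in data so the sketch runs without the model weights.
tile = np.zeros((512, 512, 3), dtype=np.uint8)
batch = preprocess(tile)
fake_logits = np.zeros((1, 2, 512, 512), dtype=np.float32)
fake_logits[0, 1, 100:200, 100:200] = 5.0  # pretend building region
mask = logits_to_mask(fake_logits)
print(batch.shape, mask.shape, int(mask.sum()))
```

With the real model, `batch` would be wrapped with `torch.from_numpy`, passed through `model` inside `torch.no_grad()`, and the output converted back with `.numpy()` before calling `logits_to_mask`.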
Example Notebook
See the full inference notebook with visualization and analysis:
Dataset
The WHU Building Dataset consists of aerial imagery at 0.3m resolution with binary building masks:
- Train: 5,732 tiles (512x512 RGB)
- Val: 1,228 tiles
- Test: 1,228 tiles
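The stated resolution makes it straightforward to convert between pixels and ground units, for example when reporting predicted building area. A small sketch, assuming square pixels at exactly 0.3 m (the helper name is made up for illustration):

```python
GSD_M = 0.3    # ground sampling distance in metres per pixel (from the dataset description)
TILE_PX = 512  # tile width and height in pixels


def pixels_to_m2(n_pixels: int, gsd_m: float = GSD_M) -> float:
    """Convert a pixel count to ground area in square metres."""
    return n_pixels * gsd_m ** 2


tile_side_m = TILE_PX * GSD_M
tile_area_m2 = pixels_to_m2(TILE_PX * TILE_PX)
print(f"Tile side: {tile_side_m:.1f} m, tile area: {tile_area_m2:.0f} m^2")
print(f"1,000 building pixels cover about {pixels_to_m2(1000):.0f} m^2")
```

Each tile therefore covers roughly 154 m on a side, and a single pixel corresponds to 0.09 m² on the ground.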
Reference
Ji, S., Wei, S., & Lu, M. (2019). Fully Convolutional Networks for Multisource Building Extraction From an Open Aerial and Satellite Imagery Data Set. IEEE Transactions on Geoscience and Remote Sensing, 57(1), 574-586.
License
This model is released under the Apache 2.0 License.
Links
- GeoAI package: https://github.com/opengeos/geoai
- Documentation: https://geoai.gishub.org