WHU Building Detection: EfficientNet-B4 + UNet++

A semantic segmentation model for building detection in high-resolution aerial imagery, trained on the WHU Building Dataset.

Model Description

Property            Value
Architecture        UNet++
Encoder             EfficientNet-B4 (ImageNet pretrained)
Framework           segmentation-models-pytorch (SMP)
Training Framework  PyTorch Lightning
Input               3-channel RGB, 512x512 tiles
Output              2-class mask (Background=0, Building=1)
Parameters          ~20.8M
Model Size          ~84 MB
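
The parameter count and file size listed above are roughly consistent with each other if the weights are stored as 32-bit floats. That storage format is an assumption here, not something the card states. A quick check:

```python
# Rough size estimate, assuming float32 weights (4 bytes per parameter).
# 20.8M is the rounded parameter count from the table, not an exact figure.
params = 20.8e6
bytes_per_param = 4  # float32 (assumption)

size_mb = params * bytes_per_param / 1e6  # decimal megabytes
print(f"Estimated weight size: {size_mb:.1f} MB")
```

The estimate lands slightly below ~84 MB; non-trainable buffers and serialization overhead could plausibly account for the rest.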

Performance

IoU and Dice are evaluated on the WHU Building Dataset test split (1,228 tiles). Best Val IoU is the highest validation IoU reached during training:

Metric        Score
IoU           0.9054
Dice          0.9503
Best Val IoU  0.9434
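
For readers unfamiliar with the two metrics, the sketch below computes IoU and Dice for one class from pixel counts and checks the identity Dice = 2·IoU / (1 + IoU). It is an illustration only: the helper name `iou_and_dice` and the toy masks are made up here, and the card does not say whether the reported scores are per-class, averaged over classes, or averaged per image.

```python
def iou_and_dice(pred, target, positive=1):
    """Compute IoU and Dice for one class from two flat label sequences."""
    pairs = list(zip(pred, target))
    tp = sum(p == positive and t == positive for p, t in pairs)
    fp = sum(p == positive and t != positive for p, t in pairs)
    fn = sum(p != positive and t == positive for p, t in pairs)
    union = tp + fp + fn
    if union == 0:
        # Convention chosen for this sketch: an absent class counts as a match.
        return 1.0, 1.0
    return tp / union, 2 * tp / (2 * tp + fp + fn)


# Toy 8-pixel masks (hypothetical): 3 true positives, 1 false positive, 1 false negative.
pred = [1, 1, 1, 0, 0, 1, 0, 0]
target = [1, 1, 0, 0, 0, 1, 1, 0]
iou, dice = iou_and_dice(pred, target)
print(iou, dice)

# When both metrics come from the same pixel counts, Dice follows from IoU.
reported_iou = 0.9054
implied_dice = 2 * reported_iou / (1 + reported_iou)
print(f"{implied_dice:.3f}")
```

Applied to the reported IoU, the identity gives a Dice of about 0.950, which agrees with the reported 0.9503 to within rounding.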

Training Details

  • Dataset: WHU Building Dataset, 5,732 training tiles (512x512 RGB at 0.3m resolution)
  • Validation split: 20% of training data
  • Optimizer: AdamW (lr=1e-4, weight_decay=1e-4)
  • Loss: CrossEntropyLoss
  • Epochs: 36 (early stopping, patience=10)
  • Batch size: 16
  • GPU: NVIDIA RTX 6000 Ada (48GB)
  • Encoder weights: ImageNet pretrained
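
The early-stopping setting above can be illustrated with a small framework-free sketch. It assumes the monitored quantity is validation IoU (higher is better), which the card does not state, and it is not PyTorch Lightning's implementation; the class name `EarlyStopper` and the score curve are hypothetical.

```python
class EarlyStopper:
    """Stop when the monitored score has not improved for `patience` epochs."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, score):
        """Record one epoch's score; return True if training should stop."""
        if score > self.best + self.min_delta:
            self.best = score
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Hypothetical validation IoU curve: improves for 3 epochs, then plateaus.
scores = [0.80, 0.85, 0.90] + [0.89] * 12
stopper = EarlyStopper(patience=10)
stopped_at = None
for epoch, score in enumerate(scores, start=1):
    if stopper.step(score):
        stopped_at = epoch
        break

print(stopped_at, stopper.best)
```

With patience=10, training in this toy run ends ten epochs after the last improvement, and the best score seen is the one worth checkpointing.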

Quick Start

Installation

pip install geoai-py timm segmentation-models-pytorch

Inference with GeoAI

import geoai

# Run building detection on a GeoTIFF
geoai.timm_segmentation_from_hub(
    input_path="input_image.tif",
    output_path="building_prediction.tif",
    repo_id="giswqs/whu-building-unetplusplus-efficientnet-b4",
    window_size=512,
    overlap=256,
    batch_size=4,
)

# Vectorize to building footprints
gdf = geoai.orthogonalize(
    input_path="building_prediction.tif",
    output_path="building_footprints.geojson",
    epsilon=2.0,
)
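
The window_size and overlap arguments above suggest sliding-window inference over the GeoTIFF. The sketch below shows one common way to place windows along an axis, assuming a stride of window_size minus overlap and a final window shifted back to cover the image edge. It is not GeoAI's implementation, and the function name `window_origins` is hypothetical.

```python
def window_origins(size, window=512, overlap=256):
    """Return start offsets of windows covering [0, size) along one axis."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must be non-negative and smaller than window")
    if size <= window:
        return [0]  # one window; padding would be needed if size < window
    stride = window - overlap
    origins = list(range(0, size - window + 1, stride))
    if origins[-1] + window < size:
        # Shift one extra window back so the far edge is covered.
        origins.append(size - window)
    return origins


# A 1300-pixel-wide raster with the settings used in the call above.
cols = window_origins(1300, window=512, overlap=256)
print(cols)
```

With a 50% overlap, most pixels are seen by more than one window per axis, so predictions near tile borders can be blended or taken from the window where the pixel is most central.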

Manual Loading

import json
import torch
import segmentation_models_pytorch as smp

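# Load the configuration shipped with the weights
# (not used below; the architecture arguments are written out explicitly)
with open("config.json") as f:
    config = json.load(f)

# Create the model. encoder_weights=None skips the ImageNet download,
# since the checkpoint loaded below already contains the encoder weights.
model = smp.UnetPlusPlus(
    encoder_name="efficientnet-b4",
    encoder_weights=None,
    in_channels=3,
    classes=2,
)

# Load the trained weights and switch to inference mode
state_dict = torch.load("model.pth", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()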

Example Notebook

The full inference notebook, with visualization and analysis, can be opened in Colab.

Dataset

The WHU Building Dataset consists of aerial imagery at 0.3m resolution with binary building masks:

  • Train: 5,732 tiles (512x512 RGB)
  • Val: 1,228 tiles
  • Test: 1,228 tiles
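
The 0.3m resolution makes it straightforward to convert between pixels and ground units. The sketch below works out the footprint of one tile and converts a building pixel count into square metres; the helper name `building_area_m2` and the example pixel count are hypothetical.

```python
GSD_M = 0.3    # ground sampling distance in metres per pixel (from the card)
TILE_PX = 512  # tile side length in pixels (from the card)

tile_side_m = TILE_PX * GSD_M            # ground length of one tile side
tile_area_ha = tile_side_m ** 2 / 10_000  # 1 hectare = 10,000 square metres


def building_area_m2(n_building_pixels, gsd_m=GSD_M):
    """Convert a count of building pixels into ground area in square metres."""
    return n_building_pixels * gsd_m ** 2


print(f"Tile side: {tile_side_m:.1f} m, tile area: {tile_area_ha:.2f} ha")
print(f"1,000 building pixels: {building_area_m2(1000):.0f} m^2")
```

Each tile therefore covers a square of roughly 154 m on a side, and every predicted building pixel corresponds to 0.09 square metres on the ground.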

Reference

Ji, S., Wei, S., & Lu, M. (2019). Fully Convolutional Networks for Multisource Building Extraction From an Open Aerial and Satellite Imagery Data Set. IEEE Transactions on Geoscience and Remote Sensing, 57(1), 574-586.

License

This model is released under the Apache 2.0 License.
