| --- |
| language: |
| - en |
| license: apache-2.0 |
| tags: |
| - earth-observation |
| - segmentation |
| - unet |
| - pytorch |
| - remote-sensing |
| - spacenet |
| datasets: |
| - harshinde/spacenet-rio |
| metrics: |
| - iou |
| - accuracy |
| - dice |
| pipeline_tag: image-segmentation |
| --- |
| |
| # SpaceNet Rio Building Detection Model |
|
|
| This model segments building footprints in high-resolution satellite imagery. It is a PyTorch U-Net trained on the **SpaceNet (Rio de Janeiro)** dataset for binary semantic segmentation (background vs. building). |
|
|
| ## Model Details |
|
|
| - **Architecture:** U-Net with residual connections, 4 encoder/decoder levels, 10% spatial dropout, and a 1024-channel bottleneck. |
| - **Task:** Semantic Segmentation (Building Footprint Extraction) |
| - **Input:** 3-band (RGB) pan-sharpened GeoTIFFs; the input channel count is configurable, so 8-band multispectral imagery is also supported. |
| - **Output:** Binary mask (0: background, 1: building). |
| - **Parameters:** ~31M, with Kaiming (He) weight initialization |
| - **Framework:** PyTorch |
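
As a rough sanity check on the ~31M figure, the sketch below counts convolution parameters for a generic U-Net layout. The channel widths (64, 128, 256, 512), the 3x3 double convolutions, and the 2x2 transposed-convolution upsampling are assumptions borrowed from the standard U-Net design, not details confirmed by this model card. Residual shortcuts and normalization layers are ignored, and `conv_params` and `unet_param_estimate` are hypothetical helper names.

```python
def conv_params(c_in, c_out, kernel):
    """Weights plus biases of a 2D convolution with a square kernel."""
    return c_in * c_out * kernel * kernel + c_out


def unet_param_estimate(in_channels=3, num_classes=2,
                        widths=(64, 128, 256, 512), bottleneck=1024):
    """Estimate convolution parameters of a plain U-Net (assumed layout)."""
    total = 0
    prev = in_channels
    # Encoder levels plus bottleneck: two 3x3 convolutions each.
    for width in (*widths, bottleneck):
        total += conv_params(prev, width, 3) + conv_params(width, width, 3)
        prev = width
    # Decoder levels: 2x2 transposed convolution, then two 3x3 convolutions
    # applied to the upsampled features concatenated with the skip connection.
    for width in reversed(widths):
        total += conv_params(prev, width, 2)
        total += conv_params(2 * width, width, 3)
        total += conv_params(width, width, 3)
        prev = width
    # Final 1x1 convolution mapping to per-class logits.
    total += conv_params(prev, num_classes, 1)
    return total


if __name__ == "__main__":
    print(f"RGB input:    {unet_param_estimate(in_channels=3):,}")
    print(f"8-band input: {unet_param_estimate(in_channels=8):,}")
```

Under these assumptions the estimate lands close to 31 million. Switching from 3 to 8 input bands changes only the first convolution, so the total barely moves.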
|
|
| ## Uses |
|
|
| ### Direct Use |
| This model extracts building footprint masks from satellite imagery. It is designed primarily for high-resolution (e.g., ~50 cm/pixel) RGB satellite tiles. |
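
Scenes larger than a single tile are usually processed window by window. The helper below is a minimal sketch of how such windows could be computed. The name `tile_windows` is hypothetical, and the 480-pixel default is taken from the validation crop size listed under Hyperparameters rather than from any documented inference setting. At ~50 cm/pixel, a 480-pixel window covers roughly 240 m on the ground.

```python
def tile_windows(width, height, tile=480, overlap=0):
    """Return (x0, y0, x1, y1) pixel windows that together cover the image.

    Windows are `tile` pixels wide and high. The last window along each axis
    is shifted back so it ends flush with the image edge. Images smaller than
    `tile` yield a single smaller window that would need padding.
    """
    if tile <= 0 or not 0 <= overlap < tile:
        raise ValueError("require tile > 0 and 0 <= overlap < tile")
    stride = tile - overlap

    def starts(size):
        if size <= tile:
            return [0]
        positions = list(range(0, size - tile, stride))
        positions.append(size - tile)  # final window flush with the edge
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


if __name__ == "__main__":
    for window in tile_windows(1000, 480):
        print(window)
```

Predictions from overlapping windows have to be merged afterwards, for example by averaging; how to merge them is not specified here.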
|
|
| ### Out-of-Scope Use |
| - General object detection (e.g., cars, roads). |
| - Imagery with a substantially different spatial resolution (e.g., 30 m Landsat data, where a typical building is smaller than one pixel) without fine-tuning. |
|
|
| ## Training Details |
|
|
| ### Dataset |
| Trained on the [SpaceNet Rio de Janeiro](https://huggingface.co/datasets/harshinde/spacenet-rio) dataset. |
| - **Total Tiles:** 6,940 |
| - **Split:** 7:1:2 (Train: 4,857 | Val: 693 | Test: 1,387) |
|
|
| ### Hyperparameters |
| - **Epochs:** 100 (with early stopping patience of 15) |
| - **Batch Size:** 16 (Train) / 4 (Val) |
| - **Learning Rate:** 0.001 with 5 warmup epochs |
| - **Weight Decay:** 0.0001 |
| - **Loss Function:** Combined Dice Loss (weight 1.0) + Cross-Entropy Loss (weight 1.0) |
| - **Image Crops:** 400x400 (Train) / 480x480 (Val) |
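
Two of the settings above can be illustrated in plain Python. This is a sketch under stated assumptions, not the training code. The warmup is assumed to be linear, and the rate is held constant afterwards because no decay schedule is stated. The loss is shown for the binary case on per-pixel building probabilities, with an assumed smoothing term `eps`. The function names `warmup_lr` and `dice_ce_loss` are hypothetical.

```python
import math


def warmup_lr(epoch, base_lr=1e-3, warmup_epochs=5):
    """Learning rate for a zero-based epoch, assuming linear warmup."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    return base_lr  # assumption: no decay after warmup


def dice_ce_loss(probs, targets, dice_weight=1.0, ce_weight=1.0, eps=1e-7):
    """Weighted sum of Dice loss and binary cross-entropy.

    probs:   predicted building probability per pixel, in [0, 1]
    targets: ground-truth labels per pixel, 0 (background) or 1 (building)
    """
    if len(probs) != len(targets) or not probs:
        raise ValueError("probs and targets must be non-empty and equal length")
    # Soft Dice: overlap between prediction and label, smoothed by eps.
    intersection = sum(p * t for p, t in zip(probs, targets))
    dice = 1.0 - (2.0 * intersection + eps) / (sum(probs) + sum(targets) + eps)
    # Cross-entropy, with probabilities clipped to keep log() finite.
    clipped = [min(max(p, eps), 1.0 - eps) for p in probs]
    ce = -sum(
        t * math.log(p) + (1 - t) * math.log(1.0 - p)
        for p, t in zip(clipped, targets)
    ) / len(targets)
    return dice_weight * dice + ce_weight * ce


if __name__ == "__main__":
    print([round(warmup_lr(epoch), 6) for epoch in range(7)])
    print(dice_ce_loss([0.9, 0.2, 0.8, 0.1], [1, 0, 1, 0]))
```

With both weights set to 1.0, as listed above, the two terms contribute equally. The Dice term rewards overlap with the building mask, while cross-entropy penalizes each pixel independently.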
|
|
| ### Training Metrics |
| Training metrics were tracked using TensorBoard and include: |
| - Training/Validation Loss |
| - Mean IoU and Per-Class IoU |
| - Pixel Accuracy |
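
For reference, the metrics listed above can be computed from predicted and ground-truth class ids as in the sketch below. It follows the usual definitions (IoU as intersection over union per class, mean IoU as the average over classes that occur). The exact averaging used in the training logs is not documented here, and `segmentation_metrics` is a hypothetical helper name.

```python
def segmentation_metrics(pred, target, num_classes=2):
    """Pixel accuracy, per-class IoU, and mean IoU for flat label sequences."""
    if len(pred) != len(target) or not target:
        raise ValueError("pred and target must be non-empty and equal length")
    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes
    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    per_class_iou = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        # A class absent from both prediction and target has no defined IoU.
        per_class_iou.append(intersection[c] / union if union else float("nan"))

    valid = [iou for iou in per_class_iou if iou == iou]  # drop NaN entries
    return {
        "pixel_accuracy": sum(intersection) / len(target),
        "per_class_iou": per_class_iou,
        "mean_iou": sum(valid) / len(valid),
    }


if __name__ == "__main__":
    predicted = [0, 0, 1, 1, 1, 0]
    reference = [0, 1, 1, 1, 0, 0]
    print(segmentation_metrics(predicted, reference))
```

Because background pixels usually dominate satellite tiles, pixel accuracy can look high even when building IoU is poor, which is why the per-class values are worth reading alongside it.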
|
|
| The full training logs and curves are available on [TensorBoard](https://huggingface.co/harshinde/spacenet/tensorboard). |
|
|
| ## How to Get Started with the Model |
|
|
| You can load the weights using PyTorch: |
|
|
| ```python |
| import torch |
| |
| # Assumes the U-Net architecture is defined in your local code |
| # with this constructor signature. |
| model = UNet(in_channels=3, num_classes=2) |
| |
| checkpoint = torch.load("best_model.pt", map_location="cpu") |
| |
| # The file may hold a training checkpoint dictionary or a raw state_dict. |
| if isinstance(checkpoint, dict) and "model_state_dict" in checkpoint: |
|     state_dict = checkpoint["model_state_dict"] |
| else: |
|     state_dict = checkpoint |
| model.load_state_dict(state_dict) |
| |
| model.eval()  # inference mode: disables dropout |
| ``` |