Model Card โ SegFormer-B0 Kathmandu Valley Satellite Segmentation
Model Description
This model is a fine-tuned SegFormer-B0 for semantic segmentation of satellite imagery over Kathmandu Valley, Nepal. It classifies each pixel into one of 7 land-use categories: Background, Residential Area, Road, River, Forest, and Unused Land. The model is intended for urban planning, GIS analysis, and geospatial research applications.
- Developed by: praniil
- Model type: Semantic Segmentation (SegFormer-B0)
- Language(s) (NLP): N/A (Computer Vision)
- License: MIT
- Finetuned from model:
nvidia/mit-b0(SegFormer-B0 pretrained on ImageNet)
Model Sources
- Repository: https://github.com/praniil/satellite-image-segmentation
- HuggingFace Hub:
Pranilllllll/segformer-satellite-segementation
Uses
Direct Use
This model can be used out-of-the-box for satellite image segmentation over Kathmandu Valley or similar urban/semi-urban landscapes. It accepts a 512ร512 RGB satellite image and outputs a per-pixel land-use classification mask.
import torch
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
from transformers import SegformerForSemanticSegmentation, SegformerFeatureExtractor
device = "cuda" if torch.cuda.is_available() else "cpu"
HF_REPO = "Pranilllllll/segformer-satellite-segementation"
model = SegformerForSemanticSegmentation.from_pretrained(HF_REPO).to(device)
processor = SegformerFeatureExtractor.from_pretrained(HF_REPO)
model.eval()
image = Image.open("path_to_your_satellite_image.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
pixel_values = inputs["pixel_values"].to(device)
with torch.no_grad():
outputs = model(pixel_values=pixel_values)
logits = outputs.logits # [1, 7, H, W]
pred_mask = torch.argmax(logits, dim=1).squeeze().cpu().numpy()
colors = np.array([
[0, 0, 0], # Background
[128, 0, 0], # Residential Area
[0, 128, 0], # Road
[0, 0, 128], # River
[0, 128, 128], # Forest
[128, 128, 0], # Unused Land
[128, 0, 128], # (reserved)
], dtype=np.uint8)
seg_image = colors[pred_mask]
plt.imsave("prediction.png", seg_image)
print("Inference complete. Prediction saved as prediction.png")
Downstream Use
This model can be plugged into larger GIS pipelines for:
- Automated land-use/land-cover (LULC) mapping
- Urban sprawl analysis
- River and forest change detection
- Input feature generation for spatial planning models
Out-of-Scope Use
- Not suitable for segmenting non-satellite imagery (street photos, drone footage with different resolution/angle).
- Performance may degrade on satellite imagery from regions with significantly different land cover patterns than Kathmandu Valley.
- Not suitable for fine-grained object detection within classes (e.g., identifying individual buildings).
Bias, Risks, and Limitations
- Geographic bias: Trained exclusively on Kathmandu Valley tiles; may not generalize to other geographies.
- Class imbalance: Despite weighted loss, rare classes (Road, River) may have lower per-class IoU.
- Resolution dependency: Expects 512ร512 input tiles; other resolutions require resizing and may affect accuracy.
- Annotation noise: Manual annotations via CVAT may have some boundary ambiguity between classes.
Recommendations
Validate predictions on your specific region before using results for critical planning decisions. Cross-checking against GIS datasets (e.g., OpenStreetMap) is recommended.
How to Get Started with the Model
Install dependencies:
pip install torch transformers Pillow matplotlib
Then use the inference script in the Direct Use section above.
Training Details
Training Data
A custom dataset was built from satellite imagery of Kathmandu Valley, Nepal, divided into a grid of tiles.
- Total images: ~400 tiles
- Resolution: 512 ร 512 pixels
- Annotation tool: CVAT
- Task: Multi-class semantic segmentation
Annotation Classes
| Class ID | Class Name | RGB Color |
|---|---|---|
| 0 | Background | (0, 0, 0) |
| 1 | Residential Area | (128, 0, 0) |
| 2 | Road | (0, 128, 0) |
| 3 | River | (0, 0, 128) |
| 4 | Forest | (0, 128, 128) |
| 5 | Unused Land | (128, 128, 0) |
Training Procedure
Preprocessing
- Images resized to 512 ร 512
- Standard ImageNet normalization via
SegformerFeatureExtractor
Data Augmentation
Applied using albumentations:
- Horizontal and vertical flips
- Random 90-degree rotations
- Resize to 512 ร 512
Training Hyperparameters
| Hyperparameter | Value |
|---|---|
| Input size | 512 ร 512 |
| Batch size | 16 |
| Optimizer | AdamW |
| Learning rate | 3e-5 |
| Loss function | Weighted Cross-Entropy |
| Epochs | 300 (early stopping, patience=25) |
| Cross-validation | 3-fold |
| Training regime | bf16 mixed precision |
Class Imbalance Handling
Inverse frequency class weights were computed from the training set and applied to the cross-entropy loss, ensuring rare classes (Road, River) contribute proportionally during training.
Evaluation
Metrics
- Mean IoU (mIoU): Primary metric โ overlap between predicted and ground truth masks averaged across all classes.
- Per-class IoU: Segmentation accuracy per land-use category.
- Qualitative inspection: Visual comparison of predicted vs. ground truth masks.
Results
Cross-validation results are reported as mean ยฑ standard deviation of mIoU across 3 folds. Training curves (loss, mIoU, gradient norm) are available in the eval_plots/ directory.
The stable gradient norm across training confirms the MiT encoder converged effectively without vanishing gradient issues.
Model Architecture
- Backbone: SegFormer-B0 (
nvidia/mit-b0) - Encoder: MiT (Mix Transformer) โ hierarchical global context without positional encoding
- Decoder: Lightweight MLP head โ per-pixel class probability predictions
- Output: 7-class segmentation mask over a 512ร512 spatial grid
Environmental Impact
- Hardware Type: CUDA-enabled GPU
- Cloud Provider: Not applicable (local training)
- Compute Region: Nepal
- Carbon Emitted: Not measured
Citation
@misc{praniil2024kathmandu-segmentation,
author = {praniil},
title = {Kathmandu Valley Satellite Image Segmentation with SegFormer-B0},
year = {2024},
publisher = {GitHub},
howpublished = {\url{https://github.com/praniil/satellite-image-segmentation}},
}
Model Card Authors
Model Card Contact
Open an issue at https://github.com/praniil/satellite-image-segmentation/issues
- Downloads last month
- 256