# SegFormer-B2: Offroad Desert Semantic Segmentation
YOLO Pune Hackathon 2026 · Duality AI × MIT WPU
Fine-tuned nvidia/mit-b2 on synthetic desert imagery from Duality AI's Falcon simulation platform. The model segments every pixel of a desert scene into one of 10 classes. It was trained on Desert A and evaluated on a completely unseen Desert B location, a domain-shift challenge.
## Results
| Split | mIoU | Notes |
|---|---|---|
| Val (Desert A holdout) | 0.6293 | 317 images, dedicated Duality AI split |
| Test (Desert B, raw) | 0.2843 | 1,002 images, different location |
| Test (Desert B, corrected) | 0.4061 | 7 present classes only; Flowers, Logs, Ground Clutter absent in Desert B |
### Per-class IoU (Test Set, Desert B)
| Class | Val IoU | Test IoU | Present in Desert B |
|---|---|---|---|
| Trees | 0.8572 | 0.3896 | ✓ |
| Lush Bushes | 0.6937 | 0.0003 | ✓ |
| Dry Grass | 0.6923 | 0.4502 | ✓ |
| Dry Bushes | 0.4929 | 0.3831 | ✓ |
| Flowers | 0.5757 | 0.0000 | ✗ absent |
| Logs | 0.5214 | 0.0000 | ✗ absent |
| Rocks | 0.4877 | 0.0402 | ✓ |
| Landscape | 0.6007 | 0.5993 | ✓ |
| Sky | 0.9839 | 0.9802 | ✓ |
| Ground Clutter | 0.3874 | 0.0000 | ✗ absent |
Note on corrected mIoU: the three absent classes (Flowers, Logs, Ground Clutter) score IoU = 0 by definition, since the model never encounters them in the test set. The corrected mIoU averages only the seven classes that actually appear in Desert B.
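The correction can be reproduced from the per-class IoUs above. This is a sketch; the arrays hard-code the table values in compact-ID order (Trees, Lush Bushes, Dry Grass, Dry Bushes, Ground Clutter, Flowers, Logs, Rocks, Landscape, Sky):

```python
import numpy as np

# Per-class test IoUs from the table above, in compact-ID order 0-9.
test_ious = np.array([0.3896, 0.0003, 0.4502, 0.3831,
                      0.0000, 0.0000, 0.0000, 0.0402,
                      0.5993, 0.9802])
# Classes actually present in Desert B (Ground Clutter=4, Flowers=5, Logs=6 are absent).
present = np.array([True, True, True, True,
                    False, False, False, True,
                    True, True])

raw_miou = test_ious.mean()                 # averages all 10 classes
corrected_miou = test_ious[present].mean()  # only the 7 present classes

print(f"raw mIoU:       {raw_miou:.4f}")        # 0.2843
print(f"corrected mIoU: {corrected_miou:.4f}")  # 0.4061
```

Both printed values match the results table, confirming how the corrected figure was derived.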
## Model Details
| Property | Value |
|---|---|
| Architecture | SegFormer-B2 (Mix-Transformer encoder + All-MLP decoder) |
| Base model | nvidia/mit-b2 (ImageNet-1K pretrained) |
| Total parameters | 27.4M |
| Encoder | 23.7M (pretrained) |
| Decoder head | 3.7M (randomly initialised, fine-tuned) |
| Input resolution | 512 × 512 px |
| Output classes | 10 |
| Training platform | Kaggle GPU (T4/P100) |
## Dataset
Synthetic desert images generated by Duality AI's Falcon simulation platform.
| Split | Images | Masks | Location |
|---|---|---|---|
| Train | 2,857 | 2,857 | Desert A |
| Val | 317 | 317 | Desert A (dedicated split) |
| Test | 1,002 | 1,002 | Desert B (unseen) |
Mask label IDs are non-standard sparse integers, remapped to compact IDs 0–9:
| Raw ID | Class | Compact ID |
|---|---|---|
| 100 | Trees | 0 |
| 200 | Lush Bushes | 1 |
| 300 | Dry Grass | 2 |
| 500 | Dry Bushes | 3 |
| 550 | Ground Clutter | 4 |
| 600 | Flowers (rare) | 5 |
| 700 | Logs (rare) | 6 |
| 800 | Rocks | 7 |
| 7100 | Landscape | 8 |
| 10000 | Sky | 9 |
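A minimal sketch of the remapping, assuming masks load as integer arrays; `remap_mask` is an illustrative helper name, not necessarily the repository's code:

```python
import numpy as np

# Sparse raw IDs in the order of compact IDs 0-9 (from the table above).
RAW_IDS = [100, 200, 300, 500, 550, 600, 700, 800, 7100, 10000]

def remap_mask(raw_mask: np.ndarray, ignore_index: int = 255) -> np.ndarray:
    """Map sparse raw label IDs to compact 0-9; unknown pixels become ignore_index."""
    out = np.full(raw_mask.shape, ignore_index, dtype=np.uint8)
    for compact_id, raw_id in enumerate(RAW_IDS):
        out[raw_mask == raw_id] = compact_id
    return out

raw = np.array([[100, 10000], [7100, 42]])  # 42 = an unlabelled pixel
print(remap_mask(raw))  # [[0 9] [8 255]]
```

Mapping unknown IDs to 255 lines up with the loss's `ignore_index=255`, so stray pixels never contribute gradients.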
## Training Configuration
### Optimiser & Schedule
```
Optimiser   : AdamW, lr=6e-5, weight_decay=1e-2
LR schedule : CosineAnnealingLR, T_max=50, eta_min=1e-7
Grad clip   : max_norm=1.0
Early stop  : patience=7 epochs on val mIoU
Best epoch  : 19 (V2 model)
```
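Wired together in PyTorch, the schedule above looks roughly like this. It is a sketch only: the `nn.Conv2d` stand-in and the placeholder `val_miou` take the place of the real model, data loop, and evaluation:

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 10, 1)  # stand-in for SegFormer-B2

optimizer = torch.optim.AdamW(model.parameters(), lr=6e-5, weight_decay=1e-2)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50, eta_min=1e-7)

best_miou, patience, bad_epochs = 0.0, 7, 0
for epoch in range(50):
    # ... forward/backward over the training set goes here ...
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    optimizer.zero_grad()
    scheduler.step()  # cosine decay, one step per epoch

    val_miou = 0.0  # placeholder: compute validation mIoU here
    if val_miou > best_miou:
        best_miou, bad_epochs = val_miou, 0  # new best: save checkpoint, reset counter
    else:
        bad_epochs += 1
        if bad_epochs >= patience:  # early stop after 7 epochs without improvement
            break
```

With the placeholder metric the loop stops after exactly `patience` epochs, which is the intended early-stopping behaviour.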
### Loss Function
Weighted CrossEntropyLoss with ignore_index=255 to handle unlabelled pixels.
| Class | Weight | Reason |
|---|---|---|
| Flowers | 5.0× | Extremely rare; forces the model to notice them |
| Logs | 4.0× | Rare and often partially occluded |
| Ground Clutter | 2.0× | Small objects, easily missed |
| Lush Bushes, Dry Bushes | 2.0× | Medium frequency |
| Trees, Dry Grass, Rocks | 1.5× | Moderate frequency |
| Sky | 0.8× | Dominant class; downweighted |
| Landscape | 0.5× | Most dominant; heavily downweighted |
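Expanded into a per-class weight vector in compact-ID order, the loss can be built as follows. The `class_weights` tensor is my reading of the grouped table rows, not copied from the training script:

```python
import torch
import torch.nn as nn

# Compact-ID order: Trees, Lush_Bushes, Dry_Grass, Dry_Bushes, Ground_Clutter,
#                   Flowers, Logs, Rocks, Landscape, Sky
class_weights = torch.tensor(
    [1.5, 2.0, 1.5, 2.0, 2.0, 5.0, 4.0, 1.5, 0.5, 0.8])

criterion = nn.CrossEntropyLoss(weight=class_weights, ignore_index=255)

# Toy batch: logits [B, 10, H, W], integer mask [B, H, W], 255 = unlabelled.
logits = torch.randn(2, 10, 8, 8)
target = torch.randint(0, 10, (2, 8, 8))
target[0, 0, 0] = 255  # this pixel is excluded from the loss
loss = criterion(logits, target)
print(loss)  # scalar tensor
```

Pixels remapped to 255 (unlabelled or unknown raw IDs) are simply skipped, so the rare-class upweighting acts only on genuine labels.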
### Augmentation Pipeline
The pipeline applies the geometric techniques recommended in Duality AI's official training guide (flip, zoom, shear, mosaic) plus mild photometric jitter (the `train_transform` name is illustrative):

```python
import albumentations as A

train_transform = A.Compose([
    A.Resize(512, 512),
    A.HorizontalFlip(p=0.5),              # 1. Flip: deserts have no L/R bias
    A.RandomResizedCrop(size=(512, 512),  # 2. Zoom: simulates camera distance
                        scale=(0.6, 1.0), ratio=(0.75, 1.33), p=0.5),
    A.Affine(shear=(-15, 15), rotate=(-10, 10), p=0.4),  # 3. Shear: off-level camera angles
    # 4. Mosaic: 4 images stitched into a 2x2 grid (30% probability,
    #    custom implementation applied outside this Compose)
    A.HueSaturationValue(                 # 5. HSV: MILD, to preserve the warm desert palette
        hue_shift_limit=10, sat_shift_limit=20,
        val_shift_limit=15, p=0.4),
    A.RandomBrightnessContrast(
        brightness_limit=0.15, contrast_limit=0.15, p=0.4),
    A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
```
⚠️ HSV jitter is intentionally mild. The Duality AI desert palette is warm-toned; aggressive colour shifts would train the model on unrealistic lighting and hurt generalisation.
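Mosaic (item 4 above) is not an Albumentations built-in; the training code used a custom implementation. A minimal sketch of a 2×2 image-and-mask mosaic, assuming all four inputs are already the same size (`mosaic_2x2` is a hypothetical helper name):

```python
import numpy as np

def mosaic_2x2(images, masks):
    """Stitch four (H, W, 3) images and their (H, W) masks into a 2x2 grid.

    The result is 2H x 2W; resizing back to 512 x 512 is left to the caller.
    """
    top_img = np.concatenate([images[0], images[1]], axis=1)
    bot_img = np.concatenate([images[2], images[3]], axis=1)
    top_msk = np.concatenate([masks[0], masks[1]], axis=1)
    bot_msk = np.concatenate([masks[2], masks[3]], axis=1)
    return (np.concatenate([top_img, bot_img], axis=0),
            np.concatenate([top_msk, bot_msk], axis=0))

imgs = [np.zeros((256, 256, 3), np.uint8) for _ in range(4)]
msks = [np.full((256, 256), i, np.uint8) for i in range(4)]
img, msk = mosaic_2x2(imgs, msks)
print(img.shape, msk.shape)  # (512, 512, 3) (512, 512)
```

Because image and mask are stitched with the same layout, pixel labels stay aligned through the augmentation.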
## Usage
### Quick inference
```python
import torch
import torch.nn.functional as F
import numpy as np
from PIL import Image
from transformers import SegformerForSemanticSegmentation
import albumentations as A
from albumentations.pytorch import ToTensorV2

# ── Class definitions ────────────────────────────────────────────────────────
CLASS_NAMES = [
    "Trees", "Lush_Bushes", "Dry_Grass", "Dry_Bushes",
    "Ground_Clutter", "Flowers", "Logs", "Rocks", "Landscape", "Sky"
]
PALETTE = np.array([
    [34, 139, 34], [0, 200, 83], [210, 180, 140], [200, 200, 180], [139, 90, 43],
    [255, 20, 147], [139, 69, 19], [128, 128, 128], [205, 170, 100], [135, 206, 235]
], dtype=np.uint8)

# ── Load model ───────────────────────────────────────────────────────────────
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = SegformerForSemanticSegmentation.from_pretrained(
    "YOUR_HF_USERNAME/segformer-b2-desert-segmentation"
).to(device).eval()

# ── Preprocess ───────────────────────────────────────────────────────────────
transform = A.Compose([
    A.Resize(512, 512),
    A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ToTensorV2(),
])

# ── Inference ────────────────────────────────────────────────────────────────
img = np.array(Image.open("desert_image.png").convert("RGB"))
h0, w0 = img.shape[:2]
tensor = transform(image=img)["image"].unsqueeze(0).to(device)
with torch.no_grad():
    logits = model(pixel_values=tensor).logits    # [1, 10, H/4, W/4]
    up = F.interpolate(logits, size=(h0, w0),
                       mode="bilinear", align_corners=False)
pred = up.argmax(dim=1).squeeze(0).cpu().numpy()  # [H, W], values 0-9

# ── Colour visualisation ─────────────────────────────────────────────────────
def mask_to_rgb(mask):
    rgb = np.zeros((*mask.shape, 3), dtype=np.uint8)
    for c in range(10):
        rgb[mask == c] = PALETTE[c]
    return rgb

pred_rgb = mask_to_rgb(pred)
Image.fromarray(pred_rgb).save("segmentation_output.png")
print("Classes found:", [CLASS_NAMES[c] for c in np.unique(pred)])
```
### Load from checkpoint
```python
import torch
from transformers import SegformerForSemanticSegmentation

# Hugging Face directory
model = SegformerForSemanticSegmentation.from_pretrained(
    "YOUR_HF_USERNAME/segformer-b2-desert-segmentation"
).eval()

# Raw .pth checkpoint (if you downloaded it separately)
ckpt = torch.load("best_model.pth", map_location="cpu")
model.load_state_dict(ckpt["model_state"])
print(f"Loaded epoch {ckpt['epoch']} with val mIoU {ckpt['val_miou']:.4f}")
```
## Limitations & Known Issues
- Domain shift is real. The model was trained on Desert A (Duality AI Falcon synthetic data). Performance on real-world desert images or different Falcon biomes may vary significantly.
- 3 classes absent in Desert B. Flowers, Logs, and Ground Clutter do not appear in the test location. The model has learned to predict them on Desert A but will produce near-zero IoU on any location where they are absent.
- Lush Bushes test generalisation is poor (IoU 0.6937 val → 0.0003 test). Desert B appears to have a very different bush distribution or colour tone from Desert A.
- Rocks generalise weakly (0.4877 → 0.0402). Rock textures vary heavily between locations.
- Sky and Landscape generalise near-perfectly (Sky: 0.9839 → 0.9802; Landscape: 0.6007 → 0.5993). These classes are visually consistent across desert biomes.
## Repository Structure
```
segformer-b2-desert-segmentation/
├── config.json               # model architecture config
├── model.safetensors         # fine-tuned weights (404 MB)
├── preprocessor_config.json  # image processor settings
├── metadata.json             # training metadata & scores
└── README.md                 # this file
```
## Citation
If you use this model, please cite:
```bibtex
@misc{mitwpu2025desert,
  title        = {SegFormer-B2 Fine-tuned on Duality AI Desert Segmentation},
  author       = {MIT WPU Team},
  year         = {2025},
  howpublished = {YOLO Pune Hackathon 2025, Duality AI Challenge},
  url          = {https://huggingface.co/YOUR_HF_USERNAME/segformer-b2-desert-segmentation}
}
```
Base model:
```bibtex
@article{xie2021segformer,
  title   = {SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers},
  author  = {Xie, Enze and Wang, Wenhai and Yu, Zhiding and Anandkumar, Anima and Alvarez, Jose M and Luo, Ping},
  journal = {NeurIPS},
  year    = {2021}
}
```
## Acknowledgements
- Duality AI for the Falcon synthetic dataset and challenge
- NVIDIA for the pretrained SegFormer-B2 backbone
- YOLO Pune Hackathon 2025 organisers at MIT WPU
Model trained and evaluated by MIT WPU for the Duality AI Offroad Segmentation challenge at YOLO Pune Hackathon 2026.