# 🌵 SegFormer-B2: Offroad Desert Semantic Segmentation

YOLO Pune Hackathon 2025 · Duality AI × MIT WPU

Fine-tuned `nvidia/mit-b2` on synthetic desert imagery from Duality AI's Falcon simulation platform. The model segments every pixel of a desert scene into one of 10 classes. It was trained on Desert A and evaluated on a completely unseen Desert B location, making this a domain-shift challenge.


## 📊 Results

| Split | mIoU | Notes |
|---|---|---|
| Val (Desert A holdout) | 0.6293 | 317 images, dedicated Duality AI split |
| Test (Desert B, raw) | 0.2843 | 1,002 images, different location |
| Test (Desert B, corrected) | 0.4061 | 7 present classes only; Flowers, Logs, Ground Clutter absent in Desert B |

### Per-class IoU (Test Set, Desert B)

| Class | Val IoU | Test IoU | Present in Desert B |
|---|---|---|---|
| 🌳 Trees | 0.8572 | 0.3896 | ✅ |
| 🌿 Lush Bushes | 0.6937 | 0.0003 | ✅ |
| 🌾 Dry Grass | 0.6923 | 0.4502 | ✅ |
| 🪨 Dry Bushes | 0.4929 | 0.3831 | ✅ |
| 🌸 Flowers | 0.5757 | 0.0000 | ❌ Absent |
| 🪵 Logs | 0.5214 | 0.0000 | ❌ Absent |
| ⛰️ Rocks | 0.4877 | 0.0402 | ✅ |
| 🏜️ Landscape | 0.6007 | 0.5993 | ✅ |
| ☁️ Sky | 0.9839 | 0.9802 | ✅ |
| 🪨 Ground Clutter | 0.3874 | 0.0000 | ❌ Absent |

**Note on corrected mIoU:** The 3 absent classes (Flowers, Logs, Ground Clutter) score IoU = 0 by definition, since the model never encounters them in the test set. Corrected mIoU averages only the 7 classes that actually appear in Desert B.
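The corrected figure can be reproduced directly from the per-class table above; a minimal sketch using the reported test IoUs:

```python
import numpy as np

# Per-class test IoUs from the table above, in compact-ID order.
test_iou = {
    "Trees": 0.3896, "Lush_Bushes": 0.0003, "Dry_Grass": 0.4502,
    "Dry_Bushes": 0.3831, "Ground_Clutter": 0.0000, "Flowers": 0.0000,
    "Logs": 0.0000, "Rocks": 0.0402, "Landscape": 0.5993, "Sky": 0.9802,
}
absent = {"Flowers", "Logs", "Ground_Clutter"}  # classes not present in Desert B

raw_miou = np.mean(list(test_iou.values()))
corrected_miou = np.mean([v for k, v in test_iou.items() if k not in absent])

print(f"raw mIoU:       {raw_miou:.4f}")        # 0.2843
print(f"corrected mIoU: {corrected_miou:.4f}")  # 0.4061
```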


πŸ—οΈ Model Details

Property Value
Architecture SegFormer-B2 (Mix-Transformer encoder + All-MLP decoder)
Base model nvidia/mit-b2 (ImageNet-1K pretrained)
Total parameters 27.4M
Encoder 23.7M (pretrained)
Decoder head 3.7M (randomly initialised, fine-tuned)
Input resolution 512 Γ— 512 px
Output classes 10
Training platform Kaggle GPU (T4/P100)
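As a sanity check on the parameter count, a B2-shaped model can be instantiated locally without any download. The depth/width values below follow the SegFormer paper's B2 configuration and are assumptions, not read from this repo's `config.json`; the actual fine-tune started from the pretrained `nvidia/mit-b2` encoder.

```python
from transformers import SegformerConfig, SegformerForSemanticSegmentation

# Randomly-initialised B2-shaped model (assumed hyperparameters, no download).
config = SegformerConfig(
    num_labels=10,
    depths=[3, 4, 6, 3],                 # B2 encoder depths (per the paper)
    hidden_sizes=[64, 128, 320, 512],    # B2 stage widths
    decoder_hidden_size=768,             # All-MLP decoder width for B2+
)
model = SegformerForSemanticSegmentation(config)
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")
```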

πŸ—‚οΈ Dataset

Synthetic desert images generated by Duality AI's Falcon simulation platform.

Split Images Masks Location
Train 2,857 2,857 Desert A
Val 317 317 Desert A (dedicated split)
Test 1,002 1,002 Desert B (unseen)

Mask label IDs are non-standard sparse integers, remapped to 0–9:

| Raw ID | Class | Compact ID |
|---|---|---|
| 100 | Trees | 0 |
| 200 | Lush Bushes | 1 |
| 300 | Dry Grass | 2 |
| 500 | Dry Bushes | 3 |
| 550 | Ground Clutter | 4 |
| 600 | Flowers ⚡ rare | 5 |
| 700 | Logs ⚡ rare | 6 |
| 800 | Rocks | 7 |
| 7100 | Landscape | 8 |
| 10000 | Sky | 9 |
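The remapping amounts to a lookup per class ID. A minimal sketch (the `remap_mask` helper and its fallback of `ignore_index=255` for unexpected IDs are illustrative, not the repo's actual preprocessing code; 255 matches the loss's `ignore_index`):

```python
import numpy as np

# Sparse Falcon label IDs -> compact 0-9, per the table above.
RAW_TO_COMPACT = {100: 0, 200: 1, 300: 2, 500: 3, 550: 4,
                  600: 5, 700: 6, 800: 7, 7100: 8, 10000: 9}

def remap_mask(raw_mask: np.ndarray, ignore_index: int = 255) -> np.ndarray:
    """Map sparse label IDs to compact IDs; unknown IDs become ignore_index."""
    out = np.full(raw_mask.shape, ignore_index, dtype=np.uint8)
    for raw_id, compact_id in RAW_TO_COMPACT.items():
        out[raw_mask == raw_id] = compact_id
    return out

raw = np.array([[100, 10000], [7100, 42]], dtype=np.int32)
print(remap_mask(raw))  # Trees, Sky / Landscape, ignored
```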

βš™οΈ Training Configuration

Optimiser & Schedule

Optimiser   : AdamW   lr=6e-5   weight_decay=1e-2
LR schedule : CosineAnnealingLR   T_max=50   eta_min=1e-7
Grad clip   : max_norm=1.0
Early stop  : patience=7 epochs on val mIoU
Best epoch  : 19   (V2 model)
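A minimal PyTorch sketch of this optimiser/schedule/clipping setup, using a tiny stand-in module rather than the actual SegFormer-B2:

```python
import torch

# Hypothetical stand-in model; the real training loop uses SegFormer-B2.
model = torch.nn.Conv2d(3, 10, kernel_size=1)

optimizer = torch.optim.AdamW(model.parameters(), lr=6e-5, weight_decay=1e-2)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=50, eta_min=1e-7)

for epoch in range(3):  # the full run trained up to 50 epochs with patience=7
    optimizer.zero_grad()
    loss = model(torch.randn(1, 3, 8, 8)).mean()  # dummy forward/loss
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()  # cosine decay from 6e-5 towards eta_min

print(f"lr after 3 epochs: {scheduler.get_last_lr()[0]:.2e}")
```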

### Loss Function

Weighted `CrossEntropyLoss` with `ignore_index=255` to handle unlabelled pixels.

| Class | Weight | Reason |
|---|---|---|
| Flowers | 5.0× | Extremely rare; forces the model to notice them |
| Logs | 4.0× | Rare and often partially occluded |
| Ground Clutter | 2.0× | Small objects, easily missed |
| Lush Bushes, Dry Bushes | 2.0× | Medium frequency |
| Trees, Dry Grass, Rocks | 1.5× | Moderate |
| Sky | 0.8× | Dominant; downweighted |
| Landscape | 0.5× | Most dominant; heavily downweighted |
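In PyTorch this corresponds to something like the following; the weight vector is ordered by compact ID (0–9) per the mask-label table, and the tensor shapes are illustrative:

```python
import torch

# Weights in compact-ID order: Trees, Lush_Bushes, Dry_Grass, Dry_Bushes,
# Ground_Clutter, Flowers, Logs, Rocks, Landscape, Sky.
weights = torch.tensor([1.5, 2.0, 1.5, 2.0, 2.0, 5.0, 4.0, 1.5, 0.5, 0.8])
criterion = torch.nn.CrossEntropyLoss(weight=weights, ignore_index=255)

logits = torch.randn(2, 10, 64, 64)           # [B, C, H, W] dummy predictions
target = torch.randint(0, 10, (2, 64, 64))    # [B, H, W]; 255 marks unlabelled
loss = criterion(logits, target)
print(f"loss: {loss.item():.4f}")
```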

### Augmentation Pipeline

The four techniques recommended in Duality AI's official training guide (flip, zoom, shear, mosaic), plus mild colour jitter:

```python
A.Resize(512, 512),
A.HorizontalFlip(p=0.5),                          # 1. Flip: deserts have no L/R bias
A.RandomResizedCrop(size=(512, 512),              # 2. Zoom: simulates camera distance
    scale=(0.6, 1.0), ratio=(0.75, 1.33), p=0.5),
A.Affine(shear=(-15, 15), rotate=(-10, 10), p=0.4),  # 3. Shear: off-level camera angles
# 4. Mosaic: 4 images stitched into a 2x2 grid (30% probability, custom implementation)
A.HueSaturationValue(                             # 5. HSV: MILD to preserve the warm desert palette
    hue_shift_limit=10, sat_shift_limit=20,
    val_shift_limit=15, p=0.4),
A.RandomBrightnessContrast(
    brightness_limit=0.15, contrast_limit=0.15, p=0.4),
A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
```

⚠️ HSV jitter is intentionally mild. The Duality AI desert palette is warm-toned. Aggressive colour shifts would train the model on unrealistic lighting and hurt generalisation.
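The mosaic step is described as a custom implementation whose code is not shown here. One plausible minimal 2×2 variant, stitching four equal-size image/mask pairs into one (a sketch, not the team's actual implementation):

```python
import numpy as np

def mosaic_2x2(images, masks):
    """Stitch 4 equal-size image/mask pairs into one 2x2 mosaic pair."""
    assert len(images) == len(masks) == 4
    top_img = np.concatenate(images[:2], axis=1)   # left | right
    bot_img = np.concatenate(images[2:], axis=1)
    top_msk = np.concatenate(masks[:2], axis=1)
    bot_msk = np.concatenate(masks[2:], axis=1)
    return (np.concatenate([top_img, bot_img], axis=0),   # top / bottom
            np.concatenate([top_msk, bot_msk], axis=0))

# Four dummy 256x256 tiles produce one 512x512 mosaic.
imgs = [np.full((256, 256, 3), i, np.uint8) for i in range(4)]
msks = [np.full((256, 256), i, np.uint8) for i in range(4)]
mos_img, mos_msk = mosaic_2x2(imgs, msks)
print(mos_img.shape, mos_msk.shape)  # (512, 512, 3) (512, 512)
```

In a real pipeline the mosaic would be applied before `A.Resize`, with the 30% probability gate, so the result is scaled back to the training resolution.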


## 🚀 Usage

### Quick inference

```python
import torch
import numpy as np
from PIL import Image
from transformers import SegformerForSemanticSegmentation
import albumentations as A
from albumentations.pytorch import ToTensorV2
import torch.nn.functional as F

# ── Class definitions ────────────────────────────────────────────────────────
CLASS_NAMES = [
    "Trees", "Lush_Bushes", "Dry_Grass", "Dry_Bushes",
    "Ground_Clutter", "Flowers", "Logs", "Rocks", "Landscape", "Sky"
]
PALETTE = np.array([
    [34,139,34],[0,200,83],[210,180,140],[200,200,180],[139,90,43],
    [255,20,147],[139,69,19],[128,128,128],[205,170,100],[135,206,235]
], dtype=np.uint8)

# ── Load model ───────────────────────────────────────────────────────────────
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model  = SegformerForSemanticSegmentation.from_pretrained(
    "YOUR_HF_USERNAME/segformer-b2-desert-segmentation"
).to(device).eval()

# ── Preprocess ───────────────────────────────────────────────────────────────
transform = A.Compose([
    A.Resize(512, 512),
    A.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ToTensorV2(),
])

# ── Inference ────────────────────────────────────────────────────────────────
img    = np.array(Image.open("desert_image.png").convert("RGB"))
h0, w0 = img.shape[:2]
tensor = transform(image=img)["image"].unsqueeze(0).to(device)

with torch.no_grad():
    logits = model(pixel_values=tensor).logits          # [1, 10, H/4, W/4]
    up     = F.interpolate(logits, size=(h0, w0),
                           mode="bilinear", align_corners=False)
    pred   = up.argmax(dim=1).squeeze(0).cpu().numpy()  # [H, W]  values 0-9

# ── Colour visualisation ─────────────────────────────────────────────────────
def mask_to_rgb(mask):
    rgb = np.zeros((*mask.shape, 3), dtype=np.uint8)
    for c in range(10):
        rgb[mask == c] = PALETTE[c]
    return rgb

pred_rgb = mask_to_rgb(pred)
Image.fromarray(pred_rgb).save("segmentation_output.png")
print("Classes found:", [CLASS_NAMES[c] for c in np.unique(pred)])
```

### Load from checkpoint

```python
import torch
from transformers import SegformerForSemanticSegmentation

# Hugging Face directory
model = SegformerForSemanticSegmentation.from_pretrained(
    "YOUR_HF_USERNAME/segformer-b2-desert-segmentation"
).eval()

# Raw .pth checkpoint (if you downloaded it separately)
ckpt = torch.load("best_model.pth", map_location="cpu")
model.load_state_dict(ckpt["model_state"])
print(f"Loaded epoch {ckpt['epoch']} - val mIoU {ckpt['val_miou']:.4f}")
```

## ⚠️ Limitations & Known Issues

- **Domain shift is real.** The model was trained on Desert A (Duality AI Falcon synthetic data). Performance on real-world desert images or different Falcon biomes may vary significantly.
- **3 classes absent in Desert B.** Flowers, Logs, and Ground Clutter do not appear in the test location. The model has learned to predict them on Desert A but will produce near-zero IoU on any location where they are absent.
- **Lush Bushes generalise poorly** (IoU 0.6937 val → 0.0003 test). Desert B appears to have a very different bush distribution or colour tone from Desert A.
- **Rocks generalise weakly** (0.4877 → 0.0402). Rock textures vary heavily between locations.
- **Sky and Landscape generalise near-perfectly** (Sky: 0.9839 → 0.9802; Landscape: 0.6007 → 0.5993). These classes are visually consistent across desert biomes.

πŸ“ Repository Structure

segformer-b2-desert-segmentation/
β”œβ”€β”€ config.json                  ← model architecture config
β”œβ”€β”€ model.safetensors            ← fine-tuned weights (404 MB)
β”œβ”€β”€ preprocessor_config.json     ← image processor settings
β”œβ”€β”€ metadata.json                ← training metadata & scores
└── README.md                    ← this file

## 📚 Citation

If you use this model, please cite:

```bibtex
@misc{mitwpu2025desert,
  title        = {SegFormer-B2 Fine-tuned on Duality AI Desert Segmentation},
  author       = {MIT WPU Team},
  year         = {2025},
  howpublished = {YOLO Pune Hackathon 2025, Duality AI Challenge},
  url          = {https://huggingface.co/YOUR_HF_USERNAME/segformer-b2-desert-segmentation}
}
```

Base model:

```bibtex
@article{xie2021segformer,
  title   = {SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers},
  author  = {Xie, Enze and Wang, Wenhai and Yu, Zhiding and Anandkumar, Anima and Alvarez, Jose M and Luo, Ping},
  journal = {NeurIPS},
  year    = {2021}
}
```

πŸ… Acknowledgements

  • Duality AI for the Falcon synthetic dataset and challenge
  • NVIDIA for the pretrained SegFormer-B2 backbone
  • YOLO Pune Hackathon 2025 organisers at MIT WPU

Model trained and evaluated by MIT WPU for the Duality AI Offroad Segmentation challenge at YOLO Pune Hackathon 2026.
