---
title: Desert Semantic Segmentation Demo
emoji: 🌵
colorFrom: yellow
colorTo: orange
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: apache-2.0
tags:
  - semantic-segmentation
  - segformer
  - transformers
  - desert
  - ugv
  - offroad
datasets:
  - Offroad_Segmentation_Training_Dataset
metrics:
  - mean_iou
---

# 🌵 Desert Semantic Segmentation using SegFormer (MiT-B2)

A **SegFormer** transformer model fine-tuned on the Offroad Segmentation Training Dataset for 10-class semantic segmentation of desert terrain, built for UGV (Unmanned Ground Vehicle) autonomous navigation in off-road environments.

---

## 🧠 Model Architecture

| Component  | Detail                     |
|------------|----------------------------|
| Framework  | HuggingFace Transformers   |
| Model      | SegFormer                  |
| Backbone   | MiT-B2 (`nvidia/mit-b2`)   |
| Parameters | 27,354,314 (all trainable) |
| Decoder    | Lightweight MLP Head       |
| Classes    | 10                         |
| Input Size | 512 × 512                  |
| GPU        | NVIDIA A100-PCIE-40GB      |

---

## 🗂 Dataset Classes (10 Categories)

| Class ID | Raw Mask Value | Label          |
|----------|----------------|----------------|
| 0        | 100            | Trees          |
| 1        | 200            | Lush Bushes    |
| 2        | 300            | Dry Grass      |
| 3        | 500            | Dry Bushes     |
| 4        | 550            | Ground Clutter |
| 5        | 600            | Flowers        |
| 6        | 700            | Logs           |
| 7        | 800            | Rocks          |
| 8        | 7100           | Landscape      |
| 9        | 10000          | Sky            |

---

## 📊 Dataset Statistics

| Split      | Samples   | Proportion |
|------------|-----------|------------|
| Train      | 2,142     | 75%        |
| Validation | 286       | 10%        |
| Test       | 429       | 15%        |
| **Total**  | **2,857** | 100%       |

- Image resolution: **960 × 540** (RGB)
- Mask format: uint16, encoded with the raw class values listed in the table above
- Total annotated instances: **16,951**

---

## 🎨 Augmentation Pipeline

11 augmentations chosen specifically for desert and off-road conditions:

| Augmentation                    | Purpose                                           |
|---------------------------------|---------------------------------------------------|
| Color Jitter                    | Handles varying sun angles and color temperatures |
| Gamma Change                    | Simulates over/under-exposed outdoor scenes       |
| Gaussian Noise                  | Robustness to sensor noise in UGV cameras         |
| Motion / Gaussian / Median Blur | Motion blur from vehicle movement                 |
| Random Shadows                  | Shadows from rocks, vegetation, terrain           |
| Random Fog                      | Dust storms and atmospheric haze                  |
| Brightness/Contrast             | Atmospheric and lighting variations               |
| Texture Mixup                   | Prevents overfitting to specific terrain patterns |
| Horizontal Flip                 | Improves directional generalization               |
| Shift / Scale / Rotate          | Spatial robustness                                |
| Coarse Dropout                  | Simulates sensor occlusion                        |

---

## ⚙️ Training Configuration

| Parameter         | Value      |
|-------------------|------------|
| Epochs            | 50         |
| Batch Size        | 8          |
| Learning Rate     | 6e-5       |
| Optimizer         | AdamW      |
| Warmup Steps      | 500        |
| Weight Decay      | 0.01       |
| FP16              | ✅ Enabled |
| Best Model Metric | mean_iou   |
| Eval Strategy     | Per epoch  |

---

## 📈 Evaluation Results

Evaluated on the **validation split** (286 images) using COCO-style mean IoU.

| Metric            | Value      |
|-------------------|------------|
| **Mean IoU**      | **0.6529** |
| **Mean Accuracy** | **0.7592** |

### Per-Class IoU

| Class          | IoU    |
|----------------|--------|
| Trees          | 0.8517 |
| Lush Bushes    | 0.6990 |
| Dry Grass      | 0.7007 |
| Dry Bushes     | 0.4873 |
| Ground Clutter | 0.3647 |
| Flowers        | 0.7246 |
| Logs           | 0.5591 |
| Rocks          | 0.4544 |
| Landscape      | 0.7014 |
| Sky            | 0.9860 |

- **Best class:** Sky (0.9860), which covers large uniform regions
- **Hardest class:** Ground Clutter (0.3647), which consists of small, heterogeneous objects

---

## ⚙️ Inference

```python
from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation
from PIL import Image
import torch
import torch.nn.functional as F

# Load the image processor and the fine-tuned model
processor = SegformerImageProcessor.from_pretrained("PUSHPENDAR/desert-segformer")
model = SegformerForSemanticSegmentation.from_pretrained("PUSHPENDAR/desert-segformer")
model.eval()

# Load the image and convert it to model inputs
image = Image.open("desert_scene.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Predict without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits  # (1, num_classes, H/4, W/4) of the processed input

# Upsample the logits to the original image size
upsampled = F.interpolate(
    logits,
    size=(image.height, image.width),
    mode="bilinear",
    align_corners=False,
)

# Per-pixel class IDs (0-9, see the class table above)
pred_mask = upsampled.argmax(dim=1)[0].numpy()  # (H, W)
print("Predicted class map shape:", pred_mask.shape)
```

---

## 📦 Repository Files

| File / Folder                     | Description                   |
|-----------------------------------|-------------------------------|
| `pytorch_model.bin`               | Fine-tuned SegFormer weights  |
| `config.json`                     | Model configuration           |
| `preprocessor_config.json`        | Image processor settings      |
| `outputs/validation_metrics.json` | Saved evaluation metrics      |
| `outputs/training_curves.png`     | Loss and mIoU training curves |
| `outputs/test_predictions/`       | Per-image prediction masks    |

---

## 🚀 Run Locally

```bash
git clone https://huggingface.co/PUSHPENDAR/desert-segformer
cd desert-segformer
pip install transformers torch pillow gradio
python app.py
```

---

## 📝 Citation

If you use this model or dataset, please cite:

```bibtex
@misc{desert-segformer-2025,
  title     = {Desert Semantic Segmentation with SegFormer (MiT-B2)},
  author    = {Pushpendar Choudhary},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/PUSHPENDAR/desert-segformer}
}
```

---

## 📄 License

Apache 2.0. See [LICENSE](LICENSE) for details.
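
---

## 🧩 Appendix: Remapping Raw Mask Values

The dataset masks are stored as uint16 images with raw class values (100, 200, ..., 10000), while the model predicts class IDs 0 to 9. The sketch below shows one way such a mask could be converted to class IDs with a lookup table. It is not taken from the training code: the names `RAW_TO_CLASS_ID`, `IGNORE_INDEX`, and `remap_mask` are hypothetical, and mapping unlisted raw values to 255 is an assumption, since the dataset description does not say how such pixels are handled.

```python
import numpy as np

# Raw mask value -> class ID, copied from the "Dataset Classes" table.
RAW_TO_CLASS_ID = {
    100: 0, 200: 1, 300: 2, 500: 3, 550: 4,
    600: 5, 700: 6, 800: 7, 7100: 8, 10000: 9,
}

# Assumption: raw values that are not in the table are marked with 255.
IGNORE_INDEX = 255


def remap_mask(raw_mask: np.ndarray) -> np.ndarray:
    """Convert a uint16 raw-value mask into a uint8 class-ID mask."""
    if raw_mask.dtype != np.uint16:
        raise TypeError(f"expected a uint16 mask, got {raw_mask.dtype}")

    # Lookup table covering every possible uint16 value.
    lut = np.full(65536, IGNORE_INDEX, dtype=np.uint8)
    for raw_value, class_id in RAW_TO_CLASS_ID.items():
        lut[raw_value] = class_id

    # Fancy indexing applies the table to every pixel at once.
    return lut[raw_mask]


# Tiny synthetic mask; 42 is deliberately not a valid raw value.
example = np.array([[100, 200, 10000], [7100, 550, 42]], dtype=np.uint16)
print(remap_mask(example).tolist())  # [[0, 1, 9], [8, 4, 255]]
```

A lookup table keeps the conversion to a single indexing operation per mask, which matters little for one image but adds up over the 2,857 samples in the dataset.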