---
title: Desert Semantic Segmentation Demo
emoji: 🌡
colorFrom: yellow
colorTo: orange
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: apache-2.0
tags:
- semantic-segmentation
- segformer
- transformers
- desert
- ugv
- offroad
datasets:
- Offroad_Segmentation_Training_Dataset
metrics:
- mean_iou
---
# 🌡 Desert Semantic Segmentation using SegFormer (MiT-B2)
A **SegFormer** transformer model fine-tuned on the Offroad Segmentation Training Dataset for 10-class semantic segmentation of desert terrain, built for UGV (Unmanned Ground Vehicle) autonomous navigation in off-road environments.
---
## 🧠 Model Architecture
| Component | Detail |
|-----------------|-------------------------------------|
| Framework | HuggingFace Transformers |
| Model | SegFormer |
| Backbone | MiT-B2 (`nvidia/mit-b2`) |
| Parameters | 27,354,314 (all trainable) |
| Decoder | Lightweight MLP Head |
| Classes | 10 |
| Input Size | 512 × 512 |
| GPU | NVIDIA A100-PCIE-40GB |
---
## 🗂 Dataset Classes (10 Categories)
| Class ID | Raw Mask Value | Label |
|----------|---------------|---------------|
| 0 | 100 | Trees |
| 1 | 200 | Lush Bushes |
| 2 | 300 | Dry Grass |
| 3 | 500 | Dry Bushes |
| 4 | 550 | Ground Clutter|
| 5 | 600 | Flowers |
| 6 | 700 | Logs |
| 7 | 800 | Rocks |
| 8 | 7100 | Landscape |
| 9 | 10000 | Sky |
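
Because the masks store raw values rather than contiguous class IDs, they need remapping before training. The sketch below shows one way to do that with a NumPy lookup table. The `remap_mask` helper and the `IGNORE_INDEX` value of 255 are assumptions for illustration, not taken from this repository's training code.

```python
import numpy as np

# Raw mask value -> class ID, copied from the table above.
RAW_TO_CLASS = {
    100: 0, 200: 1, 300: 2, 500: 3, 550: 4,
    600: 5, 700: 6, 800: 7, 7100: 8, 10000: 9,
}
IGNORE_INDEX = 255  # assumption: raw values not in the table are ignored


def remap_mask(raw_mask: np.ndarray) -> np.ndarray:
    """Convert a uint16 raw-value mask into a uint8 class-ID mask."""
    # Lookup table covering every possible uint16 value.
    lut = np.full(65536, IGNORE_INDEX, dtype=np.uint8)
    for raw_value, class_id in RAW_TO_CLASS.items():
        lut[raw_value] = class_id
    return lut[raw_mask]


# Tiny stand-in for a real 960 x 540 mask; 42 is not a known raw value.
raw = np.array([[100, 10000], [550, 42]], dtype=np.uint16)
class_ids = remap_mask(raw)
print(class_ids)
```

A lookup table keeps the remapping to a single indexing operation per mask, which matters when it runs inside a data loader.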
---
## 📊 Dataset Statistics
| Split | Samples | Proportion |
|------------|---------|------------|
| Train | 2,142 | 75% |
| Validation | 286 | 10% |
| Test | 429 | 15% |
| **Total** | **2,857** | 100% |
- Image resolution: **960 × 540** (RGB)
- Mask format: uint16 with raw class value encoding
- Total annotated instances: **16,951**
---
## 🎨 Augmentation Pipeline
11 augmentations specifically chosen for desert and off-road conditions:
| Augmentation | Purpose |
|----------------------|-----------------------------------------------------|
| Color Jitter | Handles varying sun angles and color temperatures |
| Gamma Change | Simulates over/under-exposed outdoor scenes |
| Gaussian Noise | Robustness to sensor noise in UGV cameras |
| Motion / Gaussian / Median Blur | Motion blur from vehicle movement |
| Random Shadows | Shadows from rocks, vegetation, terrain |
| Random Fog | Dust storms and atmospheric haze |
| Brightness/Contrast | Atmospheric and lighting variations |
| Texture Mixup | Prevents overfitting to specific terrain patterns |
| Horizontal Flip | Improves directional generalization |
| Shift / Scale / Rotate | Spatial robustness |
| Coarse Dropout | Simulates sensor occlusion |
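
For segmentation, photometric augmentations (such as gamma change) should touch only the image, while geometric ones (such as horizontal flip) must be applied identically to the image and its mask. The following is a minimal NumPy sketch of that distinction using two entries from the table. The function names, the gamma range, and the flip probability are assumptions, and this is not the pipeline used to train the model.

```python
import numpy as np


def random_gamma(image, rng, low=0.7, high=1.5):
    """Photometric: changes pixel intensities only, so the mask is untouched."""
    gamma = rng.uniform(low, high)  # assumed range, for illustration
    scaled = image.astype(np.float32) / 255.0
    return np.clip(scaled ** gamma * 255.0, 0, 255).astype(np.uint8)


def horizontal_flip(image, mask):
    """Geometric: image and mask are flipped together along the width axis."""
    return image[:, ::-1].copy(), mask[:, ::-1].copy()


def augment(image, mask, rng, p_flip=0.5):
    """Apply gamma change always and horizontal flip with probability p_flip."""
    image = random_gamma(image, rng)
    if rng.random() < p_flip:
        image, mask = horizontal_flip(image, mask)
    return image, mask


# Random stand-ins for a 960 x 540 RGB image and its class-ID mask.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(540, 960, 3), dtype=np.uint8)
mask = rng.integers(0, 10, size=(540, 960), dtype=np.uint8)
aug_image, aug_mask = augment(image, mask, rng)
print(aug_image.shape, aug_mask.shape)
```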
---
## ⚙️ Training Configuration
| Parameter | Value |
|--------------------|-------------|
| Epochs | 50 |
| Batch Size | 8 |
| Learning Rate | 6e-5 |
| Optimizer | AdamW |
| Warmup Steps | 500 |
| Weight Decay | 0.01 |
| FP16 | ✅ Enabled |
| Best Model Metric | mean_iou |
| Eval Strategy | Per epoch |
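
To put the warmup setting in context, the sketch below works out the step counts implied by the table and the training split size. It assumes a single device, no gradient accumulation, and a final partial batch that is kept. The schedule shape (linear warmup followed by linear decay to zero) is also an assumption, since the table only states the warmup length.

```python
import math

TRAIN_SAMPLES = 2142   # from the dataset statistics
BATCH_SIZE = 8
EPOCHS = 50
WARMUP_STEPS = 500
PEAK_LR = 6e-5

# Assumes one device, no gradient accumulation, last partial batch kept.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS


def lr_at(step):
    """Linear warmup to PEAK_LR, then linear decay to zero (assumed schedule)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / (total_steps - WARMUP_STEPS)


print("steps per epoch:", steps_per_epoch)
print("total steps:", total_steps)
print("warmup share of training:", WARMUP_STEPS / total_steps)
```

Under these assumptions, warmup covers a little under two epochs of the 50-epoch run.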
---
## 📈 Evaluation Results
Evaluated on the **validation split** (286 images) using COCO-style mean IoU.
| Metric | Value |
|-----------------|--------|
| **Mean IoU** | **0.6529** |
| **Mean Accuracy** | **0.7592** |
### Per-Class IoU
| Class | IoU |
|----------------|--------|
| Trees | 0.8517 |
| Lush Bushes | 0.6990 |
| Dry Grass | 0.7007 |
| Dry Bushes | 0.4873 |
| Ground Clutter | 0.3647 |
| Flowers | 0.7246 |
| Logs | 0.5591 |
| Rocks | 0.4544 |
| Landscape | 0.7014 |
| Sky | 0.9860 |
- **Best class:** Sky (0.9860), large uniform regions
- **Hardest class:** Ground Clutter (0.3647), small, heterogeneous objects
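
As a reference for how per-class IoU and mean IoU relate, here is a small NumPy sketch that derives both from a confusion matrix, followed by a check that the per-class values in the table average to the reported mean IoU. The helper names are illustrative, and the exact metric implementation used for the reported numbers may differ in details such as ignore-index handling.

```python
import numpy as np


def confusion_matrix(pred, target, num_classes):
    """Rows are ground-truth classes, columns are predicted classes."""
    idx = target.astype(np.int64) * num_classes + pred.astype(np.int64)
    counts = np.bincount(idx.ravel(), minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def per_class_iou(cm):
    """IoU = TP / (TP + FP + FN) for each class."""
    tp = np.diag(cm).astype(np.float64)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    # Classes absent from both prediction and target get NaN rather than 0.
    return np.where(union > 0, tp / np.maximum(union, 1), np.nan)


# Toy 3-class example.
target = np.array([[0, 0, 1], [1, 2, 2]])
pred = np.array([[0, 1, 1], [1, 2, 0]])
cm = confusion_matrix(pred, target, num_classes=3)
iou = per_class_iou(cm)
mean_iou = np.nanmean(iou)
print(iou, mean_iou)

# Per-class IoU values copied from the table above.
reported = [0.8517, 0.6990, 0.7007, 0.4873, 0.3647,
            0.7246, 0.5591, 0.4544, 0.7014, 0.9860]
reported_mean = round(sum(reported) / len(reported), 4)
print(reported_mean)
```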
---
## ⚙️ Inference
```python
from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation
from PIL import Image
import torch
import torch.nn.functional as F

# Load the image processor and the fine-tuned model
processor = SegformerImageProcessor.from_pretrained("PUSHPENDAR/desert-segformer")
model = SegformerForSemanticSegmentation.from_pretrained("PUSHPENDAR/desert-segformer")
model.eval()

# Load image and preprocess it (resize + normalize)
image = Image.open("desert_scene.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Predict without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits  # (1, num_classes, H/4, W/4) of the processed input

# Upsample logits to the original image size
upsampled = F.interpolate(
    logits,
    size=(image.height, image.width),
    mode="bilinear",
    align_corners=False,
)

# Pick the highest-scoring class per pixel
pred_mask = upsampled.argmax(dim=1)[0].numpy()  # (H, W) class IDs
print("Predicted class map shape:", pred_mask.shape)
```
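
The predicted class map holds integer class IDs, so a small post-processing step helps when inspecting results. The sketch below reports the share of pixels per class and turns the map into an RGB image. The label order follows the class table in this README. The palette is randomly generated and purely hypothetical, and the tiny `pred_mask` array is a stand-in for real model output.

```python
import numpy as np

# Class ID -> label, in the order given by the class table.
ID2LABEL = [
    "Trees", "Lush Bushes", "Dry Grass", "Dry Bushes", "Ground Clutter",
    "Flowers", "Logs", "Rocks", "Landscape", "Sky",
]

# Hypothetical palette: one random RGB color per class.
PALETTE = np.random.default_rng(0).integers(0, 256, size=(10, 3), dtype=np.uint8)


def class_coverage(pred_mask):
    """Return the fraction of pixels assigned to each class that appears."""
    counts = np.bincount(pred_mask.ravel(), minlength=len(ID2LABEL))
    return {
        label: float(counts[i]) / pred_mask.size
        for i, label in enumerate(ID2LABEL)
        if counts[i] > 0
    }


def colorize(pred_mask, palette):
    """Map an (H, W) class-ID array to an (H, W, 3) uint8 RGB image."""
    return palette[pred_mask]


# Stand-in for the (H, W) array produced by the inference snippet.
pred_mask = np.array([[9, 9, 9, 9], [8, 8, 7, 0]])
coverage = class_coverage(pred_mask)
rgb = colorize(pred_mask, PALETTE)
print(coverage)
print(rgb.shape)
```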
---
## 📦 Repository Files
| File / Folder | Description |
|--------------------------|------------------------------------------|
| `pytorch_model.bin` | Fine-tuned SegFormer weights |
| `config.json` | Model configuration |
| `preprocessor_config.json` | Image processor settings |
| `outputs/validation_metrics.json` | Saved evaluation metrics |
| `outputs/training_curves.png` | Loss and mIoU training curves |
| `outputs/test_predictions/` | Per-image prediction masks |
---
## 🚀 Run Locally
```bash
git clone https://huggingface.co/PUSHPENDAR/desert-segformer
cd desert-segformer
pip install transformers torch pillow gradio
python app.py
```
---
## 📝 Citation
If you use this model or dataset, please cite:
```bibtex
@misc{desert-segformer-2025,
title = {Desert Semantic Segmentation with SegFormer (MiT-B2)},
author = {Pushpendar Choudhary},
year = {2025},
publisher = {HuggingFace},
url = {https://huggingface.co/PUSHPENDAR/desert-segformer}
}
```
---
## 📄 License
Apache 2.0. See [LICENSE](LICENSE) for details.