	---
	title: Desert Semantic Segmentation Demo
	emoji: 🌵
	colorFrom: yellow
	colorTo: orange
	sdk: gradio
	sdk_version: 4.44.0
	app_file: app.py
	pinned: false
	license: apache-2.0
	tags:
	- semantic-segmentation
	- segformer
	- transformers
	- desert
	- ugv
	- offroad
	datasets:
	- Offroad_Segmentation_Training_Dataset
	metrics:
	- mean_iou
	---

	# 🌵 Desert Semantic Segmentation using SegFormer (MiT-B2)

	A SegFormer model fine-tuned on the Offroad Segmentation Training Dataset for 10-class semantic segmentation of desert terrain. It is built for autonomous navigation of Unmanned Ground Vehicles (UGVs) in off-road environments.



	---

	## 🧠 Model Architecture

	| Component  | Detail                     |
	|------------|----------------------------|
	| Framework  | HuggingFace Transformers   |
	| Model      | SegFormer                  |
	| Backbone   | MiT-B2 (`nvidia/mit-b2`)   |
	| Parameters | 27,354,314 (all trainable) |
	| Decoder    | Lightweight MLP Head       |
	| Classes    | 10                         |
	| Input Size | 512 × 512                  |
	| GPU        | NVIDIA A100-PCIE-40GB      |

	---

	## 🗂 Dataset Classes (10 Categories)

	| Class ID | Raw Mask Value | Label          |
	|----------|----------------|----------------|
	| 0        | 100            | Trees          |
	| 1        | 200            | Lush Bushes    |
	| 2        | 300            | Dry Grass      |
	| 3        | 500            | Dry Bushes     |
	| 4        | 550            | Ground Clutter |
	| 5        | 600            | Flowers        |
	| 6        | 700            | Logs           |
	| 7        | 800            | Rocks          |
	| 8        | 7100           | Landscape      |
	| 9        | 10000          | Sky            |
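
The raw mask values in the table above have to be converted to contiguous class IDs before training. The following is a minimal sketch of one way to do that with a NumPy lookup table. The function name `remap_mask` and the ignore value `255` for unlisted raw values are assumptions for illustration, not taken from the training code.

```python
import numpy as np

# Raw mask value -> class ID, copied from the class table.
RAW_TO_CLASS = {
    100: 0, 200: 1, 300: 2, 500: 3, 550: 4,
    600: 5, 700: 6, 800: 7, 7100: 8, 10000: 9,
}
IGNORE_INDEX = 255  # assumption: raw values not in the table are ignored


def remap_mask(raw_mask: np.ndarray) -> np.ndarray:
    """Convert a uint16 raw-value mask into a uint8 mask of class IDs."""
    # One entry per possible uint16 value, so any raw value can be looked up.
    lut = np.full(65536, IGNORE_INDEX, dtype=np.uint8)
    for raw_value, class_id in RAW_TO_CLASS.items():
        lut[raw_value] = class_id
    return lut[raw_mask]


if __name__ == "__main__":
    demo = np.array([[100, 7100], [10000, 42]], dtype=np.uint16)
    print(remap_mask(demo))
```

A lookup table keeps the conversion to a single indexing operation per mask, which avoids looping over the ten classes for every image.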

	---

	## 📊 Dataset Statistics

	| Split      | Samples | Proportion |
	|------------|---------|------------|
	| Train      | 2,142   | 75%        |
	| Validation | 286     | 10%        |
	| Test       | 429     | 15%        |
	| Total      | 2,857   | 100%       |

	- Image resolution: 960 × 540 (RGB)
	- Mask format: uint16 with raw class value encoding
	- Total annotated instances: 16,951

	---

	## 🎨 Augmentation Pipeline

	11 augmentations specifically chosen for desert and off-road conditions:

	| Augmentation                    | Purpose                                           |
	|---------------------------------|---------------------------------------------------|
	| Color Jitter                    | Handles varying sun angles and color temperatures |
	| Gamma Change                    | Simulates over/under-exposed outdoor scenes       |
	| Gaussian Noise                  | Robustness to sensor noise in UGV cameras         |
	| Motion / Gaussian / Median Blur | Motion blur from vehicle movement                 |
	| Random Shadows                  | Shadows from rocks, vegetation, terrain           |
	| Random Fog                      | Dust storms and atmospheric haze                  |
	| Brightness/Contrast             | Atmospheric and lighting variations               |
	| Texture Mixup                   | Prevents overfitting to specific terrain patterns |
	| Horizontal Flip                 | Improves directional generalization               |
	| Shift / Scale / Rotate          | Spatial robustness                                |
	| Coarse Dropout                  | Simulates sensor occlusion                        |
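
In segmentation, geometric augmentations must be applied to the image and its mask together, while photometric ones change only the image. The sketch below illustrates that distinction for two entries from the table (Horizontal Flip and Gamma Change) in plain NumPy. It is not the pipeline used for training. The function names, the flip probability, and the gamma range are all hypothetical.

```python
import numpy as np


def horizontal_flip(image: np.ndarray, mask: np.ndarray):
    """Flip an (H, W, 3) image and its (H, W) mask left-right together."""
    return image[:, ::-1].copy(), mask[:, ::-1].copy()


def gamma_change(image: np.ndarray, gamma: float) -> np.ndarray:
    """Apply a gamma curve to a uint8 image; the mask is not affected."""
    scaled = image.astype(np.float32) / 255.0
    return np.round(255.0 * scaled ** gamma).astype(np.uint8)


def augment(image, mask, rng, flip_prob=0.5, gamma_range=(0.7, 1.5)):
    """Randomly flip the pair, then apply a random gamma to the image only."""
    if rng.random() < flip_prob:
        image, mask = horizontal_flip(image, mask)
    image = gamma_change(image, rng.uniform(*gamma_range))
    return image, mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.integers(0, 256, size=(540, 960, 3), dtype=np.uint8)
    mask = rng.integers(0, 10, size=(540, 960), dtype=np.uint8)
    aug_image, aug_mask = augment(image, mask, rng)
    print(aug_image.shape, aug_mask.shape)
```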

	---

	## ⚙️ Training Configuration

	| Parameter         | Value      |
	|-------------------|------------|
	| Epochs            | 50         |
	| Batch Size        | 8          |
	| Learning Rate     | 6e-5       |
	| Optimizer         | AdamW      |
	| Warmup Steps      | 500        |
	| Weight Decay      | 0.01       |
	| FP16              | ✅ Enabled |
	| Best Model Metric | mean_iou   |
	| Eval Strategy     | Per epoch  |
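
To put the warmup setting in context, the sketch below derives the number of optimizer steps from the table and the train split size. It assumes a single device, no gradient accumulation, and that the last partial batch is kept. The linear-warmup, linear-decay shape of `learning_rate` is also an assumption, since the table does not name the scheduler.

```python
import math

TRAIN_SAMPLES = 2142  # train split size from the dataset statistics
BATCH_SIZE = 8
EPOCHS = 50
BASE_LR = 6e-5
WARMUP_STEPS = 500

# Assumes one optimizer step per batch and that the partial batch is kept.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS


def learning_rate(step: int) -> float:
    """Linear warmup, then linear decay to zero (assumed schedule shape)."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    remaining = total_steps - step
    return BASE_LR * max(0.0, remaining / (total_steps - WARMUP_STEPS))


if __name__ == "__main__":
    print(steps_per_epoch, total_steps)
    print(f"warmup share of training: {WARMUP_STEPS / total_steps:.1%}")
```

Under these assumptions there are 268 steps per epoch and 13,400 steps in total, so the 500 warmup steps finish before the end of the second epoch.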

	---

	## 📈 Evaluation Results

	Evaluated on the validation split (286 images) using COCO-style mean IoU.

	| Metric        | Value  |
	|---------------|--------|
	| Mean IoU      | 0.6529 |
	| Mean Accuracy | 0.7592 |

	### Per-Class IoU

	| Class          | IoU    |
	|----------------|--------|
	| Trees          | 0.8517 |
	| Lush Bushes    | 0.6990 |
	| Dry Grass      | 0.7007 |
	| Dry Bushes     | 0.4873 |
	| Ground Clutter | 0.3647 |
	| Flowers        | 0.7246 |
	| Logs           | 0.5591 |
	| Rocks          | 0.4544 |
	| Landscape      | 0.7014 |
	| Sky            | 0.9860 |

	- Best class: Sky (0.9860), which covers large, uniform regions
	- Hardest class: Ground Clutter (0.3647), which consists of small, heterogeneous objects
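
For reference, per-class IoU is the intersection of predicted and ground-truth pixels divided by their union, and mean IoU is the unweighted average over classes. The sketch below shows one common way to compute both from a confusion matrix. It is not necessarily the evaluation code used here. The helper names and the ignore value `255` are assumptions. The last lines check that the per-class values reported above average to the reported mean IoU.

```python
import numpy as np


def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Count pixels per (ground truth, prediction) pair; rows are ground truth."""
    keep = target != ignore_index
    flat = target[keep].astype(np.int64) * num_classes + pred[keep].astype(np.int64)
    counts = np.bincount(flat, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def per_class_iou(cm):
    """IoU per class; classes absent from both pred and target give NaN."""
    intersection = np.diag(cm).astype(np.float64)
    union = cm.sum(axis=0) + cm.sum(axis=1) - intersection
    return np.where(union > 0, intersection / np.maximum(union, 1), np.nan)


# Per-class IoU values as reported in the table.
REPORTED_IOU = [0.8517, 0.6990, 0.7007, 0.4873, 0.3647,
                0.7246, 0.5591, 0.4544, 0.7014, 0.9860]

if __name__ == "__main__":
    print(round(float(np.mean(REPORTED_IOU)), 4))  # 0.6529
```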

	---

	## ⚙️ Inference
	```python
	from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation
	from PIL import Image
	import torch
	import torch.nn.functional as F

	# Load model
	processor = SegformerImageProcessor.from_pretrained("PUSHPENDAR/desert-segformer")
	model = SegformerForSemanticSegmentation.from_pretrained("PUSHPENDAR/desert-segformer")
	model.eval()

	# Load image
	image = Image.open("desert_scene.jpg").convert("RGB")
	inputs = processor(images=image, return_tensors="pt")

	# Predict
	with torch.no_grad():
	    outputs = model(**inputs)
	    logits = outputs.logits  # (1, num_classes, H/4, W/4) of the preprocessed input

	# Upsample to original size
	upsampled = F.interpolate(
	    logits,
	    size=(image.height, image.width),
	    mode="bilinear",
	    align_corners=False,
	)
	pred_mask = upsampled.argmax(dim=1)[0].numpy() # (H, W)
	print("Predicted class map shape:", pred_mask.shape)
	```
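
The predicted class map holds integer class IDs, so a typical next step is to turn it into something readable. The sketch below colors a class map and reports the share of pixels per class. The label names come from the class table, while the `PALETTE` colors and the helper names `colorize` and `class_coverage` are hypothetical choices for illustration.

```python
import numpy as np

# Class ID -> label, copied from the dataset class table.
ID2LABEL = {
    0: "Trees", 1: "Lush Bushes", 2: "Dry Grass", 3: "Dry Bushes",
    4: "Ground Clutter", 5: "Flowers", 6: "Logs", 7: "Rocks",
    8: "Landscape", 9: "Sky",
}

# Hypothetical RGB colors, one row per class ID.
PALETTE = np.array([
    [34, 139, 34], [0, 200, 0], [189, 183, 107], [139, 119, 42],
    [128, 128, 0], [255, 105, 180], [139, 69, 19], [128, 128, 128],
    [210, 180, 140], [135, 206, 235],
], dtype=np.uint8)


def colorize(pred_mask: np.ndarray) -> np.ndarray:
    """Map an (H, W) class-ID array to an (H, W, 3) RGB image."""
    return PALETTE[pred_mask]


def class_coverage(pred_mask: np.ndarray) -> dict:
    """Return the fraction of pixels assigned to each class label."""
    counts = np.bincount(pred_mask.ravel(), minlength=len(ID2LABEL))
    return {ID2LABEL[i]: counts[i] / pred_mask.size for i in range(len(ID2LABEL))}


if __name__ == "__main__":
    demo_mask = np.array([[0, 9], [9, 9]])  # stand-in for pred_mask
    print(colorize(demo_mask).shape)
    print(class_coverage(demo_mask)["Sky"])
```

With the inference snippet above, `colorize(pred_mask)` can be passed to `PIL.Image.fromarray` to save or display the overlay.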

	---

	## 📦 Repository Files

	| File / Folder                     | Description                   |
	|-----------------------------------|-------------------------------|
	| `pytorch_model.bin`               | Fine-tuned SegFormer weights  |
	| `config.json`                     | Model configuration           |
	| `preprocessor_config.json`        | Image processor settings      |
	| `outputs/validation_metrics.json` | Saved evaluation metrics      |
	| `outputs/training_curves.png`     | Loss and mIoU training curves |
	| `outputs/test_predictions/`       | Per-image prediction masks    |

	---

	## 🚀 Run Locally
	```bash
	git clone https://huggingface.co/PUSHPENDAR/desert-segformer
	cd desert-segformer
	pip install transformers torch pillow gradio
	python app.py
	```

	---

	## 📝 Citation

	If you use this model or dataset, please cite:
	```bibtex
	@misc{desert-segformer-2025,
	title = {Desert Semantic Segmentation with SegFormer (MiT-B2)},
	author = {Pushpendar Choudhary},
	year = {2025},
	publisher = {HuggingFace},
	url = {https://huggingface.co/PUSHPENDAR/desert-segformer}
	}
	```

	---

	## 📄 License

	Apache 2.0. See [LICENSE](LICENSE) for details.