---
title: Desert Semantic Segmentation Demo
emoji: 🌵
colorFrom: yellow
colorTo: orange
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: apache-2.0
tags:
- semantic-segmentation
- segformer
- transformers
- desert
- ugv
- offroad
datasets:
- Offroad_Segmentation_Training_Dataset
metrics:
- mean_iou
---

# 🌵 Desert Semantic Segmentation using SegFormer (MiT-B2)

A **SegFormer** transformer model fine-tuned on the Offroad Segmentation Training Dataset for 10-class semantic segmentation of desert terrain, built for autonomous navigation of UGVs (Unmanned Ground Vehicles) in off-road environments.

---

## Model Architecture

| Component    | Detail                     |
|--------------|----------------------------|
| Framework    | Hugging Face Transformers  |
| Model        | SegFormer                  |
| Backbone     | MiT-B2 (`nvidia/mit-b2`)   |
| Parameters   | 27,354,314 (all trainable) |
| Decoder      | Lightweight MLP head       |
| Classes      | 10                         |
| Input size   | 512 × 512                  |
| Training GPU | NVIDIA A100-PCIE-40GB      |
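
The card does not include the training script. A common way to instantiate this architecture with Transformers is to load the ImageNet-pretrained `nvidia/mit-b2` encoder into `SegformerForSemanticSegmentation` with a new 10-class decode head, as sketched below. This assumes the author followed that pattern; `build_model`, `ID2LABEL`, and `LABEL2ID` are hypothetical names, and the label names come from the class table.

```python
# Label names taken from the "Dataset Classes" table (class ID -> label).
ID2LABEL = {
    0: "Trees",
    1: "Lush Bushes",
    2: "Dry Grass",
    3: "Dry Bushes",
    4: "Ground Clutter",
    5: "Flowers",
    6: "Logs",
    7: "Rocks",
    8: "Landscape",
    9: "Sky",
}
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def build_model():
    """Hypothetical helper: MiT-B2 encoder plus a freshly initialized 10-class MLP decode head.

    Assumption: fine-tuning started from `nvidia/mit-b2`; the card names it as the
    backbone but does not show the initialization code.
    """
    # Imported lazily so the label maps above can be used without transformers installed.
    from transformers import SegformerForSemanticSegmentation

    return SegformerForSemanticSegmentation.from_pretrained(
        "nvidia/mit-b2",
        num_labels=len(ID2LABEL),
        id2label=ID2LABEL,
        label2id=LABEL2ID,
    )
```

Calling `build_model()` downloads the encoder weights, and Transformers logs that the decode-head weights are newly initialized; that is expected before fine-tuning.
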

---

## Dataset Classes (10 Categories)

| Class ID | Raw Mask Value | Label          |
|----------|----------------|----------------|
| 0        | 100            | Trees          |
| 1        | 200            | Lush Bushes    |
| 2        | 300            | Dry Grass      |
| 3        | 500            | Dry Bushes     |
| 4        | 550            | Ground Clutter |
| 5        | 600            | Flowers        |
| 6        | 700            | Logs           |
| 7        | 800            | Rocks          |
| 8        | 7100           | Landscape      |
| 9        | 10000          | Sky            |
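
Because the masks store raw values rather than class IDs, they have to be remapped before training or evaluation. Below is a minimal NumPy sketch using a lookup table. It assumes that any value not listed in the table should become an ignore index of 255 (a common convention; the card does not say how unlisted values are handled), and `remap_mask` is a hypothetical name.

```python
import numpy as np

# Raw mask value -> class ID, copied from the table above.
RAW_TO_CLASS = {
    100: 0,    # Trees
    200: 1,    # Lush Bushes
    300: 2,    # Dry Grass
    500: 3,    # Dry Bushes
    550: 4,    # Ground Clutter
    600: 5,    # Flowers
    700: 6,    # Logs
    800: 7,    # Rocks
    7100: 8,   # Landscape
    10000: 9,  # Sky
}
IGNORE_INDEX = 255  # assumption: label assigned to any raw value not in the table


def remap_mask(raw_mask: np.ndarray) -> np.ndarray:
    """Convert a uint16 raw-value mask into a uint8 class-ID mask."""
    # Lookup table covering every possible uint16 value; unknown values map to IGNORE_INDEX.
    lut = np.full(np.iinfo(np.uint16).max + 1, IGNORE_INDEX, dtype=np.uint8)
    for raw_value, class_id in RAW_TO_CLASS.items():
        lut[raw_value] = class_id
    # Fancy indexing applies the table to every pixel at once.
    return lut[raw_mask.astype(np.uint16)]


raw = np.array([[100, 10000], [7100, 123]], dtype=np.uint16)
print(remap_mask(raw).tolist())  # [[0, 9], [8, 255]]
```

A lookup table keeps the remap to a single vectorized indexing operation per mask, instead of one comparison pass per class.
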

---

## Dataset Statistics

| Split      | Samples   | Proportion |
|------------|-----------|------------|
| Train      | 2,142     | 75%        |
| Validation | 286       | 10%        |
| Test       | 429       | 15%        |
| **Total**  | **2,857** | 100%       |

- Image resolution: **960 × 540** (RGB)
- Mask format: `uint16`, with each pixel storing one of the raw mask values listed in the class table
- Total annotated instances: **16,951**

---

## Augmentation Pipeline

11 augmentations, chosen specifically for desert and off-road conditions:

| Augmentation                    | Purpose                                           |
|---------------------------------|---------------------------------------------------|
| Color Jitter                    | Handles varying sun angles and color temperatures |
| Gamma Change                    | Simulates over- and under-exposed outdoor scenes  |
| Gaussian Noise                  | Robustness to sensor noise in UGV cameras         |
| Motion / Gaussian / Median Blur | Motion blur from vehicle movement                 |
| Random Shadows                  | Shadows cast by rocks, vegetation, and terrain    |
| Random Fog                      | Dust storms and atmospheric haze                  |
| Brightness / Contrast           | Atmospheric and lighting variations               |
| Texture Mixup                   | Prevents overfitting to specific terrain patterns |
| Horizontal Flip                 | Improves directional generalization               |
| Shift / Scale / Rotate          | Spatial robustness                                |
| Coarse Dropout                  | Simulates sensor occlusion                        |
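
The key constraint in a segmentation pipeline like this is that photometric augmentations (color, gamma, noise, blur, fog) change only the image, while geometric ones (flip, shift/scale/rotate) must be applied identically to the image and its mask. The card does not say which library implements the pipeline, so the following is a dependency-light NumPy sketch of three of the listed augmentations (Gamma Change, Gaussian Noise, Horizontal Flip). The helper name `augment`, the probabilities, and the parameter ranges are illustrative assumptions, not the author's settings.

```python
import numpy as np


def augment(image: np.ndarray, mask: np.ndarray, rng: np.random.Generator):
    """Apply a few illustrative augmentations to an (H, W, 3) uint8 image and its (H, W) mask."""
    img = image.astype(np.float32) / 255.0

    # Gamma change (photometric): image only. gamma < 1 brightens, gamma > 1 darkens.
    if rng.random() < 0.5:
        gamma = rng.uniform(0.7, 1.5)
        img = img ** gamma

    # Gaussian noise (photometric): image only, mimicking camera sensor noise.
    if rng.random() < 0.3:
        img = img + rng.normal(0.0, 0.03, size=img.shape)

    # Horizontal flip (geometric): image AND mask, so labels stay aligned with pixels.
    if rng.random() < 0.5:
        img = img[:, ::-1]
        mask = mask[:, ::-1]

    img = np.clip(img, 0.0, 1.0)
    return (img * 255.0).round().astype(np.uint8), np.ascontiguousarray(mask)


rng = np.random.default_rng(0)
image = np.zeros((4, 6, 3), dtype=np.uint8)
mask = np.zeros((4, 6), dtype=np.uint8)
aug_image, aug_mask = augment(image, mask, rng)
```

Libraries such as Albumentations handle this split automatically when the image and mask are passed together, but the same rule applies to any custom transform, such as Texture Mixup.
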

---

## Training Configuration

| Parameter         | Value     |
|-------------------|-----------|
| Epochs            | 50        |
| Batch size        | 8         |
| Learning rate     | 6e-5      |
| Optimizer         | AdamW     |
| Warmup steps      | 500       |
| Weight decay      | 0.01      |
| FP16              | Enabled   |
| Best-model metric | mean_iou  |
| Eval strategy     | Per epoch |
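
The table lists 500 warmup steps but no decay schedule. If training used the Transformers `Trainer` defaults (an assumption; its default `lr_scheduler_type` is `"linear"`), the learning rate ramps linearly from 0 to 6e-5 over the first 500 steps and then decays linearly to 0. The pure-Python sketch below also assumes a single device, no gradient accumulation, and that the last partial batch is kept (2,142 / 8 gives 268 steps per epoch and 13,400 steps in total). `learning_rate` is a hypothetical helper.

```python
import math

BASE_LR = 6e-5
WARMUP_STEPS = 500
EPOCHS = 50
BATCH_SIZE = 8
TRAIN_SAMPLES = 2142

# Assumption: one device, no gradient accumulation, last partial batch kept.
STEPS_PER_EPOCH = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)  # 268
TOTAL_STEPS = STEPS_PER_EPOCH * EPOCHS                  # 13,400


def learning_rate(step: int) -> float:
    """Linear warmup to BASE_LR, then linear decay to 0 (the shape of a 'linear' schedule)."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


for step in (0, 250, 500, 6950, TOTAL_STEPS):
    print(step, learning_rate(step))
```
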

---

## Evaluation Results

Evaluated on the **validation split** (286 images) using mean intersection-over-union (`mean_iou`). Mean IoU is the unweighted average of the ten per-class IoUs listed below.

| Metric            | Value      |
|-------------------|------------|
| **Mean IoU**      | **0.6529** |
| **Mean Accuracy** | **0.7592** |

### Per-Class IoU

| Class          | IoU    |
|----------------|--------|
| Trees          | 0.8517 |
| Lush Bushes    | 0.6990 |
| Dry Grass      | 0.7007 |
| Dry Bushes     | 0.4873 |
| Ground Clutter | 0.3647 |
| Flowers        | 0.7246 |
| Logs           | 0.5591 |
| Rocks          | 0.4544 |
| Landscape      | 0.7014 |
| Sky            | 0.9860 |

**Best class:** Sky (0.9860), which occupies large, uniform regions.

**Hardest class:** Ground Clutter (0.3647), made up of small, heterogeneous objects.
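
For each class, IoU is TP / (TP + FP + FN), with the counts accumulated over all validation pixels, and mean IoU averages those values over classes. The following NumPy sketch computes both from a confusion matrix. It assumes "mean accuracy" is the average per-class pixel accuracy (TP / (TP + FN)), which is how the Hugging Face `mean_iou` metric defines it, and that pixels labeled 255 are ignored. `confusion_matrix` and `iou_metrics` are hypothetical helper names.

```python
import numpy as np

NUM_CLASSES = 10
IGNORE_INDEX = 255  # assumption: pixels with this label are excluded


def confusion_matrix(pred: np.ndarray, label: np.ndarray, num_classes: int = NUM_CLASSES) -> np.ndarray:
    """Rows = ground-truth class, columns = predicted class, counted over valid pixels."""
    valid = label != IGNORE_INDEX
    idx = label[valid].astype(np.int64) * num_classes + pred[valid].astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def iou_metrics(conf: np.ndarray) -> dict:
    """Per-class IoU, mean IoU, and mean per-class accuracy from a summed confusion matrix."""
    tp = np.diag(conf).astype(np.float64)
    fp = conf.sum(axis=0) - tp  # predicted as the class but labeled otherwise
    fn = conf.sum(axis=1) - tp  # labeled as the class but predicted otherwise
    with np.errstate(divide="ignore", invalid="ignore"):
        iou = tp / (tp + fp + fn)  # NaN for classes absent from both prediction and label
        acc = tp / (tp + fn)       # per-class pixel accuracy (recall)
    return {
        "per_class_iou": iou,
        "mean_iou": float(np.nanmean(iou)),
        "mean_accuracy": float(np.nanmean(acc)),
    }


# Averaging the per-class IoUs reported above reproduces the headline number.
reported = [0.8517, 0.6990, 0.7007, 0.4873, 0.3647, 0.7246, 0.5591, 0.4544, 0.7014, 0.9860]
print(round(sum(reported) / len(reported), 4))  # 0.6529
```

Summing per-image confusion matrices before calling `iou_metrics` gives dataset-level IoU, which weights large and small images by their pixel counts rather than averaging per-image scores.
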

---

## Inference

```python
from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation
from PIL import Image
import torch
import torch.nn.functional as F

# Load the fine-tuned model and its image processor
processor = SegformerImageProcessor.from_pretrained("PUSHPENDAR/segformer-desert")
model = SegformerForSemanticSegmentation.from_pretrained("PUSHPENDAR/segformer-desert")
model.eval()

# Load an RGB image and preprocess it (resize + normalize)
image = Image.open("desert_scene.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Forward pass without gradient tracking
with torch.no_grad():
    outputs = model(**inputs)

# Logits are at 1/4 of the processed input size, e.g. 128 x 128 for a 512 x 512 input
logits = outputs.logits  # (1, num_classes, H_in/4, W_in/4)

# Upsample to the original image size (PIL reports width/height; interpolate wants (H, W))
upsampled = F.interpolate(
    logits,
    size=(image.height, image.width),
    mode="bilinear",
    align_corners=False,
)
pred_mask = upsampled.argmax(dim=1)[0].cpu().numpy()  # (H, W) array of class IDs 0-9

print("Predicted class map shape:", pred_mask.shape)
```
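
`pred_mask` holds class IDs 0-9. Transformers also offers `processor.post_process_semantic_segmentation(outputs, target_sizes=[(image.height, image.width)])`, which performs the same upsample-and-argmax step and returns one class map per image. To turn the class map into something readable, the NumPy sketch below reports the fraction of pixels assigned to each class. It assumes the model's ID order matches the class table (`model.config.id2label` is the authoritative mapping if it was set during training), and `class_coverage` is a hypothetical helper.

```python
import numpy as np

# Class ID -> label, from the "Dataset Classes" table (assumed to match the model's ID order).
CLASS_NAMES = [
    "Trees", "Lush Bushes", "Dry Grass", "Dry Bushes", "Ground Clutter",
    "Flowers", "Logs", "Rocks", "Landscape", "Sky",
]


def class_coverage(pred_mask: np.ndarray) -> dict:
    """Return {label: fraction of pixels} for the classes present in an (H, W) class-ID map."""
    counts = np.bincount(pred_mask.ravel(), minlength=len(CLASS_NAMES))
    fractions = counts / pred_mask.size
    return {CLASS_NAMES[i]: float(f) for i, f in enumerate(fractions) if f > 0}


demo = np.array([[9, 9], [8, 0]])
print(class_coverage(demo))  # {'Trees': 0.25, 'Landscape': 0.25, 'Sky': 0.5}
```

With the inference snippet above, `class_coverage(pred_mask)` summarizes a real prediction in the same way.
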

---

## Repository Files

| File / Folder                     | Description                   |
|-----------------------------------|-------------------------------|
| `pytorch_model.bin`               | Fine-tuned SegFormer weights  |
| `config.json`                     | Model configuration           |
| `preprocessor_config.json`        | Image processor settings      |
| `outputs/validation_metrics.json` | Saved evaluation metrics      |
| `outputs/training_curves.png`     | Loss and mIoU training curves |
| `outputs/test_predictions/`       | Per-image prediction masks    |

---

## Run Locally

```bash
git lfs install
git clone https://huggingface.co/PUSHPENDAR/segformer-desert
cd segformer-desert
pip install transformers torch pillow gradio
python app.py
```

---

## Citation

If you use this model or dataset, please cite:

```bibtex
@misc{desert-segformer-2025,
  title     = {Desert Semantic Segmentation with SegFormer (MiT-B2)},
  author    = {Pushpendar Choudhary},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/PUSHPENDAR/segformer-desert}
}
```

---

## License

Apache 2.0. See [LICENSE](LICENSE) for details.