# SegFormer Semantic Segmentation (Road / Grass / Footpath / Water)
This repository contains a fine-tuned SegFormer model for semantic segmentation of outdoor scenes, specifically targeting:
- background
- footpath
- grass
- road
- water
The model is designed for applications such as:
- Autonomous navigation
- Robotics perception
- Scene understanding
- Smart city / mapping solutions
## Model Details
- Architecture: SegFormer
- Framework: Hugging Face Transformers
- Input Size: 224 × 224
- Task: Semantic Segmentation
- Number of Classes: 5
## Class Labels
| ID | Label |
|---|---|
| 0 | background |
| 1 | footpath |
| 2 | grass |
| 3 | road |
| 4 | water |
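For programmatic use, the table above corresponds to a simple ID-to-label mapping. The sketch below uses illustrative variable names; the checkpoint's own mapping is exposed at runtime as `model.config.id2label`:

```python
# ID-to-label mapping matching the table above (names are illustrative).
id2label = {0: "background", 1: "footpath", 2: "grass", 3: "road", 4: "water"}
label2id = {label: i for i, label in id2label.items()}

print(id2label[3])        # road
print(label2id["water"])  # 4
```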
## Quick Start

### 1. Install Dependencies

```bash
pip install transformers torch pillow
```

### 2. Run Inference
```python
from transformers import SegformerForSemanticSegmentation, SegformerImageProcessor
from PIL import Image
import torch
import torch.nn.functional as F
import numpy as np

MODEL_ID = "Dinusharg/segformer_environment_1"
IMAGE_PATH = "input_image_path"
OUT_MASK = "segmented_mask.png"
OUT_OVERLAY = "segmented_overlay.png"

# RGB color per class ID
palette = {
    0: (0, 0, 0),        # background
    1: (255, 0, 0),      # footpath
    2: (0, 255, 0),      # grass
    3: (128, 128, 128),  # road
    4: (0, 0, 255),      # water
}

processor = SegformerImageProcessor.from_pretrained(MODEL_ID)
model = SegformerForSemanticSegmentation.from_pretrained(MODEL_ID)

image = Image.open(IMAGE_PATH).convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Upsample the logits to the original image resolution
# (PIL's .size is (W, H), so reverse it for (H, W)).
upsampled_logits = F.interpolate(
    outputs.logits,
    size=image.size[::-1],
    mode="bilinear",
    align_corners=False,
)
pred = upsampled_logits.argmax(dim=1)[0].cpu().numpy()

print("Unique predicted classes:", sorted(set(pred.flatten().tolist())))
print("Labels:", model.config.id2label)

# Colorize the class-ID mask.
color_mask = np.zeros((pred.shape[0], pred.shape[1], 3), dtype=np.uint8)
for class_id, color in palette.items():
    color_mask[pred == class_id] = color
Image.fromarray(color_mask).save(OUT_MASK)
print(f"Saved mask: {OUT_MASK}")

# Blend the colored mask over the input image.
ALPHA = 0.5
image_np = np.array(image).astype(np.float32)
mask_np = color_mask.astype(np.float32)
overlay = (image_np * (1 - ALPHA) + mask_np * ALPHA).clip(0, 255).astype(np.uint8)
Image.fromarray(overlay).save(OUT_OVERLAY)
print(f"Saved overlay: {OUT_OVERLAY}")
```
This saves the colored class mask and a semi-transparent overlay of the mask on the input image.
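Beyond saving images, the predicted mask can be summarized numerically, e.g. by the fraction of pixels assigned to each class. A minimal sketch, assuming `pred` is the NumPy class-ID array produced by the script above (a tiny synthetic mask stands in for it here):

```python
import numpy as np

id2label = {0: "background", 1: "footpath", 2: "grass", 3: "road", 4: "water"}

def class_coverage(pred: np.ndarray) -> dict:
    """Return the fraction of pixels assigned to each class in a mask."""
    ids, counts = np.unique(pred, return_counts=True)
    total = pred.size
    return {id2label[int(i)]: float(c) / total for i, c in zip(ids, counts)}

# Tiny synthetic mask standing in for `pred` from the script above.
demo = np.array([[0, 0, 3, 3],
                 [2, 2, 3, 3]])
print(class_coverage(demo))  # {'background': 0.25, 'grass': 0.25, 'road': 0.5}
```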
## Hugging Face Space (Live Demo)
Try the model interactively:
Upload an image and visualize segmentation results instantly.
## Sample Results
## Contact
For questions or collaboration: