
Model Card for segformer-b5-v0

This is a SegFormer model fine-tuned on our Coco 2D segmentation dataset. The starting checkpoint was nvidia/segformer-b5-finetuned-ade-640-640, which was itself fine-tuned on ADE20K; we then fine-tuned it further on our data. The model has 21 target classes covering terrain, actors, static objects, structures, and vegetation. Only pixels that are ambiguous because of lighting or range are cast to "background".

Model description

SegFormer consists of a hierarchical Transformer encoder and a lightweight all-MLP decode head, and achieves strong results on semantic segmentation benchmarks such as ADE20K and Cityscapes. The hierarchical Transformer is first pre-trained on ImageNet-1k; a decode head is then added and the whole model is fine-tuned on a downstream dataset.
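
To make the encoder/decoder split concrete, the sketch below does the shape bookkeeping for a 640 x 640 input. It assumes the MiT-B5 settings from the SegFormer paper (four stages at strides 4, 8, 16, 32 with 64, 128, 320, 512 channels); those numbers are not stated in this card, and `stage_shapes` and `decode_head_resolution` are hypothetical helpers, not part of any library.

```python
# Assumed MiT-B5 stage settings (from the SegFormer paper, not from this card).
STAGE_STRIDES = (4, 8, 16, 32)
STAGE_CHANNELS = (64, 128, 320, 512)


def stage_shapes(height, width):
    """Return (channels, h, w) for each encoder stage's feature map.

    The integer division is exact when height and width are multiples of 32.
    """
    return [(c, height // s, width // s)
            for c, s in zip(STAGE_CHANNELS, STAGE_STRIDES)]


def decode_head_resolution(height, width):
    """The all-MLP head upsamples every stage to the stride-4 map before fusing."""
    return height // STAGE_STRIDES[0], width // STAGE_STRIDES[0]


shapes = stage_shapes(640, 640)
for stage, shape in enumerate(shapes, start=1):
    print(f"stage {stage}: {shape}")
print("logits resolution:", decode_head_resolution(640, 640))
```

Because the logits come out at a quarter of the input resolution, they have to be upsampled to the original image size before taking the per-pixel argmax, which is what `post_process_semantic_segmentation` does when `target_sizes` is given.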

Usage with 🤗 Transformers

The example below runs inference on a single input image.

import torch
from PIL import Image
from transformers import AutoModelForSemanticSegmentation, AutoImageProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModelForSemanticSegmentation.from_pretrained('coco-robotics/segformer-b5-v0').to(device)
processor = AutoImageProcessor.from_pretrained('coco-robotics/segformer-b5-v0')

img_path = "C10001-front-1762634096254608896.png"
image = Image.open(img_path).convert("RGB")

encoded = processor(image, return_tensors="pt", do_resize=True)
pixel_values = encoded["pixel_values"].to(device)  # already batched: B x C x H x W
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(pixel_values=pixel_values)

# target_sizes expects (height, width); PIL's image.size is (width, height)
output_bit_mask = processor.post_process_semantic_segmentation(
    outputs,
    target_sizes=[image.size[::-1]],
)[0]

# output_bit_mask is an (H, W) tensor of per-pixel class ids; use it as needed downstream
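
One common next step is to summarize which classes appear in the prediction. The sketch below assumes the mask has been converted to a NumPy array of integer class ids (for a tensor, `output_bit_mask.cpu().numpy()`) and that an id-to-name mapping is available, for example `model.config.id2label`. The helper `class_pixel_fractions` and the toy ids are hypothetical and for illustration only.

```python
import numpy as np


def class_pixel_fractions(label_map, id2label):
    """Map each class name present in label_map to its share of pixels."""
    ids, counts = np.unique(label_map, return_counts=True)
    total = label_map.size
    return {
        id2label.get(int(i), f"class_{int(i)}"): float(n) / total
        for i, n in zip(ids, counts)
    }


# Toy 2 x 4 label map with made-up ids; real ids come from the model config.
toy_mask = np.array([[0, 0, 1, 1],
                     [0, 2, 1, 1]])
toy_id2label = {0: "SKY", 1: "GROUND_ROAD", 2: "ACTOR_PERSON"}

fractions = class_pixel_fractions(toy_mask, toy_id2label)
print(fractions)
```

Classes that do not occur in the image are simply absent from the returned dictionary.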

Metrics (on validation set of coco-robotics/segmentation-dataset-v1)

| Metric | Value |
| --- | --- |
| mean_iou | 0.5749067285989389 |
| mean_accuracy | 0.649778453005239 |
| overall_accuracy | 0.8936755042184825 |

| Category | IoU | Accuracy |
| --- | --- | --- |
| ACTOR_ANIMAL | 0.7992203904075871 | 0.8665253209968594 |
| ACTOR_PERSON | 0.7952227860718364 | 0.9021704883448208 |
| ACTOR_VEHICLE | 0.8673941403278557 | 0.9254748065745314 |
| EGO_OCCLUSION | 0.8896253100764285 | 0.9474484788423406 |
| GROUND_CROSSWALK | 0.7991581712507427 | 0.8841853347853721 |
| GROUND_CURB | 0.4918772991651623 | 0.6587692194607684 |
| GROUND_NATURAL | 0.799774131348015 | 0.8957070537618846 |
| GROUND_ROAD | 0.7629326806057111 | 0.8588586922202617 |
| GROUND_ROAD_GUTTER | 0.6953155778174064 | 0.8331009942936073 |
| GROUND_SIDEWALK | 0.8718150366431902 | 0.9799486065360883 |
| OBJECT_FOD | 0.14045525476000736 | 0.21303769878642737 |
| OBJECT_OBSTACLE | 0.5621751257369784 | 0.6691222211787283 |
| OBJECT_TRAFFIC_LIGHT | 0.4482038658589933 | 0.549199754096623 |
| SKY | 0.885975774962994 | 0.9514087534867859 |
| STATIC_STRUCTURE | 0.8475792316455327 | 0.9265264023797161 |
| VEGETATION_FOD | 0.06403045420457139 | 0.06758384526697984 |
| VEGETATION_OBSTACLE | 0.7773793410957646 | 0.8665013890929852 |
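
For readers unfamiliar with these metrics, the sketch below shows the usual definitions in terms of a confusion matrix: per-class IoU is TP / (TP + FP + FN), per-class accuracy is TP / (TP + FN), and overall accuracy is the fraction of all pixels labelled correctly. The card does not say which tool produced the reported numbers, so treat this as an assumption about the definitions; `segmentation_metrics` and the toy matrix are hypothetical.

```python
def segmentation_metrics(conf):
    """conf[t][p] = number of pixels with true class t predicted as class p."""
    n = len(conf)
    total = sum(sum(row) for row in conf)
    correct = sum(conf[k][k] for k in range(n))
    ious, accs = [], []
    for k in range(n):
        tp = conf[k][k]
        true_k = sum(conf[k])                        # TP + FN
        pred_k = sum(conf[t][k] for t in range(n))   # TP + FP
        union = true_k + pred_k - tp                 # TP + FP + FN
        # Classes with no pixels score 0.0 here; other tools may skip them instead.
        ious.append(tp / union if union else 0.0)
        accs.append(tp / true_k if true_k else 0.0)
    return {
        "mean_iou": sum(ious) / n,
        "mean_accuracy": sum(accs) / n,
        "overall_accuracy": correct / total,
        "per_category_iou": ious,
        "per_category_accuracy": accs,
    }


# Toy 3-class confusion matrix with made-up counts.
toy_conf = [
    [8, 2, 0],
    [1, 6, 3],
    [0, 0, 10],
]
metrics = segmentation_metrics(toy_conf)
print(metrics["mean_iou"], metrics["mean_accuracy"], metrics["overall_accuracy"])
```

As a sanity check on the reported figures, the 17 listed IoU values sum to about 11.498, and 11.498 / 20 ≈ 0.5749, which matches the reported mean_iou. That is consistent with the mean being taken over 20 classes rather than the 17 listed, though the card does not state this.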
