Model Card for segformer-b5-v0
This is a SegFormer model fine-tuned on our Coco 2D segmentation dataset. The starting checkpoint was nvidia/segformer-b5-finetuned-ade-640-640, which had already been fine-tuned on ADE20K; we then fine-tuned it further on our data. The model predicts 21 target classes covering terrain, actors, static objects, structures, and vegetation. Only pixels that are ambiguous because of lighting or range are cast to "background".
Model description
SegFormer consists of a hierarchical Transformer encoder and a lightweight all-MLP decode head, and achieves strong results on semantic segmentation benchmarks such as ADE20K and Cityscapes. The hierarchical Transformer is first pre-trained on ImageNet-1k, after which a decode head is added and the whole model is fine-tuned on a downstream dataset.
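To make the "all-MLP decode head" idea concrete, here is a minimal NumPy sketch of the data flow: each encoder stage is projected to a common width with a per-pixel linear layer, upsampled to the finest stage's resolution, concatenated, fused, and classified. It is a shape-level illustration only. The function name `mlp_decode_head`, the random weights, the small channel sizes, and the nearest-neighbour upsampling are all assumptions made for brevity, and normalization and dropout are omitted; it is not the code used by this model.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_decode_head(features, embed_dim, num_classes):
    """Illustrative decode head. features: list of (C_i, H_i, W_i) arrays, finest first."""
    target_h, target_w = features[0].shape[1:]
    projected = []
    for feat in features:
        c, h, w = feat.shape
        # Random stand-in for a learned per-pixel linear projection to embed_dim channels.
        weight = rng.standard_normal((embed_dim, c)) * 0.02
        x = np.einsum("ec,chw->ehw", weight, feat)
        # Nearest-neighbour upsampling to the finest resolution (a simplification).
        x = x.repeat(target_h // h, axis=1).repeat(target_w // w, axis=2)
        projected.append(x)
    # Concatenate all stages along the channel axis, then fuse with linear + ReLU.
    fused = np.concatenate(projected, axis=0)
    w_fuse = rng.standard_normal((embed_dim, fused.shape[0])) * 0.02
    fused = np.maximum(np.einsum("ec,chw->ehw", w_fuse, fused), 0.0)
    # Final per-pixel classifier producing one logit map per class.
    w_cls = rng.standard_normal((num_classes, embed_dim)) * 0.02
    return np.einsum("kc,chw->khw", w_cls, fused)

# Four hypothetical encoder stages for a 64x64 input at strides 4, 8, 16, 32.
features = [
    rng.standard_normal((c, 64 // s, 64 // s))
    for c, s in [(8, 4), (16, 8), (32, 16), (64, 32)]
]
logits = mlp_decode_head(features, embed_dim=16, num_classes=21)
print(logits.shape)
```

The logits come out at one quarter of the input resolution, which is why a post-processing step that resizes predictions back to the original image size is needed.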
Usage with 🤗 Transformers
The following example runs inference on a single input image.
import torch
from PIL import Image
from transformers import AutoModelForSemanticSegmentation, AutoImageProcessor
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForSemanticSegmentation.from_pretrained('coco-robotics/segformer-b5-v0').to(device)
processor = AutoImageProcessor.from_pretrained('coco-robotics/segformer-b5-v0')
img_path = "C10001-front-1762634096254608896.png"
image = Image.open(img_path).convert("RGB")
encoded = processor(image, return_tensors="pt", do_resize=True)
pixel_values = encoded["pixel_values"].to(device)  # already batched: B x C x H x W
with torch.no_grad():  # inference only, so skip gradient tracking
    outputs = model(pixel_values=pixel_values)
output_bit_mask = processor.post_process_semantic_segmentation(
outputs,
target_sizes=[image.size[::-1]],  # PIL reports (width, height); the processor expects (height, width)
)[0]
# output_bit_mask is an (H, W) tensor of predicted class ids, one per pixel; use it as needed
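As one possible next step, the per-pixel class-id map can be summarized as the fraction of the image covered by each class. The sketch below uses a small synthetic array in place of the real prediction, and the `id2label` mapping shown is hypothetical (the label names come from this card, but the ids are made up). With the real model you would likely pass `output_bit_mask.cpu().numpy()` and `model.config.id2label` instead.

```python
import numpy as np

def class_coverage(class_map, id2label):
    """Return {label: fraction of pixels} for every class id present in class_map."""
    ids, counts = np.unique(class_map, return_counts=True)
    total = class_map.size
    return {
        id2label.get(int(i), f"unknown_{int(i)}"): int(c) / total
        for i, c in zip(ids, counts)
    }

# Hypothetical id-to-label mapping; the real ids live in the model config.
id2label = {0: "background", 1: "GROUND_SIDEWALK", 2: "SKY"}

# Synthetic 3x4 stand-in for a predicted class-id map.
class_map = np.array([
    [2, 2, 2, 2],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
])

coverage = class_coverage(class_map, id2label)
for label, fraction in coverage.items():
    print(f"{label}: {fraction:.2%}")
```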
Metrics (on the validation set of coco-robotics/segmentation-dataset-v1)
| Metric | Value |
|---|---|
| mean_iou | 0.5749067285989389 |
| mean_accuracy | 0.649778453005239 |
| overall_accuracy | 0.8936755042184825 |

| Category | IoU |
|---|---|
| ACTOR_ANIMAL | 0.7992203904075871 |
| ACTOR_PERSON | 0.7952227860718364 |
| ACTOR_VEHICLE | 0.8673941403278557 |
| EGO_OCCLUSION | 0.8896253100764285 |
| GROUND_CROSSWALK | 0.7991581712507427 |
| GROUND_CURB | 0.4918772991651623 |
| GROUND_NATURAL | 0.799774131348015 |
| GROUND_ROAD | 0.7629326806057111 |
| GROUND_ROAD_GUTTER | 0.6953155778174064 |
| GROUND_SIDEWALK | 0.8718150366431902 |
| OBJECT_FOD | 0.14045525476000736 |
| OBJECT_OBSTACLE | 0.5621751257369784 |
| OBJECT_TRAFFIC_LIGHT | 0.4482038658589933 |
| SKY | 0.885975774962994 |
| STATIC_STRUCTURE | 0.8475792316455327 |
| VEGETATION_FOD | 0.06403045420457139 |
| VEGETATION_OBSTACLE | 0.7773793410957646 |

| Category | Accuracy |
|---|---|
| ACTOR_ANIMAL | 0.8665253209968594 |
| ACTOR_PERSON | 0.9021704883448208 |
| ACTOR_VEHICLE | 0.9254748065745314 |
| EGO_OCCLUSION | 0.9474484788423406 |
| GROUND_CROSSWALK | 0.8841853347853721 |
| GROUND_CURB | 0.6587692194607684 |
| GROUND_NATURAL | 0.8957070537618846 |
| GROUND_ROAD | 0.8588586922202617 |
| GROUND_ROAD_GUTTER | 0.8331009942936073 |
| GROUND_SIDEWALK | 0.9799486065360883 |
| OBJECT_FOD | 0.21303769878642737 |
| OBJECT_OBSTACLE | 0.6691222211787283 |
| OBJECT_TRAFFIC_LIGHT | 0.549199754096623 |
| SKY | 0.9514087534867859 |
| STATIC_STRUCTURE | 0.9265264023797161 |
| VEGETATION_FOD | 0.06758384526697984 |
| VEGETATION_OBSTACLE | 0.8665013890929852 |
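
For readers unfamiliar with these metrics, the sketch below shows how per-category IoU, per-category accuracy, and the three summary numbers are conventionally derived from a confusion matrix. The card does not state exactly how its figures were computed, so treat this as the standard definitions rather than the evaluation script itself; the function name `segmentation_metrics` and the toy 2-class matrix are illustrative assumptions.

```python
import numpy as np

def segmentation_metrics(conf):
    """conf[i, j] = number of pixels with ground-truth class i predicted as class j."""
    conf = np.asarray(conf, dtype=np.float64)
    tp = np.diag(conf)              # correctly classified pixels per class
    gt = conf.sum(axis=1)           # ground-truth pixels per class
    pred = conf.sum(axis=0)         # predicted pixels per class
    union = gt + pred - tp
    # Classes absent from both prediction and ground truth yield NaN and are
    # skipped by nanmean below.
    with np.errstate(divide="ignore", invalid="ignore"):
        iou = tp / union
        acc = tp / gt
    return {
        "per_category_iou": iou,
        "per_category_accuracy": acc,
        "mean_iou": float(np.nanmean(iou)),
        "mean_accuracy": float(np.nanmean(acc)),
        "overall_accuracy": float(tp.sum() / conf.sum()),
    }

# Toy 2-class confusion matrix (rows: ground truth, columns: prediction).
conf = np.array([
    [50, 10],
    [5, 35],
])
metrics = segmentation_metrics(conf)
print(metrics["mean_iou"], metrics["mean_accuracy"], metrics["overall_accuracy"])
```

Overall accuracy is dominated by large classes such as road or sky, while mean IoU weights every class equally, which is why small, rare classes can pull the mean IoU well below the overall accuracy.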

