metadata
tags:
  - element_type:detect
  - model:segformer-b0
  - object:person
manako:
  description: >-
    Semantic segmentation is a computer vision technique that assigns a label
    to every pixel in an image, where the label is the semantic class of the
    object or region the pixel belongs to. It is a form of dense prediction:
    the output is a per-pixel label map rather than the bounding boxes or key
    points that object detection produces. The goal is to partition the image
    into segments that correspond to different entities.
  source: https://huggingface.co/s3nh/SegFormer-b0-person-segmentation
  input_payload:
    - name: frame
      type: image
      description: RGB frame
  output_payload:
    - name: detections
      type: detections
      description: List of detections
  evaluation_score: 0.45983522025589574

Description

Semantic segmentation is a computer vision technique that assigns a label to every pixel in an image, where the label is the semantic class of the object or region the pixel belongs to. It is a form of dense prediction: the output is a per-pixel label map rather than the bounding boxes or key points that object detection produces. The goal is to partition the image into segments that correspond to different entities. This model is a SegFormer-b0 with two labels (num_labels=2), used to segment people.
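
As a minimal illustration of dense prediction, the sketch below picks the highest-scoring class for each pixel. The scores are made up for a tiny 2x3 image and are not output from this model.

```python
import numpy as np

# Hypothetical per-class scores, shape (num_classes, height, width)
scores = np.array([
    [[0.9, 0.2, 0.8],
     [0.1, 0.7, 0.3]],   # class 0
    [[0.1, 0.8, 0.2],
     [0.9, 0.3, 0.7]],   # class 1
])

# Dense prediction: one label per pixel, chosen along the class axis
label_map = scores.argmax(axis=0)
print(label_map.shape)  # same height and width as the input: (2, 3)
```

The Usage section applies the same argmax step to the model's upsampled logits.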

Parameters

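from transformers import SegformerForSemanticSegmentation

# id2label and label2id map the two class ids to names and back
model = SegformerForSemanticSegmentation.from_pretrained("/notebooks/segformer_5_epoch",
                                                         num_labels=2,
                                                         id2label=id2label,
                                                         label2id=label2id)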
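
The call above expects `id2label` and `label2id` to exist already, but the source does not show them. A minimal sketch of what they might look like follows. The class names `background` and `person` are assumptions based on the model's purpose; only the count of two labels comes from the source.

```python
# Assumed class names; the source only states num_labels=2
id2label = {0: "background", 1: "person"}

# Inverse mapping, built from id2label so the two stay consistent
label2id = {name: idx for idx, name in id2label.items()}
```

Deriving `label2id` from `id2label` avoids the two dictionaries drifting apart if a class is renamed.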

Usage


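import albumentations as A
import matplotlib.pyplot as plt
import numpy as np
import torch
from albumentations.pytorch import ToTensorV2
from torch import nn

# Transforms: resize to 512x512, then convert the HWC array to a CHW tensor
_transform = A.Compose([
    A.Resize(height=512, width=512),
    ToTensorV2(),
])


# `image` is expected to be an RGB PIL image loaded beforehand
trans_image = _transform(image=np.array(image))
model.eval()
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(trans_image['image'].float().unsqueeze(0))
logits = outputs.logits.cpu()
print(logits.shape)  # (batch, num_labels, height / 4, width / 4)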


# First, rescale logits to original image size
upsampled_logits = nn.functional.interpolate(logits,
                size=image.size[::-1], # (height, width)
                mode='bilinear',
                align_corners=False)


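# Per-pixel class id: index of the highest logit along the class dimension
seg = upsampled_logits.argmax(dim=1)[0].numpy()
color_seg = np.zeros((seg.shape[0], seg.shape[1], 3), dtype=np.uint8) # height, width, 3
# One color per class id: 0 -> black, 1 -> white
palette = np.array([[0, 0, 0], [255, 255, 255]], dtype=np.uint8)
for label, color in enumerate(palette):
    color_seg[seg == label, :] = color
# Convert RGB to BGR by reversing the channel axis
color_seg = color_seg[..., ::-1]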
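
The metadata lists the output as a list of detections, but the source does not show how the mask becomes a detection. One possible approach, sketched below, is to take the bounding box of all pixels with a nonzero label. `mask_to_box` is a hypothetical helper, not part of the original code, and it assumes class id 1 marks person pixels. It returns a single box around every positive pixel, so it does not separate individual people.

```python
import numpy as np

def mask_to_box(mask):
    """Return (x_min, y_min, x_max, y_max) around nonzero pixels, or None if empty."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Toy mask standing in for `seg`: a 2x2 block of class 1 in a 4x5 image
toy = np.zeros((4, 5), dtype=np.uint8)
toy[1:3, 2:4] = 1
box = mask_to_box(toy)
```

With the real output, `mask_to_box(seg)` would be the equivalent call. Separating people into individual boxes would need an extra step such as connected-component labeling.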