SAM2 Kvasir Polyp Segmentation

A SAM2.1-hiera-base-plus model fine-tuned on the Kvasir-SEG dataset for gastrointestinal polyp segmentation in colonoscopy images.

Given a colonoscopy image and a bounding box prompt around a polyp, the model produces a pixel-level segmentation mask of the polyp.

What is SAM2?

SAM2 (Segment Anything Model 2) is Meta's next-generation segmentation foundation model. It can segment any object in an image given a prompt (bounding box, point, or mask). Fine-tuning SAM2 on domain-specific data like medical imaging significantly improves segmentation quality for specialized tasks.

Training approach: Only the mask decoder is trained (5.75% of parameters). The vision encoder and prompt encoder remain frozen, preserving SAM2's general visual understanding while adapting the mask prediction head to polyp morphology.
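
A minimal sketch of that freezing scheme, using a tiny stand-in module (the submodule names mirror SAM2's components for illustration; they are not necessarily the exact attribute names in transformers):

```python
import torch.nn as nn

class TinySam2(nn.Module):
    """Stand-in with SAM2's three components; layer sizes are arbitrary."""
    def __init__(self):
        super().__init__()
        self.vision_encoder = nn.Linear(8, 8)
        self.prompt_encoder = nn.Linear(8, 8)
        self.mask_decoder = nn.Linear(8, 8)

model = TinySam2()

# Freeze everything except the mask decoder
for name, param in model.named_parameters():
    param.requires_grad = "mask_decoder" in name

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable}/{total}")
```

Only the parameters whose names contain "mask_decoder" receive gradients; the optimizer should then be built from `filter(lambda p: p.requires_grad, model.parameters())`.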

Training Details

Parameter              Value
Base model             facebook/sam2.1-hiera-base-plus
Method                 Mask decoder fine-tuning (encoders frozen)
Trainable parameters   4.2M / 73.3M total (5.75%)
Loss function          DiceCE (Dice + cross-entropy, from MONAI)
Dataset                kowndinya23/Kvasir-SEG
Training images        880
Validation images      120
Hardware               NVIDIA RTX 5090 (32 GB VRAM)
Training time          ~12 minutes
Epochs                 30
Effective batch size   16 (8 per device x 2 gradient accumulation)
Learning rate          1e-5 (cosine schedule, 20 warmup steps)
Precision              bf16
Prompt type            Bounding box (derived from ground-truth masks)
Framework              Transformers 5.3.0 + MONAI
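
The DiceCE objective combines a Dice term (region overlap) with cross-entropy (per-pixel classification). MONAI provides this as `DiceCELoss`; the following is an equivalent plain-PyTorch sketch for the binary, single-channel case used here:

```python
import torch
import torch.nn.functional as F

def dice_ce_loss(logits, target, eps=1e-6):
    """Dice + binary cross-entropy, mirroring MONAI's DiceCELoss(sigmoid=True)
    for one foreground channel (illustrative re-implementation)."""
    probs = torch.sigmoid(logits)
    inter = (probs * target).sum()
    dice = 1 - (2 * inter + eps) / (probs.sum() + target.sum() + eps)
    ce = F.binary_cross_entropy_with_logits(logits, target)
    return dice + ce

# Confident correct prediction vs. confident wrong prediction
perfect = dice_ce_loss(torch.tensor([8.0, 8.0]), torch.tensor([1.0, 1.0]))
wrong = dice_ce_loss(torch.tensor([-8.0, -8.0]), torch.tensor([1.0, 1.0]))
print(f"{perfect:.4f} vs {wrong:.4f}")
```

The Dice term counteracts the foreground/background imbalance typical of polyp masks, while cross-entropy keeps per-pixel gradients well behaved.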

Training Curves

Training Metrics

  • Training Loss (DiceCE): Decreased from ~0.20 to ~0.13 over 30 epochs, showing clear improvement in segmentation quality
  • Learning Rate: Cosine decay from 1e-5 to 0 with 20-step warmup
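
With 880 training images and an effective batch size of 16, each epoch is 55 optimizer steps, so 30 epochs come to roughly 1,650 steps. The warmup-plus-cosine schedule above can then be sketched as (the total step count is a derived estimate, not taken from the training logs):

```python
import math

def lr_at(step, total_steps=1650, warmup=20, base_lr=1e-5):
    """Linear warmup for `warmup` steps, then cosine decay to 0."""
    if step < warmup:
        return base_lr * step / warmup
    progress = (step - warmup) / (total_steps - warmup)
    return base_lr * 0.5 * (1 + math.cos(math.pi * progress))

print(lr_at(0), lr_at(20), lr_at(1650))
```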

Dataset

Kvasir-SEG contains 1,000 gastrointestinal polyp images from colonoscopy procedures with corresponding pixel-level segmentation masks. The images were captured at Vestre Viken Health Trust in Norway and annotated by experienced gastroenterologists.

  • Image resolution: Variable (332-1,350 pixels)
  • Annotation quality: Expert gastroenterologist annotations verified by a medical doctor
  • Polyp types: Various sizes, shapes, and appearances including flat, sessile, and pedunculated polyps
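
The bounding-box prompts used in training are derived from the ground-truth masks. One straightforward way to do that is a tight box around the foreground pixels (this is an assumed implementation, not the exact training code):

```python
import numpy as np

def mask_to_bbox(mask):
    """Return a tight [x_min, y_min, x_max, y_max] box around a binary mask."""
    ys, xs = np.nonzero(mask)
    return [int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())]

# Toy mask: foreground in rows 2-4, columns 3-7
mask = np.zeros((10, 10), dtype=np.uint8)
mask[2:5, 3:8] = 1
print(mask_to_bbox(mask))  # → [3, 2, 7, 4]
```

In practice a small random jitter is often added to such boxes during training so the model tolerates imprecise prompts at inference time.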

Usage

from transformers import AutoProcessor, Sam2Model
from PIL import Image
import torch
import numpy as np

# Load model and processor
model = Sam2Model.from_pretrained("usama10/sam2-kvasir-polyp-segmentation", dtype=torch.bfloat16)
processor = AutoProcessor.from_pretrained("usama10/sam2-kvasir-polyp-segmentation")
model.eval()

# Load a colonoscopy image
image = Image.open("colonoscopy.jpg").convert("RGB")

# Provide a bounding box prompt [x_min, y_min, x_max, y_max]
bbox = [[100, 50, 300, 250]]  # Approximate polyp location

inputs = processor(images=image, input_boxes=[bbox], return_tensors="pt")
# Match the model's device and bf16 dtype (floating-point tensors only)
inputs = {k: v.to(model.device, dtype=torch.bfloat16) if v.is_floating_point() else v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs, multimask_output=False)

# Get the predicted mask
# Get the predicted mask (cast to float32 first; NumPy has no bfloat16)
pred_mask = outputs.pred_masks.squeeze().float().cpu().numpy()
binary_mask = (pred_mask > 0).astype(np.uint8)
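
The decoder's mask is low-resolution (see the 256x256 note under Limitations), so for display or measurement it usually needs upsampling to the original image size before thresholding. A sketch with dummy tensors (the transformers SAM2 processor may also provide a post-processing helper for this):

```python
import torch
import torch.nn.functional as F

low_res = torch.randn(1, 1, 256, 256)  # stand-in for the model's mask logits
orig_h, orig_w = 720, 1280             # hypothetical original image size

# Bilinear upsampling on the logits keeps soft boundaries; threshold afterwards
full = F.interpolate(low_res, size=(orig_h, orig_w), mode="bilinear", align_corners=False)
binary = (full[0, 0] > 0).to(torch.uint8)
print(binary.shape)
```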

Clinical Applications

This model can assist in:

  • Polyp detection and delineation during colonoscopy review
  • Computer-aided diagnosis (CAD) systems for colorectal cancer screening
  • Training and education for endoscopy trainees
  • Research on polyp morphology and classification

Limitations

  • Trained on Kvasir-SEG only (1,000 images); performance on different endoscopy equipment or populations may vary
  • Requires a bounding box prompt; does not perform automatic polyp detection
  • The model is for research and educational purposes only and should NOT be used as the sole basis for clinical decisions
  • Performance on very small or flat polyps may be limited due to dataset composition
  • The 256x256 output mask resolution may lose fine boundary details for high-resolution inputs