SAM2 Kvasir Polyp Segmentation

A SAM2.1-hiera-base-plus model fine-tuned on the Kvasir-SEG dataset for gastrointestinal polyp segmentation in colonoscopy images.

Given a colonoscopy image and a bounding box prompt around a polyp, the model produces a pixel-level segmentation mask of the polyp.

What is SAM2?

SAM2 (Segment Anything Model 2) is Meta's next-generation segmentation foundation model. It can segment any object in an image given a prompt (bounding box, point, or mask). Fine-tuning SAM2 on domain-specific data like medical imaging significantly improves segmentation quality for specialized tasks.

Training approach: Only the mask decoder is trained (5.75% of parameters). The vision encoder and prompt encoder remain frozen, preserving SAM2's general visual understanding while adapting the mask prediction head to polyp morphology.
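
A minimal sketch of that freezing scheme, using a tiny stand-in module (the submodule names mirror SAM2's components for illustration; they are not necessarily the exact attribute names in transformers):

```python
import torch.nn as nn

class TinySam2(nn.Module):
    """Stand-in with SAM2's three components; layer sizes are arbitrary."""
    def __init__(self):
        super().__init__()
        self.vision_encoder = nn.Linear(8, 8)
        self.prompt_encoder = nn.Linear(8, 8)
        self.mask_decoder = nn.Linear(8, 8)

model = TinySam2()

# Freeze everything except the mask decoder
for name, param in model.named_parameters():
    param.requires_grad = "mask_decoder" in name

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable}/{total}")
```

Only the parameters whose names contain "mask_decoder" receive gradients; the optimizer should then be built from `filter(lambda p: p.requires_grad, model.parameters())`.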

Training Details

Parameter              Value
Base model             facebook/sam2.1-hiera-base-plus
Method                 Mask decoder fine-tuning (encoders frozen)
Trainable parameters   4.2M / 73.3M total (5.75%)
Loss function          DiceCE (Dice + cross-entropy, from MONAI)
Dataset                kowndinya23/Kvasir-SEG
Training images        880
Validation images      120
Hardware               NVIDIA RTX 5090 (32 GB VRAM)
Training time          ~12 minutes
Epochs                 30
Effective batch size   16 (8 per device x 2 gradient accumulation)
Learning rate          1e-5 (cosine schedule, 20 warmup steps)
Precision              bf16
Prompt type            Bounding box (derived from ground-truth masks)
Framework              Transformers 5.3.0 + MONAI
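
The DiceCE objective combines a Dice term (region overlap) with cross-entropy (per-pixel classification). MONAI provides this as `DiceCELoss`; the following is an equivalent plain-PyTorch sketch for the binary, single-channel case used here:

```python
import torch
import torch.nn.functional as F

def dice_ce_loss(logits, target, eps=1e-6):
    """Dice + binary cross-entropy, mirroring MONAI's DiceCELoss(sigmoid=True)
    for one foreground channel (illustrative re-implementation)."""
    probs = torch.sigmoid(logits)
    inter = (probs * target).sum()
    dice = 1 - (2 * inter + eps) / (probs.sum() + target.sum() + eps)
    ce = F.binary_cross_entropy_with_logits(logits, target)
    return dice + ce

# Confident correct prediction vs. confident wrong prediction
perfect = dice_ce_loss(torch.tensor([8.0, 8.0]), torch.tensor([1.0, 1.0]))
wrong = dice_ce_loss(torch.tensor([-8.0, -8.0]), torch.tensor([1.0, 1.0]))
print(f"{perfect:.4f} vs {wrong:.4f}")
```

The Dice term counteracts the foreground/background imbalance typical of polyp masks, while cross-entropy keeps per-pixel gradients well behaved.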

Training Curves

Training Metrics

  • Training Loss (DiceCE): Decreased from ~0.20 to ~0.13 over 30 epochs, showing clear improvement in segmentation quality
  • Learning Rate: Cosine decay from 1e-5 to 0 with 20-step warmup
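
With 880 training images and an effective batch size of 16, each epoch is 55 optimizer steps, so 30 epochs come to roughly 1,650 steps. The warmup-plus-cosine schedule above can then be sketched as (the total step count is a derived estimate, not taken from the training logs):

```python
import math

def lr_at(step, total_steps=1650, warmup=20, base_lr=1e-5):
    """Linear warmup for `warmup` steps, then cosine decay to 0."""
    if step < warmup:
        return base_lr * step / warmup
    progress = (step - warmup) / (total_steps - warmup)
    return base_lr * 0.5 * (1 + math.cos(math.pi * progress))

print(lr_at(0), lr_at(20), lr_at(1650))
```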

Dataset

Kvasir-SEG contains 1,000 gastrointestinal polyp images from colonoscopy procedures with corresponding pixel-level segmentation masks. The images were captured at Vestre Viken Health Trust in Norway and annotated by experienced gastroenterologists.

  • Image resolution: Variable (332-1,350 pixels)
  • Annotation quality: Expert gastroenterologist annotations verified by a medical doctor
  • Polyp types: Various sizes, shapes, and appearances including flat, sessile, and pedunculated polyps
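
The bounding-box prompts used in training are derived from the ground-truth masks. One straightforward way to do that is a tight box around the foreground pixels (this is an assumed implementation, not the exact training code):

```python
import numpy as np

def mask_to_bbox(mask):
    """Return a tight [x_min, y_min, x_max, y_max] box around a binary mask."""
    ys, xs = np.nonzero(mask)
    return [int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())]

# Toy mask: foreground in rows 2-4, columns 3-7
mask = np.zeros((10, 10), dtype=np.uint8)
mask[2:5, 3:8] = 1
print(mask_to_bbox(mask))  # → [3, 2, 7, 4]
```

In practice a small random jitter is often added to such boxes during training so the model tolerates imprecise prompts at inference time.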

Usage

from transformers import AutoProcessor, Sam2Model
from PIL import Image
import torch
import numpy as np

# Load model and processor
model = Sam2Model.from_pretrained("usama10/sam2-kvasir-polyp-segmentation", dtype=torch.bfloat16)
processor = AutoProcessor.from_pretrained("usama10/sam2-kvasir-polyp-segmentation")
model.eval()

# Load a colonoscopy image
image = Image.open("colonoscopy.jpg").convert("RGB")

# Provide a bounding box prompt [x_min, y_min, x_max, y_max]
bbox = [[100, 50, 300, 250]]  # Approximate polyp location

inputs = processor(images=image, input_boxes=[bbox], return_tensors="pt")
# Match the model's device and bf16 dtype (floating-point tensors only)
inputs = {k: v.to(model.device, dtype=torch.bfloat16) if v.is_floating_point() else v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs, multimask_output=False)

# Get the predicted mask
# Get the predicted mask (cast to float32 first; NumPy has no bfloat16)
pred_mask = outputs.pred_masks.squeeze().float().cpu().numpy()
binary_mask = (pred_mask > 0).astype(np.uint8)
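
The decoder's mask is low-resolution (see the 256x256 note under Limitations), so for display or measurement it usually needs upsampling to the original image size before thresholding. A sketch with dummy tensors (the transformers SAM2 processor may also provide a post-processing helper for this):

```python
import torch
import torch.nn.functional as F

low_res = torch.randn(1, 1, 256, 256)  # stand-in for the model's mask logits
orig_h, orig_w = 720, 1280             # hypothetical original image size

# Bilinear upsampling on the logits keeps soft boundaries; threshold afterwards
full = F.interpolate(low_res, size=(orig_h, orig_w), mode="bilinear", align_corners=False)
binary = (full[0, 0] > 0).to(torch.uint8)
print(binary.shape)
```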

Clinical Applications

This model can assist in:

  • Polyp detection and delineation during colonoscopy review
  • Computer-aided diagnosis (CAD) systems for colorectal cancer screening
  • Training and education for endoscopy trainees
  • Research on polyp morphology and classification

Limitations

  • Trained on Kvasir-SEG only (1,000 images); performance on different endoscopy equipment or populations may vary
  • Requires a bounding box prompt; does not perform automatic polyp detection
  • The model is for research and educational purposes only and should NOT be used as the sole basis for clinical decisions
  • Performance on very small or flat polyps may be limited due to dataset composition
  • The 256x256 output mask resolution may lose fine boundary details for high-resolution inputs