# SAM2 Kvasir Polyp Segmentation
A SAM2.1-hiera-base-plus model fine-tuned on the Kvasir-SEG dataset for gastrointestinal polyp segmentation in colonoscopy images.
Given a colonoscopy image and a bounding box prompt around a polyp, the model produces a pixel-level segmentation mask of the polyp.
## What is SAM2?
SAM2 (Segment Anything Model 2) is Meta's next-generation segmentation foundation model. It can segment any object in an image given a prompt (bounding box, point, or mask). Fine-tuning SAM2 on domain-specific data like medical imaging significantly improves segmentation quality for specialized tasks.
Training approach: Only the mask decoder is trained (5.75% of parameters). The vision encoder and prompt encoder remain frozen, preserving SAM2's general visual understanding while adapting the mask prediction head to polyp morphology.
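The freeze-then-train split described above can be sketched with a toy module. The attribute names (`vision_encoder`, `prompt_encoder`, `mask_decoder`) are illustrative stand-ins; the real `Sam2Model` layout may name its submodules differently:

```python
import torch.nn as nn

# Hypothetical miniature stand-in for the SAM2 module layout
# (real attribute names in Sam2Model may differ).
class TinySam2(nn.Module):
    def __init__(self):
        super().__init__()
        self.vision_encoder = nn.Linear(8, 8)   # frozen
        self.prompt_encoder = nn.Linear(4, 4)   # frozen
        self.mask_decoder = nn.Linear(8, 1)     # trained

model = TinySam2()

# Freeze everything except the mask decoder
for name, param in model.named_parameters():
    param.requires_grad = "mask_decoder" in name

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
```

Only the parameters whose names contain `mask_decoder` end up with `requires_grad=True`; on the real model this is what yields the 4.2M / 73.3M (5.75%) trainable fraction reported below.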
## Training Details
| Parameter | Value |
|---|---|
| Base model | facebook/sam2.1-hiera-base-plus |
| Method | Mask decoder fine-tuning (encoders frozen) |
| Trainable parameters | 4.2M / 73.3M total (5.75%) |
| Loss function | DiceCE (Dice + Cross-Entropy from MONAI) |
| Dataset | kowndinya23/Kvasir-SEG |
| Training images | 880 |
| Validation images | 120 |
| Hardware | NVIDIA RTX 5090 (32GB VRAM) |
| Training time | ~12 minutes |
| Epochs | 30 |
| Effective batch size | 16 (8 per device x 2 gradient accumulation) |
| Learning rate | 1e-5 (cosine schedule, 20 warmup steps) |
| Precision | bf16 |
| Prompt type | Bounding box (derived from ground truth masks) |
| Framework | Transformers 5.3.0 + MONAI |
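As a rough illustration of the DiceCE objective, here is a hand-rolled single-class (polyp vs. background) version combining soft Dice with binary cross-entropy. MONAI's `DiceCELoss` adds more options (smoothing terms, class weighting, sigmoid/softmax handling) but follows the same idea:

```python
import torch
import torch.nn.functional as F

def dice_ce_loss(logits, target, eps=1e-6):
    """Soft Dice + BCE on raw logits for a single foreground class."""
    probs = torch.sigmoid(logits)
    intersection = (probs * target).sum()
    dice = 1.0 - (2.0 * intersection + eps) / (probs.sum() + target.sum() + eps)
    bce = F.binary_cross_entropy_with_logits(logits, target)
    return dice + bce

# A confident, correct prediction drives the loss toward zero
logits = torch.tensor([10.0, -10.0])
target = torch.tensor([1.0, 0.0])
loss = dice_ce_loss(logits, target)
```

Dice directly rewards region overlap (useful for small structures like polyps), while the cross-entropy term keeps per-pixel gradients well behaved; summing the two is the standard compromise.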
## Training Curves
- Training Loss (DiceCE): Decreased from ~0.20 to ~0.13 over 30 epochs, showing clear improvement in segmentation quality
- Learning Rate: Cosine decay from 1e-5 to 0 with 20-step warmup
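The learning-rate schedule above can be reproduced in a few lines. The base rate and 20-step warmup come from the training table; `total_steps` is whatever the run's length works out to and is a placeholder here:

```python
import math

def lr_at(step, total_steps, base_lr=1e-5, warmup=20):
    """Linear warmup to base_lr, then cosine decay to 0."""
    if step < warmup:
        return base_lr * step / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# lr_at(0, 1000) == 0, lr_at(20, 1000) == base_lr, lr_at(1000, 1000) ~ 0
```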
## Dataset
Kvasir-SEG contains 1,000 gastrointestinal polyp images from colonoscopy procedures with corresponding pixel-level segmentation masks. The images were captured at Vestre Viken Health Trust in Norway and annotated by experienced gastroenterologists.
- Image resolution: Variable (332-1,350 pixels)
- Annotation quality: Expert gastroenterologist annotations verified by a medical doctor
- Polyp types: Various sizes, shapes, and appearances including flat, sessile, and pedunculated polyps
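Since the training prompts were bounding boxes derived from these ground-truth masks (see the training table), the derivation is worth spelling out. A minimal NumPy version, assuming a binary `mask` array with nonzero pixels inside the polyp:

```python
import numpy as np

def mask_to_bbox(mask):
    """Tight [x_min, y_min, x_max, y_max] box around the nonzero mask pixels."""
    ys, xs = np.nonzero(mask)
    return [int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())]

mask = np.zeros((10, 10), dtype=np.uint8)
mask[2:5, 3:8] = 1  # a small rectangular "polyp"
bbox = mask_to_bbox(mask)  # -> [3, 2, 7, 4]
```

At inference time the box instead comes from a user (or an upstream detector), so training with tight ground-truth boxes is an optimistic setting; looser boxes at test time may degrade masks somewhat.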
## Usage
```python
from transformers import AutoProcessor, Sam2Model
from PIL import Image
import torch
import numpy as np

# Load model and processor
model = Sam2Model.from_pretrained("usama10/sam2-kvasir-polyp-segmentation", dtype=torch.bfloat16)
processor = AutoProcessor.from_pretrained("usama10/sam2-kvasir-polyp-segmentation")
model.eval()

# Load a colonoscopy image
image = Image.open("colonoscopy.jpg").convert("RGB")

# Provide a bounding box prompt [x_min, y_min, x_max, y_max]
bbox = [[100, 50, 300, 250]]  # Approximate polyp location
inputs = processor(images=image, input_boxes=[bbox], return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}

with torch.no_grad():
    outputs = model(**inputs, multimask_output=False)

# Get the predicted mask (cast to float32 first: NumPy cannot convert bfloat16)
pred_mask = outputs.pred_masks.squeeze().float().cpu().numpy()
binary_mask = (pred_mask > 0).astype(np.uint8)
```
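The predicted mask comes back at the model's low internal resolution (256x256, as noted under Limitations), so it usually needs resizing to the input image size. A hedged sketch using bilinear upsampling of the logits before thresholding; the exact shape of `pred_masks` can vary, so adjust the reshape accordingly:

```python
import torch
import torch.nn.functional as F

def upsample_mask(mask_logits, height, width):
    """Resize low-res mask logits to the original image size, then threshold."""
    logits = mask_logits.reshape(1, 1, *mask_logits.shape[-2:]).float()
    up = F.interpolate(logits, size=(height, width), mode="bilinear", align_corners=False)
    return (up[0, 0] > 0).to(torch.uint8)

low_res = torch.randn(256, 256)           # stand-in for outputs.pred_masks.squeeze()
full = upsample_mask(low_res, 720, 1280)  # original image height and width
```

Interpolating the logits and thresholding afterwards gives smoother boundaries than upsampling an already-binarized mask.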
## Clinical Applications
This model can assist in:
- Polyp detection and delineation during colonoscopy review
- Computer-aided diagnosis (CAD) systems for colorectal cancer screening
- Training and education for endoscopy trainees
- Research on polyp morphology and classification
## Limitations
- Trained on Kvasir-SEG only (1,000 images); performance on different endoscopy equipment or populations may vary
- Requires a bounding box prompt; does not perform automatic polyp detection
- The model is for research and educational purposes only and should NOT be used as the sole basis for clinical decisions
- Performance on very small or flat polyps may be limited due to dataset composition
- The 256x256 output mask resolution may lose fine boundary details for high-resolution inputs
## Evaluation results
- Final training DiceCE loss on Kvasir-SEG: 0.130 (self-reported)
