---
license: apache-2.0
base_model: facebook/sam2.1-hiera-base-plus
tags:
- sam2
- segmentation
- medical-imaging
- polyp-detection
- gastrointestinal
- colonoscopy
datasets:
- kowndinya23/Kvasir-SEG
pipeline_tag: image-segmentation
model-index:
- name: sam2-kvasir-polyp-segmentation
  results:
  - task:
      type: image-segmentation
      name: Polyp Segmentation
    dataset:
      name: Kvasir-SEG
      type: kowndinya23/Kvasir-SEG
    metrics:
    - type: loss
      value: 0.13
      name: Final Training DiceCE Loss
---

# SAM2 Kvasir Polyp Segmentation

A **SAM2.1-hiera-base-plus** model fine-tuned on the [Kvasir-SEG](https://huggingface.co/datasets/kowndinya23/Kvasir-SEG) dataset for **gastrointestinal polyp segmentation** in colonoscopy images.

Given a colonoscopy image and a bounding box prompt around a polyp, the model produces a pixel-level segmentation mask of the polyp.

## What is SAM2?

SAM2 (Segment Anything Model 2) is Meta's next-generation segmentation foundation model. It can segment any object in an image given a prompt (bounding box, point, or mask). Fine-tuning SAM2 on domain-specific data such as medical imaging significantly improves segmentation quality on specialized tasks.

**Training approach:** Only the mask decoder is trained (5.75% of parameters). The vision encoder and prompt encoder remain frozen, preserving SAM2's general visual understanding while adapting the mask prediction head to polyp morphology.
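
The freezing pattern described above can be sketched in a few lines. This is a minimal illustration with a tiny stand-in module (`TinySegModel` is hypothetical, not the actual SAM2 class), showing how only `mask_decoder` parameters are left trainable:

```python
import torch
from torch import nn

# Hypothetical stand-in for SAM2: a tiny module with encoder and decoder
# submodules. The actual training freezes SAM2's vision/prompt encoders
# and trains only the mask decoder in the same way.
class TinySegModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.vision_encoder = nn.Linear(16, 16)
        self.prompt_encoder = nn.Linear(4, 16)
        self.mask_decoder = nn.Linear(16, 1)

model = TinySegModel()

# Freeze everything except the mask decoder.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("mask_decoder")

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable}/{total} ({100 * trainable / total:.2f}%)")
```

The same loop applied to the real checkpoint yields the 4.2M-of-73.3M trainable-parameter split reported below.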

## Training Details

| Parameter | Value |
|-----------|-------|
| **Base model** | [facebook/sam2.1-hiera-base-plus](https://huggingface.co/facebook/sam2.1-hiera-base-plus) |
| **Method** | Mask decoder fine-tuning (encoders frozen) |
| **Trainable parameters** | 4.2M / 73.3M total (5.75%) |
| **Loss function** | DiceCE (Dice + Cross-Entropy, from MONAI) |
| **Dataset** | [kowndinya23/Kvasir-SEG](https://huggingface.co/datasets/kowndinya23/Kvasir-SEG) |
| **Training images** | 880 |
| **Validation images** | 120 |
| **Hardware** | NVIDIA RTX 5090 (32 GB VRAM) |
| **Training time** | ~12 minutes |
| **Epochs** | 30 |
| **Effective batch size** | 16 (8 per device x 2 gradient accumulation) |
| **Learning rate** | 1e-5 (cosine schedule, 20 warmup steps) |
| **Precision** | bf16 |
| **Prompt type** | Bounding box (derived from ground-truth masks) |
| **Framework** | Transformers 5.3.0 + MONAI |
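
As the table notes, the bounding-box prompts used during training were derived from the ground-truth masks. A minimal sketch of that derivation (the helper name is my own, not taken from the training code):

```python
import numpy as np

def mask_to_bbox(mask: np.ndarray) -> list[int]:
    """Return [x_min, y_min, x_max, y_max] enclosing the nonzero mask region."""
    ys, xs = np.nonzero(mask)
    return [int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())]

# Toy 6x6 binary mask with a "polyp" blob in rows 1-3, cols 2-4
mask = np.zeros((6, 6), dtype=np.uint8)
mask[1:4, 2:5] = 1
print(mask_to_bbox(mask))  # → [2, 1, 4, 3]
```

In practice the box is often jittered by a few pixels during training so the model tolerates imprecise prompts at inference time.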

## Training Curves



- **Training Loss (DiceCE)**: Decreased from ~0.20 to ~0.13 over 30 epochs, showing clear improvement in segmentation quality
- **Learning Rate**: Cosine decay from 1e-5 to 0 with a 20-step warmup
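
The learning-rate curve above can be reproduced in a few lines. This is a sketch of a linear-warmup-then-cosine schedule, not the exact scheduler implementation used in training:

```python
import math

def lr_at(step: int, total_steps: int, base_lr: float = 1e-5, warmup: int = 20) -> float:
    """Linear warmup to base_lr over `warmup` steps, then cosine decay to 0."""
    if step < warmup:
        return base_lr * step / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Warmup peaks at step 20; the rate then decays smoothly to 0 at the last step.
total = 1000
print(lr_at(0, total), lr_at(20, total), lr_at(total, total))
```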

## Dataset

[Kvasir-SEG](https://huggingface.co/datasets/kowndinya23/Kvasir-SEG) contains 1,000 gastrointestinal polyp images from colonoscopy procedures with corresponding pixel-level segmentation masks. The images were captured at Vestre Viken Health Trust in Norway and annotated by experienced gastroenterologists.

- **Image resolution**: Variable (332-1,350 pixels)
- **Annotation quality**: Expert gastroenterologist annotations verified by a medical doctor
- **Polyp types**: Various sizes, shapes, and appearances, including flat, sessile, and pedunculated polyps

## Usage

```python
from transformers import AutoProcessor, Sam2Model
from PIL import Image
import torch
import numpy as np

# Load model and processor
model = Sam2Model.from_pretrained("usama10/sam2-kvasir-polyp-segmentation", dtype=torch.bfloat16)
processor = AutoProcessor.from_pretrained("usama10/sam2-kvasir-polyp-segmentation")
model.eval()

# Load a colonoscopy image
image = Image.open("colonoscopy.jpg").convert("RGB")

# Provide a bounding box prompt [x_min, y_min, x_max, y_max]
bbox = [[100, 50, 300, 250]]  # approximate polyp location

inputs = processor(images=image, input_boxes=[bbox], return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}
inputs["pixel_values"] = inputs["pixel_values"].to(model.dtype)  # match the bf16 weights

with torch.no_grad():
    outputs = model(**inputs, multimask_output=False)

# Get the predicted mask (cast to float32 first; NumPy has no bfloat16)
pred_mask = outputs.pred_masks.squeeze().float().cpu().numpy()
binary_mask = (pred_mask > 0).astype(np.uint8)
```
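
To sanity-check a prediction against a ground-truth annotation, you can compute the Dice coefficient, the overlap measure behind the DiceCE training loss. A hypothetical helper, not part of the model card's code:

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|) for two binary masks; 1.0 is a perfect match."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return float((2.0 * inter + eps) / (pred.sum() + target.sum() + eps))

# Toy example: prediction covers half of an 8-pixel ground-truth region
a = np.zeros((4, 4), dtype=np.uint8); a[:2, :2] = 1   # 4 predicted pixels
b = np.zeros((4, 4), dtype=np.uint8); b[:2, :] = 1    # 8 ground-truth pixels
print(round(dice_coefficient(a, b), 3))  # → 0.667
```

Pass `binary_mask` from the snippet above as `pred` (with the ground-truth mask resized to the same shape) to score a real prediction.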

## Clinical Applications

This model can assist in:

- **Polyp detection and delineation** during colonoscopy review
- **Computer-aided diagnosis (CAD)** systems for colorectal cancer screening
- **Training and education** for endoscopy trainees
- **Research** on polyp morphology and classification

## Limitations

- Trained on Kvasir-SEG only (1,000 images); performance on different endoscopy equipment or patient populations may vary
- Requires a bounding box prompt; does not perform automatic polyp detection
- The model is for research and educational purposes only and should NOT be used as the sole basis for clinical decisions
- Performance on very small or flat polyps may be limited due to dataset composition
- The 256x256 output mask resolution may lose fine boundary detail for high-resolution inputs