---
license: apache-2.0
base_model: facebook/sam2.1-hiera-base-plus
tags:
- sam2
- segmentation
- medical-imaging
- polyp-detection
- gastrointestinal
- colonoscopy
datasets:
- kowndinya23/Kvasir-SEG
pipeline_tag: image-segmentation
model-index:
- name: sam2-kvasir-polyp-segmentation
  results:
  - task:
      type: image-segmentation
      name: Polyp Segmentation
    dataset:
      name: Kvasir-SEG
      type: kowndinya23/Kvasir-SEG
    metrics:
    - type: loss
      value: 0.13
      name: Final Training DiceCE Loss
---

# SAM2 Kvasir Polyp Segmentation

A **SAM2.1-hiera-base-plus** model fine-tuned on the [Kvasir-SEG](https://huggingface.co/datasets/kowndinya23/Kvasir-SEG) dataset for **gastrointestinal polyp segmentation** in colonoscopy images.

Given a colonoscopy image and a bounding box prompt around a polyp, the model produces a pixel-level segmentation mask of the polyp.

## What is SAM2?

SAM2 (Segment Anything Model 2) is Meta's next-generation segmentation foundation model. It can segment any object in an image given a prompt (bounding box, point, or mask). Fine-tuning SAM2 on domain-specific data such as medical imaging significantly improves segmentation quality on specialized tasks.

**Training approach:** Only the mask decoder is trained (5.75% of parameters). The vision encoder and prompt encoder remain frozen, preserving SAM2's general visual understanding while adapting the mask prediction head to polyp morphology.
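
The freezing pattern described above can be sketched in a few lines. This is a minimal illustration with a tiny stand-in module (`TinySegModel` is hypothetical, not the actual SAM2 class), showing how only `mask_decoder` parameters are left trainable:

```python
import torch
from torch import nn

# Hypothetical stand-in for SAM2: a tiny module with encoder and decoder
# submodules. The actual training freezes SAM2's vision/prompt encoders
# and trains only the mask decoder in the same way.
class TinySegModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.vision_encoder = nn.Linear(16, 16)
        self.prompt_encoder = nn.Linear(4, 16)
        self.mask_decoder = nn.Linear(16, 1)

model = TinySegModel()

# Freeze everything except the mask decoder.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("mask_decoder")

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable}/{total} ({100 * trainable / total:.2f}%)")
```

The same loop applied to the real checkpoint yields the 4.2M-of-73.3M trainable-parameter split reported below.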

## Training Details

| Parameter | Value |
|-----------|-------|
| **Base model** | [facebook/sam2.1-hiera-base-plus](https://huggingface.co/facebook/sam2.1-hiera-base-plus) |
| **Method** | Mask decoder fine-tuning (encoders frozen) |
| **Trainable parameters** | 4.2M / 73.3M total (5.75%) |
| **Loss function** | DiceCE (Dice + Cross-Entropy, from MONAI) |
| **Dataset** | [kowndinya23/Kvasir-SEG](https://huggingface.co/datasets/kowndinya23/Kvasir-SEG) |
| **Training images** | 880 |
| **Validation images** | 120 |
| **Hardware** | NVIDIA RTX 5090 (32 GB VRAM) |
| **Training time** | ~12 minutes |
| **Epochs** | 30 |
| **Effective batch size** | 16 (8 per device x 2 gradient accumulation) |
| **Learning rate** | 1e-5 (cosine schedule, 20 warmup steps) |
| **Precision** | bf16 |
| **Prompt type** | Bounding box (derived from ground-truth masks) |
| **Framework** | Transformers 5.3.0 + MONAI |
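
As the table notes, the bounding-box prompts used during training were derived from the ground-truth masks. A minimal sketch of that derivation (the helper name is my own, not taken from the training code):

```python
import numpy as np

def mask_to_bbox(mask: np.ndarray) -> list[int]:
    """Return [x_min, y_min, x_max, y_max] enclosing the nonzero mask region."""
    ys, xs = np.nonzero(mask)
    return [int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())]

# Toy 6x6 binary mask with a "polyp" blob in rows 1-3, cols 2-4
mask = np.zeros((6, 6), dtype=np.uint8)
mask[1:4, 2:5] = 1
print(mask_to_bbox(mask))  # → [2, 1, 4, 3]
```

In practice the box is often jittered by a few pixels during training so the model tolerates imprecise prompts at inference time.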

## Training Curves



- **Training Loss (DiceCE)**: Decreased from ~0.20 to ~0.13 over 30 epochs, showing clear improvement in segmentation quality
- **Learning Rate**: Cosine decay from 1e-5 to 0 with a 20-step warmup
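
The learning-rate curve above can be reproduced in a few lines. This is a sketch of a linear-warmup-then-cosine schedule, not the exact scheduler implementation used in training:

```python
import math

def lr_at(step: int, total_steps: int, base_lr: float = 1e-5, warmup: int = 20) -> float:
    """Linear warmup to base_lr over `warmup` steps, then cosine decay to 0."""
    if step < warmup:
        return base_lr * step / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# Warmup peaks at step 20; the rate then decays smoothly to 0 at the last step.
total = 1000
print(lr_at(0, total), lr_at(20, total), lr_at(total, total))
```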

## Dataset

[Kvasir-SEG](https://huggingface.co/datasets/kowndinya23/Kvasir-SEG) contains 1,000 gastrointestinal polyp images from colonoscopy procedures with corresponding pixel-level segmentation masks. The images were captured at Vestre Viken Health Trust in Norway and annotated by experienced gastroenterologists.

- **Image resolution**: Variable (332-1,350 pixels)
- **Annotation quality**: Expert gastroenterologist annotations verified by a medical doctor
- **Polyp types**: Various sizes, shapes, and appearances, including flat, sessile, and pedunculated polyps

## Usage

```python
from transformers import AutoProcessor, Sam2Model
from PIL import Image
import torch
import numpy as np

# Load model and processor
model = Sam2Model.from_pretrained("usama10/sam2-kvasir-polyp-segmentation", dtype=torch.bfloat16)
processor = AutoProcessor.from_pretrained("usama10/sam2-kvasir-polyp-segmentation")
model.eval()

# Load a colonoscopy image
image = Image.open("colonoscopy.jpg").convert("RGB")

# Provide a bounding box prompt [x_min, y_min, x_max, y_max]
bbox = [[100, 50, 300, 250]]  # approximate polyp location

inputs = processor(images=image, input_boxes=[bbox], return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}
inputs["pixel_values"] = inputs["pixel_values"].to(model.dtype)  # match the bf16 weights

with torch.no_grad():
    outputs = model(**inputs, multimask_output=False)

# Get the predicted mask (cast to float32 first; NumPy has no bfloat16)
pred_mask = outputs.pred_masks.squeeze().float().cpu().numpy()
binary_mask = (pred_mask > 0).astype(np.uint8)
```
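
To sanity-check a prediction against a ground-truth annotation, you can compute the Dice coefficient, the overlap measure behind the DiceCE training loss. A hypothetical helper, not part of the model card's code:

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|) for two binary masks; 1.0 is a perfect match."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return float((2.0 * inter + eps) / (pred.sum() + target.sum() + eps))

# Toy example: prediction covers half of an 8-pixel ground-truth region
a = np.zeros((4, 4), dtype=np.uint8); a[:2, :2] = 1   # 4 predicted pixels
b = np.zeros((4, 4), dtype=np.uint8); b[:2, :] = 1    # 8 ground-truth pixels
print(round(dice_coefficient(a, b), 3))  # → 0.667
```

Pass `binary_mask` from the snippet above as `pred` (with the ground-truth mask resized to the same shape) to score a real prediction.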

## Clinical Applications

This model can assist in:

- **Polyp detection and delineation** during colonoscopy review
- **Computer-aided diagnosis (CAD)** systems for colorectal cancer screening
- **Training and education** for endoscopy trainees
- **Research** on polyp morphology and classification

## Limitations

- Trained on Kvasir-SEG only (1,000 images); performance on different endoscopy equipment or patient populations may vary
- Requires a bounding box prompt; does not perform automatic polyp detection
- The model is for research and educational purposes only and should NOT be used as the sole basis for clinical decisions
- Performance on very small or flat polyps may be limited due to dataset composition
- The 256x256 output mask resolution may lose fine boundary detail for high-resolution inputs