usama10 committed on
Commit 58f50b6 · verified · 1 Parent(s): 735c59f

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +98 -33
README.md CHANGED
@@ -1,56 +1,121 @@
  ---
- library_name: transformers
  license: apache-2.0
  base_model: facebook/sam2.1-hiera-base-plus
  tags:
- - generated_from_trainer
  model-index:
- - name: sam2-kvasir-polyp-segmentation
-   results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # sam2-kvasir-polyp-segmentation

- This model is a fine-tuned version of [facebook/sam2.1-hiera-base-plus](https://huggingface.co/facebook/sam2.1-hiera-base-plus) on an unknown dataset.

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - learning_rate: 1e-05
- - train_batch_size: 8
- - eval_batch_size: 8
- - seed: 42
- - gradient_accumulation_steps: 2
- - total_train_batch_size: 16
- - optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 20
- - num_epochs: 30

- ### Training results

- ### Framework versions

- - Transformers 5.3.0
- - Pytorch 2.10.0+cu128
- - Datasets 4.8.3
- - Tokenizers 0.22.2
  ---
  license: apache-2.0
  base_model: facebook/sam2.1-hiera-base-plus
  tags:
+ - sam2
+ - segmentation
+ - medical-imaging
+ - polyp-detection
+ - gastrointestinal
+ - colonoscopy
+ datasets:
+ - kowndinya23/Kvasir-SEG
+ pipeline_tag: image-segmentation
  model-index:
+ - name: sam2-kvasir-polyp-segmentation
+   results:
+   - task:
+       type: image-segmentation
+       name: Polyp Segmentation
+     dataset:
+       name: Kvasir-SEG
+       type: kowndinya23/Kvasir-SEG
+     metrics:
+     - type: loss
+       value: 0.13
+       name: Final Training DiceCE Loss
  ---

+ # SAM2 Kvasir Polyp Segmentation

+ A **SAM2.1-hiera-base-plus** model fine-tuned on the [Kvasir-SEG](https://huggingface.co/datasets/kowndinya23/Kvasir-SEG) dataset for **gastrointestinal polyp segmentation** in colonoscopy images.

+ Given a colonoscopy image and a bounding box prompt around a polyp, the model produces a pixel-level segmentation mask of the polyp.

+ ## What is SAM2?

+ SAM2 (Segment Anything Model 2) is Meta's next-generation segmentation foundation model. It can segment any object in an image given a prompt (bounding box, point, or mask). Fine-tuning SAM2 on domain-specific data like medical imaging significantly improves segmentation quality for specialized tasks.

+ **Training approach:** Only the mask decoder is trained (5.75% of parameters). The vision encoder and prompt encoder remain frozen, preserving SAM2's general visual understanding while adapting the mask prediction head to polyp morphology.
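
+ A minimal sketch of this selective fine-tuning setup, using the same `Sam2Model` class as the usage example below; the `mask_decoder` parameter-name prefix is an assumption and should be verified against `model.named_parameters()`:

+ ```python
+ from transformers import Sam2Model
+
+ # Load the base checkpoint, freeze everything, then unfreeze only the mask decoder
+ model = Sam2Model.from_pretrained("facebook/sam2.1-hiera-base-plus")
+ for param in model.parameters():
+     param.requires_grad = False
+ for name, param in model.named_parameters():
+     if "mask_decoder" in name:  # assumed prefix; inspect model.named_parameters()
+         param.requires_grad = True
+
+ # Sanity check against the numbers reported in this card
+ trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
+ total = sum(p.numel() for p in model.parameters())
+ print(f"trainable: {trainable / 1e6:.1f}M / {total / 1e6:.1f}M ({100 * trainable / total:.2f}%)")
+ ```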

+ ## Training Details

+ | Parameter | Value |
+ |-----------|-------|
+ | **Base model** | [facebook/sam2.1-hiera-base-plus](https://huggingface.co/facebook/sam2.1-hiera-base-plus) |
+ | **Method** | Mask decoder fine-tuning (encoders frozen) |
+ | **Trainable parameters** | 4.2M / 73.3M total (5.75%) |
+ | **Loss function** | DiceCE (Dice + Cross-Entropy from MONAI) |
+ | **Dataset** | [kowndinya23/Kvasir-SEG](https://huggingface.co/datasets/kowndinya23/Kvasir-SEG) |
+ | **Training images** | 880 |
+ | **Validation images** | 120 |
+ | **Hardware** | NVIDIA RTX 5090 (32 GB VRAM) |
+ | **Training time** | ~12 minutes |
+ | **Epochs** | 30 |
+ | **Effective batch size** | 16 (8 per device × 2 gradient accumulation steps) |
+ | **Learning rate** | 1e-5 (cosine schedule, 20 warmup steps) |
+ | **Precision** | bf16 |
+ | **Prompt type** | Bounding box (derived from ground-truth masks; see sketch below) |
+ | **Framework** | Transformers 5.3.0 + MONAI |
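
+ Two pieces of this recipe can be sketched directly: the loss is MONAI's `DiceCELoss`, and a box prompt can be derived from a binary ground-truth mask roughly as below. The exact helper used in training is not included in this card, and the `margin` padding is illustrative:

+ ```python
+ import numpy as np
+ from monai.losses import DiceCELoss
+
+ # Combined Dice + cross-entropy loss, applied to raw mask logits
+ loss_fn = DiceCELoss(sigmoid=True)
+
+ def mask_to_box(mask: np.ndarray, margin: int = 10) -> list[int]:
+     """Derive an [x_min, y_min, x_max, y_max] prompt from a binary mask."""
+     ys, xs = np.nonzero(mask)  # assumes the mask contains at least one foreground pixel
+     h, w = mask.shape
+     return [
+         max(int(xs.min()) - margin, 0),
+         max(int(ys.min()) - margin, 0),
+         min(int(xs.max()) + margin, w - 1),
+         min(int(ys.max()) + margin, h - 1),
+     ]
+ ```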

+ ## Training Curves

+ ![Training Metrics](sam2_training_metrics_plots.png)

+ - **Training Loss (DiceCE)**: decreased from ~0.20 to ~0.13 over 30 epochs, showing clear improvement in segmentation quality
+ - **Learning Rate**: cosine decay from 1e-5 to 0 with a 20-step warmup (sketched below)
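
+ This is the standard cosine-with-warmup schedule that the Trainer builds from the hyperparameters above. A standalone equivalent, assuming the frozen-encoder `model` from the earlier sketch and the step count implied by 880 training images at an effective batch size of 16 (55 steps/epoch × 30 epochs):

+ ```python
+ import torch
+ from transformers import get_cosine_schedule_with_warmup
+
+ # Optimize only the unfrozen (mask decoder) parameters
+ decoder_params = [p for p in model.parameters() if p.requires_grad]
+ optimizer = torch.optim.AdamW(decoder_params, lr=1e-5, betas=(0.9, 0.999), eps=1e-8)
+ scheduler = get_cosine_schedule_with_warmup(
+     optimizer, num_warmup_steps=20, num_training_steps=55 * 30
+ )
+ ```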

+ ## Dataset

+ [Kvasir-SEG](https://huggingface.co/datasets/kowndinya23/Kvasir-SEG) contains 1,000 gastrointestinal polyp images from colonoscopy procedures with corresponding pixel-level segmentation masks. The images were captured at Vestre Viken Health Trust in Norway and annotated by experienced gastroenterologists.

+ - **Image resolution**: Variable (332-1,350 pixels)
+ - **Annotation quality**: Expert gastroenterologist annotations verified by a medical doctor
+ - **Polyp types**: Various sizes, shapes, and appearances, including flat, sessile, and pedunculated polyps
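
+ The dataset can be pulled straight from the Hub with the `datasets` library (a quick sketch; split and column names should be confirmed on the dataset page):

+ ```python
+ from datasets import load_dataset
+
+ ds = load_dataset("kowndinya23/Kvasir-SEG")
+ print(ds)  # inspect the available splits and features before training
+ ```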

+ ## Usage

+ ```python
+ from transformers import AutoProcessor, Sam2Model
+ from PIL import Image
+ import torch
+ import numpy as np
+
+ # Load model and processor
+ model = Sam2Model.from_pretrained("usama10/sam2-kvasir-polyp-segmentation", dtype=torch.bfloat16)
+ processor = AutoProcessor.from_pretrained("usama10/sam2-kvasir-polyp-segmentation")
+ model.eval()
+
+ # Load a colonoscopy image
+ image = Image.open("colonoscopy.jpg").convert("RGB")
+
+ # Provide a bounding box prompt [x_min, y_min, x_max, y_max]
+ bbox = [[100, 50, 300, 250]]  # Approximate polyp location
+
+ inputs = processor(images=image, input_boxes=[bbox], return_tensors="pt")
+ # Move tensors to the model's device; floating-point inputs must also
+ # match the model's bf16 dtype
+ inputs = {
+     k: v.to(model.device, dtype=model.dtype) if v.is_floating_point() else v.to(model.device)
+     for k, v in inputs.items()
+ }
+
+ with torch.no_grad():
+     outputs = model(**inputs, multimask_output=False)
+
+ # Get the predicted mask logits (cast to float32 first: numpy has no bfloat16)
+ pred_mask = outputs.pred_masks.squeeze().float().cpu().numpy()
+ binary_mask = (pred_mask > 0).astype(np.uint8)  # logits > 0 equals sigmoid > 0.5
+ ```
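+
+ Continuing from the snippet above: the returned `pred_masks` are 256x256 logits, so to overlay them on the original image they need upsampling. A minimal sketch with plain interpolation, assuming a single image and a single box prompt (recent processor releases may also ship a dedicated post-processing helper, which is worth checking first):
+
+ ```python
+ import torch.nn.functional as F
+
+ # Collapse the batch/prompt dims to (1, num_masks, 256, 256) for interpolation
+ low_res = outputs.pred_masks.float().cpu().reshape(1, -1, *outputs.pred_masks.shape[-2:])
+ # image.size is (width, height); interpolate expects (height, width)
+ full = F.interpolate(low_res, size=image.size[::-1], mode="bilinear", align_corners=False)
+ full_mask = (full[0, 0] > 0).numpy().astype(np.uint8)  # binary mask at original resolution
+ ```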
+
+ ## Clinical Applications
+
+ This model can assist in:
+
+ - **Polyp detection and delineation** during colonoscopy review
+ - **Computer-aided diagnosis (CAD)** systems for colorectal cancer screening
+ - **Training and education** for endoscopy trainees
+ - **Research** on polyp morphology and classification
+
+ ## Limitations
+
+ - Trained on Kvasir-SEG only (1,000 images); performance on different endoscopy equipment or populations may vary
+ - Requires a bounding box prompt; does not perform automatic polyp detection
+ - The model is for research and educational purposes only and should NOT be used as the sole basis for clinical decisions
+ - Performance on very small or flat polyps may be limited due to dataset composition
+ - The 256x256 output mask resolution may lose fine boundary details for high-resolution inputs