U-Net for Gastrointestinal Polyp Segmentation (Kvasir-SEG)
Model Description
This model is a U-Net convolutional neural network trained for binary segmentation of gastrointestinal polyps on the Kvasir-SEG dataset.
Architecture
The model follows the classic U-Net architecture with the following components:
- Encoder: 4 levels of DoubleConv blocks (Conv2d → BatchNorm → ReLU → Conv2d → BatchNorm → ReLU) followed by MaxPool2d, progressively increasing channels from 3 → 64 → 128 → 256 → 512 while halving spatial resolution.
- Bottleneck: A DoubleConv block at the lowest resolution (512 → 1024 channels, 16×16 spatial).
- Decoder: 4 levels of transposed convolutions (ConvTranspose2d) for upsampling, each followed by concatenation with the corresponding encoder skip connection and a DoubleConv block.
- Skip Connections: Feature maps from each encoder level are concatenated with the corresponding decoder level to preserve spatial information.
- Output: A 1×1 Conv2d reducing to 1 channel, producing a binary segmentation mask of shape [B, 1, H, W].
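The components listed above can be sketched in PyTorch roughly as follows. This is a minimal illustration, not the released training code: the class names `DoubleConv` and `UNet`, the `widths` parameter, the 3×3 kernels with padding 1, `bias=False`, and the 2×2 stride-2 transposed convolutions are all assumptions chosen to match the channel counts and resolutions described in this card. The head returns raw logits, which is what BCEWithLogitsLoss expects.

```python
import torch
import torch.nn as nn


class DoubleConv(nn.Module):
    """Conv2d -> BatchNorm2d -> ReLU, applied twice."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # Assumption: 3x3 kernels with padding=1, so H and W are preserved.
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)


class UNet(nn.Module):
    """Four encoder levels, a bottleneck, four decoder levels, 1x1 head."""

    def __init__(self, in_ch: int = 3, out_ch: int = 1, widths=(64, 128, 256, 512)):
        super().__init__()
        self.pool = nn.MaxPool2d(2)

        # Encoder: 3 -> 64 -> 128 -> 256 -> 512 with the default widths.
        self.encoders = nn.ModuleList()
        prev = in_ch
        for w in widths:
            self.encoders.append(DoubleConv(prev, w))
            prev = w

        # Bottleneck doubles the channels once more (512 -> 1024 by default).
        self.bottleneck = DoubleConv(prev, prev * 2)
        prev = prev * 2

        # Decoder: upsample, concatenate the skip, then DoubleConv.
        self.ups = nn.ModuleList()
        self.decoders = nn.ModuleList()
        for w in reversed(widths):
            self.ups.append(nn.ConvTranspose2d(prev, w, kernel_size=2, stride=2))
            self.decoders.append(DoubleConv(2 * w, w))  # 2*w: skip + upsampled
            prev = w

        self.head = nn.Conv2d(prev, out_ch, kernel_size=1)

    def forward(self, x):
        skips = []
        for enc in self.encoders:
            x = enc(x)
            skips.append(x)   # saved before pooling, at full level resolution
            x = self.pool(x)  # halves H and W
        x = self.bottleneck(x)
        for up, dec, skip in zip(self.ups, self.decoders, reversed(skips)):
            x = up(x)                             # doubles H and W
            x = dec(torch.cat([skip, x], dim=1))  # channel-wise concatenation
        return self.head(x)  # raw logits, shape [B, 1, H, W]
```

With the default widths, a 256×256 input is pooled four times and reaches the bottleneck at 16×16. The output has shape [B, 1, 256, 256]; applying a sigmoid and a threshold turns the logits into a binary mask. Input height and width should be divisible by 16 so that the skip connections line up without cropping.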
Loss Function
Training uses a combined BCE + Dice Loss:
- BCEWithLogitsLoss: Provides stable pixel-wise binary cross-entropy.
- Dice Loss: Directly optimizes overlap between prediction and ground truth, making it robust to the class imbalance present in this dataset (≈85% background, ≈15% polyp).
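One way to combine the two terms is sketched below. The card does not state the relative weighting or the smoothing constant, so the equal weights, the `eps` value, the per-sample Dice averaging, and the class name `BCEDiceLoss` are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class BCEDiceLoss(nn.Module):
    """Weighted sum of BCEWithLogitsLoss and a soft Dice loss."""

    def __init__(self, bce_weight: float = 1.0, dice_weight: float = 1.0, eps: float = 1.0):
        super().__init__()
        self.bce = nn.BCEWithLogitsLoss()
        self.bce_weight = bce_weight    # assumed weighting, not from the card
        self.dice_weight = dice_weight  # assumed weighting, not from the card
        self.eps = eps                  # smoothing term, avoids division by zero

    def forward(self, logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        target = target.float()
        bce = self.bce(logits, target)  # operates on raw logits

        # Soft Dice uses probabilities, computed per sample over C, H, W.
        probs = torch.sigmoid(logits)
        dims = (1, 2, 3)
        intersection = (probs * target).sum(dim=dims)
        denom = probs.sum(dim=dims) + target.sum(dim=dims)
        dice = (2 * intersection + self.eps) / (denom + self.eps)

        return self.bce_weight * bce + self.dice_weight * (1 - dice).mean()
```

Because the Dice term is a ratio of overlap to total foreground mass, the large number of correctly predicted background pixels does not dilute it the way it dilutes a pixel-averaged loss.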
Training
- Dataset: Angelou0516/kvasir-seg
- Train/Val/Test split: 800 / 100 / 100
- Image size: 256×256
- Batch size: 16
- Epochs: 20
- Optimizer: AdamW (lr=1e-4)
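The settings above map onto a PyTorch setup along these lines. The card does not say how the 800 / 100 / 100 split was produced, so the seeded `random_split` is an assumption. The random tensors, the 16×16 image size, the one-layer model, and the plain BCE loss are placeholders that keep the snippet small and runnable; they stand in for Kvasir-SEG, 256×256 inputs, the U-Net, and the combined loss.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Placeholder data: 1000 random image/mask pairs (not Kvasir-SEG).
images = torch.rand(1000, 3, 16, 16)
masks = (torch.rand(1000, 1, 16, 16) > 0.85).float()
dataset = TensorDataset(images, masks)

# 800 / 100 / 100 split; the fixed seed is an assumption for reproducibility.
generator = torch.Generator().manual_seed(0)
train_set, val_set, test_set = random_split(
    dataset, [800, 100, 100], generator=generator
)

train_loader = DataLoader(train_set, batch_size=16, shuffle=True)

# Placeholder model and loss so the snippet runs on its own.
model = torch.nn.Conv2d(3, 1, kernel_size=1)
loss_fn = torch.nn.BCEWithLogitsLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# A single optimization step; a full run would loop over 20 epochs.
model.train()
x, y = next(iter(train_loader))
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```

With 800 training images and a batch size of 16, each epoch consists of 50 optimizer steps, so 20 epochs amount to 1,000 steps in total.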
Results
| Metric | Score |
|---|---|
| IoU | 0.6895 |
| Accuracy | 0.9374 |
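For reference, both metrics can be computed from logits as in the sketch below. The card does not state the threshold or whether scores were averaged per image or over all pixels, so the 0.5 threshold, the pooling over all pixels, and the function name `binary_metrics` are assumptions.

```python
import torch


def binary_metrics(logits: torch.Tensor, target: torch.Tensor, threshold: float = 0.5):
    """Return (IoU, pixel accuracy) pooled over every pixel in the batch."""
    pred = torch.sigmoid(logits) > threshold  # boolean predicted mask
    truth = target > 0.5                      # boolean ground-truth mask

    intersection = (pred & truth).sum().item()
    union = (pred | truth).sum().item()
    # Convention (assumed): two empty masks count as a perfect match.
    iou = intersection / union if union > 0 else 1.0

    accuracy = (pred == truth).float().mean().item()
    return iou, accuracy
```

Pixel accuracy counts background pixels as hits, so on a dataset that is mostly background it will sit well above IoU, which only scores the polyp region. That is consistent with the gap between the two numbers in the table.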