U-Net for Gastrointestinal Polyp Segmentation (Kvasir-SEG)

Model Description

This model is a U-Net convolutional neural network trained for binary segmentation of gastrointestinal polyps on the Kvasir-SEG dataset.

Architecture

The model follows the classic U-Net architecture with the following components:

  • Encoder: 4 levels of DoubleConv blocks (Conv2d → BatchNorm → ReLU → Conv2d → BatchNorm → ReLU), each followed by MaxPool2d, progressively increasing channels from 3 → 64 → 128 → 256 → 512 while halving spatial resolution.
  • Bottleneck: A DoubleConv block at the lowest resolution (512 → 1024 channels, 16×16 spatial for a 256×256 input).
  • Decoder: 4 levels of transposed convolutions (ConvTranspose2d) for upsampling, each followed by concatenation with the corresponding encoder skip connection and a DoubleConv block.
  • Skip Connections: Feature maps from each encoder level are concatenated with the corresponding decoder level to preserve spatial information.
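  • Output: A 1×1 Conv2d reducing to 1 channel, producing per-pixel logits of shape [B, 1, H, W]. Because training uses BCEWithLogitsLoss, the binary segmentation mask is obtained by applying a sigmoid and a threshold to these logits.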
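
The components listed above can be sketched in PyTorch roughly as follows. This is a minimal illustration, not the released training code: the 3×3 kernels with padding 1, bias=False before BatchNorm, the 2×2 stride-2 transposed convolutions, and the class and attribute names are all assumptions that the card does not state.

```python
import torch
import torch.nn as nn


class DoubleConv(nn.Module):
    """(Conv2d -> BatchNorm -> ReLU) applied twice."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # Assumption: 3x3 kernels with padding=1 so spatial size is preserved.
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)


class UNet(nn.Module):
    def __init__(self, in_ch=3, out_ch=1, widths=(64, 128, 256, 512)):
        super().__init__()
        self.pool = nn.MaxPool2d(2)
        self.encoders = nn.ModuleList()
        ch = in_ch
        for w in widths:  # 3 -> 64 -> 128 -> 256 -> 512
            self.encoders.append(DoubleConv(ch, w))
            ch = w
        self.bottleneck = DoubleConv(ch, ch * 2)  # 512 -> 1024
        ch = ch * 2
        self.ups = nn.ModuleList()
        self.decoders = nn.ModuleList()
        for w in reversed(widths):
            # Assumption: 2x2, stride-2 transposed conv doubles the resolution.
            self.ups.append(nn.ConvTranspose2d(ch, w, kernel_size=2, stride=2))
            # The decoder block sees skip (w) + upsampled (w) channels.
            self.decoders.append(DoubleConv(w * 2, w))
            ch = w
        self.head = nn.Conv2d(ch, out_ch, kernel_size=1)  # 1x1 conv -> logits

    def forward(self, x):
        skips = []
        for enc in self.encoders:
            x = enc(x)
            skips.append(x)  # kept for the matching decoder level
            x = self.pool(x)
        x = self.bottleneck(x)
        for up, dec, skip in zip(self.ups, self.decoders, reversed(skips)):
            x = up(x)
            x = dec(torch.cat([skip, x], dim=1))
        return self.head(x)


model = UNet().eval()
with torch.no_grad():
    logits = model(torch.zeros(1, 3, 256, 256))
print(logits.shape)  # [B, 1, H, W] logits for a 256x256 input
```

With four pooling steps, a 256×256 input reaches the bottleneck at 256 / 2^4 = 16 pixels per side, which matches the 16×16 figure quoted above.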

Loss Function

Training uses a combined BCE + Dice Loss:

  • BCEWithLogitsLoss: Provides stable pixel-wise binary cross-entropy.
  • Dice Loss: Directly optimizes overlap between prediction and ground truth, making it robust to the class imbalance present in this dataset (≈85% background, ≈15% polyp).
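
One way the two terms might be combined is sketched below. The equal weighting of the two terms, the smoothing constant of 1.0, the per-image Dice averaging, and the class name BCEDiceLoss are assumptions for illustration; the card does not say how the terms were weighted or smoothed.

```python
import torch
import torch.nn as nn


class BCEDiceLoss(nn.Module):
    """Sum of BCEWithLogitsLoss and a soft Dice loss (hypothetical helper)."""

    def __init__(self, smooth: float = 1.0):
        super().__init__()
        self.bce = nn.BCEWithLogitsLoss()
        self.smooth = smooth  # assumption: avoids division by zero on empty masks

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        # Pixel-wise term, computed directly on raw logits.
        bce = self.bce(logits, targets)

        # Overlap term, computed on probabilities, one Dice score per image.
        probs = torch.sigmoid(logits)
        dims = (1, 2, 3)
        intersection = (probs * targets).sum(dim=dims)
        denom = probs.sum(dim=dims) + targets.sum(dim=dims)
        dice = (2 * intersection + self.smooth) / (denom + self.smooth)

        # Assumption: equal weighting of the two terms.
        return bce + (1 - dice).mean()


criterion = BCEDiceLoss()
targets = torch.zeros(2, 1, 8, 8)
targets[:, :, 2:5, 2:5] = 1.0                 # small "polyp" region
confident_right = (targets * 2 - 1) * 20.0    # large logits with the correct sign
confident_wrong = -confident_right
loss_right = criterion(confident_right, targets).item()
loss_wrong = criterion(confident_wrong, targets).item()
```

Because the Dice term is a ratio of overlap to total foreground mass, it does not shrink just because background pixels dominate, which is the imbalance argument made in the bullet above.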

Training

  • Dataset: Angelou0516/kvasir-seg
  • Train/Val/Test split: 800 / 100 / 100
  • Image size: 256×256
  • Batch size: 16
  • Epochs: 20
  • Optimizer: AdamW (lr=1e-4)
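
The settings above could be wired together along these lines. This is a sketch under stated assumptions: the random split with a fixed seed, the single-layer stand-in model, the plain BCEWithLogitsLoss stand-in for the combined loss, and the synthetic batch are placeholders, and weight decay is left at the PyTorch AdamW default because the card does not give a value.

```python
import torch
import torch.nn as nn
from torch.utils.data import random_split

# Hyperparameters taken from the list above.
IMAGE_SIZE = 256
BATCH_SIZE = 16
EPOCHS = 20
LEARNING_RATE = 1e-4

torch.manual_seed(0)

# Assumption: a seeded random 800/100/100 split over the 1000 Kvasir-SEG images.
# range(1000) stands in for the real dataset object.
train_set, val_set, test_set = random_split(
    range(1000), [800, 100, 100], generator=torch.Generator().manual_seed(0)
)

# Stand-ins so the snippet runs on its own; swap in the real U-Net and loss.
model = nn.Conv2d(3, 1, kernel_size=3, padding=1)
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE)

# Synthetic batch with roughly 15% foreground pixels.
images = torch.rand(BATCH_SIZE, 3, IMAGE_SIZE, IMAGE_SIZE)
masks = (torch.rand(BATCH_SIZE, 1, IMAGE_SIZE, IMAGE_SIZE) > 0.85).float()


def train_step(images: torch.Tensor, masks: torch.Tensor) -> float:
    """Run one optimization step and return the scalar loss."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(images), masks)
    loss.backward()
    optimizer.step()
    return loss.item()


# A real run would loop EPOCHS times over a DataLoader built from train_set.
losses = [train_step(images, masks) for _ in range(3)]
```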

Results

Metric    Score
IoU       0.6895
Accuracy  0.9374
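
For reference, the two metrics can be computed from logits as in the sketch below. The 0.5 threshold, the epsilon term, pooling all pixels of the batch together, and the function name binary_metrics are assumptions; the card does not state the threshold or how scores were aggregated across images.

```python
import torch


def binary_metrics(logits, targets, threshold=0.5, eps=1e-7):
    """Return (IoU, pixel accuracy) for a batch of binary masks."""
    # Assumption: sigmoid followed by a fixed threshold gives the predicted mask.
    preds = (torch.sigmoid(logits) > threshold).float()
    intersection = (preds * targets).sum()
    union = preds.sum() + targets.sum() - intersection
    iou = (intersection + eps) / (union + eps)
    accuracy = (preds == targets).float().mean()
    return iou.item(), accuracy.item()


# Toy 4x4 example: 4 true polyp pixels, 6 predicted polyp pixels.
targets = torch.zeros(1, 1, 4, 4)
targets[0, 0, :2, :2] = 1.0
logits = torch.full((1, 1, 4, 4), -5.0)
logits[0, 0, :2, :3] = 5.0

iou, accuracy = binary_metrics(logits, targets)
# intersection = 4, union = 6, so IoU = 4/6; 14 of 16 pixels match, so accuracy = 0.875
```

The toy case shows why accuracy tends to sit well above IoU on this kind of data: correctly labelled background pixels raise accuracy but do not enter the IoU of the polyp class. With roughly 85% background, a model that predicts background everywhere would already score about 0.85 accuracy.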
