U-Net for Polyp Segmentation
Model Overview
This repository contains a U-Net convolutional neural network trained for binary medical image segmentation of gastrointestinal polyps.
The model predicts segmentation masks that identify polyp regions in endoscopic images. It was implemented in PyTorch and trained using the Kvasir-SEG dataset.
Model Architecture
The model is based on the U-Net architecture, a convolutional neural network designed for biomedical image segmentation.
The network follows an encoder–decoder structure with skip connections.
Encoder (Contracting Path)
The encoder extracts hierarchical image features through repeated convolution blocks.
Each block contains:
- Two 3×3 convolution layers
- Batch Normalization and ReLU activation after each convolution
- 2×2 max pooling for downsampling between blocks
Feature channels increase as depth increases:
64 → 128 → 256 → 512 → 1024
The deepest layer acts as the bottleneck, capturing high-level semantic information.
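A minimal sketch of one encoder stage as described above (two 3×3 convolutions, each followed by Batch Normalization and ReLU, then 2×2 max pooling); the `DoubleConv` name and exact layer options are assumptions, not taken from the repository:

```python
import torch
import torch.nn as nn

class DoubleConv(nn.Module):
    """Two 3x3 convolutions, each followed by BatchNorm and ReLU."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

# One encoder stage: double conv, then 2x2 max pooling halves the resolution.
stage = nn.Sequential(DoubleConv(3, 64), nn.MaxPool2d(2))
out = stage(torch.randn(1, 3, 256, 256))
print(out.shape)  # torch.Size([1, 64, 128, 128])
```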
Decoder (Expanding Path)
The decoder reconstructs the segmentation mask while recovering spatial information lost during downsampling.
Each decoder stage performs:
- Transposed convolution (upsampling)
- Concatenation with encoder features via skip connections
- Double convolution blocks
Channel sizes decrease as spatial resolution increases:
512 → 256 → 128 → 64
The final layer is a 1×1 convolution producing a single-channel segmentation mask.
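The three decoder steps listed above can be sketched for the deepest stage (1024 → 512 channels); the specific kernel sizes and tensor shapes here are illustrative assumptions:

```python
import torch
import torch.nn as nn

# One decoder stage: a transposed convolution doubles spatial resolution,
# the result is concatenated with the matching encoder feature map via the
# skip connection, and a double convolution fuses the channels back down.
up = nn.ConvTranspose2d(1024, 512, kernel_size=2, stride=2)
fuse = nn.Sequential(
    nn.Conv2d(1024, 512, kernel_size=3, padding=1),
    nn.BatchNorm2d(512),
    nn.ReLU(inplace=True),
    nn.Conv2d(512, 512, kernel_size=3, padding=1),
    nn.BatchNorm2d(512),
    nn.ReLU(inplace=True),
)

bottleneck = torch.randn(1, 1024, 16, 16)  # deepest feature map
skip = torch.randn(1, 512, 32, 32)         # encoder features at the same scale

x = up(bottleneck)               # -> (1, 512, 32, 32)
x = torch.cat([x, skip], dim=1)  # -> (1, 1024, 32, 32)
x = fuse(x)                      # -> (1, 512, 32, 32)
print(x.shape)  # torch.Size([1, 512, 32, 32])
```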
Dataset
Training was performed using the Kvasir-SEG dataset, which contains endoscopic images with manually annotated polyp masks.
Dataset characteristics:
- RGB gastrointestinal endoscopy images
- Pixel-level segmentation masks
- Binary segmentation task (polyp vs background)
Dataset split:
- 80% training
- 20% validation
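The 80/20 split can be sketched with `torch.utils.data.random_split`; the tiny random tensors below are stand-ins for the real Kvasir-SEG images and masks:

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-in dataset of 100 image/mask pairs (the real dataset loads
# Kvasir-SEG files from disk).
images = torch.randn(100, 3, 32, 32)
masks = torch.randint(0, 2, (100, 1, 32, 32)).float()
dataset = TensorDataset(images, masks)

# 80% training, 20% validation.
n_train = int(0.8 * len(dataset))
train_set, val_set = random_split(dataset, [n_train, len(dataset) - n_train])
print(len(train_set), len(val_set))  # 80 20
```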
Data Preprocessing
Before training, images and masks undergo the following preprocessing steps:
- Resize to 256 × 256 pixels
- Convert to PyTorch tensors
- Normalize pixel values to [0,1]
Segmentation masks are converted to binary format using thresholding.
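The preprocessing steps above can be sketched in plain PyTorch; the 0.5 binarization threshold and the interpolation modes are assumptions, not taken from the training code:

```python
import torch
import torch.nn.functional as F

def preprocess(image_uint8, mask_uint8):
    # image_uint8: (H, W, 3) uint8 image, mask_uint8: (H, W) uint8 mask.
    # Scale pixel values to [0, 1] and move channels first (CHW).
    img = image_uint8.permute(2, 0, 1).float() / 255.0
    # Resize image to 256x256 (bilinear) and mask to 256x256 (nearest,
    # so labels are not blended).
    img = F.interpolate(img[None], size=(256, 256),
                        mode="bilinear", align_corners=False)[0]
    mask = F.interpolate(mask_uint8.float()[None, None] / 255.0,
                         size=(256, 256), mode="nearest")[0]
    # Binarize the mask by thresholding.
    return img, (mask > 0.5).float()

img, mask = preprocess(
    torch.randint(0, 256, (480, 640, 3), dtype=torch.uint8),
    torch.randint(0, 256, (480, 640), dtype=torch.uint8),
)
print(img.shape, mask.shape)  # torch.Size([3, 256, 256]) torch.Size([1, 256, 256])
```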
Loss Function
The model is trained using a combined loss function:
BCEWithLogitsLoss + Dice Loss
Binary Cross Entropy (BCE)
Binary Cross Entropy performs pixel-wise classification between foreground (polyp) and background.
It provides stable gradients and reliable convergence during training.
Dice Loss
Dice Loss measures the overlap between predicted masks and ground truth masks.
This loss is particularly useful in medical image segmentation, where foreground regions often occupy a small portion of the image.
Loss Combination Rationale
Combining BCE and Dice Loss allows the model to:
- Maintain stable training through pixel-wise supervision
- Directly optimize segmentation overlap
- Improve robustness against class imbalance
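A common formulation of the combined loss described above is sketched below; the equal weighting of the two terms and the smoothing constant are assumptions:

```python
import torch
import torch.nn as nn

def dice_loss(logits, targets, smooth=1.0):
    # Soft Dice: compare sigmoid probabilities against binary targets.
    probs = torch.sigmoid(logits)
    intersection = (probs * targets).sum()
    return 1 - (2 * intersection + smooth) / (probs.sum() + targets.sum() + smooth)

bce = nn.BCEWithLogitsLoss()

def combined_loss(logits, targets):
    # Pixel-wise BCE for stable gradients + Dice for overlap optimization.
    return bce(logits, targets) + dice_loss(logits, targets)

logits = torch.randn(2, 1, 256, 256)                  # raw model outputs
targets = (torch.rand(2, 1, 256, 256) > 0.5).float()  # binary masks
loss = combined_loss(logits, targets)
print(loss.item())
```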
Evaluation Metrics
The model is evaluated using two segmentation metrics.
Dice Score
Measures similarity between predicted and ground truth masks.
Dice = (2 × Intersection) / (Prediction + Ground Truth)
Intersection over Union (IoU)
Measures the overlap between predicted and true regions relative to their union.
IoU = Intersection / Union
These metrics are widely used for benchmarking segmentation models in medical imaging.
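Both metrics can be computed from binary masks as sketched below; thresholding predictions at 0.5 is an assumption:

```python
import torch

def dice_and_iou(pred, target, eps=1e-7):
    # Binarize predictions, then compute overlap statistics.
    pred = (pred > 0.5).float()
    inter = (pred * target).sum()
    union = pred.sum() + target.sum() - inter
    dice = (2 * inter + eps) / (pred.sum() + target.sum() + eps)
    iou = (inter + eps) / (union + eps)
    return dice.item(), iou.item()

# Toy example: masks agree on 1 of 4 foreground-ish pixels.
pred = torch.tensor([[1., 1., 0., 0.]])
gt   = torch.tensor([[1., 0., 1., 0.]])
print(dice_and_iou(pred, gt))  # Dice = 0.5, IoU ≈ 0.333
```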
Training Configuration
Training settings used:
- Framework: PyTorch
- Optimizer: Adam
- Learning rate: 1e-4
- Batch size: 8
- Number of epochs: 50
- Input resolution: 256 × 256
Model weights are stored using the safetensors format.
Usage
Example of loading the model from the Hugging Face Hub:
```python
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

from model import UNet  # UNet class defined in this repository

# Instantiate the architecture, then load the pretrained weights.
model = UNet(in_channels=3, out_channels=1)

weights_path = hf_hub_download(
    repo_id="AnaRMSuni/week4-caa-unet",
    filename="model.safetensors",
)

model.load_state_dict(load_file(weights_path))
model.eval()
```
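Inference then follows the usual pattern for a logits-producing binary segmenter: apply a sigmoid and threshold. The 0.5 threshold is an assumption, and a stand-in model replaces the loaded UNet so the sketch is self-contained:

```python
import torch

# Stand-in for the loaded UNet; a real model is used the same way.
model = torch.nn.Conv2d(3, 1, kernel_size=1)
model.eval()

image = torch.randn(1, 3, 256, 256)  # preprocessed input batch
with torch.no_grad():
    logits = model(image)
    # Sigmoid turns logits into probabilities; 0.5 is an assumed threshold.
    mask = (torch.sigmoid(logits) > 0.5).float()
print(mask.shape)  # torch.Size([1, 1, 256, 256])
```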
Limitations
- The dataset size is relatively small.
- Performance may degrade on images from different medical devices or domains.
- The model performs binary segmentation only.
Intended Use
This model is intended for:
- Research
- Educational purposes
- Experiments in medical image segmentation
It should not be used for clinical diagnosis or medical decision making.
Author
AnaRMSuni