U-Net for Polyp Segmentation

Model Overview

This repository contains a U-Net convolutional neural network trained for binary medical image segmentation of gastrointestinal polyps.

The model predicts segmentation masks that identify polyp regions in endoscopic images. It was implemented in PyTorch and trained using the Kvasir-SEG dataset.


Model Architecture

The model is based on the U-Net architecture, a convolutional neural network designed for biomedical image segmentation.

The network follows an encoder–decoder structure with skip connections.

Encoder (Contracting Path)

The encoder extracts hierarchical image features through repeated convolution blocks.

Each block contains:

  • Two 3×3 convolution layers
  • Batch Normalization after each convolution
  • ReLU activation
  • 2×2 MaxPooling for downsampling between blocks

Feature channels increase as depth increases:

64 → 128 → 256 → 512 → 1024

The deepest layer acts as the bottleneck, capturing high-level semantic information.
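A minimal PyTorch sketch of one encoder stage as described above (two 3×3 convolutions, each with Batch Normalization and ReLU, followed by 2×2 max pooling); the class and variable names here are illustrative, not the repository's actual code:

```python
import torch
import torch.nn as nn

class DoubleConv(nn.Module):
    """Two 3x3 convolutions, each followed by BatchNorm and ReLU."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

# One encoder stage: double conv extracts features, 2x2 max-pool halves
# the spatial resolution before the next stage (64 -> 128 -> ... -> 1024).
pool = nn.MaxPool2d(2)
x = torch.randn(1, 3, 256, 256)
features = DoubleConv(3, 64)(x)   # (1, 64, 256, 256) - kept for the skip connection
down = pool(features)             # (1, 64, 128, 128) - passed to the next stage
```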


Decoder (Expanding Path)

The decoder reconstructs the segmentation mask while recovering spatial information lost during downsampling.

Each decoder stage performs:

  • Transposed convolution (upsampling)
  • Concatenation with encoder features via skip connections
  • Double convolution blocks

Channel sizes decrease as spatial resolution increases:

512 → 256 → 128 → 64

The final layer is a 1×1 convolution producing a single-channel segmentation mask.
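One decoder stage could be sketched as follows; the tensor shapes correspond to the deepest stage (1024-channel bottleneck, 512-channel skip), and the standalone layers are illustrative stand-ins for the repository's actual modules:

```python
import torch
import torch.nn as nn

# Transposed convolution doubles the spatial resolution and halves channels.
up = nn.ConvTranspose2d(1024, 512, kernel_size=2, stride=2)
# After concatenation with the skip connection, a double-conv block
# reduces the channel count back down (1024 -> 512).
conv = nn.Sequential(
    nn.Conv2d(1024, 512, 3, padding=1), nn.BatchNorm2d(512), nn.ReLU(inplace=True),
    nn.Conv2d(512, 512, 3, padding=1), nn.BatchNorm2d(512), nn.ReLU(inplace=True),
)

bottleneck = torch.randn(1, 1024, 16, 16)
skip = torch.randn(1, 512, 32, 32)    # matching encoder features
x = up(bottleneck)                    # (1, 512, 32, 32)
x = torch.cat([skip, x], dim=1)       # (1, 1024, 32, 32) - skip connection
x = conv(x)                           # (1, 512, 32, 32)
```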


Dataset

Training was performed using the Kvasir-SEG dataset, which contains endoscopic images with manually annotated polyp masks.

Dataset characteristics:

  • RGB gastrointestinal endoscopy images
  • Pixel-level segmentation masks
  • Binary segmentation task (polyp vs background)

Dataset split:

  • 80% training
  • 20% validation

Data Preprocessing

Before training, images and masks undergo the following preprocessing steps:

  1. Resize to 256 × 256 pixels
  2. Convert to PyTorch tensors
  3. Normalize pixel values to [0,1]

Segmentation masks are converted to binary format using thresholding.
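The preprocessing steps above might be implemented as follows; the function signature and the binarization threshold of 127 are assumptions, not taken from the repository:

```python
import numpy as np
import torch
from PIL import Image

def preprocess(image, mask, size=(256, 256), threshold=127):
    """Resize, convert to tensors in [0, 1], and binarize the mask."""
    image = image.convert("RGB").resize(size)
    mask = mask.convert("L").resize(size)

    # HWC uint8 -> CHW float tensor scaled to [0, 1]
    image_t = torch.from_numpy(np.array(image)).permute(2, 0, 1).float() / 255.0
    # Binarize: pixels above the threshold count as polyp (1), else background (0).
    mask_t = (torch.from_numpy(np.array(mask)).float() > threshold).float().unsqueeze(0)
    return image_t, mask_t

# Demo on synthetic data (stand-ins for a real endoscopy frame / mask pair).
img = Image.fromarray(np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8))
msk = Image.fromarray(np.random.randint(0, 256, (480, 640), dtype=np.uint8))
image_t, mask_t = preprocess(img, msk)
# image_t: (3, 256, 256) floats in [0, 1]; mask_t: (1, 256, 256) values in {0, 1}
```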


Loss Function

The model is trained using a combined loss function:

BCEWithLogitsLoss + Dice Loss

Binary Cross Entropy (BCE)

Binary Cross Entropy performs pixel-wise classification between foreground (polyp) and background.

It provides stable gradients and reliable convergence during training.

Dice Loss

Dice Loss measures the overlap between predicted masks and ground truth masks.

This loss is particularly useful in medical image segmentation, where foreground regions often occupy a small portion of the image.

Loss Combination Rationale

Combining BCE and Dice Loss allows the model to:

  • Maintain stable training through pixel-wise supervision
  • Directly optimize segmentation overlap
  • Improve robustness against class imbalance
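A minimal sketch of such a combined loss, assuming an equal weighting of the two terms and a smoothing constant of 1.0 (both assumptions, since the model card does not state them):

```python
import torch
import torch.nn as nn

class BCEDiceLoss(nn.Module):
    """BCEWithLogitsLoss plus soft Dice loss on the sigmoid probabilities."""
    def __init__(self, smooth=1.0):
        super().__init__()
        self.bce = nn.BCEWithLogitsLoss()
        self.smooth = smooth

    def forward(self, logits, targets):
        probs = torch.sigmoid(logits)
        intersection = (probs * targets).sum()
        # Soft Dice coefficient; the smoothing term avoids division by zero
        # on empty masks.
        dice = (2.0 * intersection + self.smooth) / (
            probs.sum() + targets.sum() + self.smooth
        )
        return self.bce(logits, targets) + (1.0 - dice)

loss_fn = BCEDiceLoss()
logits = torch.randn(8, 1, 256, 256)
targets = (torch.rand(8, 1, 256, 256) > 0.5).float()
loss = loss_fn(logits, targets)  # scalar tensor
```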

Evaluation Metrics

The model is evaluated using two segmentation metrics.

Dice Score

Measures similarity between predicted and ground truth masks.

Dice = (2 × Intersection) / (Prediction + Ground Truth)

Intersection over Union (IoU)

Measures the overlap between predicted and true regions relative to their union.

IoU = Intersection / Union

These metrics are widely used for benchmarking segmentation models in medical imaging.
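Both metrics follow directly from the formulas above; a small worked example on binary masks (the epsilon term is an assumption to guard against empty masks):

```python
import torch

def dice_score(pred, target, eps=1e-7):
    """Dice = 2|A ∩ B| / (|A| + |B|) on binary masks."""
    inter = (pred * target).sum()
    return float((2 * inter + eps) / (pred.sum() + target.sum() + eps))

def iou_score(pred, target, eps=1e-7):
    """IoU = |A ∩ B| / |A ∪ B| on binary masks."""
    inter = (pred * target).sum()
    union = pred.sum() + target.sum() - inter
    return float((inter + eps) / (union + eps))

pred = torch.tensor([[1.0, 1.0, 0.0, 0.0]])
target = torch.tensor([[1.0, 0.0, 0.0, 0.0]])
# intersection = 1, |pred| + |target| = 3  -> Dice = 2/3
# union = 2                               -> IoU  = 1/2
```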


Training Configuration

Training settings used:

  • Framework: PyTorch
  • Optimizer: Adam
  • Learning rate: 1e-4
  • Batch size: 8
  • Number of epochs: 50
  • Input resolution: 256 × 256

Model weights are stored using the safetensors format.
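The configuration above could be wired into a training loop along these lines; the model and data here are stand-ins (a 1×1 convolution and random tensors), not the actual UNet or the Kvasir-SEG loader:

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 1, kernel_size=1)  # placeholder for UNet(3, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.BCEWithLogitsLoss()        # the real run also adds Dice loss

model.train()
for epoch in range(2):                  # 50 epochs in the real run
    images = torch.randn(8, 3, 256, 256)               # batch size 8, 256x256
    masks = (torch.rand(8, 1, 256, 256) > 0.5).float() # binary targets
    optimizer.zero_grad()
    loss = loss_fn(model(images), masks)
    loss.backward()
    optimizer.step()
```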


Usage

Example of loading the model from the Hugging Face Hub:

from model import UNet
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# Instantiate the architecture: 3 input channels (RGB), 1 output channel (mask).
model = UNet(in_channels=3, out_channels=1)

# Download the safetensors checkpoint from the Hub.
weights_path = hf_hub_download(
    repo_id="AnaRMSuni/week4-caa-unet",
    filename="model.safetensors"
)

# Load the weights and switch to evaluation mode for inference.
model.load_state_dict(load_file(weights_path))
model.eval()
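Once loaded, running inference on a preprocessed image might look like this; a 1×1 convolution stands in for the loaded UNet so the snippet is self-contained, and the 0.5 probability threshold is an assumption:

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 1, kernel_size=1)  # stand-in; use the UNet loaded above
model.eval()

image = torch.rand(1, 3, 256, 256)      # preprocessed RGB frame in [0, 1]
with torch.no_grad():
    logits = model(image)                         # (1, 1, 256, 256) raw scores
    mask = (torch.sigmoid(logits) > 0.5).float()  # binary polyp mask
```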

Limitations

  • The dataset size is relatively small.
  • Performance may degrade on images from different medical devices or domains.
  • The model performs binary segmentation only.

Intended Use

This model is intended for:

  • Research
  • Educational purposes
  • Experiments in medical image segmentation

It should not be used for clinical diagnosis or medical decision making.


Author

AnaRMSuni
