# UNet3+ EfficientNet — Polyp Segmentation (HPO-Optimised)

Binary polyp segmentation model trained on Kvasir-SEG. This is the final configuration selected after a systematic sweep over 24 architecture × backbone combinations, followed by Optuna hyperparameter optimisation (60 trials).

## Model Description

| Property | Value |
|---|---|
| Architecture | UNet 3+ (full-scale skip connections) |
| Backbone | EfficientNet-B0 (ImageNet pre-trained via timm) |
| Input size | 256 × 256 × 3 |
| Output | 256 × 256 × 1 logit map (sigmoid → binary mask) |
| Parameters | ~13 M (≈50 MB in FP32) |
| Loss | Dice-Focal (γ = 1.12, dice weight = 0.30) |

## Architecture Details

UNet 3+ (Huang et al., ICASSP 2020) extends the standard U-Net by adding full-scale skip connections: every decoder node aggregates feature maps from all encoder scales simultaneously, giving each node access to both fine-grained spatial detail and deep semantic context. Each incoming stream is projected to a fixed number of inter-channels (64) before concatenation so the total channel count is constant across all decoder levels.
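The full-scale aggregation described above can be sketched as follows. This is a simplified illustration (module and variable names are hypothetical); the encoder channel counts shown are those of EfficientNet-B0's five feature stages.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FullScaleFusion(nn.Module):
    """One UNet 3+ decoder node: project every incoming scale to a fixed
    number of inter-channels, resize to the node's resolution, concatenate."""

    def __init__(self, in_channels_per_scale, inter_channels=64):
        super().__init__()
        self.projections = nn.ModuleList(
            nn.Conv2d(c, inter_channels, kernel_size=3, padding=1)
            for c in in_channels_per_scale
        )

    def forward(self, feature_maps, target_size):
        projected = [
            F.interpolate(proj(f), size=target_size,
                          mode="bilinear", align_corners=False)
            for proj, f in zip(self.projections, feature_maps)
        ]
        # Channel count is constant: len(scales) * inter_channels = 5 * 64 = 320
        return torch.cat(projected, dim=1)


# Five encoder scales at strides 2..32 (EfficientNet-B0 channel counts)
channels = [16, 24, 40, 112, 320]
feats = [torch.randn(1, c, 256 // 2 ** (i + 1), 256 // 2 ** (i + 1))
         for i, c in enumerate(channels)]
node = FullScaleFusion(channels)
fused = node(feats, target_size=(64, 64))
print(fused.shape)  # torch.Size([1, 320, 64, 64])
```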

The EfficientNet-B0 backbone (pre-trained on ImageNet-1k) replaces the standard U-Net encoder, providing rich multi-scale representations at five resolution levels.

## Test Set Results

Evaluated on the fixed 53-image test partition of Kvasir-SEG (50 % of the original validation split, seed 42):

| Metric | Value |
|---|---|
| Dice | 0.9234 |
| IoU | 0.8577 |
| F1 | 0.9234 |
| Precision | 0.9474 |
| Recall | 0.9005 |
| Accuracy | 0.9745 |
| Loss | 0.0914 |
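As a sanity check, Dice and IoU computed over the same aggregated confusion counts are related by IoU = Dice / (2 − Dice), and the reported values satisfy this identity:

```python
# Dice and IoU over the same TP/FP/FN counts satisfy IoU = Dice / (2 - Dice)
dice = 0.9234
iou = dice / (2 - dice)
print(round(iou, 4))  # 0.8577, matching the table
```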

## Comparison with Sweep Models

This model was selected from an initial sweep of 24 architecture × backbone combinations:

| Rank | Model | Test Dice | Test IoU |
|---|---|---|---|
| 1 | attention_unet_convnext | 0.9411 | 0.8888 |
| 2 | unet3plus_convnext | 0.9395 | 0.8859 |
| 3 | unet_convnext | 0.9383 | 0.8838 |
| — | unet3plus_efficientnet (this model, post-HPO) | 0.9234 | 0.8577 |

**Note:** Sweep models were trained for 50 epochs with default hyperparameters (lr = 1e-3, BCE-Dice loss). This model was then retrained for a further 5 epochs with the Optuna-optimised configuration, which substantially reduced eval loss (0.0537 vs. the < 0.10 target) at the cost of a slight shift in metrics on the held-out test set.

## Training Procedure

### Hyperparameter Optimisation

Optuna with MedianPruner ran 60 trials (28 completed, 32 pruned) on the top-3 sweep models. The best trial (#32) achieved eval_loss = 0.0537 (target: < 0.10 ✓).

| Hyperparameter | Value |
|---|---|
| Learning rate | 0.001794 |
| Weight decay | 2.51e-06 |
| Warmup ratio | 0.121 |
| LR scheduler | cosine_with_restarts |
| Batch size | 64 |
| Loss type | dice_focal |
| Focal gamma | 1.1217 |
| Dice weight | 0.3012 |
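With these values, the Dice-Focal objective can be sketched as a weighted sum of soft Dice loss and binary focal loss. The exact weighting and reduction conventions used in training are assumptions of this sketch:

```python
import torch
import torch.nn.functional as F


def dice_focal_loss(logits, targets, gamma=1.1217, dice_weight=0.3012,
                    eps=1e-6):
    """Weighted sum of soft Dice loss and binary focal loss (illustrative)."""
    probs = torch.sigmoid(logits)

    # Soft Dice computed over the whole batch
    inter = (probs * targets).sum()
    dice = (2 * inter + eps) / (probs.sum() + targets.sum() + eps)
    dice_loss = 1 - dice

    # Binary focal loss: down-weight easy pixels by (1 - p_t)^gamma
    bce = F.binary_cross_entropy_with_logits(logits, targets,
                                             reduction="none")
    p_t = probs * targets + (1 - probs) * (1 - targets)
    focal_loss = ((1 - p_t) ** gamma * bce).mean()

    return dice_weight * dice_loss + (1 - dice_weight) * focal_loss


logits = torch.randn(2, 1, 256, 256)
targets = (torch.rand(2, 1, 256, 256) > 0.5).float()
loss = dice_focal_loss(logits, targets)
print(loss.item())
```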

### Training Configuration

- Optimiser: AdamW
- Epochs: 50 (sweep) + 5 (HPO final retrain)
- FP16: enabled
- Dataset: Kvasir-SEG augmented (4,800 train / 100 val / 100 test)
- Augmentation: random H/V flips, ±30° rotation, brightness/contrast/saturation ±20 %

## How to Use

This model uses a custom PyTorch architecture. The model code is included in the repository.

### Installation

```bash
pip install torch torchvision timm transformers
```

### Inference

```python
import torch
from transformers import AutoModel
from torchvision.transforms import functional as TF
from PIL import Image

# Load model — downloads weights + code automatically
model = AutoModel.from_pretrained(
    "andreribeiro87/unet3plus-efficientnet-kvasir-seg",
    trust_remote_code=True,
)
model.eval()

# Preprocess
image = Image.open("your_colonoscopy_image.jpg").convert("RGB")
x = TF.to_tensor(TF.resize(image, [256, 256])).unsqueeze(0)  # (1, 3, 256, 256)

# Predict
with torch.no_grad():
    outputs = model(pixel_values=x)
    mask = (outputs["logits"].sigmoid() > 0.5).squeeze()  # bool (256, 256)

pred_mask = TF.to_pil_image(mask.float())
```

## Citation

If you use this model or dataset, please cite the original Kvasir-SEG and UNet 3+ papers:

```bibtex
@inproceedings{jha2020kvasir,
  title     = {Kvasir-SEG: A Segmented Polyp Dataset},
  author    = {Jha, Debesh and Smedsrud, Pia H and Riegler, Michael A and Halvorsen, P{\aa}l
               and de Lange, Thomas and Johansen, Dag and Johansen, H{\aa}vard D},
  booktitle = {MultiMedia Modeling (MMM)},
  year      = {2020}
}

@inproceedings{huang2020unet3plus,
  title     = {UNet 3+: A Full-Scale Connected UNet for Medical Image Segmentation},
  author    = {Huang, Huimin and Lin, Lanfen and Tong, Ruofeng and Hu, Hongjie and
               Zhang, Qiaowei and Iwamoto, Yutaro and Han, Xianhua and Chen, Yen-Wei and Wu, Jian},
  booktitle = {IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year      = {2020}
}
```

## Limitations

- Trained and evaluated exclusively on Kvasir-SEG (single-centre, single-modality). Performance may degrade on other colonoscopy datasets or imaging conditions.
- Binary segmentation only; does not distinguish between polyp types or severity.
- Input resolution is fixed at 256 × 256; very small polyps may not be fully captured.
- Not validated for clinical use. This is a research model.