# UNet3+ EfficientNet — Polyp Segmentation (HPO-Optimised)
Binary polyp segmentation model trained on Kvasir-SEG. Selected as the best configuration after a systematic sweep over 24 architecture × backbone combinations followed by Optuna hyperparameter optimisation (60 trials).
## Model Description
| Property | Value |
|---|---|
| Architecture | UNet 3+ (full-scale skip connections) |
| Backbone | EfficientNet-B0 (ImageNet pre-trained via timm) |
| Input size | 256 × 256 × 3 |
| Output | 256 × 256 × 1 logit map (sigmoid → binary mask) |
| Model size | ~50 MB |
| Loss | Dice-Focal (focal γ = 1.12, dice weight = 0.30) |
## Architecture Details
UNet 3+ (Huang et al., ICASSP 2020) extends the standard U-Net by adding full-scale skip connections: every decoder node aggregates feature maps from all encoder scales simultaneously, giving each node access to both fine-grained spatial detail and deep semantic context. Each incoming stream is projected to a fixed number of inter-channels (64) before concatenation so the total channel count is constant across all decoder levels.
The EfficientNet-B0 backbone (pre-trained on ImageNet-1k) replaces the standard U-Net encoder, providing rich multi-scale representations at five resolution levels.
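The full-scale aggregation can be sketched as a single decoder node: each incoming stream is projected to 64 inter-channels, resized to the node's resolution, and concatenated, so every decoder level fuses the same 5 × 64 = 320 channels. This is a hypothetical minimal sketch, not the repository's code; `FullScaleDecoderNode` is an illustrative name and the stream channel counts only loosely follow EfficientNet-B0's stages.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FullScaleDecoderNode(nn.Module):
    """One UNet 3+ decoder node aggregating all encoder/decoder scales."""

    def __init__(self, in_channels_list, inter_channels=64):
        super().__init__()
        # Project every incoming stream to a fixed 64 inter-channels.
        self.projections = nn.ModuleList(
            [nn.Conv2d(c, inter_channels, kernel_size=3, padding=1)
             for c in in_channels_list]
        )
        # After concatenation the channel count is constant:
        # len(streams) * inter_channels, regardless of decoder level.
        fused = len(in_channels_list) * inter_channels
        self.fuse = nn.Conv2d(fused, fused, kernel_size=3, padding=1)

    def forward(self, streams, target_size):
        # Resize each projected stream to this node's resolution
        # (interpolate covers both the down- and up-sampling direction).
        resized = [
            F.interpolate(proj(s), size=target_size,
                          mode="bilinear", align_corners=False)
            for proj, s in zip(self.projections, streams)
        ]
        return self.fuse(torch.cat(resized, dim=1))

# Five scales, channels loosely modelled on EfficientNet-B0 stages.
node = FullScaleDecoderNode([16, 24, 40, 112, 320])
streams = [torch.randn(1, c, s, s)
           for c, s in zip([16, 24, 40, 112, 320], [128, 64, 32, 16, 8])]
print(node(streams, (64, 64)).shape)  # torch.Size([1, 320, 64, 64])
```

Every decoder node thus sees both fine spatial detail (shallow streams) and deep semantic context (coarse streams), which is the core idea behind full-scale skip connections.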
## Test Set Results
Evaluated on the fixed 53-image test partition of Kvasir-SEG (50 % of the original validation split, seed 42):
| Metric | Value |
|---|---|
| Dice | 0.9234 |
| IoU | 0.8577 |
| F1 | 0.9234 |
| Precision | 0.9474 |
| Recall | 0.9005 |
| Accuracy | 0.9745 |
| Loss | 0.0914 |
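For binary masks, Dice and IoU are functions of the same overlap counts, related by Dice = 2·IoU / (1 + IoU); the table is internally consistent under that identity (2 × 0.8577 / 1.8577 ≈ 0.9234). A dependency-free sketch of the two metrics (illustrative helper, not the project's actual metric code):

```python
def dice_iou(pred, target, eps=1e-7):
    """Dice and IoU for flat binary masks (iterables of 0/1)."""
    inter = sum(p * t for p, t in zip(pred, target))
    p_sum, t_sum = sum(pred), sum(target)
    dice = (2 * inter + eps) / (p_sum + t_sum + eps)
    iou = (inter + eps) / (p_sum + t_sum - inter + eps)
    return dice, iou

# Toy 2x2 masks flattened: half the pixels overlap.
dice, iou = dice_iou([1, 1, 0, 0], [1, 0, 1, 0])
print(round(dice, 4), round(iou, 4))  # 0.5 0.3333
```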
## Comparison with Sweep Models
This model was selected from an initial sweep of 24 architecture × backbone combinations:
| Rank | Model | Test Dice | Test IoU |
|---|---|---|---|
| 1 | attention_unet_convnext | 0.9411 | 0.8888 |
| 2 | unet3plus_convnext | 0.9395 | 0.8859 |
| 3 | unet_convnext | 0.9383 | 0.8838 |
| — | unet3plus_efficientnet (this, post-HPO) | 0.9234 | 0.8577 |
Note: Sweep models were trained for 50 epochs with default hyperparameters (lr = 1e-3, BCE-Dice loss). This model was then retrained for 5 additional epochs with the Optuna-optimised configuration, which substantially reduces eval loss (0.0537 vs. the < 0.10 target) at the cost of slightly lower test Dice/IoU than the top sweep models.
## Training Procedure

### Hyperparameter Optimisation
Optuna with MedianPruner ran 60 trials (28 completed, 32 pruned) on the top-3 sweep models.
The best trial (#32) achieved eval_loss = 0.0537 (target: < 0.10 ✓).
| Hyperparameter | Value |
|---|---|
| Learning rate | 0.001794 |
| Weight decay | 2.51e-06 |
| Warmup ratio | 0.121 |
| LR scheduler | cosine_with_restarts |
| Batch size | 64 |
| Loss type | dice_focal |
| Focal gamma | 1.1217 |
| Dice weight | 0.3012 |
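The tuned loss pairs soft Dice with focal loss. The sketch below assumes a simple weighted sum, `dice_weight * dice + (1 - dice_weight) * focal`; the training code's exact reduction and weighting may differ.

```python
import torch
import torch.nn.functional as F

def dice_focal_loss(logits, targets, gamma=1.1217, dice_weight=0.3012, eps=1e-7):
    """Weighted soft-Dice + focal loss for binary segmentation logits."""
    probs = torch.sigmoid(logits)
    # Soft Dice over all pixels in the batch.
    inter = (probs * targets).sum()
    dice = (2 * inter + eps) / (probs.sum() + targets.sum() + eps)
    dice_loss = 1 - dice
    # Focal loss: per-pixel BCE scaled by (1 - p_t)^gamma so easy,
    # well-classified pixels contribute less.
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = probs * targets + (1 - probs) * (1 - targets)
    focal_loss = ((1 - p_t) ** gamma * bce).mean()
    return dice_weight * dice_loss + (1 - dice_weight) * focal_loss

# A confident correct prediction yields a near-zero loss.
good = dice_focal_loss(torch.full((1, 1, 4, 4), 10.0), torch.ones(1, 1, 4, 4))
bad = dice_focal_loss(torch.full((1, 1, 4, 4), -10.0), torch.ones(1, 1, 4, 4))
print(good.item(), bad.item())
```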
### Training Configuration
- Optimiser: AdamW
- Epochs: 50 (sweep) + 5 (HPO final retrain)
- FP16: enabled
- Dataset: Kvasir-SEG augmented (4,800 train / 100 val / 100 test)
- Augmentation: random H/V flips, ±30° rotation, brightness/contrast/saturation ±20 %
## How to Use
This model uses a custom PyTorch architecture. The model code is included in the repository.
### Installation

```bash
pip install torch torchvision timm transformers
```
### Inference
```python
import torch
from transformers import AutoModel
from torchvision.transforms import functional as TF
from PIL import Image

# Load model — downloads weights + code automatically
model = AutoModel.from_pretrained(
    "andreribeiro87/unet3plus-efficientnet-kvasir-seg",
    trust_remote_code=True,
)
model.eval()

# Preprocess
image = Image.open("your_colonoscopy_image.jpg").convert("RGB")
x = TF.to_tensor(TF.resize(image, [256, 256])).unsqueeze(0)  # (1, 3, 256, 256)

# Predict
with torch.no_grad():
    outputs = model(pixel_values=x)
mask = (outputs["logits"].sigmoid() > 0.5).squeeze()  # bool (256, 256)
pred_mask = TF.to_pil_image(mask.float())
```
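The predicted mask comes out at the model's fixed 256 × 256 resolution; to overlay it on the original frame, resize it back with nearest-neighbour interpolation, which preserves the hard binary values. This is a hypothetical follow-up step, not part of the snippet above.

```python
from PIL import Image

# Stand-in for the pred_mask produced above; in real use, record the
# source image's size before resizing it for the model.
pred_mask = Image.new("L", (256, 256))
original_size = (1280, 1024)  # (width, height) of the source frame

mask_full = pred_mask.resize(original_size, resample=Image.NEAREST)
print(mask_full.size)  # (1280, 1024)
```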
## Citation
If you use this model or dataset, please cite the original Kvasir-SEG paper:
```bibtex
@inproceedings{jha2020kvasir,
  title     = {Kvasir-SEG: A Segmented Polyp Dataset},
  author    = {Jha, Debesh and Smedsrud, Pia H. and Riegler, Michael A. and Halvorsen, P{\aa}l
               and de Lange, Thomas and Johansen, Dag and Johansen, H{\aa}vard D.},
  booktitle = {MultiMedia Modeling (MMM)},
  year      = {2020}
}

@inproceedings{huang2020unet3plus,
  title     = {UNet 3+: A Full-Scale Connected UNet for Medical Image Segmentation},
  author    = {Huang, Huimin and Lin, Lanfen and Tong, Ruofeng and Hu, Hongjie and
               Zhang, Qiaowei and Iwamoto, Yutaro and Han, Xianhua and Chen, Yen-Wei and Wu, Jian},
  booktitle = {IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year      = {2020}
}
```
## Limitations
- Trained and evaluated exclusively on Kvasir-SEG (single-centre, single-modality). Performance may degrade on other colonoscopy datasets or imaging conditions.
- Binary segmentation only; does not distinguish between polyp types or severity.
- Input resolution is fixed at 256 × 256; very small polyps may not be fully captured.
- Not validated for clinical use. This is a research model.