# UNet3+ EfficientNet — Polyp Segmentation (HPO-Optimised)
Binary polyp segmentation model trained on Kvasir-SEG. Selected as the best configuration after a systematic sweep over 24 architecture × backbone combinations followed by Optuna hyperparameter optimisation (60 trials).
## Model Description
| Property | Value |
|---|---|
| Architecture | UNet 3+ (full-scale skip connections) |
| Backbone | EfficientNet-B0 (ImageNet pre-trained via timm) |
| Input size | 256 × 256 × 3 |
| Output | 256 × 256 × 1 logit map (sigmoid → binary mask) |
| Model size | ~50 MB |
| Loss | Dice-Focal (focal γ = 1.12, dice weight = 0.30) |
## Architecture Details
UNet 3+ (Huang et al., ICASSP 2020) extends the standard U-Net by adding full-scale skip connections: every decoder node aggregates feature maps from all encoder scales simultaneously, giving each node access to both fine-grained spatial detail and deep semantic context. Each incoming stream is projected to a fixed number of inter-channels (64) before concatenation so the total channel count is constant across all decoder levels.
The EfficientNet-B0 backbone (pre-trained on ImageNet-1k) replaces the standard U-Net encoder, providing rich multi-scale representations at five resolution levels.
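The full-scale aggregation can be sketched as a single decoder node: each incoming stream is projected to 64 inter-channels, resized to the node's resolution, and concatenated, so every decoder level fuses the same 5 × 64 = 320 channels. This is a hypothetical minimal sketch, not the repository's code; `FullScaleDecoderNode` is an illustrative name and the stream channel counts only loosely follow EfficientNet-B0's stages.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FullScaleDecoderNode(nn.Module):
    """One UNet 3+ decoder node aggregating all encoder/decoder scales."""

    def __init__(self, in_channels_list, inter_channels=64):
        super().__init__()
        # Project every incoming stream to a fixed 64 inter-channels.
        self.projections = nn.ModuleList(
            [nn.Conv2d(c, inter_channels, kernel_size=3, padding=1)
             for c in in_channels_list]
        )
        # After concatenation the channel count is constant:
        # len(streams) * inter_channels, regardless of decoder level.
        fused = len(in_channels_list) * inter_channels
        self.fuse = nn.Conv2d(fused, fused, kernel_size=3, padding=1)

    def forward(self, streams, target_size):
        # Resize each projected stream to this node's resolution
        # (interpolate covers both the down- and up-sampling direction).
        resized = [
            F.interpolate(proj(s), size=target_size,
                          mode="bilinear", align_corners=False)
            for proj, s in zip(self.projections, streams)
        ]
        return self.fuse(torch.cat(resized, dim=1))

# Five scales, channels loosely modelled on EfficientNet-B0 stages.
node = FullScaleDecoderNode([16, 24, 40, 112, 320])
streams = [torch.randn(1, c, s, s)
           for c, s in zip([16, 24, 40, 112, 320], [128, 64, 32, 16, 8])]
print(node(streams, (64, 64)).shape)  # torch.Size([1, 320, 64, 64])
```

Every decoder node thus sees both fine spatial detail (shallow streams) and deep semantic context (coarse streams), which is the core idea behind full-scale skip connections.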
## Test Set Results
Evaluated on the fixed 53-image test partition of Kvasir-SEG (50 % of the original validation split, seed 42):
| Metric | Value |
|---|---|
| Dice | 0.9234 |
| IoU | 0.8577 |
| F1 | 0.9234 |
| Precision | 0.9474 |
| Recall | 0.9005 |
| Accuracy | 0.9745 |
| Loss | 0.0914 |
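For binary masks, Dice and IoU are functions of the same overlap counts, related by Dice = 2·IoU / (1 + IoU); the table is internally consistent under that identity (2 × 0.8577 / 1.8577 ≈ 0.9234). A dependency-free sketch of the two metrics (illustrative helper, not the project's actual metric code):

```python
def dice_iou(pred, target, eps=1e-7):
    """Dice and IoU for flat binary masks (iterables of 0/1)."""
    inter = sum(p * t for p, t in zip(pred, target))
    p_sum, t_sum = sum(pred), sum(target)
    dice = (2 * inter + eps) / (p_sum + t_sum + eps)
    iou = (inter + eps) / (p_sum + t_sum - inter + eps)
    return dice, iou

# Toy 2x2 masks flattened: half the pixels overlap.
dice, iou = dice_iou([1, 1, 0, 0], [1, 0, 1, 0])
print(round(dice, 4), round(iou, 4))  # 0.5 0.3333
```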
## Comparison with Sweep Models
This model was selected from an initial sweep of 24 architecture × backbone combinations:
| Rank | Model | Test Dice | Test IoU |
|---|---|---|---|
| 1 | attention_unet_convnext | 0.9411 | 0.8888 |
| 2 | unet3plus_convnext | 0.9395 | 0.8859 |
| 3 | unet_convnext | 0.9383 | 0.8838 |
| — | unet3plus_efficientnet (this, post-HPO) | 0.9234 | 0.8577 |
Note: Sweep models were trained for 50 epochs with default hyperparameters (lr = 1e-3, BCE-Dice loss). This model was then retrained for 5 additional epochs with the Optuna-optimised configuration, which substantially reduces eval loss (0.0537 vs. the < 0.10 target) at the cost of slightly lower test Dice/IoU than the top sweep models.
## Training Procedure

### Hyperparameter Optimisation
Optuna with MedianPruner ran 60 trials (28 completed, 32 pruned) on the top-3 sweep models.
The best trial (#32) achieved eval_loss = 0.0537 (target: < 0.10 ✓).
| Hyperparameter | Value |
|---|---|
| Learning rate | 0.001794 |
| Weight decay | 2.51e-06 |
| Warmup ratio | 0.121 |
| LR scheduler | cosine_with_restarts |
| Batch size | 64 |
| Loss type | dice_focal |
| Focal gamma | 1.1217 |
| Dice weight | 0.3012 |
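The tuned loss pairs soft Dice with focal loss. The sketch below assumes a simple weighted sum, `dice_weight * dice + (1 - dice_weight) * focal`; the training code's exact reduction and weighting may differ.

```python
import torch
import torch.nn.functional as F

def dice_focal_loss(logits, targets, gamma=1.1217, dice_weight=0.3012, eps=1e-7):
    """Weighted soft-Dice + focal loss for binary segmentation logits."""
    probs = torch.sigmoid(logits)
    # Soft Dice over all pixels in the batch.
    inter = (probs * targets).sum()
    dice = (2 * inter + eps) / (probs.sum() + targets.sum() + eps)
    dice_loss = 1 - dice
    # Focal loss: per-pixel BCE scaled by (1 - p_t)^gamma so easy,
    # well-classified pixels contribute less.
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = probs * targets + (1 - probs) * (1 - targets)
    focal_loss = ((1 - p_t) ** gamma * bce).mean()
    return dice_weight * dice_loss + (1 - dice_weight) * focal_loss

# A confident correct prediction yields a near-zero loss.
good = dice_focal_loss(torch.full((1, 1, 4, 4), 10.0), torch.ones(1, 1, 4, 4))
bad = dice_focal_loss(torch.full((1, 1, 4, 4), -10.0), torch.ones(1, 1, 4, 4))
print(good.item(), bad.item())
```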
### Training Configuration
- Optimiser: AdamW
- Epochs: 50 (sweep) + 5 (HPO final retrain)
- FP16: enabled
- Dataset: Kvasir-SEG augmented (4,800 train / 100 val / 100 test)
- Augmentation: random H/V flips, ±30° rotation, brightness/contrast/saturation ±20 %
## How to Use
This model uses a custom PyTorch architecture. The model code is included in the repository.
### Installation

```bash
pip install torch torchvision timm transformers
```
### Inference
```python
import torch
from transformers import AutoModel
from torchvision.transforms import functional as TF
from PIL import Image

# Load model — downloads weights + code automatically
model = AutoModel.from_pretrained(
    "andreribeiro87/unet3plus-efficientnet-kvasir-seg",
    trust_remote_code=True,
)
model.eval()

# Preprocess
image = Image.open("your_colonoscopy_image.jpg").convert("RGB")
x = TF.to_tensor(TF.resize(image, [256, 256])).unsqueeze(0)  # (1, 3, 256, 256)

# Predict
with torch.no_grad():
    outputs = model(pixel_values=x)
mask = (outputs["logits"].sigmoid() > 0.5).squeeze()  # bool (256, 256)
pred_mask = TF.to_pil_image(mask.float())
```
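The predicted mask comes out at the model's fixed 256 × 256 resolution; to overlay it on the original frame, resize it back with nearest-neighbour interpolation, which preserves the hard binary values. This is a hypothetical follow-up step, not part of the snippet above.

```python
from PIL import Image

# Stand-in for the pred_mask produced above; in real use, record the
# source image's size before resizing it for the model.
pred_mask = Image.new("L", (256, 256))
original_size = (1280, 1024)  # (width, height) of the source frame

mask_full = pred_mask.resize(original_size, resample=Image.NEAREST)
print(mask_full.size)  # (1280, 1024)
```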
## Citation
If you use this model or dataset, please cite the original Kvasir-SEG paper:
```bibtex
@inproceedings{jha2020kvasir,
  title     = {Kvasir-SEG: A Segmented Polyp Dataset},
  author    = {Jha, Debesh and Smedsrud, Pia H. and Riegler, Michael A. and Halvorsen, P{\aa}l
               and de Lange, Thomas and Johansen, Dag and Johansen, H{\aa}vard D.},
  booktitle = {MultiMedia Modeling (MMM)},
  year      = {2020}
}

@inproceedings{huang2020unet3plus,
  title     = {UNet 3+: A Full-Scale Connected UNet for Medical Image Segmentation},
  author    = {Huang, Huimin and Lin, Lanfen and Tong, Ruofeng and Hu, Hongjie and
               Zhang, Qiaowei and Iwamoto, Yutaro and Han, Xianhua and Chen, Yen-Wei and Wu, Jian},
  booktitle = {IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year      = {2020}
}
```
## Limitations
- Trained and evaluated exclusively on Kvasir-SEG (single-centre, single-modality). Performance may degrade on other colonoscopy datasets or imaging conditions.
- Binary segmentation only; does not distinguish between polyp types or severity.
- Input resolution is fixed at 256 × 256; very small polyps may not be fully captured.
- Not validated for clinical use. This is a research model.