license: mit
library_name: pytorch
pipeline_tag: image-segmentation
tags:
- medical-image-segmentation
- image-segmentation
- semantic-segmentation
- polyp-segmentation
- colonoscopy
- depth-estimation
- pseudo-depth
- real-time
- onnx
- pytorch
- arxiv:2605.16519
metrics:
- dice
- iou
- recall
DepthPolyp: Pseudo-Depth Guided Lightweight Segmentation for Real-Time Colonoscopy
DepthPolyp is a lightweight pseudo-depth guided model for real-time colonoscopic polyp segmentation. Given an RGB colonoscopy frame, it jointly predicts:
- a binary polyp segmentation probability map
- a pseudo-depth probability map for depth-aware structural guidance
The model uses a MiT-B0 encoder and lightweight fusion/gating modules to keep deployment cost low while improving robustness under blur, illumination changes, reflections, and other real-world colonoscopy degradations.
- Paper: arXiv:2605.16519
- Code: github.com/ReaganWu/DepthPolyp
- Demo: DepthPolyp-demo
- License: MIT
Model Details
| Item | Value |
|---|---|
| Model | DepthPolyp |
| Encoder | MiT-B0 |
| Input | RGB image, 224 x 224 |
| Outputs | segmentation, pseudo-depth |
| Parameters | 3.57M |
| Complexity | 0.86 GMACs |
| Training data | Kvasir-SEG with degradation-aware training |
| PyTorch checkpoint | DepthPolyp_Kvasir.pth |
| ONNX checkpoint | DepthPolyp_Kvasir.onnx |
ONNX I/O names:
- input: image
- outputs: segmentation, depth
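The I/O names above can be used directly with ONNX Runtime. The following is a minimal sketch, not the repository's `scripts/infer_onnx.py`: the helper names `to_model_input` and `run_onnx` are hypothetical, and the preprocessing (float32 input, pixel values scaled to [0, 1], no mean/std normalization) is an assumption that mirrors the PyTorch quick start in this card. The caller is expected to have already resized the frame to 224 x 224.

```python
import numpy as np


def to_model_input(rgb_uint8):
    """Convert an HWC uint8 RGB array (already 224 x 224) to NCHW float32 in [0, 1].

    Assumption: no mean/std normalization, as in the PyTorch quick start.
    """
    x = np.asarray(rgb_uint8).astype(np.float32) / 255.0
    return np.transpose(x, (2, 0, 1))[None, ...]


def run_onnx(onnx_path, rgb_uint8):
    """Run the exported model and return (segmentation, depth) arrays."""
    import onnxruntime as ort  # imported lazily so the helper above works without it

    session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
    seg, depth = session.run(
        ["segmentation", "depth"],              # output names listed in this card
        {"image": to_model_input(rgb_uint8)},   # input name listed in this card
    )
    return seg, depth
```

If the exported graph expects a different dtype or normalization, `session.get_inputs()` reports the declared input type and shape.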
Intended Use
DepthPolyp is intended for research on colonoscopic polyp segmentation, lightweight medical image segmentation, robustness under endoscopic video degradation, and deployment-oriented model comparison.
This model is not a standalone medical device and is not intended for clinical diagnosis without appropriate validation, regulatory review, and expert oversight.
Quick Start: ONNX Runtime
pip install onnxruntime pillow numpy
python scripts/infer_onnx.py \
  --onnx DepthPolyp_Kvasir.onnx \
  --input samples/kvasir/images \
  --output outputs
The script writes binary masks, pseudo-depth visualizations, and mask overlays.
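For readers who want to produce similar outputs by hand, here is a rough sketch of turning a segmentation probability map into a binary mask and an overlay. It is not taken from `scripts/infer_onnx.py`; the function names, the 0.5 threshold, the green overlay color, and the 0.4 blend factor are all illustrative assumptions.

```python
import numpy as np


def binarize(prob, threshold=0.5):
    """Threshold a probability map into a 0/255 uint8 mask (threshold is an assumption)."""
    return (np.asarray(prob) >= threshold).astype(np.uint8) * 255


def overlay(rgb, mask, color=(0, 255, 0), alpha=0.4):
    """Blend `color` into the RGB image wherever the mask is non-zero."""
    out = np.asarray(rgb).astype(np.float32)
    selected = np.asarray(mask) > 0
    out[selected] = (1.0 - alpha) * out[selected] + alpha * np.array(color, dtype=np.float32)
    return out.round().astype(np.uint8)
```

Both helpers expect the mask and the image to share the same height and width, so resize the probability map back to the frame size first if needed.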
Quick Start: PyTorch
pip install torch torchvision pillow numpy
import torch
from PIL import Image
from torchvision import transforms

from model.depthpolyp import build_depthpolyp

device = "cuda" if torch.cuda.is_available() else "cpu"

# Build the model.
model = build_depthpolyp(
    encoder_name="b0",
    in_channels=3,
    num_classes=2,
    decoder_channels=256,
    activation=None,
)

# Load the weights on CPU first, then move the model to the target device.
state_dict = torch.load("DepthPolyp_Kvasir.pth", map_location="cpu", weights_only=True)
model.load_state_dict(state_dict, strict=True)
model.to(device).eval()

# Resize to the 224 x 224 model input and convert to a float tensor in [0, 1].
image = Image.open("samples/kvasir/images/sample_01.jpg").convert("RGB")
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
x = transform(image).unsqueeze(0).to(device)

# Inference only: disable gradient tracking.
with torch.no_grad():
    seg_prob, depth_prob = model(x)

print(seg_prob.shape)    # [1, 1, 224, 224]
print(depth_prob.shape)  # [1, 1, 224, 224]
Loading Files with huggingface_hub
from huggingface_hub import hf_hub_download
repo_id = "ReaganWZY/DepthPolyp"
pth_path = hf_hub_download(repo_id=repo_id, filename="DepthPolyp_Kvasir.pth")
onnx_path = hf_hub_download(repo_id=repo_id, filename="DepthPolyp_Kvasir.onnx")
If you publish under a different Hugging Face repo id, replace ReaganWZY/DepthPolyp with that id.
Evaluation
Paper-reported reference results:
| Protocol | Kvasir Dice/IoU/Recall | ClinicDB Dice/IoU/Recall | ColonDB Dice/IoU/Recall |
|---|---|---|---|
| N->C | 0.891 / 0.805 / 0.885 | 0.854 / 0.748 / 0.845 | 0.801 / 0.669 / 0.759 |
| N->N | 0.853 / 0.745 / 0.854 | 0.751 / 0.608 / 0.759 | 0.734 / 0.582 / 0.697 |
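As a reference for how the three metrics in the table relate, the sketch below computes Dice, IoU, and recall for one pair of binary masks using the standard definitions. The function name `seg_metrics` and the `eps` smoothing term are illustrative; the paper's exact evaluation protocol (binarization threshold, per-image versus dataset-level averaging) is not described in this card and may differ.

```python
import numpy as np


def seg_metrics(pred, target, eps=1e-7):
    """Return (dice, iou, recall) for two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)

    tp = np.logical_and(pred, target).sum()    # predicted polyp, truly polyp
    fp = np.logical_and(pred, ~target).sum()   # predicted polyp, truly background
    fn = np.logical_and(~pred, target).sum()   # missed polyp pixels

    dice = (2 * tp + eps) / (2 * tp + fp + fn + eps)
    iou = (tp + eps) / (tp + fp + fn + eps)
    recall = (tp + eps) / (tp + fn + eps)
    return float(dice), float(iou), float(recall)
```

With one true positive, one false positive, and one false negative, this gives Dice 0.5, IoU 1/3, and recall 0.5, which shows why Dice is never lower than IoU on the same prediction.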
Real-world robustness and deployment results from the paper:
| Params | GMACs | Avg. Dice | PolypGen Dice | iPhone FPS | Raspberry Pi 4 FPS |
|---|---|---|---|---|---|
| 3.57M | 0.86 | 0.779 | 0.679 | 181.54 | 4.05 |
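To get a rough throughput number on your own hardware, a generic timing loop like the one below can wrap any single-frame inference call. This is only a sketch: the helper name `measure_fps`, the warm-up count, and the iteration count are assumptions, and it does not reproduce the benchmarking procedure behind the iPhone or Raspberry Pi 4 figures in the table.

```python
import time


def measure_fps(run_once, warmup=10, iters=100):
    """Call `run_once` repeatedly and return the average frames per second."""
    # Warm-up iterations let caches, lazy initialization, and allocators settle.
    for _ in range(warmup):
        run_once()

    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    elapsed = time.perf_counter() - start
    return iters / elapsed
```

For example, `measure_fps(lambda: model(x))` would time the PyTorch forward pass. On a GPU, kernels run asynchronously, so `run_once` should end with `torch.cuda.synchronize()` for the measurement to be meaningful.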
Training Data and Protocol
The released checkpoint is trained on Kvasir-SEG with degradation-aware training. Pseudo-depth targets are generated with Depth-Anything v2 Small and are used only during training; depth targets are not required at inference time.
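This card does not describe how the raw depth predictions are turned into training targets. One common approach, shown here purely as an assumption, is per-image min-max scaling of the relative depth map to [0, 1] so that it can be supervised as a probability-like map. The function name `normalize_depth` is hypothetical.

```python
import numpy as np


def normalize_depth(depth, eps=1e-8):
    """Min-max scale a raw relative-depth map to [0, 1] (assumed target format)."""
    depth = np.asarray(depth, dtype=np.float32)
    lo, hi = float(depth.min()), float(depth.max())
    if hi - lo < eps:
        # A constant map carries no depth ordering; return zeros instead of dividing by ~0.
        return np.zeros_like(depth)
    return (depth - lo) / (hi - lo)
```

Check the training code in the repository for the actual target construction before relying on this.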
Reference training settings from the paper:
- Input resolution: 224 x 224
- Optimizer: AdamW
- Learning rate: 1e-4
- Weight decay: 1e-4
- Batch size: 16
- Epochs: 200
- Schedule: 10% warm-up followed by cosine annealing
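The schedule in the list above can be sketched as a plain function of the step index. Several details are assumptions because the card does not specify them: the warm-up is taken to be linear, the schedule is indexed per step (or per epoch, if you pass epoch counts), and the cosine phase decays to a minimum learning rate of 0. The function name `lr_at` is hypothetical.

```python
import math


def lr_at(step, total_steps, base_lr=1e-4, warmup_frac=0.10, min_lr=0.0):
    """Learning rate at `step`: linear warm-up, then cosine annealing to `min_lr`."""
    warmup_steps = max(1, int(round(warmup_frac * total_steps)))
    if step < warmup_steps:
        # Linear ramp that reaches base_lr on the last warm-up step.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the cosine phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

In PyTorch, the same function can drive `torch.optim.lr_scheduler.LambdaLR` on an `AdamW` optimizer by returning `lr_at(step, total_steps) / base_lr` as the multiplier.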
Citation
@misc{wu2026depthpolyp,
  title={DepthPolyp: Pseudo-Depth Guided Lightweight Segmentation for Real-Time Colonoscopy},
  author={Wu, Zhuoyu and Ou, Wenhui and Zhang, Lexi and Tan, Pei-Sze and Wu, Dongjun and Zhao, Junhe and Fang, Wenqi and Phan, Raphaël C.-W.},
  year={2026},
  eprint={2605.16519},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}