metadata
license: mit
library_name: pytorch
pipeline_tag: image-segmentation
tags:
  - medical-image-segmentation
  - image-segmentation
  - semantic-segmentation
  - polyp-segmentation
  - colonoscopy
  - depth-estimation
  - pseudo-depth
  - real-time
  - onnx
  - pytorch
  - arxiv:2605.16519
metrics:
  - dice
  - iou
  - recall

DepthPolyp: Pseudo-Depth Guided Lightweight Segmentation for Real-Time Colonoscopy

DepthPolyp is a lightweight pseudo-depth guided model for real-time colonoscopic polyp segmentation. Given an RGB colonoscopy frame, it jointly predicts:

  1. a binary polyp segmentation probability map
  2. a pseudo-depth probability map for depth-aware structural guidance

The model uses a MiT-B0 encoder and lightweight fusion/gating modules to keep deployment cost low while improving robustness under blur, illumination changes, reflections, and other real-world colonoscopy degradations.

Model Details

| Item | Value |
| --- | --- |
| Model | DepthPolyp |
| Encoder | MiT-B0 |
| Input | RGB image, 224 x 224 |
| Outputs | segmentation, pseudo-depth |
| Parameters | 3.57M |
| Complexity | 0.86 GMACs |
| Training data | Kvasir-SEG with degradation-aware training |
| PyTorch checkpoint | DepthPolyp_Kvasir.pth |
| ONNX checkpoint | DepthPolyp_Kvasir.onnx |

ONNX I/O names:

input: image
outputs: segmentation, depth
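
These names can be passed directly to ONNX Runtime. The sketch below assumes the exported graph expects a float32 NCHW tensor scaled to [0, 1], mirroring the `ToTensor()` preprocessing in the PyTorch quick start; the export's actual preprocessing is not stated here, so `scripts/infer_onnx.py` remains the reference. `to_model_input` and `run_depthpolyp` are illustrative helper names, not part of the repository.

```python
import numpy as np

INPUT_NAME = "image"
OUTPUT_NAMES = ["segmentation", "depth"]


def to_model_input(rgb: np.ndarray) -> np.ndarray:
    """HxWx3 uint8 RGB -> [1, 3, H, W] float32 in [0, 1] (assumed preprocessing)."""
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("expected an HxWx3 RGB array")
    x = rgb.astype(np.float32) / 255.0
    return np.ascontiguousarray(x.transpose(2, 0, 1)[None])


def run_depthpolyp(onnx_path: str, image_path: str, size: int = 224):
    """Run one image through the ONNX model and return (segmentation, depth)."""
    # Imported lazily so the preprocessing helper works without these packages.
    import onnxruntime as ort
    from PIL import Image

    image = Image.open(image_path).convert("RGB")
    image = image.resize((size, size), Image.BILINEAR)
    session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
    feed = {INPUT_NAME: to_model_input(np.asarray(image))}
    seg, depth = session.run(OUTPUT_NAMES, feed)
    return seg, depth
```

Calling `run_depthpolyp("DepthPolyp_Kvasir.onnx", "samples/kvasir/images/sample_01.jpg")` would return the two output arrays in the order listed in `OUTPUT_NAMES`.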

Intended Use

DepthPolyp is intended for research on colonoscopic polyp segmentation, lightweight medical image segmentation, robustness under endoscopic video degradation, and deployment-oriented model comparison.

This model is not a standalone medical device and is not intended for clinical diagnosis without appropriate validation, regulatory review, and expert oversight.

Quick Start: ONNX Runtime

pip install onnxruntime pillow numpy

python scripts/infer_onnx.py \
  --onnx DepthPolyp_Kvasir.onnx \
  --input samples/kvasir/images \
  --output outputs

The script writes binary masks, pseudo-depth visualizations, and mask overlays.
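
The three kinds of output could be derived from the model's maps roughly as follows. This is a sketch and not the code in `scripts/infer_onnx.py`: the 0.5 threshold, the min-max scaling of the depth map, the green overlay colour, and all helper names are assumptions made for illustration.

```python
import numpy as np


def binary_mask(seg_prob: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Threshold an HxW probability map into a uint8 mask with values 0 or 255."""
    return (seg_prob >= threshold).astype(np.uint8) * 255


def depth_to_gray(depth: np.ndarray) -> np.ndarray:
    """Min-max scale an HxW depth map to uint8 grayscale for viewing."""
    lo, hi = float(depth.min()), float(depth.max())
    scale = hi - lo if hi > lo else 1.0  # avoid division by zero on flat maps
    return ((depth - lo) / scale * 255.0).round().astype(np.uint8)


def overlay(rgb: np.ndarray, mask: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Alpha-blend a solid colour over the masked pixels of an HxWx3 uint8 image."""
    color = np.array([0, 255, 0], dtype=np.float32)  # assumed highlight colour
    out = rgb.astype(np.float32)
    on = mask > 0
    out[on] = (1.0 - alpha) * out[on] + alpha * color
    return out.round().astype(np.uint8)
```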

Quick Start: PyTorch

pip install torch torchvision pillow numpy

import torch
from PIL import Image
from torchvision import transforms

from model.depthpolyp import build_depthpolyp

device = "cuda" if torch.cuda.is_available() else "cpu"

model = build_depthpolyp(
    encoder_name="b0",
    in_channels=3,
    num_classes=2,
    decoder_channels=256,
    activation=None,
)
state_dict = torch.load("DepthPolyp_Kvasir.pth", map_location="cpu", weights_only=True)
model.load_state_dict(state_dict, strict=True)
model.to(device).eval()

image = Image.open("samples/kvasir/images/sample_01.jpg").convert("RGB")
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
x = transform(image).unsqueeze(0).to(device)

with torch.no_grad():
    seg_prob, depth_prob = model(x)

print(seg_prob.shape)    # [1, 1, 224, 224]
print(depth_prob.shape)  # [1, 1, 224, 224]

Loading Files with huggingface_hub

from huggingface_hub import hf_hub_download

repo_id = "ReaganWZY/DepthPolyp"
pth_path = hf_hub_download(repo_id=repo_id, filename="DepthPolyp_Kvasir.pth")
onnx_path = hf_hub_download(repo_id=repo_id, filename="DepthPolyp_Kvasir.onnx")

If you publish under a different Hugging Face repo id, replace ReaganWZY/DepthPolyp with that id.

Evaluation

Paper-reported reference results:

| Protocol | Kvasir (Dice / IoU / Recall) | ClinicDB (Dice / IoU / Recall) | ColonDB (Dice / IoU / Recall) |
| --- | --- | --- | --- |
| N->C | 0.891 / 0.805 / 0.885 | 0.854 / 0.748 / 0.845 | 0.801 / 0.669 / 0.759 |
| N->N | 0.853 / 0.745 / 0.854 | 0.751 / 0.608 / 0.759 | 0.734 / 0.582 / 0.697 |
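
For reference, Dice, IoU, and recall on a pair of binary masks follow the standard definitions sketched below. This is a generic implementation; the paper's evaluation code may differ in thresholding, smoothing constants, or averaging (per image versus per dataset). The `eps` stabiliser and the function name are assumptions.

```python
import numpy as np


def segmentation_metrics(pred: np.ndarray, target: np.ndarray, eps: float = 1e-8) -> dict:
    """Compute Dice, IoU, and recall for two binary masks of the same shape."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    tp = np.logical_and(pred, target).sum()   # predicted polyp, is polyp
    fp = np.logical_and(pred, ~target).sum()  # predicted polyp, is background
    fn = np.logical_and(~pred, target).sum()  # missed polyp pixels
    return {
        "dice": float(2 * tp / (2 * tp + fp + fn + eps)),
        "iou": float(tp / (tp + fp + fn + eps)),
        "recall": float(tp / (tp + fn + eps)),
    }
```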

Real-world robustness and deployment results from the paper:

| Params | GMACs | Avg. Dice | PolypGen Dice | iPhone FPS | Raspberry Pi 4 FPS |
| --- | --- | --- | --- | --- | --- |
| 3.57M | 0.86 | 0.779 | 0.679 | 181.54 | 4.05 |

Training Data and Protocol

The released checkpoint is trained on Kvasir-SEG with degradation-aware training. Pseudo-depth targets are generated with Depth-Anything v2 Small and are used only during training; depth targets are not required at inference time.

Reference training settings from the paper:

  • Input resolution: 224 x 224
  • Optimizer: AdamW
  • Learning rate: 1e-4
  • Weight decay: 1e-4
  • Batch size: 16
  • Epochs: 200
  • Schedule: 10% warm-up followed by cosine annealing
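
The warm-up plus cosine schedule can be sketched as a function of the epoch index. The settings above do not say whether warm-up is linear, whether the schedule steps per epoch or per iteration, or what the final learning rate is; the version below assumes linear warm-up, per-epoch stepping, and decay toward zero. `lr_at_epoch` is an illustrative name.

```python
import math

BASE_LR = 1e-4          # learning rate from the reference settings
EPOCHS = 200            # total epochs from the reference settings
WARMUP_FRACTION = 0.10  # 10% warm-up


def lr_at_epoch(epoch: int, base_lr: float = BASE_LR, epochs: int = EPOCHS,
                warmup_fraction: float = WARMUP_FRACTION) -> float:
    """Learning rate for a zero-based epoch index (assumed schedule shape)."""
    warmup_epochs = max(1, int(round(epochs * warmup_fraction)))
    if epoch < warmup_epochs:
        # Linear ramp that reaches base_lr on the last warm-up epoch.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine decay from base_lr toward zero over the remaining epochs.
    progress = (epoch - warmup_epochs) / max(1, epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

In PyTorch, one way to apply such a function is `torch.optim.lr_scheduler.LambdaLR` with a multiplier of `lr_at_epoch(epoch) / BASE_LR` on an AdamW optimizer created with `lr=BASE_LR`.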

Citation

@misc{wu2026depthpolyp,
  title={DepthPolyp: Pseudo-Depth Guided Lightweight Segmentation for Real-Time Colonoscopy},
  author={Wu, Zhuoyu and Ou, Wenhui and Zhang, Lexi and Tan, Pei-Sze and Wu, Dongjun and Zhao, Junhe and Fang, Wenqi and Phan, Raphaël C.-W.},
  year={2026},
  eprint={2605.16519},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}