license: mit
library_name: pytorch
pipeline_tag: image-segmentation
tags:
  - medical-image-segmentation
  - image-segmentation
  - semantic-segmentation
  - polyp-segmentation
  - colonoscopy
  - depth-estimation
  - pseudo-depth
  - real-time
  - onnx
  - pytorch
  - arxiv:2605.16519
metrics:
  - dice
  - iou
  - recall

DepthPolyp: Pseudo-Depth Guided Lightweight Segmentation for Real-Time Colonoscopy

DepthPolyp is a lightweight pseudo-depth guided model for real-time colonoscopic polyp segmentation. Given an RGB colonoscopy frame, it jointly predicts:

a binary polyp segmentation probability map
a pseudo-depth probability map for depth-aware structural guidance

The model uses a MiT-B0 encoder and lightweight fusion/gating modules to keep deployment cost low while improving robustness under blur, illumination changes, reflections, and other real-world colonoscopy degradations.

Model Details

Item	Value
Model	DepthPolyp
Encoder	MiT-B0
Input	RGB image, 224 x 224
Outputs	segmentation, pseudo-depth
Parameters	3.57M
Complexity	0.86 GMACs
Training data	Kvasir-SEG with degradation-aware training
PyTorch checkpoint	`DepthPolyp_Kvasir.pth`
ONNX checkpoint	`DepthPolyp_Kvasir.onnx`

ONNX I/O names:

input: image
outputs: segmentation, depth

Intended Use

DepthPolyp is intended for research on colonoscopic polyp segmentation, lightweight medical image segmentation, robustness under endoscopic video degradation, and deployment-oriented model comparison.

This model is not a standalone medical device and is not intended for clinical diagnosis without appropriate validation, regulatory review, and expert oversight.

Quick Start: ONNX Runtime

pip install onnxruntime pillow numpy

python scripts/infer_onnx.py \
  --onnx DepthPolyp_Kvasir.onnx \
  --input samples/kvasir/images \
  --output outputs

The script writes binary masks, pseudo-depth visualizations, and mask overlays.
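To call the exported model directly from Python instead of through the script, the following minimal sketch uses the ONNX I/O names listed above (`image`, `segmentation`, `depth`). The preprocessing (resize to 224 x 224, scale to [0, 1], no mean/std normalization) is an assumption carried over from the PyTorch quick start in this card and has not been checked against `scripts/infer_onnx.py`; the helper names `preprocess` and `run_onnx` are hypothetical.

```python
import numpy as np
from PIL import Image


def preprocess(image, size=224):
    """Return a float32 NCHW batch of shape (1, 3, size, size) scaled to [0, 1]."""
    resized = image.convert("RGB").resize((size, size), Image.BILINEAR)
    array = np.asarray(resized, dtype=np.float32) / 255.0  # HWC in [0, 1]
    return np.ascontiguousarray(array.transpose(2, 0, 1)[None])  # NCHW


def run_onnx(onnx_path, image):
    """Run the exported model once and return (segmentation, depth) arrays."""
    import onnxruntime as ort  # imported lazily so preprocess() works without it

    session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
    inputs = {"image": preprocess(image)}  # assumed preprocessing, see above
    segmentation, depth = session.run(["segmentation", "depth"], inputs)
    return segmentation, depth
```

A call would look like `run_onnx("DepthPolyp_Kvasir.onnx", Image.open("samples/kvasir/images/sample_01.jpg"))`. Creating the session once and reusing it across frames avoids paying the model-load cost on every call.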

Quick Start: PyTorch

pip install torch torchvision pillow numpy

import torch
from PIL import Image
from torchvision import transforms

from model.depthpolyp import build_depthpolyp

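device = "cuda" if torch.cuda.is_available() else "cpu"

# Build the network; these arguments must match the checkpoint
# for strict loading to succeed.
model = build_depthpolyp(
    encoder_name="b0",
    in_channels=3,
    num_classes=2,
    decoder_channels=256,
    activation=None,
)
# Load the weights on CPU first, then move the model to the target device.
state_dict = torch.load("DepthPolyp_Kvasir.pth", map_location="cpu", weights_only=True)
model.load_state_dict(state_dict, strict=True)
model.to(device).eval()

# Resize to the 224 x 224 input resolution; ToTensor scales pixels to [0, 1].
image = Image.open("samples/kvasir/images/sample_01.jpg").convert("RGB")
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
x = transform(image).unsqueeze(0).to(device)  # add the batch dimension

# Inference only, so gradient tracking is disabled.
with torch.no_grad():
    seg_prob, depth_prob = model(x)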

print(seg_prob.shape)    # [1, 1, 224, 224]
print(depth_prob.shape)  # [1, 1, 224, 224]
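
The probability maps come back at the 224 x 224 network resolution, so a typical next step is to threshold the segmentation map and resize it to the original frame size. The sketch below assumes a 0.5 threshold and nearest-neighbour upsampling; both are illustrative choices, not values taken from the paper, and `prob_to_mask` is a hypothetical helper.

```python
import numpy as np
from PIL import Image


def prob_to_mask(prob, original_size, threshold=0.5):
    """Convert a [1, 1, H, W] probability array to a binary PIL mask.

    original_size is (width, height), as returned by PIL's Image.size.
    """
    prob = np.asarray(prob, dtype=np.float32).squeeze()  # -> (H, W)
    binary = (prob >= threshold).astype(np.uint8) * 255  # 0 = background, 255 = polyp
    mask = Image.fromarray(binary)  # 2-D uint8 array becomes a greyscale image
    return mask.resize(original_size, Image.NEAREST)  # keeps the mask strictly binary
```

With the variables from the snippet above, this could be called as `prob_to_mask(seg_prob.cpu().numpy(), image.size)`.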

Loading Files with `huggingface_hub`

from huggingface_hub import hf_hub_download

repo_id = "ReaganWZY/DepthPolyp"
pth_path = hf_hub_download(repo_id=repo_id, filename="DepthPolyp_Kvasir.pth")
onnx_path = hf_hub_download(repo_id=repo_id, filename="DepthPolyp_Kvasir.onnx")

If the files are hosted under a different Hugging Face repo id, replace `ReaganWZY/DepthPolyp` with that id.

Evaluation

Paper-reported reference results:

Protocol	Kvasir Dice/IoU/Recall	ClinicDB Dice/IoU/Recall	ColonDB Dice/IoU/Recall
`N->C`	0.891 / 0.805 / 0.885	0.854 / 0.748 / 0.845	0.801 / 0.669 / 0.759
`N->N`	0.853 / 0.745 / 0.854	0.751 / 0.608 / 0.759	0.734 / 0.582 / 0.697
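
For reference, Dice, IoU, and recall on a single pair of binary masks follow from the true-positive, false-positive, and false-negative pixel counts. The sketch below uses the standard definitions; the paper's evaluation details (thresholding, per-image versus dataset-level averaging) are not described in this card, so treat it as an illustration rather than a reproduction of the reported numbers. The `eps` term is an assumption added to avoid division by zero on empty masks.

```python
import numpy as np


def segmentation_metrics(pred, target, eps=1e-8):
    """Return (dice, iou, recall) for two binary masks of the same shape."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    tp = np.logical_and(pred, target).sum()   # predicted polyp, is polyp
    fp = np.logical_and(pred, ~target).sum()  # predicted polyp, is background
    fn = np.logical_and(~pred, target).sum()  # missed polyp pixels
    dice = 2 * tp / (2 * tp + fp + fn + eps)
    iou = tp / (tp + fp + fn + eps)
    recall = tp / (tp + fn + eps)
    return float(dice), float(iou), float(recall)
```

For a prediction that covers the one true polyp pixel plus one extra background pixel, this gives a Dice of about 0.667, an IoU of 0.5, and a recall of 1.0.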

Real-world robustness and deployment results from the paper:

Params	GMACs	Avg. Dice	PolypGen Dice	iPhone FPS	Raspberry Pi 4 FPS
3.57M	0.86	0.779	0.679	181.54	4.05
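
Throughput figures like these depend heavily on hardware, runtime, and measurement protocol. As a rough way to time any inference callable locally, the sketch below runs a few warm-up iterations and then averages wall-clock time; `measure_fps` is a hypothetical helper and does not reproduce the paper's iPhone or Raspberry Pi 4 benchmark setup.

```python
import time


def measure_fps(infer, warmup=5, iterations=50):
    """Return the average calls per second of a zero-argument callable."""
    for _ in range(warmup):  # warm-up runs are excluded from the timing
        infer()
    start = time.perf_counter()
    for _ in range(iterations):
        infer()
    elapsed = time.perf_counter() - start
    return iterations / max(elapsed, 1e-12)  # guard against a zero interval
```

When timing PyTorch on CUDA, the callable should end with `torch.cuda.synchronize()`, because GPU kernels run asynchronously and would otherwise be only partly counted.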

Training Data and Protocol

The released checkpoint is trained on Kvasir-SEG with degradation-aware training. Pseudo-depth targets are generated with Depth-Anything v2 Small and are used only during training; depth targets are not required at inference time.
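
Degradation-aware training generally means augmenting clean frames with synthetic corruptions before they reach the network. This card does not spell out the paper's degradation pipeline, so the sketch below only illustrates the idea: it applies a random box blur and a random illumination gain with NumPy. The function name `degrade`, the blur radius, and the gain range are all assumptions.

```python
import numpy as np


def degrade(image, rng, max_blur_radius=2, gain_range=(0.6, 1.4)):
    """Apply a random box blur and illumination gain to an HxWx3 image in [0, 1]."""
    out = image.astype(np.float32)
    radius = int(rng.integers(0, max_blur_radius + 1))  # 0 means no blur
    if radius > 0:
        k = 2 * radius + 1
        h, w = out.shape[:2]
        padded = np.pad(out, ((radius, radius), (radius, radius), (0, 0)), mode="edge")
        blurred = np.zeros_like(out)
        for dy in range(k):  # sum the k x k neighbourhood of every pixel
            for dx in range(k):
                blurred += padded[dy:dy + h, dx:dx + w]
        out = blurred / (k * k)
    gain = float(rng.uniform(*gain_range))  # global brightness change
    return np.clip(out * gain, 0.0, 1.0)
```

Both operations leave the image geometry unchanged, so the segmentation mask and pseudo-depth target of a frame can be reused as-is for its degraded copy.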

Reference training settings from the paper:

Input resolution: 224 x 224
Optimizer: AdamW
Learning rate: 1e-4
Weight decay: 1e-4
Batch size: 16
Epochs: 200
Schedule: 10% warm-up followed by cosine annealing
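
The schedule line can be read as a learning-rate function of the training step. The sketch below assumes a linear warm-up over the first 10% of steps and cosine decay to zero afterwards; the warm-up shape, the final learning rate, and whether the schedule advances per iteration or per epoch are assumptions, since the card only states "10% warm-up followed by cosine annealing". `lr_at_step` is a hypothetical helper.

```python
import math


def lr_at_step(step, total_steps, base_lr=1e-4, warmup_fraction=0.1):
    """Learning rate at a 0-indexed step: linear warm-up, then cosine decay."""
    warmup_steps = max(1, round(total_steps * warmup_fraction))
    if step < warmup_steps:
        # Ramp linearly up to base_lr by the end of the warm-up.
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay from base_lr towards 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

With PyTorch, the same shape can be plugged into `torch.optim.lr_scheduler.LambdaLR` by returning the ratio `lr_at_step(step, total_steps) / base_lr` from the lambda.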

Citation

@misc{wu2026depthpolyp,
  title={DepthPolyp: Pseudo-Depth Guided Lightweight Segmentation for Real-Time Colonoscopy},
  author={Wu, Zhuoyu and Ou, Wenhui and Zhang, Lexi and Tan, Pei-Sze and Wu, Dongjun and Zhao, Junhe and Fang, Wenqi and Phan, Raphaël C.-W.},
  year={2026},
  eprint={2605.16519},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}