---
license: mit
library_name: pytorch
pipeline_tag: image-segmentation
tags:
- medical-image-segmentation
- image-segmentation
- semantic-segmentation
- polyp-segmentation
- colonoscopy
- depth-estimation
- pseudo-depth
- real-time
- onnx
- pytorch
- arxiv:2605.16519
metrics:
- dice
- iou
- recall
---
# DepthPolyp: Pseudo-Depth Guided Lightweight Segmentation for Real-Time Colonoscopy
DepthPolyp is a lightweight pseudo-depth guided model for real-time colonoscopic polyp segmentation. Given an RGB colonoscopy frame, it jointly predicts:

1. a binary polyp segmentation probability map
2. a pseudo-depth probability map for depth-aware structural guidance

The model uses a MiT-B0 encoder and lightweight fusion/gating modules to keep deployment cost low while improving robustness under blur, illumination changes, reflections, and other real-world colonoscopy degradations.
- Paper: [arXiv:2605.16519](https://arxiv.org/abs/2605.16519)
- Code: [github.com/ReaganWu/DepthPolyp](https://github.com/ReaganWu/DepthPolyp)
- Demo: [DepthPolyp-demo](https://huggingface.co/spaces/ReaganWZY/DepthPolyp-demo)
- License: MIT
## Model Details
| Item | Value |
| --- | --- |
| Model | DepthPolyp |
| Encoder | MiT-B0 |
| Input | RGB image, 224 x 224 |
| Outputs | segmentation, pseudo-depth |
| Parameters | 3.57M |
| Complexity | 0.86 GMACs |
| Training data | Kvasir-SEG with degradation-aware training |
| PyTorch checkpoint | `DepthPolyp_Kvasir.pth` |
| ONNX checkpoint | `DepthPolyp_Kvasir.onnx` |

ONNX I/O names:
```text
input: image
outputs: segmentation, depth
```
## Intended Use
DepthPolyp is intended for research on colonoscopic polyp segmentation, lightweight medical image segmentation, robustness under endoscopic video degradation, and deployment-oriented model comparison.
This model is not a standalone medical device and is not intended for clinical diagnosis without appropriate validation, regulatory review, and expert oversight.
## Quick Start: ONNX Runtime
```bash
pip install onnxruntime pillow numpy
python scripts/infer_onnx.py \
    --onnx DepthPolyp_Kvasir.onnx \
    --input samples/kvasir/images \
    --output outputs
```
The script writes binary masks, pseudo-depth visualizations, and mask overlays.
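To call the exported model directly from Python rather than through `scripts/infer_onnx.py`, something like the following may work. It is a minimal sketch, not the repository's script: it assumes the ONNX graph expects a float32 NCHW tensor scaled to [0, 1] at 224 x 224 (mirroring the `Resize` + `ToTensor` preprocessing in the PyTorch quick start) and uses the I/O names listed under Model Details. The helper names `preprocess` and `run_onnx` are hypothetical.

```python
import numpy as np


def preprocess(rgb: np.ndarray) -> np.ndarray:
    """Turn a 224x224x3 uint8 RGB array into a 1x3x224x224 float32 tensor in [0, 1]."""
    if rgb.shape != (224, 224, 3):
        raise ValueError(f"expected shape (224, 224, 3), got {rgb.shape}")
    x = rgb.astype(np.float32) / 255.0           # assumption: plain [0, 1] scaling, no mean/std
    x = np.transpose(x, (2, 0, 1))[np.newaxis]   # HWC -> NCHW with a batch dimension
    return np.ascontiguousarray(x)


def run_onnx(onnx_path: str, image_path: str):
    """Run one image through the ONNX model and return (segmentation, depth) arrays."""
    # Imported lazily so preprocess() stays usable without these packages installed.
    import onnxruntime as ort
    from PIL import Image

    image = Image.open(image_path).convert("RGB").resize((224, 224), Image.BILINEAR)
    x = preprocess(np.asarray(image))

    session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
    # Input and output names follow the "ONNX I/O names" listing in this card.
    seg, depth = session.run(["segmentation", "depth"], {"image": x})
    return seg, depth
```

A call such as `run_onnx("DepthPolyp_Kvasir.onnx", "samples/kvasir/images/sample_01.jpg")` would then return the two output arrays, provided the preprocessing assumption above matches the export.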
## Quick Start: PyTorch
```bash
pip install torch torchvision pillow numpy
```
```python
import torch
from PIL import Image
from torchvision import transforms

from model.depthpolyp import build_depthpolyp

device = "cuda" if torch.cuda.is_available() else "cpu"

# Build the network with the MiT-B0 encoder.
model = build_depthpolyp(
    encoder_name="b0",
    in_channels=3,
    num_classes=2,
    decoder_channels=256,
    activation=None,
)

# Load the released weights on CPU first, then move the model to the target device.
state_dict = torch.load("DepthPolyp_Kvasir.pth", map_location="cpu", weights_only=True)
model.load_state_dict(state_dict, strict=True)
model.to(device).eval()

# Resize to the 224 x 224 input resolution and scale pixel values to [0, 1].
image = Image.open("samples/kvasir/images/sample_01.jpg").convert("RGB")
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
x = transform(image).unsqueeze(0).to(device)

# Inference only: disable gradient tracking.
with torch.no_grad():
    seg_prob, depth_prob = model(x)

print(seg_prob.shape)    # [1, 1, 224, 224]
print(depth_prob.shape)  # [1, 1, 224, 224]
```
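The segmentation output is a probability map, so a binary mask still has to be derived from it. The sketch below shows one common way to do that; the 0.5 threshold, the green overlay, and the helper names `to_binary_mask` and `overlay_mask` are assumptions for illustration, not settings taken from the paper or the repository. It also assumes the model output already holds probabilities, as the variable names above suggest; if the outputs turn out to be logits, a sigmoid would be needed first.

```python
import numpy as np


def to_binary_mask(seg_prob: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Threshold a [1, 1, H, W] probability map into an HxW uint8 mask (0 or 255)."""
    prob = np.squeeze(seg_prob)  # drop the batch and channel dimensions
    return (prob >= threshold).astype(np.uint8) * 255


def overlay_mask(rgb: np.ndarray, mask: np.ndarray, alpha: float = 0.4) -> np.ndarray:
    """Blend a green tint into an HxWx3 uint8 image wherever the mask is non-zero."""
    out = rgb.astype(np.float32)
    green = np.array([0.0, 255.0, 0.0], dtype=np.float32)
    selected = mask > 0
    out[selected] = (1.0 - alpha) * out[selected] + alpha * green
    return out.round().astype(np.uint8)
```

With the PyTorch output above, the input would be `seg_prob.cpu().numpy()`; the image passed to `overlay_mask` must be resized to the same height and width as the mask.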
## Loading Files with `huggingface_hub`
```python
from huggingface_hub import hf_hub_download
repo_id = "ReaganWZY/DepthPolyp"
pth_path = hf_hub_download(repo_id=repo_id, filename="DepthPolyp_Kvasir.pth")
onnx_path = hf_hub_download(repo_id=repo_id, filename="DepthPolyp_Kvasir.onnx")
```
If you publish under a different Hugging Face repo id, replace `ReaganWZY/DepthPolyp` with that id.
## Evaluation
Paper-reported reference results:
| Protocol | Kvasir Dice/IoU/Recall | ClinicDB Dice/IoU/Recall | ColonDB Dice/IoU/Recall |
| --- | --- | --- | --- |
| `N->C` | 0.891 / 0.805 / 0.885 | 0.854 / 0.748 / 0.845 | 0.801 / 0.669 / 0.759 |
| `N->N` | 0.853 / 0.745 / 0.854 | 0.751 / 0.608 / 0.759 | 0.734 / 0.582 / 0.697 |

Real-world robustness and deployment results from the paper:
| Params | GMACs | Avg. Dice | PolypGen Dice | iPhone FPS | Raspberry Pi 4 FPS |
| ---: | ---: | ---: | ---: | ---: | ---: |
| 3.57M | 0.86 | 0.779 | 0.679 | 181.54 | 4.05 |
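
For readers who want to score their own predictions against ground-truth masks, the sketch below computes Dice, IoU, and recall from the standard overlap definitions for a single pair of binary masks. It is not the paper's evaluation code: the thresholding, per-image versus dataset-level averaging, and the `eps` smoothing term are assumptions, so numbers may not match the tables above exactly. The function name `segmentation_metrics` is hypothetical.

```python
import numpy as np


def segmentation_metrics(pred: np.ndarray, target: np.ndarray, eps: float = 1e-8) -> dict:
    """Dice, IoU, and recall for one pair of binary masks (non-zero pixels count as polyp)."""
    pred = np.asarray(pred) > 0
    target = np.asarray(target) > 0

    tp = np.logical_and(pred, target).sum()    # polyp pixels correctly predicted
    fp = np.logical_and(pred, ~target).sum()   # background predicted as polyp
    fn = np.logical_and(~pred, target).sum()   # polyp pixels that were missed

    return {
        "dice": float(2 * tp / (2 * tp + fp + fn + eps)),
        "iou": float(tp / (tp + fp + fn + eps)),
        "recall": float(tp / (tp + fn + eps)),
    }
```

Averaging these per-image values over a test set is one common convention; pooling pixel counts over the whole set before dividing is another, and the two generally give different results.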
## Training Data and Protocol
The released checkpoint is trained on Kvasir-SEG with degradation-aware training. Pseudo-depth targets are generated with Depth-Anything v2 Small and are used only during training; depth targets are not required at inference time.

Reference training settings from the paper:
- Input resolution: 224 x 224
- Optimizer: AdamW
- Learning rate: 1e-4
- Weight decay: 1e-4
- Batch size: 16
- Epochs: 200
- Schedule: 10% warm-up followed by cosine annealing
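
The schedule in the list above can be sketched as a plain function of the epoch index. This is an illustration under stated assumptions, not the training code: it assumes the warm-up is linear, that the schedule is stepped once per epoch rather than per iteration, and that the cosine phase decays to zero. The names `lr_at`, `BASE_LR`, `TOTAL_EPOCHS`, and `WARMUP_EPOCHS` are hypothetical.

```python
import math

BASE_LR = 1e-4                       # learning rate from the settings above
TOTAL_EPOCHS = 200                   # epochs from the settings above
WARMUP_EPOCHS = TOTAL_EPOCHS // 10   # 10% warm-up -> 20 epochs


def lr_at(epoch: int) -> float:
    """Learning rate for a 0-indexed epoch: linear warm-up, then cosine annealing."""
    if epoch < WARMUP_EPOCHS:
        # Assumption: ramp linearly and reach BASE_LR on the last warm-up epoch.
        return BASE_LR * (epoch + 1) / WARMUP_EPOCHS
    # Fraction of the cosine phase completed, in [0, 1).
    progress = (epoch - WARMUP_EPOCHS) / (TOTAL_EPOCHS - WARMUP_EPOCHS)
    return 0.5 * BASE_LR * (1.0 + math.cos(math.pi * progress))
```

In PyTorch, the same curve could be attached to an `AdamW` optimizer (`lr=1e-4`, `weight_decay=1e-4`) through `torch.optim.lr_scheduler.LambdaLR`, passing `lambda epoch: lr_at(epoch) / BASE_LR` as the multiplier.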
## Citation
```bibtex
@misc{wu2026depthpolyp,
title={DepthPolyp: Pseudo-Depth Guided Lightweight Segmentation for Real-Time Colonoscopy},
author={Wu, Zhuoyu and Ou, Wenhui and Zhang, Lexi and Tan, Pei-Sze and Wu, Dongjun and Zhao, Junhe and Fang, Wenqi and Phan, Raphaël C.-W.},
year={2026},
eprint={2605.16519},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```