---
license: mit
library_name: pytorch
pipeline_tag: image-segmentation
tags:
- medical-image-segmentation
- image-segmentation
- semantic-segmentation
- polyp-segmentation
- colonoscopy
- depth-estimation
- pseudo-depth
- real-time
- onnx
- pytorch
- arxiv:2605.16519
metrics:
- dice
- iou
- recall
---

# DepthPolyp: Pseudo-Depth Guided Lightweight Segmentation for Real-Time Colonoscopy

DepthPolyp is a lightweight pseudo-depth guided model for real-time colonoscopic polyp segmentation. Given an RGB colonoscopy frame, it jointly predicts:

1. a binary polyp segmentation probability map
2. a pseudo-depth probability map for depth-aware structural guidance

The model uses a MiT-B0 encoder and lightweight fusion/gating modules to keep deployment cost low while improving robustness under blur, illumination changes, reflections, and other real-world colonoscopy degradations.

- Paper: [arXiv:2605.16519](https://arxiv.org/abs/2605.16519)
- Code: [github.com/ReaganWu/DepthPolyp](https://github.com/ReaganWu/DepthPolyp)
- Demo: [DepthPolyp-demo](https://huggingface.co/spaces/ReaganWZY/DepthPolyp-demo)
- License: MIT

## Model Details

| Item | Value |
| --- | --- |
| Model | DepthPolyp |
| Encoder | MiT-B0 |
| Input | RGB image, 224 x 224 |
| Outputs | segmentation, pseudo-depth |
| Parameters | 3.57M |
| Complexity | 0.86 GMACs |
| Training data | Kvasir-SEG with degradation-aware training |
| PyTorch checkpoint | `DepthPolyp_Kvasir.pth` |
| ONNX checkpoint | `DepthPolyp_Kvasir.onnx` |

ONNX I/O names:

```text
input: image
outputs: segmentation, depth
```
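
These names are what a direct ONNX Runtime call needs. The following is a minimal sketch rather than the repository's own inference code. It assumes the exported graph takes the same NCHW, `[0, 1]`-scaled `float32` input as the PyTorch example in this card, with no mean/std normalization, and that the frame has already been resized to 224 x 224. The helper names `preprocess` and `run_depthpolyp` are hypothetical.

```python
import numpy as np

INPUT_NAME = "image"                      # from the I/O names listed above
OUTPUT_NAMES = ["segmentation", "depth"]  # from the I/O names listed above


def preprocess(rgb_uint8: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 uint8 RGB array into a 1x3xHxW float32 batch in [0, 1].

    Assumption: the ONNX graph expects NCHW input scaled to [0, 1],
    mirroring the PyTorch example (no mean/std normalization).
    """
    if rgb_uint8.ndim != 3 or rgb_uint8.shape[2] != 3:
        raise ValueError("expected an HxWx3 RGB array")
    chw = rgb_uint8.astype(np.float32).transpose(2, 0, 1) / 255.0
    return np.ascontiguousarray(chw[np.newaxis, ...])


def run_depthpolyp(onnx_path: str, batch: np.ndarray):
    """Run the exported model and return the (segmentation, depth) arrays."""
    # Imported lazily so that preprocess() can be used without onnxruntime.
    import onnxruntime as ort

    session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
    segmentation, depth = session.run(OUTPUT_NAMES, {INPUT_NAME: batch})
    return segmentation, depth
```

If the assumed layout or scaling is wrong for the exported file, `session.get_inputs()` reports the shape and dtype the graph actually expects.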

## Intended Use

DepthPolyp is intended for research on colonoscopic polyp segmentation, lightweight medical image segmentation, robustness under endoscopic video degradation, and deployment-oriented model comparison.

This model is not a standalone medical device and is not intended for clinical diagnosis without appropriate validation, regulatory review, and expert oversight.

## Quick Start: ONNX Runtime

```bash
pip install onnxruntime pillow numpy

python scripts/infer_onnx.py \
  --onnx DepthPolyp_Kvasir.onnx \
  --input samples/kvasir/images \
  --output outputs
```

The script writes binary masks, pseudo-depth visualizations, and mask overlays.
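
How such outputs can be derived from the two probability maps is sketched below. This is not the code in `scripts/infer_onnx.py`. The 0.5 threshold, the green tint, the blend weight, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def to_binary_mask(seg_prob: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Threshold an HxW probability map into a uint8 mask with values 0 or 255."""
    return (seg_prob >= threshold).astype(np.uint8) * 255


def depth_to_gray(depth_prob: np.ndarray) -> np.ndarray:
    """Map an HxW pseudo-depth map in [0, 1] to an 8-bit grayscale image."""
    return (np.clip(depth_prob, 0.0, 1.0) * 255.0).round().astype(np.uint8)


def overlay_mask(rgb: np.ndarray, mask: np.ndarray, alpha: float = 0.4) -> np.ndarray:
    """Blend a green tint into an HxWx3 uint8 image wherever the mask is non-zero."""
    out = rgb.astype(np.float32)
    tint = np.array([0.0, 255.0, 0.0], dtype=np.float32)
    selected = mask > 0
    out[selected] = (1.0 - alpha) * out[selected] + alpha * tint
    return out.round().astype(np.uint8)
```

Each function takes 2-D maps, so a model output of shape `[1, 1, H, W]` needs its batch and channel axes indexed away first, for example `seg[0, 0]`.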

## Quick Start: PyTorch

```bash
pip install torch torchvision pillow numpy
```

```python
import torch
from PIL import Image
from torchvision import transforms

from model.depthpolyp import build_depthpolyp

device = "cuda" if torch.cuda.is_available() else "cpu"

model = build_depthpolyp(
    encoder_name="b0",
    in_channels=3,
    num_classes=2,
    decoder_channels=256,
    activation=None,
)
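# Load the weights on CPU first, then move the model to the target device.
state_dict = torch.load("DepthPolyp_Kvasir.pth", map_location="cpu", weights_only=True)
model.load_state_dict(state_dict, strict=True)
model.to(device).eval()

image = Image.open("samples/kvasir/images/sample_01.jpg").convert("RGB")
# Resize to the 224 x 224 model input; ToTensor scales pixel values to [0, 1].
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
# Add a batch dimension: [3, 224, 224] -> [1, 3, 224, 224].
x = transform(image).unsqueeze(0).to(device)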

with torch.no_grad():
    seg_prob, depth_prob = model(x)

print(seg_prob.shape)    # [1, 1, 224, 224]
print(depth_prob.shape)  # [1, 1, 224, 224]
```

## Loading Files with `huggingface_hub`

```python
from huggingface_hub import hf_hub_download

repo_id = "ReaganWZY/DepthPolyp"
pth_path = hf_hub_download(repo_id=repo_id, filename="DepthPolyp_Kvasir.pth")
onnx_path = hf_hub_download(repo_id=repo_id, filename="DepthPolyp_Kvasir.onnx")
```

If the files are hosted under a different Hugging Face repo id, such as a fork or mirror, replace `ReaganWZY/DepthPolyp` with that id.

## Evaluation

Paper-reported reference results:

| Protocol | Kvasir Dice/IoU/Recall | ClinicDB Dice/IoU/Recall | ColonDB Dice/IoU/Recall |
| --- | --- | --- | --- |
| `N->C` | 0.891 / 0.805 / 0.885 | 0.854 / 0.748 / 0.845 | 0.801 / 0.669 / 0.759 |
| `N->N` | 0.853 / 0.745 / 0.854 | 0.751 / 0.608 / 0.759 | 0.734 / 0.582 / 0.697 |
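
For readers reproducing numbers like those in the table above, the three metrics can be sketched from their standard definitions. This is an illustrative implementation, not the paper's evaluation code. The binarization threshold, the smoothing constant `eps`, and whether scores are averaged per image or over a whole dataset are not stated in this card and are left to the caller.

```python
from typing import Sequence, Tuple


def dice_iou_recall(
    pred: Sequence[int], target: Sequence[int], eps: float = 1e-7
) -> Tuple[float, float, float]:
    """Compute Dice, IoU, and recall for two flattened binary masks (values 0 or 1)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)  # true positives
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)  # false positives
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)  # false negatives

    # eps keeps the ratios defined when both masks are empty (all three become 1.0).
    dice = (2 * tp + eps) / (2 * tp + fp + fn + eps)
    iou = (tp + eps) / (tp + fp + fn + eps)
    recall = (tp + eps) / (tp + fn + eps)
    return dice, iou, recall
```

With `pred = [1, 1, 0, 0]` and `target = [1, 0, 1, 0]` there is one true positive, one false positive, and one false negative, so Dice is about 0.5, IoU about 0.333, and recall about 0.5.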

Real-world robustness and deployment results from the paper:

| Params | GMACs | Avg. Dice | PolypGen Dice | iPhone FPS | Raspberry Pi 4 FPS |
| ---: | ---: | ---: | ---: | ---: | ---: |
| 3.57M | 0.86 | 0.779 | 0.679 | 181.54 | 4.05 |

## Training Data and Protocol

The released checkpoint is trained on Kvasir-SEG with degradation-aware training. Pseudo-depth targets are generated with Depth-Anything v2 Small and are used only during training; depth targets are not required at inference time.

Reference training settings from the paper:

- Input resolution: 224 x 224
- Optimizer: AdamW
- Learning rate: 1e-4
- Weight decay: 1e-4
- Batch size: 16
- Epochs: 200
- Schedule: 10% warm-up followed by cosine annealing
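
The schedule in the last item can be sketched as a function of the epoch index. Several details here are assumptions, since the card does not specify them: the warm-up is taken to be linear, the schedule is stepped per epoch rather than per iteration, and the cosine phase decays to a minimum learning rate of 0.

```python
import math

BASE_LR = 1e-4          # learning rate from the settings above
EPOCHS = 200            # total epochs from the settings above
WARMUP_FRACTION = 0.10  # 10% warm-up from the settings above


def learning_rate(
    epoch: int,
    base_lr: float = BASE_LR,
    total_epochs: int = EPOCHS,
    warmup_fraction: float = WARMUP_FRACTION,
    min_lr: float = 0.0,  # assumption: the card does not state a floor
) -> float:
    """Return the learning rate for a zero-based epoch index."""
    warmup_epochs = int(round(total_epochs * warmup_fraction))
    if epoch < warmup_epochs:
        # Linear warm-up (assumed shape): ramps up to base_lr at the last warm-up epoch.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine annealing from base_lr towards min_lr over the remaining epochs.
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions the warm-up covers epochs 0 to 19, the rate peaks at `1e-4` around epoch 20, and it reaches half of that value at epoch 110, the midpoint of the cosine phase.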

## Citation

```bibtex
@misc{wu2026depthpolyp,
  title={DepthPolyp: Pseudo-Depth Guided Lightweight Segmentation for Real-Time Colonoscopy},
  author={Wu, Zhuoyu and Ou, Wenhui and Zhang, Lexi and Tan, Pei-Sze and Wu, Dongjun and Zhao, Junhe and Fang, Wenqi and Phan, Raphaël C.-W.},
  year={2026},
  eprint={2605.16519},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```