---
license: mit
library_name: pytorch
pipeline_tag: image-segmentation
tags:
- medical-image-segmentation
- image-segmentation
- semantic-segmentation
- polyp-segmentation
- colonoscopy
- depth-estimation
- pseudo-depth
- real-time
- onnx
- pytorch
- arxiv:2605.16519
metrics:
- dice
- iou
- recall
---

# DepthPolyp: Pseudo-Depth Guided Lightweight Segmentation for Real-Time Colonoscopy

DepthPolyp is a lightweight pseudo-depth guided model for real-time colonoscopic polyp segmentation. Given an RGB colonoscopy frame, it jointly predicts:

1. a binary polyp segmentation probability map
2. a pseudo-depth probability map for depth-aware structural guidance

The model uses a MiT-B0 encoder and lightweight fusion/gating modules to keep deployment cost low (3.57M parameters, 0.86 GMACs) while improving robustness to blur, illumination changes, reflections, and other real-world colonoscopy degradations.

- Paper: [arXiv:2605.16519](https://arxiv.org/abs/2605.16519)
- Code: [github.com/ReaganWu/DepthPolyp](https://github.com/ReaganWu/DepthPolyp)
- Demo: [DepthPolyp-demo](https://huggingface.co/spaces/ReaganWZY/DepthPolyp-demo)
- License: MIT

## Model Details

| Item | Value |
| --- | --- |
| Model | DepthPolyp |
| Encoder | MiT-B0 |
| Input | RGB image, 224 x 224 |
| Outputs | segmentation, pseudo-depth |
| Parameters | 3.57M |
| Complexity | 0.86 GMACs |
| Training data | Kvasir-SEG with degradation-aware training |
| PyTorch checkpoint | `DepthPolyp_Kvasir.pth` |
| ONNX checkpoint | `DepthPolyp_Kvasir.onnx` |

ONNX I/O names:

```text
input: image
outputs: segmentation, depth
```

## Intended Use

DepthPolyp is intended for research on colonoscopic polyp segmentation, lightweight medical image segmentation, robustness under endoscopic video degradation, and deployment-oriented model comparison.

This model is not a standalone medical device and is not intended for clinical diagnosis without appropriate validation, regulatory review, and expert oversight.

## Quick Start: ONNX Runtime

```bash
pip install onnxruntime pillow numpy

python scripts/infer_onnx.py \
  --onnx DepthPolyp_Kvasir.onnx \
  --input samples/kvasir/images \
  --output outputs
```

The script writes binary masks, pseudo-depth visualizations, and mask overlays to the directory given by `--output`.
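To call the ONNX model directly instead of through `scripts/infer_onnx.py`, a minimal sketch follows. It assumes the exported graph takes a float32 `image` tensor of shape `[1, 3, 224, 224]` scaled to [0, 1] (mirroring the `Resize` + `ToTensor` preprocessing in the PyTorch example below), that `segmentation` and `depth` are returned as `[1, 1, 224, 224]` probability maps, and that 0.5 is a reasonable mask threshold. The helper names (`preprocess`, `predict`, `overlay`) and the red tint are hypothetical illustrations, not taken from the released script.

```python
import numpy as np
from PIL import Image

INPUT_SIZE = 224   # input resolution listed in the model card
THRESHOLD = 0.5    # assumed cut-off on the segmentation probability map


def preprocess(image: Image.Image) -> np.ndarray:
    """Resize to 224x224 and return a 1x3xHxW float32 array in [0, 1].

    Assumption: no mean/std normalization, matching the PyTorch example.
    """
    rgb = image.convert("RGB").resize((INPUT_SIZE, INPUT_SIZE), Image.BILINEAR)
    arr = np.asarray(rgb, dtype=np.float32) / 255.0  # HWC, values in [0, 1]
    return arr.transpose(2, 0, 1)[np.newaxis]        # NCHW with batch size 1


def predict(session, image: Image.Image, threshold: float = THRESHOLD):
    """Run an onnxruntime-style session; return (mask, depth) images at the input size."""
    seg, depth = session.run(["segmentation", "depth"], {"image": preprocess(image)})
    mask = (seg[0, 0] > threshold).astype(np.uint8) * 255
    # Nearest-neighbour resizing keeps the upsampled mask strictly binary.
    mask_img = Image.fromarray(mask).resize(image.size, Image.NEAREST)
    depth_u8 = (np.clip(depth[0, 0], 0.0, 1.0) * 255).astype(np.uint8)
    depth_img = Image.fromarray(depth_u8).resize(image.size, Image.BILINEAR)
    return mask_img, depth_img


def overlay(image: Image.Image, mask: Image.Image, alpha: float = 0.5) -> Image.Image:
    """Tint predicted polyp pixels red for visual inspection."""
    base = np.asarray(image.convert("RGB"), dtype=np.float32)
    hit = np.asarray(mask) > 0
    tinted = base.copy()
    tinted[hit] = (1 - alpha) * base[hit] + alpha * np.array([255.0, 0.0, 0.0])
    return Image.fromarray(tinted.astype(np.uint8))
```

Usage: create the session with `onnxruntime.InferenceSession("DepthPolyp_Kvasir.onnx", providers=["CPUExecutionProvider"])`, then call `predict(session, Image.open(path))`. Passing the session in as an argument keeps the pre- and post-processing independent of onnxruntime, so the same helpers can be reused with another execution provider.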

## Quick Start: PyTorch

```bash
pip install torch torchvision pillow numpy
```

```python
import torch
from PIL import Image
from torchvision import transforms

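# build_depthpolyp comes from the DepthPolyp code repository
# (github.com/ReaganWu/DepthPolyp); run from its root or add it to PYTHONPATH.
from model.depthpolyp import build_depthpolyp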

device = "cuda" if torch.cuda.is_available() else "cpu"

model = build_depthpolyp(
    encoder_name="b0",
    in_channels=3,
    num_classes=2,
    decoder_channels=256,
    activation=None,
)
state_dict = torch.load("DepthPolyp_Kvasir.pth", map_location="cpu", weights_only=True)
model.load_state_dict(state_dict, strict=True)
model.to(device).eval()

image = Image.open("samples/kvasir/images/sample_01.jpg").convert("RGB")
# Resize to the 224 x 224 input resolution; ToTensor scales pixels to [0, 1].
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
x = transform(image).unsqueeze(0).to(device)  # [1, 3, 224, 224]

with torch.no_grad():
    seg_prob, depth_prob = model(x)

print(seg_prob.shape)    # [1, 1, 224, 224]
print(depth_prob.shape)  # [1, 1, 224, 224]
```

## Loading Files with `huggingface_hub`

```python
from huggingface_hub import hf_hub_download

repo_id = "ReaganWZY/DepthPolyp"
pth_path = hf_hub_download(repo_id=repo_id, filename="DepthPolyp_Kvasir.pth")
onnx_path = hf_hub_download(repo_id=repo_id, filename="DepthPolyp_Kvasir.onnx")
```

If you publish under a different Hugging Face repo id, replace `ReaganWZY/DepthPolyp` with that id.

## Evaluation

Paper-reported reference results:

| Protocol | Kvasir Dice/IoU/Recall | ClinicDB Dice/IoU/Recall | ColonDB Dice/IoU/Recall |
| --- | --- | --- | --- |
| `N->C` | 0.891 / 0.805 / 0.885 | 0.854 / 0.748 / 0.845 | 0.801 / 0.669 / 0.759 |
| `N->N` | 0.853 / 0.745 / 0.854 | 0.751 / 0.608 / 0.759 | 0.734 / 0.582 / 0.697 |
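The table reports Dice, IoU, and Recall on binary masks. As a reference for the definitions only, the sketch below computes them for a single prediction/ground-truth pair with NumPy. The `eps` smoothing term and the per-image computation are assumptions; the paper's exact aggregation (for example, mean over images versus pixels pooled over the whole dataset) should be checked in the code repository before comparing numbers.

```python
import numpy as np


def binary_metrics(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> dict:
    """Dice, IoU and Recall for one pair of binary masks (nonzero = polyp)."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()   # polyp pixels correctly found
    fp = np.logical_and(pred, ~gt).sum()  # background predicted as polyp
    fn = np.logical_and(~pred, gt).sum()  # polyp pixels missed
    return {
        "dice": float((2 * tp + eps) / (2 * tp + fp + fn + eps)),
        "iou": float((tp + eps) / (tp + fp + fn + eps)),
        "recall": float((tp + eps) / (tp + fn + eps)),
    }
```

With this smoothing, an image where both the prediction and the ground truth are empty scores 1.0 on all three metrics rather than dividing by zero.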

Real-world robustness and deployment results from the paper:

| Params | GMACs | Avg. Dice | PolypGen Dice | iPhone FPS | Raspberry Pi 4 FPS |
| ---: | ---: | ---: | ---: | ---: | ---: |
| 3.57M | 0.86 | 0.779 | 0.679 | 181.54 | 4.05 |
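The FPS figures above come from the paper's iPhone and Raspberry Pi 4 setups. To get a rough number on other hardware, a generic timing harness like the one below can wrap any single-frame inference callable. The warm-up count, iteration count, and batch size of 1 are assumptions rather than the paper's benchmarking protocol, so results will not be directly comparable to the table; `measure_fps` is a hypothetical helper.

```python
import time
from typing import Callable


def measure_fps(infer: Callable[[], object], warmup: int = 10, iters: int = 100) -> float:
    """Return frames per second for a zero-argument callable that processes one frame."""
    for _ in range(warmup):  # let caches, allocators and lazy initialization settle
        infer()
    start = time.perf_counter()
    for _ in range(iters):
        infer()
    elapsed = time.perf_counter() - start
    return iters / elapsed
```

Usage with ONNX Runtime: `measure_fps(lambda: session.run(None, {"image": x}))`, where `x` is a preprocessed `[1, 3, 224, 224]` float32 array and `None` asks onnxruntime for all outputs.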

## Training Data and Protocol

The released checkpoint is trained on Kvasir-SEG with degradation-aware training. Pseudo-depth targets are generated with Depth-Anything v2 Small and are used only during training; depth targets are not required at inference time.
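Because pseudo-depth targets are only needed during training, they can be precomputed offline. The sketch below shows one way to do this with a Hugging Face `transformers` depth-estimation pipeline built on the `depth-anything/Depth-Anything-V2-Small-hf` checkpoint. The per-image min-max normalization to [0, 1] and the 224 × 224 resize are assumptions chosen to match a probability-style target, not the paper's confirmed preprocessing, and the helper names are hypothetical.

```python
import numpy as np
from PIL import Image

TARGET_SIZE = 224  # training resolution from the model card


def normalize_depth(depth: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Min-max normalize a relative depth map to [0, 1] (assumed target format)."""
    depth = depth.astype(np.float32)
    lo, hi = float(depth.min()), float(depth.max())
    return (depth - lo) / (hi - lo + eps)


def make_pseudo_depth(image: Image.Image, estimator) -> np.ndarray:
    """Return a 224x224 float32 pseudo-depth target for one training image.

    `estimator` is assumed to be a transformers depth-estimation pipeline, e.g.
    pipeline("depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf").
    """
    result = estimator(image.convert("RGB"))
    # result["depth"] is an 8-bit PIL rendering; use result["predicted_depth"]
    # instead if full floating-point precision matters.
    depth_img = result["depth"].resize((TARGET_SIZE, TARGET_SIZE), Image.BILINEAR)
    return normalize_depth(np.asarray(depth_img))
```

Passing the estimator in as an argument keeps the target format independent of the depth model, so a different teacher can be swapped in without touching the normalization.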

Reference training settings from the paper:

- Input resolution: 224 x 224
- Optimizer: AdamW
- Learning rate: 1e-4
- Weight decay: 1e-4
- Batch size: 16
- Epochs: 200
- Schedule: 10% warm-up followed by cosine annealing
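The schedule in the last bullet can be expressed as a per-step multiplier on the base learning rate. A minimal sketch follows, assuming a linear warm-up over the first 10% of optimizer steps and cosine decay to zero over the rest; the linear warm-up shape and the zero floor are assumptions, since the settings above only name the two phases. `lr_factor` is a hypothetical helper.

```python
import math


def lr_factor(step: int, total_steps: int, warmup_frac: float = 0.1,
              min_factor: float = 0.0) -> float:
    """Multiplier applied to the base learning rate (1e-4) at a given optimizer step."""
    warmup_steps = max(1, int(warmup_frac * total_steps))
    if step < warmup_steps:
        # Linear warm-up from 1/warmup_steps up to 1.0.
        return (step + 1) / warmup_steps
    # Cosine annealing from 1.0 down to min_factor over the remaining steps.
    progress = min((step - warmup_steps) / max(1, total_steps - warmup_steps), 1.0)
    return min_factor + (1.0 - min_factor) * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Usage with PyTorch: build `torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-4)`, wrap it in `torch.optim.lr_scheduler.LambdaLR(optimizer, lambda s: lr_factor(s, total_steps))` with `total_steps = 200 * steps_per_epoch`, and call `scheduler.step()` once per batch.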

## Citation

```bibtex
@misc{wu2026depthpolyp,
  title={DepthPolyp: Pseudo-Depth Guided Lightweight Segmentation for Real-Time Colonoscopy},
  author={Wu, Zhuoyu and Ou, Wenhui and Zhang, Lexi and Tan, Pei-Sze and Wu, Dongjun and Zhao, Junhe and Fang, Wenqi and Phan, Raphaël C.-W.},
  year={2026},
  eprint={2605.16519},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```