---
license: mit
library_name: pytorch
pipeline_tag: image-segmentation
tags:
- medical-image-segmentation
- image-segmentation
- semantic-segmentation
- polyp-segmentation
- colonoscopy
- depth-estimation
- pseudo-depth
- real-time
- onnx
- pytorch
- arxiv:2605.16519
metrics:
- dice
- iou
- recall
---
# DepthPolyp: Pseudo-Depth Guided Lightweight Segmentation for Real-Time Colonoscopy

DepthPolyp is a lightweight pseudo-depth guided model for real-time colonoscopic polyp segmentation. Given an RGB colonoscopy frame, it jointly predicts:

1. a binary polyp segmentation probability map
2. a pseudo-depth probability map for depth-aware structural guidance

The model uses a MiT-B0 encoder and lightweight fusion/gating modules to keep deployment cost low while improving robustness under blur, illumination changes, reflections, and other real-world colonoscopy degradations.

- Paper: [arXiv:2605.16519](https://arxiv.org/abs/2605.16519)
- Code: [github.com/ReaganWu/DepthPolyp](https://github.com/ReaganWu/DepthPolyp)
- Demo: [DepthPolyp-demo](https://huggingface.co/spaces/ReaganWZY/DepthPolyp-demo)
- License: MIT

## Model Details

| Item | Value |
| --- | --- |
| Model | DepthPolyp |
| Encoder | MiT-B0 |
| Input | RGB image, 224 x 224 |
| Outputs | segmentation, pseudo-depth |
| Parameters | 3.57M |
| Complexity | 0.86 GMACs |
| Training data | Kvasir-SEG with degradation-aware training |
| PyTorch checkpoint | `DepthPolyp_Kvasir.pth` |
| ONNX checkpoint | `DepthPolyp_Kvasir.onnx` |

ONNX I/O names:

```text
input: image
outputs: segmentation, depth
```
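
With these names, the exported graph can also be called directly from ONNX Runtime. The sketch below is not the repository's `infer_onnx.py`. It assumes the ONNX model expects the same preprocessing as the PyTorch Quick Start below (bilinear resize to 224 x 224, RGB scaled to `[0, 1]`, no mean/std normalization) packed as a `1x3x224x224` float32 tensor; the helper names `to_nchw_float` and `run_depthpolyp_onnx` are hypothetical.

```python
import numpy as np


def to_nchw_float(rgb: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 uint8 RGB array into a contiguous 1x3xHxW float32 array in [0, 1].

    Mirrors torchvision's ToTensor(); skipping mean/std normalization is an assumption.
    """
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError(f"expected an HxWx3 RGB array, got shape {rgb.shape}")
    chw = rgb.astype(np.float32).transpose(2, 0, 1) / 255.0
    # Add the batch dimension and force a C-contiguous layout for ONNX Runtime.
    return np.ascontiguousarray(chw[np.newaxis])


def run_depthpolyp_onnx(onnx_path: str, image_path: str, size: int = 224):
    """Hypothetical helper: return the (segmentation, depth) arrays for one image."""
    # Imported lazily so that to_nchw_float() works without these packages installed.
    import onnxruntime as ort
    from PIL import Image

    image = Image.open(image_path).convert("RGB")
    # Bilinear resize, intended to match transforms.Resize((224, 224)) on a PIL image.
    image = image.resize((size, size), Image.BILINEAR)
    x = to_nchw_float(np.asarray(image))

    session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
    # I/O names from this model card: input "image", outputs "segmentation" and "depth".
    segmentation, depth = session.run(["segmentation", "depth"], {"image": x})
    return segmentation, depth
```

For example, `seg, depth = run_depthpolyp_onnx("DepthPolyp_Kvasir.onnx", "samples/kvasir/images/sample_01.jpg")`. For video, create the `InferenceSession` once and reuse it across frames instead of per call as this sketch does.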

## Intended Use

DepthPolyp is intended for research on colonoscopic polyp segmentation, lightweight medical image segmentation, robustness under endoscopic video degradation, and deployment-oriented model comparison.

This model is not a standalone medical device and is not intended for clinical diagnosis without appropriate validation, regulatory review, and expert oversight.

## Quick Start: ONNX Runtime

```bash
pip install onnxruntime pillow numpy

python scripts/infer_onnx.py \
  --onnx DepthPolyp_Kvasir.onnx \
  --input samples/kvasir/images \
  --output outputs
```

The script writes binary masks, pseudo-depth visualizations, and mask overlays to the `--output` directory.

## Quick Start: PyTorch

```bash
pip install torch torchvision pillow numpy
```

```python
import torch
from PIL import Image
from torchvision import transforms

# build_depthpolyp is defined in model/depthpolyp.py; run from the code repository root.
from model.depthpolyp import build_depthpolyp

device = "cuda" if torch.cuda.is_available() else "cpu"

# Architecture settings for the released checkpoint (loaded below with strict=True).
model = build_depthpolyp(
    encoder_name="b0",
    in_channels=3,
    num_classes=2,
    decoder_channels=256,
    activation=None,
)
# weights_only=True requires PyTorch >= 1.13.
state_dict = torch.load("DepthPolyp_Kvasir.pth", map_location="cpu", weights_only=True)
model.load_state_dict(state_dict, strict=True)
model.to(device).eval()

image = Image.open("samples/kvasir/images/sample_01.jpg").convert("RGB")
# Resize to the 224 x 224 input resolution and scale pixel values to [0, 1].
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
x = transform(image).unsqueeze(0).to(device)  # [1, 3, 224, 224]

with torch.no_grad():
    seg_prob, depth_prob = model(x)

print(seg_prob.shape)    # torch.Size([1, 1, 224, 224])
print(depth_prob.shape)  # torch.Size([1, 1, 224, 224])
```
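
Both outputs are probability maps, so turning them into the kind of files the ONNX script writes (binary mask, pseudo-depth visualization, mask overlay) takes a few lines of NumPy. This is a hedged sketch, not the script's actual code: the 0.5 threshold, the per-image min-max scaling of the depth map, the green overlay color, and the blend weight `alpha` are assumptions, and `postprocess` is a hypothetical helper. It expects NumPy arrays (e.g. `seg_prob.cpu().numpy()`) and an RGB image on the same 224 x 224 grid.

```python
import numpy as np


def postprocess(seg_prob: np.ndarray, depth_prob: np.ndarray, rgb: np.ndarray,
                threshold: float = 0.5, alpha: float = 0.4):
    """Turn 1x1xHxW model outputs into a binary mask, a depth image, and an overlay.

    rgb must be an HxWx3 uint8 image at the same resolution as the outputs.
    """
    prob = np.squeeze(seg_prob)                        # HxW polyp probabilities
    depth = np.squeeze(depth_prob).astype(np.float32)  # HxW pseudo-depth values

    # Binary mask: 255 where the polyp probability exceeds the threshold, else 0.
    mask = np.where(prob > threshold, 255, 0).astype(np.uint8)

    # Pseudo-depth visualization: per-image min-max scaling to 8-bit grayscale.
    span = float(depth.max() - depth.min())
    depth_vis = (depth - depth.min()) / span * 255.0 if span > 0 else np.zeros_like(depth)
    depth_vis = depth_vis.round().astype(np.uint8)

    # Overlay: blend solid green into the image wherever the mask is on.
    overlay = rgb.astype(np.float32)
    green = np.array([0.0, 255.0, 0.0], dtype=np.float32)
    on = mask > 0
    overlay[on] = (1.0 - alpha) * overlay[on] + alpha * green
    overlay = overlay.round().astype(np.uint8)
    return mask, depth_vis, overlay
```

Each returned array can then be saved with `Image.fromarray(...)` from Pillow.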

## Loading Files with `huggingface_hub`

```python
from huggingface_hub import hf_hub_download

repo_id = "ReaganWZY/DepthPolyp"
# Each call returns the local path of the downloaded (cached) file.
pth_path = hf_hub_download(repo_id=repo_id, filename="DepthPolyp_Kvasir.pth")
onnx_path = hf_hub_download(repo_id=repo_id, filename="DepthPolyp_Kvasir.onnx")
```

If you publish under a different Hugging Face repo id, replace `ReaganWZY/DepthPolyp` with that id.

## Evaluation
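
Dice, IoU, and Recall in the tables below are the usual overlap metrics between a predicted mask P and a ground-truth mask G: Dice = 2|P ∩ G| / (|P| + |G|), IoU = |P ∩ G| / |P ∪ G|, and Recall = |P ∩ G| / |G|. The sketch computes them per image with NumPy; the 0.5 binarization threshold and the per-image averaging in `mean_metrics` are assumptions (the paper's exact evaluation protocol may differ), and both helper names are hypothetical.

```python
import numpy as np


def binary_metrics(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> dict:
    """Dice, IoU, and Recall for one binary prediction / ground-truth pair."""
    p = np.asarray(pred).astype(bool)
    g = np.asarray(gt).astype(bool)
    tp = np.logical_and(p, g).sum()     # |P ∩ G|
    union = np.logical_or(p, g).sum()   # |P ∪ G|
    # With an empty P and an empty G all three scores are 0 here; conventions vary.
    dice = 2.0 * tp / (p.sum() + g.sum() + eps)
    iou = tp / (union + eps)
    recall = tp / (g.sum() + eps)
    return {"dice": float(dice), "iou": float(iou), "recall": float(recall)}


def mean_metrics(probs, gts, threshold: float = 0.5) -> dict:
    """Mean per-image metrics after thresholding probability maps (assumed protocol)."""
    scores = [binary_metrics(np.asarray(prob) > threshold, np.asarray(gt) > 0)
              for prob, gt in zip(probs, gts)]
    return {k: float(np.mean([s[k] for s in scores])) for k in ("dice", "iou", "recall")}
```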

Paper-reported reference results (the `N->C` and `N->N` protocols are defined in the paper):

| Protocol | Kvasir Dice/IoU/Recall | ClinicDB Dice/IoU/Recall | ColonDB Dice/IoU/Recall |
| --- | --- | --- | --- |
| `N->C` | 0.891 / 0.805 / 0.885 | 0.854 / 0.748 / 0.845 | 0.801 / 0.669 / 0.759 |
| `N->N` | 0.853 / 0.745 / 0.854 | 0.751 / 0.608 / 0.759 | 0.734 / 0.582 / 0.697 |

Real-world robustness and deployment results from the paper:

| Params | GMACs | Avg. Dice | PolypGen Dice | iPhone FPS | Raspberry Pi 4 FPS |
| ---: | ---: | ---: | ---: | ---: | ---: |
| 3.57M | 0.86 | 0.779 | 0.679 | 181.54 | 4.05 |

## Training Data and Protocol

The released checkpoint is trained on Kvasir-SEG with degradation-aware training. Pseudo-depth targets are generated with Depth Anything V2 Small and are used only during training; depth targets are not required at inference time.
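
This card does not spell out how raw Depth Anything V2 predictions become training targets. As a minimal sketch, assuming each raw relative-depth map is resized to the 224 x 224 training resolution with nearest-neighbor sampling and min-max normalized per image to `[0, 1]`, a target could be built as below. `depth_to_target` is a hypothetical helper; the raw map itself could come from, for instance, the `transformers` `depth-estimation` pipeline with the `depth-anything/Depth-Anything-V2-Small-hf` checkpoint.

```python
import numpy as np


def depth_to_target(raw_depth: np.ndarray, size: int = 224, eps: float = 1e-6) -> np.ndarray:
    """Map an HxW raw relative-depth array to a size x size float32 target in [0, 1].

    Assumptions (not from the paper): nearest-neighbor resizing and per-image
    min-max normalization.
    """
    depth = np.asarray(raw_depth, dtype=np.float32)
    if depth.ndim != 2:
        raise ValueError(f"expected an HxW depth map, got shape {depth.shape}")
    h, w = depth.shape
    # Nearest-neighbor resize: choose one source row/column for each output pixel.
    rows = (np.arange(size) * h) // size
    cols = (np.arange(size) * w) // size
    resized = depth[rows[:, None], cols[None, :]]
    # Per-image min-max normalization; a constant map becomes all zeros.
    lo, hi = resized.min(), resized.max()
    return (resized - lo) / max(float(hi - lo), eps)
```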

Reference training settings from the paper:

- Input resolution: 224 x 224
- Optimizer: AdamW
- Learning rate: 1e-4
- Weight decay: 1e-4
- Batch size: 16
- Epochs: 200
- Schedule: 10% warm-up followed by cosine annealing
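
The warm-up plus cosine schedule can be expressed as a per-step learning-rate function. This sketch assumes a linear warm-up over the first 10% of optimizer steps, cosine annealing to a floor of `min_lr` (0 by default) afterwards, and per-step rather than per-epoch updates; the paper's exact warm-up shape and final learning rate are not given here, and `lr_at_step` is a hypothetical helper.

```python
import math


def lr_at_step(step: int, total_steps: int, base_lr: float = 1e-4,
               warmup_frac: float = 0.1, min_lr: float = 0.0) -> float:
    """Linear warm-up for warmup_frac of training, then cosine annealing to min_lr."""
    warmup_steps = max(1, int(warmup_frac * total_steps))
    if step < warmup_steps:
        # Ramp linearly so that the last warm-up step reaches base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Progress through the cosine phase, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

To drive PyTorch's AdamW with it, one option is `torch.optim.lr_scheduler.LambdaLR(optimizer, lambda step: lr_at_step(step, total_steps) / 1e-4)` with `optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-4)`, calling `scheduler.step()` once per batch so that `total_steps` equals 200 epochs times the number of batches per epoch.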

## Citation

```bibtex
@misc{wu2026depthpolyp,
  title={DepthPolyp: Pseudo-Depth Guided Lightweight Segmentation for Real-Time Colonoscopy},
  author={Wu, Zhuoyu and Ou, Wenhui and Zhang, Lexi and Tan, Pei-Sze and Wu, Dongjun and Zhao, Junhe and Fang, Wenqi and Phan, Raphaël C.-W.},
  year={2026},
  eprint={2605.16519},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```