---
license: mit
library_name: pytorch
pipeline_tag: image-segmentation
tags:
- medical-image-segmentation
- image-segmentation
- semantic-segmentation
- polyp-segmentation
- colonoscopy
- depth-estimation
- pseudo-depth
- real-time
- onnx
- pytorch
- arxiv:2605.16519
metrics:
- dice
- iou
- recall
---

# DepthPolyp: Pseudo-Depth Guided Lightweight Segmentation for Real-Time Colonoscopy

DepthPolyp is a lightweight pseudo-depth guided model for real-time colonoscopic polyp segmentation. Given an RGB colonoscopy frame, it jointly predicts:

1. a binary polyp segmentation probability map
2. a pseudo-depth probability map for depth-aware structural guidance

The model uses a MiT-B0 encoder and lightweight fusion/gating modules to keep deployment cost low while improving robustness under blur, illumination changes, reflections, and other real-world colonoscopy degradations.

- Paper: [arXiv:2605.16519](https://arxiv.org/abs/2605.16519)
- Code: [github.com/ReaganWu/DepthPolyp](https://github.com/ReaganWu/DepthPolyp)
- Demo: [DepthPolyp-demo](https://huggingface.co/spaces/ReaganWZY/DepthPolyp-demo)
- License: MIT

## Model Details

| Item | Value |
| --- | --- |
| Model | DepthPolyp |
| Encoder | MiT-B0 |
| Input | RGB image, 224 x 224 |
| Outputs | segmentation, pseudo-depth |
| Parameters | 3.57M |
| Complexity | 0.86 GMACs |
| Training data | Kvasir-SEG with degradation-aware training |
| PyTorch checkpoint | `DepthPolyp_Kvasir.pth` |
| ONNX checkpoint | `DepthPolyp_Kvasir.onnx` |

ONNX I/O names:

```text
input: image
outputs: segmentation, depth
```

## Intended Use

DepthPolyp is intended for research on colonoscopic polyp segmentation, lightweight medical image segmentation, robustness under endoscopic video degradation, and deployment-oriented model comparison.

This model is not a standalone medical device and is not intended for clinical diagnosis without appropriate validation, regulatory review, and expert oversight.

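
For research use, the ONNX I/O names listed under Model Details could be wired into ONNX Runtime roughly as sketched below. This is a minimal sketch, not the repository's `scripts/infer_onnx.py`. It assumes that preprocessing mirrors the PyTorch Quick Start in this card (resize to 224 x 224, scale to [0, 1], no mean/std normalization), that the exported model takes a float32 NCHW tensor, and that a 0.5 threshold is a sensible way to binarize the segmentation probabilities. The helper names `preprocess`, `binarize`, and `run_onnx` are hypothetical.

```python
import numpy as np

INPUT_SIZE = 224  # input resolution from the Model Details table


def preprocess(rgb_uint8: np.ndarray) -> np.ndarray:
    """Convert a 224x224 HWC uint8 RGB array to an NCHW float32 batch in [0, 1].

    Assumption: no mean/std normalization, mirroring the ToTensor-only
    transform used in the PyTorch Quick Start.
    """
    if rgb_uint8.shape != (INPUT_SIZE, INPUT_SIZE, 3):
        raise ValueError(f"expected ({INPUT_SIZE}, {INPUT_SIZE}, 3), got {rgb_uint8.shape}")
    x = rgb_uint8.astype(np.float32) / 255.0
    # HWC -> CHW, then add the batch dimension.
    return np.transpose(x, (2, 0, 1))[None, ...]


def binarize(seg_prob: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Threshold a probability map into a {0, 1} mask (0.5 is an assumed default)."""
    return (seg_prob > threshold).astype(np.uint8)


def run_onnx(onnx_path: str, image_path: str):
    """Run one image through the ONNX model and return (binary mask, depth map)."""
    # Imported lazily so the helpers above work without these packages installed.
    import onnxruntime as ort
    from PIL import Image

    session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
    image = Image.open(image_path).convert("RGB")
    image = image.resize((INPUT_SIZE, INPUT_SIZE), Image.BILINEAR)
    x = preprocess(np.asarray(image))
    # Input and output names are the ones documented under "ONNX I/O names".
    seg_prob, depth_prob = session.run(["segmentation", "depth"], {"image": x})
    return binarize(seg_prob), depth_prob
```

With the released files in the working directory, a call such as `run_onnx("DepthPolyp_Kvasir.onnx", "samples/kvasir/images/sample_01.jpg")` would return the thresholded mask and the raw pseudo-depth map. If the exported graph expects different normalization or a different dtype, `preprocess` would need to change accordingly.
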
## Quick Start: ONNX Runtime

```bash
pip install onnxruntime pillow numpy

python scripts/infer_onnx.py \
  --onnx DepthPolyp_Kvasir.onnx \
  --input samples/kvasir/images \
  --output outputs
```

The script writes binary masks, pseudo-depth visualizations, and mask overlays.

## Quick Start: PyTorch

```bash
pip install torch torchvision pillow numpy
```

```python
import torch
from PIL import Image
from torchvision import transforms

from model.depthpolyp import build_depthpolyp

device = "cuda" if torch.cuda.is_available() else "cpu"

model = build_depthpolyp(
    encoder_name="b0",
    in_channels=3,
    num_classes=2,
    decoder_channels=256,
    activation=None,
)
state_dict = torch.load("DepthPolyp_Kvasir.pth", map_location="cpu", weights_only=True)
model.load_state_dict(state_dict, strict=True)
model.to(device).eval()

image = Image.open("samples/kvasir/images/sample_01.jpg").convert("RGB")
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
x = transform(image).unsqueeze(0).to(device)

with torch.no_grad():
    seg_prob, depth_prob = model(x)

print(seg_prob.shape)    # [1, 1, 224, 224]
print(depth_prob.shape)  # [1, 1, 224, 224]
```

## Loading Files with `huggingface_hub`

```python
from huggingface_hub import hf_hub_download

repo_id = "ReaganWZY/DepthPolyp"
pth_path = hf_hub_download(repo_id=repo_id, filename="DepthPolyp_Kvasir.pth")
onnx_path = hf_hub_download(repo_id=repo_id, filename="DepthPolyp_Kvasir.onnx")
```

If you publish under a different Hugging Face repo id, replace `ReaganWZY/DepthPolyp` with that id.

## Evaluation

Paper-reported reference results:

| Protocol | Kvasir Dice/IoU/Recall | ClinicDB Dice/IoU/Recall | ColonDB Dice/IoU/Recall |
| --- | --- | --- | --- |
| `N->C` | 0.891 / 0.805 / 0.885 | 0.854 / 0.748 / 0.845 | 0.801 / 0.669 / 0.759 |
| `N->N` | 0.853 / 0.745 / 0.854 | 0.751 / 0.608 / 0.759 | 0.734 / 0.582 / 0.697 |

Real-world robustness and deployment results from the paper:

| Params | GMACs | Avg. Dice | PolypGen Dice | iPhone FPS | Raspberry Pi 4 FPS |
| ---: | ---: | ---: | ---: | ---: | ---: |
| 3.57M | 0.86 | 0.779 | 0.679 | 181.54 | 4.05 |

## Training Data and Protocol

The released checkpoint is trained on Kvasir-SEG with degradation-aware training. Pseudo-depth targets are generated with Depth-Anything v2 Small and are used only during training; depth targets are not required at inference time.

Reference training settings from the paper:

- Input resolution: 224 x 224
- Optimizer: AdamW
- Learning rate: 1e-4
- Weight decay: 1e-4
- Batch size: 16
- Epochs: 200
- Schedule: 10% warm-up followed by cosine annealing

## Citation

```bibtex
@misc{wu2026depthpolyp,
  title={DepthPolyp: Pseudo-Depth Guided Lightweight Segmentation for Real-Time Colonoscopy},
  author={Wu, Zhuoyu and Ou, Wenhui and Zhang, Lexi and Tan, Pei-Sze and Wu, Dongjun and Zhao, Junhe and Fang, Wenqi and Phan, Raphaël C.-W.},
  year={2026},
  eprint={2605.16519},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```