	---
	license: mit
	library_name: pytorch
	pipeline_tag: image-segmentation
	tags:
	- medical-image-segmentation
	- image-segmentation
	- semantic-segmentation
	- polyp-segmentation
	- colonoscopy
	- depth-estimation
	- pseudo-depth
	- real-time
	- onnx
	- pytorch
	- arxiv:2605.16519
	metrics:
	- dice
	- iou
	- recall
	---

	# DepthPolyp: Pseudo-Depth Guided Lightweight Segmentation for Real-Time Colonoscopy

	DepthPolyp is a lightweight pseudo-depth guided model for real-time colonoscopic polyp segmentation. Given an RGB colonoscopy frame, it jointly predicts:

	1. a binary polyp segmentation probability map
	2. a pseudo-depth probability map for depth-aware structural guidance

	The model uses a MiT-B0 encoder and lightweight fusion/gating modules to keep deployment cost low while improving robustness under blur, illumination changes, reflections, and other real-world colonoscopy degradations.

	- Paper: [arXiv:2605.16519](https://arxiv.org/abs/2605.16519)
	- Code: [github.com/ReaganWu/DepthPolyp](https://github.com/ReaganWu/DepthPolyp)
	- Demo: [DepthPolyp-demo](https://huggingface.co/spaces/ReaganWZY/DepthPolyp-demo)
	- License: MIT

	## Model Details

	| Item | Value |
	| --- | --- |
	| Model | DepthPolyp |
	| Encoder | MiT-B0 |
	| Input | RGB image, 224 x 224 |
	| Outputs | segmentation, pseudo-depth |
	| Parameters | 3.57M |
	| Complexity | 0.86 GMACs |
	| Training data | Kvasir-SEG with degradation-aware training |
	| PyTorch checkpoint | `DepthPolyp_Kvasir.pth` |
	| ONNX checkpoint | `DepthPolyp_Kvasir.onnx` |

	ONNX I/O names:

	```text
	input: image
	outputs: segmentation, depth
	```
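
As a rough illustration of how these names are used, the sketch below feeds one preprocessed frame to ONNX Runtime. The helper names `to_model_input` and `run_depthpolyp_onnx` are hypothetical, and scaling to [0, 1] without mean/std normalization is an assumption mirrored from the PyTorch quick start in this card; `scripts/infer_onnx.py` remains the reference for the preprocessing the repository actually uses.

```python
import numpy as np


def to_model_input(rgb_uint8: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 uint8 RGB frame (already resized to 224x224)
    into a 1x3xHxW float32 array scaled to [0, 1].

    Assumption: no mean/std normalization is applied.
    """
    if rgb_uint8.ndim != 3 or rgb_uint8.shape[2] != 3:
        raise ValueError("expected an HxWx3 RGB array")
    chw = rgb_uint8.astype(np.float32).transpose(2, 0, 1) / 255.0
    return np.ascontiguousarray(chw[np.newaxis, ...])


def run_depthpolyp_onnx(onnx_path: str, x: np.ndarray):
    """Run one forward pass and return the (segmentation, depth) arrays."""
    # Imported lazily so the preprocessing helper works without onnxruntime.
    import onnxruntime as ort

    session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
    # Input and output names are the ones listed in this card.
    seg, depth = session.run(["segmentation", "depth"], {"image": x})
    return seg, depth


if __name__ == "__main__":
    # A random frame stands in for a real colonoscopy image.
    frame = np.random.randint(0, 256, size=(224, 224, 3), dtype=np.uint8)
    x = to_model_input(frame)
    print(x.shape, x.dtype)
```

With the checkpoint downloaded, `run_depthpolyp_onnx("DepthPolyp_Kvasir.onnx", x)` would return the two output arrays in the order requested.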

	## Intended Use

	DepthPolyp is intended for research on colonoscopic polyp segmentation, lightweight medical image segmentation, robustness under endoscopic video degradation, and deployment-oriented model comparison.

	This model is not a standalone medical device and is not intended for clinical diagnosis without appropriate validation, regulatory review, and expert oversight.

	## Quick Start: ONNX Runtime

	```bash
	pip install onnxruntime pillow numpy

	python scripts/infer_onnx.py \
	    --onnx DepthPolyp_Kvasir.onnx \
	    --input samples/kvasir/images \
	    --output outputs
	```

	The script writes binary masks, pseudo-depth visualizations, and mask overlays.
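
The three kinds of output can be approximated from raw model outputs as sketched below. This is not the repository's script: the 0.5 threshold, the green overlay color, the blending weight, and the min-max scaling of the depth map are all assumptions, and the function names are hypothetical.

```python
import numpy as np


def binary_mask(seg_prob: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Threshold an HxW probability map into a uint8 mask (0 or 255)."""
    return (seg_prob >= threshold).astype(np.uint8) * 255


def depth_to_gray(depth: np.ndarray) -> np.ndarray:
    """Min-max scale an HxW pseudo-depth map to uint8 for visualization."""
    lo, hi = float(depth.min()), float(depth.max())
    scale = hi - lo if hi > lo else 1.0  # avoid division by zero on flat maps
    return ((depth - lo) / scale * 255.0).round().astype(np.uint8)


def overlay(rgb: np.ndarray, mask: np.ndarray, color=(0, 255, 0), alpha: float = 0.4) -> np.ndarray:
    """Blend `color` into the HxWx3 uint8 image wherever the mask is set."""
    out = rgb.astype(np.float32)
    selected = mask > 0
    out[selected] = (1.0 - alpha) * out[selected] + alpha * np.asarray(color, dtype=np.float32)
    return out.round().astype(np.uint8)
```

Each function takes a single 2-D map, so a batched `[1, 1, 224, 224]` output would first be indexed as `output[0, 0]`.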

	## Quick Start: PyTorch

	```bash
	pip install torch torchvision pillow numpy
	```

	```python
	import torch
	from PIL import Image
	from torchvision import transforms

	# Assumes the DepthPolyp repository root is on the Python path.
	from model.depthpolyp import build_depthpolyp

	device = "cuda" if torch.cuda.is_available() else "cpu"

	# Build the network with the MiT-B0 encoder and load the released weights.
	model = build_depthpolyp(
	    encoder_name="b0",
	    in_channels=3,
	    num_classes=2,
	    decoder_channels=256,
	    activation=None,
	)
	state_dict = torch.load("DepthPolyp_Kvasir.pth", map_location="cpu", weights_only=True)
	model.load_state_dict(state_dict, strict=True)
	model.to(device).eval()

	# Resize to the 224 x 224 model input and scale pixel values to [0, 1].
	image = Image.open("samples/kvasir/images/sample_01.jpg").convert("RGB")
	transform = transforms.Compose([
	    transforms.Resize((224, 224)),
	    transforms.ToTensor(),
	])
	x = transform(image).unsqueeze(0).to(device)

	# Inference only: no gradients needed.
	with torch.no_grad():
	    seg_prob, depth_prob = model(x)

	print(seg_prob.shape)    # [1, 1, 224, 224]
	print(depth_prob.shape)  # [1, 1, 224, 224]
	```

	## Loading Files with `huggingface_hub`

	```python
	from huggingface_hub import hf_hub_download

	repo_id = "ReaganWZY/DepthPolyp"
	pth_path = hf_hub_download(repo_id=repo_id, filename="DepthPolyp_Kvasir.pth")
	onnx_path = hf_hub_download(repo_id=repo_id, filename="DepthPolyp_Kvasir.onnx")
	```

	If you publish under a different Hugging Face repo id, replace `ReaganWZY/DepthPolyp` with that id.

	## Evaluation

	Paper-reported reference results:

	| Protocol | Kvasir Dice/IoU/Recall | ClinicDB Dice/IoU/Recall | ColonDB Dice/IoU/Recall |
	| --- | --- | --- | --- |
	| `N->C` | 0.891 / 0.805 / 0.885 | 0.854 / 0.748 / 0.845 | 0.801 / 0.669 / 0.759 |
	| `N->N` | 0.853 / 0.745 / 0.854 | 0.751 / 0.608 / 0.759 | 0.734 / 0.582 / 0.697 |
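
For reference, the three reported metrics can be computed from a binary prediction and a binary ground-truth mask as sketched below. The function name and the epsilon smoothing are illustrative assumptions; the paper's evaluation code may threshold or aggregate differently (for example, per-image versus per-dataset averaging).

```python
import numpy as np


def dice_iou_recall(pred: np.ndarray, target: np.ndarray, eps: float = 1e-8):
    """Return (Dice, IoU, Recall) for two binary masks of the same shape."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    tp = np.logical_and(pred, target).sum()    # polyp pixels found
    fp = np.logical_and(pred, ~target).sum()   # background marked as polyp
    fn = np.logical_and(~pred, target).sum()   # polyp pixels missed
    dice = (2 * tp) / (2 * tp + fp + fn + eps)
    iou = tp / (tp + fp + fn + eps)
    recall = tp / (tp + fn + eps)
    return float(dice), float(iou), float(recall)
```

Dice and IoU both penalize false positives and false negatives, while recall ignores false positives, which is why it is worth reporting alongside them for a detection-critical task.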

	Real-world robustness and deployment results from the paper:

	| Params | GMACs | Avg. Dice | PolypGen Dice | iPhone FPS | Raspberry Pi 4 FPS |
	| ---: | ---: | ---: | ---: | ---: | ---: |
	| 3.57M | 0.86 | 0.779 | 0.679 | 181.54 | 4.05 |
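
To put these numbers in deployment terms, frames per second can be converted to per-frame latency and the parameter count to an approximate weight size. The sketch below is plain arithmetic on the table values; the figure of 4 bytes per parameter assumes unquantized float32 weights, which this card does not state, and the function names are hypothetical.

```python
def latency_ms(fps: float) -> float:
    """Average time per frame in milliseconds for a given throughput."""
    return 1000.0 / fps


def weight_size_mb(num_params: float, bytes_per_param: int = 4) -> float:
    """Approximate weight storage in megabytes (1 MB = 1e6 bytes)."""
    return num_params * bytes_per_param / 1e6


print(f"iPhone: {latency_ms(181.54):.2f} ms/frame")
print(f"Raspberry Pi 4: {latency_ms(4.05):.2f} ms/frame")
print(f"float32 weights: {weight_size_mb(3.57e6):.2f} MB")
```

This works out to roughly 5.5 ms per frame on the iPhone, about 247 ms per frame on the Raspberry Pi 4, and about 14.3 MB of float32 weights.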

	## Training Data and Protocol

	The released checkpoint was trained on Kvasir-SEG using degradation-aware training. Pseudo-depth targets are generated with Depth Anything V2 Small and are used only as supervision during training; no depth input or depth target is required at inference time.

	Reference training settings from the paper:

	- Input resolution: 224 x 224
	- Optimizer: AdamW
	- Learning rate: 1e-4
	- Weight decay: 1e-4
	- Batch size: 16
	- Epochs: 200
	- Schedule: 10% warm-up followed by cosine annealing
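
The schedule in the last item can be sketched as a function of the epoch index. This is an illustration under assumptions the card does not state: linear warm-up, per-epoch updates, and decay to zero. The function name `lr_at_epoch` is hypothetical.

```python
import math


def lr_at_epoch(epoch: int, total_epochs: int = 200, base_lr: float = 1e-4,
                warmup_frac: float = 0.1, min_lr: float = 0.0) -> float:
    """Learning rate for a 0-indexed epoch: linear warm-up, then cosine annealing."""
    warmup_epochs = int(total_epochs * warmup_frac)
    if epoch < warmup_epochs:
        # Ramp linearly so the last warm-up epoch reaches base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Progress through the cosine phase: 0 at its start, 1 at the final epoch.
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs - 1)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

With the defaults, the first 20 epochs ramp up to 1e-4 and the remaining 180 follow a half cosine down to `min_lr`.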

	## Citation

	```bibtex
	@misc{wu2026depthpolyp,
	  title={DepthPolyp: Pseudo-Depth Guided Lightweight Segmentation for Real-Time Colonoscopy},
	  author={Wu, Zhuoyu and Ou, Wenhui and Zhang, Lexi and Tan, Pei-Sze and Wu, Dongjun and Zhao, Junhe and Fang, Wenqi and Phan, Raphaël C.-W.},
	  year={2026},
	  eprint={2605.16519},
	  archivePrefix={arXiv},
	  primaryClass={cs.CV}
	}
	```