	---
	license: apache-2.0
	language:
	- en
	tags:
	- depth-estimation
	- colonoscopy
	- medical-imaging
	- video
	- lora
	- diffusion
	library_name: transformers
	base_model:
	- tencent/DepthCrafter
	- stabilityai/stable-video-diffusion-img2vid-xt
	pipeline_tag: depth-estimation
	---

	# ColonCrafter: A Depth Estimation Model for Colonoscopy Videos Using Diffusion Priors

	ColonCrafter builds upon [DepthCrafter](https://huggingface.co/tencent/DepthCrafter) and [Stable Video Diffusion](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt) to provide temporally consistent depth predictions for colonoscopy video.

	## Model Details

	- Model Type: Video Depth Estimation (Diffusion-based)
	- Base Architecture: DepthCrafter UNet with LoRA adaptation
	- LoRA Configuration:
	  - Rank: 16
	  - Target modules: `to_q`, `to_k`, `to_v`, `to_out.0`
	  - Dropout: 0.1
	- Precision: FP16
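
To make the LoRA settings above concrete, here is a minimal sketch of how target-module names are typically matched by suffix and how many parameters a rank-16 adapter adds to one linear layer. The module names and the 320-dimensional layer size are hypothetical placeholders, not values read from the ColonCrafter checkpoint, and the helper functions are illustrative rather than part of this repository.

```python
# Illustrative only: module names and layer sizes below are hypothetical,
# not read from the ColonCrafter checkpoint.
LORA_RANK = 16
TARGET_MODULES = ("to_q", "to_k", "to_v", "to_out.0")


def is_lora_target(module_name: str) -> bool:
    """Return True if the dotted module name ends with one of the target suffixes."""
    return any(
        module_name == target or module_name.endswith("." + target)
        for target in TARGET_MODULES
    )


def lora_param_count(d_in: int, d_out: int, rank: int = LORA_RANK) -> int:
    """Parameters added by one adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


# Hypothetical attention and feed-forward module names for demonstration.
example_modules = [
    "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_q",
    "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_out.0",
    "down_blocks.0.attentions.0.transformer_blocks.0.ff.net.2",
]
adapted = [name for name in example_modules if is_lora_target(name)]
extra_params = lora_param_count(320, 320)  # assumed 320 -> 320 projection
```

Only the attention projections match here; the feed-forward layer is left untouched, which is consistent with the target modules listed above.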

	## Installation

	Please refer to the installation instructions in our [repository](https://github.com/rajpurkarlab/ColonCrafter).

	## Usage

	```python
	import torch
	from src.depth.models.model import ColonCrafterInference

	# Load the model
	device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
	model = ColonCrafterInference.from_pretrained(
	    "romainhardy/coloncrafter",
	    device=device
	)

	# Prepare video tensor: (N, C, H, W) in [0, 1] range
	# video = ...

	# Run inference
	pred_depth, pred_disparity = model.predict_depth(
	    video,
	    num_inference_steps=1,
	    window_size=16,
	    overlap=8,
	    guidance_scale=1.0,
	    seed=42
	)
	```
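
The `window_size` and `overlap` arguments suggest that long videos are processed in overlapping chunks of frames. The sketch below shows one plausible way such windows could be laid out; it is an assumption for illustration, not the repository's actual implementation. The function name `window_ranges` and the choice to shift the final window back so it ends on the last frame are both hypothetical.

```python
def window_ranges(num_frames, window_size=16, overlap=8):
    """Return (start, end) frame index pairs for overlapping windows.

    Hypothetical helper: consecutive windows advance by
    window_size - overlap frames; the final window is shifted back
    so that it ends exactly at the last frame.
    """
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must satisfy 0 <= overlap < window_size")
    if num_frames <= window_size:
        # A short clip fits in a single window.
        return [(0, num_frames)]

    stride = window_size - overlap
    ranges = []
    start = 0
    while start + window_size < num_frames:
        ranges.append((start, start + window_size))
        start += stride
    # Last window is aligned to the end of the video.
    ranges.append((num_frames - window_size, num_frames))
    return ranges


windows = window_ranges(32, window_size=16, overlap=8)
# windows == [(0, 16), (8, 24), (16, 32)]
```

With the settings from the usage example, each window shares 8 frames with its predecessor, which gives the overlapping regions that a method of this kind can use to keep depth predictions consistent across window boundaries.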

	## Citation

	If you use this model in your research, please cite:

	```bibtex
	@article{hardy2025coloncrafter,
	  title={ColonCrafter: A Depth Estimation Model for Colonoscopy Videos Using Diffusion Priors},
	  author={Hardy, Romain and Berzin, Tyler and Rajpurkar, Pranav},
	  journal={arXiv preprint arXiv:2509.13525},
	  year={2025}
	}
	```

	## Acknowledgments

	This model builds upon [DepthCrafter](https://github.com/Tencent/DepthCrafter) and [Stable Video Diffusion](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt).