---
license: apache-2.0
language:
- en
tags:
- depth-estimation
- colonoscopy
- medical-imaging
- video
- lora
- diffusion
library_name: transformers
base_model:
- tencent/DepthCrafter
- stabilityai/stable-video-diffusion-img2vid-xt
pipeline_tag: depth-estimation
---

# ColonCrafter: A Depth Estimation Model for Colonoscopy Videos Using Diffusion Priors

ColonCrafter builds upon [DepthCrafter](https://huggingface.co/tencent/DepthCrafter) and [Stable Video Diffusion](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt) to provide temporally consistent depth predictions for colonoscopy video.

## Model Details

- **Model Type:** Video Depth Estimation (Diffusion-based)
- **Base Architecture:** DepthCrafter UNet with LoRA adaptation
- **LoRA Configuration:**
  - Rank: 16
  - Target modules: `to_q`, `to_k`, `to_v`, `to_out.0`
  - Dropout: 0.1
- **Precision:** FP16

## Installation

Please refer to the installation instructions in our [repository](https://github.com/rajpurkarlab/ColonCrafter).

## Usage

```python
import torch
from src.depth.models.model import ColonCrafterInference

# Load the model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ColonCrafterInference.from_pretrained(
    "romainhardy/coloncrafter",
    device=device
)

# Prepare video tensor: (N, C, H, W) in [0, 1] range
# video = ...

# Run inference
pred_depth, pred_disparity = model.predict_depth(
    video,
    num_inference_steps=1,
    window_size=16,
    overlap=8,
    guidance_scale=1.0,
    seed=42
)
```

## Citation

If you use this model in your research, please cite:

```bibtex
@article{hardy2025coloncrafter,
  title={ColonCrafter: A Depth Estimation Model for Colonoscopy Videos Using Diffusion Priors},
  author={Hardy, Romain and Berzin, Tyler and Rajpurkar, Pranav},
  journal={arXiv preprint arXiv:2509.13525},
  year={2025}
}
```

## Acknowledgments

This model builds upon [DepthCrafter](https://github.com/Tencent/DepthCrafter) and [Stable Video Diffusion](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt).
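
## Input Preparation Sketch

The usage snippet leaves `video` unspecified beyond its layout: `(N, C, H, W)` with values in `[0, 1]`. The following is a minimal sketch of one way to build such an array from decoded RGB frames. The helper name `frames_to_video_array` is hypothetical and not part of the ColonCrafter repository, and the sketch assumes frames arrive as `H x W x 3` `uint8` arrays of equal size.

```python
import numpy as np


def frames_to_video_array(frames):
    """Stack H x W x 3 uint8 RGB frames into a float32 (N, C, H, W) array in [0, 1].

    Hypothetical helper for illustration; not part of the ColonCrafter codebase.
    """
    if len(frames) == 0:
        raise ValueError("expected at least one frame")

    # np.stack requires all frames to share the same shape: result is (N, H, W, C).
    stacked = np.stack(frames, axis=0)
    if stacked.ndim != 4 or stacked.shape[-1] != 3:
        raise ValueError(f"expected frames of shape (H, W, 3), got {stacked.shape[1:]}")

    # Scale 8-bit intensities to [0, 1], then move channels ahead of height and width.
    video = stacked.astype(np.float32) / 255.0
    return np.ascontiguousarray(video.transpose(0, 3, 1, 2))


if __name__ == "__main__":
    # Two synthetic 4 x 6 frames: one white, one black.
    demo_frames = [
        np.full((4, 6, 3), 255, dtype=np.uint8),
        np.zeros((4, 6, 3), dtype=np.uint8),
    ]
    demo_video = frames_to_video_array(demo_frames)
    print(demo_video.shape, demo_video.dtype)
```

The resulting array can be converted with `torch.from_numpy(...)` and moved to the target device before inference. Whether `predict_depth` expects a particular resolution, dtype, or a NumPy array rather than a tensor is not stated in this card, so check the repository before relying on this layout.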