---
license: apache-2.0
language:
- en
tags:
- depth-estimation
- colonoscopy
- medical-imaging
- video
- lora
- diffusion
library_name: transformers
base_model:
- tencent/DepthCrafter
- stabilityai/stable-video-diffusion-img2vid-xt
pipeline_tag: depth-estimation
---

# ColonCrafter: A Depth Estimation Model for Colonoscopy Videos Using Diffusion Priors

ColonCrafter builds upon [DepthCrafter](https://huggingface.co/tencent/DepthCrafter) and [Stable Video Diffusion](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt) to provide temporally consistent depth predictions for colonoscopy video.

## Model Details

- **Model Type:** Video Depth Estimation (Diffusion-based)
- **Base Architecture:** DepthCrafter UNet with LoRA adaptation
- **LoRA Configuration:**
  - Rank: 16
  - Target modules: `to_q`, `to_k`, `to_v`, `to_out.0`
  - Dropout: 0.1
- **Precision:** FP16
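
To give a rough sense of what rank-16 adaptation means, the sketch below counts the extra trainable parameters that LoRA adds when applied to the four listed attention projections. The hidden size of 320 and the assumption that all four projections are square are illustrative only; they are not taken from the ColonCrafter or DepthCrafter configuration, and `lora_param_count` is a hypothetical helper.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter.

    LoRA learns a low-rank update B @ A, where A has shape (rank, d_in)
    and B has shape (d_out, rank), so it adds rank * (d_in + d_out) weights.
    """
    return rank * (d_in + d_out)


RANK = 16  # from the model card
TARGET_MODULES = ["to_q", "to_k", "to_v", "to_out.0"]  # from the model card
HIDDEN_SIZE = 320  # ASSUMPTION: illustrative width, not the real UNet config

# ASSUMPTION: every target projection maps HIDDEN_SIZE -> HIDDEN_SIZE.
per_module = lora_param_count(HIDDEN_SIZE, HIDDEN_SIZE, RANK)
per_block = per_module * len(TARGET_MODULES)
full_weight = HIDDEN_SIZE * HIDDEN_SIZE  # size of one frozen projection matrix

print(f"LoRA parameters per projection: {per_module}")
print(f"LoRA parameters per attention block: {per_block}")
print(f"Fraction of one frozen projection: {per_module / full_weight:.2f}")
```

Under these assumed dimensions each adapter is about a tenth the size of the projection it modifies; the real ratio depends on the actual layer widths.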

## Installation

Please refer to the installation instructions in our [repository](https://github.com/rajpurkarlab/ColonCrafter).

## Usage

```python
import torch
from src.depth.models.model import ColonCrafterInference

# Load the model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ColonCrafterInference.from_pretrained(
    "romainhardy/coloncrafter",
    device=device
)

# Prepare video tensor: (N, C, H, W) in [0, 1] range
# video = ...

# Run inference
pred_depth, pred_disparity = model.predict_depth(
    video,
    num_inference_steps=1,
    window_size=16,
    overlap=8,
    guidance_scale=1.0,
    seed=42
)
```
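
The `window_size` and `overlap` arguments suggest that longer videos are processed in overlapping chunks of frames. The sketch below shows one common way to lay out such windows. It is an assumption made for illustration, not the actual logic inside `predict_depth`, and the helper name `sliding_windows` is hypothetical.

```python
def sliding_windows(num_frames: int, window_size: int = 16, overlap: int = 8):
    """Return (start, end) frame ranges covering a video with overlapping windows.

    ASSUMPTION: consecutive windows advance by window_size - overlap frames and
    the final window is truncated at the end of the video. The real model may
    pad, shift, or blend windows differently.
    """
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must satisfy 0 <= overlap < window_size")
    if num_frames <= 0:
        return []

    stride = window_size - overlap
    windows = []
    start = 0
    while True:
        end = min(start + window_size, num_frames)
        windows.append((start, end))
        if end == num_frames:  # the video is fully covered
            break
        start += stride
    return windows


# A 40-frame clip with the settings used in the usage example above.
print(sliding_windows(40, window_size=16, overlap=8))
# [(0, 16), (8, 24), (16, 32), (24, 40)]
```

With these settings, every interior frame falls inside two windows, which is what would give a method room to reconcile predictions across chunk boundaries.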

## Citation

If you use this model in your research, please cite:

```bibtex
@article{hardy2025coloncrafter,
  title={ColonCrafter: A Depth Estimation Model for Colonoscopy Videos Using Diffusion Priors},
  author={Hardy, Romain and Berzin, Tyler and Rajpurkar, Pranav},
  journal={arXiv preprint arXiv:2509.13525},
  year={2025}
}
```

## Acknowledgments

This model builds upon [DepthCrafter](https://github.com/Tencent/DepthCrafter) and [Stable Video Diffusion](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt).