---
license: apache-2.0
language:
  - en
tags:
  - depth-estimation
  - colonoscopy
  - medical-imaging
  - video
  - lora
  - diffusion
library_name: transformers
base_model:
  - tencent/DepthCrafter
  - stabilityai/stable-video-diffusion-img2vid-xt
pipeline_tag: depth-estimation
---
# ColonCrafter: Depth Estimation for Colonoscopy Videos

ColonCrafter is a video depth estimation model fine-tuned for colonoscopy imagery by adding LoRA adapters to [DepthCrafter](https://huggingface.co/tencent/DepthCrafter). It builds on DepthCrafter and [Stable Video Diffusion](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt) to produce temporally consistent depth predictions for endoscopic video sequences.

## Model Details

- **Model Type:** Video Depth Estimation (Diffusion-based)
- **Base Architecture:** DepthCrafter UNet with LoRA adaptation
- **LoRA Configuration:**
  - Rank: 16
  - Target modules: `to_q`, `to_k`, `to_v`, `to_out.0`
  - Dropout: 0.1
- **Precision:** FP16

## Installation

Install the dependencies:

```bash
pip install torch peft diffusers transformers
```

Then clone the repository and install it in editable mode:

```bash
git clone https://github.com/YOUR_USERNAME/coloncrafter.git
cd coloncrafter
pip install -e .
```

## Usage

```python
import torch

from src.depth.models.model import ColonCrafterInference

# Load the model (falls back to CPU if no GPU is available)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ColonCrafterInference.from_pretrained(
    "YOUR_USERNAME/coloncrafter",
    device=device,
)

# Prepare the input video: a float tensor of shape (N, C, H, W) with values in [0, 1]
# video = ...

# Run inference
pred_depth, pred_disparity = model.predict_depth(
    video,
    num_inference_steps=25,  # more steps = higher quality, slower inference
    window_size=16,
    overlap=8,
    guidance_scale=1.0,
    seed=42,
)
```
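
The usage example leaves `video` unspecified. Suppose decoded frames are available as a NumPy `uint8` array of shape `(N, H, W, 3)` in RGB order. A minimal conversion to the documented input layout could then look like the sketch below. The helper name `frames_to_model_input` is hypothetical. It assumes two steps happen beforehand: resizing (for example to the recommended 512×512), and BGR-to-RGB conversion for frames decoded with OpenCV.

```python
import numpy as np


def frames_to_model_input(frames: np.ndarray) -> np.ndarray:
    """Hypothetical helper: uint8 RGB frames (N, H, W, 3) -> float32 (N, 3, H, W) in [0, 1]."""
    if frames.ndim != 4 or frames.shape[-1] != 3:
        raise ValueError(f"expected RGB frames of shape (N, H, W, 3), got {frames.shape}")
    if frames.dtype != np.uint8:
        raise TypeError(f"expected uint8 frames, got {frames.dtype}")
    video = frames.astype(np.float32) / 255.0                   # scale pixel values to [0, 1]
    return np.ascontiguousarray(video.transpose(0, 3, 1, 2))   # channels-last -> channels-first


# Synthetic frames standing in for a decoded colonoscopy clip
frames = np.zeros((4, 8, 8, 3), dtype=np.uint8)
frames[..., 0] = 255  # every pixel pure red
video = frames_to_model_input(frames)
print(video.shape, video.dtype, video.min(), video.max())
```

The result can then be wrapped with `torch.from_numpy(video)`. It may also need moving to `device`, since the card does not specify whether `predict_depth` does that itself.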

### Inference Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `num_inference_steps` | 25 | Number of denoising steps (1 for fast inference, 25+ for quality) |
| `window_size` | 16 | Number of frames per sliding window for temporal processing |
| `overlap` | 8 | Number of frames shared by consecutive windows |
| `guidance_scale` | 1.0 | Classifier-free guidance scale |
| `seed` | 42 | Random seed for reproducibility |
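
With the defaults, consecutive windows advance by `window_size - overlap = 8` frames, which makes `window_size` and `overlap` concrete. The sketch below computes window boundaries, then merges per-window predictions by averaging every frame covered by more than one window. Both helpers (`sliding_windows`, `stitch_windows`) are hypothetical. This card does not document how ColonCrafter stitches windows, and DepthCrafter's own inference may blend overlapping frames differently (for example, with position-dependent weights).

```python
from __future__ import annotations

import numpy as np


def sliding_windows(num_frames: int, window_size: int = 16, overlap: int = 8) -> list[tuple[int, int]]:
    """Return (start, end) frame ranges; the last window is shifted to end exactly at num_frames."""
    if not 0 <= overlap < window_size:
        raise ValueError("require 0 <= overlap < window_size")
    if num_frames <= window_size:
        return [(0, num_frames)]  # a short clip fits in a single window
    stride = window_size - overlap
    starts = list(range(0, num_frames - window_size + 1, stride))
    if starts[-1] + window_size < num_frames:
        starts.append(num_frames - window_size)  # cover the tail without padding
    return [(s, s + window_size) for s in starts]


def stitch_windows(window_preds: list[np.ndarray], windows: list[tuple[int, int]], num_frames: int) -> np.ndarray:
    """Average per-window predictions, each (end - start, H, W), into one (num_frames, H, W) array."""
    frame_shape = window_preds[0].shape[1:]
    total = np.zeros((num_frames, *frame_shape), dtype=np.float64)
    count = np.zeros(num_frames, dtype=np.float64)
    for pred, (start, end) in zip(window_preds, windows):
        total[start:end] += pred   # accumulate predictions frame by frame
        count[start:end] += 1.0    # track how many windows cover each frame
    return total / count.reshape(-1, *([1] * len(frame_shape)))


windows = sliding_windows(30)
print(windows)  # [(0, 16), (8, 24), (14, 30)]
```

Shifting the last window back keeps every window at full length. The cost is a larger overlap at the end of the clip (10 frames instead of 8 in the example above).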

## Input/Output

- **Input:** Video tensor of shape `(N, C, H, W)` with values in the `[0, 1]` range
  - `N`: Number of frames
  - `C`: 3 (RGB channels)
  - `H, W`: Height and width (recommended: 512×512)
- **Output:** Tuple `(depth, disparity)` of arrays, each of shape `(N, H, W)`
  - `depth`: Computed as `1.0 / disparity`
  - `disparity`: Direct model output (inverse depth)
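
Computing depth as `1.0 / disparity` is undefined wherever the disparity is zero. The sketch below shows a guarded version of that conversion, plus a min-max rescaling that is often handy for visualization. The `eps` threshold and both helper names are assumptions, not part of ColonCrafter's API.

```python
import numpy as np


def disparity_to_depth(disparity: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Relative depth = 1 / disparity, with disparities below eps clamped to avoid division by zero."""
    disparity = np.asarray(disparity, dtype=np.float64)
    return 1.0 / np.clip(disparity, eps, None)


def normalize_for_display(x: np.ndarray) -> np.ndarray:
    """Min-max rescale an array to [0, 1]; a constant array maps to all zeros."""
    x = np.asarray(x, dtype=np.float64)
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)


disparity = np.array([[0.5, 0.25], [1.0, 0.0]])
depth = disparity_to_depth(disparity)
print(depth)
print(normalize_for_display(depth))
```

Both outputs are relative, not metric. Normalized maps like these therefore suit visual inspection rather than measurement.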

## Training Data

The model was fine-tuned on colonoscopy video data to adapt DepthCrafter's general-purpose video depth estimation to the specific challenges of endoscopic imagery, including:

- Specular highlights
- Non-Lambertian surfaces
- Limited field of view
- Tissue deformation

## Intended Use

This model is intended for research purposes in:

- Colonoscopy depth estimation
- 3D reconstruction of colon anatomy
- Navigation assistance research
- Surgical planning research

## Limitations

- Optimized for colonoscopy/endoscopy imagery; may not generalize to other domains
- Requires a GPU with sufficient VRAM for video processing
- Depth predictions are relative (defined only up to scale), not metric
- Performance may degrade on heavily occluded or motion-blurred frames
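
Predictions are defined only up to scale, so comparing them with metric ground truth (for example, from a simulator or phantom) requires an alignment step first. Median scaling is one common convention in monocular depth evaluation. The sketch below applies a single scale factor to the whole sequence, which preserves temporal consistency across frames. It illustrates that assumption and is not necessarily the protocol used to evaluate ColonCrafter. The helper `median_scale` is hypothetical.

```python
from typing import Optional, Tuple

import numpy as np


def median_scale(
    pred_depth: np.ndarray,
    gt_depth: np.ndarray,
    valid: Optional[np.ndarray] = None,
) -> Tuple[np.ndarray, float]:
    """Scale relative depth so its median over valid pixels matches the ground-truth median.

    One scale factor is computed for the whole (N, H, W) sequence, not per frame.
    """
    pred = np.asarray(pred_depth, dtype=np.float64)
    gt = np.asarray(gt_depth, dtype=np.float64)
    if valid is None:
        valid = gt > 0  # treat non-positive ground truth as missing
    scale = float(np.median(gt[valid]) / np.median(pred[valid]))
    return pred * scale, scale


# Synthetic check: a prediction equal to the ground truth divided by 3
gt = np.linspace(1.0, 5.0, 2 * 4 * 4).reshape(2, 4, 4)
pred = gt / 3.0
aligned, s = median_scale(pred, gt)
print(round(s, 6))
```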

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{coloncrafter2024,
  title={ColonCrafter: Depth Estimation for Colonoscopy Videos},
  author={Your Name},
  year={2024},
  url={https://huggingface.co/YOUR_USERNAME/coloncrafter}
}
```

## Acknowledgments

This model builds upon:

- [DepthCrafter](https://github.com/Tencent/DepthCrafter) by Tencent
- [Stable Video Diffusion](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt) by Stability AI