romainhardy committed · Commit f87dbde · verified · 1 Parent(s): 6a27cf6

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +129 -11
README.md CHANGED
@@ -1,16 +1,134 @@
  ---
  license: apache-2.0
  language:
- - en
  base_model:
- - tencent/DepthCrafter
- - stabilityai/stable-video-diffusion-img2vid-xt
  pipeline_tag: depth-estimation
- tags:
- - depth-estimation
- - colonoscopy
- - medical-imaging
- - video
- - lora
- - diffusion
- ---
  ---
  license: apache-2.0
  language:
+ - en
+ tags:
+ - depth-estimation
+ - colonoscopy
+ - medical-imaging
+ - video
+ - lora
+ - diffusion
+ library_name: transformers
  base_model:
+ - tencent/DepthCrafter
+ - stabilityai/stable-video-diffusion-img2vid-xt
  pipeline_tag: depth-estimation
+ ---
+
+ # ColonCrafter - Depth Estimation for Colonoscopy Videos
+
+ ColonCrafter is a LoRA-adapted video depth estimation model specifically fine-tuned for colonoscopy imagery. It builds upon [DepthCrafter](https://huggingface.co/tencent/DepthCrafter) and [Stable Video Diffusion](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt) to provide temporally consistent depth predictions for endoscopic video sequences.
+
+ ## Model Details
+
+ - **Model Type:** Video Depth Estimation (Diffusion-based)
+ - **Base Architecture:** DepthCrafter UNet with LoRA adaptation
+ - **LoRA Configuration:**
+   - Rank: 16
+   - Target modules: `to_q`, `to_k`, `to_v`, `to_out.0`
+   - Dropout: 0.1
+ - **Precision:** FP16
+
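To make the LoRA configuration above concrete, here is a minimal NumPy sketch of how a rank-16 adapter modifies one attention projection such as `to_q`. The layer width (`d_model = 320`) and the scaling factor (`alpha = 16`) are assumptions chosen for illustration; the model card does not state them. Dropout (0.1) only affects the adapter branch during training, so it is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model = 320   # hypothetical projection width, not taken from the model card
rank = 16       # LoRA rank listed in the model details
alpha = 16.0    # assumed scaling factor; the card does not state it

# Frozen base weight of one attention projection (e.g. `to_q`).
W = rng.standard_normal((d_model, d_model)).astype(np.float32)

# Trainable low-rank factors: A starts small and random, B starts at zero,
# so the adapted layer initially matches the base layer exactly.
A = (rng.standard_normal((rank, d_model)) * 0.01).astype(np.float32)
B = np.zeros((d_model, rank), dtype=np.float32)


def lora_forward(x: np.ndarray) -> np.ndarray:
    """Apply the base projection plus the scaled low-rank update."""
    return x @ W.T + (alpha / rank) * (x @ A.T) @ B.T


x = rng.standard_normal((4, d_model)).astype(np.float32)
base_out = x @ W.T
adapted_out = lora_forward(x)

# Number of trainable parameters added per adapted projection.
lora_params = A.size + B.size
```

Under these assumed dimensions, each adapted projection adds `rank * (d_in + d_out)` trainable parameters while the base weight stays frozen.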
+ ## Installation
+
+ ```bash
+ pip install torch peft diffusers transformers
+ ```
+
+ Clone the repository:
+ ```bash
+ git clone https://github.com/YOUR_USERNAME/coloncrafter.git
+ cd coloncrafter
+ pip install -e .
+ ```
+
+ ## Usage
+
+ ```python
+ import torch
+ from src.depth.models.model import ColonCrafterInference
+
+ # Load the model
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+ model = ColonCrafterInference.from_pretrained(
+     "YOUR_USERNAME/coloncrafter",
+     device=device
+ )
+
+ # Prepare video tensor: (N, C, H, W) in [0, 1] range
+ # video = ...
+
+ # Run inference
+ pred_depth, pred_disparity = model.predict_depth(
+     video,
+     num_inference_steps=25,  # More steps = higher quality
+     window_size=16,
+     overlap=8,
+     guidance_scale=1.0,
+     seed=42
+ )
+ ```
+
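The usage snippet leaves `video` undefined. One possible way to build it, assuming frames have already been decoded as `uint8` RGB arrays of shape `(N, H, W, 3)`, is sketched below. The helper name `frames_to_video_array` and the synthetic frames are hypothetical and not part of the repository.

```python
import numpy as np


def frames_to_video_array(frames: np.ndarray) -> np.ndarray:
    """Convert uint8 RGB frames (N, H, W, 3) to float32 (N, 3, H, W) in [0, 1]."""
    if frames.ndim != 4 or frames.shape[-1] != 3:
        raise ValueError(f"expected (N, H, W, 3) frames, got {frames.shape}")
    video = frames.astype(np.float32) / 255.0
    # Move the channel axis from the last to the second position.
    return np.ascontiguousarray(video.transpose(0, 3, 1, 2))


# Synthetic stand-in for decoded colonoscopy frames.
frames = np.random.default_rng(0).integers(0, 256, size=(32, 64, 64, 3), dtype=np.uint8)
video = frames_to_video_array(frames)
```

If the model expects a PyTorch tensor, `torch.from_numpy(video)` would convert the array without copying; whether `predict_depth` accepts arrays, tensors, or both is not stated in the README.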
+ ### Inference Parameters
+
+ | Parameter | Default | Description |
+ |-----------|---------|-------------|
+ | `num_inference_steps` | 25 | Number of denoising steps (1 for fast, 25+ for quality) |
+ | `window_size` | 16 | Sliding window size for temporal processing |
+ | `overlap` | 8 | Overlap between consecutive windows |
+ | `guidance_scale` | 1.0 | Classifier-free guidance scale |
+ | `seed` | 42 | Random seed for reproducibility |
+
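To illustrate how `window_size` and `overlap` interact, the sketch below enumerates the frame ranges a sliding-window scheme could process. This is an assumed layout (stride of `window_size - overlap`, with the final window shifted back to end on the last frame); the actual windowing and blending inside `predict_depth` may differ. The function name `window_ranges` is hypothetical.

```python
def window_ranges(num_frames: int, window_size: int = 16, overlap: int = 8) -> list[tuple[int, int]]:
    """Return (start, end) frame ranges covering a clip with overlapping windows."""
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must satisfy 0 <= overlap < window_size")
    if num_frames <= window_size:
        # Short clips fit in a single window.
        return [(0, num_frames)]
    stride = window_size - overlap
    ranges = []
    start = 0
    while start + window_size < num_frames:
        ranges.append((start, start + window_size))
        start += stride
    # The final window is shifted back so it ends exactly at the last frame.
    ranges.append((num_frames - window_size, num_frames))
    return ranges


ranges_40 = window_ranges(40)
```

With the defaults, a 40-frame clip would be covered by four 16-frame windows, each sharing 8 frames with its neighbor.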
+ ## Input/Output
+
+ - **Input:** Video tensor of shape `(N, C, H, W)` with values in `[0, 1]` range
+   - `N`: Number of frames
+   - `C`: 3 (RGB channels)
+   - `H, W`: Height and width (recommended: 512×512)
+
+ - **Output:** Tuple of `(depth, disparity)` arrays of shape `(N, H, W)`
+   - `disparity`: Direct model output (inverse depth)
+   - `depth`: Computed as `1.0 / disparity`
+
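The relation `depth = 1.0 / disparity` can be sketched as follows. The clamp to a small `eps` is an assumption added here to avoid division by zero; the README does not say whether the model applies one.

```python
import numpy as np


def disparity_to_depth(disparity: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Invert disparity to relative depth, clamping to avoid division by zero."""
    return 1.0 / np.clip(disparity, eps, None)


disparity = np.array([[0.0, 0.5], [1.0, 2.0]], dtype=np.float32)
depth = disparity_to_depth(disparity)
```

Larger disparity values correspond to closer surfaces, so the resulting depth map is inversely ordered relative to the disparity map.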
+ ## Training Data
+
+ The model was fine-tuned on colonoscopy video data to adapt DepthCrafter's general video depth estimation capabilities to the specific challenges of endoscopic imagery, including:
+ - Specular highlights
+ - Non-Lambertian surfaces
+ - Limited field of view
+ - Tissue deformation
+
+ ## Intended Use
+
+ This model is intended for research purposes in:
+ - Colonoscopy depth estimation
+ - 3D reconstruction of colon anatomy
+ - Navigation assistance research
+ - Surgical planning research
+
+ ## Limitations
+
+ - Optimized for colonoscopy/endoscopy imagery; may not generalize to other domains
+ - Requires GPU with sufficient VRAM for video processing
+ - Depth predictions are relative (up to scale), not metric
+ - Performance may degrade on heavily occluded or motion-blurred frames
+
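Because predictions are relative (up to scale), comparing them with metric ground truth requires an alignment step. A common choice in monocular depth evaluation is median scaling, sketched below. This is a generic technique, not something the README says the authors use, and `median_scale_align` is a hypothetical helper.

```python
import numpy as np


def median_scale_align(pred_depth: np.ndarray, ref_depth: np.ndarray, valid=None):
    """Rescale relative depth so its median matches a metric reference."""
    if valid is None:
        # Ignore pixels without a usable reference measurement.
        valid = np.isfinite(ref_depth) & (ref_depth > 0)
    scale = np.median(ref_depth[valid]) / np.median(pred_depth[valid])
    return pred_depth * scale, float(scale)


ref = np.array([[10.0, 20.0], [30.0, 40.0]])
pred = ref * 0.25  # a prediction that is correct up to an unknown scale
aligned, scale = median_scale_align(pred, ref)
```

Median scaling recovers a single global factor per frame or clip; it does not correct a shift or any spatially varying error.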
+ ## Citation
+
+ If you use this model in your research, please cite:
+
+ ```bibtex
+ @misc{coloncrafter2024,
+   title={ColonCrafter: Depth Estimation for Colonoscopy Videos},
+   author={Your Name},
+   year={2024},
+   url={https://huggingface.co/YOUR_USERNAME/coloncrafter}
+ }
+ ```
+
+ ## Acknowledgments
+
+ This model builds upon:
+ - [DepthCrafter](https://github.com/Tencent/DepthCrafter) by Tencent
+ - [Stable Video Diffusion](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt) by Stability AI