mrokuss committed · verified
Commit 5a91fce · Parent(s): 524f634

Upload README.md with huggingface_hub

Files changed (1): README.md (+107 −3)
README.md CHANGED
@@ -18,10 +18,11 @@ tags:
  # VoxTell: Free-Text Promptable Universal 3D Medical Image Segmentation
 
  <div align="center">
-
  [![GitHub](https://img.shields.io/badge/GitHub-VoxTell-181717?logo=github&logoColor=white)](https://github.com/MIC-DKFZ/VoxTell)&#160;
  [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Model-VoxTell-yellow)](https://huggingface.co/MIC-DKFZ/VoxTell)&#160;
- [![arXiv](https://img.shields.io/badge/arXiv-2511.11450-B31B1B.svg)](https://arxiv.org/abs/2511.11450)
 
  </div>
 
@@ -75,11 +76,113 @@ DOWNLOAD_DIR = "/home/user/temp" # Optionally specify the download directory
 
  download_path = snapshot_download(
      repo_id="mrokuss/VoxTell",
-     allow_patterns=[f"{MODEL_NAME}/*"],
      local_dir=DOWNLOAD_DIR
  )
  ```
 
  ## Architecture
 
  VoxTell employs a multi-stage vision-language fusion approach:
@@ -109,6 +212,7 @@ VoxTell employs a multi-stage vision-language fusion approach:
 
  VoxTell achieves state-of-the-art performance on anatomical and pathological segmentation tasks across multiple medical imaging benchmarks. Detailed performance metrics and comparisons are available in the [paper](https://arxiv.org/abs/2511.11450).
 
 
  ## Limitations / Known Issues
 
  # VoxTell: Free-Text Promptable Universal 3D Medical Image Segmentation
 
  <div align="center">
+
+ [![arXiv](https://img.shields.io/badge/arXiv-2511.11450-B31B1B.svg)](https://arxiv.org/abs/2511.11450)&#160;
  [![GitHub](https://img.shields.io/badge/GitHub-VoxTell-181717?logo=github&logoColor=white)](https://github.com/MIC-DKFZ/VoxTell)&#160;
  [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Model-VoxTell-yellow)](https://huggingface.co/MIC-DKFZ/VoxTell)&#160;
+ [![napari](https://img.shields.io/badge/napari-plugin-80d1ff)](https://github.com/MIC-DKFZ/napari-voxtell)
 
  </div>
 
 
  download_path = snapshot_download(
      repo_id="mrokuss/VoxTell",
+     allow_patterns=[f"{MODEL_NAME}/*", "*.json"],
      local_dir=DOWNLOAD_DIR
  )
  ```
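Once the snapshot is downloaded, the directory you pass to the predictor is simply the model folder inside the download directory. A minimal sketch of wiring the two together (the `MODEL_NAME` value here is a placeholder, not the real folder name; substitute whatever you set in the snippet above):

```python
import os

# Placeholder values standing in for the variables defined in the download snippet.
DOWNLOAD_DIR = "/home/user/temp"
MODEL_NAME = "VoxTell"  # hypothetical; use the model folder you actually downloaded

# This is the directory you would pass as `model_dir=` to VoxTellPredictor.
model_dir = os.path.join(DOWNLOAD_DIR, MODEL_NAME)
print(model_dir)
```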
 
+ ## 🛠 Installation
+
+ ### 1. Create a Virtual Environment
+
+ VoxTell supports Python 3.10+ and works with Conda, pip, or any other virtual environment manager. Here's an example using Conda:
+
+ ```bash
+ conda create -n voxtell python=3.12
+ conda activate voxtell
+ ```
+
+ ### 2. Install PyTorch
+
+ > [!WARNING]
+ > **Temporary Compatibility Warning**
+ > There is a known issue with **PyTorch 2.9.0** causing **OOM errors during inference** in `VoxTell` (related to 3D convolutions; see the PyTorch issue [here](https://github.com/pytorch/pytorch/issues/166122)).
+ > **Until this is resolved, please use PyTorch 2.8.0 or earlier.**
+
+ Install a PyTorch build compatible with your CUDA version. For example, on Ubuntu with a modern NVIDIA GPU:
+
+ ```bash
+ pip install torch==2.8.0 torchvision==0.23.0 --index-url https://download.pytorch.org/whl/cu126
+ ```
+
+ *For other configurations (macOS, CPU, different CUDA versions), please refer to the [PyTorch Get Started](https://pytorch.org/get-started/previous-versions/) page.*
+
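Since the warning above pins a specific version range, you may want to fail fast at startup rather than hit an OOM mid-inference. A minimal sketch; the `is_affected` helper is hypothetical, not part of VoxTell or PyTorch:

```python
# Guard against the PyTorch 2.9.0 OOM issue mentioned in the warning above.
def is_affected(version: str) -> bool:
    """Return True for PyTorch releases >= 2.9, which hit the 3D-conv OOM bug."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (2, 9)

# After installation you could run, e.g.:
#   import torch; assert not is_affected(torch.__version__)
print(is_affected("2.8.0"))  # False: safe to use
print(is_affected("2.9.0"))  # True: downgrade recommended
```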
+ Next, install VoxTell itself via pip (you can also use [uv](https://docs.astral.sh/uv/)):
+
+ ```bash
+ pip install voxtell
+ ```
+
+ or install directly from the GitHub repository:
+
+ ```bash
+ git clone https://github.com/MIC-DKFZ/VoxTell
+ cd VoxTell
+ pip install -e .
+ ```
+
+ ### 3. Python API
+
+ For more control or integration into Python workflows, use the Python API:
+
+ ```python
+ import torch
+ from voxtell.inference.predictor import VoxTellPredictor
+ from nnunetv2.imageio.nibabel_reader_writer import NibabelIOWithReorient
+
+ # Select device
+ device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
+
+ # Load image
+ image_path = "/path/to/your/image.nii.gz"
+ img, _ = NibabelIOWithReorient().read_images([image_path])
+
+ # Define text prompts
+ text_prompts = ["liver", "right kidney", "left kidney", "spleen"]
+
+ # Initialize predictor
+ predictor = VoxTellPredictor(
+     model_dir="/path/to/voxtell_model_directory",
+     device=device,
+ )
+
+ # Run prediction
+ # Output shape: (num_prompts, x, y, z)
+ voxtell_seg = predictor.predict_single_image(img, text_prompts)
+ ```
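The predictor returns one binary mask per prompt. If you need a single multi-class label map instead (e.g., for export to a labeling tool), the per-prompt masks can be merged. A minimal NumPy sketch; `masks_to_labelmap` is a hypothetical helper, and where prompts overlap the later prompt wins:

```python
import numpy as np

def masks_to_labelmap(masks: np.ndarray) -> np.ndarray:
    # masks: (num_prompts, x, y, z) array of binary masks, one per prompt.
    # Returns a label map where 0 = background and prompt i gets label i + 1.
    # Voxels claimed by several prompts keep the label of the last one.
    labelmap = np.zeros(masks.shape[1:], dtype=np.uint8)
    for label, mask in enumerate(masks, start=1):
        labelmap[mask > 0] = label
    return labelmap

# Tiny synthetic example: two overlapping masks on a 2x2x2 grid
demo = np.zeros((2, 2, 2, 2), dtype=np.uint8)
demo[0, 0] = 1       # prompt 1 claims the first x-slab
demo[1, :, 0] = 1    # prompt 2 claims the first y-slab (overlaps prompt 1)
labels = masks_to_labelmap(demo)
print(labels[0, 0, 0], labels[0, 1, 0], labels[1, 1, 0])  # 2 1 0
```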
+
+ ### 4. Optional: Visualize Results
+
+ You can visualize the segmentation results using [napari](https://napari.org/):
+
+ ```bash
+ pip install "napari[all]"
+ ```
+
+ ```python
+ import napari
+ import numpy as np
+
+ # Create a napari viewer and add the original image
+ viewer = napari.Viewer()
+ viewer.add_image(img, name='Image')
+
+ # Add segmentation results as label layers, one per prompt
+ for i, prompt in enumerate(text_prompts):
+     viewer.add_labels(voxtell_seg[i].astype(np.uint8), name=prompt)
+
+ # Run napari
+ napari.run()
+ ```
+
+ ## Important: Image Orientation and Spacing
+
+ - ⚠️ **Image Orientation (Critical)**: For correct anatomical localization (e.g., distinguishing left from right), images **must be in RAS orientation**. VoxTell was trained on data reoriented using [this specific reader](https://github.com/MIC-DKFZ/nnUNet/blob/86606c53ef9f556d6f024a304b52a48378453641/nnunetv2/imageio/nibabel_reader_writer.py#L101). Orientation mismatches are a common source of error: if a simple prompt like "liver" fails and segments parts of the spleen instead, check the orientation first. Make sure your image metadata is correct.
+
+ - **Image Spacing**: For faster inference, the model does not resample images to a standardized spacing. Performance may therefore degrade on images with very uncommon voxel spacings (e.g., super-high-resolution brain MRI). In such cases, consider resampling the image to a more typical clinical spacing (e.g., 1.5×1.5×1.5 mm³) before segmentation.
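To check how a volume is oriented before prompting, nibabel's `aff2axcodes` reports the axis codes of an image's affine, where `('R', 'A', 'S')` means RAS. Below is a minimal self-contained sketch of the same idea; `affine_axcodes` is a simplified stand-in that assumes roughly axis-aligned affines (real affines can be oblique, so prefer nibabel in practice):

```python
import numpy as np

def affine_axcodes(affine: np.ndarray) -> tuple:
    # Simplified stand-in for nibabel.aff2axcodes: for each voxel axis,
    # report which world axis it mostly points along, and in which direction.
    labels = (("L", "R"), ("P", "A"), ("I", "S"))
    codes = []
    for col in affine[:3, :3].T:               # one column per voxel axis
        world = int(np.argmax(np.abs(col)))    # dominant world axis
        codes.append(labels[world][int(col[world] > 0)])
    return tuple(codes)

print(affine_axcodes(np.eye(4)))                        # ('R', 'A', 'S')
print(affine_axcodes(np.diag([-1.0, -1.0, 1.0, 1.0])))  # ('L', 'P', 'S')
```

With nibabel installed, `nibabel.as_closest_canonical(nibabel.load(path))` reorients a NIfTI image to RAS, which matches what the reader linked above does internally.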
+
+ ---
+
  ## Architecture
 
  VoxTell employs a multi-stage vision-language fusion approach:
 
  VoxTell achieves state-of-the-art performance on anatomical and pathological segmentation tasks across multiple medical imaging benchmarks. Detailed performance metrics and comparisons are available in the [paper](https://arxiv.org/abs/2511.11450).
 
+ Tip: Experiment with different prompts tailored to your use case. For example, the plural prompt `lesions` tends to over-segment compared to the singular `lesion`.
 
 
  ## Limitations / Known Issues