Upload README.md with huggingface_hub
README.md
# VoxTell: Free-Text Promptable Universal 3D Medical Image Segmentation

<div align="center">

[arXiv](https://arxiv.org/abs/2511.11450) · [GitHub](https://github.com/MIC-DKFZ/VoxTell) · [Hugging Face](https://huggingface.co/MIC-DKFZ/VoxTell) · [napari plugin](https://github.com/MIC-DKFZ/napari-voxtell)

</div>
```python
from huggingface_hub import snapshot_download

DOWNLOAD_DIR = "/home/user/temp"  # Optionally specify the download directory

download_path = snapshot_download(
    repo_id="mrokuss/VoxTell",
    allow_patterns=[f"{MODEL_NAME}/*", "*.json"],
    local_dir=DOWNLOAD_DIR
)
```
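Assuming the snapshot keeps the layout implied by `allow_patterns` (a folder named after the model inside the download directory; verify against your local files), the checkpoint directory can then be assembled like this. The values below are placeholders:

```python
import os

# Placeholder values standing in for the variables used above
MODEL_NAME = "VoxTell"             # hypothetical model folder name
download_path = "/home/user/temp"  # return value of snapshot_download

# Directory to hand to downstream inference code
model_dir = os.path.join(download_path, MODEL_NAME)
print(model_dir)  # /home/user/temp/VoxTell
```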

## 🛠 Installation

### 1. Create a Virtual Environment

VoxTell supports Python 3.10+ and works with Conda, pip, or any other virtual environment manager. Here's an example using Conda:

```bash
conda create -n voxtell python=3.12
conda activate voxtell
```

### 2. Install PyTorch

> [!WARNING]
> **Temporary Compatibility Warning**
> There is a known issue with **PyTorch 2.9.0** causing **OOM errors during inference** in `VoxTell` (related to 3D convolutions — see the PyTorch issue [here](https://github.com/pytorch/pytorch/issues/166122)).
> **Until this is resolved, please use PyTorch 2.8.0 or earlier.**
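If your scripts should fail fast on an affected build, a simple guard could look like the sketch below. The `torch_version_ok` helper is hypothetical, the 2.9.0 cutoff comes from the warning above, and real code would pass in `torch.__version__`:

```python
def torch_version_ok(version: str) -> bool:
    """Return True for PyTorch versions strictly below 2.9.0.

    Ignores local build tags ("+cu118") and release candidates ("rc1").
    """
    numeric = version.split("+")[0].split("rc")[0]
    parts = tuple(int(p) for p in numeric.split(".")[:3])
    return parts < (2, 9, 0)

print(torch_version_ok("2.8.0"))  # True
print(torch_version_ok("2.9.0"))  # False
```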

Install PyTorch compatible with your CUDA version. For example, on Ubuntu with a modern NVIDIA GPU:

```bash
pip install torch==2.8.0 torchvision==0.23.0 --index-url https://download.pytorch.org/whl/cu126
```

*For other configurations (macOS, CPU, different CUDA versions), please refer to the [PyTorch Get Started](https://pytorch.org/get-started/previous-versions/) page.*

Install VoxTell via pip (you can also use [uv](https://docs.astral.sh/uv/)):

```bash
pip install voxtell
```

or install directly from the GitHub repository:

```bash
git clone https://github.com/MIC-DKFZ/VoxTell
cd VoxTell
pip install -e .
```

### 3. Python API

For more control or integration into Python workflows, use the Python API:

```python
import torch
from voxtell.inference.predictor import VoxTellPredictor
from nnunetv2.imageio.nibabel_reader_writer import NibabelIOWithReorient

# Select device
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Load image
image_path = "/path/to/your/image.nii.gz"
img, _ = NibabelIOWithReorient().read_images([image_path])

# Define text prompts
text_prompts = ["liver", "right kidney", "left kidney", "spleen"]

# Initialize predictor
predictor = VoxTellPredictor(
    model_dir="/path/to/voxtell_model_directory",
    device=device,
)

# Run prediction
# Output shape: (num_prompts, x, y, z)
voxtell_seg = predictor.predict_single_image(img, text_prompts)
```
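Because the output stacks one mask per prompt, it can be handy to collapse the stack into a single label map, for example for export. Below is a minimal sketch on toy data, assuming binary masks (check this against VoxTell's actual output; `merge_masks` is a hypothetical helper, not part of the package):

```python
import numpy as np

def merge_masks(masks):
    """Collapse stacked binary masks (num_prompts, x, y, z) into one label map.

    Label i+1 marks voxels of prompt i; later prompts win where masks overlap.
    Background stays 0.
    """
    labels = np.zeros(masks.shape[1:], dtype=np.uint8)
    for i, mask in enumerate(masks):
        labels[mask > 0] = i + 1
    return labels

# Toy stand-in for voxtell_seg with two overlapping prompt masks
masks = np.zeros((2, 4, 4, 4), dtype=np.uint8)
masks[0, :2] = 1
masks[1, 1:3] = 1
merged = merge_masks(masks)
print(np.unique(merged))  # [0 1 2]
```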

### 4. Optional: Visualize Results

You can visualize the segmentation results using [napari](https://napari.org/):

```bash
pip install "napari[all]"
```

```python
import napari
import numpy as np

# Create a napari viewer and add the original image
viewer = napari.Viewer()
viewer.add_image(img, name='Image')

# Add segmentation results as label layers for each prompt
for i, prompt in enumerate(text_prompts):
    viewer.add_labels(voxtell_seg[i].astype(np.uint8), name=prompt)

# Run napari
napari.run()
```

## Important: Image Orientation and Spacing

- ⚠️ **Image Orientation (Critical)**: For correct anatomical localization (e.g., distinguishing left from right), images **must be in RAS orientation**. VoxTell was trained on data reoriented using [this specific reader](https://github.com/MIC-DKFZ/nnUNet/blob/86606c53ef9f556d6f024a304b52a48378453641/nnunetv2/imageio/nibabel_reader_writer.py#L101). Orientation mismatches are a common source of error: if a simple prompt like "liver" fails and instead segments parts of the spleen, check that your image metadata is correct.
- **Image Spacing**: For faster inference, the model does not resample images to a standardized spacing. Performance may therefore degrade on images with very uncommon voxel spacings (e.g., super-high-resolution brain MRI). In such cases, consider resampling the image to a more typical clinical spacing (e.g., 1.5×1.5×1.5 mm³) before segmentation.
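Both checks above can be sketched with plain NumPy and SciPy. Note that `axis_codes` and `resample_to_spacing` are hypothetical helpers written for this example, not part of VoxTell; in practice, nibabel's `aff2axcodes` and `as_closest_canonical` handle orientation robustly:

```python
import numpy as np
from scipy.ndimage import zoom

def axis_codes(affine):
    """Infer anatomical axis codes, e.g. ('R', 'A', 'S'), from a NIfTI affine.

    Minimal sketch: each voxel axis is labeled by the world axis it mostly
    points along and the sign of that component.
    """
    labels = (("L", "R"), ("P", "A"), ("I", "S"))
    codes = []
    for col in affine[:3, :3].T:  # one column per voxel axis
        k = int(np.argmax(np.abs(col)))
        codes.append(labels[k][1] if col[k] > 0 else labels[k][0])
    return tuple(codes)

def resample_to_spacing(volume, spacing, new_spacing=(1.5, 1.5, 1.5), order=1):
    """Resample a 3D volume from `spacing` to `new_spacing` (both in mm)."""
    factors = np.asarray(spacing, dtype=float) / np.asarray(new_spacing, dtype=float)
    return zoom(volume, factors, order=order)

# An identity affine is already RAS-oriented
print(axis_codes(np.eye(4)))  # ('R', 'A', 'S')

# A 3 mm isotropic volume resampled to 1.5 mm doubles along each axis
vol = np.zeros((10, 10, 10))
print(resample_to_spacing(vol, spacing=(3.0, 3.0, 3.0)).shape)  # (20, 20, 20)
```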

---

## Architecture

VoxTell employs a multi-stage vision-language fusion approach:

VoxTell achieves state-of-the-art performance on anatomical and pathological segmentation tasks across multiple medical imaging benchmarks. Detailed performance metrics and comparisons are available in the [paper](https://arxiv.org/abs/2511.11450).

Tip: Experiment with prompts tailored to your use case. For example, the plural prompt `lesions` tends to over-segment compared to the singular `lesion`.
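One quick way to quantify the difference between prompt variants is to compare predicted volumes. A sketch on toy masks (the `mask_volume_ml` helper and the arrays here are illustrative, not VoxTell output):

```python
import numpy as np

def mask_volume_ml(mask, spacing_mm):
    """Volume of a binary mask in millilitres, given voxel spacing in mm."""
    return mask.sum() * float(np.prod(spacing_mm)) / 1000.0

# Toy stand-ins for segmentations from the prompts "lesion" vs "lesions"
seg_singular = np.zeros((50, 50, 50), dtype=bool)
seg_singular[10:20, 10:20, 10:20] = True  # 1000 voxels
seg_plural = np.zeros((50, 50, 50), dtype=bool)
seg_plural[10:25, 10:25, 10:25] = True    # 3375 voxels

spacing = (1.0, 1.0, 1.0)
print(mask_volume_ml(seg_singular, spacing))  # 1.0
print(mask_volume_ml(seg_plural, spacing))    # 3.375
```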

## Limitations / Known Issues