Upload README.md with huggingface_hub
README.md
# VoxTell: Free-Text Promptable Universal 3D Medical Image Segmentation

<div align="center">

[arXiv](https://arxiv.org/abs/2511.11450) · [GitHub](https://github.com/MIC-DKFZ/VoxTell) · [Hugging Face](https://huggingface.co/MIC-DKFZ/VoxTell) · [napari plugin](https://github.com/MIC-DKFZ/napari-voxtell)

</div>
```python
from huggingface_hub import snapshot_download

DOWNLOAD_DIR = "/home/user/temp"  # Optionally specify the download directory

download_path = snapshot_download(
    repo_id="mrokuss/VoxTell",
    allow_patterns=[f"{MODEL_NAME}/*", "*.json"],
    local_dir=DOWNLOAD_DIR
)
```
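Assuming the snapshot keeps the layout implied by `allow_patterns` (a folder named after the model inside the download directory; verify against your local files), the checkpoint directory can then be assembled like this. The values below are placeholders:

```python
import os

# Placeholder values standing in for the variables used above
MODEL_NAME = "VoxTell"             # hypothetical model folder name
download_path = "/home/user/temp"  # return value of snapshot_download

# Directory to hand to downstream inference code
model_dir = os.path.join(download_path, MODEL_NAME)
print(model_dir)  # /home/user/temp/VoxTell
```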

## 🛠 Installation

### 1. Create a Virtual Environment

VoxTell supports Python 3.10+ and works with Conda, pip, or any other virtual environment manager. Here's an example using Conda:

```bash
conda create -n voxtell python=3.12
conda activate voxtell
```

### 2. Install PyTorch

> [!WARNING]
> **Temporary Compatibility Warning**
> There is a known issue with **PyTorch 2.9.0** causing **OOM errors during inference** in `VoxTell` (related to 3D convolutions — see the PyTorch issue [here](https://github.com/pytorch/pytorch/issues/166122)).
> **Until this is resolved, please use PyTorch 2.8.0 or earlier.**
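If your scripts should fail fast on an affected build, a simple guard could look like the sketch below. The `torch_version_ok` helper is hypothetical, the 2.9.0 cutoff comes from the warning above, and real code would pass in `torch.__version__`:

```python
def torch_version_ok(version: str) -> bool:
    """Return True for PyTorch versions strictly below 2.9.0.

    Ignores local build tags ("+cu118") and release candidates ("rc1").
    """
    numeric = version.split("+")[0].split("rc")[0]
    parts = tuple(int(p) for p in numeric.split(".")[:3])
    return parts < (2, 9, 0)

print(torch_version_ok("2.8.0"))  # True
print(torch_version_ok("2.9.0"))  # False
```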

Install PyTorch compatible with your CUDA version. For example, on Ubuntu with a modern NVIDIA GPU:

```bash
pip install torch==2.8.0 torchvision==0.23.0 --index-url https://download.pytorch.org/whl/cu126
```

*For other configurations (macOS, CPU, different CUDA versions), please refer to the [PyTorch Get Started](https://pytorch.org/get-started/previous-versions/) page.*

Install VoxTell via pip (you can also use [uv](https://docs.astral.sh/uv/)):

```bash
pip install voxtell
```

or install directly from the GitHub repository:

```bash
git clone https://github.com/MIC-DKFZ/VoxTell
cd VoxTell
pip install -e .
```

### 3. Python API

For more control or integration into Python workflows, use the Python API:

```python
import torch
from voxtell.inference.predictor import VoxTellPredictor
from nnunetv2.imageio.nibabel_reader_writer import NibabelIOWithReorient

# Select device
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Load image
image_path = "/path/to/your/image.nii.gz"
img, _ = NibabelIOWithReorient().read_images([image_path])

# Define text prompts
text_prompts = ["liver", "right kidney", "left kidney", "spleen"]

# Initialize predictor
predictor = VoxTellPredictor(
    model_dir="/path/to/voxtell_model_directory",
    device=device,
)

# Run prediction
# Output shape: (num_prompts, x, y, z)
voxtell_seg = predictor.predict_single_image(img, text_prompts)
```
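Because the output stacks one mask per prompt, it can be handy to collapse the stack into a single label map, for example for export. Below is a minimal sketch on toy data, assuming binary masks (check this against VoxTell's actual output; `merge_masks` is a hypothetical helper, not part of the package):

```python
import numpy as np

def merge_masks(masks):
    """Collapse stacked binary masks (num_prompts, x, y, z) into one label map.

    Label i+1 marks voxels of prompt i; later prompts win where masks overlap.
    Background stays 0.
    """
    labels = np.zeros(masks.shape[1:], dtype=np.uint8)
    for i, mask in enumerate(masks):
        labels[mask > 0] = i + 1
    return labels

# Toy stand-in for voxtell_seg with two overlapping prompt masks
masks = np.zeros((2, 4, 4, 4), dtype=np.uint8)
masks[0, :2] = 1
masks[1, 1:3] = 1
merged = merge_masks(masks)
print(np.unique(merged))  # [0 1 2]
```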

### 4. Optional: Visualize Results

You can visualize the segmentation results using [napari](https://napari.org/):

```bash
pip install "napari[all]"
```

```python
import napari
import numpy as np

# Create a napari viewer and add the original image
viewer = napari.Viewer()
viewer.add_image(img, name='Image')

# Add segmentation results as label layers for each prompt
for i, prompt in enumerate(text_prompts):
    viewer.add_labels(voxtell_seg[i].astype(np.uint8), name=prompt)

# Run napari
napari.run()
```

## Important: Image Orientation and Spacing

- ⚠️ **Image Orientation (Critical)**: For correct anatomical localization (e.g., distinguishing left from right), images **must be in RAS orientation**. VoxTell was trained on data reoriented using [this specific reader](https://github.com/MIC-DKFZ/nnUNet/blob/86606c53ef9f556d6f024a304b52a48378453641/nnunetv2/imageio/nibabel_reader_writer.py#L101). Orientation mismatches are a common source of error: if a simple prompt like "liver" fails and instead segments parts of the spleen, check that your image metadata is correct.
- **Image Spacing**: For faster inference, the model does not resample images to a standardized spacing. Performance may therefore degrade on images with very uncommon voxel spacings (e.g., super-high-resolution brain MRI). In such cases, consider resampling the image to a more typical clinical spacing (e.g., 1.5×1.5×1.5 mm³) before segmentation.
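Both checks above can be sketched with plain NumPy and SciPy. Note that `axis_codes` and `resample_to_spacing` are hypothetical helpers written for this example, not part of VoxTell; in practice, nibabel's `aff2axcodes` and `as_closest_canonical` handle orientation robustly:

```python
import numpy as np
from scipy.ndimage import zoom

def axis_codes(affine):
    """Infer anatomical axis codes, e.g. ('R', 'A', 'S'), from a NIfTI affine.

    Minimal sketch: each voxel axis is labeled by the world axis it mostly
    points along and the sign of that component.
    """
    labels = (("L", "R"), ("P", "A"), ("I", "S"))
    codes = []
    for col in affine[:3, :3].T:  # one column per voxel axis
        k = int(np.argmax(np.abs(col)))
        codes.append(labels[k][1] if col[k] > 0 else labels[k][0])
    return tuple(codes)

def resample_to_spacing(volume, spacing, new_spacing=(1.5, 1.5, 1.5), order=1):
    """Resample a 3D volume from `spacing` to `new_spacing` (both in mm)."""
    factors = np.asarray(spacing, dtype=float) / np.asarray(new_spacing, dtype=float)
    return zoom(volume, factors, order=order)

# An identity affine is already RAS-oriented
print(axis_codes(np.eye(4)))  # ('R', 'A', 'S')

# A 3 mm isotropic volume resampled to 1.5 mm doubles along each axis
vol = np.zeros((10, 10, 10))
print(resample_to_spacing(vol, spacing=(3.0, 3.0, 3.0)).shape)  # (20, 20, 20)
```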

---

## Architecture

VoxTell employs a multi-stage vision-language fusion approach:

VoxTell achieves state-of-the-art performance on anatomical and pathological segmentation tasks across multiple medical imaging benchmarks. Detailed performance metrics and comparisons are available in the [paper](https://arxiv.org/abs/2511.11450).

Tip: Experiment with prompts tailored to your use case. For example, the plural prompt `lesions` tends to over-segment compared to the singular `lesion`.
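One quick way to quantify the difference between prompt variants is to compare predicted volumes. A sketch on toy masks (the `mask_volume_ml` helper and the arrays here are illustrative, not VoxTell output):

```python
import numpy as np

def mask_volume_ml(mask, spacing_mm):
    """Volume of a binary mask in millilitres, given voxel spacing in mm."""
    return mask.sum() * float(np.prod(spacing_mm)) / 1000.0

# Toy stand-ins for segmentations from the prompts "lesion" vs "lesions"
seg_singular = np.zeros((50, 50, 50), dtype=bool)
seg_singular[10:20, 10:20, 10:20] = True  # 1000 voxels
seg_plural = np.zeros((50, 50, 50), dtype=bool)
seg_plural[10:25, 10:25, 10:25] = True    # 3375 voxels

spacing = (1.0, 1.0, 1.0)
print(mask_volume_ml(seg_singular, spacing))  # 1.0
print(mask_volume_ml(seg_plural, spacing))    # 3.375
```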

## Limitations / Known Issues