---
license: mit
pipeline_tag: image-text-to-text
---

# COLIPRI: Comprehensive Language-Image Pre-training for 3D Medical Image Understanding

COLIPRI (Comprehensive Language-Image Pre-training) is a family of vision-language encoders designed for 3D medical image understanding.

The model was introduced in the paper: [Comprehensive language-image pre-training for 3D medical image understanding](https://huggingface.co/papers/2510.15042).

## Description

COLIPRI aligns 3D medical images with paired text using a multi-objective pre-training strategy. It combines vision-language alignment with a report generation objective and vision-only pre-training, allowing the model to benefit from both image-only and paired image-text 3D datasets.
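
As a rough illustration of how these three objectives can be combined, here is a minimal PyTorch sketch. The tensor names, the choice of a masked-reconstruction term for the vision-only objective, and the loss weights are all assumptions made for this example, not the authors' exact formulation:

```python
import torch
import torch.nn.functional as F

def multi_objective_loss(image_emb, text_emb, report_logits, report_tokens,
                         recon_pred, recon_target, logit_scale,
                         w_align=1.0, w_report=1.0, w_vision=1.0):
    """Illustrative combination of the three pre-training objectives.

    All names and weights here are hypothetical stand-ins, not the
    released training code.
    """
    # 1) Vision-language alignment: symmetric InfoNCE over a batch of
    #    (3D volume, report) pairs, CLIP-style.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = logit_scale * image_emb @ text_emb.t()
    targets = torch.arange(logits.size(0), device=logits.device)
    align_loss = 0.5 * (F.cross_entropy(logits, targets)
                        + F.cross_entropy(logits.t(), targets))

    # 2) Report generation: next-token cross-entropy against the paired
    #    report, so the model learns to produce radiology text.
    report_loss = F.cross_entropy(report_logits.flatten(0, 1),
                                  report_tokens.flatten())

    # 3) Vision-only objective (here a masked-reconstruction MSE as a
    #    placeholder), usable on image-only 3D datasets without text.
    vision_loss = F.mse_loss(recon_pred, recon_target)

    return w_align * align_loss + w_report * report_loss + w_vision * vision_loss
```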

The COLIPRI encoders achieve state-of-the-art performance across various tasks:

- **Radiological Report Generation**: Generating text descriptions from 3D scans.
- **Semantic Segmentation**: Downstream adaptation for anatomical and pathological segmentation.
- **Zero-shot Classification**: Predicting labels without task-specific fine-tuning (a minimal sketch follows below).
- **Classification Probing**: Providing strong feature representations for medical diagnostics.
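
Zero-shot classification with a contrastively trained encoder typically reduces to comparing the volume embedding against embeddings of textual class prompts. The sketch below assumes hypothetical `image_encoder`, `text_encoder`, and `tokenizer` callables; the actual interface of the released checkpoints may differ:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def zero_shot_classify(volume, class_prompts, image_encoder, text_encoder, tokenizer):
    """CLIP-style zero-shot classification of a single 3D volume.

    `image_encoder`, `text_encoder`, and `tokenizer` are stand-ins for
    the released COLIPRI components, not their actual API.
    """
    # Embed the volume (e.g. a 1 x C x D x H x W tensor) and each prompt,
    # then L2-normalize so dot products become cosine similarities.
    img = F.normalize(image_encoder(volume), dim=-1)                   # (1, dim)
    txt = F.normalize(text_encoder(tokenizer(class_prompts)), dim=-1)  # (K, dim)

    # The most similar prompt wins; softmax yields pseudo-probabilities.
    probs = (img @ txt.t()).softmax(dim=-1)
    return class_prompts[probs.argmax().item()], probs

# Usage idea: prompts phrased as findings a report might contain, e.g.
# zero_shot_classify(ct_volume,
#                    ["a CT scan showing pneumonia",
#                     "a CT scan with no acute findings"],
#                    image_encoder, text_encoder, tokenizer)
```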

## Citation

```bibtex
@article{wald2025comprehensive,
  title={Comprehensive language-image pre-training for 3D medical image understanding},
  author={Wald, Tassilo and Hamamci, Ibrahim Ethem and Gao, Yuan and Bond-Taylor, Sam and Sharma, Harshita and Ilse, Maximilian and Lo, Cynthia and Melnichenko, Olesya and Codella, Noel C. F. and Wetscherek, Maria Teodora and others},
  journal={arXiv preprint arXiv:2510.15042},
  year={2025}
}
```