metadata
license: mit
pipeline_tag: image-text-to-text

COLIPRI: Comprehensive Language-Image Pre-training for 3D Medical Image Understanding

COLIPRI (Comprehensive Language-Image Pre-training) is a family of vision-language encoders designed for 3D medical image understanding.

The models were introduced in the paper "Comprehensive language-image pre-training for 3D medical image understanding" (arXiv:2510.15042).

Description

COLIPRI aligns 3D medical images with paired text using a multi-objective pre-training strategy. It combines vision-language alignment with a report generation objective and vision-only pre-training, allowing the model to benefit from both image-only and paired image-text 3D datasets.
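As a rough illustration of what a vision-language alignment objective can look like, here is a minimal sketch of a generic CLIP-style symmetric contrastive loss, plus a weighted sum that combines it with other objectives. This is not the COLIPRI training code. The function names, the temperature of 0.07, and the weighting scheme are assumptions made for illustration, and the paper's exact formulation may differ.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def contrastive_alignment_loss(image_embs, text_embs, temperature=0.07):
    """Generic symmetric contrastive loss (assumed form, not the paper's).

    image_embs[k] and text_embs[k] are taken to be a matched
    scan/report pair; every other pairing in the batch is a negative.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # Pairwise cosine similarities, scaled by the temperature.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]
    # Image-to-text direction: each row should peak on the diagonal.
    img_to_txt = -sum(log_softmax(logits[k])[k] for k in range(n)) / n
    # Text-to-image direction: same computation on the columns.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    txt_to_img = -sum(log_softmax(columns[k])[k] for k in range(n)) / n
    return 0.5 * (img_to_txt + txt_to_img)


def combine_objectives(losses, weights):
    """Weighted sum of per-objective losses (hypothetical weighting)."""
    return sum(w * l for w, l in zip(weights, losses))
```

In a multi-objective setup of this kind, the alignment term would be combined with a report generation loss and a vision-only loss, for example `combine_objectives([align, report, vision_only], [1.0, 1.0, 1.0])`. Batches drawn from image-only datasets would contribute only the vision-only term.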

The COLIPRI encoders achieve state-of-the-art performance across various tasks:

  • Radiological Report Generation: Generating text descriptions from 3D scans.
  • Semantic Segmentation: Downstream adaptation for anatomical and pathological segmentation.
  • Zero-shot Classification: Predicting labels without task-specific fine-tuning.
  • Classification Probing: Providing strong feature representations for medical diagnostics.
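Zero-shot classification with a vision-language encoder is commonly done by comparing an image embedding against text embeddings of candidate label prompts. The sketch below shows that comparison step only. It assumes the embeddings have already been produced by the image and text encoders; the helper names, prompts, and toy vectors are hypothetical and are not part of any COLIPRI API.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_emb, prompt_embs):
    """Pick the label whose prompt embedding is closest to the image.

    prompt_embs maps a label string to the text embedding of a prompt
    describing that label (hypothetical input format).
    """
    scores = {
        label: cosine_similarity(image_emb, emb)
        for label, emb in prompt_embs.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Toy 2-D vectors standing in for real encoder outputs.
image_embedding = [0.9, 0.1]
prompt_embeddings = {
    "nodule present": [1.0, 0.0],
    "no nodule": [0.0, 1.0],
}
label, scores = zero_shot_classify(image_embedding, prompt_embeddings)
```

No task-specific fine-tuning is involved: changing the set of candidate labels only requires embedding new prompts.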

Citation

@article{wald2025comprehensive,
  title={Comprehensive language-image pre-training for 3D medical image understanding},
  author={Wald, Tassilo and Hamamci, Ibrahim Ethem and Gao, Yuan and Bond-Taylor, Sam and Sharma, Harshita and Ilse, Maximilian and Lo, Cynthia and Melnichenko, Olesya and Codella, Noel C. F. and Wetscherek, Maria Teodora and others},
  journal={arXiv preprint arXiv:2510.15042},
  year={2025}
}