Model card
Date effective: September 2025
Owner: ORA, Internal Policy & Practice
Model summary
| Developer | Microsoft Research |
| Description | COLIPRI is composed of a vision encoder for 3D CT scans and a text encoder for clinical reports. The encoders may be leveraged for downstream tasks such as classification or segmentation. The models were trained using CT-RATE and NLST. COLIPRI is not a generative model. |
| Model architecture | The vision encoder is Primus-M, a 3D vision transformer. The text encoder architecture is BERT. |
| Parameters | 1-500M |
| Inputs | 3D chest CT scans and/or radiology reports |
| Context length | N/A (not an LLM) |
| Outputs | Image and/or text embeddings, not directly human-interpretable. |
| GPUs | 4 x A100 GPUs |
| Training time | 1.75 days |
| Public data summary (or summaries) | Training data — National Lung Screening Trial (NLST) dataset: chest CT images from approximately 26k patients. CT-RATE dataset: approximately 25.7k CT acquisitions with associated reports. |
| Training dates | Dates of training: August 2024. Intended model weight release date: Dec 5, 2025 |
| Status | Static |
| Release date | Intended release date: Dec 5, 2025 |
| Release date in the EU (if different) | Intended release date: Dec 5, 2025 |
| License | Free/open source |
| Model dependencies | COLIPRI’s text encoder is fine-tuned from microsoft/BiomedVLP-CXR-BERT-specialized (available on Hugging Face). |
| List and link to any additional related assets | N/A |
| Acceptable use policy | N/A |
Model overview
COLIPRI is an encoder designed for 3D CT scans. Its architecture combines a 3D vision encoder (the Primus-M transformer) with a biomedical text encoder (BiomedVLP-CXR-BERT), and it extends the CLIP training paradigm with additional objectives and optimizations that account for limited data availability. This multi-objective training optimizes both global semantic alignment and dense feature learning. After pre-training, the model is evaluated comprehensively on downstream medical imaging tasks such as classification, retrieval, and semantic segmentation. COLIPRI is not a generative model; it cannot generate any images or text.
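The card does not spell out COLIPRI's exact objectives, so as an illustration only, here is a minimal NumPy sketch of the standard symmetric contrastive (InfoNCE) loss that the CLIP paradigm refers to: matched image/report embeddings are pulled together while mismatched pairs in the batch are pushed apart. The function name and temperature value are assumptions for illustration, not COLIPRI's actual implementation.

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    img_emb, txt_emb: (N, D) arrays; row i of each is a matched scan/report pair.
    """
    # L2-normalize so dot products become cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)

    logits = img @ txt.T / temperature           # (N, N) similarity matrix
    labels = np.arange(len(logits))              # matched pairs on the diagonal

    def cross_entropy(lg, lb):
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(lb)), lb].mean()

    # Symmetric: image-to-text and text-to-image directions are averaged.
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))
```

When the two embedding batches are perfectly aligned (each image most similar to its own report), the loss approaches zero; mismatched batches yield a larger loss.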
Alignment approach
N/A
Usage
Primary use cases
COLIPRI is primarily intended for 3D medical imaging tasks such as segmentation and classification. It is intended for research only.
Out-of-scope use cases
The COLIPRI model family was trained on 3D CT volumes and is therefore best suited for inputs of this type. The models are for research use only; clinical or deployed use cases are out of scope.
Distribution channels
The model will be released with open access via Hugging Face.
Input formats
The vision encoder takes 3D chest CT scans, and the text encoder takes radiology reports.
Technical requirements and integration guidance
The model requires a Python-based environment with PyTorch. The only hardware requirement is a single GPU.
Responsible AI considerations
The model may underperform or produce unreliable outputs for non-English languages, out-of-distribution or adversarial inputs, or clinical scenarios not well represented in the training data. The model should be used in an AI-assisted research setup with human oversight, rather than as a sole diagnostic tool.
Data overview
Training, testing, and validation datasets
The COLIPRI model family was pre-trained on the CT-RATE dataset, consisting of approximately 25.7k CT acquisitions with associated reports, and the National Lung Screening Trial (NLST) dataset, consisting of chest CT images from approximately 26k patients. Classification performance was evaluated on a withheld test set of CT-RATE and the publicly available subset of RAD-ChestCT, which comprises 3.6k chest CT volumes with 16 multi-abnormality labels. Semantic segmentation was evaluated by training a five-fold cross validation on four datasets:
- LiTS: Task 3 of the Medical Segmentation Decathlon (MSD), containing segmentations for liver and liver tumors
- Lung: Task 6 of MSD, containing segmentations of primary lung cancers
- HVS: Task 8 of the MSD, containing segmentations of hepatic vessels and tumors next to such vessels
- KiTS23: containing segmentations of tumors, cysts, and the kidney
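The segmentation benchmarks above were evaluated with five-fold cross-validation. As a hypothetical sketch (not the authors' actual pipeline), fold generation can look like the following: shuffle the case indices once, split them into five near-equal folds, and hold each fold out in turn for validation.

```python
import numpy as np

def five_fold_splits(num_cases, seed=0):
    """Yield (train_idx, val_idx) index pairs for 5-fold cross-validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(num_cases)      # shuffle case indices once
    folds = np.array_split(order, 5)        # five near-equal folds
    for k in range(5):
        val_idx = folds[k]                  # fold k held out for validation
        train_idx = np.concatenate([folds[j] for j in range(5) if j != k])
        yield train_idx, val_idx
```

Every case appears in exactly one validation fold, so the five validation sets together cover the whole dataset with no train/validation overlap within a fold.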
Quality and performance evaluation
Results summary
Classification Probes
The COLIPRI family outperformed all third-party models on both the CT-RATE and RAD-ChestCT datasets across all metrics.
Zero-Shot Classification
COLIPRI encoders exceed the state of the art using short prompts on the CT-RATE and RAD-ChestCT datasets.
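Zero-shot classification with a dual encoder typically works by embedding short presence/absence prompts per finding and comparing them to the scan embedding by cosine similarity. The sketch below illustrates that general mechanism; the prompt wording, function name, and two-prompt softmax are assumptions for illustration, not COLIPRI's documented procedure.

```python
import numpy as np

def zero_shot_classify(image_emb, prompt_embs):
    """Score each finding via cosine similarity to short text prompts.

    image_emb: (D,) embedding of one CT volume.
    prompt_embs: dict mapping label -> (pos_emb, neg_emb), each (D,),
                 e.g. embeddings of "Lung nodule." / "No lung nodule."
    Returns a dict mapping label -> probability the finding is present.
    """
    img = image_emb / np.linalg.norm(image_emb)
    scores = {}
    for label, (pos, neg) in prompt_embs.items():
        pos = pos / np.linalg.norm(pos)
        neg = neg / np.linalg.norm(neg)
        # Softmax over the two prompt similarities -> P(finding present).
        sims = np.array([img @ pos, img @ neg])
        e = np.exp(sims - sims.max())
        scores[label] = e[0] / e.sum()
    return scores
```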
Report-to-Image Retrieval
COLIPRI encoders are substantially better at retrieving the associated image given a report, evaluated on the CT-RATE test set.
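Report-to-image retrieval is commonly scored with recall@k: the fraction of reports whose paired image ranks among the k most similar images by cosine similarity. A minimal sketch of that metric, assuming paired rows in the two embedding matrices (the metric choice is illustrative; the card does not state which retrieval metrics were used):

```python
import numpy as np

def recall_at_k(report_embs, image_embs, k=5):
    """Fraction of reports whose paired image (same row index) is in the top-k."""
    r = report_embs / np.linalg.norm(report_embs, axis=1, keepdims=True)
    im = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    sims = r @ im.T                                   # (N, N) cosine similarities
    topk = np.argsort(-sims, axis=1)[:, :k]           # best-first image ranking
    hits = (topk == np.arange(len(r))[:, None]).any(axis=1)
    return hits.mean()
```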
Segmentation
Across four datasets (LiTS, Lung, HVS, KiTS23), COLIPRI-CM and COLIPRI-CRM perform on par with or better than the state of the art.
Qualitative Analysis
COLIPRI embeddings yield sharper and more coherent features than those of third-party models.
Long context
N/A – not an LLM.
Safety evaluation and red-teaming
N/A – the model is limited to the medical imaging domain. Model outputs were compared to ground-truth images, and the standard quality metrics were computed.
Tracked capability evaluations
N/A – this model is not a frontier model.
Additional information
Requests for additional information may be directed to MSFTAIActRequest@microsoft.com.