Model card
Date effective: September 2025
Owner: ORA, Internal Policy & Practice
Model summary
| Developer | Microsoft Research |
| Description | COLIPRI is composed of a vision encoder for 3D CT scans and a text encoder for clinical reports. The encoders may be leveraged for downstream tasks such as classification or segmentation. The models were trained using CT-RATE and NLST. COLIPRI is not a generative model. |
| Model architecture | The vision encoder is Primus-M, a 3D vision transformer. The text encoder architecture is BERT. |
| Parameters | 1-500M |
| Inputs | 3D chest CT scans and/or radiology reports |
| Context length | N/A (not an LLM) |
| Outputs | Image and/or text embeddings, not directly human-interpretable. |
| GPUs | 4 x A100 GPUs |
| Training time | 1.75 days |
| Public data summary (or summaries) | Training data — National Lung Screening Trial (NLST) dataset: chest CT images from approximately 26k patients. CT-RATE dataset: approximately 25.7k CT acquisitions with associated reports. |
| Training dates | Dates of training: August 2024. Intended model weight release date: Dec 5, 2025 |
| Status | Static |
| Release date | Intended release date: Dec 5, 2025 |
| Release date in the EU (if different) | Intended release date: Dec 5, 2025 |
| License | Free/open source |
| Model dependencies | COLIPRI’s text encoder is fine-tuned from microsoft/BiomedVLP-CXR-BERT-specialized (available on Hugging Face). |
| List and link to any additional related assets | N/A |
| Acceptable use policy | N/A |
Model overview
COLIPRI is an encoder designed for 3D CT scans. Its architecture combines a 3D vision encoder (the Primus-M transformer) with a biomedical text encoder (BiomedVLP-CXR-BERT), and it extends the CLIP training paradigm with additional objectives and optimizations that account for limited data availability. This multi-objective training optimizes both global semantic alignment and dense feature learning. After pre-training, the model is evaluated comprehensively on downstream medical imaging tasks such as classification, retrieval, and semantic segmentation. COLIPRI is not a generative model; it cannot generate any images or text.
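The card does not spell out COLIPRI's exact objectives, so as an illustration only, here is a minimal NumPy sketch of the standard symmetric contrastive (InfoNCE) loss that the CLIP paradigm refers to: matched image/report embeddings are pulled together while mismatched pairs in the batch are pushed apart. The function name and temperature value are assumptions for illustration, not COLIPRI's actual implementation.

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    img_emb, txt_emb: (N, D) arrays; row i of each is a matched scan/report pair.
    """
    # L2-normalize so dot products become cosine similarities.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)

    logits = img @ txt.T / temperature           # (N, N) similarity matrix
    labels = np.arange(len(logits))              # matched pairs on the diagonal

    def cross_entropy(lg, lb):
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(lb)), lb].mean()

    # Symmetric: image-to-text and text-to-image directions are averaged.
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))
```

When the two embedding batches are perfectly aligned (each image most similar to its own report), the loss approaches zero; mismatched batches yield a larger loss.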
Alignment approach
N/A
Usage
Primary use cases
COLIPRI is primarily intended for 3D medical imaging tasks such as segmentation and classification. It is intended for research only.
Out-of-scope use cases
The COLIPRI model family was trained on 3D CT volumes and is therefore best suited for inputs of this type. The models are for research use only; clinical or deployed use cases are out of scope.
Distribution channels
The model will be released with open access via Hugging Face.
Input formats
The vision encoder takes 3D chest CT scans, and the text encoder takes radiology reports.
Technical requirements and integration guidance
The model requires a Python-based environment with PyTorch. The only hardware requirement is a single GPU.
Responsible AI considerations
The model may underperform or produce unreliable outputs for non-English languages, out-of-distribution or adversarial inputs, or clinical scenarios not well represented in the training data. The model should be used in an AI-assisted research setup with human oversight, rather than as a sole diagnostic tool.
Data overview
Training, testing, and validation datasets
The COLIPRI model family was pre-trained on the CT-RATE dataset, consisting of approximately 25.7k CT acquisitions with associated reports, and the National Lung Screening Trial (NLST) dataset, consisting of chest CT images from approximately 26k patients. Classification performance was evaluated on a withheld test set of CT-RATE and the publicly available subset of RAD-ChestCT, which comprises 3.6k chest CT volumes with 16 multi-abnormality labels. Semantic segmentation was evaluated by training a five-fold cross validation on four datasets:
- LiTS: Task 3 of the Medical Segmentation Decathlon (MSD), containing segmentations for liver and liver tumors
- Lung: Task 6 of MSD, containing segmentations of primary lung cancers
- HVS: Task 8 of the MSD, containing segmentations of hepatic vessels and tumors next to such vessels
- KiTS23: containing segmentations of tumors, cysts, and the kidney
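The segmentation benchmarks above were evaluated with five-fold cross-validation. As a hypothetical sketch (not the authors' actual pipeline), fold generation can look like the following: shuffle the case indices once, split them into five near-equal folds, and hold each fold out in turn for validation.

```python
import numpy as np

def five_fold_splits(num_cases, seed=0):
    """Yield (train_idx, val_idx) index pairs for 5-fold cross-validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(num_cases)      # shuffle case indices once
    folds = np.array_split(order, 5)        # five near-equal folds
    for k in range(5):
        val_idx = folds[k]                  # fold k held out for validation
        train_idx = np.concatenate([folds[j] for j in range(5) if j != k])
        yield train_idx, val_idx
```

Every case appears in exactly one validation fold, so the five validation sets together cover the whole dataset with no train/validation overlap within a fold.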
Quality and performance evaluation
Results summary
Classification Probes
The COLIPRI family outperformed all third-party models on both the CT-RATE and RAD-ChestCT datasets across all metrics.
Zero-Shot Classification
COLIPRI encoders exceed the state of the art using short prompts on the CT-RATE and RAD-ChestCT datasets.
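Zero-shot classification with a dual encoder typically works by embedding short presence/absence prompts per finding and comparing them to the scan embedding by cosine similarity. The sketch below illustrates that general mechanism; the prompt wording, function name, and two-prompt softmax are assumptions for illustration, not COLIPRI's documented procedure.

```python
import numpy as np

def zero_shot_classify(image_emb, prompt_embs):
    """Score each finding via cosine similarity to short text prompts.

    image_emb: (D,) embedding of one CT volume.
    prompt_embs: dict mapping label -> (pos_emb, neg_emb), each (D,),
                 e.g. embeddings of "Lung nodule." / "No lung nodule."
    Returns a dict mapping label -> probability the finding is present.
    """
    img = image_emb / np.linalg.norm(image_emb)
    scores = {}
    for label, (pos, neg) in prompt_embs.items():
        pos = pos / np.linalg.norm(pos)
        neg = neg / np.linalg.norm(neg)
        # Softmax over the two prompt similarities -> P(finding present).
        sims = np.array([img @ pos, img @ neg])
        e = np.exp(sims - sims.max())
        scores[label] = e[0] / e.sum()
    return scores
```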
Report-to-Image Retrieval
COLIPRI encoders are substantially better at retrieving the associated image given a report, evaluated on the CT-RATE test set.
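Report-to-image retrieval is commonly scored with recall@k: the fraction of reports whose paired image ranks among the k most similar images by cosine similarity. A minimal sketch of that metric, assuming paired rows in the two embedding matrices (the metric choice is illustrative; the card does not state which retrieval metrics were used):

```python
import numpy as np

def recall_at_k(report_embs, image_embs, k=5):
    """Fraction of reports whose paired image (same row index) is in the top-k."""
    r = report_embs / np.linalg.norm(report_embs, axis=1, keepdims=True)
    im = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    sims = r @ im.T                                   # (N, N) cosine similarities
    topk = np.argsort(-sims, axis=1)[:, :k]           # best-first image ranking
    hits = (topk == np.arange(len(r))[:, None]).any(axis=1)
    return hits.mean()
```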
Segmentation
Across four datasets (LiTS, Lung, HVS, KiTS23), COLIPRI-CM and COLIPRI-CRM perform on par with or better than the state of the art.
Qualitative Analysis
COLIPRI embeddings yield sharper and more coherent features than those of third-party models.
Long context
N/A – not an LLM.
Safety evaluation and red-teaming
N/A – the model is limited to the medical imaging domain. Model outputs were compared to ground-truth images, and the standard quality metrics were computed.
Tracked capability evaluations
N/A – this model is not a frontier model.
Additional information
Requests for additional information may be directed to MSFTAIActRequest@microsoft.com.