OmniCT: Towards a Unified Slice-Volume LVLM for Comprehensive CT Analysis
Overview
OmniCT is a unified Slice-Volume Large Vision-Language Model (LVLM) for comprehensive Computed Tomography (CT) understanding and analysis. Traditional medical vision-language models typically process 2D CT slices or 3D volumes separately. OmniCT introduces a unified framework that handles both slice-level and volume-level CT data, enabling flexible and scalable multimodal capabilities across diverse CT analysis scenarios.
Model Variants
This repository provides weights for two model variants:
| Model | Parameters | Description |
|---|---|---|
| OmniCT-3B | ~3B | Ultra-lightweight CT understanding model |
| OmniCT-7B | ~7B | Lightweight model with stronger capability |
Model links:
- OmniCT-3B: https://huggingface.co/Alibaba-DAMO-Academy/OmniCT-3B
- OmniCT-7B: https://huggingface.co/Alibaba-DAMO-Academy/OmniCT-7B
Training Strategy
OmniCT adopts a two-stage training pipeline.
Stage 1: Projection Alignment Pre-training
In the first stage, the vision encoder is aligned with the LLM backbone through a projection layer, enabling effective multimodal representation learning.
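To make the idea concrete, here is a minimal sketch of what such a projection layer computes: a learned linear map from vision-encoder features into the LLM embedding space. All dimensions and names below (`vision_dim`, `llm_dim`, the token count) are illustrative assumptions, not OmniCT's actual architecture sizes.

```python
import numpy as np

# Illustrative sketch of a Stage-1 projection layer: a linear map from
# vision-encoder features to the LLM embedding space. Dimensions below
# are assumptions for illustration, not OmniCT's real sizes.
rng = np.random.default_rng(0)
num_tokens, vision_dim, llm_dim = 256, 1024, 2048

slice_features = rng.standard_normal((num_tokens, vision_dim))  # encoder output
W = rng.standard_normal((vision_dim, llm_dim)) * 0.02           # projection weight
b = np.zeros(llm_dim)                                           # projection bias

# Projected visual tokens, now in the LLM's embedding dimension,
# ready to be interleaved with text token embeddings.
visual_tokens = slice_features @ W + b
print(visual_tokens.shape)  # (256, 2048)
```

During Stage 1, only weights like `W` and `b` are typically trained while the vision encoder and LLM stay frozen, which is what makes releasing the projection weights alone useful for reproducibility.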
Projection layer weights are provided for reproducibility:
- OmniCT-3B Projection Weights: https://huggingface.co/Alibaba-DAMO-Academy/OmniCT-3B/blob/main/projection-weights/model.safetensors
- OmniCT-7B Projection Weights: https://huggingface.co/Alibaba-DAMO-Academy/OmniCT-7B/blob/main/projection-weights/model.safetensors
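The released files use the safetensors format, whose header can be inspected with the standard library alone. The sketch below builds a tiny in-memory safetensors file with one dummy tensor (so no download is needed) and parses its header per the safetensors file format specification; the tensor name `proj.weight` and its shape are made-up stand-ins, not the actual contents of the OmniCT weight files.

```python
import io
import json
import struct

def read_safetensors_header(f):
    """Parse the JSON header of a safetensors file.

    Per the safetensors spec: the first 8 bytes are a little-endian u64
    giving the JSON header length; the JSON maps each tensor name to its
    dtype, shape, and byte offsets into the data buffer that follows.
    """
    (header_len,) = struct.unpack("<Q", f.read(8))
    return json.loads(f.read(header_len).decode("utf-8"))

# Build a minimal in-memory safetensors file with one dummy F32 tensor
# (4 x 2 = 8 floats = 32 bytes) so the parser can be demonstrated
# without downloading the real projection weights.
header = {"proj.weight": {"dtype": "F32", "shape": [4, 2],
                          "data_offsets": [0, 32]}}
header_bytes = json.dumps(header).encode("utf-8")
fake_file = io.BytesIO(
    struct.pack("<Q", len(header_bytes)) + header_bytes + b"\x00" * 32
)

meta = read_safetensors_header(fake_file)
for name, info in meta.items():
    print(name, info["dtype"], info["shape"])  # proj.weight F32 [4, 2]
```

Pointing the same parser at a downloaded `model.safetensors` (opened with `open(path, "rb")`) lists the projection tensors and their shapes without loading any weights into memory.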
Stage 2: Instruction Fine-tuning
In the second stage, the model is fine-tuned on CT-related multimodal instruction data, enabling capabilities such as:
- CT visual question answering
- CT report generation
Usage
Please refer to the official GitHub repository for inference and training instructions:
https://github.com/alibaba-damo-academy/OmniCT
The repository provides:
- environment setup
- inference scripts
- training pipelines
- dataset preparation
Citation
If OmniCT is helpful for your research, please consider citing:
@article{lin2026omnict,
title={OmniCT: Towards a Unified Slice-Volume LVLM for Comprehensive CT Analysis},
author={Lin, Tianwei and Qiu, Zhongwei and Zhang, Wenqiao and Liu, Jiang and Xie, Yihan and Gao, Mingjian and Fan, Zhenxuan and Li, Zhaocheng and Li, Sijing and Xie, Zhongle and others},
journal={arXiv preprint arXiv:2602.16110},
year={2026}
}