OmniCT: Towards a Unified Slice-Volume LVLM for Comprehensive CT Analysis

📄 Paper    🤖 OmniCT-3B    🤖 OmniCT-7B    💻 GitHub

Overview

OmniCT is a unified Slice-Volume Large Vision-Language Model (LVLM) designed for comprehensive Computed Tomography (CT) understanding and analysis. Traditional medical vision-language models typically process 2D CT slices or 3D volumes separately. OmniCT introduces a unified framework capable of handling both slice-level and volume-level CT data, enabling flexible and scalable multimodal capabilities across diverse CT analysis scenarios.

Model Variants

This repository provides weights for two model variants:

| Model | Parameters | Description |
| --- | --- | --- |
| OmniCT-3B | ~3B | Ultra-lightweight CT understanding model |
| OmniCT-7B | ~7B | Lightweight model with stronger capability |


Training Strategy

OmniCT adopts a two-stage training pipeline.

Stage 1: Projection Alignment Pre-training

In the first stage, the vision encoder is aligned with the LLM backbone through a projection layer, enabling effective multimodal representation learning.

Projection layer weights are provided for reproducibility.
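Conceptually, projection alignment maps vision-encoder patch features into the LLM's embedding space so that image tokens can be interleaved with text tokens. A minimal NumPy sketch of such a linear projection, with illustrative dimensions (not OmniCT's actual sizes):

```python
import numpy as np

# Hypothetical dimensions for illustration only.
VISION_DIM = 1024   # vision-encoder feature size per CT patch token
LLM_DIM = 3584      # LLM hidden size

def project(vision_feats: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Linearly map vision-encoder features into the LLM embedding space."""
    return vision_feats @ W + b

rng = np.random.default_rng(0)

# One CT slice encoded as 256 patch tokens of VISION_DIM features each.
feats = rng.standard_normal((256, VISION_DIM)).astype(np.float32)
W = (rng.standard_normal((VISION_DIM, LLM_DIM)) * 0.02).astype(np.float32)
b = np.zeros(LLM_DIM, dtype=np.float32)

tokens = project(feats, W, b)
print(tokens.shape)  # (256, 3584): ready to interleave with text tokens
```

During Stage 1 only this projection is typically trained, keeping the vision encoder and LLM frozen, which is why the projection weights alone suffice for reproducing the alignment step.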

Stage 2: Instruction Fine-tuning

The model is further trained using CT-related multimodal instructions, enabling advanced abilities such as:

  • CT visual question answering
  • CT report generation
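Instruction fine-tuning data for such tasks is commonly stored as image-grounded conversations. A hypothetical CT-VQA sample in a LLaVA-style schema (field names are illustrative, not OmniCT's actual data format):

```python
import json

# Hypothetical instruction sample; schema and paths are illustrative only.
ct_vqa_sample = {
    "id": "ct_000123",
    "volume": "volumes/ct_000123.nii.gz",  # 3D input; a 2D slice path also works
    "conversations": [
        {
            "from": "human",
            "value": "<image>\nIs there evidence of a pulmonary nodule in this CT scan?",
        },
        {
            "from": "gpt",
            "value": "Yes, a small solid nodule is visible in the right upper lobe.",
        },
    ],
}

print(json.dumps(ct_vqa_sample, indent=2))
```

Report generation samples follow the same structure, with the assistant turn containing a full findings/impression report instead of a short answer.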

Usage

Please refer to the official GitHub repository for inference and training instructions:

https://github.com/alibaba-damo-academy/OmniCT

The repository provides:

  • environment setup
  • inference scripts
  • training pipelines
  • dataset preparation
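For weights hosted on the Hugging Face Hub, loading usually follows the standard `transformers` pattern. This is a hedged sketch only; the actual entry points and processor classes may differ, so consult the GitHub repository above:

```python
# Hypothetical loading sketch via the Hugging Face transformers API;
# the repo's own inference scripts are the authoritative path.
from transformers import AutoModelForCausalLM, AutoProcessor

def load_omnict(model_id: str = "Alibaba-DAMO-Academy/OmniCT-7B"):
    """Load OmniCT weights and processor (needs GPU memory for ~7B params)."""
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        trust_remote_code=True,
        torch_dtype="auto",
        device_map="auto",
    )
    return processor, model

# processor, model = load_omnict()  # uncomment once weights are downloaded
```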

Citation

If OmniCT is helpful for your research, please consider citing:

```bibtex
@article{lin2026omnict,
  title={OmniCT: Towards a Unified Slice-Volume LVLM for Comprehensive CT Analysis},
  author={Lin, Tianwei and Qiu, Zhongwei and Zhang, Wenqiao and Liu, Jiang and Xie, Yihan and Gao, Mingjian and Fan, Zhenxuan and Li, Zhaocheng and Li, Sijing and Xie, Zhongle and others},
  journal={arXiv preprint arXiv:2602.16110},
  year={2026}
}
```

Base model: Qwen/Qwen2.5-7B