OmniCT: Towards a Unified Slice-Volume LVLM for Comprehensive CT Analysis
Overview
OmniCT is a unified Slice-Volume Large Vision-Language Model (LVLM) for comprehensive Computed Tomography (CT) understanding and analysis. Traditional medical vision-language models typically process 2D CT slices or 3D volumes separately. OmniCT introduces a unified framework that handles both slice-level and volume-level CT data, enabling flexible and scalable multimodal capabilities across diverse CT analysis scenarios.
Model Variants
This repository provides weights for two model variants:
| Model | Parameters | Description |
|---|---|---|
| OmniCT-3B | ~3B | Ultra-lightweight CT understanding model |
| OmniCT-7B | ~7B | Lightweight model with stronger capability |
Model links:
- OmniCT-3B: https://huggingface.co/Alibaba-DAMO-Academy/OmniCT-3B
- OmniCT-7B: https://huggingface.co/Alibaba-DAMO-Academy/OmniCT-7B
Training Strategy
OmniCT adopts a two-stage training pipeline.
Stage 1: Projection Alignment Pre-training
In the first stage, the vision encoder is aligned with the LLM backbone through a projection layer, enabling effective multimodal representation learning.
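To make the idea concrete, here is a minimal sketch of what such a projection layer computes: a learned linear map from vision-encoder features into the LLM embedding space. All dimensions and names below (`vision_dim`, `llm_dim`, the token count) are illustrative assumptions, not OmniCT's actual architecture sizes.

```python
import numpy as np

# Illustrative sketch of a Stage-1 projection layer: a linear map from
# vision-encoder features to the LLM embedding space. Dimensions below
# are assumptions for illustration, not OmniCT's real sizes.
rng = np.random.default_rng(0)
num_tokens, vision_dim, llm_dim = 256, 1024, 2048

slice_features = rng.standard_normal((num_tokens, vision_dim))  # encoder output
W = rng.standard_normal((vision_dim, llm_dim)) * 0.02           # projection weight
b = np.zeros(llm_dim)                                           # projection bias

# Projected visual tokens, now in the LLM's embedding dimension,
# ready to be interleaved with text token embeddings.
visual_tokens = slice_features @ W + b
print(visual_tokens.shape)  # (256, 2048)
```

During Stage 1, only weights like `W` and `b` are typically trained while the vision encoder and LLM stay frozen, which is what makes releasing the projection weights alone useful for reproducibility.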
Projection layer weights are provided for reproducibility:
- OmniCT-3B Projection Weights: https://huggingface.co/Alibaba-DAMO-Academy/OmniCT-3B/blob/main/projection-weights/model.safetensors
- OmniCT-7B Projection Weights: https://huggingface.co/Alibaba-DAMO-Academy/OmniCT-7B/blob/main/projection-weights/model.safetensors
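The released files use the safetensors format, whose header can be inspected with the standard library alone. The sketch below builds a tiny in-memory safetensors file with one dummy tensor (so no download is needed) and parses its header per the safetensors file format specification; the tensor name `proj.weight` and its shape are made-up stand-ins, not the actual contents of the OmniCT weight files.

```python
import io
import json
import struct

def read_safetensors_header(f):
    """Parse the JSON header of a safetensors file.

    Per the safetensors spec: the first 8 bytes are a little-endian u64
    giving the JSON header length; the JSON maps each tensor name to its
    dtype, shape, and byte offsets into the data buffer that follows.
    """
    (header_len,) = struct.unpack("<Q", f.read(8))
    return json.loads(f.read(header_len).decode("utf-8"))

# Build a minimal in-memory safetensors file with one dummy F32 tensor
# (4 x 2 = 8 floats = 32 bytes) so the parser can be demonstrated
# without downloading the real projection weights.
header = {"proj.weight": {"dtype": "F32", "shape": [4, 2],
                          "data_offsets": [0, 32]}}
header_bytes = json.dumps(header).encode("utf-8")
fake_file = io.BytesIO(
    struct.pack("<Q", len(header_bytes)) + header_bytes + b"\x00" * 32
)

meta = read_safetensors_header(fake_file)
for name, info in meta.items():
    print(name, info["dtype"], info["shape"])  # proj.weight F32 [4, 2]
```

Pointing the same parser at a downloaded `model.safetensors` (opened with `open(path, "rb")`) lists the projection tensors and their shapes without loading any weights into memory.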
Stage 2: Instruction Fine-tuning
In the second stage, the model is fine-tuned on CT-related multimodal instruction data, enabling capabilities such as:
- CT visual question answering
- CT report generation
Usage
Please refer to the official GitHub repository for inference and training instructions:
https://github.com/alibaba-damo-academy/OmniCT
The repository provides:
- environment setup
- inference scripts
- training pipelines
- dataset preparation
Citation
If OmniCT is helpful for your research, please consider citing:
@article{lin2026omnict,
title={OmniCT: Towards a Unified Slice-Volume LVLM for Comprehensive CT Analysis},
author={Lin, Tianwei and Qiu, Zhongwei and Zhang, Wenqiao and Liu, Jiang and Xie, Yihan and Gao, Mingjian and Fan, Zhenxuan and Li, Zhaocheng and Li, Sijing and Xie, Zhongle and others},
journal={arXiv preprint arXiv:2602.16110},
year={2026}
}