---
license: apache-2.0
language:
  - en
base_model:
  - Qwen/Qwen2.5-7B-Instruct
pipeline_tag: image-text-to-text
tags:
  - medical
  - multimodal
  - report generation
  - Computed Tomography (CT)
  - VQA
---

OmniCT: Towards a Unified Slice-Volume LVLM for Comprehensive CT Analysis

📄 Paper    🤖 OmniCT-3B    🤖 OmniCT-7B    💻 GitHub

Overview

OmniCT is a unified Slice-Volume Large Vision-Language Model (LVLM) designed for comprehensive Computed Tomography (CT) understanding and analysis. Traditional medical vision-language models typically process 2D CT slices or 3D volumes separately. OmniCT introduces a unified framework capable of handling both slice-level and volume-level CT data, enabling flexible and scalable multimodal abilities across diverse CT analysis scenarios.
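Before CT data reaches a vision encoder, raw Hounsfield-unit volumes are typically intensity-windowed and split into slices. The following is a minimal NumPy sketch of that common preprocessing step; the specific window and slicing convention OmniCT uses are not stated in this card, so the values here are illustrative.

```python
import numpy as np

def window_ct(volume_hu, center=40.0, width=400.0):
    """Apply a soft-tissue intensity window to a CT volume in Hounsfield
    units and rescale to [0, 1]. The window center/width here are a
    common soft-tissue choice, not necessarily what OmniCT uses."""
    lo, hi = center - width / 2.0, center + width / 2.0
    clipped = np.clip(volume_hu, lo, hi)
    return (clipped - lo) / (hi - lo)

def to_slices(volume_hu, axis=0):
    """Split a 3D volume into a list of windowed 2D slices, suitable as
    slice-level inputs; the full windowed volume can serve as the
    volume-level input."""
    windowed = window_ct(volume_hu)
    return [np.take(windowed, i, axis=axis) for i in range(volume_hu.shape[axis])]

# Example with a dummy 4 x 64 x 64 volume in HU
vol = np.random.randint(-1000, 1000, size=(4, 64, 64)).astype(np.float32)
slices = to_slices(vol)
print(len(slices), slices[0].shape)  # 4 (64, 64)
```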

Model Variants

This repository provides the weights for two model variants:

| Model | Parameters | Description |
|---|---|---|
| OmniCT-3B | ~3B | Ultra-lightweight CT understanding model |
| OmniCT-7B | ~7B | Lightweight model with stronger capability |

Model links:

Training Strategy

OmniCT adopts a two-stage training pipeline.

Stage 1: Projection Alignment Pre-training

In the first stage, the vision encoder is aligned with the LLM backbone through a projection layer, enabling effective multimodal representation learning.
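A projector of this kind can be sketched in a few lines: it maps each vision-encoder token into the LLM's embedding space. The dimensions and the two-layer MLP design below are assumptions (a common LLaVA-style choice), not the confirmed OmniCT architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; the actual sizes OmniCT uses are not stated here.
vision_dim, llm_dim, num_tokens = 1024, 3584, 256

# Two-layer MLP projector: during Stage 1, only these weights are trained
# while the vision encoder and LLM stay frozen.
W1 = rng.standard_normal((vision_dim, llm_dim)) * 0.02
b1 = np.zeros(llm_dim)
W2 = rng.standard_normal((llm_dim, llm_dim)) * 0.02
b2 = np.zeros(llm_dim)

def project(vision_feats):
    """Map vision-encoder token features into the LLM embedding space."""
    # ReLU here for simplicity; real projectors often use GELU.
    h = np.maximum(vision_feats @ W1 + b1, 0.0)
    return h @ W2 + b2

feats = rng.standard_normal((num_tokens, vision_dim))
tokens = project(feats)
print(tokens.shape)  # (256, 3584)
```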

Projection layer weights are provided for reproducibility:

Stage 2: Instruction Fine-tuning

The model is further trained on CT-related multimodal instruction data, enabling capabilities such as:

  • CT visual question answering
  • CT report generation
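For VQA-style use, a request is typically expressed as a multimodal chat message. The sketch below uses the Qwen-style message format as an assumption; the exact prompt schema OmniCT expects is defined in its GitHub repository, so the field names and file path here are illustrative only.

```python
def build_vqa_message(image_path, question):
    """Build a single-turn multimodal VQA message in a Qwen-style chat
    format (illustrative; check the OmniCT repo for the exact schema)."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": question},
            ],
        }
    ]

msg = build_vqa_message("ct_slice.png", "Is there evidence of a pulmonary nodule?")
print(msg[0]["role"])  # user
```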

Usage

Please refer to the official GitHub repository for inference and training instructions:

https://github.com/alibaba-damo-academy/OmniCT

The repository provides:

  • environment setup
  • inference scripts
  • training pipelines
  • dataset preparation

Citation

If OmniCT is helpful for your research, please consider citing:

```bibtex
@article{lin2026omnict,
  title={OmniCT: Towards a Unified Slice-Volume LVLM for Comprehensive CT Analysis},
  author={Lin, Tianwei and Qiu, Zhongwei and Zhang, Wenqiao and Liu, Jiang and Xie, Yihan and Gao, Mingjian and Fan, Zhenxuan and Li, Zhaocheng and Li, Sijing and Xie, Zhongle and others},
  journal={arXiv preprint arXiv:2602.16110},
  year={2026}
}
```