Med-2E3-M3D

Introduction

Med-2E3 is a 3D medical large vision-language model (LVLM) trained on 3D CT volumes and English medical text from M3D-Cap and M3D-VQA. It supports tasks such as report generation and medical visual question answering (VQA).

Config
3D image encoder: GoodBaiBai88/M3D-CLIP
2D image encoder: google/siglip-large-patch16-256
Connector: TG-IS scoring module
LLM: Qwen/Qwen2.5-3B-Instruct
Image resolution: 32 × 256 × 256
Sequence length: 768
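
As a rough illustration of what these settings imply, the sketch below stores them in a small Python dataclass and estimates how many patches a patch-16 2D encoder would see per slice. The class name, field names, and helper function are hypothetical and not part of the Med-2E3 code. The axis order (slices, height, width) and the patch size of 16 (read off the SigLIP checkpoint name) are assumptions. The count is an upper bound on raw 2D patches before the connector, not the number of tokens the LLM actually receives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Med2E3Config:
    """Values copied from the Config list above; all names here are hypothetical."""
    encoder_3d: str = "GoodBaiBai88/M3D-CLIP"
    encoder_2d: str = "google/siglip-large-patch16-256"
    connector: str = "TG-IS scoring module"
    llm: str = "Qwen/Qwen2.5-3B-Instruct"
    image_shape: tuple = (32, 256, 256)  # assumed order: (slices, height, width)
    max_seq_len: int = 768


def patches_per_slice(height: int, width: int, patch_size: int = 16) -> int:
    """Count non-overlapping square patches in one 2D slice."""
    if height % patch_size or width % patch_size:
        raise ValueError("slice size must be divisible by the patch size")
    return (height // patch_size) * (width // patch_size)


cfg = Med2E3Config()
depth, height, width = cfg.image_shape
per_slice = patches_per_slice(height, width)  # (256 / 16) ** 2
total_2d_patches = depth * per_slice          # before any connector-side aggregation

print(f"{per_slice} patches per slice, {total_2d_patches} across {depth} slices")
```

Under these assumptions the raw 2D patch count is far larger than the 768-token sequence length, which suggests the connector has to condense the visual features before they reach the LLM.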

Quickstart

Please refer to Med-2E3 for usage instructions.

Citation

@inproceedings{shi2025med,
  title={{Med-2E3}: A {2D}-Enhanced {3D} Medical Multimodal Large Language Model},
  author={Shi, Yiming and Zhu, Xun and Wang, Kaiwen and Hu, Ying and Guo, Chenyi and Li, Miao and Wu, Ji},
  booktitle={2025 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)},
  pages={2754--2759},
  year={2025},
  organization={IEEE}
}