---
license: apache-2.0
datasets:
- GoodBaiBai88/M3D-Cap
- GoodBaiBai88/M3D-VQA
language:
- en
base_model:
- microsoft/Phi-3-mini-128k-instruct
- GoodBaiBai88/M3D-CLIP
- google/siglip-large-patch16-256
pipeline_tag: image-text-to-text
---
# Med-2E3-M3D
## Introduction
Med-2E3 is a 3D medical LVLM trained on **3D** CT volumes paired with English medical text ([M3D-Cap](https://huggingface.co/datasets/GoodBaiBai88/M3D-Cap) & [M3D-VQA](https://huggingface.co/datasets/GoodBaiBai88/M3D-VQA)), supporting tasks such as **report generation** and medical **VQA**.
| Component | Config |
| :--- | :---: |
| 3D Image encoder | GoodBaiBai88/M3D-CLIP |
| 2D Image encoder | google/siglip-large-patch16-256 |
| Connector | TG-IS scoring module |
| LLM | Qwen/Qwen2.5-3B-Instruct |
| Image resolution | 32×256×256 |
| Sequence length | 768 |
## Quickstart
For installation and inference instructions, please refer to the official [Med-2E3](https://github.com/MSIIP/Med-2E3) repository.
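The config table lists an input resolution of 32×256×256 for CT volumes. Before feeding a scan to the model, it must be resampled to that shape. Below is a minimal preprocessing sketch; the nearest-neighbour resampling and min-max normalisation are assumptions for illustration — the Med-2E3 repository may use different interpolation and HU windowing:

```python
import numpy as np

def preprocess_ct(volume: np.ndarray,
                  target_shape=(32, 256, 256)) -> np.ndarray:
    """Resample a CT volume (depth, height, width) to the model's
    32x256x256 input resolution via nearest-neighbour indexing,
    then min-max normalise to [0, 1].

    Note: the normalisation scheme here is an assumption, not the
    repository's documented pipeline.
    """
    d, h, w = volume.shape
    td, th, tw = target_shape
    # Nearest-neighbour source indices along each axis
    zi = np.linspace(0, d - 1, td).round().astype(int)
    yi = np.linspace(0, h - 1, th).round().astype(int)
    xi = np.linspace(0, w - 1, tw).round().astype(int)
    # Cross-product fancy indexing picks the resampled grid
    out = volume[np.ix_(zi, yi, xi)].astype(np.float32)
    lo, hi_ = out.min(), out.max()
    return (out - lo) / (hi_ - lo + 1e-8)

# Example: a dummy CT scan of 100 slices at 512x512
vol = np.random.randint(-1000, 1000, size=(100, 512, 512)).astype(np.float32)
x = preprocess_ct(vol)
print(x.shape)  # (32, 256, 256)
```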
## Citation
``` bibtex
@inproceedings{shi2025med,
  title={Med-2E3: A 2D-Enhanced 3D Medical Multimodal Large Language Model},
author={Shi, Yiming and Zhu, Xun and Wang, Kaiwen and Hu, Ying and Guo, Chenyi and Li, Miao and Wu, Ji},
booktitle={2025 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)},
pages={2754--2759},
year={2025},
organization={IEEE}
}
```