---
license: apache-2.0
datasets:
- GoodBaiBai88/M3D-Cap
- GoodBaiBai88/M3D-VQA
language:
- en
base_model:
- microsoft/Phi-3-mini-128k-instruct
- GoodBaiBai88/M3D-CLIP
- google/siglip-large-patch16-256
pipeline_tag: image-text-to-text
---

# Med-2E3-M3D

## Introduction

Med-2E3 is a 3D medical large vision-language model (LVLM) trained on **3D** CT volumes paired with English medical texts ([M3D-Cap](https://huggingface.co/datasets/GoodBaiBai88/M3D-Cap) & [M3D-VQA](https://huggingface.co/datasets/GoodBaiBai88/M3D-VQA)). It supports tasks such as **report generation** and medical **VQA**.

|                  | Config                          |
| :--------------- | :-----------------------------: |
| 3D image encoder | GoodBaiBai88/M3D-CLIP           |
| 2D image encoder | google/siglip-large-patch16-256 |
| Connector        | TG-IS scoring module            |
| LLM              | Qwen/Qwen2.5-3B-Instruct        |
| Image resolution | 32×256×256                      |
| Sequence length  | 768                             |

## Quickstart

Please refer to [Med-2E3](https://github.com/MSIIP/Med-2E3).

## Citation

```bibtex
@inproceedings{shi2025med,
  title={Med-2E3: A 2D-Enhanced 3D Medical Multimodal Large Language Model},
  author={Shi, Yiming and Zhu, Xun and Wang, Kaiwen and Hu, Ying and Guo, Chenyi and Li, Miao and Wu, Ji},
  booktitle={2025 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)},
  pages={2754--2759},
  year={2025},
  organization={IEEE}
}
```
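The config table lists an input resolution of 32×256×256 (depth × height × width). As a minimal sketch of what that implies for preprocessing, the snippet below windows a raw CT volume to a Hounsfield-unit range, resamples it to the listed resolution, and min-max normalizes it. The HU window, nearest-neighbor resampling, and channel layout are illustrative assumptions, not the official pipeline; see the [Med-2E3 repository](https://github.com/MSIIP/Med-2E3) for the actual preprocessing code.

```python
import numpy as np

def preprocess_ct(volume: np.ndarray,
                  target_shape=(32, 256, 256),
                  hu_window=(-1000.0, 1000.0)) -> np.ndarray:
    """Sketch: window, resample, and normalize a CT volume to the
    32x256x256 resolution listed in the config table above.
    The HU window and normalization scheme are assumptions."""
    lo, hi = hu_window
    vol = np.clip(volume.astype(np.float32), lo, hi)
    # Nearest-neighbor resampling along each axis (illustrative;
    # a real pipeline might use trilinear interpolation instead).
    idx = [np.linspace(0, s - 1, t).round().astype(int)
           for s, t in zip(vol.shape, target_shape)]
    vol = vol[np.ix_(*idx)]
    # Min-max normalize to [0, 1].
    vol = (vol - vol.min()) / (vol.max() - vol.min() + 1e-8)
    return vol[None]  # add a channel dim -> (1, 32, 256, 256)

# Example: a random 64-slice CT-like volume.
ct = np.random.randint(-1024, 2000, size=(64, 320, 320)).astype(np.float32)
x = preprocess_ct(ct)
print(x.shape)  # (1, 32, 256, 256)
```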