---
license: apache-2.0
datasets:
- GoodBaiBai88/M3D-Cap
- GoodBaiBai88/M3D-VQA
language:
- en
base_model:
- microsoft/Phi-3-mini-128k-instruct
- GoodBaiBai88/M3D-CLIP
- google/siglip-large-patch16-256
pipeline_tag: image-text-to-text
---

# Med-2E3-M3D

## Introduction

Med-2E3 is a 3D medical large vision-language model (LVLM) trained on **3D** CT volumes paired with English medical text ([M3D-Cap](https://huggingface.co/datasets/GoodBaiBai88/M3D-Cap) & [M3D-VQA](https://huggingface.co/datasets/GoodBaiBai88/M3D-VQA)). It supports tasks such as **report generation** and medical **VQA**.

| Component | Config |
| :--- | :---: |
| 3D Image encoder | GoodBaiBai88/M3D-CLIP |
| 2D Image encoder | google/siglip-large-patch16-256 |
| Connector | TG-IS scoring module |
| LLM | Qwen/Qwen2.5-3B-Instruct |
| Image resolution | 32 × 256 × 256 |
| Sequence length | 768 |
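
The resolution above means each input is a 32-slice CT volume at 256 × 256 per slice. Below is a minimal preprocessing sketch for shaping a volume to that size; the function name and the Hounsfield-unit windowing values are illustrative assumptions, not the repo's actual API — see the Med-2E3 repository for the real pipeline.

```python
import numpy as np

def prepare_ct_volume(volume: np.ndarray) -> np.ndarray:
    """Clip to a typical CT window and min-max normalize to [0, 1].

    Expects a (depth, height, width) array; Med-2E3's stated input
    resolution is (32, 256, 256).
    """
    volume = np.clip(volume, -1000.0, 1000.0)  # assumed HU window
    volume = (volume - volume.min()) / (volume.max() - volume.min() + 1e-8)
    return volume.astype(np.float32)

# Dummy volume at the model's stated input resolution
ct = prepare_ct_volume(np.random.uniform(-1000, 1000, size=(32, 256, 256)))
print(ct.shape, ct.dtype)  # (32, 256, 256) float32
```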

## Quickstart

Please refer to [Med-2E3](https://github.com/MSIIP/Med-2E3).

## Citation

```bibtex
@inproceedings{shi2025med,
  title={Med-2E3: A 2D-Enhanced 3D Medical Multimodal Large Language Model},
  author={Shi, Yiming and Zhu, Xun and Wang, Kaiwen and Hu, Ying and Guo, Chenyi and Li, Miao and Wu, Ji},
  booktitle={2025 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)},
  pages={2754--2759},
  year={2025},
  organization={IEEE}
}
```