---
base_model:
  - microsoft/Phi-3-mini-128k-instruct
  - GoodBaiBai88/M3D-CLIP
  - google/siglip-large-patch16-256
datasets:
  - GoodBaiBai88/M3D-Cap
  - GoodBaiBai88/M3D-VQA
language:
  - en
license: apache-2.0
pipeline_tag: image-text-to-text
library_name: transformers
---

# Med-2E3-M3D

## Introduction

Med-2E3 is a 2D-enhanced 3D medical multimodal large language model (LVLM) trained on 3D CT volumes paired with English medical text (M3D-Cap and M3D-VQA). It supports tasks such as radiology report generation and medical visual question answering (VQA). The model is presented in the paper [Med-2E3: A 2D-Enhanced 3D Medical Multimodal Large Language Model](https://arxiv.org/abs/2411.12783).

## Config

| Component        | Setting                        |
|------------------|--------------------------------|
| 3D image encoder | GoodBaiBai88/M3D-CLIP          |
| 2D image encoder | google/siglip-large-patch16-256 |
| Connector        | TG-IS scoring module           |
| LLM              | Qwen/Qwen2.5-3B-Instruct       |
| Image resolution | 32 × 256 × 256                 |
| Sequence length  | 768                            |
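The model expects 3D inputs at the resolution listed above (32 × 256 × 256, depth × height × width). Below is a minimal sketch of resampling an arbitrary CT volume to that shape with nearest-neighbour indexing; the `to_model_resolution` helper and the NumPy-based approach are illustrative assumptions, not the repository's actual preprocessing pipeline.

```python
import numpy as np

def to_model_resolution(volume: np.ndarray,
                        target: tuple = (32, 256, 256)) -> np.ndarray:
    """Nearest-neighbour resample a CT volume of shape (D, H, W)
    to the model's expected input resolution (hypothetical helper)."""
    d, h, w = volume.shape
    td, th, tw = target
    # Integer source indices for each target position along each axis.
    zi = np.arange(td) * d // td
    yi = np.arange(th) * h // th
    xi = np.arange(tw) * w // tw
    # np.ix_ builds an open mesh so the fancy index yields shape target.
    return volume[np.ix_(zi, yi, xi)]

# Example: a typical CT scan of 120 slices at 512x512.
vol = np.random.rand(120, 512, 512).astype(np.float32)
print(to_model_resolution(vol).shape)  # (32, 256, 256)
```

Real preprocessing would also handle intensity windowing and normalization; consult the Med-2E3 repository for the exact transforms used in training.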

## Quickstart

For installation and usage instructions, please refer to the Med-2E3 repository.

## Citation

```bibtex
@article{shi2024med,
  title={Med-2E3: A 2D-enhanced 3D medical multimodal large language model},
  author={Shi, Yiming and Zhu, Xun and Hu, Ying and Guo, Chenyi and Li, Miao and Wu, Ji},
  journal={arXiv preprint arXiv:2411.12783},
  year={2024}
}
```