---
license: apache-2.0
datasets:
- GoodBaiBai88/M3D-Cap
- GoodBaiBai88/M3D-VQA
language:
- en
base_model:
- microsoft/Phi-3-mini-128k-instruct
- GoodBaiBai88/M3D-CLIP
- google/siglip-large-patch16-256
pipeline_tag: image-text-to-text
---

# Med-2E3-M3D

## Introduction

Med-2E3 is a 3D medical large vision-language model (LVLM) trained on **3D** CT volumes paired with English medical texts ([M3D-Cap](https://huggingface.co/datasets/GoodBaiBai88/M3D-Cap) & [M3D-VQA](https://huggingface.co/datasets/GoodBaiBai88/M3D-VQA)). It supports tasks such as **report generation** and medical **VQA**.

|                  | Config                          |
| :--------------- | :-----------------------------: |
| 3D image encoder | GoodBaiBai88/M3D-CLIP           |
| 2D image encoder | google/siglip-large-patch16-256 |
| Connector        | TG-IS scoring module            |
| LLM              | Qwen/Qwen2.5-3B-Instruct        |
| Image resolution | 32×256×256                      |
| Sequence length  | 768                             |

## Quickstart

Please refer to [Med-2E3](https://github.com/MSIIP/Med-2E3).

## Citation

```bibtex
@inproceedings{shi2025med,
  title={Med-2E3: A 2D-Enhanced 3D Medical Multimodal Large Language Model},
  author={Shi, Yiming and Zhu, Xun and Wang, Kaiwen and Hu, Ying and Guo, Chenyi and Li, Miao and Wu, Ji},
  booktitle={2025 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)},
  pages={2754--2759},
  year={2025},
  organization={IEEE}
}
```
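The config table lists an input resolution of 32×256×256 (depth × height × width). As a minimal sketch of what that implies for preprocessing, the snippet below windows a raw CT volume to a Hounsfield-unit range, resamples it to the listed resolution, and min-max normalizes it. The HU window, nearest-neighbor resampling, and channel layout are illustrative assumptions, not the official pipeline; see the [Med-2E3 repository](https://github.com/MSIIP/Med-2E3) for the actual preprocessing code.

```python
import numpy as np

def preprocess_ct(volume: np.ndarray,
                  target_shape=(32, 256, 256),
                  hu_window=(-1000.0, 1000.0)) -> np.ndarray:
    """Sketch: window, resample, and normalize a CT volume to the
    32x256x256 resolution listed in the config table above.
    The HU window and normalization scheme are assumptions."""
    lo, hi = hu_window
    vol = np.clip(volume.astype(np.float32), lo, hi)
    # Nearest-neighbor resampling along each axis (illustrative;
    # a real pipeline might use trilinear interpolation instead).
    idx = [np.linspace(0, s - 1, t).round().astype(int)
           for s, t in zip(vol.shape, target_shape)]
    vol = vol[np.ix_(*idx)]
    # Min-max normalize to [0, 1].
    vol = (vol - vol.min()) / (vol.max() - vol.min() + 1e-8)
    return vol[None]  # add a channel dim -> (1, 32, 256, 256)

# Example: a random 64-slice CT-like volume.
ct = np.random.randint(-1024, 2000, size=(64, 320, 320)).astype(np.float32)
x = preprocess_ct(ct)
print(x.shape)  # (1, 32, 256, 256)
```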