---
license: apache-2.0
pipeline_tag: image-text-to-text
library_name: transformers
---

This repository contains the official implementation of the paper [Med3DVLM: An Efficient Vision-Language Model for 3D Medical Image Analysis](https://huggingface.co/papers/2503.20047).

Med3DVLM is a 3D vision-language model (VLM) designed to address the challenges of 3D medical image analysis through efficient encoding, improved image-text alignment with a pairwise sigmoid loss, and a dual-stream MLP-Mixer projector for richer multi-modal representations. It achieves superior performance across multiple benchmarks, including image-text retrieval, report generation, and open- and closed-ended visual question answering.

Code: https://github.com/mirthAI/Med3DVLM

## Installation

First, clone the repository to your local machine:

```bash
git clone https://github.com/mirthAI/Med3DVLM.git
cd Med3DVLM
```

To install the required packages, create and activate the conda environment defined in `env.yaml`:

```bash
conda env create -n Med3DVLM -f env.yaml
conda activate Med3DVLM
```

Alternatively, install the dependencies with pip:

```bash
pip install -r requirements.txt
```

Set the `PYTHONPATH` environment variable to include the root directory of the project so that the project's modules can be imported. From the repository root, run:

```bash
export PYTHONPATH=$(pwd):$PYTHONPATH
```
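
If the `export` line is run more than once in the same shell, or while `PYTHONPATH` is still empty, the variable picks up duplicate entries or a trailing colon. The sketch below is one way to avoid that; `prepend_pythonpath` is a hypothetical helper for illustration and is not part of the Med3DVLM repository.

```bash
# Hypothetical helper (not part of the repository): prepend a directory to
# PYTHONPATH only if it is not already listed.
prepend_pythonpath() {
    local dir="$1"
    case ":${PYTHONPATH:-}:" in
        *":${dir}:"*)
            # Directory is already on PYTHONPATH; leave it unchanged.
            ;;
        *)
            # Add the separator only when PYTHONPATH already has a value.
            export PYTHONPATH="${dir}${PYTHONPATH:+:${PYTHONPATH}}"
            ;;
    esac
}

# Run from the repository root.
prepend_pythonpath "$(pwd)"
```

Calling the function again with the same directory leaves `PYTHONPATH` unchanged, so it should be safe to put in a setup script that is sourced repeatedly.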

## Sample Usage

To run a demo in the terminal, use the following command (replace `path_to_model` and `path_to_image` with your actual paths):

```bash
python src/demo/demo.py --model_name_or_path path_to_model --image_path path_to_image --question "Describe the findings of the medical image you see."
```
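
Because `path_to_image` is a placeholder, a mistyped path may only surface after the model has been loaded. As a minimal sketch, the check below fails early instead; `require_path` is a hypothetical helper, not something the repository provides, and it only tests that the path exists without validating the image format.

```bash
# Hypothetical helper (not part of the repository): stop with a readable
# message when an input path does not exist.
require_path() {
    if [ ! -e "$1" ]; then
        echo "Path not found: $1" >&2
        return 1
    fi
}

# Example: guard the image path, then launch the demo command shown above.
# require_path path_to_image && <demo command>
```

The model argument is deliberately left unchecked here, on the assumption that `--model_name_or_path` may also accept a model identifier rather than a local directory.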

## Citation

If you use our code or find our work helpful, please consider citing our paper:

```bibtex
@article{xin2025med3dvlm,
  title={Med3DVLM: An Efficient Vision-Language Model for 3D Medical Image Analysis},
  author={Xin, Yu and Ates, Gorkem Can and Gong, Kuang and Shao, Wei},
  journal={IEEE Journal of Biomedical and Health Informatics},
  year={2025}
}
```