---
license: apache-2.0
pipeline_tag: image-text-to-text
library_name: transformers
---

This repository contains the official implementation of the paper: Med3DVLM: An Efficient Vision-Language Model for 3D Medical Image Analysis.

Med3DVLM is a 3D VLM designed to address the challenges of 3D medical image analysis through efficient encoding, improved image-text alignment with a pairwise sigmoid loss, and a dual-stream MLP-Mixer projector for richer multi-modal representations. It achieves superior performance across multiple benchmarks including image-text retrieval, report generation, and open/closed-ended visual question answering.
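As a rough illustration of the pairwise sigmoid loss mentioned above, the sketch below treats every image-text pair in a batch as an independent binary classification: matching pairs get label +1 and all other pairs get label -1. It is a minimal pure-Python sketch of the SigLIP-style formulation, not the repository's implementation; the function names and the `temperature` and `bias` values are assumptions chosen for illustration.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    # log(sigmoid(x)) = min(x, 0) - log(1 + exp(-|x|))
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def pairwise_sigmoid_loss(image_emb, text_emb, temperature=10.0, bias=-10.0):
    """SigLIP-style loss over all image-text pairs in a batch.

    image_emb, text_emb: lists of L2-normalized vectors; row i of each
    list forms the matching pair. temperature and bias are illustrative
    (hypothetical) values; in practice they are usually learned.
    """
    n = len(image_emb)
    total = 0.0
    for i, img in enumerate(image_emb):
        for j, txt in enumerate(text_emb):
            similarity = sum(a * b for a, b in zip(img, txt))
            logit = temperature * similarity + bias
            label = 1.0 if i == j else -1.0  # +1 for matches, -1 otherwise
            total -= log_sigmoid(label * logit)
    # Average over the batch size, as in the SigLIP formulation.
    return total / n


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    aligned_texts = [[1.0, 0.0], [0.0, 1.0]]
    swapped_texts = [[0.0, 1.0], [1.0, 0.0]]
    # Correctly aligned pairs should give the lower loss.
    print(pairwise_sigmoid_loss(images, aligned_texts))
    print(pairwise_sigmoid_loss(images, swapped_texts))
```

Unlike a softmax contrastive loss, each pair contributes on its own, so no normalization across the whole batch is needed.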

Code: https://github.com/mirthAI/Med3DVLM

Figure: Med3DVLM architecture

Installation

First, clone the repository to your local machine:

git clone https://github.com/mirthAI/Med3DVLM.git
cd Med3DVLM

To install the required packages, you can use the following command:

conda env create -n Med3DVLM -f env.yaml
conda activate Med3DVLM

or

pip install -r requirements.txt

Set the PYTHONPATH environment variable to the project root. From the repository root (the Med3DVLM directory), run the following command in your terminal:

export PYTHONPATH=$(pwd):$PYTHONPATH
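To confirm that the export took effect, a short check like the one below can help. It is only a sketch: the helper name `on_pythonpath` is hypothetical, and it assumes you run it from the repository root in the same shell session as the export.

```python
import os
from pathlib import Path


def on_pythonpath(root, pythonpath=None):
    """Return True if `root` is one of the entries in PYTHONPATH."""
    if pythonpath is None:
        pythonpath = os.environ.get("PYTHONPATH", "")
    target = Path(root).resolve()
    # Split on the platform separator and ignore empty entries.
    entries = [p for p in pythonpath.split(os.pathsep) if p]
    return any(Path(p).resolve() == target for p in entries)


if __name__ == "__main__":
    # Run from the repository root after the export command.
    print("project root on PYTHONPATH:", on_pythonpath(Path.cwd()))
```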

Sample Usage

To run a demo in the terminal, use the following command (replace path_to_model and path_to_image with your actual paths):

python src/demo/demo.py --model_name_or_path path_to_model --image_path path_to_image --question "Describe the findings of the medical image you see."
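If you want to launch the demo from another Python script (for example, to loop over several images), the command can be assembled as an argument list. The sketch below only uses the three flags shown in the command above; the helper name `build_demo_command` is hypothetical, and `path_to_demo_script` is a placeholder for the demo script path from that command.

```python
import shlex
import sys


def build_demo_command(script, model_path, image_path, question):
    """Return the argv list for the demo script using the flags shown above."""
    return [
        sys.executable, script,
        "--model_name_or_path", model_path,
        "--image_path", image_path,
        "--question", question,
    ]


if __name__ == "__main__":
    cmd = build_demo_command(
        "path_to_demo_script",  # placeholder: replace with the demo script path
        "path_to_model",
        "path_to_image",
        "Describe the findings of the medical image you see.",
    )
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when the question contains spaces or punctuation.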

Citation

If you use our code or find our work helpful, please consider citing our paper:

@article{xin2025med3dvlm,
  title={Med3DVLM: An Efficient Vision-Language Model for 3D Medical Image Analysis},
  author={Xin, Yu and Ates, Gorkem Can and Gong, Kuang and Shao, Wei},
  journal={IEEE Journal of Biomedical and Health Informatics},
  year={2025}
}