Enhance model card with metadata and usage example

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +54 -3
README.md CHANGED
@@ -1,3 +1,54 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ pipeline_tag: image-text-to-text
+ library_name: transformers
+ ---
+
+ This repository contains the official implementation of the paper: [Med3DVLM: An Efficient Vision-Language Model for 3D Medical Image Analysis](https://huggingface.co/papers/2503.20047).
+
+ Med3DVLM is a 3D vision-language model (VLM) designed to address the challenges of 3D medical image analysis through efficient encoding, improved image-text alignment with a pairwise sigmoid loss, and a dual-stream MLP-Mixer projector for richer multi-modal representations. The paper reports superior performance across multiple benchmarks, including image-text retrieval, report generation, and open- and closed-ended visual question answering.
+
+ Code: https://github.com/mirthAI/Med3DVLM
+
+ ![Med3DVLM Architecture](https://github.com/mirthAI/Med3DVLM/raw/main/docs/pipeline.png)
+
+ ## Installation
+ First, clone the repository to your local machine:
+ ```bash
+ git clone https://github.com/mirthAI/Med3DVLM.git
+ cd Med3DVLM
+ ```
+ To install the required packages, you can use the following command:
+ ```bash
+ conda env create -n Med3DVLM -f env.yaml
+ conda activate Med3DVLM
+ ```
+ or
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ Add the project root to the `PYTHONPATH` environment variable by running the following command from the repository root:
+
+ ```bash
+ export PYTHONPATH=$(pwd):$PYTHONPATH
+ ```
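
To confirm that the export took effect, a small check along these lines may help. It is only a sketch and not part of the Med3DVLM repository: the helper name `repo_root_on_pythonpath` is hypothetical, and it assumes you run it from the repository root in the same shell session where `PYTHONPATH` was exported.

```python
import os
from pathlib import Path


def repo_root_on_pythonpath(repo_root, pythonpath=None):
    """Return True if repo_root is one of the entries in PYTHONPATH.

    Hypothetical helper for illustration; not part of the Med3DVLM code base.
    """
    if pythonpath is None:
        pythonpath = os.environ.get("PYTHONPATH", "")
    target = Path(repo_root).resolve()
    # PYTHONPATH entries are separated by os.pathsep (":" on Linux/macOS).
    entries = [p for p in pythonpath.split(os.pathsep) if p]
    return any(Path(p).resolve() == target for p in entries)


if __name__ == "__main__":
    # Assumes the current working directory is the repository root.
    ok = repo_root_on_pythonpath(Path.cwd())
    print("PYTHONPATH includes the current directory:", ok)
```

If the check prints `False`, the export was most likely run in a different shell session or from a different directory.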
+
+ ## Sample Usage
+
+ To run a demo in the terminal, use the following command (replace `path_to_model` and `path_to_image` with your actual paths):
+
+ ```bash
+ python src/demo/demo.py --model_name_or_path path_to_model --image_path path_to_image --question "Describe the findings of the medical image you see."
+ ```
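
When the demo is launched from another Python script, the same invocation can be assembled as an argument list, which avoids shell-quoting problems with the question text. The sketch below is illustrative only: `build_demo_command` is a hypothetical helper, the only flags it uses are the three shown in the command above, and the script path is left as a placeholder to be replaced with the demo script path from that command.

```python
import shlex
import sys


def build_demo_command(script, model_path, image_path, question):
    """Assemble the demo invocation as an argument list.

    Hypothetical helper for illustration; it only reuses the three flags
    that appear in the README command.
    """
    return [
        sys.executable, str(script),
        "--model_name_or_path", str(model_path),
        "--image_path", str(image_path),
        "--question", question,
    ]


if __name__ == "__main__":
    cmd = build_demo_command(
        "path_to_demo_script",  # placeholder: the demo script path from the README command
        "path_to_model",        # placeholder, as in the README
        "path_to_image",        # placeholder, as in the README
        "Describe the findings of the medical image you see.",
    )
    # Print a copy-pasteable form of the command.
    # To execute it instead, pass the list to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run` rather than a single string keeps the question intact even when it contains quotes or other shell metacharacters.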
+
+ ## Citation
+ If you use our code or find our work helpful, please consider citing our paper:
+ ```bibtex
+ @article{xin2025med3dvlm,
+ title={Med3DVLM: An Efficient Vision-Language Model for 3D Medical Image Analysis},
+ author={Xin, Yu and Ates, Gorkem Can and Gong, Kuang and Shao, Wei},
+ journal={IEEE Journal of Biomedical and Health Informatics},
+ year={2025}
+ }
+ ```