---
license: apache-2.0
tags:
- medical-imaging
- vision-language-model
- vlm
- lora
- graph-neural-networks
- zero-shot
metrics:
- accuracy
---
# ACE-LoRA: Graph-Attentive Context Enhancement for Medical VLMs
<div align="center">
<a href="https://arxiv.org/pdf/2603.17079">
<img src="https://img.shields.io/badge/arXiv-2603.17079-b31b1b.svg" alt="arXiv">
</a>
</div>
**ACE-LoRA** is a parameter-efficient adaptation framework designed for generalist medical Vision-Language Models (VLMs). It addresses the specialization–generalization trade-off by integrating Low-Rank Adaptation (LoRA) with a novel **Attention-based Context Enhancement Hypergraph Neural Network (ACE-HGNN)**.
## Model Description
Existing medical VLMs often struggle to balance broad semantic understanding with fine-grained diagnostic cues. ACE-LoRA bridges this gap by adding only **0.95M** trainable parameters to frozen image-text encoders.
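
To give a sense of why low-rank adapters stay small, the sketch below counts the trainable parameters that LoRA adds to a single linear layer. The layer width (768) and rank (4) are illustrative assumptions, not the configuration used by ACE-LoRA, and the count does not include the ACE-HGNN module.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added by one LoRA adapter.

    LoRA freezes the weight W (d_out x d_in) and learns an update B @ A,
    where A has shape (rank x d_in) and B has shape (d_out x rank).
    """
    return rank * d_in + d_out * rank


# Hypothetical sizes for illustration only (not taken from the paper).
d_model, rank = 768, 4
frozen = d_model * d_model
added = lora_param_count(d_model, d_model, rank)
print(f"frozen weights: {frozen:,}  LoRA weights: {added:,}")
```

The adapter grows linearly with the layer width, while the frozen weight grows quadratically, which is why the total trainable budget can stay under a million parameters.
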
### Key Features:
* **ACE-HGNN Module:** Captures higher-order contextual interactions beyond pairwise similarity, enriching global representations with localized diagnostic details.
* **Label-Guided InfoNCE Loss:** A specialized loss formulation designed to suppress false negatives between semantically related image-text pairs, improving cross-modal alignment.
* **Efficiency:** Keeps the backbone frozen during adaptation; the paper reports state-of-the-art performance across multiple domains.
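
The idea behind a label-guided contrastive loss can be sketched as follows. This is one plausible reading, assuming that off-diagonal pairs sharing a class label are simply dropped from the InfoNCE denominator; the exact formulation in the paper may differ. The function name, the temperature value, and the toy inputs are all hypothetical, and plain Python is used for readability where a training loop would use PyTorch tensors.

```python
import math


def label_guided_info_nce(sim, labels, temperature=0.07):
    """Image-to-text InfoNCE that ignores same-label negatives (a sketch).

    sim[i][j] is the cosine similarity between image i and text j.
    labels[i] is the class label of the i-th image-text pair.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        logits = [s / temperature for s in sim[i]]
        # Keep the true pair (j == i) and pairs with a different label.
        # Same-label off-diagonal pairs are treated as likely false negatives.
        kept = [j for j in range(n) if j == i or labels[j] != labels[i]]
        m = max(logits[j] for j in kept)  # subtract the max for stability
        log_denom = m + math.log(sum(math.exp(logits[j] - m) for j in kept))
        total += log_denom - logits[i]
    return total / n


# Toy similarities: pairs 0 and 1 share a label and look alike.
sim = [[0.9, 0.8, 0.1], [0.8, 0.9, 0.1], [0.1, 0.2, 0.9]]
masked = label_guided_info_nce(sim, labels=[0, 0, 1])
plain = label_guided_info_nce(sim, labels=[0, 1, 2])  # all labels distinct
print(f"label-guided: {masked:.4f}  plain InfoNCE: {plain:.4f}")
```

With all labels distinct the function reduces to standard image-to-text InfoNCE; a symmetric version would average this with the text-to-image direction.
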
### Environment Setup
The framework was developed using `Python 3.10.18` and `PyTorch 2.1.0` with `CUDA 11.8`.
```
conda create -n ace_lora python=3.10.18
conda activate ace_lora
conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
```
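
After installation, a quick check like the following can confirm that the interpreter and packages match the versions listed above. It is a convenience sketch and not part of the repository; the helper name `installed_version` is hypothetical.

```python
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("python", sys.version.split()[0])
for name in ("torch", "torchvision", "torchaudio"):
    print(name, installed_version(name) or "not installed")
```
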
### Inference
We provide an inference code sample (`hf_model_inference.py`) for the RSNA dataset.
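
For orientation, zero-shot classification with a CLIP-style model generally works by comparing an image embedding against the embeddings of one text prompt per class. The sketch below illustrates only that comparison step with toy vectors; it does not reproduce `hf_model_inference.py`, and the prompts, embedding values, and function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_predict(image_emb, text_embs):
    """Return the prompt closest to the image embedding, plus all scores."""
    scores = {prompt: cosine(image_emb, emb) for prompt, emb in text_embs.items()}
    return max(scores, key=scores.get), scores


# Toy 3-d embeddings standing in for encoder outputs (hypothetical values).
text_embs = {
    "chest X-ray with pneumonia": [0.9, 0.1, 0.2],
    "normal chest X-ray": [0.1, 0.9, 0.3],
}
image_emb = [0.8, 0.2, 0.1]
prediction, scores = zero_shot_predict(image_emb, text_embs)
print(prediction)
```

In practice the embeddings would come from the adapted image and text encoders, and the class prompts would follow whatever templates the inference script defines.
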
## Datasets
**MIMIC-CXR:** For pretraining, we use the MIMIC-CXR dataset and exclude lateral images. The dataset is available from [PhysioNet](https://physionet.org/content/mimic-cxr-jpg/2.1.0/); note that you must satisfy the dataset provider’s requirements before downloading.
**NIH Chest X-ray:** For validation, we use the NIH Chest X-ray dataset, available at [this link](https://nihcc.app.box.com/v/ChestXray-NIHCC). After downloading, run `dataset_prep/chestx-ray_14_prep.py` from our GitHub repo to split the data and prepare it in the required format.
**CheXpert 5x200:** For zero-shot classification, we use the CheXpert 5x200 dataset, available at [this link](https://stanfordmedicine.app.box.com/s/j5h7q99f3pfi7enc0dom73m4nsm6yzvh).
**RSNA:** We use the RSNA dataset for both zero-shot classification and object detection. It is available at [this link](https://www.kaggle.com/competitions/rsna-pneumonia-detection-challenge/data). After downloading, run `dataset_prep/rsna_dataset_create.py` from our GitHub repo to split the data and prepare it in the required format for both tasks.
**SIIM:** We use the SIIM dataset for both zero-shot classification and semantic segmentation. It is available at [this link](https://www.kaggle.com/competitions/siim-acr-pneumothorax-segmentation/data). After downloading, run `dataset_prep/SIIM_generate_class_labels.py` from our GitHub repo to prepare the data for zero-shot classification, and `dataset_prep/SIIM_generate_mask.py` to prepare it for semantic segmentation.
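
The MIMIC-CXR entry above mentions excluding lateral images. A minimal sketch of such a filter is shown below, assuming the metadata is a CSV with a `ViewPosition` column and that only `PA` and `AP` count as frontal views; both the column name and the set of frontal views are assumptions to verify against the downloaded metadata file, and the sample rows are made up.

```python
import csv
import io

FRONTAL_VIEWS = {"PA", "AP"}  # assumption: only these count as frontal


def keep_frontal(rows, view_column="ViewPosition"):
    """Keep rows whose view position is one of the frontal views."""
    return [
        row for row in rows
        if (row.get(view_column) or "").strip().upper() in FRONTAL_VIEWS
    ]


# Made-up rows standing in for the real metadata file.
sample_csv = """dicom_id,ViewPosition
a1,PA
a2,LATERAL
a3,AP
a4,LL
"""
rows = list(csv.DictReader(io.StringIO(sample_csv)))
frontal = keep_frontal(rows)
print([row["dicom_id"] for row in frontal])
```
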
- Code: https://github.com/icon-lab/ACE-LoRA
- Paper: https://arxiv.org/pdf/2603.17079
## 🤝 Acknowledgments
This implementation builds upon [CLIP-LoRA](https://github.com/MaxZanella/CLIP-LoRA) and [LoRA](https://github.com/microsoft/LoRA). We gratefully acknowledge their valuable contributions.