---
license: apache-2.0
tags:
- medical-imaging
- vision-language-model
- vlm
- lora
- graph-neural-networks
- zero-shot
metrics:
- accuracy
---

# ACE-LoRA: Graph-Attentive Context Enhancement for Medical VLMs

<div align="center">
  <a href="https://arxiv.org/pdf/2603.17079">
    <img src="https://img.shields.io/badge/arXiv-2603.17079-b31b1b.svg" alt="arXiv">
  </a>
</div>

**ACE-LoRA** is a parameter-efficient adaptation framework designed for generalist medical Vision-Language Models (VLMs). It addresses the specialization–generalization trade-off by integrating Low-Rank Adaptation (LoRA) with a novel **Attention-based Context Enhancement Hypergraph Neural Network (ACE-HGNN)**.

## Model Description

Existing medical VLMs often struggle to balance broad semantic understanding with fine-grained diagnostic cues. ACE-LoRA bridges this gap by adding only **0.95M** trainable parameters to frozen image-text encoders.
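
For intuition on why the trainable footprint stays small: LoRA keeps a weight matrix of shape `d_out × d_in` frozen and learns two low-rank factors of rank `r`, which adds `r · (d_in + d_out)` trainable parameters per adapted matrix. The sketch below is a back-of-the-envelope count only. The layer width and rank are illustrative assumptions, not the ACE-LoRA configuration, and the helper names are hypothetical.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one frozen d_out x d_in weight matrix."""
    # A has shape (rank, d_in) and B has shape (d_out, rank); W itself stays frozen.
    return rank * d_in + d_out * rank


def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters in the dense weight matrix itself (bias ignored)."""
    return d_in * d_out


# Hypothetical numbers for illustration only (not taken from the paper):
# a 768-wide projection adapted at rank 4.
added = lora_param_count(768, 768, 4)
dense = full_param_count(768, 768)
print(f"LoRA adds {added} parameters next to {dense} frozen ones")
```

Because the count grows linearly in `r` rather than with `d_in · d_out`, many layers can be adapted while the total stays far below the size of the frozen encoders.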

### Key Features:
* **ACE-HGNN Module:** Captures higher-order contextual interactions beyond pairwise similarity, enriching global representations with localized diagnostic details.
* **Label-Guided InfoNCE Loss:** A specialized loss formulation designed to suppress false negatives between semantically related image-text pairs, improving cross-modal alignment.
* **Efficiency:** Achieves state-of-the-art performance across multiple domains while keeping the backbone frozen.
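
One common way to suppress false negatives in a contrastive loss is to drop, from each anchor's denominator, the off-diagonal pairs that share the anchor's label. The plain-Python sketch below illustrates that general idea for the image-to-text direction. It is an assumption about the mechanism, not the loss as defined in the paper, and the function name, temperature value, and label encoding are hypothetical.

```python
import math


def label_masked_info_nce(sim, labels, temperature=0.07):
    """Image-to-text InfoNCE where same-label off-diagonal pairs are ignored.

    sim[i][j] is the cosine similarity between image i and text j;
    labels[i] is the class label shared by the i-th image-text pair.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        # Keep the true pair (j == i) and every pair with a different label.
        kept = [j for j in range(n) if j == i or labels[j] != labels[i]]
        logits = [sim[i][j] / temperature for j in kept]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - sim[i][i] / temperature
    return total / n


# Toy similarities: pairs 0 and 1 describe the same finding (label 0).
sim = [[0.9, 0.8, 0.1],
       [0.8, 0.9, 0.2],
       [0.1, 0.2, 0.9]]
masked = label_masked_info_nce(sim, labels=[0, 0, 1])
plain = label_masked_info_nce(sim, labels=[0, 1, 2])  # all labels distinct: standard InfoNCE
print(masked < plain)
```

With all labels distinct the function reduces to standard InfoNCE. With shared labels, the highly similar same-class pair no longer counts as a negative, so it is not pushed apart.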


### Environment Setup
The framework was developed using `Python 3.10.18` and `PyTorch 2.1.0` with `CUDA 11.8`.

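```bash
# Create and activate a dedicated environment
conda create -n ace_lora python=3.10.18
conda activate ace_lora

# Install PyTorch 2.1.0 built against CUDA 11.8, then the remaining dependencies
conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=11.8 -c pytorch -c nvidia
pip install -r requirements.txt
```
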
### Inference
We provide an inference code sample (`hf_model_inference.py`) for the RSNA dataset.
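
For readers unfamiliar with CLIP-style zero-shot classification, the general recipe is to embed one text prompt per class, embed the image, and pick the class whose prompt is most similar to the image. The sketch below shows only that scoring step, on toy vectors in plain Python. It is not the contents of `hf_model_inference.py`. The class names, embedding values, temperature, and the `zero_shot_predict` helper are hypothetical stand-ins for the outputs of the image and text encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def zero_shot_predict(image_emb, prompt_embs, temperature=0.01):
    """Return (best_class, probabilities) from image/prompt similarities."""
    names = list(prompt_embs)
    logits = [cosine(image_emb, prompt_embs[n]) / temperature for n in names]
    m = max(logits)  # subtract the max before exponentiating for stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = {n: e / z for n, e in zip(names, exps)}
    return max(probs, key=probs.get), probs


# Toy embeddings standing in for encoder outputs (hypothetical values).
prompts = {
    "pneumonia": [0.9, 0.1, 0.2],
    "no pneumonia": [0.1, 0.9, 0.3],
}
image = [0.8, 0.2, 0.1]
label, probs = zero_shot_predict(image, prompts)
print(label)
```

In a real run the toy vectors would be replaced by embeddings from the adapted image and text encoders. The scoring logic stays the same.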

## Datasets

**MIMIC-CXR:** For pretraining, we use the MIMIC-CXR dataset and exclude lateral images. Access to the dataset is available at the following link (note that you must satisfy the dataset provider’s requirements to download the data): [[`link`](https://physionet.org/content/mimic-cxr-jpg/2.1.0/)] 
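
As a rough illustration of excluding lateral images, the sketch below filters metadata rows by view position. It assumes the MIMIC-CXR-JPG metadata CSV exposes a `ViewPosition` column and that `AP` and `PA` are the views to keep. The exact filtering rule used for pretraining is not stated here, and the in-memory sample rows and `keep_frontal` helper are made up for the example.

```python
import csv
import io

# View positions treated as frontal in this sketch (an assumption, not the authors' rule).
FRONTAL_VIEWS = {"AP", "PA"}


def keep_frontal(rows):
    """Yield only metadata rows whose ViewPosition is a frontal view."""
    for row in rows:
        view = (row.get("ViewPosition") or "").strip().upper()
        if view in FRONTAL_VIEWS:
            yield row


# Tiny in-memory stand-in for the metadata file (made-up IDs).
sample = io.StringIO(
    "dicom_id,ViewPosition\n"
    "img_a,PA\n"
    "img_b,LATERAL\n"
    "img_c,AP\n"
    "img_d,LL\n"
)
frontal_ids = [r["dicom_id"] for r in keep_frontal(csv.DictReader(sample))]
print(frontal_ids)
```

To apply the same filter to the real file, replace the `io.StringIO` sample with an opened handle on the downloaded metadata CSV.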

**NIH Chest X-ray:** For validation, we use the NIH Chest X-ray dataset. The dataset can be accessed at the following link: [[`link`](https://nihcc.app.box.com/v/ChestXray-NIHCC)]. After downloading, run `dataset_prep/chestx-ray_14_prep.py` from our GitHub repo to split the data and prepare it in the required format.

**CheXpert 5×200:** For zero-shot classification, we use the CheXpert 5×200 dataset. The dataset can be accessed at the following link: [[`link`](https://stanfordmedicine.app.box.com/s/j5h7q99f3pfi7enc0dom73m4nsm6yzvh)].

**RSNA:** We use the RSNA dataset for both zero-shot classification and object detection. The dataset can be accessed at the following link: [[`link`](https://www.kaggle.com/competitions/rsna-pneumonia-detection-challenge/data)]. After downloading, run `dataset_prep/rsna_dataset_create.py` from our GitHub repo to split the data and prepare it in the required format for both tasks.

**SIIM:** We use the SIIM dataset for both zero-shot classification and semantic segmentation. The dataset can be accessed at the following link: [[`link`](https://www.kaggle.com/competitions/siim-acr-pneumothorax-segmentation/data)]. After downloading, run `dataset_prep/SIIM_generate_class_labels.py` from our GitHub repo to prepare the data for zero-shot classification, and `dataset_prep/SIIM_generate_mask.py` to prepare it for semantic segmentation.

- Code: https://github.com/icon-lab/ACE-LoRA
- Paper: https://arxiv.org/pdf/2603.17079

## 🤝 Acknowledgments
This implementation builds upon [CLIP-LoRA](https://github.com/MaxZanella/CLIP-LoRA) and [LoRA](https://github.com/microsoft/LoRA). We gratefully acknowledge their valuable contributions.