aydnarda committed on
Commit
42e6a29
·
1 Parent(s): a56c7bf

Update README.md

Files changed (1)
  1. README.md +52 -4
README.md CHANGED
@@ -1,10 +1,58 @@
  ---
- license: mit
- pipeline_tag: zero-shot-classification
  tags:
- - model_hub_mixin
- - pytorch_model_hub_mixin
  ---

  - Code: https://github.com/icon-lab/ACE-LoRA
  - Paper: https://arxiv.org/pdf/2603.17079
 
  ---
+ ---
+ license: apache-2.0
  tags:
+ - medical-imaging
+ - vision-language-model
+ - vlm
+ - lora
+ - graph-neural-networks
+ - zero-shot
+ datasets:
+ - physionet/mimic-cxr-jpg
+ metrics:
+ - accuracy
  ---

+ # ACE-LoRA: Graph-Attentive Context Enhancement for Medical VLMs
+
+ <div align="center">
+ <a href="https://arxiv.org/pdf/2603.17079">
+ <img src="https://img.shields.io/badge/arXiv-2603.17079-b31b1b.svg" alt="arXiv">
+ </a>
+ </div>
+
+ **ACE-LoRA** is a parameter-efficient adaptation framework designed for generalist medical Vision-Language Models (VLMs). It addresses the specialization–generalization trade-off by integrating Low-Rank Adaptation (LoRA) with a novel **Attention-based Context Enhancement Hypergraph Neural Network (ACE-HGNN)**.
+
+ ## Model Description
+
+ Existing medical VLMs often struggle to balance broad semantic understanding with fine-grained diagnostic cues. ACE-LoRA bridges this gap by adding only **$0.95M$** trainable parameters to frozen image-text encoders.
+
+ ### Key Features:
+ * **ACE-HGNN Module:** Captures higher-order contextual interactions beyond pairwise similarity, enriching global representations with localized diagnostic details.
+ * **Label-Guided InfoNCE Loss:** A specialized loss formulation designed to suppress false negatives between semantically related image-text pairs, improving cross-modal alignment.
+ * **Efficiency:** Achieves state-of-the-art (SOTA) performance across multiple domains while keeping the backbone frozen.
+
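The label-guided idea in the feature list can be illustrated with a small, framework-free sketch. It assumes (the README does not confirm this) that image-text pairs sharing a class label are simply dropped from the InfoNCE denominator, so they no longer act as negatives. The function name `label_guided_info_nce`, the `temperature` default, and the toy similarity matrix are hypothetical; the paper's actual formulation may differ.

```python
import math


def label_guided_info_nce(sim, labels, temperature=0.07):
    """Image-to-text InfoNCE that ignores same-label negatives.

    sim[i][j] is the similarity between image i and text j; the matched
    pair sits on the diagonal. labels[i] is the class label of pair i.
    ASSUMPTION: same-label off-diagonal pairs are masked out entirely.
    """
    n = len(sim)
    total = 0.0
    for i in range(n):
        # Keep the positive (j == i) and negatives whose label differs.
        kept = [j for j in range(n) if j == i or labels[j] != labels[i]]
        logits = [sim[i][j] / temperature for j in kept]
        # Numerically stable log-sum-exp over the kept logits.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denom - sim[i][i] / temperature
    return total / n


# Toy batch: pairs 0 and 1 share a label and look alike.
toy_sim = [
    [0.9, 0.8, 0.1],
    [0.8, 0.9, 0.2],
    [0.1, 0.2, 0.9],
]
plain_loss = label_guided_info_nce(toy_sim, labels=[0, 1, 2])   # no masking
guided_loss = label_guided_info_nce(toy_sim, labels=[0, 0, 1])  # 0 and 1 masked
print(f"plain={plain_loss:.4f} guided={guided_loss:.4f}")
```

With every label distinct the function reduces to ordinary InfoNCE, so the gap between the two printed values shows how much of the penalty came from treating a semantically related pair as a negative.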
+ > [!NOTE]
+ > **Abstract:** ACE-LoRA integrates LoRA modules into frozen image-text encoders and introduces a Hypergraph Neural Network to capture contextual interactions. Despite its minimal parameter footprint, it consistently outperforms SOTA medical VLMs and PEFT baselines in zero-shot classification, segmentation, and detection tasks.
+
+ ---
+
+ ## Technical Specifications
+
+ ### Architecture Overview
+ The model uses a frozen CLIP-like backbone enhanced with:
+ 1. **LoRA Adapters:** Inserted into the transformer layers of the vision and text encoders.
+ 2. **ACE-HGNN:** A hypergraph-based module that processes localized features to capture complex diagnostic patterns.
+
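To make the LoRA adapter idea concrete, here is a minimal pure-Python sketch of the standard LoRA formulation, `y = W x + (alpha / r) * B (A x)`, where `W` stays frozen and only the low-rank factors `A` and `B` are trained. The class name `LoRALinear`, its arguments, and the toy weights are hypothetical and chosen for illustration; this is not the repository's implementation, and it does not model ACE-HGNN.

```python
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen linear map plus a trainable low-rank update (illustrative)."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen backbone weight, shape d_out x d_in
        # A gets a small random init, B starts at zero, so the adapted
        # layer initially reproduces the frozen layer exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def num_trainable(self):
        # Only A (rank x d_in) and B (d_out x rank) would receive gradients.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


layer = LoRALinear([[1, 2], [3, 4], [5, 6]], rank=1)
print(layer.forward([1, 1]), layer.num_trainable())
```

The trainable count grows as `rank * (d_in + d_out)` rather than `d_in * d_out`, which is why adapters of this kind can stay small relative to the frozen encoders.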
+
+ ### Environment Setup
+ The framework was developed using `Python 3.10.18` and `PyTorch 2.1.0` with `CUDA 11.8`.
+
+ ```bash
+ conda create -n ace_lora python=3.10.18 && conda activate ace_lora
+ conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=11.8 -c pytorch -c nvidia
+ pip install -r requirements.txt
+ ```
  - Code: https://github.com/icon-lab/ACE-LoRA
  - Paper: https://arxiv.org/pdf/2603.17079