Add model card for CoGaze
#1
by nielsr HF Staff - opened
README.md
ADDED
---
pipeline_tag: image-text-to-text
---

# CoGaze: Seeing Like Radiologists: Context- and Gaze-Guided Vision-Language Pretraining for Chest X-rays

CoGaze is a vision-language pretraining framework for **chest X-ray understanding**, inspired by the diagnostic workflow of professional radiologists. It integrates clinical context (patient history, symptoms) and probabilistic gaze priors to improve cross-modal alignment and diagnostic reasoning.

[**Paper**](https://arxiv.org/abs/2603.26049) | [**GitHub**](https://github.com/mk-runner/CoGaze)

## ✨ Overview
- **Context-aware reasoning:** A context-infused vision encoder models how radiologists integrate clinical context (including patient history and diagnostic intent) to guide reasoning.
- **Gaze-guided attention:** Radiologists' gaze data serves as a probabilistic prior during pretraining, steering the model's attention toward diagnostically salient regions.
- **Versatile tasks:** Supports free-text and structured report generation, supervised and zero-shot classification, segmentation, and image-text retrieval.

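The exact pretraining objective is defined in the paper; as a rough illustration of how gaze data can act as a probabilistic prior over image regions, the sketch below (an assumption for illustration, not CoGaze's actual implementation — all function names are hypothetical) normalizes a gaze heatmap into a distribution over patches and penalizes model attention that drifts away from it with a KL term:

```python
import numpy as np

def gaze_to_prior(heatmap, temperature=1.0):
    """Normalize a gaze heatmap into a probability distribution over patches."""
    flat = heatmap.flatten() / temperature
    flat = flat - flat.max()          # numerical stability before exponentiation
    p = np.exp(flat)
    return p / p.sum()

def gaze_kl_loss(attention, gaze_prior, eps=1e-8):
    """KL(gaze_prior || attention): large when attention ignores gazed regions."""
    return float(np.sum(gaze_prior * (np.log(gaze_prior + eps) - np.log(attention + eps))))

# Toy example: a 4x4 patch grid with a random "gaze heatmap".
rng = np.random.default_rng(0)
heatmap = rng.random((4, 4))
prior = gaze_to_prior(heatmap)
uniform_attn = np.full(16, 1 / 16)   # attention that ignores the gaze prior
loss = gaze_kl_loss(uniform_attn, prior)
```

Attention that matches the gaze prior drives this regularizer toward zero, while uniform (gaze-agnostic) attention incurs a positive penalty.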
## 🧩 Model Zoo
The official repository includes the following artifacts:
- **CoGaze Pretrained Checkpoint:** Pretrained on MIMIC-CXR.
- **Report Generation Model:** Fine-tuned from DistilGPT2.
- **Annotations:** Gaze heatmaps, image-text pairs, and SRRG annotations.

## ⚙️ Installation
To use the official implementation, set up the environment as follows:
```bash
conda create -n cogaze python=3.10.16
conda activate cogaze
pip install transformers==4.43.3 radgraph==0.09 pytorch-lightning==2.5.1.post0 torch==2.4.1 torchvision==0.19.1
```

## 🧠 Training & Inference
For detailed instructions on pretraining, report generation, and evaluation, please refer to the [official GitHub repository](https://github.com/mk-runner/CoGaze).
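As a concrete illustration of the zero-shot classification setting listed above: once images and label prompts are embedded into a shared space by the pretrained encoders, classification reduces to picking the label text most similar to the image by cosine similarity. The toy sketch below uses random vectors in place of CoGaze's actual encoders, so only the decision rule is real; everything else is a stand-in:

```python
import numpy as np

def zero_shot_classify(image_emb, label_embs, labels):
    """Return the label whose text embedding is closest (cosine) to the image embedding."""
    def normalize(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)
    sims = normalize(label_embs) @ normalize(image_emb)   # cosine similarities
    return labels[int(np.argmax(sims))]

# Toy embeddings standing in for encoder outputs.
labels = ["pneumonia", "cardiomegaly", "no finding"]
rng = np.random.default_rng(1)
label_embs = rng.normal(size=(3, 8))
image_emb = label_embs[1] + 0.05 * rng.normal(size=8)    # near "cardiomegaly"
pred = zero_shot_classify(image_emb, label_embs, labels)
```

Because the image embedding was constructed close to the second label embedding, the nearest-label rule recovers "cardiomegaly" without any task-specific training, which is the essence of zero-shot classification via cross-modal alignment.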

## 📖 Citation
```bibtex
@misc{2026-cogaze,
      title={Seeing Like Radiologists: Context- and Gaze-Guided Vision-Language Pretraining for Chest X-rays},
      author={Kang Liu and Zhuoqi Ma and Siyu Liang and Yunan Li and Xiyue Gao and Chao Liang and Kun Xie and Qiguang Miao},
      year={2026},
      eprint={2603.26049},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.26049},
}
```