Add model card for CoGaze

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +44 -0
README.md ADDED
@@ -0,0 +1,44 @@
---
pipeline_tag: image-text-to-text
---

# CoGaze: Seeing Like Radiologists: Context- and Gaze-Guided Vision-Language Pretraining for Chest X-rays

CoGaze is a vision-language pretraining framework designed for **chest X-ray understanding**, inspired by the diagnostic workflow of professional radiologists. It integrates clinical context (patient history, symptoms) and probabilistic gaze priors to improve cross-modal alignment and diagnostic reasoning.

[**Paper**](https://arxiv.org/abs/2603.26049) | [**GitHub**](https://github.com/mk-runner/CoGaze)

## ✨ Overview
- **Context-aware reasoning:** A context-infused vision encoder models how radiologists integrate clinical context (including patient history and diagnostic intent) to guide reasoning.
- **Gaze-guided attention:** Radiologists' gaze data is used as probabilistic priors during pretraining to guide the model's attention toward diagnostically salient regions.
- **Versatile tasks:** Supports free-text and structured report generation, supervised and zero-shot classification, segmentation, and image-text retrieval.
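
The gaze-guided attention idea above can be sketched with a toy example: a gaze heatmap is normalized into a probability distribution over image patches and used to bias attention toward gaze-salient regions. This is an illustrative sketch only, not the paper's exact formulation; the function name, the additive log-prior bias, and the `alpha` weighting are assumptions for the sake of the example.

```python
import numpy as np

def gaze_prior_attention(attn_logits, gaze_heatmap, alpha=1.0, eps=1e-8):
    """Bias attention logits toward gaze-salient patches.

    attn_logits:  (num_queries, num_patches) raw attention scores
    gaze_heatmap: (num_patches,) nonnegative gaze saliency values
    """
    # Normalize the heatmap into a probabilistic prior over patches.
    prior = gaze_heatmap / (gaze_heatmap.sum() + eps)
    # Add a scaled log-prior to the logits, then softmax.
    logits = attn_logits + alpha * np.log(prior + eps)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=-1, keepdims=True)
```

With uniform logits, the resulting attention weights simply follow the gaze prior; with informative logits, the prior acts as a soft nudge toward regions radiologists actually looked at.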

## 🧩 Model Zoo
The official repository includes the following artifacts:
- **CoGaze Pretrained Checkpoint:** Pretrained on MIMIC-CXR.
- **Report Generation Model:** Fine-tuned from DistilGPT2.
- **Annotations:** Gaze heatmaps, image-text pairs, and SRRG annotations.

## ⚙️ Installation
To use the official implementation, set up the environment as follows:
```bash
conda create -n cogaze python=3.10.16
conda activate cogaze
pip install transformers==4.43.3 radgraph==0.09 pytorch-lightning==2.5.1.post0 torch==2.4.1 torchvision==0.19.1
```

## 🧠 Training & Inference
For detailed instructions on pretraining, report generation, and evaluation, please refer to the [official GitHub repository](https://github.com/mk-runner/CoGaze).

## 📖 Citation
```bibtex
@misc{2026-cogaze,
  title={Seeing Like Radiologists: Context- and Gaze-Guided Vision-Language Pretraining for Chest X-rays},
  author={Kang Liu and Zhuoqi Ma and Siyu Liang and Yunan Li and Xiyue Gao and Chao Liang and Kun Xie and Qiguang Miao},
  year={2026},
  eprint={2603.26049},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2603.26049},
}
```