Add model card for CoGaze
#1
by nielsr HF Staff - opened
README.md
ADDED
---
pipeline_tag: image-text-to-text
---

# CoGaze: Seeing Like Radiologists: Context- and Gaze-Guided Vision-Language Pretraining for Chest X-rays

CoGaze is a vision-language pretraining framework for **chest X-ray understanding**, inspired by the diagnostic workflow of professional radiologists. It integrates clinical context (patient history, symptoms) and probabilistic gaze priors to improve cross-modal alignment and diagnostic reasoning.

[**Paper**](https://arxiv.org/abs/2603.26049) | [**GitHub**](https://github.com/mk-runner/CoGaze)

## ✨ Overview
- **Context-aware reasoning:** A context-infused vision encoder models how radiologists integrate clinical context (including patient history and diagnostic intent) to guide reasoning.
- **Gaze-guided attention:** Radiologists' gaze data serves as a probabilistic prior during pretraining, steering the model's attention toward diagnostically salient regions.
- **Versatile tasks:** Supports free-text and structured report generation, supervised and zero-shot classification, segmentation, and image-text retrieval.

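The exact pretraining objective is defined in the paper; as a rough illustration of how gaze data can act as a probabilistic prior over image regions, the sketch below (an assumption for illustration, not CoGaze's actual implementation — all function names are hypothetical) normalizes a gaze heatmap into a distribution over patches and penalizes model attention that drifts away from it with a KL term:

```python
import numpy as np

def gaze_to_prior(heatmap, temperature=1.0):
    """Normalize a gaze heatmap into a probability distribution over patches."""
    flat = heatmap.flatten() / temperature
    flat = flat - flat.max()          # numerical stability before exponentiation
    p = np.exp(flat)
    return p / p.sum()

def gaze_kl_loss(attention, gaze_prior, eps=1e-8):
    """KL(gaze_prior || attention): large when attention ignores gazed regions."""
    return float(np.sum(gaze_prior * (np.log(gaze_prior + eps) - np.log(attention + eps))))

# Toy example: a 4x4 patch grid with a random "gaze heatmap".
rng = np.random.default_rng(0)
heatmap = rng.random((4, 4))
prior = gaze_to_prior(heatmap)
uniform_attn = np.full(16, 1 / 16)   # attention that ignores the gaze prior
loss = gaze_kl_loss(uniform_attn, prior)
```

Attention that matches the gaze prior drives this regularizer toward zero, while uniform (gaze-agnostic) attention incurs a positive penalty.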
## 🧩 Model Zoo
The official repository includes the following artifacts:
- **CoGaze Pretrained Checkpoint:** Pretrained on MIMIC-CXR.
- **Report Generation Model:** Fine-tuned from DistilGPT2.
- **Annotations:** Gaze heatmaps, image-text pairs, and SRRG annotations.

## ⚙️ Installation
To use the official implementation, set up the environment as follows:
```bash
conda create -n cogaze python=3.10.16
conda activate cogaze
pip install transformers==4.43.3 radgraph==0.09 pytorch-lightning==2.5.1.post0 torch==2.4.1 torchvision==0.19.1
```

## 🧠 Training & Inference
For detailed instructions on pretraining, report generation, and evaluation, please refer to the [official GitHub repository](https://github.com/mk-runner/CoGaze).
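As a concrete illustration of the zero-shot classification setting listed above: once images and label prompts are embedded into a shared space by the pretrained encoders, classification reduces to picking the label text most similar to the image by cosine similarity. The toy sketch below uses random vectors in place of CoGaze's actual encoders, so only the decision rule is real; everything else is a stand-in:

```python
import numpy as np

def zero_shot_classify(image_emb, label_embs, labels):
    """Return the label whose text embedding is closest (cosine) to the image embedding."""
    def normalize(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)
    sims = normalize(label_embs) @ normalize(image_emb)   # cosine similarities
    return labels[int(np.argmax(sims))]

# Toy embeddings standing in for encoder outputs.
labels = ["pneumonia", "cardiomegaly", "no finding"]
rng = np.random.default_rng(1)
label_embs = rng.normal(size=(3, 8))
image_emb = label_embs[1] + 0.05 * rng.normal(size=8)    # near "cardiomegaly"
pred = zero_shot_classify(image_emb, label_embs, labels)
```

Because the image embedding was constructed close to the second label embedding, the nearest-label rule recovers "cardiomegaly" without any task-specific training, which is the essence of zero-shot classification via cross-modal alignment.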

## 📖 Citation
```bibtex
@misc{2026-cogaze,
      title={Seeing Like Radiologists: Context- and Gaze-Guided Vision-Language Pretraining for Chest X-rays},
      author={Kang Liu and Zhuoqi Ma and Siyu Liang and Yunan Li and Xiyue Gao and Chao Liang and Kun Xie and Qiguang Miao},
      year={2026},
      eprint={2603.26049},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.26049},
}
```