# 🩺 CoGaze: Seeing Like Radiologists: Context- and Gaze-Guided Vision-Language Pretraining for Chest X-rays
## ✨ Overview
**CoGaze** is a vision-language pretraining framework designed for **chest X-ray understanding**, inspired by how radiologists interpret medical images.
It integrates:
- 👁️ Gaze guidance during pretraining only; downstream tasks (report generation, classification, and segmentation) do not require gaze data
- 🧠 Context-aware reasoning
- 📝 Free-text and structured report generation, supervised and zero-shot classification, segmentation, and image-text retrieval
---
## 📰 News
- **[2026-03-28]** 🚀 Official code and pretrained models are released on [Hugging Face](https://huggingface.co/MK-runner/CoGaze)
- **GitHub:** [https://github.com/mk-runner/CoGaze](https://github.com/mk-runner/CoGaze)
---
## ⚙️ Installation
```bash
# Create conda environment
conda create -n cogaze python=3.10.16
conda activate cogaze
```
### 📦 Core Dependencies
```txt
transformers==4.43.3
radgraph==0.09
pytorch-lightning==2.5.1.post0
torch==2.4.1
torchvision==0.19.1
```
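
After installing, it can help to confirm that the environment matches the pins. The following is a minimal sketch, not part of the CoGaze codebase: the helper name `check_pins` is hypothetical, and the `PINNED` dictionary copies only a subset of the versions listed above. Local build suffixes such as `+cu121` are ignored when comparing.

```python
from importlib.metadata import PackageNotFoundError, version

# Subset of the pins from the dependency list above (extend as needed).
PINNED = {
    "transformers": "4.43.3",
    "torch": "2.4.1",
    "torchvision": "0.19.1",
}


def check_pins(pins):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            # Drop a local version suffix such as "+cu121" before comparing.
            installed = version(name).split("+")[0]
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {installed}")
```

An empty result means every listed package is installed at the pinned version.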
---
## 🧩 Model Zoo
| Dataset | Pretrained Model | Report Generation Model | Outputs |
| ------------- | ------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------ |
| **MIMIC-CXR** | [CoGaze Pretrained Checkpoint](https://huggingface.co/MK-runner/CoGaze/blob/main/mimic_pretrain_best_model.pt) | [CoGaze (DistilGPT2)](https://huggingface.co/MK-runner/CoGaze/blob/main/distilgpt2_mimic_free_text_report_generation_best_model.pt) | [Generated Reports](https://github.com/mk-runner/CoGaze/tree/main/generated_reports) |
---
## 📁 Dataset Preparation
### 1️⃣ MIMIC-CXR Images
Dataset source: [PhysioNet](https://physionet.org/content/mimic-cxr/2.0.0/)
```
data/
├── p10/
│   └── p10000032/
│       └── s50414267/
│           ├── image1.jpg
│           └── image2.jpg
├── p11/
└── ...
```
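
To sanity-check a local copy against this layout, the images can be grouped by study. This is a minimal sketch under the assumption that the nesting is `<group>/<patient>/<study>/<image>.jpg`, as in the tree above; the function name `list_study_images` is hypothetical and not part of the repository.

```python
from pathlib import Path


def list_study_images(data_root):
    """Map 'pXX/pXXXXXXXX/sXXXXXXXX' to the sorted list of JPEG paths in that study."""
    root = Path(data_root)
    studies = {}
    # Assumed layout: <group>/<patient>/<study>/<image>.jpg
    for image_path in sorted(root.glob("p*/p*/s*/*.jpg")):
        study_key = image_path.parent.relative_to(root).as_posix()
        studies.setdefault(study_key, []).append(image_path)
    return studies


if __name__ == "__main__":
    studies = list_study_images("data")
    print(f"{len(studies)} studies, {sum(len(v) for v in studies.values())} images")
```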
---
### 2️⃣ Annotations & Reports
Available on 🤗 Hugging Face:
* Gaze heatmap
* Image-text pairs
* SRRG annotations
👉 [https://huggingface.co/MK-runner/CoGaze/tree/main/mimic-annotation](https://huggingface.co/MK-runner/CoGaze/tree/main/mimic-annotation)
---
### 3️⃣ Checkpoint Structure
```
ckpt_zoo_dir/
β”œβ”€β”€ chexbert.pth
β”œβ”€β”€ radgraph/
β”œβ”€β”€ google-bert/
β”œβ”€β”€ microsoft/
└── distilgpt2/
```
⚠️ **Manual download required:**
* `chexbert.pth`
* `radgraph`
See: [https://github.com/mk-runner/MLRG](https://github.com/mk-runner/MLRG)
💡 Tip: Enable automatic download during training:
```bash
--online_ckpt "Yes"
```
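
Before launching a run, it may be worth checking that the checkpoint directory contains the entries shown above. The sketch below is illustrative only: `missing_checkpoints` is a hypothetical helper, and the expected names are copied from the tree in this section (a trailing `/` marks a directory).

```python
from pathlib import Path

# Entry names copied from the checkpoint tree above; "/" suffix means directory.
EXPECTED_ENTRIES = [
    "chexbert.pth",
    "radgraph/",
    "google-bert/",
    "microsoft/",
    "distilgpt2/",
]


def missing_checkpoints(ckpt_zoo_dir, expected=EXPECTED_ENTRIES):
    """Return the expected entries that are absent from ckpt_zoo_dir."""
    root = Path(ckpt_zoo_dir)
    missing = []
    for entry in expected:
        path = root / entry.rstrip("/")
        present = path.is_dir() if entry.endswith("/") else path.is_file()
        if not present:
            missing.append(entry)
    return missing


if __name__ == "__main__":
    print(missing_checkpoints("ckpt_zoo_dir"))
```

An empty list means all five entries are in place; anything listed still needs to be downloaded.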
---
### 4️⃣ Additional Datasets
| Task | Dataset |
| -------------- | ----------------------------------------------------------------------------------------------- |
| Classification | [NIH Chest X-rays](https://huggingface.co/datasets/alkzar90/NIH-Chest-X-ray-dataset) |
| Detection | [RSNA Pneumonia](https://www.kaggle.com/competitions/rsna-pneumonia-detection-challenge) |
| Segmentation | [SIIM-ACR](https://www.kaggle.com/datasets/vbookshelf/pneumothorax-chest-xray-images-and-masks) |
| Tuberculosis | [TBX11K](https://www.kaggle.com/datasets/vbookshelf/tbx11k-simplified) |
| External | [Shenzhen Dataset](https://openi.nlm.nih.gov/imgs/collections/ChinaSet_AllFiles.zip) |
---
## 🧠 Training & Inference
### 🔹 Pretraining
```bash
bash script/pretrain.sh
```
---
### 🔹 Report Generation
#### Free-text (Training)
```bash
bash script/free-text-report-generation-gpt2.sh
bash script/free-text-report-generation-llm.sh
```
#### Free-text (Inference)
```bash
bash script/free-text-report-generation-gpt2-inference.sh
```
#### Structured Reports
```bash
bash script/structured-report-generation-gpt2.sh
```
---
## 📊 Evaluation
### 🔹 Compute Metrics
```python
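import pandas as pd

from tools.metrics.metrics import compute_all_scores

# Replace xxx.csv with one of the files under generated_reports/.
data = pd.read_csv("generated_reports/xxx.csv")
gts = data['reference_report'].tolist()   # reference (ground-truth) reports
gens = data['generated_report'].tolist()  # model-generated reports

# NOTE: `args` is not defined in this snippet and must be supplied by the caller;
# see the repository code for the fields it is expected to carry.
scores = compute_all_scores(gts, gens, args)
print(scores)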
```
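
The snippet above assumes the CSV has both report columns and no empty cells. As a hedged alternative for loading, the sketch below uses only the standard library `csv` module (not pandas), checks that the two columns exist, and skips rows where either report is empty. The helper `load_report_pairs` is hypothetical and not part of the repository; whether dropping incomplete rows is appropriate depends on your evaluation protocol.

```python
import csv

REQUIRED_COLUMNS = {"reference_report", "generated_report"}


def load_report_pairs(csv_path):
    """Return (references, generations), skipping rows with an empty report."""
    gts, gens = [], []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"CSV is missing columns: {sorted(missing)}")
        for row in reader:
            ref = (row["reference_report"] or "").strip()
            gen = (row["generated_report"] or "").strip()
            if ref and gen:  # keep only complete pairs
                gts.append(ref)
                gens.append(gen)
    return gts, gens
```

The two returned lists can then be passed to `compute_all_scores` in place of `gts` and `gens`.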
---
### 📈 Performance (DistilGPT2)
```python
{
'BertScore': 0.5956377387046814,
'Radgraph-simple': 0.30690433233898795,
'Radgraph-partial': 0.28076371917819565,
'Radgraph-complete': 0.22603009157065043,
'SemScore': 0.45877182483673096,
'1/RadCliQ-V1': 1.082196619824061,
'RATEScore': 0.5787309255637078,
'chexbert_5_micro_f1': 0.5708835341365461,
'chexbert_5_macro_f1': 0.49498245207765257,
'chexbert_all_micro_p': 0.5544458762886598,
'chexbert_all_micro_r': 0.4980706154736639,
'chexbert_all_micro_f1': 0.5247484500457363,
'chexbert_all_macro_p': 0.44258976034375364,
'chexbert_all_macro_r': 0.37672752858687886,
'chexbert_all_macro_f1': 0.3883859770668801,
'BLEU_1': 0.4103171077382396,
'BLEU_2': 0.28970066408787387,
'BLEU_3': 0.22010546378006685,
'BLEU_4': 0.17481171574606008,
'METEOR': 0.19054219748683743,
'ROUGE_L': 0.3257898419599922,
'CIDer': 0.3962696560568994
}
```
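
The dictionary above is printed at full floating-point precision. For reporting, a small formatter can pick a few metrics and round them. This is a minimal sketch: `format_scores` is a hypothetical helper, and the three values are copied from the results above.

```python
def format_scores(scores, keys, digits=3):
    """Return one 'name: value' line per requested key, rounded to `digits`."""
    return "\n".join(f"{key}: {scores[key]:.{digits}f}" for key in keys)


# Values copied from the performance dictionary above.
scores = {
    "BLEU_4": 0.17481171574606008,
    "ROUGE_L": 0.3257898419599922,
    "chexbert_all_micro_f1": 0.5247484500457363,
}

print(format_scores(scores, ["BLEU_4", "ROUGE_L", "chexbert_all_micro_f1"]))
# BLEU_4: 0.175
# ROUGE_L: 0.326
# chexbert_all_micro_f1: 0.525
```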
---
## 📚 Citation
```bibtex
@misc{2026-cogaze,
title={Seeing Like Radiologists: Context- and Gaze-Guided Vision-Language Pretraining for Chest X-rays},
author={Kang Liu and Zhuoqi Ma and Siyu Liang and Yunan Li and Xiyue Gao and Chao Liang and Kun Xie and Qiguang Miao},
year={2026},
eprint={2603.26049},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2603.26049},
}
```
---
## 🙏 Acknowledgements
* [MLRG](https://github.com/mk-runner/MLRG): dataset and evaluation tools
* [cvt2distilgpt2](https://github.com/aehrc/cvt2distilgpt2): text generation initialization
---
## ⭐ Support
If you find this project useful:
* ⭐ Star this repository
* 🐛 Open issues for questions or bugs
* 📬 Contact Kang Liu (kangliu422@gmail.com) for collaboration
---