	# 🩺 CoGaze: Seeing Like Radiologists: Context- and Gaze-Guided Vision-Language Pretraining for Chest X-rays

	## ✨ Overview

	CoGaze is a vision-language pretraining framework designed for chest X-ray understanding, inspired by how radiologists interpret medical images.

	It combines:

	- 👁️ Gaze guidance: gaze information is used during pretraining, while downstream tasks (report generation, classification, and segmentation) do not require gaze data.
	- 🧠 Context-aware reasoning
	- 📝 Broad downstream support: free-text and structured report generation, supervised and zero-shot classification, segmentation, and image-text retrieval

	---

	## 📰 News

	- [2026-03-28] 🚀 Official code and pretrained models are released on [Hugging Face](https://huggingface.co/MK-runner/CoGaze)
	- GitHub repository: [mk-runner/CoGaze](https://github.com/mk-runner/CoGaze)

	---

	## ⚙️ Installation

	```bash
	# Create conda environment
	conda create -n cogaze python=3.10.16
	conda activate cogaze
	```

	### 📦 Core Dependencies

	```txt
	transformers==4.43.3
	radgraph==0.09
	pytorch-lightning==2.5.1.post0
	torch==2.4.1
	torchvision==0.19.1
	```
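
The pins above can be checked against the active environment. The following is a minimal sketch, not part of the repository: the helper name `find_mismatches` is hypothetical, and `radgraph` is left out because its pin may be reported in a normalized form by the installer.

```python
from importlib import metadata

# Pins copied from the dependency list above (radgraph is left out on purpose).
# The PyPI distribution name for Lightning is "pytorch-lightning".
PINS = {
    "transformers": "4.43.3",
    "pytorch-lightning": "2.5.1.post0",
    "torch": "2.4.1",
    "torchvision": "0.19.1",
}


def find_mismatches(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for every pin that is not satisfied."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            mismatches[name] = (expected, None)
            continue
        # CUDA builds of torch can report a local suffix such as "2.4.1+cu121".
        if found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


for name, (expected, found) in find_mismatches(PINS).items():
    print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty output means every listed package matches its pin.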

	---

	## 🧩 Model Zoo

	| Dataset | Pretrained Model | Report Generation Model | Outputs |
	| --- | --- | --- | --- |
	| MIMIC-CXR | [CoGaze Pretrained Checkpoint](https://huggingface.co/MK-runner/CoGaze/blob/main/mimic_pretrain_best_model.pt) | [CoGaze (DistilGPT2)](https://huggingface.co/MK-runner/CoGaze/blob/main/distilgpt2_mimic_free_text_report_generation_best_model.pt) | [Generated Reports](https://github.com/mk-runner/CoGaze/tree/main/generated_reports) |
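
The checkpoints linked above can also be fetched programmatically. This is a minimal sketch that assumes the `huggingface_hub` package is installed; the helper names `direct_url` and `download` and the `ckpt` target folder are illustrative and not part of the repository.

```python
REPO_ID = "MK-runner/CoGaze"

# File names taken from the Model Zoo links above.
CHECKPOINTS = {
    "pretrain": "mimic_pretrain_best_model.pt",
    "report_generation": "distilgpt2_mimic_free_text_report_generation_best_model.pt",
}


def direct_url(name):
    """Direct-download URL for a checkpoint (the Hub serves raw files under /resolve/)."""
    return f"https://huggingface.co/{REPO_ID}/resolve/main/{CHECKPOINTS[name]}"


def download(name, local_dir="ckpt"):
    """Fetch a checkpoint into local_dir and return its local path."""
    # Imported lazily so direct_url() works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=REPO_ID, filename=CHECKPOINTS[name], local_dir=local_dir
    )
```

Calling `download("pretrain")` retrieves the pretrained checkpoint, while `direct_url("pretrain")` gives a link suitable for `wget` or `curl`.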

	---

	## 📁 Dataset Preparation

	### 1️⃣ MIMIC-CXR Images

	Dataset source: [PhysioNet](https://physionet.org/content/mimic-cxr/2.0.0/)

	```
	data/
	├── p10/
	│ └── p10000032/
	│ └── s50414267/
	│ ├── image1.jpg
	│ └── image2.jpg
	├── p11/
	└── ...
	```
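
To confirm that a local copy follows the layout above, the study folders can be indexed with the standard library. This is a sketch under the assumption that images are `.jpg` files three levels below the root, as in the tree; `index_studies` is a hypothetical helper, not a function from the repository.

```python
from pathlib import Path


def index_studies(root):
    """Map 'patient/study' to the sorted list of .jpg file names in that study."""
    index = {}
    # Layout assumed from the tree above: root/pXX/pXXXXXXXX/sXXXXXXXX/*.jpg
    for study_dir in sorted(Path(root).glob("p*/p*/s*")):
        if not study_dir.is_dir():
            continue
        images = sorted(p.name for p in study_dir.glob("*.jpg"))
        index[f"{study_dir.parent.name}/{study_dir.name}"] = images
    return index
```

For the example tree, `index_studies("data")` would contain the key `p10000032/s50414267` with both image names.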

	---

	### 2️⃣ Annotations & Reports

	Available on 🤗 Hugging Face:

	* Gaze heatmap
	* Image-text pairs
	* SRRG annotations

	👉 [https://huggingface.co/MK-runner/CoGaze/tree/main/mimic-annotation](https://huggingface.co/MK-runner/CoGaze/tree/main/mimic-annotation)

	---

	### 3️⃣ Checkpoint Structure

	```
	ckpt_zoo_dir/
	├── chexbert.pth
	├── radgraph/
	├── google-bert/
	├── microsoft/
	└── distilgpt2/
	```

	⚠️ Manual download required:

	* `chexbert.pth`
	* `radgraph`

	See: [https://github.com/mk-runner/MLRG](https://github.com/mk-runner/MLRG)

	💡 Tip: Enable automatic download during training:

	```bash
	--online_ckpt "Yes"
	```
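
A quick check of the checkpoint directory against the structure above can be sketched as follows. The entry names come from the tree and the manual-download list; the helper `missing_entries` is hypothetical and only tests for presence, not for file integrity.

```python
from pathlib import Path

# Entries from the tree above; True marks those listed as manual downloads.
EXPECTED = {
    "chexbert.pth": True,
    "radgraph": True,
    "google-bert": False,
    "microsoft": False,
    "distilgpt2": False,
}


def missing_entries(ckpt_zoo_dir):
    """Return (missing_manual, missing_other) as sorted lists of entry names."""
    root = Path(ckpt_zoo_dir)
    missing = [name for name in EXPECTED if not (root / name).exists()]
    manual = sorted(n for n in missing if EXPECTED[n])
    other = sorted(n for n in missing if not EXPECTED[n])
    return manual, other
```

Anything returned in the first list has to be downloaded by hand before training.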

	---

	### 4️⃣ Additional Datasets

	| Task | Dataset |
	| --- | --- |
	| Classification | [NIH Chest X-rays](https://huggingface.co/datasets/alkzar90/NIH-Chest-X-ray-dataset) |
	| Detection | [RSNA Pneumonia](https://www.kaggle.com/competitions/rsna-pneumonia-detection-challenge) |
	| Segmentation | [SIIM-ACR](https://www.kaggle.com/datasets/vbookshelf/pneumothorax-chest-xray-images-and-masks) |
	| Tuberculosis | [TBX11K](https://www.kaggle.com/datasets/vbookshelf/tbx11k-simplified) |
	| External | [Shenzhen Dataset](https://openi.nlm.nih.gov/imgs/collections/ChinaSet_AllFiles.zip) |

	---

	## 🧠 Training & Inference

	### 🔹 Pretraining

	```bash
	bash script/pretrain.sh
	```

	---

	### 🔹 Report Generation

	#### Free-text (Training)

	```bash
	bash script/free-text-report-generation-gpt2.sh
	bash script/free-text-report-generation-llm.sh
	```

	#### Free-text (Inference)

	```bash
	bash script/free-text-report-generation-gpt2-inference.sh
	```

	#### Structured Reports

	```bash
	bash script/structured-report-generation-gpt2.sh
	```

	---

	## 📊 Evaluation

	### 🔹 Compute Metrics

	```python
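	import pandas as pd

	from tools.metrics.metrics import compute_all_scores

	# Replace xxx.csv with a file from generated_reports/
	data = pd.read_csv("generated_reports/xxx.csv")
	gts = data['reference_report'].tolist()   # reference (ground-truth) reports
	gens = data['generated_report'].tolist()  # model-generated reports

	# NOTE: `args` is not defined in this snippet and must be supplied by the caller.
	scores = compute_all_scores(gts, gens, args)
	print(scores)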
	```
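
Before computing metrics, it can help to verify that the CSV has the two columns used above. The following sketch uses only the standard library; `load_report_pairs` is a hypothetical helper, and treating an empty generated report as an error is an assumption rather than the repository's behavior.

```python
import csv

REQUIRED_COLUMNS = ("reference_report", "generated_report")


def load_report_pairs(csv_path):
    """Read a generated-reports CSV and return (references, generations)."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames or []
        missing = [c for c in REQUIRED_COLUMNS if c not in header]
        if missing:
            raise ValueError(f"missing column(s): {missing}")
        rows = list(reader)
    # Assumption: an empty generation indicates a failed inference run.
    empty = [i for i, r in enumerate(rows) if not (r["generated_report"] or "").strip()]
    if empty:
        raise ValueError(f"empty generated_report in data rows: {empty[:5]}")
    gts = [r["reference_report"] for r in rows]
    gens = [r["generated_report"] for r in rows]
    return gts, gens
```

The two returned lists can be passed to `compute_all_scores` in place of the pandas columns.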

	---

	### 📈 Performance (DistilGPT2)

	```python
	{
	'BertScore': 0.5956377387046814,
	'Radgraph-simple': 0.30690433233898795,
	'Radgraph-partial': 0.28076371917819565,
	'Radgraph-complete': 0.22603009157065043,
	'SemScore': 0.45877182483673096,
	'1/RadCliQ-V1': 1.082196619824061,
	'RATEScore': 0.5787309255637078,
	'chexbert_5_micro_f1': 0.5708835341365461,
	'chexbert_5_macro_f1': 0.49498245207765257,
	'chexbert_all_micro_p': 0.5544458762886598,
	'chexbert_all_micro_r': 0.4980706154736639,
	'chexbert_all_micro_f1': 0.5247484500457363,
	'chexbert_all_macro_p': 0.44258976034375364,
	'chexbert_all_macro_r': 0.37672752858687886,
	'chexbert_all_macro_f1': 0.3883859770668801,
	'BLEU_1': 0.4103171077382396,
	'BLEU_2': 0.28970066408787387,
	'BLEU_3': 0.22010546378006685,
	'BLEU_4': 0.17481171574606008,
	'METEOR': 0.19054219748683743,
	'ROUGE_L': 0.3257898419599922,
	'CIDer': 0.3962696560568994
	}
	```
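
As a consistency check on the numbers above, micro-averaged F1 is the harmonic mean of micro-averaged precision and recall, so the reported `chexbert_all_micro_f1` can be recomputed from the two neighbouring entries. The sketch below is illustrative only and is not part of the evaluation code.

```python
# Values copied from the results above.
scores = {
    "chexbert_all_micro_p": 0.5544458762886598,
    "chexbert_all_micro_r": 0.4980706154736639,
    "chexbert_all_micro_f1": 0.5247484500457363,
}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


recomputed = f1(scores["chexbert_all_micro_p"], scores["chexbert_all_micro_r"])
assert abs(recomputed - scores["chexbert_all_micro_f1"]) < 1e-6
print(round(recomputed, 4))  # 0.5247
```

The macro scores need not satisfy the same identity if, as is common, macro F1 is averaged over per-class F1 values.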

	---

	## 📚 Citation

	```bibtex
	@misc{2026-cogaze,
	title={Seeing Like Radiologists: Context- and Gaze-Guided Vision-Language Pretraining for Chest X-rays},
	author={Kang Liu and Zhuoqi Ma and Siyu Liang and Yunan Li and Xiyue Gao and Chao Liang and Kun Xie and Qiguang Miao},
	year={2026},
	eprint={2603.26049},
	archivePrefix={arXiv},
	primaryClass={cs.CV},
	url={https://arxiv.org/abs/2603.26049},
	}
	```

	---

	## 🙏 Acknowledgements

	* [MLRG](https://github.com/mk-runner/MLRG) — dataset & evaluation tools
	* [cvt2distilgpt2](https://github.com/aehrc/cvt2distilgpt2) — text generation initialization

	---

	## ⭐ Support

	If you find this project useful:

	* ⭐ Star this repository
	* 🐛 Open issues for questions or bugs
	* 📬 Contact Kang Liu (kangliu422@gmail.com) for collaboration

	---