# 🩺 CoGaze: Seeing Like Radiologists: Context- and Gaze-Guided Vision-Language Pretraining for Chest X-rays

## ✨ Overview

**CoGaze** is a vision-language pretraining framework designed for **chest X-ray understanding**, inspired by how radiologists interpret medical images.

It integrates:

- 👁️ Gaze-guided pretraining (gaze data are used only during pretraining; downstream tasks such as report generation, classification, and segmentation require no gaze data)
- 🧠 Context-aware reasoning
- 📝 Broad downstream coverage: free-text & structured report generation, supervised & zero-shot classification, segmentation, and image-text retrieval

---

## 📰 News

- **[2026-03-28]** 🚀 Official code and pretrained models are released on [Hugging Face](https://huggingface.co/MK-runner/CoGaze)
- **GitHub:** [https://github.com/mk-runner/CoGaze](https://github.com/mk-runner/CoGaze)

---

## ⚙️ Installation

```bash
# Create conda environment
conda create -n cogaze python=3.10.16
conda activate cogaze
```

### 📦 Core Dependencies

```txt
transformers==4.43.3
radgraph==0.09
pytorch-lightning==2.5.1.post0
torch==2.4.1
torchvision==0.19.1
```
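
The pins above can be sanity-checked before training. The helper below is an illustrative sketch, not part of the CoGaze codebase; it only assumes the dependencies were installed with pip and uses the standard-library `importlib.metadata`:

```python
from importlib import metadata

# Pinned dependencies from the README (hypothetical helper, not part of the repo).
PINNED = {
    "transformers": "4.43.3",
    "radgraph": "0.09",
    "pytorch-lightning": "2.5.1.post0",
    "torch": "2.4.1",
    "torchvision": "0.19.1",
}

def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

def check_pins(pins=PINNED):
    """Map each pinned dependency to an (installed, expected) version pair."""
    return {name: (installed_version(name), want) for name, want in pins.items()}
```

Comparing the two entries of each pair surfaces version drift before it turns into a hard-to-debug runtime error.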

---

## 🧩 Model Zoo

| Dataset | Pretrained Model | Report Generation Model | Outputs |
| --- | --- | --- | --- |
| **MIMIC-CXR** | [CoGaze Pretrained Checkpoint](https://huggingface.co/MK-runner/CoGaze/blob/main/mimic_pretrain_best_model.pt) | [CoGaze (DistilGPT2)](https://huggingface.co/MK-runner/CoGaze/blob/main/distilgpt2_mimic_free_text_report_generation_best_model.pt) | [Generated Reports](https://github.com/mk-runner/CoGaze/tree/main/generated_reports) |

---

## 📁 Dataset Preparation

### 1️⃣ MIMIC-CXR Images

Dataset source: [PhysioNet](https://physionet.org/content/mimic-cxr/2.0.0/)

```
data/
├── p10/
│   └── p10000032/
│       └── s50414267/
│           ├── image1.jpg
│           └── image2.jpg
├── p11/
└── ...
```
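
A tree with this layout can be traversed with a simple glob. The helper below is an illustrative sketch (not part of the CoGaze codebase) that only assumes the `p*/p*/s*/*.jpg` structure shown above:

```python
from pathlib import Path

# Hypothetical helper: gather the JPEG images of every study in a
# MIMIC-CXR-style directory tree (data/pXX/pXXXXXXXX/sXXXXXXXX/*.jpg).
def collect_studies(root):
    """Map each study folder name (e.g. 's50414267') to its image paths."""
    studies = {}
    for img in sorted(Path(root).glob("p*/p*/s*/*.jpg")):
        studies.setdefault(img.parent.name, []).append(str(img))
    return studies
```

Keeping all views of a study grouped this way mirrors how multi-view pretraining pipelines typically batch images.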

---

### 2️⃣ Annotations & Reports

Available on 🤗 Hugging Face:

* Gaze heatmaps
* Image-text pairs
* SRRG annotations

👉 [https://huggingface.co/MK-runner/CoGaze/tree/main/mimic-annotation](https://huggingface.co/MK-runner/CoGaze/tree/main/mimic-annotation)

---

### 3️⃣ Checkpoint Structure

```
ckpt_zoo_dir/
├── chexbert.pth
├── radgraph/
├── google-bert/
├── microsoft/
└── distilgpt2/
```
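
Since some of these assets must be downloaded manually, a preflight check can fail fast with a clear message. The snippet below is an illustrative sketch (not part of the CoGaze codebase); the entry names are taken directly from the tree above:

```python
from pathlib import Path

# Entries expected under ckpt_zoo_dir, per the checkpoint structure above.
REQUIRED = ["chexbert.pth", "radgraph", "google-bert", "microsoft", "distilgpt2"]

def missing_checkpoints(ckpt_zoo_dir, required=REQUIRED):
    """Return the required entries that are absent from ckpt_zoo_dir."""
    root = Path(ckpt_zoo_dir)
    return [name for name in required if not (root / name).exists()]
```

Running this before launching a multi-hour training job is cheaper than discovering a missing `chexbert.pth` at evaluation time.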

⚠️ **Manual download required:**

* `chexbert.pth`
* `radgraph`

See: [https://github.com/mk-runner/MLRG](https://github.com/mk-runner/MLRG)

💡 Tip: alternatively, enable automatic checkpoint download during training with:

```bash
--online_ckpt "Yes"
```

---

### 4️⃣ Additional Datasets

| Task | Dataset |
| --- | --- |
| Classification | [NIH Chest X-rays](https://huggingface.co/datasets/alkzar90/NIH-Chest-X-ray-dataset) |
| Detection | [RSNA Pneumonia](https://www.kaggle.com/competitions/rsna-pneumonia-detection-challenge) |
| Segmentation | [SIIM-ACR](https://www.kaggle.com/datasets/vbookshelf/pneumothorax-chest-xray-images-and-masks) |
| Tuberculosis | [TBX11K](https://www.kaggle.com/datasets/vbookshelf/tbx11k-simplified) |
| External | [Shenzhen Dataset](https://openi.nlm.nih.gov/imgs/collections/ChinaSet_AllFiles.zip) |

---

## 🧠 Training & Inference

### 🔹 Pretraining

```bash
bash script/pretrain.sh
```

---

### 🔹 Report Generation

#### Free-text (Training)

```bash
bash script/free-text-report-generation-gpt2.sh
bash script/free-text-report-generation-llm.sh
```

#### Free-text (Inference)

```bash
bash script/free-text-report-generation-gpt2-inference.sh
```

#### Structured Reports

```bash
bash script/structured-report-generation-gpt2.sh
```

---

## 📊 Evaluation

### 🔹 Compute Metrics

```python
import pandas as pd

from tools.metrics.metrics import compute_all_scores

data = pd.read_csv("generated_reports/xxx.csv")
gts = data['reference_report'].tolist()
gens = data['generated_report'].tolist()

# `args` is the parsed experiment configuration used by the training and
# inference scripts; build it the same way before calling the metrics.
scores = compute_all_scores(gts, gens, args)
print(scores)
```
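
When the full metric suite (CheXbert, RadGraph, etc.) is not installed, a quick pure-Python sanity check is still possible. The function below is an illustrative stand-in, not one of the metrics reported by CoGaze; it computes a bag-of-words F1 between one reference and one generated report:

```python
def token_f1(reference, generated):
    """Bag-of-words F1 between a reference and a generated report."""
    ref = reference.lower().split()
    gen = generated.lower().split()
    if not ref or not gen:
        return 0.0
    # Count each generated token at most as often as it appears in the reference.
    overlap = sum(min(ref.count(t), gen.count(t)) for t in set(gen))
    if overlap == 0:
        return 0.0
    precision = overlap / len(gen)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

This catches gross regressions (empty or off-topic generations) cheaply, but it is no substitute for the clinically grounded scores in the table below.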

---

### 📈 Performance (DistilGPT2)

```python
{
    'BertScore': 0.5956377387046814,
    'Radgraph-simple': 0.30690433233898795,
    'Radgraph-partial': 0.28076371917819565,
    'Radgraph-complete': 0.22603009157065043,
    'SemScore': 0.45877182483673096,
    '1/RadCliQ-V1': 1.082196619824061,
    'RATEScore': 0.5787309255637078,
    'chexbert_5_micro_f1': 0.5708835341365461,
    'chexbert_5_macro_f1': 0.49498245207765257,
    'chexbert_all_micro_p': 0.5544458762886598,
    'chexbert_all_micro_r': 0.4980706154736639,
    'chexbert_all_micro_f1': 0.5247484500457363,
    'chexbert_all_macro_p': 0.44258976034375364,
    'chexbert_all_macro_r': 0.37672752858687886,
    'chexbert_all_macro_f1': 0.3883859770668801,
    'BLEU_1': 0.4103171077382396,
    'BLEU_2': 0.28970066408787387,
    'BLEU_3': 0.22010546378006685,
    'BLEU_4': 0.17481171574606008,
    'METEOR': 0.19054219748683743,
    'ROUGE_L': 0.3257898419599922,
    'CIDer': 0.3962696560568994
}
```

---

## 📚 Citation

```bibtex
@misc{2026-cogaze,
      title={Seeing Like Radiologists: Context- and Gaze-Guided Vision-Language Pretraining for Chest X-rays},
      author={Kang Liu and Zhuoqi Ma and Siyu Liang and Yunan Li and Xiyue Gao and Chao Liang and Kun Xie and Qiguang Miao},
      year={2026},
      eprint={2603.26049},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.26049},
}
```

---

## 🙏 Acknowledgements

* [MLRG](https://github.com/mk-runner/MLRG) - dataset & evaluation tools
* [cvt2distilgpt2](https://github.com/aehrc/cvt2distilgpt2) - text generation initialization

---

## ⭐ Support

If you find this project useful:

* ⭐ Star this repository
* 🐛 Open issues for questions or bugs
* 📬 Contact Kang Liu (kangliu422@gmail.com) for collaboration