# 🩺 CoGaze: Seeing Like Radiologists: Context- and Gaze-Guided Vision-Language Pretraining for Chest X-rays

## ✨ Overview

**CoGaze** is a vision-language pretraining framework designed for **chest X-ray understanding**, inspired by how radiologists interpret medical images.

It integrates:

- 👁️ Gaze-guided pretraining (gaze data are used only during pretraining; downstream tasks such as report generation, classification, and segmentation require no gaze data)
- 🧠 Context-aware reasoning
- 📝 Broad downstream coverage: free-text & structured report generation, supervised & zero-shot classification, segmentation, and image-text retrieval

---

## 📰 News

- **[2026-03-28]** 🚀 Official code and pretrained models are released on [Hugging Face](https://huggingface.co/MK-runner/CoGaze)
- **GitHub:** [https://github.com/mk-runner/CoGaze](https://github.com/mk-runner/CoGaze)

---

## ⚙️ Installation

```bash
# Create conda environment
conda create -n cogaze python=3.10.16
conda activate cogaze
```

### 📦 Core Dependencies

```txt
transformers==4.43.3
radgraph==0.09
pytorch-lightning==2.5.1.post0
torch==2.4.1
torchvision==0.19.1
```
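
The pins above can be sanity-checked before training. The helper below is an illustrative sketch, not part of the CoGaze codebase; it only assumes the dependencies were installed with pip and uses the standard-library `importlib.metadata`:

```python
from importlib import metadata

# Pinned dependencies from the README (hypothetical helper, not part of the repo).
PINNED = {
    "transformers": "4.43.3",
    "radgraph": "0.09",
    "pytorch-lightning": "2.5.1.post0",
    "torch": "2.4.1",
    "torchvision": "0.19.1",
}

def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

def check_pins(pins=PINNED):
    """Map each pinned dependency to an (installed, expected) version pair."""
    return {name: (installed_version(name), want) for name, want in pins.items()}
```

Comparing the two entries of each pair surfaces version drift before it turns into a hard-to-debug runtime error.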

---

## 🧩 Model Zoo

| Dataset | Pretrained Model | Report Generation Model | Outputs |
| --- | --- | --- | --- |
| **MIMIC-CXR** | [CoGaze Pretrained Checkpoint](https://huggingface.co/MK-runner/CoGaze/blob/main/mimic_pretrain_best_model.pt) | [CoGaze (DistilGPT2)](https://huggingface.co/MK-runner/CoGaze/blob/main/distilgpt2_mimic_free_text_report_generation_best_model.pt) | [Generated Reports](https://github.com/mk-runner/CoGaze/tree/main/generated_reports) |

---

## 📁 Dataset Preparation

### 1️⃣ MIMIC-CXR Images

Dataset source: [PhysioNet](https://physionet.org/content/mimic-cxr/2.0.0/)

```
data/
├── p10/
│   └── p10000032/
│       └── s50414267/
│           ├── image1.jpg
│           └── image2.jpg
├── p11/
└── ...
```
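
A tree with this layout can be traversed with a simple glob. The helper below is an illustrative sketch (not part of the CoGaze codebase) that only assumes the `p*/p*/s*/*.jpg` structure shown above:

```python
from pathlib import Path

# Hypothetical helper: gather the JPEG images of every study in a
# MIMIC-CXR-style directory tree (data/pXX/pXXXXXXXX/sXXXXXXXX/*.jpg).
def collect_studies(root):
    """Map each study folder name (e.g. 's50414267') to its image paths."""
    studies = {}
    for img in sorted(Path(root).glob("p*/p*/s*/*.jpg")):
        studies.setdefault(img.parent.name, []).append(str(img))
    return studies
```

Keeping all views of a study grouped this way mirrors how multi-view pretraining pipelines typically batch images.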

---

### 2️⃣ Annotations & Reports

Available on 🤗 Hugging Face:

* Gaze heatmaps
* Image-text pairs
* SRRG annotations

👉 [https://huggingface.co/MK-runner/CoGaze/tree/main/mimic-annotation](https://huggingface.co/MK-runner/CoGaze/tree/main/mimic-annotation)

---

### 3️⃣ Checkpoint Structure

```
ckpt_zoo_dir/
├── chexbert.pth
├── radgraph/
├── google-bert/
├── microsoft/
└── distilgpt2/
```
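
Since some of these assets must be downloaded manually, a preflight check can fail fast with a clear message. The snippet below is an illustrative sketch (not part of the CoGaze codebase); the entry names are taken directly from the tree above:

```python
from pathlib import Path

# Entries expected under ckpt_zoo_dir, per the checkpoint structure above.
REQUIRED = ["chexbert.pth", "radgraph", "google-bert", "microsoft", "distilgpt2"]

def missing_checkpoints(ckpt_zoo_dir, required=REQUIRED):
    """Return the required entries that are absent from ckpt_zoo_dir."""
    root = Path(ckpt_zoo_dir)
    return [name for name in required if not (root / name).exists()]
```

Running this before launching a multi-hour training job is cheaper than discovering a missing `chexbert.pth` at evaluation time.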

⚠️ **Manual download required:**

* `chexbert.pth`
* `radgraph`

See: [https://github.com/mk-runner/MLRG](https://github.com/mk-runner/MLRG)

💡 Tip: alternatively, enable automatic checkpoint download during training with:

```bash
--online_ckpt "Yes"
```

---

### 4️⃣ Additional Datasets

| Task | Dataset |
| --- | --- |
| Classification | [NIH Chest X-rays](https://huggingface.co/datasets/alkzar90/NIH-Chest-X-ray-dataset) |
| Detection | [RSNA Pneumonia](https://www.kaggle.com/competitions/rsna-pneumonia-detection-challenge) |
| Segmentation | [SIIM-ACR](https://www.kaggle.com/datasets/vbookshelf/pneumothorax-chest-xray-images-and-masks) |
| Tuberculosis | [TBX11K](https://www.kaggle.com/datasets/vbookshelf/tbx11k-simplified) |
| External | [Shenzhen Dataset](https://openi.nlm.nih.gov/imgs/collections/ChinaSet_AllFiles.zip) |

---

## 🧠 Training & Inference

### 🔹 Pretraining

```bash
bash script/pretrain.sh
```

---

### 🔹 Report Generation

#### Free-text (Training)

```bash
bash script/free-text-report-generation-gpt2.sh
bash script/free-text-report-generation-llm.sh
```

#### Free-text (Inference)

```bash
bash script/free-text-report-generation-gpt2-inference.sh
```

#### Structured Reports

```bash
bash script/structured-report-generation-gpt2.sh
```

---

## 📊 Evaluation

### 🔹 Compute Metrics

```python
import pandas as pd

from tools.metrics.metrics import compute_all_scores

data = pd.read_csv("generated_reports/xxx.csv")
gts = data['reference_report'].tolist()
gens = data['generated_report'].tolist()

# `args` is the parsed experiment configuration used by the training and
# inference scripts; build it the same way before calling the metrics.
scores = compute_all_scores(gts, gens, args)
print(scores)
```
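
When the full metric suite (CheXbert, RadGraph, etc.) is not installed, a quick pure-Python sanity check is still possible. The function below is an illustrative stand-in, not one of the metrics reported by CoGaze; it computes a bag-of-words F1 between one reference and one generated report:

```python
def token_f1(reference, generated):
    """Bag-of-words F1 between a reference and a generated report."""
    ref = reference.lower().split()
    gen = generated.lower().split()
    if not ref or not gen:
        return 0.0
    # Count each generated token at most as often as it appears in the reference.
    overlap = sum(min(ref.count(t), gen.count(t)) for t in set(gen))
    if overlap == 0:
        return 0.0
    precision = overlap / len(gen)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

This catches gross regressions (empty or off-topic generations) cheaply, but it is no substitute for the clinically grounded scores in the table below.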

---

### 📈 Performance (DistilGPT2)

```python
{
    'BertScore': 0.5956377387046814,
    'Radgraph-simple': 0.30690433233898795,
    'Radgraph-partial': 0.28076371917819565,
    'Radgraph-complete': 0.22603009157065043,
    'SemScore': 0.45877182483673096,
    '1/RadCliQ-V1': 1.082196619824061,
    'RATEScore': 0.5787309255637078,
    'chexbert_5_micro_f1': 0.5708835341365461,
    'chexbert_5_macro_f1': 0.49498245207765257,
    'chexbert_all_micro_p': 0.5544458762886598,
    'chexbert_all_micro_r': 0.4980706154736639,
    'chexbert_all_micro_f1': 0.5247484500457363,
    'chexbert_all_macro_p': 0.44258976034375364,
    'chexbert_all_macro_r': 0.37672752858687886,
    'chexbert_all_macro_f1': 0.3883859770668801,
    'BLEU_1': 0.4103171077382396,
    'BLEU_2': 0.28970066408787387,
    'BLEU_3': 0.22010546378006685,
    'BLEU_4': 0.17481171574606008,
    'METEOR': 0.19054219748683743,
    'ROUGE_L': 0.3257898419599922,
    'CIDer': 0.3962696560568994
}
```

---

## 📚 Citation

```bibtex
@misc{2026-cogaze,
      title={Seeing Like Radiologists: Context- and Gaze-Guided Vision-Language Pretraining for Chest X-rays},
      author={Kang Liu and Zhuoqi Ma and Siyu Liang and Yunan Li and Xiyue Gao and Chao Liang and Kun Xie and Qiguang Miao},
      year={2026},
      eprint={2603.26049},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.26049},
}
```

---

## 🙏 Acknowledgements

* [MLRG](https://github.com/mk-runner/MLRG) - dataset & evaluation tools
* [cvt2distilgpt2](https://github.com/aehrc/cvt2distilgpt2) - text generation initialization

---

## ⭐ Support

If you find this project useful:

* ⭐ Star this repository
* 🐛 Open issues for questions or bugs
* 📬 Contact Kang Liu (kangliu422@gmail.com) for collaboration