| # 🩺 CoGaze: Seeing Like Radiologists: Context- and Gaze-Guided Vision-Language Pretraining for Chest X-rays |
|
|
| ## ✨ Overview |
|
|
| **CoGaze** is a vision-language pretraining framework designed for **chest X-ray understanding**, inspired by how radiologists interpret medical images. |
|
|
| It integrates: |
|
|
| - 👁️ Gaze guidance: gaze information is used during pretraining; downstream tasks (report generation, classification, and segmentation) do not require gaze data. |
| - 🧠 Context-aware reasoning |
| - Broad task coverage: free-text and structured report generation, supervised and zero-shot classification, segmentation, and image-text retrieval |
|
|
| --- |
|
|
| ## 📰 News |
|
|
| - **[2026-03-28]** Official code and pretrained models are released on [Hugging Face](https://huggingface.co/MK-runner/CoGaze) |
| - **GitHub:** [mk-runner/CoGaze](https://github.com/mk-runner/CoGaze) |
|
|
| --- |
|
|
| ## ⚙️ Installation |
|
|
| ```bash |
| # Create conda environment |
| conda create -n cogaze python=3.10.16 |
| conda activate cogaze |
| ``` |
|
|
| ### 📦 Core Dependencies |
|
|
| ```txt |
| transformers==4.43.3 |
| radgraph==0.09 |
| pytorch-lightning==2.5.1.post0 |
| torch==2.4.1 |
| torchvision==0.19.1 |
| ``` |
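
After installing, it can be useful to compare what is actually in the environment against the pins listed above. The sketch below is an illustration rather than part of the repository: `PINS`, `installed_version`, and `check_pins` are hypothetical names, and the comparison is a plain string match, so a local build tag (for example a CUDA suffix on `torch`) or a normalized version string will show up as a mismatch worth a second look, not necessarily an error.

```python
from importlib import metadata

# Pins copied from the dependency list; the Lightning distribution is
# published on PyPI under the name "pytorch-lightning".
PINS = {
    "transformers": "4.43.3",
    "radgraph": "0.09",
    "pytorch-lightning": "2.5.1.post0",
    "torch": "2.4.1",
    "torchvision": "0.19.1",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Map each package to (expected, installed, matches) using exact string comparison."""
    report = {}
    for name, expected in pins.items():
        found = lookup(name)
        report[name] = (expected, found, found == expected)
    return report


if __name__ == "__main__":
    for name, (expected, found, ok) in check_pins(PINS).items():
        status = "ok" if ok else "check"
        print(f"{name:20s} expected={expected:14s} installed={found} [{status}]")
```

The `lookup` parameter exists only so the comparison can be exercised without the packages installed.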
|
|
| --- |
|
|
| ## 🧩 Model Zoo |
|
|
| | Dataset | Pretrained Model | Report Generation Model | Outputs | |
| | ------------- | ------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------ | |
| | **MIMIC-CXR** | [CoGaze Pretrained Checkpoint](https://huggingface.co/MK-runner/CoGaze/blob/main/mimic_pretrain_best_model.pt) | [CoGaze (DistilGPT2)](https://huggingface.co/MK-runner/CoGaze/blob/main/distilgpt2_mimic_free_text_report_generation_best_model.pt) | [Generated Reports](https://github.com/mk-runner/CoGaze/tree/main/generated_reports) | |
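
The checkpoint links in the table point at Hugging Face `blob` pages, which open a file viewer in the browser. For scripted downloads, the same path with `resolve` in place of `blob` serves the raw file. The helper below is a minimal sketch of that rewrite; `to_download_url` is a hypothetical name and it only handles model-repository URLs of the form shown in the table.

```python
from urllib.parse import urlsplit, urlunsplit


def to_download_url(blob_url):
    """Turn a Hugging Face 'blob' page URL into its 'resolve' (raw file) URL."""
    parts = urlsplit(blob_url)
    segments = parts.path.split("/")
    # Expected path layout: /<owner>/<repo>/blob/<revision>/<file path>
    if len(segments) < 6 or segments[3] != "blob":
        raise ValueError(f"not a Hugging Face blob URL: {blob_url}")
    segments[3] = "resolve"
    return urlunsplit(parts._replace(path="/".join(segments)))


if __name__ == "__main__":
    page = "https://huggingface.co/MK-runner/CoGaze/blob/main/mimic_pretrain_best_model.pt"
    print(to_download_url(page))
    # The resulting URL can be passed to any HTTP client,
    # e.g. urllib.request.urlretrieve(url, "mimic_pretrain_best_model.pt").
```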
|
|
| --- |
|
|
| ## Dataset Preparation |
|
|
| ### 1️⃣ MIMIC-CXR Images |
|
|
| Dataset source: [PhysioNet](https://physionet.org/content/mimic-cxr/2.0.0/) |
|
|
| ``` |
| data/ |
| ├── p10/ |
| │   └── p10000032/ |
| │       └── s50414267/ |
| │           ├── image1.jpg |
| │           └── image2.jpg |
| ├── p11/ |
| └── ... |
| ``` |
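
To confirm that a local copy matches the layout above, the images can be grouped by patient and study. This is a sketch under the assumption that images sit exactly three levels below the root (`pXX/pXXXXXXXX/sXXXXXXXX/*.jpg`) as in the tree; `index_studies` is a hypothetical helper, not a function from the repository.

```python
from collections import defaultdict
from pathlib import Path


def index_studies(root):
    """Group image paths by (patient, study) for the pXX/pXXXXXXXX/sXXXXXXXX/*.jpg layout."""
    studies = defaultdict(list)
    for image in sorted(Path(root).glob("p*/p*/s*/*.jpg")):
        study_dir = image.parent
        patient_dir = study_dir.parent
        studies[(patient_dir.name, study_dir.name)].append(image)
    return dict(studies)


if __name__ == "__main__":
    index = index_studies("data")
    total = sum(len(paths) for paths in index.values())
    print(f"{len(index)} studies, {total} images")
```

An empty result usually means the root path is one level too high or too low.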
|
|
| --- |
|
|
| ### 2️⃣ Annotations & Reports |
|
|
| Available on 🤗 Hugging Face: |
|
|
| * Gaze heatmap |
| * Image-text pairs |
| * SRRG annotations |
|
|
| [https://huggingface.co/MK-runner/CoGaze/tree/main/mimic-annotation](https://huggingface.co/MK-runner/CoGaze/tree/main/mimic-annotation) |
|
|
| --- |
|
|
| ### 3️⃣ Checkpoint Structure |
|
|
| ``` |
| ckpt_zoo_dir/ |
| ├── chexbert.pth |
| ├── radgraph/ |
| ├── google-bert/ |
| ├── microsoft/ |
| └── distilgpt2/ |
| ``` |
|
|
| ⚠️ **Manual download required:** |
|
|
| * `chexbert.pth` |
| * `radgraph` |
|
|
| See: [https://github.com/mk-runner/MLRG](https://github.com/mk-runner/MLRG) |
|
|
| 💡 Tip: Enable automatic download during training: |
|
|
| ```bash |
| --online_ckpt "Yes" |
| ``` |
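
Before launching a run, a quick check of the checkpoint directory against the tree above can save a failed start. The sketch below assumes the five entries shown in the layout; `EXPECTED`, `MANUAL`, and `missing_entries` are hypothetical names, and `MANUAL` simply mirrors the two items this README marks as manual downloads.

```python
from pathlib import Path

# Entries taken from the directory tree in this section.
EXPECTED = ["chexbert.pth", "radgraph", "google-bert", "microsoft", "distilgpt2"]
# Items the README lists under "Manual download required".
MANUAL = {"chexbert.pth", "radgraph"}


def missing_entries(ckpt_zoo_dir, expected=EXPECTED):
    """Return the expected files or folders that are absent from ckpt_zoo_dir."""
    root = Path(ckpt_zoo_dir)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    for name in missing_entries("ckpt_zoo_dir"):
        hint = "manual download" if name in MANUAL else "not found locally"
        print(f"missing: {name} ({hint})")
```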
|
|
| --- |
|
|
| ### 4️⃣ Additional Datasets |
|
|
| | Task | Dataset | |
| | -------------- | ----------------------------------------------------------------------------------------------- | |
| | Classification | [NIH Chest X-rays](https://huggingface.co/datasets/alkzar90/NIH-Chest-X-ray-dataset) | |
| | Detection | [RSNA Pneumonia](https://www.kaggle.com/competitions/rsna-pneumonia-detection-challenge) | |
| | Segmentation | [SIIM-ACR](https://www.kaggle.com/datasets/vbookshelf/pneumothorax-chest-xray-images-and-masks) | |
| | Tuberculosis | [TBX11K](https://www.kaggle.com/datasets/vbookshelf/tbx11k-simplified) | |
| | External | [Shenzhen Dataset](https://openi.nlm.nih.gov/imgs/collections/ChinaSet_AllFiles.zip) | |
|
|
| --- |
|
|
| ## 🔧 Training & Inference |
|
|
| ### 🔹 Pretraining |
|
|
| ```bash |
| bash script/pretrain.sh |
| ``` |
|
|
| --- |
|
|
| ### 🔹 Report Generation |
|
|
| #### Free-text (Training) |
|
|
| ```bash |
| bash script/free-text-report-generation-gpt2.sh |
| bash script/free-text-report-generation-llm.sh |
| ``` |
|
|
| #### Free-text (Inference) |
|
|
| ```bash |
| bash script/free-text-report-generation-gpt2-inference.sh |
| ``` |
|
|
| #### Structured Reports |
|
|
| ```bash |
| bash script/structured-report-generation-gpt2.sh |
| ``` |
|
|
| --- |
|
|
| ## Evaluation |
|
|
| ### 🔹 Compute Metrics |
|
|
| ```python |
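| from tools.metrics.metrics import compute_all_scores |
| import pandas as pd |
|
| # Replace xxx.csv with the CSV of generated reports you want to score. |
| data = pd.read_csv("generated_reports/xxx.csv") |
| gts = data['reference_report'].tolist()   # ground-truth reports |
| gens = data['generated_report'].tolist()  # model outputs |
|
| # NOTE: `args` is not defined in this snippet; it is assumed to be the |
| # argument/config object built by the repository's scripts. |
| scores = compute_all_scores(gts, gens, args) |
| print(scores) |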
| ``` |
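
It may be worth validating the CSV before scoring, since a missing column or an empty report tends to surface as a confusing error deep inside a metric. The sketch below uses only the standard library and the two column names from the snippet above; `load_report_pairs` is a hypothetical helper, and whether `compute_all_scores` tolerates empty strings is not stated here, so skipping them is a precaution rather than a requirement.

```python
import csv

REQUIRED = ("reference_report", "generated_report")


def load_report_pairs(csv_path):
    """Read (reference, generated) pairs, skipping rows where either text is empty."""
    gts, gens, skipped = [], [], 0
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        missing = [col for col in REQUIRED if col not in (reader.fieldnames or [])]
        if missing:
            raise KeyError(f"missing columns: {missing}")
        for row in reader:
            reference = (row["reference_report"] or "").strip()
            generated = (row["generated_report"] or "").strip()
            if reference and generated:
                gts.append(reference)
                gens.append(generated)
            else:
                skipped += 1
    return gts, gens, skipped
```

The returned `gts` and `gens` lists have the same shape as the ones built with pandas, so they can be passed on in the same way.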
|
|
| --- |
|
|
| ### Performance (DistilGPT2) |
|
|
| ```python |
| { |
| 'BertScore': 0.5956377387046814, |
| 'Radgraph-simple': 0.30690433233898795, |
| 'Radgraph-partial': 0.28076371917819565, |
| 'Radgraph-complete': 0.22603009157065043, |
| 'SemScore': 0.45877182483673096, |
| '1/RadCliQ-V1': 1.082196619824061, |
| 'RATEScore': 0.5787309255637078, |
| 'chexbert_5_micro_f1': 0.5708835341365461, |
| 'chexbert_5_macro_f1': 0.49498245207765257, |
| 'chexbert_all_micro_p': 0.5544458762886598, |
| 'chexbert_all_micro_r': 0.4980706154736639, |
| 'chexbert_all_micro_f1': 0.5247484500457363, |
| 'chexbert_all_macro_p': 0.44258976034375364, |
| 'chexbert_all_macro_r': 0.37672752858687886, |
| 'chexbert_all_macro_f1': 0.3883859770668801, |
| 'BLEU_1': 0.4103171077382396, |
| 'BLEU_2': 0.28970066408787387, |
| 'BLEU_3': 0.22010546378006685, |
| 'BLEU_4': 0.17481171574606008, |
| 'METEOR': 0.19054219748683743, |
| 'ROUGE_L': 0.3257898419599922, |
| 'CIDer': 0.3962696560568994 |
| } |
| ``` |
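
As a reading aid for the CheXbert rows, the reported micro-F1 can be recomputed from the reported micro precision and recall, since F1 is their harmonic mean. The same shortcut does not reproduce the macro-F1, which is what one would expect if macro-F1 is averaged per class; that interpretation is an assumption about the metric code, not something stated in this README.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the results dictionary above.
micro_f1 = f1(0.5544458762886598, 0.4980706154736639)
macro_from_means = f1(0.44258976034375364, 0.37672752858687886)

print(f"micro F1 recomputed:          {micro_f1:.4f}")          # reported: 0.5247
print(f"harmonic mean of macro P, R:  {macro_from_means:.4f}")  # reported macro F1: 0.3884
```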
|
|
| --- |
|
|
| ## Citation |
|
|
| ```bibtex |
| @misc{2026-cogaze, |
| title={Seeing Like Radiologists: Context- and Gaze-Guided Vision-Language Pretraining for Chest X-rays}, |
| author={Kang Liu and Zhuoqi Ma and Siyu Liang and Yunan Li and Xiyue Gao and Chao Liang and Kun Xie and Qiguang Miao}, |
| year={2026}, |
| eprint={2603.26049}, |
| archivePrefix={arXiv}, |
| primaryClass={cs.CV}, |
| url={https://arxiv.org/abs/2603.26049}, |
| } |
| ``` |
|
|
| --- |
|
|
| ## Acknowledgements |
|
|
| * [MLRG](https://github.com/mk-runner/MLRG): dataset & evaluation tools |
| * [cvt2distilgpt2](https://github.com/aehrc/cvt2distilgpt2): text generation initialization |
|
|
| --- |
|
|
| ## Support |
|
|
| If you find this project useful: |
|
|
| * Star this repository |
| * Open issues for questions or bugs |
| * Contact Kang Liu (kangliu422@gmail.com) for collaboration |
|
|
| --- |
|
|