# CXR-VLM: Unified Vision-Language Model for Chest X-ray Interpretation

A lightweight Vision-Language Model for **three tasks** on chest X-rays:
1. **Findings Generation**: detailed radiological findings
2. **Impression Generation**: concise clinical summary
3. **Visual Question Answering (VQA)**: answers to specific clinical questions

## Architecture (based on RaDialog)

```
CXR Image
    │
BioViL-T Encoder (frozen)         ← domain-specific CXR encoder
    │  [768-dim patch features]
MLP Projection Layer (trained)    ← align to LLM space
    │  [32 image tokens]
    +  CheXpert Findings (structured labels, optional)
    +  Task Instruction Prompt
    │
Vicuna-7B + LoRA (LLM trained with LoRA)
    │
Output Text (findings / impression / answer)
```

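The middle step of the diagram, where image tokens, optional CheXpert findings, and a task instruction are combined into one LLM input, can be sketched as follows. This is a minimal illustration and not the contents of `data/prompt_templates.py`: the `<image>` placeholder, the instruction wording, the `build_prompt` helper, and the Vicuna-style `USER:` / `ASSISTANT:` framing are all assumptions.

```python
from typing import Optional, Sequence

NUM_IMAGE_TOKENS = 32          # matches the 32 image tokens in the diagram
IMAGE_PLACEHOLDER = "<image>"  # hypothetical slot, later filled with projected features

# Hypothetical instruction wording for the three tasks.
TASK_INSTRUCTIONS = {
    "findings": "Write the findings section of the radiology report for this chest X-ray.",
    "impression": "Write a concise impression for this chest X-ray.",
    "vqa": "Answer the following question about this chest X-ray: {question}",
}


def build_prompt(
    task: str,
    chexpert_findings: Optional[Sequence[str]] = None,
    question: Optional[str] = None,
) -> str:
    """Assemble image slots, optional structured findings, and the task instruction."""
    if task not in TASK_INSTRUCTIONS:
        raise ValueError(f"unknown task: {task!r}")
    if task == "vqa" and not question:
        raise ValueError("the vqa task needs a question")

    parts = ["Image: " + IMAGE_PLACEHOLDER * NUM_IMAGE_TOKENS]
    if chexpert_findings:  # structured labels are optional
        parts.append("Predicted findings: " + ", ".join(chexpert_findings) + ".")

    instruction = TASK_INSTRUCTIONS[task]
    if task == "vqa":
        instruction = instruction.format(question=question)
    parts.append(instruction)

    return "USER: " + " ".join(parts) + " ASSISTANT:"
```

Calling `build_prompt("findings", ["Cardiomegaly"])` yields a single string with 32 image slots, the structured findings, and the instruction; omitting the findings simply drops that sentence.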
## Project Structure

```
cxr_vlm/
├── configs/
│   ├── model_config.yaml        # model hyperparameters
│   └── train_config.yaml        # training hyperparameters
├── model/
│   ├── __init__.py
│   ├── image_encoder.py         # BioViL-T wrapper
│   ├── projection.py            # MLP alignment layer
│   ├── chexpert_classifier.py   # CheXpert structured findings classifier
│   └── cxr_vlm.py               # full model (encoder + projection + LLM)
├── data/
│   ├── __init__.py
│   ├── dataset.py               # CXRInstructDataset (load later)
│   ├── prompt_templates.py      # instruction templates for 3 tasks
│   └── collator.py              # DataCollator for variable-length inputs
├── training/
│   ├── __init__.py
│   ├── trainer.py               # custom HuggingFace Trainer
│   └── train.py                 # main training entry point
├── evaluation/
│   ├── __init__.py
│   ├── metrics.py               # BLEU, ROUGE, ClinicalF1, BERTScore
│   └── evaluate.py              # evaluation entry point
├── utils/
│   ├── __init__.py
│   ├── logger.py                # logging setup
│   └── checkpoint.py            # save/load utilities
├── scripts/
│   ├── train.sh                 # training shell script
│   └── evaluate.sh              # evaluation shell script
└── README.md
```

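`data/collator.py` is described as handling variable-length inputs. The core idea, padding every sequence in a batch to the longest one and masking the padding out of both attention and the loss, can be sketched without any framework. The `pad_batch` helper and the pad id of 0 are assumptions for illustration; `-100` is the default `ignore_index` of `torch.nn.CrossEntropyLoss`, which is why it is commonly used for masked label positions.

```python
from typing import Dict, List

PAD_TOKEN_ID = 0     # assumed pad id; the real value comes from the tokenizer
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def pad_batch(examples: List[Dict[str, List[int]]]) -> Dict[str, List[List[int]]]:
    """Right-pad input_ids and labels to the longest sequence in the batch."""
    max_len = max(len(ex["input_ids"]) for ex in examples)
    batch: Dict[str, List[List[int]]] = {
        "input_ids": [],
        "attention_mask": [],
        "labels": [],
    }
    for ex in examples:
        if len(ex["labels"]) != len(ex["input_ids"]):
            raise ValueError("labels must be aligned with input_ids")
        n_real = len(ex["input_ids"])
        n_pad = max_len - n_real
        batch["input_ids"].append(ex["input_ids"] + [PAD_TOKEN_ID] * n_pad)
        # 1 marks real tokens, 0 marks padding the model should not attend to.
        batch["attention_mask"].append([1] * n_real + [0] * n_pad)
        # Padded label positions are excluded from the loss.
        batch["labels"].append(ex["labels"] + [IGNORE_INDEX] * n_pad)
    return batch
```

A real collator would additionally stack the image tensors and convert these lists to tensors. Right padding is the usual choice for training; batched generation with decoder-only models typically pads on the left instead.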
## Setup

```bash
conda create -n cxr_vlm python=3.10
conda activate cxr_vlm
conda install pytorch==2.0.1 torchvision==0.15.2 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -r requirements.txt
```

## Training

```bash
bash scripts/train.sh
```

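The architecture marks the image encoder as frozen and the LLM as trained through LoRA, so only a small fraction of the language model's weights receive gradients. The arithmetic below gives a rough sense of that fraction. The rank and the choice of adapted modules (two attention projections per layer) are assumptions for illustration, not values read from `configs/train_config.yaml`; the hidden size and layer count are those of the LLaMA-7B architecture that Vicuna-7B builds on.

```python
HIDDEN_SIZE = 4096  # hidden dimension of LLaMA-7B / Vicuna-7B
NUM_LAYERS = 32     # number of decoder layers in LLaMA-7B / Vicuna-7B


def lora_trainable_params(
    rank: int,
    modules_per_layer: int = 2,  # assumed: e.g. query and value projections
    d_in: int = HIDDEN_SIZE,
    d_out: int = HIDDEN_SIZE,
    num_layers: int = NUM_LAYERS,
) -> int:
    """Count LoRA adapter weights: each adapted module adds A (rank x d_in) and B (d_out x rank)."""
    per_module = rank * (d_in + d_out)
    return per_module * modules_per_layer * num_layers


if __name__ == "__main__":
    print(lora_trainable_params(rank=16))
```

With rank 16 on two 4096×4096 projections per layer, this gives 8,388,608 adapter parameters, roughly 0.1% of the model's approximately 6.7 billion weights. The MLP projection layer adds its own trainable parameters on top of this.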
## Evaluation

```bash
bash scripts/evaluate.sh
```

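`evaluation/metrics.py` lists ClinicalF1 alongside the text-overlap metrics. One common formulation is a micro-averaged F1 over the clinical labels found in the generated report versus the reference report. The sketch below assumes label extraction has already happened (for example with a CheXpert-style labeler) and that each report is represented as a set of positive label names; the `clinical_f1_micro` helper is hypothetical and the repository's definition may differ.

```python
from typing import Sequence, Set


def clinical_f1_micro(
    predicted: Sequence[Set[str]],
    reference: Sequence[Set[str]],
) -> float:
    """Micro-averaged F1 over positive labels, pooled across all reports."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")

    tp = fp = fn = 0
    for pred, ref in zip(predicted, reference):
        tp += len(pred & ref)  # labels present in both reports
        fp += len(pred - ref)  # labels only in the generated report
        fn += len(ref - pred)  # labels missed by the generated report

    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

Unlike BLEU or ROUGE, this score ignores wording entirely and only rewards agreement on which findings are reported, which is why it is usually reported next to the text-overlap metrics rather than instead of them.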