File size: 2,579 Bytes
28b13fc | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 | # CXR-VLM: Unified Vision-Language Model for Chest X-ray Interpretation
A lightweight Vision-Language Model for **three tasks** on chest X-rays:
1. **Findings Generation** — detailed radiological findings
2. **Impression Generation** — concise clinical summary
3. **Visual Question Answering (VQA)** — answer specific clinical questions
## Architecture (based on RaDialog)
```
CXR Image
│
BioViL-T Encoder (frozen) ← domain-specific CXR encoder
│ [768-dim patch features]
MLP Projection Layer (trained) ← align to LLM space
│ [32 image tokens]
+ CheXpert Findings (structured labels, optional)
+ Task Instruction Prompt
│
Vicuna-7B + LoRA (LLM trained with LoRA)
│
Output Text (findings / impression / answer)
```
## Project Structure
```
cxr_vlm/
├── configs/
│ ├── model_config.yaml # model hyperparameters
│ └── train_config.yaml # training hyperparameters
├── model/
│ ├── __init__.py
│ ├── image_encoder.py # BioViL-T wrapper
│ ├── projection.py # MLP alignment layer
│ ├── chexpert_classifier.py # CheXpert structured findings classifier
│ └── cxr_vlm.py # full model (encoder + projection + LLM)
├── data/
│ ├── __init__.py
│ ├── dataset.py # CXRInstructDataset (load later)
│ ├── prompt_templates.py # instruction templates for 3 tasks
│ └── collator.py # DataCollator for variable-length inputs
├── training/
│ ├── __init__.py
│ ├── trainer.py # custom HuggingFace Trainer
│ └── train.py # main training entry point
├── evaluation/
│ ├── __init__.py
│ ├── metrics.py # BLEU, ROUGE, ClinicalF1, BERTScore
│ └── evaluate.py # evaluation entry point
├── utils/
│ ├── __init__.py
│ ├── logger.py # logging setup
│ └── checkpoint.py # save/load utilities
├── scripts/
│ ├── train.sh # training shell script
│ └── evaluate.sh # evaluation shell script
└── README.md
```
## Setup
```bash
conda create -n cxr_vlm python=3.10
conda activate cxr_vlm
conda install pytorch==2.0.1 torchvision==0.15.2 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -r requirements.txt
```
## Training
```bash
bash scripts/train.sh
```
## Evaluation
```bash
bash scripts/evaluate.sh
```
|