# CXR-VLM: Unified Vision-Language Model for Chest X-ray Interpretation

A lightweight Vision-Language Model for **three tasks** on chest X-rays:
1. **Findings Generation** — detailed radiological findings
2. **Impression Generation** — concise clinical summary
3. **Visual Question Answering (VQA)** — answer specific clinical questions

## Architecture (based on RaDialog)

```
CXR Image
    │
BioViL-T Encoder (frozen)     ← domain-specific CXR encoder
    │  [768-dim patch features]
MLP Projection Layer (trained) ← align to LLM space
    │  [32 image tokens]
    + CheXpert Findings (structured labels, optional)
    + Task Instruction Prompt
    │
Vicuna-7B + LoRA (LLM fine-tuned via LoRA adapters)
    │
Output Text (findings / impression / answer)
```
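
The text side of the diagram (image tokens, optional CheXpert labels, then a task instruction) can be sketched as a small prompt builder. This is only an illustration under assumptions: the `<image>` placeholder, the instruction wording, and the `build_prompt` helper are hypothetical and are not taken from `data/prompt_templates.py`.

```python
from typing import Optional, Sequence

# Hypothetical marker for where the 32 projected image tokens would be spliced in.
IMAGE_PLACEHOLDER = "<image>"

# Hypothetical instruction wording, one entry per task listed above.
TASK_INSTRUCTIONS = {
    "findings": "Write the findings section of the report for this chest X-ray.",
    "impression": "Write a concise impression for this chest X-ray.",
    "vqa": "Answer the following question about this chest X-ray: {question}",
}


def build_prompt(
    task: str,
    chexpert_labels: Optional[Sequence[str]] = None,
    question: Optional[str] = None,
) -> str:
    """Assemble image placeholder, optional structured labels, and instruction."""
    if task not in TASK_INSTRUCTIONS:
        raise ValueError(f"Unknown task: {task!r}")

    instruction = TASK_INSTRUCTIONS[task]
    if task == "vqa":
        if not question:
            raise ValueError("The 'vqa' task requires a question.")
        instruction = instruction.format(question=question)

    parts = [f"Image: {IMAGE_PLACEHOLDER}"]
    # CheXpert labels are optional, so the line is only added when labels exist.
    if chexpert_labels:
        parts.append("Predicted findings: " + ", ".join(chexpert_labels))
    parts.append(instruction)
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("impression", chexpert_labels=["Cardiomegaly", "Edema"]))
```

Keeping the three tasks behind one builder means the image and label context stay identical across tasks and only the final instruction line changes.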

## Project Structure

```
cxr_vlm/
├── configs/
│   ├── model_config.yaml       # model hyperparameters
│   └── train_config.yaml       # training hyperparameters
├── model/
│   ├── __init__.py
│   ├── image_encoder.py        # BioViL-T wrapper
│   ├── projection.py           # MLP alignment layer
│   ├── chexpert_classifier.py  # CheXpert structured findings classifier
│   └── cxr_vlm.py              # full model (encoder + projection + LLM)
├── data/
│   ├── __init__.py
│   ├── dataset.py              # CXRInstructDataset (load later)
│   ├── prompt_templates.py     # instruction templates for 3 tasks
│   └── collator.py             # DataCollator for variable-length inputs
├── training/
│   ├── __init__.py
│   ├── trainer.py              # custom HuggingFace Trainer
│   └── train.py                # main training entry point
├── evaluation/
│   ├── __init__.py
│   ├── metrics.py              # BLEU, ROUGE, ClinicalF1, BERTScore
│   └── evaluate.py             # evaluation entry point
├── utils/
│   ├── __init__.py
│   ├── logger.py               # logging setup
│   └── checkpoint.py           # save/load utilities
├── scripts/
│   ├── train.sh                # training shell script
│   └── evaluate.sh             # evaluation shell script
└── README.md
```
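
As a rough illustration of what a collator for variable-length inputs has to do, the sketch below right-pads a batch of tokenized examples in plain Python. It is not the code in `data/collator.py`: the `pad_batch` helper, the example field names, and `PAD_TOKEN_ID = 0` are assumptions. The value `-100` is the default `ignore_index` of PyTorch's cross-entropy loss, so padded label positions do not contribute to the loss.

```python
from typing import Dict, List

PAD_TOKEN_ID = 0      # assumption: the real value depends on the tokenizer
IGNORE_INDEX = -100   # default ignore_index of PyTorch's cross-entropy loss


def pad_batch(examples: List[Dict[str, List[int]]]) -> Dict[str, List[List[int]]]:
    """Right-pad every example to the length of the longest one in the batch.

    Each example is assumed to hold 'input_ids' and 'labels' of equal length.
    """
    max_len = max(len(ex["input_ids"]) for ex in examples)
    batch: Dict[str, List[List[int]]] = {
        "input_ids": [],
        "attention_mask": [],
        "labels": [],
    }
    for ex in examples:
        n = len(ex["input_ids"])
        pad = max_len - n
        batch["input_ids"].append(ex["input_ids"] + [PAD_TOKEN_ID] * pad)
        # 1 marks real tokens, 0 marks padding.
        batch["attention_mask"].append([1] * n + [0] * pad)
        # Padded label positions are masked out of the loss.
        batch["labels"].append(ex["labels"] + [IGNORE_INDEX] * pad)
    return batch


if __name__ == "__main__":
    demo = [
        {"input_ids": [5, 6, 7], "labels": [5, 6, 7]},
        {"input_ids": [8], "labels": [8]},
    ]
    print(pad_batch(demo))
```

A real collator would also stack the image tensors and convert these lists to tensors; that part is omitted here.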

## Setup

```bash
conda create -n cxr_vlm python=3.10
conda activate cxr_vlm
conda install pytorch==2.0.1 torchvision==0.15.2 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -r requirements.txt
```

## Training

```bash
bash scripts/train.sh
```

## Evaluation

```bash
bash scripts/evaluate.sh
```