CXR-VLM: Unified Vision-Language Model for Chest X-ray Interpretation

A lightweight Vision-Language Model for three tasks on chest X-rays:

Findings Generation — detailed radiological findings
Impression Generation — concise clinical summary
Visual Question Answering (VQA) — answer specific clinical questions

Architecture (based on RaDialog)

CXR Image
    │
BioViL-T Encoder (frozen)     ← domain-specific CXR encoder
    │  [768-dim patch features]
MLP Projection Layer (trained) ← align to LLM space
    │  [32 image tokens]
    + CheXpert Findings (structured labels, optional)
    + Task Instruction Prompt
    │
Vicuna-7B + LoRA (LLM trained with LoRA)
    │
Output Text (findings / impression / answer)

Project Structure

cxr_vlm/
├── configs/
│   ├── model_config.yaml       # model hyperparameters
│   └── train_config.yaml       # training hyperparameters
├── model/
│   ├── __init__.py
│   ├── image_encoder.py        # BioViL-T wrapper
│   ├── projection.py           # MLP alignment layer
│   ├── chexpert_classifier.py  # CheXpert structured findings classifier
│   └── cxr_vlm.py              # full model (encoder + projection + LLM)
├── data/
│   ├── __init__.py
│   ├── dataset.py              # CXRInstructDataset (load later)
│   ├── prompt_templates.py     # instruction templates for 3 tasks
│   └── collator.py             # DataCollator for variable-length inputs
├── training/
│   ├── __init__.py
│   ├── trainer.py              # custom HuggingFace Trainer
│   └── train.py                # main training entry point
├── evaluation/
│   ├── __init__.py
│   ├── metrics.py              # BLEU, ROUGE, ClinicalF1, BERTScore
│   └── evaluate.py             # evaluation entry point
├── utils/
│   ├── __init__.py
│   ├── logger.py               # logging setup
│   └── checkpoint.py           # save/load utilities
├── scripts/
│   ├── train.sh                # training shell script
│   └── evaluate.sh             # evaluation shell script
└── README.md

Setup

conda create -n cxr_vlm python=3.10
conda activate cxr_vlm
conda install pytorch==2.0.1 torchvision==0.15.2 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -r requirements.txt

Training

bash scripts/train.sh

Evaluation

bash scripts/evaluate.sh