---
tags:
- chest-xray
- radiology
- report-generation
- mimic-cxr
- vision-encoder
license: apache-2.0
---

# LAPVQA — Pretrain (Captioning)

Part of the [LAPVQA collection](https://huggingface.co/collections/dmusingu/lapvqa).

## Description

A **ViT-L/14 encoder + 6-layer causal decoder** trained from scratch on [MIMIC-CXR](https://physionet.org/content/mimic-cxr) to generate full radiology reports from chest X-ray images. Unlike the contrastive pretrain variants, the generative objective forces the encoder to retain fine-grained spatial information, enough to support region-level text generation. The encoder weights (`encoder_final.pt`) are the strongest feature extractor across the LAPVQA downstream tasks.

## Architecture

| Component | Detail |
|---|---|
| Vision backbone | ViT-L/14: 24 layers, 1024-dim, 16 heads, patch size 14, 384 px input |
| Captioning decoder | 6-layer causal transformer, 512-dim, GPT-2 vocabulary (50,257 tokens) |
| Loss | Cross-entropy over report tokens |
| Training data | [MIMIC-CXR](https://physionet.org/content/mimic-cxr) |

## Downstream Evaluation (frozen encoder + linear probe)

| Dataset | Mean AUC |
|---|---|
| NIH CXR-14 (14-class) | 0.686 |
| CheXpert-5 (5-class) | 0.808 |

The captioning-pretrained encoder matches or exceeds the contrastive variants on both classification benchmarks. It is also the best-performing encoder when used downstream on DiffVQA.
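The table reports a single mean AUC per benchmark. Assuming this is the unweighted (macro) average of per-class ROC AUCs, which is the usual convention for multi-label chest X-ray benchmarks, the metric can be sketched in plain Python. The function names `binary_auc` and `mean_auc` are hypothetical illustrations, not part of the `lapvqa` package, and this is not the evaluation code that produced the numbers above.

```python
from typing import List, Sequence


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined without both positive and negative examples")
    # Fraction of positive/negative pairs in which the positive is scored higher.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc(labels: List[List[int]], scores: List[List[float]]) -> float:
    """Unweighted (macro) mean of per-class AUCs for a multi-label task.

    labels[i][c] is the 0/1 label of image i for class c;
    scores[i][c] is the linear probe's score (logit or probability) for that class.
    """
    num_classes = len(labels[0])
    per_class = [
        binary_auc([row[c] for row in labels], [row[c] for row in scores])
        for c in range(num_classes)
    ]
    return sum(per_class) / num_classes


# Toy example: 4 images, 2 classes (class 0 AUC = 1.0, class 1 AUC = 0.75).
labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
scores = [[0.9, 0.2], [0.3, 0.8], [0.6, 0.4], [0.1, 0.5]]
print(mean_auc(labels, scores))  # 0.875
```

The pairwise loop is quadratic in the number of images and only meant to make the definition explicit. For full evaluation sets, scikit-learn's `roc_auc_score(y_true, y_score, average="macro")` computes the same macro average efficiently.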
## Files

| File | Description |
|---|---|
| `encoder_final.pt` | Vision encoder weights (used as the frozen feature extractor downstream) |
| `model_best.pt` | Full encoder + decoder at the best validation loss |

## Usage

```python
import torch
from lapvqa.pretrain.model import CaptioningModel

# Full encoder + decoder checkpoint (best validation loss)
ckpt = torch.load("model_best.pt", map_location="cpu")
model = CaptioningModel()
model.load_state_dict(ckpt)
model.eval()

# To use only the encoder as a frozen feature extractor,
# load the standalone encoder weights into the vision encoder:
enc_weights = torch.load("encoder_final.pt", map_location="cpu")
model.vision_encoder.load_state_dict(enc_weights)

# with torch.no_grad():
#     vis_tokens = model.vision_encoder(images)  # [B, 256, 1024]
```

## Citation

If you use these weights, please cite MIMIC-CXR:

```bibtex
@article{johnson2019mimic,
  title   = {MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports},
  author  = {Johnson, Alistair EW and others},
  journal = {Scientific Data},
  volume  = {6},
  pages   = {317},
  year    = {2019}
}
```