---
tags:
- chest-xray
- radiology
- report-generation
- mimic-cxr
license: apache-2.0
---

# LAPVQA — Radiology Report Generation (Native / End-to-end)

Part of the [LAPVQA collection](https://huggingface.co/collections/dmusingu/lapvqa).

## Description

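Radiology report generation (RRG) decoders trained **end-to-end** together with their vision encoders, which are fine-tuned rather than kept frozen.
Each checkpoint is a Python dict with the keys `state_dict`, `vis_dim`, `d_model`, `num_layers`, `nhead`, `encoder`, `epoch` and `val_bleu4`.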

| File | Encoder | vis_dim |
|---|---|---|
| `clip-vit-l14.pt` | CLIP ViT-L/14 (fine-tuned) | 1024 |
| `siglip.pt` | SigLIP (fine-tuned) | 1152 |
| `florence2.pt` | Florence-2 (fine-tuned) | 1024 |
| `coca.pt` | CoCa (fine-tuned) | 768 |
| `mae-vit-l16.pt` | MAE ViT-L/16 (fine-tuned) | 1024 |
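Because the head's input width must match the encoder's feature width, it can help to check a loaded checkpoint against the table before building the head. The sketch below is plain Python; `validate_checkpoint`, `EXPECTED_VIS_DIM` and `REQUIRED_KEYS` are hypothetical helpers written for this card, not part of `lapvqa`, and the key list is taken from the description above.

```python
# Hypothetical helper, not part of lapvqa.
# Expected vis_dim per checkpoint file, copied from the table in this card.
EXPECTED_VIS_DIM = {
    "clip-vit-l14.pt": 1024,
    "siglip.pt": 1152,
    "florence2.pt": 1024,
    "coca.pt": 768,
    "mae-vit-l16.pt": 1024,
}

# Keys every checkpoint dict is documented to contain.
REQUIRED_KEYS = {"state_dict", "vis_dim", "d_model", "num_layers",
                 "nhead", "encoder", "epoch", "val_bleu4"}


def validate_checkpoint(filename: str, ckpt: dict) -> None:
    """Raise ValueError if ckpt lacks a documented key or if its vis_dim
    disagrees with the table entry for this filename."""
    missing = REQUIRED_KEYS - set(ckpt)
    if missing:
        raise ValueError(f"{filename}: missing keys {sorted(missing)}")
    expected = EXPECTED_VIS_DIM.get(filename)
    # Unknown filenames are only checked for keys, not for vis_dim.
    if expected is not None and ckpt["vis_dim"] != expected:
        raise ValueError(
            f"{filename}: vis_dim={ckpt['vis_dim']}, expected {expected}")
```

A typical call would be `validate_checkpoint("mae-vit-l16.pt", torch.load("mae-vit-l16.pt", map_location="cpu"))`, placed before the head is constructed.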

## Results (MIMIC-CXR test set, MAE-ViT-L/16)

| BLEU-4 | ROUGE-L | RadGraph-s |
|---|---|---|
| 0.032 | 0.164 | 0.195 |
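The card does not state which BLEU implementation produced the BLEU-4 score. Tokenization, smoothing, and corpus- versus sentence-level averaging all change the value, so the sketch below is only an illustration of standard corpus-level BLEU-4 (clipped n-gram precision for n = 1..4, geometric mean, brevity penalty). It assumes one reference report per study, pre-tokenized input and no smoothing. `corpus_bleu4` is a hypothetical helper and is not expected to reproduce the 0.032 above.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu4(hypotheses, references):
    """Corpus-level BLEU-4 with one reference per hypothesis.

    Both arguments are lists of token lists. No smoothing is applied, so
    the score is 0.0 whenever some n-gram order has zero matches.
    """
    matches = [0] * 4
    totals = [0] * 4
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, 5):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g])
                                  for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the four modified precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / 4
    # Brevity penalty: penalise output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)
```

For example, the hypothesis "the heart is normal" against the reference "the heart is normal in size" matches every n-gram but is shorter than the reference, so its score is only the brevity penalty, exp(-0.5).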

## Loading

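```python
import torch
from lapvqa.rrg.heads import ReportGenerationHead

# The checkpoint is a plain dict: weights plus the hyperparameters needed to rebuild the head.
ckpt = torch.load("mae-vit-l16.pt", map_location="cpu")

# Rebuild the head with the stored hyperparameters so the weight shapes match.
head = ReportGenerationHead(
    vis_dim    = ckpt["vis_dim"],
    d_model    = ckpt["d_model"],
    num_layers = ckpt["num_layers"],
    nhead      = ckpt["nhead"],
)
head.load_state_dict(ckpt["state_dict"])
head.eval()  # inference mode (e.g. disables dropout)
```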