dmusingu/lapvqa-diffvqa-pretrain-captioning

@@ -14,9 +14,9 @@ Part of the [LAPVQA collection](https://huggingface.co/collections/dmusingu/lapv
 ## Description
-DiffVQA task head trained on top of the **LAPVQA captioning-pretrained encoder**
 ([`lapvqa-pretrain-captioning`](https://huggingface.co/dmusingu/lapvqa-pretrain-captioning)).
-The encoder is kept frozen; this file contains the task head only.
 ## Results (test set)
@@ -24,8 +24,15 @@ The encoder is kept frozen; this file contains the task head only.
 |---|---|---|---|
 | 0.468 | 0.562 | 0.303 | 0.938 |
-## Files
-| File | Description |
-|---|---|
-| `pretrain-captioning_best.pt` | DiffVQA head (encoder not included) |

 ## Description
+DiffVQA head trained on the frozen **LAPVQA captioning-pretrained encoder**
 ([`lapvqa-pretrain-captioning`](https://huggingface.co/dmusingu/lapvqa-pretrain-captioning)).
+The checkpoint is a plain `DiffVQAHead` state dict (`vis_dim=1024`); the encoder is not included.
 ## Results (test set)
 |---|---|---|---|
 | 0.468 | 0.562 | 0.303 | 0.938 |
+## Loading
+```python
+import torch
+from lapvqa.diffvqa.model import DiffVQAHead
+ckpt = torch.load("pretrain-captioning_best.pt", map_location="cpu")
+head = DiffVQAHead(vis_dim=1024)
+head.load_state_dict(ckpt)
+head.eval()
+# The encoder is not in this file; pair the head with encoder_final.pt from lapvqa-pretrain-captioning.
+```
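
The card describes the checkpoint as a plain state dict, so `load_state_dict(ckpt)` should work directly. If a checkpoint instead turns out to be wrapped in a training-time dict, a small guard like the one below can unwrap it before loading. This is only a sketch: `unwrap_state_dict` is a hypothetical helper, not part of `lapvqa`, and the wrapper keys it checks (`state_dict`, `model_state_dict`) are common conventions assumed here, not something this repository documents.

```python
from collections.abc import Mapping

# Assumed wrapper keys: common conventions, not documented by this repo.
ASSUMED_WRAPPER_KEYS = ("state_dict", "model_state_dict")


def unwrap_state_dict(ckpt):
    """Return a flat {parameter_name: value} dict from a loaded checkpoint.

    A plain state dict is returned unchanged (as a shallow copy). If the
    checkpoint is a wrapper dict holding the state dict under one of the
    assumed keys, the inner mapping is returned instead.
    """
    if not isinstance(ckpt, Mapping):
        raise TypeError(f"expected a mapping, got {type(ckpt).__name__}")
    for key in ASSUMED_WRAPPER_KEYS:
        inner = ckpt.get(key)
        if isinstance(inner, Mapping):
            return dict(inner)
    return dict(ckpt)
```

With this helper the load call would become `head.load_state_dict(unwrap_state_dict(ckpt))`. For the checkpoint described in this card, the helper is expected to be a pass-through.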