dmusingu committed on
Commit 804f5c7 · verified · 1 Parent(s): 259e8d5

Update README with model loading code

Files changed (1)
  1. README.md +29 -13
README.md CHANGED
@@ -13,16 +13,32 @@ Part of the [LAPVQA collection](https://huggingface.co/collections/dmusingu/lapv
 
 ## Description
 
-VQA task heads trained with **end-to-end fine-tuning** — the encoder weights are
-updated jointly with the task head, providing a baseline for how much improvement
-domain adaptation yields over the frozen-encoder setup in [`lapvqa-vqa`](https://huggingface.co/dmusingu/lapvqa-vqa).
-
-## Files
-
-| File | Encoder backbone |
-|---|---|
-| `clip-vit-l14_best.pt` | CLIP ViT-L/14 (fine-tuned) |
-| `siglip_best.pt` | SigLIP (fine-tuned) |
-| `florence2_best.pt` | Florence-2 (fine-tuned) |
-| `coca_best.pt` | CoCa (fine-tuned) |
-| `mae-vit-l16_best.pt` | MAE ViT-L/16 (fine-tuned) |
+VQA task heads trained with **end-to-end fine-tuning** (encoder + head jointly).
+Provides a baseline for comparison with the frozen-encoder variant
+[`lapvqa-vqa`](https://huggingface.co/dmusingu/lapvqa-vqa).
+Each `.pt` file is a plain state dict of `VQAHead`.
+
+| File | Encoder | vis_dim |
+|---|---|---|
+| `clip-vit-l14_best.pt` | CLIP ViT-L/14 (fine-tuned) | 1024 |
+| `siglip_best.pt` | SigLIP (fine-tuned) | 1152 |
+| `florence2_best.pt` | Florence-2 (fine-tuned) | 1024 |
+| `coca_best.pt` | CoCa (fine-tuned) | 768 |
+| `mae-vit-l16_best.pt` | MAE ViT-L/16 (fine-tuned) | 1024 |
+
+## Loading
+
+```python
+import torch
+from lapvqa.vqa.model import VQAHead
+
+VIS_DIMS = {
+    "clip-vit-l14": 1024, "siglip": 1152,
+    "florence2": 1024, "coca": 768, "mae-vit-l16": 1024,
+}
+encoder = "siglip"
+ckpt = torch.load(f"{encoder}_best.pt", map_location="cpu")
+head = VQAHead(vis_dim=VIS_DIMS[encoder])
+head.load_state_dict(ckpt)
+head.eval()
+```
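The updated README pairs each checkpoint with a `vis_dim` because a plain state dict stores only tensors, not the constructor arguments, so the head must be rebuilt with the right width before the weights are loaded. The sketch below illustrates that round trip without the real package. `ToyHead` is a hypothetical stand-in for `lapvqa.vqa.model.VQAHead`: only the `vis_dim` argument comes from the README, and its layers are invented. It also assumes that the real head's first layer consumes `vis_dim`-sized features, so a wrong value fails PyTorch's shape check. `weights_only=True` is a standard `torch.load` argument and suffices for a plain state dict.

```python
import io

import torch
import torch.nn as nn


# Hypothetical stand-in for VQAHead: only `vis_dim` is taken from the README;
# the layer layout, `hidden`, and `num_answers` are invented for illustration.
class ToyHead(nn.Module):
    def __init__(self, vis_dim: int, hidden: int = 16, num_answers: int = 4):
        super().__init__()
        self.proj = nn.Linear(vis_dim, hidden)  # weight shape depends on vis_dim
        self.classifier = nn.Linear(hidden, num_answers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(torch.relu(self.proj(x)))


def save_plain_state_dict(module: nn.Module) -> bytes:
    """Serialize only the parameter tensors, i.e. a plain state dict."""
    buf = io.BytesIO()
    torch.save(module.state_dict(), buf)
    return buf.getvalue()


def load_head(blob: bytes, vis_dim: int) -> ToyHead:
    # weights_only=True limits unpickling to tensors and plain containers,
    # which is all a plain state dict contains.
    state = torch.load(io.BytesIO(blob), map_location="cpu", weights_only=True)
    head = ToyHead(vis_dim=vis_dim)
    head.load_state_dict(state)  # raises RuntimeError on a shape mismatch
    return head.eval()           # eval() returns the module itself


# Simulate a checkpoint with the SigLIP width from the table (1152).
blob = save_plain_state_dict(ToyHead(vis_dim=1152))
head = load_head(blob, vis_dim=1152)  # matching width: loads cleanly

try:
    load_head(blob, vis_dim=1024)  # wrong width for this checkpoint
    mismatch_detected = False
except RuntimeError:
    mismatch_detected = True
```

Under that assumption, picking `vis_dim` from the wrong table row surfaces as an immediate `load_state_dict` error rather than silently producing a misconfigured head.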