---
library_name: transformers
tags:
- vision
- image-reconstruction
- siglip2
- safetensors
---

# F2P Decoder

Hugging Face `AutoModel` wrapper for the SigLIP2 feature-to-pixel decoder used in this repository.

```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "toilaluan/f2p_decoder",
    trust_remote_code=True,
).eval()

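# Dummy input with shape (batch, tokens, hidden_size)
features = torch.randn(1, 257, 1152)
with torch.no_grad():  # inference only, no gradients needed
    reconstruction = model(features)
print(reconstruction.shape)  # torch.Size([1, 3, 224, 224])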
```

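The model expects SigLIP2 patch features with a CLS token, for example from
`google/siglip2-so400m-patch14-224`. In the snippet above the input has shape
`(batch, 257, 1152)`, which matches a 16 x 16 grid of 14-pixel patches over a
224 x 224 image (256 tokens) plus one CLS token, with a hidden size of 1152.
The output is an image tensor of shape `(batch, 3, 224, 224)` in the decoder's
reconstructed pixel space.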
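
As a rough sanity check before calling the decoder, the expected token count can be derived from the image and patch size. The sketch below is not part of this repository: `expected_tokens` and `check_feature_shape` are hypothetical helper names, and the defaults (224-pixel images, 14-pixel patches, hidden size 1152) are assumptions taken from the example above.

```python
def expected_tokens(image_size: int = 224, patch_size: int = 14,
                    cls_token: bool = True) -> int:
    """Token count for a square image split into square patches.

    Hypothetical helper; defaults are assumed from the example input shape.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    grid = image_size // patch_size  # patches per side
    return grid * grid + (1 if cls_token else 0)


def check_feature_shape(shape, hidden_size: int = 1152) -> None:
    """Raise ValueError unless shape looks like (batch, tokens, hidden_size)."""
    if len(shape) != 3:
        raise ValueError(f"expected 3 dimensions, got {len(shape)}")
    _, tokens, hidden = shape
    if tokens != expected_tokens():
        raise ValueError(f"expected {expected_tokens()} tokens, got {tokens}")
    if hidden != hidden_size:
        raise ValueError(f"expected hidden size {hidden_size}, got {hidden}")


print(expected_tokens())  # 257
check_feature_shape((1, 257, 1152))  # passes silently
```

With a real tensor, `check_feature_shape(tuple(features.shape))` would catch a missing CLS token or a feature extractor with a different hidden size before the forward pass.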

Source `.pt` checkpoint: `nyu-visionx/siglip2_decoder/model.pt`.