---
library_name: transformers
tags:
- vision
- image-reconstruction
- siglip2
- safetensors
---

# F2P Decoder

Hugging Face `AutoModel` wrapper for the SigLIP2 feature-to-pixel (F2P) decoder used in this repository.

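```python
import torch
from transformers import AutoModel

# trust_remote_code=True lets transformers load the custom model code shipped in the model repository.
model = AutoModel.from_pretrained(
    "toilaluan/f2p_decoder",
    trust_remote_code=True,
).eval()

# Random stand-in for SigLIP2 features: (batch, tokens, hidden_dim).
features = torch.randn(1, 257, 1152)
with torch.no_grad():  # inference only, so skip gradient tracking
    reconstruction = model(features)
print(reconstruction.shape)  # torch.Size([1, 3, 224, 224])
```
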
The model expects SigLIP2 patch features with a CLS token (shape `(batch, 257, 1152)` in the example above), for example from `google/siglip2-so400m-patch14-224`. The output is an image tensor of shape `(batch, 3, 224, 224)` in the decoder's reconstructed pixel space.

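The `(1, 257, 1152)` input shape used in the example is consistent with a 224-pixel image split into 14-pixel patches plus one CLS token. The following is a minimal sketch of that arithmetic. `expected_feature_shape` is a hypothetical helper, not part of this repository, and the hidden size of 1152 is taken from the example tensor rather than read from an encoder config.

```python
def expected_feature_shape(image_size=224, patch_size=14, hidden_dim=1152,
                           batch_size=1, with_cls=True):
    """Return the (batch, tokens, hidden) shape the decoder input is assumed to have.

    Hypothetical helper for illustration only. The defaults mirror the
    `patch14-224` checkpoint name and the example tensor in this card.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size  # 224 // 14 = 16
    num_tokens = patches_per_side ** 2           # 16 * 16 = 256 patch tokens
    if with_cls:
        num_tokens += 1                          # one extra CLS token -> 257
    return (batch_size, num_tokens, hidden_dim)


print(expected_feature_shape())  # (1, 257, 1152)
```

Comparing `tuple(features.shape)` against this helper before calling the decoder may catch features that were extracted without the CLS token or at a different resolution.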

Source `.pt` checkpoint: `nyu-visionx/siglip2_decoder/model.pt`.