---
library_name: transformers
tags:
- vision
- image-reconstruction
- siglip2
- safetensors
---

# F2P Decoder

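A Hugging Face `AutoModel` wrapper for the SigLIP2 feature-to-pixel (F2P) decoder used in this repository. Because the model class is defined in this repository rather than in `transformers` itself, loading it requires `trust_remote_code=True`.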

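```python
import torch
from transformers import AutoModel

# The decoder class is defined in this repo, so remote code must be trusted.
model = AutoModel.from_pretrained(
    "toilaluan/f2p_decoder",
    trust_remote_code=True,
).eval()

# Dummy SigLIP2 features: (batch, 1 CLS + 16x16 patch tokens, hidden size 1152).
features = torch.randn(1, 257, 1152)

# Inference only, so skip gradient tracking.
with torch.no_grad():
    reconstruction = model(features)

print(reconstruction.shape)  # (1, 3, 224, 224)
```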

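The model expects SigLIP2 patch features with a CLS token, for example from
`google/siglip2-so400m-patch14-224`. For that encoder, a 224×224 input with
14-pixel patches yields a 16×16 grid of 256 patch tokens; adding the CLS token
gives the 257 tokens of width 1152 used above. The output is a
`(batch, 3, 224, 224)` image tensor in the decoder's reconstructed pixel space.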
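The 257-token sequence length follows directly from the encoder's patch geometry. Below is a minimal sketch for validating feature shapes before calling the decoder, assuming the `google/siglip2-so400m-patch14-224` layout described above (224-pixel input, 14-pixel patches, hidden size 1152, one CLS token). The helpers `expected_num_tokens` and `check_feature_shape` are hypothetical illustrations, not part of this model's API.

```python
def expected_num_tokens(image_size: int = 224, patch_size: int = 14, cls_token: bool = True) -> int:
    """Token count for a square image split into non-overlapping square patches.

    Defaults are an assumption matching google/siglip2-so400m-patch14-224
    plus one CLS token, as described in this README.
    """
    if image_size % patch_size != 0:
        raise ValueError(f"image_size {image_size} is not divisible by patch_size {patch_size}")
    grid = image_size // patch_size  # 224 // 14 = 16 patches per side
    return grid * grid + (1 if cls_token else 0)  # 16 * 16 + 1 = 257


def check_feature_shape(shape, image_size: int = 224, patch_size: int = 14, hidden_size: int = 1152) -> None:
    """Raise ValueError unless shape is (batch, num_tokens, hidden_size) for the assumed encoder."""
    if len(shape) != 3:
        raise ValueError(f"expected a 3-D (batch, tokens, dim) shape, got {tuple(shape)}")
    _, num_tokens, dim = shape
    expected = expected_num_tokens(image_size, patch_size, cls_token=True)
    if num_tokens != expected:
        raise ValueError(f"expected {expected} tokens (patches + CLS), got {num_tokens}")
    if dim != hidden_size:
        raise ValueError(f"expected hidden size {hidden_size}, got {dim}")


check_feature_shape((1, 257, 1152))  # passes silently
print(expected_num_tokens())  # 257
print(expected_num_tokens(cls_token=False))  # 256
```

Because `torch.Size` is a subclass of `tuple`, `check_feature_shape(features.shape)` also works directly on a tensor's shape.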

Source `.pt` checkpoint: `nyu-visionx/siglip2_decoder/model.pt`.