SigLIP2 NaViT Vision Encoder (with Google Pretrained Weights)
This is a SigLIP2 NaViT (Native Resolution Vision Transformer) vision encoder. The encoder weights come from Google's pretrained SigLIP2 checkpoint. The merger layer, which maps encoder features to the 896-dimensional output, is randomly initialized.
Model Details
- Model Type: Vision Encoder
- Architecture: SigLIP2 with Native Resolution ViT
- Base Checkpoint: Google SigLIP2
- Precision: FP16 (float16) for reduced storage
- Hidden Size: 768
- Number of Layers: 12
- Number of Attention Heads: 12
- Patch Size: 16
- Spatial Merge Size: 2
- Output Hidden Size: 896
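The patch size and spatial merge size above determine how many tokens an image produces. The calculation below is a back-of-the-envelope sketch. It assumes that the processor rounds each image side to the nearest multiple of `patch_size * spatial_merge_size` (32 pixels) while keeping the aspect ratio, similar to Qwen2-VL-style NaViT preprocessing. This card does not state that assumption, and the sketch ignores any min/max pixel clamping a real processor may apply. The function name `token_counts` is hypothetical, and the 4 × 768 merger input width is also an assumption, based on the common design that concatenates each 2×2 block of patches.

```python
# Back-of-the-envelope token arithmetic for a NaViT-style encoder.
# ASSUMPTION: each image side is rounded to the nearest multiple of
# PATCH_SIZE * MERGE_SIZE; real processors may also clamp total pixels.

PATCH_SIZE = 16        # from the model card
MERGE_SIZE = 2         # spatial merge size, from the model card
HIDDEN_SIZE = 768      # encoder width, from the model card
OUT_HIDDEN_SIZE = 896  # output width, from the model card


def token_counts(height: int, width: int) -> dict:
    """Return the resized shape, patch grid, token counts and widths for one image."""
    factor = PATCH_SIZE * MERGE_SIZE  # 32 pixels
    # Round each side to the nearest multiple of `factor`, keeping at least one block.
    h = max(factor, round(height / factor) * factor)
    w = max(factor, round(width / factor) * factor)
    grid_h, grid_w = h // PATCH_SIZE, w // PATCH_SIZE
    num_patches = grid_h * grid_w                  # tokens inside the encoder
    num_merged = num_patches // (MERGE_SIZE ** 2)  # tokens after 2x2 merging
    return {
        "resized": (h, w),
        "grid": (grid_h, grid_w),
        "encoder_tokens": num_patches,
        "output_tokens": num_merged,
        # ASSUMPTION: the merger concatenates 2x2 neighbours -> 4 * 768 = 3072 inputs
        "merger_in_dim": HIDDEN_SIZE * MERGE_SIZE ** 2,
        "merger_out_dim": OUT_HIDDEN_SIZE,
    }


for size in [(224, 224), (480, 640), (1080, 1920)]:
    print(size, token_counts(*size))
```

Under these assumptions, a 224×224 image yields 196 encoder tokens and 49 output tokens. A 480×640 image yields 1200 and 300, and a 1080×1920 frame (rounded to 1088×1920) yields 8160 and 2040. The sequence length therefore grows with resolution instead of being fixed by a square resize.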
Initialization
- Vision Encoder: Initialized from Google's SigLIP2 pretrained checkpoint
- Vision Merger: Randomly initialized; it must be fine-tuned before its outputs are useful
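This card does not document the merger's internals. A common design for NaViT-style encoders, such as Qwen2-VL's patch merger, groups each 2×2 block of neighbouring patch features and concatenates them (4 × 768 = 3072 values). It then projects the result to the output width (896). The pure-Python sketch below assumes that design and uses a single randomly initialized linear layer. The names `merge_2x2`, `random_linear` and `project` are hypothetical, and the standard deviation 0.02 was chosen only for illustration. The real merger may add normalization or a multi-layer MLP.

```python
import random

# ASSUMPTION: the merger concatenates each 2x2 block of neighbouring patch
# features and applies one linear projection. The real merger may differ.


def merge_2x2(grid, merge=2):
    """Group a row-major grid[h][w] of feature vectors into merge x merge blocks.

    Returns one concatenated vector per block, in row-major block order.
    """
    h, w = len(grid), len(grid[0])
    assert h % merge == 0 and w % merge == 0, "grid must be divisible by merge size"
    merged = []
    for bi in range(0, h, merge):
        for bj in range(0, w, merge):
            vec = []
            for di in range(merge):
                for dj in range(merge):
                    vec.extend(grid[bi + di][bj + dj])
            merged.append(vec)
    return merged


def random_linear(in_dim, out_dim, seed=0, std=0.02):
    """Gaussian-initialized weight matrix (out_dim x in_dim) and a zero bias."""
    rng = random.Random(seed)
    weight = [[rng.gauss(0.0, std) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def project(vec, weight, bias):
    """Compute weight @ vec + bias for a single vector."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weight, bias)]


# Toy sizes so the example runs quickly; with this card's sizes the
# merger input would be 4 * 768 = 3072 and the output 896.
hidden, out_dim = 3, 5
grid = [[[float(i * 10 + j)] * hidden for j in range(4)] for i in range(4)]  # 4x4 patches
tokens = merge_2x2(grid)                   # 4 merged tokens, each 4 * hidden long
weight, bias = random_linear(4 * hidden, out_dim)
outputs = [project(t, weight, bias) for t in tokens]
print(len(tokens), len(tokens[0]), len(outputs[0]))  # 4 12 5
```

The projection weights start as random values, so the merged features carry no learned alignment with any downstream model until the merger is trained.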
Usage
from transformers import AutoModel, AutoImageProcessor
from PIL import Image
import torch
# Load model and processor
model = AutoModel.from_pretrained("wtzhang-nlp/siglip2-navit-google", trust_remote_code=True)
processor = AutoImageProcessor.from_pretrained("wtzhang-nlp/siglip2-navit-google", trust_remote_code=True)
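# Load and preprocess the image (convert to RGB in case the file is grayscale or has an alpha channel)
image = Image.open("path/to/image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
# Forward pass (no gradients are needed for inference)
with torch.no_grad():
    outputs = model(**inputs)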
print(f"Output shape: {outputs.last_hidden_state.shape}")
# Expected: [batch_size, num_patches, 896]
License
Apache 2.0