SigLIP Base GGUF
GGUF conversion of the google/siglip-base-patch16-384 vision encoder for use with CrispEmbed.
Produces 768-dimensional image embeddings by encoding 384x384 images with a 12-layer ViT over 16x16 patches.
Parity
| Quant | Cosine vs HF | Size |
|---|---|---|
| F32 | 0.996 (mean pool) | 355 MB |
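
The "Cosine vs HF" column is read here as the cosine similarity between the embedding produced from the GGUF file and a reference embedding from the Hugging Face model for the same image. That reading is an assumption, as is everything about how the two vectors are obtained. A minimal sketch of the metric itself, using short made-up vectors in place of the real 768-D embeddings:

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical 4-D stand-ins for the real 768-D embeddings (illustrative values only).
gguf_embedding = [0.10, 0.20, 0.30, 0.40]
hf_embedding = [0.11, 0.19, 0.31, 0.39]

print(round(cosine_similarity(gguf_embedding, hf_embedding), 4))
```

A value of 1.0 means the two vectors point in exactly the same direction, so a score close to 1.0 indicates that the converted model closely tracks the reference.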
Quick Start
# Download
huggingface-cli download cstr/siglip-base-GGUF siglip-base.gguf --local-dir .
# Encode image
./crispembed -m siglip-base.gguf --image photo.jpg
# Print dimension
./crispembed -m siglip-base.gguf --dim # → 768
Architecture
- Model: SigLIP ViT-B/16 (Google, Apache 2.0)
- Vision: 12 layers, 768-D, 12 heads, 3072 intermediate
- Image: 384×384, 16×16 patches → 576 tokens
- Pooling: Mean pool over patch tokens
- Normalization: L2 normalized output
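
The token count and the pooling steps listed above can be illustrated with a few lines of Python. This is only a sketch of the arithmetic, not CrispEmbed's implementation: the function names are hypothetical, and the tiny 2-D token vectors stand in for the real 576 tokens of 768 dimensions each.

```python
import math

IMAGE_SIZE = 384  # input resolution from the list above
PATCH_SIZE = 16   # patch edge length from the list above


def num_patch_tokens(image_size, patch_size):
    """Number of non-overlapping square patches in a square image."""
    side = image_size // patch_size
    return side * side


def mean_pool(tokens):
    """Average a list of equal-length token vectors element-wise."""
    count = len(tokens)
    dim = len(tokens[0])
    return [sum(token[i] for token in tokens) / count for i in range(dim)]


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


print(num_patch_tokens(IMAGE_SIZE, PATCH_SIZE))  # 576, since (384 / 16) ** 2 = 24 ** 2

# Two hypothetical 2-D patch tokens (illustrative values only).
patch_tokens = [[3.0, 0.0], [0.0, 4.0]]
pooled = mean_pool(patch_tokens)   # [1.5, 2.0]
embedding = l2_normalize(pooled)   # unit-length vector
print(embedding)
```

Because the output is L2 normalized, the dot product of two embeddings equals their cosine similarity, which is convenient for nearest-neighbor search.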