---
license: apache-2.0
base_model: google/siglip2-base-patch16-224
tags:
- onnx
- vision
- image-text-matching
- nebula
---
# siglip2-base-patch16-224 (ONNX)
This repository contains [Google's SigLIP 2 base/224](https://huggingface.co/google/siglip2-base-patch16-224) exported to ONNX for CPU inference. It is used by [Nebula](https://github.com/diegohh0411/nebula) for local, offline image search.
## What's inside
| File | Description |
|---|---|
| `model.onnx` | Combined vision + text encoder (~110 MB) |
| `tokenizer.json` | Tokenizer for the text encoder (SigLIP 2 uses the multilingual Gemma tokenizer) |
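The text branch takes `input_ids` built with `tokenizer.json`. Below is a minimal sketch of packing tokenized prompts into the `[text_batch, seq_len]` int64 tensor. It assumes a fixed length of 64 tokens, matching the `padding="max_length", max_length=64` setting used in the examples on Google's SigLIP 2 model cards. `pad_input_ids` is a hypothetical helper. The token IDs themselves would come from the `tokenizers` library, e.g. `Tokenizer.from_file("tokenizer.json").encode(text.lower()).ids`. The pad ID would come from `token_to_id("<pad>")`, assuming the pad token is named `<pad>` as in the Gemma vocabulary. Lowercasing follows the Hugging Face SigLIP documentation, which notes that the model was trained on lowercased text.

```python
import numpy as np

# Assumption: SigLIP 2 text inputs are padded/truncated to 64 tokens.
SEQ_LEN = 64


def pad_input_ids(id_lists, pad_id, seq_len=SEQ_LEN):
    """Pack token-ID lists into a fixed-width int64 batch for `input_ids`.

    Hypothetical helper: each row is truncated to `seq_len` and
    right-padded with `pad_id`.
    """
    batch = np.full((len(id_lists), seq_len), pad_id, dtype=np.int64)
    for row, ids in enumerate(id_lists):
        ids = list(ids)[:seq_len]      # truncate prompts longer than seq_len
        batch[row, : len(ids)] = ids   # copy tokens; the remainder stays pad_id
    return batch
```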
## Model inputs & outputs
The single `model.onnx` file contains both encoders, and the graph expects both inputs on every call. To run only one encoder, feed a small placeholder tensor to the input you don't need and request only the outputs you want.
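Here is a rough sketch of the placeholder approach, assuming `session` is an `onnxruntime.InferenceSession` opened on `model.onnx`. The helper names `embed_images` and `embed_texts` are hypothetical. The placeholder shapes (a 1x1 `input_ids`, a single blank 224x224 image) are also assumptions, chosen as the smallest inputs the dynamic axes in the tables below should accept.

```python
import numpy as np


def embed_images(session, pixel_values):
    """Return `image_embeds` for a [N, 3, 224, 224] batch.

    `session` is assumed to be an onnxruntime.InferenceSession over model.onnx.
    The graph needs both inputs, so `input_ids` gets a 1x1 placeholder and
    only the `image_embeds` output is requested.
    """
    placeholder_ids = np.zeros((1, 1), dtype=np.int64)
    (image_embeds,) = session.run(
        ["image_embeds"],
        {"pixel_values": np.asarray(pixel_values, dtype=np.float32),
         "input_ids": placeholder_ids},
    )
    return image_embeds


def embed_texts(session, input_ids):
    """Return `text_embeds` for a [T, seq_len] batch, using a blank placeholder image."""
    placeholder_pixels = np.zeros((1, 3, 224, 224), dtype=np.float32)
    (text_embeds,) = session.run(
        ["text_embeds"],
        {"pixel_values": placeholder_pixels,
         "input_ids": np.asarray(input_ids, dtype=np.int64)},
    )
    return text_embeds
```

Usage would look like `session = onnxruntime.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])` followed by `embed_images(session, batch)`. Passing the session in, rather than opening it inside each helper, lets one loaded model be reused across many batches.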
**Inputs**
| Name | Shape | dtype |
|---|---|---|
| `pixel_values` | `[image_batch, 3, 224, 224]` | float32 |
| `input_ids` | `[text_batch, seq_len]` | int64 |
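`pixel_values` must match the preprocessing the model was trained with. The sketch below assumes the defaults of the Hugging Face SigLIP image processor, as we understand them: a direct resize to 224x224 with bilinear resampling and no center crop, rescaling by 1/255, and normalization with mean = std = 0.5 per channel. Check `preprocessor_config.json` in the base model repository before relying on these values. `to_pixel_values` is a hypothetical helper.

```python
import numpy as np

# Assumed SigLIP normalization constants (mean = std = 0.5 for every channel).
MEAN = 0.5
STD = 0.5


def to_pixel_values(images_rgb_uint8):
    """Convert RGB uint8 images of shape [N, 224, 224, 3] to `pixel_values`.

    Resizing to 224x224 is assumed to have happened already.
    Returns a float32 array of shape [N, 3, 224, 224] with values in [-1, 1].
    """
    x = np.asarray(images_rgb_uint8, dtype=np.float32) / 255.0  # rescale to [0, 1]
    x = (x - MEAN) / STD                                         # normalize to [-1, 1]
    return np.ascontiguousarray(x.transpose(0, 3, 1, 2))         # NHWC -> NCHW
```

With Pillow, the resized input batch could be built as `np.stack([np.asarray(Image.open(p).convert("RGB").resize((224, 224), Image.Resampling.BILINEAR)) for p in paths])`.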
**Outputs**
| Name | Shape | dtype | Description |
|---|---|---|---|
| `image_embeds` | `[image_batch, 768]` | float32 | L2-normalized image embedding |
| `text_embeds` | `[text_batch, 768]` | float32 | L2-normalized text embedding |
| `logits_per_image` | `[image_batch, text_batch]` | float32 | Cosine similarity scaled by `exp(logit_scale)` plus `logit_bias`; apply a sigmoid for match probabilities |
| `logits_per_text` | `[text_batch, image_batch]` | float32 | Transpose of `logits_per_image` |
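SigLIP is trained with a pairwise sigmoid loss, so its logits are usually turned into an independent match probability per (text, image) pair rather than a softmax over the candidates. For search, a common pattern is to compute `image_embeds` once per image, store them, and rank them by dot product with a query's `text_embeds`. This is assumed here and is not necessarily how Nebula does it. Because the logits are the cosine similarity times a positive scale plus a constant, this ranking matches sorting by `logits_per_text`. The function names below are hypothetical.

```python
import numpy as np


def match_probabilities(logits):
    """Element-wise sigmoid: independent match probability for each (text, image) pair."""
    return 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=np.float64)))


def rank_images(text_embed, image_embeds, top_k=5):
    """Rank stored image embeddings against one text embedding by cosine similarity.

    Both inputs are re-normalized defensively, so this also works on
    embeddings that were not already L2-normalized.
    """
    q = text_embed / np.linalg.norm(text_embed)
    imgs = image_embeds / np.linalg.norm(image_embeds, axis=1, keepdims=True)
    scores = imgs @ q                    # cosine similarity per stored image
    order = np.argsort(-scores)[:top_k]  # indices, highest similarity first
    return order, scores[order]
```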
## How it was exported
```bash
optimum-cli export onnx \
--model google/siglip2-base-patch16-224 \
--task zero-shot-image-classification \
--opset 18 \
./models/
```
Requires `optimum[onnxruntime]` and `transformers`.
## License
Inherits [Apache 2.0](https://huggingface.co/google/siglip2-base-patch16-224) from the original Google SigLIP 2 model.