---
license: mit
language:
- en
base_model:
- google/siglip2-base-patch16-224
pipeline_tag: zero-shot-classification
---
# SigLIP2
SigLIP 2 extends the pretraining objective of SigLIP with prior, independently developed techniques combined into a unified recipe, improving semantic understanding, localization, and dense features. You can use the raw model for tasks such as zero-shot image classification and image-text retrieval, or as a vision encoder for VLMs (and other vision tasks).
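For zero-shot classification, SigLIP scores each image-text pair independently with a sigmoid, rather than applying a softmax across all candidate texts as CLIP does. A minimal sketch of that scoring step in plain Python; the `logit_scale` and `logit_bias` values below are illustrative placeholders, not the trained parameters of this checkpoint:

```python
import math

def siglip_scores(cos_sims, logit_scale=4.75, logit_bias=-16.5):
    """Score each image-text pair independently with a sigmoid.

    Unlike CLIP's softmax over all candidate texts, SigLIP treats every
    pair separately, so the resulting probabilities need not sum to 1.
    logit_scale is stored in log space, hence the exp() below.
    The scale/bias values here are hypothetical, for illustration only.
    """
    return [1.0 / (1.0 + math.exp(-(math.exp(logit_scale) * s + logit_bias)))
            for s in cos_sims]

# A text well matched to the image (high cosine similarity) scores near 1,
# a mismatched one near 0, each independently of the other.
probs = siglip_scores([0.25, 0.05])
```

Because the scores are per-pair, adding or removing candidate texts does not change the score of the remaining ones.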
The original repo is https://huggingface.co/google/siglip2-base-patch16-224.

This model has been converted to run on the Axera NPU using w8a16 quantization.
Compatible with Pulsar2 version: 5.1
## Convert tools links

If you are interested in model conversion, you can try to export the axmodel yourself through the AXera platform toolchain; its repo provides a detailed guide.
## Support Platform
| Model | Latency |
|---|---|
| Image Encoder | 11.1 ms |
| Text Encoder | 4.56 ms |
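As a rough back-of-envelope check on throughput, using only the numbers from the table above (this ignores preprocessing, tokenization, and host-device transfer, so real end-to-end latency will be higher):

```python
image_ms = 11.1   # Image Encoder latency from the table above
text_ms = 4.56    # Text Encoder latency from the table above

# Zero-shot classification runs the image encoder once plus one text-encoder
# pass per candidate label; with the 2 labels used in the example, that is:
total_ms = image_ms + 2 * text_ms

# Text embeddings can be computed once and cached, so a steady-state
# image-classification loop is bounded by the vision encoder alone:
images_per_second = 1000.0 / image_ms
```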
## How to use
### Download all files from this repository to the device
```
root@ax650:~/siglip2-base-patch16-224# tree -L 2
.
├── 000000039769.jpg
├── README.md
├── ax650
│   ├── siglip2-base-patch16-224_text.axmodel
│   └── siglip2-base-patch16-224_vision.axmodel
├── config.json
├── model_convert
│   ├── imagenet-calib.tar
│   ├── siglip2-base-patch16-224_text.json
│   └── siglip2-base-patch16-224_vision.json
├── onnx
│   ├── siglip2-base-patch16-224_text.onnx
│   └── siglip2-base-patch16-224_vision.onnx
├── python
│   ├── axmodel_infer.py
│   ├── export_onnx.py
│   ├── onnx_infer.py
│   ├── requirements.txt
│   └── test.py
└── tokenizer
    ├── config.json
    ├── preprocessor_config.json
    ├── special_tokens_map.json
    ├── tokenizer.json
    └── tokenizer_config.json

5 directories, 20 files
```
### python env requirement
#### pyaxengine

https://github.com/AXERA-TECH/pyaxengine

```shell
wget https://github.com/AXERA-TECH/pyaxengine/releases/download/0.1.3rc0/axengine-0.1.3-py3-none-any.whl
pip install axengine-0.1.3-py3-none-any.whl
```
#### others

```shell
pip install -r python/requirements.txt
```
### Inputs

![000000039769.jpg](./000000039769.jpg)
### Test

Text candidates: `"a photo of 2 cats"`, `"a photo of 2 dogs"`
### Inference with AX650 Host, such as M4N-Dock (爱芯派Pro)
```
root@ax650:~/siglip2-base-patch16-224# python3 python/axmodel_infer.py
[INFO] Available providers: ['AxEngineExecutionProvider']
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Chip type: ChipType.MC50
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Engine version: 2.12.0s
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.1-patch1 430ee3be
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.1-patch1 430ee3be
[[1.0596762e-01 1.9978019e-05]]
10.6% that image 0 is 'a photo of 2 cats'
```
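To read the log above: the printed array holds per-pair sigmoid probabilities, which intentionally do not sum to 1 because SigLIP scores each image-text pair independently. A small sketch of the final formatting step, plugging in the values printed in the log:

```python
# Probabilities printed in the log above: one row per image,
# one column per candidate text.
probs = [[1.0596762e-01, 1.9978019e-05]]
texts = ["a photo of 2 cats", "a photo of 2 dogs"]

# Pick the best-matching text for image 0 and format the report line.
best = max(range(len(texts)), key=lambda i: probs[0][i])
line = f"{100 * probs[0][best]:.1f}% that image 0 is '{texts[best]}'"
```

Note that even the winning probability is only ~10.6% here; that is expected with sigmoid pairwise scoring (and w8a16 quantization), since nothing forces the scores to form a distribution over the candidates.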
