SigLIP2

SigLIP 2 extends the pretraining objective of SigLIP with prior, independently developed techniques into a unified recipe, for improved semantic understanding, localization, and dense features. You can use the raw model for tasks like zero-shot image classification and image-text retrieval, or as a vision encoder for VLMs (and other vision tasks).

The original repo is https://huggingface.co/google/siglip2-base-patch16-224.

This version of SigLIP 2 has been converted to run on the Axera NPU using w8a16 quantization.

This model has been optimized with the following LoRA:

Compatible with Pulsar2 version: 5.1

Conversion tool links:

If you are interested in model conversion, you can try exporting the axmodel yourself using:

The AXera Platform repo, which contains the detailed guide
Pulsar2 Link, How to Convert ONNX to axmodel

Supported Platforms

AX650
- M4N-Dock(爱芯派Pro)
- M.2 Accelerator card

Models	Latency
Image Encoder	11.1 ms
Text Encoder	4.56 ms
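
As a rough reading of the latency figures, the sketch below converts each per-inference latency into an approximate upper bound on throughput. It assumes back-to-back single-input runs and ignores pre-processing, post-processing, and data transfer; the resulting numbers are derived arithmetic, not measurements from this repository.

```python
# Per-inference latencies in milliseconds, copied from the table above.
LATENCY_MS = {"Image Encoder": 11.1, "Text Encoder": 4.56}


def throughput_per_second(latency_ms: float) -> float:
    """Upper-bound inferences per second for one stream of back-to-back runs."""
    return 1000.0 / latency_ms


for name, ms in LATENCY_MS.items():
    print(f"{name}: {ms} ms -> about {throughput_per_second(ms):.0f} inferences/s")
```

Real end-to-end rates are typically lower once image decoding, resizing, and tokenization are included.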

How to use

Download all files from this repository to the device

root@ax650:~/siglip2-base-patch16-224# tree -L 2
.
├── 000000039769.jpg
├── README.md
├── ax650
│   ├── siglip2-base-patch16-224_text.axmodel
│   └── siglip2-base-patch16-224_vision.axmodel
├── config.json
├── model_convert
│   ├── imagenet-calib.tar
│   ├── siglip2-base-patch16-224_text.json
│   └── siglip2-base-patch16-224_vision.json
├── onnx
│   ├── siglip2-base-patch16-224_text.onnx
│   └── siglip2-base-patch16-224_vision.onnx
├── python
│   ├── axmodel_infer.py
│   ├── export_onnx.py
│   ├── onnx_infer.py
│   ├── requirements.txt
│   └── test.py
└── tokenizer
    ├── config.json
    ├── preprocessor_config.json
    ├── special_tokens_map.json
    ├── tokenizer.json
    └── tokenizer_config.json

5 directories, 20 files
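
Before running inference, it can help to confirm that the download is complete. The following is a minimal sketch that checks for a subset of the files in the listing above. Which files `axmodel_infer.py` actually reads is an assumption here, and `missing_files` is a hypothetical helper, not part of this repository.

```python
from pathlib import Path

# Relative paths taken from the directory listing above.
# Assumption: these are the files the on-device demo needs.
EXPECTED_FILES = [
    "ax650/siglip2-base-patch16-224_text.axmodel",
    "ax650/siglip2-base-patch16-224_vision.axmodel",
    "python/axmodel_infer.py",
    "python/requirements.txt",
    "tokenizer/tokenizer.json",
    "tokenizer/preprocessor_config.json",
    "000000039769.jpg",
]


def missing_files(root="."):
    """Return the expected files that are not present under root."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected files are present.")
```

Run it from the repository root (the directory shown in the `tree` output).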

Python environment requirements

pyaxengine

https://github.com/AXERA-TECH/pyaxengine

wget https://github.com/AXERA-TECH/pyaxengine/releases/download/0.1.3rc0/axengine-0.1.3-py3-none-any.whl
pip install axengine-0.1.3-py3-none-any.whl

Other dependencies

pip install -r python/requirements.txt

Inputs

Text

"a photo of 2 cats", "a photo of 2 dogs"

Image

Inference with AX650 Host, such as M4N-Dock(爱芯派Pro)

root@ax650:~/siglip2-base-patch16-224# python3 python/axmodel_infer.py
[INFO] Available providers:  ['AxEngineExecutionProvider']
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Chip type: ChipType.MC50
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Engine version: 2.12.0s
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.1-patch1 430ee3be
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.1-patch1 430ee3be
[[1.0596762e-01 1.9978019e-05]]
10.6% that image 0 is 'a photo of 2 cats'
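
The two values printed above are independent per-text probabilities rather than a softmax distribution: SigLIP scores each image-text pair with a sigmoid, so the values need not sum to 1. The sketch below shows, in plain Python, how a line like the last one could be produced from a pair of logits. The logit values are hypothetical, chosen only to resemble the log above, and `describe` is an illustrative helper; how `axmodel_infer.py` actually computes and formats its output is not shown in this README.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def describe(logits, texts, image_index=0):
    """Turn per-text logits into probabilities and report the best match."""
    probs = [sigmoid(z) for z in logits]
    best = max(range(len(texts)), key=lambda i: probs[i])
    line = f"{probs[best]:.1%} that image {image_index} is '{texts[best]}'"
    return probs, line


texts = ["a photo of 2 cats", "a photo of 2 dogs"]
logits = [-2.13, -10.82]  # hypothetical image-text logits, not real model output

probs, line = describe(logits, texts)
print(probs)
print(line)
```

Because each score is judged on its own, a low absolute value such as 10.6% can still be the clear winner when the alternative scores several orders of magnitude lower.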
