---
license: mit
language:
- en
base_model:
- google/siglip2-base-patch16-224
pipeline_tag: zero-shot-classification
---
# SigLIP2
SigLIP 2 extends the pretraining objective of SigLIP with prior, independently developed techniques combined into a unified recipe, improving semantic understanding, localization, and dense features. You can use the raw model for tasks such as zero-shot image classification and image-text retrieval, or as a vision encoder for VLMs (and other vision tasks).
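For zero-shot classification, SigLIP scores each image-text pair independently with a sigmoid, rather than applying a softmax across all candidate texts as CLIP does. A minimal sketch of that scoring step in plain Python; the `logit_scale` and `logit_bias` values below are illustrative placeholders, not the trained parameters of this checkpoint:

```python
import math

def siglip_scores(cos_sims, logit_scale=4.75, logit_bias=-16.5):
    """Score each image-text pair independently with a sigmoid.

    Unlike CLIP's softmax over all candidate texts, SigLIP treats every
    pair separately, so the resulting probabilities need not sum to 1.
    logit_scale is stored in log space, hence the exp() below.
    The scale/bias values here are hypothetical, for illustration only.
    """
    return [1.0 / (1.0 + math.exp(-(math.exp(logit_scale) * s + logit_bias)))
            for s in cos_sims]

# A text well matched to the image (high cosine similarity) scores near 1,
# a mismatched one near 0, each independently of the other.
probs = siglip_scores([0.25, 0.05])
```

Because the scores are per-pair, adding or removing candidate texts does not change the score of the remaining ones.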
The original repo is https://huggingface.co/google/siglip2-base-patch16-224.

This model has been converted to run on the Axera NPU using w8a16 quantization.
Compatible with Pulsar2 version: 5.1
## Convert tools links

If you are interested in model conversion, you can try to export the axmodel yourself through the AXera platform toolchain; its repo provides a detailed guide.
## Support Platform
| Model | Latency |
|---|---|
| Image Encoder | 11.1 ms |
| Text Encoder | 4.56 ms |
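As a rough back-of-envelope check on throughput, using only the numbers from the table above (this ignores preprocessing, tokenization, and host-device transfer, so real end-to-end latency will be higher):

```python
image_ms = 11.1   # Image Encoder latency from the table above
text_ms = 4.56    # Text Encoder latency from the table above

# Zero-shot classification runs the image encoder once plus one text-encoder
# pass per candidate label; with the 2 labels used in the example, that is:
total_ms = image_ms + 2 * text_ms

# Text embeddings can be computed once and cached, so a steady-state
# image-classification loop is bounded by the vision encoder alone:
images_per_second = 1000.0 / image_ms
```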
## How to use
### Download all files from this repository to the device
```
root@ax650:~/siglip2-base-patch16-224# tree -L 2
.
├── 000000039769.jpg
├── README.md
├── ax650
│   ├── siglip2-base-patch16-224_text.axmodel
│   └── siglip2-base-patch16-224_vision.axmodel
├── config.json
├── model_convert
│   ├── imagenet-calib.tar
│   ├── siglip2-base-patch16-224_text.json
│   └── siglip2-base-patch16-224_vision.json
├── onnx
│   ├── siglip2-base-patch16-224_text.onnx
│   └── siglip2-base-patch16-224_vision.onnx
├── python
│   ├── axmodel_infer.py
│   ├── export_onnx.py
│   ├── onnx_infer.py
│   ├── requirements.txt
│   └── test.py
└── tokenizer
    ├── config.json
    ├── preprocessor_config.json
    ├── special_tokens_map.json
    ├── tokenizer.json
    └── tokenizer_config.json

5 directories, 20 files
```
### python env requirement
#### pyaxengine

https://github.com/AXERA-TECH/pyaxengine

```shell
wget https://github.com/AXERA-TECH/pyaxengine/releases/download/0.1.3rc0/axengine-0.1.3-py3-none-any.whl
pip install axengine-0.1.3-py3-none-any.whl
```
#### others

```shell
pip install -r python/requirements.txt
```
### Inputs

![000000039769.jpg](./000000039769.jpg)
### Test

Text candidates: `"a photo of 2 cats"`, `"a photo of 2 dogs"`
### Inference with AX650 Host, such as M4N-Dock (爱芯派Pro)
```
root@ax650:~/siglip2-base-patch16-224# python3 python/axmodel_infer.py
[INFO] Available providers: ['AxEngineExecutionProvider']
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Chip type: ChipType.MC50
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Engine version: 2.12.0s
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.1-patch1 430ee3be
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.1-patch1 430ee3be
[[1.0596762e-01 1.9978019e-05]]
10.6% that image 0 is 'a photo of 2 cats'
```
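To read the log above: the printed array holds per-pair sigmoid probabilities, which intentionally do not sum to 1 because SigLIP scores each image-text pair independently. A small sketch of the final formatting step, plugging in the values printed in the log:

```python
# Probabilities printed in the log above: one row per image,
# one column per candidate text.
probs = [[1.0596762e-01, 1.9978019e-05]]
texts = ["a photo of 2 cats", "a photo of 2 dogs"]

# Pick the best-matching text for image 0 and format the report line.
best = max(range(len(texts)), key=lambda i: probs[0][i])
line = f"{100 * probs[0][best]:.1f}% that image 0 is '{texts[best]}'"
```

Note that even the winning probability is only ~10.6% here; that is expected with sigmoid pairwise scoring (and w8a16 quantization), since nothing forces the scores to form a distribution over the candidates.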
