---
license: mit
language:
- en
base_model:
- google/siglip2-base-patch16-224
pipeline_tag: zero-shot-classification
---

# SigLIP2

SigLIP 2 extends the pretraining objective of SigLIP with prior, independently developed techniques, combining them into a unified recipe for improved semantic understanding, localization, and dense features.

You can use the raw model for tasks like zero-shot image classification and image-text retrieval, or as a vision encoder for VLMs (and other vision tasks).

The original repository is https://huggingface.co/google/siglip2-base-patch16-224.

This version of SigLIP 2 has been converted to run on the Axera NPU using **w8a16** quantization.

This model has been optimized with the following LoRA:

Compatible with Pulsar2 version: 5.1

## Conversion tool links

If you are interested in model conversion, you can export the axmodel yourself using the following resources:

- [The AXera platform repository](https://github.com/AXERA-TECH/SigLIP2.axera), which contains the detailed conversion guide

- [Pulsar2 documentation: how to convert ONNX to axmodel](https://pulsar2-docs.readthedocs.io/en/latest/pulsar2/introduction.html)

## Supported platforms

- AX650
  - [M4N-Dock (AXera-Pi Pro)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)
  - [M.2 Accelerator card](https://docs.m5stack.com/zh_CN/ai_hardware/LLM-8850_Card)

| Model | Latency |
| ------------- | ------------- |
| Image Encoder | 11.1 ms |
| Text Encoder | 4.56 ms |
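
To put the latencies above in perspective, the short sketch below converts them into an upper bound on sequential inferences per second. It assumes one inference at a time and ignores image preprocessing, tokenization, and data transfer, so real end-to-end throughput will be lower. The helper name `max_throughput_per_second` is hypothetical and not part of this repository.

```python
# Latencies in milliseconds, copied from the table above.
LATENCY_MS = {"Image Encoder": 11.1, "Text Encoder": 4.56}


def max_throughput_per_second(latency_ms: float) -> float:
    """Upper bound on sequential inferences per second for one model."""
    return 1000.0 / latency_ms


for name, ms in LATENCY_MS.items():
    # Rounded to whole inferences per second for readability.
    print(f"{name}: about {max_throughput_per_second(ms):.0f} inferences/s")
```

Under these assumptions the image encoder tops out at roughly 90 inferences per second and the text encoder at roughly 219.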

## How to use

Download all files from this repository to the device:

```
root@ax650:~/siglip2-base-patch16-224# tree -L 2
.
├── 000000039769.jpg
├── README.md
├── ax650
│   ├── siglip2-base-patch16-224_text.axmodel
│   └── siglip2-base-patch16-224_vision.axmodel
├── config.json
├── model_convert
│   ├── imagenet-calib.tar
│   ├── siglip2-base-patch16-224_text.json
│   └── siglip2-base-patch16-224_vision.json
├── onnx
│   ├── siglip2-base-patch16-224_text.onnx
│   └── siglip2-base-patch16-224_vision.onnx
├── python
│   ├── axmodel_infer.py
│   ├── export_onnx.py
│   ├── onnx_infer.py
│   ├── requirements.txt
│   └── test.py
└── tokenizer
    ├── config.json
    ├── preprocessor_config.json
    ├── special_tokens_map.json
    ├── tokenizer.json
    └── tokenizer_config.json

5 directories, 20 files
```

### Python environment requirements

#### pyaxengine

https://github.com/AXERA-TECH/pyaxengine

```
wget https://github.com/AXERA-TECH/pyaxengine/releases/download/0.1.3rc0/axengine-0.1.3-py3-none-any.whl
pip install axengine-0.1.3-py3-none-any.whl
```

#### Other dependencies

```
pip install -r python/requirements.txt
```
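
After installing, a quick check like the one below can confirm that the runtime is visible to the Python interpreter you plan to use. This is a minimal sketch using only the standard library; it assumes the wheel exposes the import name `axengine`, and the helper `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# "axengine" is assumed to be the import name provided by the wheel above.
if is_installed("axengine"):
    print("axengine is available")
else:
    print("axengine is missing; install the wheel first")
```

Running this with the same `python3` that will launch the inference script helps catch cases where the wheel was installed into a different environment.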

## Inputs

**Text**

```
"a photo of 2 cats", "a photo of 2 dogs"
```

**Image**

![](000000039769.jpg)

## Inference with AX650 Host, such as M4N-Dock (AXera-Pi Pro)

```
root@ax650:~/siglip2-base-patch16-224# python3 python/axmodel_infer.py
[INFO] Available providers:  ['AxEngineExecutionProvider']
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Chip type: ChipType.MC50
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Engine version: 2.12.0s
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.1-patch1 430ee3be
[INFO] Using provider: AxEngineExecutionProvider
[INFO] Model type: 2 (triple core)
[INFO] Compiler version: 5.1-patch1 430ee3be
[[1.0596762e-01 1.9978019e-05]]
10.6% that image 0 is 'a photo of 2 cats'
```
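
The two scores in the log do not sum to 1. That is consistent with SigLIP-style scoring, where each image-text pair gets an independent sigmoid probability instead of a softmax over all captions. The sketch below reproduces the final log line from the printed array. It assumes `axmodel_infer.py` follows the upstream SigLIP convention of applying a sigmoid to the image-text logits; the helpers `sigmoid` and `logit` are illustrative and not taken from the script.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw logit to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p: float) -> float:
    """Inverse of sigmoid: recover the raw score behind a probability."""
    return math.log(p / (1.0 - p))


texts = ["a photo of 2 cats", "a photo of 2 dogs"]
# Probabilities copied from the log above (one image, two captions).
probs = [1.0596762e-01, 1.9978019e-05]

for text, p in zip(texts, probs):
    # Same percentage format as the last line of the log.
    print(f"{p:.1%} that image 0 is '{text}' (logit {logit(p):.2f})")

# Independent sigmoid scores are not normalized across captions.
print(f"sum of probabilities: {sum(probs):.4f}")
```

Because the scores are independent, a low absolute value such as 10.6% can still mark the best match. Compare captions by their relative scores rather than expecting the winner to be close to 100%.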