	---
	license: mit
	language:
	- en
	base_model:
	- google/siglip2-base-patch16-224
	pipeline_tag: zero-shot-classification
	---

	# SigLIP2

	SigLIP 2 extends the pretraining objective of SigLIP with prior, independently developed techniques into a unified recipe, for improved semantic understanding, localization, and dense features.
	You can use the raw model for tasks like zero-shot image classification and image-text retrieval, or as a vision encoder for VLMs (and other vision tasks).
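
To make the zero-shot use case concrete, here is a minimal sketch of how SigLIP-style scoring can turn one image embedding and several text embeddings into per-label probabilities. Each label gets its own sigmoid probability rather than a softmax share, so the values need not sum to 1. The embeddings, `logit_scale`, and `logit_bias` below are toy values for illustration, and the helper names are hypothetical; none of them come from the scripts in this repository.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def zero_shot_probs(image_emb, text_embs, logit_scale, logit_bias):
    """Return one independent probability per candidate text.

    logit_scale is the effective multiplier and logit_bias the additive offset;
    in a real checkpoint both are learned scalars (assumed values are used here).
    """
    img = l2_normalize(image_emb)
    probs = []
    for text_emb in text_embs:
        txt = l2_normalize(text_emb)
        cosine = sum(a * b for a, b in zip(img, txt))
        probs.append(sigmoid(logit_scale * cosine + logit_bias))
    return probs


# Toy 3-dimensional embeddings (hypothetical values, not real model outputs).
image_emb = [0.2, 0.9, 0.1]
text_embs = [[0.1, 1.0, 0.0], [1.0, -0.2, 0.3]]
probs = zero_shot_probs(image_emb, text_embs, logit_scale=10.0, logit_bias=-5.0)
for label, p in zip(["label A", "label B"], probs):
    print(f"{p:.1%} for {label}")
```

Because the scores are independent, a low probability for one prompt does not imply a high probability for another.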

	The original repo is https://huggingface.co/google/siglip2-base-patch16-224.

	This version of SigLIP 2 has been converted to run on the Axera NPU using w8a16 quantization (8-bit weights, 16-bit activations).
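
The name w8a16 means that weights are stored with 8 bits and activations with 16 bits. The sketch below illustrates the idea with symmetric per-tensor quantization in plain Python. This card does not describe the scheme Pulsar2 actually applies (scale granularity, calibration method, and so on), so the function and the sample values are illustrative assumptions only.

```python
def quantize_symmetric(values, num_bits):
    """Map floats to signed integers of the given width using one shared scale."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for 8 bits, 32767 for 16 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integers and their scale."""
    return [q * scale for q in quantized]


weights = [0.31, -0.07, 0.52, -0.49]        # hypothetical weight values
activations = [1.25, -3.5, 0.004, 2.0]      # hypothetical activation values

w_q, w_scale = quantize_symmetric(weights, num_bits=8)
a_q, a_scale = quantize_symmetric(activations, num_bits=16)

# Worst-case round-trip error for each tensor.
w_err = max(abs(w - d) for w, d in zip(weights, dequantize(w_q, w_scale)))
a_err = max(abs(a - d) for a, d in zip(activations, dequantize(a_q, a_scale)))
print("int8 weights:", w_q, "max error:", w_err)
print("int16 activations:", a_q, "max error:", a_err)
```

The 16-bit activations keep a much finer step size than the 8-bit weights, which is the usual motivation for mixing the two widths.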

	Compatible with Pulsar2 version: 5.1

	## Conversion tool links

	If you are interested in model conversion, you can export the axmodel using these resources:

	- [The repo of AXera Platform](https://github.com/AXERA-TECH/SigLIP2.axera), which contains the detailed guide

	- [Pulsar2 Link, How to Convert ONNX to axmodel](https://pulsar2-docs.readthedocs.io/en/latest/pulsar2/introduction.html)


	## Supported Platforms

	- AX650
	- [M4N-Dock(爱芯派Pro)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)
	- [M.2 Accelerator card](https://docs.m5stack.com/zh_CN/ai_hardware/LLM-8850_Card)


	| Model | Latency |
	| ------------- | ------------- |
	| Image Encoder | 11.1 ms |
	| Text Encoder | 4.56 ms |
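
As a rough worked example of what the latencies in the table imply, the sketch below adds up encoder time for a given number of images and text prompts. It assumes sequential runs at batch size 1 and ignores preprocessing, tokenization, and postprocessing; the card does not state how the figures were measured, so treat the result as a lower-bound estimate.

```python
IMAGE_ENCODER_MS = 11.1   # image encoder latency from the table above
TEXT_ENCODER_MS = 4.56    # text encoder latency from the table above


def encoder_time_ms(num_images, num_texts):
    """Sum of encoder latencies, assuming sequential batch-size-1 runs."""
    return num_images * IMAGE_ENCODER_MS + num_texts * TEXT_ENCODER_MS


# One image scored against two prompts.
print(f"{encoder_time_ms(1, 2):.2f} ms")   # → 20.22 ms
# If text embeddings are cached, each further image costs only the image encoder.
print(f"{encoder_time_ms(1, 0):.2f} ms")   # → 11.10 ms
```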

	## How to use

	Download all files from this repository to the device:

	```
	root@ax650:~/siglip2-base-patch16-224# tree -L 2
	.
	├── 000000039769.jpg
	├── README.md
	├── ax650
	│   ├── siglip2-base-patch16-224_text.axmodel
	│   └── siglip2-base-patch16-224_vision.axmodel
	├── config.json
	├── model_convert
	│   ├── imagenet-calib.tar
	│   ├── siglip2-base-patch16-224_text.json
	│   └── siglip2-base-patch16-224_vision.json
	├── onnx
	│   ├── siglip2-base-patch16-224_text.onnx
	│   └── siglip2-base-patch16-224_vision.onnx
	├── python
	│   ├── axmodel_infer.py
	│   ├── export_onnx.py
	│   ├── onnx_infer.py
	│   ├── requirements.txt
	│   └── test.py
	└── tokenizer
	    ├── config.json
	    ├── preprocessor_config.json
	    ├── special_tokens_map.json
	    ├── tokenizer.json
	    └── tokenizer_config.json

	5 directories, 20 files

	```
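
After downloading, a quick check like the following can confirm that the files shown in the listing are in place. The paths are copied from the directory tree above; which of them `python/axmodel_infer.py` actually needs is an assumption, and `missing_files` is a hypothetical helper, not part of this repository.

```python
from pathlib import Path

# Paths taken from the directory listing, relative to the repository root.
# The selection of "required" files is an assumption for illustration.
REQUIRED_FILES = [
    "ax650/siglip2-base-patch16-224_text.axmodel",
    "ax650/siglip2-base-patch16-224_vision.axmodel",
    "python/axmodel_infer.py",
    "python/requirements.txt",
    "tokenizer/tokenizer.json",
    "tokenizer/tokenizer_config.json",
    "000000039769.jpg",
]


def missing_files(repo_root="."):
    """Return the listed files that are absent under repo_root."""
    root = Path(repo_root)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All listed files are present.")
```

Run it from the repository root, for example `~/siglip2-base-patch16-224` on the device.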

	### Python environment requirements

	#### pyaxengine

	https://github.com/AXERA-TECH/pyaxengine

	```
	wget https://github.com/AXERA-TECH/pyaxengine/releases/download/0.1.3rc0/axengine-0.1.3-py3-none-any.whl
	pip install axengine-0.1.3-py3-none-any.whl
	```

	#### Other dependencies

	```
	pip install -r python/requirements.txt
	```
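
To confirm that the installation steps took effect before running inference, a small check such as this one can be used. It relies only on the standard library. The module name `axengine` is assumed from the wheel file name above; add further names from `python/requirements.txt` as needed.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# "axengine" is the module name assumed from the wheel file name.
for name in ["axengine"]:
    status = "found" if is_installed(name) else "NOT found"
    print(f"{name}: {status}")
```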

	## Inputs

	Text
	```
	"a photo of 2 cats", "a photo of 2 dogs"
	```

	Image
	![](000000039769.jpg)

	## Inference on an AX650 host, such as M4N-Dock(爱芯派Pro)

	```
	root@ax650:~/siglip2-base-patch16-224# python3 python/axmodel_infer.py
	[INFO] Available providers: ['AxEngineExecutionProvider']
	[INFO] Using provider: AxEngineExecutionProvider
	[INFO] Chip type: ChipType.MC50
	[INFO] VNPU type: VNPUType.DISABLED
	[INFO] Engine version: 2.12.0s
	[INFO] Model type: 2 (triple core)
	[INFO] Compiler version: 5.1-patch1 430ee3be
	[INFO] Using provider: AxEngineExecutionProvider
	[INFO] Model type: 2 (triple core)
	[INFO] Compiler version: 5.1-patch1 430ee3be
	[[1.0596762e-01 1.9978019e-05]]
	10.6% that image 0 is 'a photo of 2 cats'
	```