---
license: apache-2.0
language:
- en
base_model:
- apple/MobileCLIP2-S4
- apple/MobileCLIP2-S2
pipeline_tag: image-text-to-text
tags:
- MobileCLIP
- MobileCLIP2
- CLIP
- Classification
---

# MobileCLIP2

The following versions of MobileCLIP2 have been converted to run on the Axera NPU with w8a16 quantization. Compatible with Pulsar2 version 4.2.

- MobileCLIP2-S2
- MobileCLIP2-S4

For details on how to convert a MobileCLIP2 model into an axmodel that runs on the Axera NPU board, see [this repository](https://github.com/AXERA-TECH/axera.ml-mobileclip).

## Supported Platforms

- AX650

## On-board inference time

- MobileCLIP2-S2

| Stage | Time |
|------|------|
| image encoder | 19.146 ms |
| text encoder | 5.675 ms |

- MobileCLIP2-S4

| Stage | Time |
|------|------|
| image encoder | 65.328 ms |
| text encoder | 12.663 ms |
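
As a rough estimate based on the numbers above, classifying one image against the three example labels below with MobileCLIP2-S2 on AX650 takes about 19.146 ms + 3 × 5.675 ms ≈ 36.2 ms if the text embeddings are recomputed every time; for a fixed label set they can be encoded once and cached.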

## How to use

Download all files from this repository to the device.

Run the following command:

```bash
python3 run_axmodel.py -ie ./mobileclip2_s4_image_encoder.axmodel -te ./mobileclip2_s4_text_encoder.axmodel -i ./zebra.jpg -t "a zebra" "a dog" "two zebras"
```
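
As the example suggests, `-ie` and `-te` point to the image-encoder and text-encoder axmodels, `-i` is the input image, and `-t` takes one or more candidate text labels; see the conversion repository linked above for the full argument list.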

Model input and output examples are as follows:

1. The input image:



2. The text descriptions to classify against:

`["a zebra", "a dog", "two zebras"]`

3. The class confidence scores output by the model:

`Label probs: [[6.095444e-02 5.628616e-14 9.390456e-01]]`
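
These scores are a softmax over scaled image-text similarities, the standard CLIP-style post-processing. The sketch below illustrates that computation; the function name, array shapes, and the logit scale of 100 are assumptions for illustration and are not taken from `run_axmodel.py`.

```python
# Minimal sketch of CLIP-style post-processing (illustrative only; the real
# logic lives in run_axmodel.py). Assumes the axmodel encoders have already
# produced one image embedding and N text embeddings.
import numpy as np

def label_probs(image_emb: np.ndarray, text_embs: np.ndarray, logit_scale: float = 100.0) -> np.ndarray:
    """image_emb: (1, D) image-encoder output; text_embs: (N, D) text-encoder outputs."""
    # L2-normalize both sides so the dot product becomes cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=-1, keepdims=True)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=-1, keepdims=True)
    # One similarity logit per candidate label, scaled as in CLIP.
    logits = logit_scale * image_emb @ text_embs.T  # shape (1, N)
    # Softmax over the labels yields the confidence scores shown above.
    exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)
```

With the example output above, the highest score (≈0.94) falls on the third label, "two zebras".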