LocateAnything-3B

This version of LocateAnything-3B has been converted to run on the Axera NPU using w4a16 quantization.

Compatible with Pulsar2 version: 6.0

Conversion tool links:

If you are interested in model conversion, you can try exporting the axmodel from the original repo:

https://huggingface.co/nvidia/LocateAnything-3B

Pulsar2 link: how to convert an LLM from Hugging Face to axmodel

AXera NPU HOST LLM Runtime

Supported Platforms

AX650
- AX650N DEMO Board
- M4N-Dock(爱芯派Pro)
- M.2 Accelerator card

Image Processing

Chip	Input size	Images	Image encoder	TTFT (493 tokens)	Decode speed (w4a16)	CMM	Flash
AX650	560*560	1	1152.583 ms	2072.06 ms	10.61 tokens/sec	2.9GiB	3.2GiB

The CMM figure is the amount of CMM memory the model consumes at runtime. Ensure that the CMM memory allocation on the development board is greater than this value.
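
The timing figures in the table above can be combined into a rough latency estimate for a single request. The sketch below is only illustrative: it assumes the image encoder time is not already included in the TTFT figure (pass `include_encoder=False` if it is), that decode speed stays constant, and that the prompt is close to the 493 tokens used for the measurement. The function name `estimate_latency_ms` is hypothetical and not part of this repo.

```python
# Figures copied from the AX650 row of the table above.
IMAGE_ENCODER_MS = 1152.583      # one 560*560 image
TTFT_MS = 2072.06                # time to first token, 493-token prompt
DECODE_TOKENS_PER_SEC = 10.61    # w4a16 decode speed


def estimate_latency_ms(new_tokens: int, include_encoder: bool = True) -> float:
    """Rough latency in ms for one image and `new_tokens` generated tokens.

    Assumption: the image encoder time is NOT part of TTFT. Set
    include_encoder=False if TTFT already covers it.
    """
    if new_tokens < 1:
        raise ValueError("new_tokens must be at least 1")
    encoder_ms = IMAGE_ENCODER_MS if include_encoder else 0.0
    # The first token is covered by TTFT; the rest decode at the steady rate.
    decode_ms = (new_tokens - 1) / DECODE_TOKENS_PER_SEC * 1000.0
    return encoder_ms + TTFT_MS + decode_ms


if __name__ == "__main__":
    for n in (1, 32, 128):
        print(f"{n:>4} tokens: ~{estimate_latency_ms(n) / 1000.0:.2f} s")
```

Treat the result as an order-of-magnitude guide; real latency depends on prompt length and board load.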

Model Download (Hugging Face)

mkdir -p AXERA-TECH/LocateAnything-3B
cd AXERA-TECH/LocateAnything-3B
hf download AXERA-TECH/LocateAnything-3B --local-dir .

# structure of the downloaded files
tree -L 3
.
└── AXERA-TECH
    └── LocateAnything-3B
        |-- assert
        |-- config.json
        |-- gradio_locateanything_axengine.py
        |-- image_encoder_mlp.axmodel
        |-- infer_locateanything_axengine.py
        |-- model.embed_tokens.weight.bfloat16.bin
        |-- post_config.json
        |-- qwen2.5_tokenizer
        |-- qwen2_5_tokenizer.txt
        |-- qwen2_p128_l0_together.axmodel
        |-- qwen2_p128_l10_together.axmodel
        |-- qwen2_p128_l11_together.axmodel
        |-- qwen2_p128_l12_together.axmodel
        |-- qwen2_p128_l13_together.axmodel
        |-- qwen2_p128_l14_together.axmodel
        |-- qwen2_p128_l15_together.axmodel
        |-- qwen2_p128_l16_together.axmodel
        |-- qwen2_p128_l17_together.axmodel
        |-- qwen2_p128_l18_together.axmodel
        |-- qwen2_p128_l19_together.axmodel
        |-- qwen2_p128_l1_together.axmodel
        |-- qwen2_p128_l20_together.axmodel
        |-- qwen2_p128_l21_together.axmodel
        |-- qwen2_p128_l22_together.axmodel
        |-- qwen2_p128_l23_together.axmodel
        |-- qwen2_p128_l24_together.axmodel
        |-- qwen2_p128_l25_together.axmodel
        |-- qwen2_p128_l26_together.axmodel
        |-- qwen2_p128_l27_together.axmodel
        |-- qwen2_p128_l28_together.axmodel
        |-- qwen2_p128_l29_together.axmodel
        |-- qwen2_p128_l2_together.axmodel
        |-- qwen2_p128_l30_together.axmodel
        |-- qwen2_p128_l31_together.axmodel
        |-- qwen2_p128_l32_together.axmodel
        |-- qwen2_p128_l33_together.axmodel
        |-- qwen2_p128_l34_together.axmodel
        |-- qwen2_p128_l35_together.axmodel
        |-- qwen2_p128_l3_together.axmodel
        |-- qwen2_p128_l4_together.axmodel
        |-- qwen2_p128_l5_together.axmodel
        |-- qwen2_p128_l6_together.axmodel
        |-- qwen2_p128_l7_together.axmodel
        |-- qwen2_p128_l8_together.axmodel
        |-- qwen2_p128_l9_together.axmodel
        |-- qwen2_post.axmodel
        |-- results
        `-- test_data

4 directories, 44 files
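
Because the download contains 36 per-layer model files (`l0` to `l35`), an interrupted download is easy to miss. The following is a minimal sketch, not part of the repo, that checks a directory against the file names shown in the listing above. The helper names `expected_files` and `missing_files` are hypothetical, and the list covers only the model and config files from the listing, not the directories.

```python
from __future__ import annotations

from pathlib import Path

NUM_LAYERS = 36  # qwen2_p128_l0 .. qwen2_p128_l35, as in the listing above


def expected_files(num_layers: int = NUM_LAYERS) -> list[str]:
    """File names taken from the directory listing above."""
    layer_files = [f"qwen2_p128_l{i}_together.axmodel" for i in range(num_layers)]
    return layer_files + [
        "qwen2_post.axmodel",
        "image_encoder_mlp.axmodel",
        "model.embed_tokens.weight.bfloat16.bin",
        "config.json",
        "post_config.json",
    ]


def missing_files(model_dir: str | Path) -> list[str]:
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in expected_files() if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files:")
        for name in missing:
            print("  " + name)
    else:
        print("All expected model files are present.")
```

Run it from inside the downloaded `LocateAnything-3B` directory; an empty result only means the files exist, not that they are complete.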

Inference on an AX650 host, such as the M4N-Dock(爱芯派Pro) or the AX650N DEMO Board

Gradio Demo

(base) root@ax650:~/LocateAnything# python gradio_locateanything_axengine.py
[INFO] Available providers:  ['AxEngineExecutionProvider', 'AXCLRTExecutionProvider']
[Gradio] starting LocateAnything UI
[Gradio] local: http://127.0.0.1:7860
[Gradio] LAN:   http://10.126.29.50:7860
[Gradio] LAN:   http://10.126.29.68:7860
[Gradio] LAN:   http://172.17.0.1:7860
[Gradio] Use another computer in the same LAN to open the LAN URL.
* Running on local URL:  http://0.0.0.0:7860
* To create a public link, set `share=True` in `launch()`.
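
If the LAN URL does not open from another computer, it can help to first check whether the port is reachable at all. The sketch below uses only the Python standard library and is not part of the demo. The helper name `is_port_open` is hypothetical, and the address in the example is simply one of the LAN addresses from the log above; replace it with the one your board prints.

```python
import socket


def is_port_open(host: str, port: int = 7860, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    host = "10.126.29.50"  # example address from the log above; use your own
    state = "reachable" if is_port_open(host) else "not reachable"
    print(f"{host}:7860 is {state}")
```

A successful connection only shows that something is listening on the port; a failure usually points to a firewall or to the two machines being on different networks.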

Output:

detection: