---
license: mit
language:
- en
- zh
base_model:
- OpenGVLab/InternVL3-2B
pipeline_tag: visual-question-answering
tags:
- OpenGVLab
- InternVL3-2B
---

# InternVL3-2B

This version of InternVL3-2B has been converted to run on the Axera NPU using **w8a16** quantization.

This model has been optimized with the following LoRA: 

Compatible with Pulsar2 version: 4.2

## Conversion tools

If you are interested in model conversion, you can export the axmodel yourself, starting from the original repo: https://huggingface.co/OpenGVLab/InternVL3-2B

- [How to Convert LLM from Huggingface to axmodel](https://github.com/AXERA-TECH/InternVL3-2B.axera/tree/master/model_convert)
- [AXera NPU HOST LLM Runtime](https://github.com/AXERA-TECH/ax-llm/tree/ax-internvl)
- [AXera NPU AXCL LLM Runtime](https://github.com/AXERA-TECH/ax-llm/tree/axcl-internvl)

## Supported platforms

- AX650
  - AX650N DEMO Board
  - [M4N-Dock(爱芯派Pro)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)
  - [M.2 Accelerator card](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html)

In the table below, TTFT is the time to first token (with the number of input tokens in parentheses) and the last column is the decode speed of the w8a16 model.

| Chip | Images | Image encoder (448×448) | TTFT | Decode (w8a16) |
|--|--|--|--|--|
| AX650N | 0 | 0 ms | 221 ms (128 tokens) | 10 tokens/s |
| AX650N | 1 | 364 ms | 862 ms (384 tokens) | 10 tokens/s |
| AX650N | 4 | 1456 ms | 4589 ms (1152 tokens) | 10 tokens/s |
| AX650N | 8 | 2912 ms | 13904 ms (2176 tokens) | 10 tokens/s |

## How to use

Download all files from this repository to the device.

```bash
root@ax650:~/huggingface/InternVL3-2B# tree -L 1
.
|-- README.md
|-- config.json
|-- examples
|-- gradio_demo.py
|-- gradio_demo_c_api.py
|-- gradio_demo_python_api.py
|-- infer.py
|-- infer_video.py
|-- internvl3_2b_axmodel
|-- internvl3_2b_tokenizer
|-- internvl3_tokenizer.py
|-- llm.py
|-- main_api_ax650
|-- main_api_axcl_aarch64
|-- main_api_axcl_x86
|-- main_ax650
|-- main_axcl_aarch64
|-- main_axcl_x86
|-- post_config.json
|-- requirements.txt
|-- run_internvl_3_2b_448_api_ax650.sh
|-- run_internvl_3_2b_448_api_axcl_aarch64.sh
|-- run_internvl_3_2b_448_api_axcl_x86.sh
|-- run_internvl_3_2b_448_ax650.sh
|-- run_internvl_3_2b_448_axcl_aarch64.sh
|-- run_internvl_3_2b_448_axcl_x86.sh
|-- vit_axmodel
`-- webgui.png

4 directories, 24 files
```

### Python environment requirements

#### pyaxengine

Install the Python inference engine from https://github.com/AXERA-TECH/pyaxengine:

```bash
wget https://github.com/AXERA-TECH/pyaxengine/releases/download/0.1.3.rc1/axengine-0.1.3-py3-none-any.whl
pip install axengine-0.1.3-py3-none-any.whl
```

#### Other dependencies

```bash
pip install -r requirements.txt
```

#### Inference with a Raspberry Pi 5 host using AXCL EP (such as an M.2 AI card or HAT AI module)

```
cd InternVL3-2B
python gradio_demo_python_api.py --hf_model internvl3_2b_tokenizer/ \
    --axmodel_path internvl3_2b_axmodel/ \
    --vit_model vit_axmodel/internvl3_2b_vit_slim.axmodel
[INFO] Available providers: ['AXCLRTExecutionProvider']
Init InferenceSession: 0%| | 0/28 [00:00 151665 151667
context_len is 256
prompt is <|im_start|>system
你是书生·万象, 英文名是InternVL, 是由上海人工智能实验室、清华大学及多家合作单位联合开发的多模态大语言模型.<|im_end|
...
http://0.0.0.0:12345
```

Open another terminal and run `run_internvl_3_2b_448_ax650.sh`. In the session below, the first prompt, 你是谁 ("Who are you?"), is sent without an image (an empty `image >>` input), and the model replies with a self-introduction. The second prompt, 描述一下这张图片 ("Describe this image"), is sent with `examples/image_0.jpg`, and the model describes the red panda in the picture.

```
root@ax650:~/wangli/huggingface/InternVL3-2B# ./run_internvl_3_2b_448_ax650.sh
[I][ Init][ 134]: LLM init start
[I][ Init][ 34]: connect http://0.0.0.0:12345 ok
bos_id: -1, eos_id: 151645
img_start_token: 151665
img_context_token: 151667
  3% | ██ | 1 / 31 [0.01s<0.37s, 83.33 count/s] tokenizer init ok[I][ Init][ 45]: LLaMaEmbedSelector use mmap
  6% | ███ | 2 / 31 [0.01s<0.19s, 166.67 count/s] embed_selector init ok
100% | ████████████████████████████████ | 31 / 31 [6.26s<6.26s, 4.95 count/s] init post axmodel ok,remain_cmm(7416 MB)[I][ Init][ 226]: IMAGE_CONTEXT_TOKEN: 151667, IMAGE_START_TOKEN: 151665
[I][ Init][ 251]: image encoder input nchw@float32
[I][ Init][ 281]: image encoder output float32
[I][ Init][ 291]: image_encoder_height : 448, image_encoder_width: 448
[I][ Init][ 293]: max_token_len : 2559
[I][ Init][ 296]: kv_cache_size : 256, kv_cache_num: 2559
[I][ Init][ 304]: prefill_token_num : 128
[I][ Init][ 308]: grp: 1, prefill_max_token_num : 1
[I][ Init][ 308]: grp: 2, prefill_max_token_num : 128
[I][ Init][ 308]: grp: 3, prefill_max_token_num : 256
[I][ Init][ 308]: grp: 4, prefill_max_token_num : 384
[I][ Init][ 308]: grp: 5, prefill_max_token_num : 512
[I][ Init][ 308]: grp: 6, prefill_max_token_num : 640
[I][ Init][ 308]: grp: 7, prefill_max_token_num : 768
[I][ Init][ 308]: grp: 8, prefill_max_token_num : 896
[I][ Init][ 308]: grp: 9, prefill_max_token_num : 1024
[I][ Init][ 312]: prefill_max_token_num : 1024
[I][ load_config][ 282]: load config:
{
    "enable_repetition_penalty": false,
    "enable_temperature": true,
    "enable_top_k_sampling": true,
    "enable_top_p_sampling": false,
    "penalty_window": 20,
    "repetition_penalty": 1.2,
    "temperature": 0.9,
    "top_k": 10,
    "top_p": 0.8
}

[I][ Init][ 321]: LLM init ok
Type "q" to exit, Ctrl+c to stop current running
prompt >> 你是谁
image >>
[I][ Run][ 551]: input token num : 46, prefill_split_num : 1
[I][ Run][ 566]: prefill grpid 2
[I][ Run][ 593]: input_num_token:46
[I][ Run][ 717]: ttft: 311.26 ms
你好!我是商汤科技开发的多模态大模型,英文名叫InternVL。很高兴为你服务!请问有什么可以帮助你的吗?
[N][ Run][ 826]: hit eos,avg 10.69 token/s

prompt >> 描述一下这张图片
image >> examples/image_0.jpg
[I][ Encode][ 415]: image encode time : 408.81 ms, size : 393216
[I][ Encode][ 524]: idx:0 offset : 49 out_embed.size() : 477696
[I][ Run][ 551]: input token num : 311, prefill_split_num : 3
[I][ Run][ 566]: prefill grpid 4
[I][ Run][ 593]: input_num_token:128
[I][ Run][ 593]: input_num_token:128
[I][ Run][ 593]: input_num_token:55
[I][ Run][ 717]: ttft: 1325.82 ms
这张图片展示了一只可爱的红熊猫。红熊猫是一种生活在亚洲森林中的熊科动物,以其红棕色的毛皮和白脸而闻名。图片中的红熊猫正趴在木板上,身体的一部分探出木板,显得有些放松和好奇。它的眼睛圆圆的,黑色的,看起来非常可爱。毛皮主要是棕红色的,耳朵和腹部是白色的,形成了鲜明的对比。背景中可以看到一些树木和绿色的叶子,暗示这可能是在自然的森林环境中拍摄的。整体上,这张图片传达出一种温暖和亲近自然的感觉。
[N][ Run][ 826]: hit eos,avg 10.70 token/s

prompt >> q
```
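The token counts in the benchmark table and the run log follow a simple budget. The sketch below is inferred from those numbers, not taken from the ax-llm runtime source. It assumes each 448×448 image contributes 256 tokens (InternVL's (448 / 14)² patches, reduced 4× by pixel shuffle) and that a prompt is prefilled in 128-token chunks using the smallest group whose `prefill_max_token_num` fits it. `PREFILL_CHUNK` and `GROUP_LIMITS` are copied from the init log, `HIDDEN_SIZE` is inferred as 393216 / 256, and the helper names are hypothetical.

```python
# Hypothetical helpers reproducing the token arithmetic seen in this README.
# Assumption: InternVL's image-token formula
# (image_size // patch_size) ** 2 * downsample_ratio ** 2.
IMAGE_SIZE = 448
PATCH_SIZE = 14
DOWNSAMPLE_RATIO = 0.5
IMAGE_TOKENS = int((IMAGE_SIZE // PATCH_SIZE) ** 2 * DOWNSAMPLE_RATIO ** 2)  # 256

# Inferred from "image encode time : ..., size : 393216" (= 256 tokens x 1536).
HIDDEN_SIZE = 1536

# Copied from the init log: chunk size and grp -> prefill_max_token_num.
PREFILL_CHUNK = 128
GROUP_LIMITS = {1: 1, 2: 128, 3: 256, 4: 384, 5: 512, 6: 640, 7: 768, 8: 896, 9: 1024}


def table_prompt_tokens(num_images: int, text_tokens: int = 128) -> int:
    """Token count pattern of the benchmark table: 128 plus 256 per image."""
    return text_tokens + IMAGE_TOKENS * num_images


def plan_prefill(input_tokens: int) -> tuple[int, list[int]]:
    """Pick the smallest group that fits and split the prompt into 128-token chunks."""
    fitting = [grp for grp, limit in GROUP_LIMITS.items() if limit >= input_tokens]
    if not fitting:
        raise ValueError(f"{input_tokens} tokens exceed the largest prefill group")
    full, rest = divmod(input_tokens, PREFILL_CHUNK)
    chunks = [PREFILL_CHUNK] * full + ([rest] if rest else [])
    return min(fitting), chunks


if __name__ == "__main__":
    print(plan_prefill(46))   # text-only prompt from the log
    print(plan_prefill(311))  # prompt with examples/image_0.jpg
    print([table_prompt_tokens(n) for n in (0, 1, 4, 8)])
```

For the two prompts in the log this gives group 2 with one 46-token chunk, and group 4 with chunks of 128, 128 and 55 tokens, matching `prefill grpid` and `input_num_token` above. The 4- and 8-image rows of the table (1152 and 2176 tokens) are rejected by the sketch, since this log only lists groups up to 1024 tokens; how the runtime handles longer prompts is not shown here.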
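The `load config` block in the log enables temperature and top-k sampling (temperature 0.9, top_k 10) and disables top-p sampling and the repetition penalty. As a generic illustration of what those two options usually mean, the hypothetical `sample_next_token` below keeps the `top_k` highest logits, divides them by the temperature and samples from the resulting softmax. It is a sketch, not the ax-llm implementation. The dictionary keys mirror the logged config; that they are read from `post_config.json` is an assumption based on the file name.

```python
import math
import random
from typing import Optional, Sequence

# Values as printed by "load config" in the log (assumed to come from post_config.json).
CONFIG = {
    "enable_repetition_penalty": False,
    "enable_temperature": True,
    "enable_top_k_sampling": True,
    "enable_top_p_sampling": False,
    "penalty_window": 20,
    "repetition_penalty": 1.2,
    "temperature": 0.9,
    "top_k": 10,
    "top_p": 0.8,
}


def sample_next_token(logits: Sequence[float], cfg: dict,
                      rng: Optional[random.Random] = None) -> int:
    """Generic top-k + temperature sampling over one step's logits (hypothetical helper)."""
    if cfg.get("enable_top_p_sampling") or cfg.get("enable_repetition_penalty"):
        raise NotImplementedError("this sketch only covers top-k and temperature")
    rng = rng or random.Random()
    candidates = list(range(len(logits)))
    if cfg.get("enable_top_k_sampling"):
        # Keep only the top_k highest-scoring token ids.
        candidates = sorted(candidates, key=lambda i: logits[i], reverse=True)[: cfg["top_k"]]
    temperature = cfg["temperature"] if cfg.get("enable_temperature") else 1.0
    # Softmax over the surviving logits, shifted by the max for numerical stability.
    scaled = [logits[i] / temperature for i in candidates]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [0.1 * i for i in range(100)]  # toy vocabulary of 100 tokens
    print(sample_next_token(logits, CONFIG, rng))  # always one of ids 90..99
```

A temperature below 1 (0.9 here) slightly sharpens the distribution, while top-k caps the choice at the 10 most likely tokens, so the output stays varied without drifting into unlikely tokens.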
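The init log reports `image encoder input nchw@float32` with a 448×448 input size. A minimal preprocessing sketch for that layout is shown below, assuming the image has already been resized to 448×448 RGB and assuming the standard InternVL ImageNet normalization (mean 0.485/0.456/0.406, std 0.229/0.224/0.225). Whether the exported `vit_axmodel` expects normalized or raw pixel values is not stated in this README, so check `infer.py` before relying on it. `to_encoder_input` is a hypothetical name.

```python
import numpy as np

# Assumption: InternVL's ImageNet normalization; verify against infer.py.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def to_encoder_input(image_hwc: np.ndarray, size: int = 448) -> np.ndarray:
    """Convert an already-resized RGB uint8 image (H, W, 3) into a 1x3xHxW float32 tensor."""
    if image_hwc.shape != (size, size, 3):
        raise ValueError(f"expected ({size}, {size}, 3), got {image_hwc.shape}")
    x = image_hwc.astype(np.float32) / 255.0   # scale to [0, 1]
    x = (x - IMAGENET_MEAN) / IMAGENET_STD     # per-channel normalization (broadcast on last axis)
    x = np.transpose(x, (2, 0, 1))             # HWC -> CHW
    return np.ascontiguousarray(x[np.newaxis])  # add batch axis -> NCHW, contiguous in memory


if __name__ == "__main__":
    dummy = np.zeros((448, 448, 3), dtype=np.uint8)
    tensor = to_encoder_input(dummy)
    print(tensor.shape, tensor.dtype)
```

The final `np.ascontiguousarray` matters because `np.transpose` only returns a strided view, while runtimes that copy raw buffers to the NPU generally expect contiguous memory.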