---
license: mit
language:
- en
- zh
base_model:
- OpenGVLab/InternVL3-2B
pipeline_tag: visual-question-answering
tags:
- OpenGVLab
- InternVL3-2B
---
# InternVL3-2B
This version of InternVL3-2B has been converted to run on the Axera NPU using **w8a16** quantization.
This model has been optimized with the following LoRA:
Compatible with Pulsar2 version: 4.2
## Conversion tool links
If you are interested in model conversion, you can export the axmodel yourself, starting from the original repository:
- https://huggingface.co/OpenGVLab/InternVL3-2B
- [How to Convert LLM from Huggingface to axmodel](https://github.com/AXERA-TECH/InternVL3-2B.axera/tree/master/model_convert)
- [AXera NPU HOST LLM Runtime](https://github.com/AXERA-TECH/ax-llm/tree/ax-internvl)
- [AXera NPU AXCL LLM Runtime](https://github.com/AXERA-TECH/ax-llm/tree/axcl-internvl)
## Supported Platforms
- AX650
- AX650N DEMO Board
- [M4N-Dock(爱芯派Pro)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)
- [M.2 Accelerator card](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html)

| Chip | Number of images | Image encoder time (448x448) | TTFT (prefill tokens) | Decode speed (w8a16) |
|--|--|--|--|--|
|AX650N | 0 | 0 ms | 221 ms (128 tokens) | 10 tokens/sec |
|AX650N | 1 | 364 ms | 862 ms (384 tokens) | 10 tokens/sec |
|AX650N | 4 | 1456 ms | 4589 ms (1152 tokens) | 10 tokens/sec |
|AX650N | 8 | 2912 ms | 13904 ms (2176 tokens) | 10 tokens/sec |
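
The token counts in the TTFT column are consistent with a fixed 128-token text budget plus 256 tokens per 448x448 image, and the encoder times grow linearly at 364 ms per image. The sketch below only reproduces that arithmetic. The constants are inferred from the table rows, not read from the runtime, and the function names are hypothetical.

```python
# Constants inferred from the benchmark table above (assumptions, not runtime values).
TEXT_TOKENS = 128            # prefill tokens reported for 0 images
TOKENS_PER_IMAGE = 256       # 384 - 128: tokens added by one image
ENCODER_MS_PER_IMAGE = 364   # image encoder latency reported for one image


def prefill_tokens(num_images: int) -> int:
    """Estimated prefill length for a prompt with num_images images."""
    return TEXT_TOKENS + TOKENS_PER_IMAGE * num_images


def encoder_ms(num_images: int) -> int:
    """Estimated total image encoder time in milliseconds."""
    return ENCODER_MS_PER_IMAGE * num_images


if __name__ == "__main__":
    for n in (0, 1, 4, 8):
        print(f"{n} image(s): {prefill_tokens(n)} tokens, {encoder_ms(n)} ms encode")
```

TTFT itself does not scale linearly with the token count in the table, so this estimate should not be extended to prefill latency.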
## How to use
Download all files from this repository to the device.
```bash
root@ax650:~/huggingface/InternVL3-2B# tree -L 1
.
|-- README.md
|-- config.json
|-- examples
|-- gradio_demo.py
|-- gradio_demo_c_api.py
|-- gradio_demo_python_api.py
|-- infer.py
|-- infer_video.py
|-- internvl3_2b_axmodel
|-- internvl3_2b_tokenizer
|-- internvl3_tokenizer.py
|-- llm.py
|-- main_api_ax650
|-- main_api_axcl_aarch64
|-- main_api_axcl_x86
|-- main_ax650
|-- main_axcl_aarch64
|-- main_axcl_x86
|-- post_config.json
|-- requirements.txt
|-- run_internvl_3_2b_448_api_ax650.sh
|-- run_internvl_3_2b_448_api_axcl_aarch64.sh
|-- run_internvl_3_2b_448_api_axcl_x86.sh
|-- run_internvl_3_2b_448_ax650.sh
|-- run_internvl_3_2b_448_axcl_aarch64.sh
|-- run_internvl_3_2b_448_axcl_x86.sh
|-- vit_axmodel
`-- webgui.png
4 directories, 24 files
```
### Python environment requirements
#### pyaxengine
https://github.com/AXERA-TECH/pyaxengine
```
wget https://github.com/AXERA-TECH/pyaxengine/releases/download/0.1.3.rc1/axengine-0.1.3-py3-none-any.whl
pip install axengine-0.1.3-py3-none-any.whl
```
#### Other dependencies
```
pip install -r requirements.txt
```
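
Before launching a demo, it can help to confirm that the packages installed above are importable. This is a minimal sketch using only the standard library; it assumes the wheel's import name is `axengine`, and `is_installed` is a hypothetical helper, not part of pyaxengine.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "axengine" is the import name assumed for the pyaxengine wheel.
    for name in ("axengine",):
        status = "ok" if is_installed(name) else "MISSING"
        print(f"{name}: {status}")
```

Other import names from `requirements.txt` can be added to the tuple as needed.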
#### Inference with a Raspberry Pi 5 host using the AXCL EP (such as an M.2 AI Card or HAT AI Module)
```
cd InternVL3-2B
python gradio_demo_python_api.py --hf_model internvl3_2b_tokenizer/ \
--axmodel_path internvl3_2b_axmodel/ \
--vit_model vit_axmodel/internvl3_2b_vit_slim.axmodel
[INFO] Available providers: ['AXCLRTExecutionProvider']
Init InferenceSession: 0%| | 0/28 [00:00<?, ?it/s]
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.4 162fdaa8
Init InferenceSession: 4%|███▏ | 1/28 [00:01<00:43, 1.61s/it]
[INFO] Using provider: AXCLRTExecutionProvider
......
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.4 162fdaa8
Init InferenceSession: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 28/28 [00:34<00:00, 1.23s/it]
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.4 162fdaa8
model load done!
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.4 162fdaa8
chatbot = gr.Chatbot(height=650)
HTTP 服务地址: http://xxx.xxx.xxx.xxx:7860
* Running on local URL: http://xxx.xxx.xxx.xxx:7860
* To create a public link, set `share=True` in `launch()`.
```
Access `http://xxx.xxx.xxx.xxx:7860` using Chrome or another browser.

#### Inference with an AX650 host, such as M4N-Dock(爱芯派Pro) or AX650N DEMO Board (C++ sample)
#### Start the Tokenizer service
```
root@ax650:~/huggingface/InternVL3-2B# python3 internvl3_tokenizer.py
None None 151645 <|im_end|> 151665 151667
context_len is 256
prompt is <|im_start|>system
你是书生·万象, 英文名是InternVL, 是由上海人工智能实验室、清华大学及多家合作单位联合开发的多模态大语言模型.<|im_end|
...
http://0.0.0.0:12345
```
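
The C++ sample connects to this service on port 12345, so it is worth checking that the port is accepting connections before starting the runtime. The sketch below is a plain TCP reachability check with a hypothetical helper name. It makes no assumption about the HTTP endpoints the tokenizer service exposes.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Port 12345 is the one printed by internvl3_tokenizer.py above.
    print("tokenizer service reachable:", port_is_open("127.0.0.1", 12345))
```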
Open another terminal and run `run_internvl_3_2b_448_ax650.sh`:
```
root@ax650:~/wangli/huggingface/InternVL3-2B# ./run_internvl_3_2b_448_ax650.sh
[I][ Init][ 134]: LLM init start
[I][ Init][ 34]: connect http://0.0.0.0:12345 ok
bos_id: -1, eos_id: 151645
img_start_token: 151665
img_context_token: 151667
3% | ██ | 1 / 31 [0.01s<0.37s, 83.33 count/s] tokenizer init ok[I][ Init][ 45]: LLaMaEmbedSelector use mmap
6% | ███ | 2 / 31 [0.01s<0.19s, 166.67 count/s] embed_selector init ok
100% | ████████████████████████████████ | 31 / 31 [6.26s<6.26s, 4.95 count/s] init post axmodel ok,remain_cmm(7416 MB)[I][ Init][ 226]: IMAGE_CONTEXT_TOKEN: 151667, IMAGE_START_TOKEN: 151665
[I][ Init][ 251]: image encoder input nchw@float32
[I][ Init][ 281]: image encoder output float32
[I][ Init][ 291]: image_encoder_height : 448, image_encoder_width: 448
[I][ Init][ 293]: max_token_len : 2559
[I][ Init][ 296]: kv_cache_size : 256, kv_cache_num: 2559
[I][ Init][ 304]: prefill_token_num : 128
[I][ Init][ 308]: grp: 1, prefill_max_token_num : 1
[I][ Init][ 308]: grp: 2, prefill_max_token_num : 128
[I][ Init][ 308]: grp: 3, prefill_max_token_num : 256
[I][ Init][ 308]: grp: 4, prefill_max_token_num : 384
[I][ Init][ 308]: grp: 5, prefill_max_token_num : 512
[I][ Init][ 308]: grp: 6, prefill_max_token_num : 640
[I][ Init][ 308]: grp: 7, prefill_max_token_num : 768
[I][ Init][ 308]: grp: 8, prefill_max_token_num : 896
[I][ Init][ 308]: grp: 9, prefill_max_token_num : 1024
[I][ Init][ 312]: prefill_max_token_num : 1024
[I][ load_config][ 282]: load config:
{
"enable_repetition_penalty": false,
"enable_temperature": true,
"enable_top_k_sampling": true,
"enable_top_p_sampling": false,
"penalty_window": 20,
"repetition_penalty": 1.2,
"temperature": 0.9,
"top_k": 10,
"top_p": 0.8
}
[I][ Init][ 321]: LLM init ok
Type "q" to exit, Ctrl+c to stop current running
prompt >> 你是谁
image >>
[I][ Run][ 551]: input token num : 46, prefill_split_num : 1
[I][ Run][ 566]: prefill grpid 2
[I][ Run][ 593]: input_num_token:46
[I][ Run][ 717]: ttft: 311.26 ms
你好!我是商汤科技开发的多模态大模型,英文名叫InternVL。很高兴为你服务!请问有什么可以帮助你的吗?
[N][ Run][ 826]: hit eos,avg 10.69 token/s
prompt >> 描述一下这张图片
image >> examples/image_0.jpg
[I][ Encode][ 415]: image encode time : 408.81 ms, size : 393216
[I][ Encode][ 524]: idx:0 offset : 49 out_embed.size() : 477696
[I][ Run][ 551]: input token num : 311, prefill_split_num : 3
[I][ Run][ 566]: prefill grpid 4
[I][ Run][ 593]: input_num_token:128
[I][ Run][ 593]: input_num_token:128
[I][ Run][ 593]: input_num_token:55
[I][ Run][ 717]: ttft: 1325.82 ms
这张图片展示了一只可爱的红熊猫。红熊猫是一种生活在亚洲森林中的熊科动物,以其红棕色的毛皮和白脸而闻名。图片中的红熊猫正趴在木板上,身体的一部分探出木板,显得有些放松和好奇。它的眼睛圆圆的,黑色的,看起来非常可爱。毛皮主要是棕红色的,耳朵和腹部是白色的,形成了鲜明的对比。背景中可以看到一些树木和绿色的叶子,暗示这可能是在自然的森林环境中拍摄的。整体上,这张图片传达出一种温暖和亲近自然的感觉。
[N][ Run][ 826]: hit eos,avg 10.70 token/s
prompt >> q
```
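
The log above shows the prompt being prefilled in chunks of 128 tokens: 46 tokens run as one chunk in group 2, while 311 tokens are split into 128, 128, and 55 and run in group 4. The sketch below reproduces that pattern. The group formula is inferred from the `grp` lines in the log (group `k` holds up to `128 * (k - 1)` tokens for `k >= 2`) and is an assumption, not the runtime's actual implementation; the function names are hypothetical.

```python
import math

PREFILL_CHUNK = 128   # "prefill_token_num" in the log
PREFILL_MAX = 1024    # "prefill_max_token_num" in the log


def split_prefill(num_tokens: int, chunk: int = PREFILL_CHUNK) -> list[int]:
    """Split a prompt into full chunks plus one remainder chunk."""
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    full, rest = divmod(num_tokens, chunk)
    return [chunk] * full + ([rest] if rest else [])


def prefill_group(num_tokens: int, chunk: int = PREFILL_CHUNK) -> int:
    """Smallest group id whose capacity covers num_tokens (inferred pattern)."""
    if not 0 < num_tokens <= PREFILL_MAX:
        # How the runtime handles longer prompts is not visible in the log.
        raise ValueError("num_tokens outside the range listed in the log")
    return math.ceil(num_tokens / chunk) + 1


if __name__ == "__main__":
    for n in (46, 311):
        print(n, split_prefill(n), "group", prefill_group(n))
```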