---
license: mit
language:
  - en
  - zh
base_model:
  - OpenGVLab/InternVL3-2B
pipeline_tag: visual-question-answering
tags:
  - OpenGVLab
  - InternVL3-2B
---

# InternVL3-2B

This version of InternVL3-2B has been converted to run on the Axera NPU using **w8a16** quantization (8-bit weights, 16-bit activations).

This model has been optimized with the following LoRA:

Compatible with Pulsar2 version: 4.2

## Conversion tool links

If you are interested in model conversion, you can export the axmodel yourself, starting from the original repo:
https://huggingface.co/OpenGVLab/InternVL3-2B

[How to Convert LLM from Huggingface to axmodel](https://github.com/AXERA-TECH/InternVL3-2B.axera/tree/master/model_convert) 

[AXera NPU HOST LLM Runtime](https://github.com/AXERA-TECH/ax-llm/tree/ax-internvl) 

[AXera NPU AXCL LLM Runtime](https://github.com/AXERA-TECH/ax-llm/tree/axcl-internvl)

## Supported Platforms

- AX650
  - AX650N DEMO Board
  - [M4N-Dock(爱芯派Pro)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)
  - [M.2 Accelerator card](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html)

| Chip | Images | Image encoder (448×448) | TTFT (prefill tokens) | Decode speed (w8a16) |
|--|--|--|--|--|
| AX650N | 0 | 0 ms | 221 ms (128 tokens) | 10 tokens/sec |
| AX650N | 1 | 364 ms | 862 ms (384 tokens) | 10 tokens/sec |
| AX650N | 4 | 1456 ms | 4589 ms (1152 tokens) | 10 tokens/sec |
| AX650N | 8 | 2912 ms | 13904 ms (2176 tokens) | 10 tokens/sec |
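The numbers in the table follow a simple pattern. Each image adds 256 prefill tokens on top of a 128-token text budget (384 − 128 = 256), and the image-encoder time grows linearly at about 364 ms per 448×448 image. The sketch below reproduces those columns and gives a rough decode-time estimate from the 10 tokens/sec figure. The 256-tokens-per-image value is inferred from the table, and the helper names are hypothetical and not part of this repo.

```python
# Worked arithmetic for the AX650N performance table (constants copied from the table).
TEXT_TOKENS = 128           # prefill length of the text-only row
TOKENS_PER_IMAGE = 256      # inferred: 384 - 128 = 256 extra tokens per 448x448 image
ENCODER_MS_PER_IMAGE = 364  # image-encoder time for one 448x448 image
DECODE_TOK_PER_S = 10       # w8a16 decode speed


def prefill_tokens(num_images: int) -> int:
    """Prefill length shown in the TTFT column for a given image count."""
    return TEXT_TOKENS + TOKENS_PER_IMAGE * num_images


def encoder_ms(num_images: int) -> int:
    """Image-encoder time, assuming images are encoded one after another."""
    return ENCODER_MS_PER_IMAGE * num_images


def decode_seconds(output_tokens: int) -> float:
    """Rough generation time for a reply at the table's decode speed."""
    return output_tokens / DECODE_TOK_PER_S
```

For example, `prefill_tokens(4)` gives 1152 and `encoder_ms(8)` gives 2912, matching the table. TTFT grows faster than the token count: about 1.7 ms per token at 128 tokens, compared with about 6.4 ms per token at 2176 tokens. Long multi-image prompts are therefore dominated by prefill rather than by the image encoder.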

## How to use

Download all files from this repository to the device.

```bash
root@ax650:~/huggingface/InternVL3-2B# tree -L 1
.
|-- README.md
|-- config.json
|-- examples
|-- gradio_demo.py
|-- gradio_demo_c_api.py
|-- gradio_demo_python_api.py
|-- infer.py
|-- infer_video.py
|-- internvl3_2b_axmodel
|-- internvl3_2b_tokenizer
|-- internvl3_tokenizer.py
|-- llm.py
|-- main_api_ax650
|-- main_api_axcl_aarch64
|-- main_api_axcl_x86
|-- main_ax650
|-- main_axcl_aarch64
|-- main_axcl_x86
|-- post_config.json
|-- requirements.txt
|-- run_internvl_3_2b_448_api_ax650.sh
|-- run_internvl_3_2b_448_api_axcl_aarch64.sh
|-- run_internvl_3_2b_448_api_axcl_x86.sh
|-- run_internvl_3_2b_448_ax650.sh
|-- run_internvl_3_2b_448_axcl_aarch64.sh
|-- run_internvl_3_2b_448_axcl_x86.sh
|-- vit_axmodel
`-- webgui.png

4 directories, 24 files
```

### Python environment requirements

#### pyaxengine

https://github.com/AXERA-TECH/pyaxengine

```
wget https://github.com/AXERA-TECH/pyaxengine/releases/download/0.1.3.rc1/axengine-0.1.3-py3-none-any.whl
pip install axengine-0.1.3-py3-none-any.whl
```

#### Other dependencies

```
pip install -r requirements.txt
```

#### Inference on a Raspberry Pi 5 host using an AXCL EP (such as the M.2 AI card or HAT AI module)

```
cd InternVL3-2B
python gradio_demo_python_api.py --hf_model internvl3_2b_tokenizer/ \
                                 --axmodel_path internvl3_2b_axmodel/ \
                                 --vit_model vit_axmodel/internvl3_2b_vit_slim.axmodel

[INFO] Available providers:  ['AXCLRTExecutionProvider']
Init InferenceSession:   0%|                                                                                 | 0/28 [00:00<?, ?it/s]
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.4 162fdaa8
Init InferenceSession:   4%|███▏                                                                             | 1/28 [00:01<00:43,  1.61s/it]
[INFO] Using provider: AXCLRTExecutionProvider
......
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.4 162fdaa8
Init InferenceSession: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 28/28 [00:34<00:00,  1.23s/it]
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.4 162fdaa8
model load done!
[INFO] Using provider: AXCLRTExecutionProvider
[INFO] SOC Name: AX650N
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Compiler version: 3.4 162fdaa8
  chatbot = gr.Chatbot(height=650)
HTTP 服务地址: http://xxx.xxx.xxx.xxx:7860
* Running on local URL:  http://xxx.xxx.xxx.xxx:7860
* To create a public link, set `share=True` in `launch()`.
```

Access `http://xxx.xxx.xxx.xxx:7860` using Chrome or another browser.

![Gradio web UI](webgui.png)

#### Inference on an AX650 host, such as the M4N-Dock(爱芯派Pro) or AX650N DEMO Board (C++ sample)

First, start the tokenizer service:

```
root@ax650:~/huggingface/InternVL3-2B# python3 internvl3_tokenizer.py
None None 151645 <|im_end|> 151665 151667
context_len is  256
prompt is <|im_start|>system
你是书生·万象, 英文名是InternVL, 是由上海人工智能实验室、清华大学及多家合作单位联合开发的多模态大语言模型.<|im_end|
...
http://0.0.0.0:12345

```
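The tokenizer service listens on port 12345 (`http://0.0.0.0:12345`), and the C++ sample connects to it during init (`connect http://0.0.0.0:12345 ok`). If you script the two steps, a readiness check can wait for that port to open before the sample starts. The sketch below is a generic TCP probe written for illustration. `wait_for_port` is a hypothetical helper and not part of this repo, and the probe does not call any tokenizer endpoint.

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 12345,
                  timeout_s: float = 30.0, interval_s: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # A successful connect only shows that something is listening;
            # it does not verify that the tokenizer service is healthy.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Connection refused or timed out: retry after a short pause.
            time.sleep(interval_s)
    return False
```

For example, run `wait_for_port()` in the second terminal, which probes `127.0.0.1:12345` for up to 30 seconds. Start `run_internvl_3_2b_448_ax650.sh` only after it returns `True`.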
Then open another terminal and run `run_internvl_3_2b_448_ax650.sh`:
```
root@ax650:~/wangli/huggingface/InternVL3-2B# ./run_internvl_3_2b_448_ax650.sh
[I][                            Init][ 134]: LLM init start
[I][                            Init][  34]: connect http://0.0.0.0:12345 ok
bos_id: -1, eos_id: 151645
img_start_token: 151665
img_context_token: 151667
  3% | ██                                |   1 /  31 [0.01s<0.37s, 83.33 count/s] tokenizer init ok[I][                            Init][  45]: LLaMaEmbedSelector use mmap
  6% | ███                               |   2 /  31 [0.01s<0.19s, 166.67 count/s] embed_selector init ok
100% | ████████████████████████████████ |  31 /  31 [6.26s<6.26s, 4.95 count/s] init post axmodel ok,remain_cmm(7416 MB)[I][                            Init][ 226]: IMAGE_CONTEXT_TOKEN: 151667, IMAGE_START_TOKEN: 151665
[I][                            Init][ 251]: image encoder input nchw@float32
[I][                            Init][ 281]: image encoder output float32

[I][                            Init][ 291]: image_encoder_height : 448, image_encoder_width: 448
[I][                            Init][ 293]: max_token_len : 2559
[I][                            Init][ 296]: kv_cache_size : 256, kv_cache_num: 2559
[I][                            Init][ 304]: prefill_token_num : 128
[I][                            Init][ 308]: grp: 1, prefill_max_token_num : 1
[I][                            Init][ 308]: grp: 2, prefill_max_token_num : 128
[I][                            Init][ 308]: grp: 3, prefill_max_token_num : 256
[I][                            Init][ 308]: grp: 4, prefill_max_token_num : 384
[I][                            Init][ 308]: grp: 5, prefill_max_token_num : 512
[I][                            Init][ 308]: grp: 6, prefill_max_token_num : 640
[I][                            Init][ 308]: grp: 7, prefill_max_token_num : 768
[I][                            Init][ 308]: grp: 8, prefill_max_token_num : 896
[I][                            Init][ 308]: grp: 9, prefill_max_token_num : 1024
[I][                            Init][ 312]: prefill_max_token_num : 1024
[I][                     load_config][ 282]: load config:
{
    "enable_repetition_penalty": false,
    "enable_temperature": true,
    "enable_top_k_sampling": true,
    "enable_top_p_sampling": false,
    "penalty_window": 20,
    "repetition_penalty": 1.2,
    "temperature": 0.9,
    "top_k": 10,
    "top_p": 0.8
}

[I][                            Init][ 321]: LLM init ok
Type "q" to exit, Ctrl+c to stop current running
prompt >> 你是谁
image >>
[I][                             Run][ 551]: input token num : 46, prefill_split_num : 1
[I][                             Run][ 566]: prefill grpid 2
[I][                             Run][ 593]: input_num_token:46
[I][                             Run][ 717]: ttft: 311.26 ms
你好!我是商汤科技开发的多模态大模型,英文名叫InternVL。很高兴为你服务!请问有什么可以帮助你的吗?

[N][                             Run][ 826]: hit eos,avg 10.69 token/s

prompt >> 描述一下这张图片
image >> examples/image_0.jpg
[I][                          Encode][ 415]: image encode time : 408.81 ms, size : 393216
[I][                          Encode][ 524]: idx:0 offset : 49 out_embed.size() : 477696
[I][                             Run][ 551]: input token num : 311, prefill_split_num : 3
[I][                             Run][ 566]: prefill grpid 4
[I][                             Run][ 593]: input_num_token:128
[I][                             Run][ 593]: input_num_token:128
[I][                             Run][ 593]: input_num_token:55
[I][                             Run][ 717]: ttft: 1325.82 ms
这张图片展示了一只可爱的红熊猫。红熊猫是一种生活在亚洲森林中的熊科动物,以其红棕色的毛皮和白脸而闻名。图片中的红熊猫正趴在木板上,身体的一部分探出木板,显得有些放松和好奇。它的眼睛圆圆的,黑色的,看起来非常可爱。毛皮主要是棕红色的,耳朵和腹部是白色的,形成了鲜明的对比。背景中可以看到一些树木和绿色的叶子,暗示这可能是在自然的森林环境中拍摄的。整体上,这张图片传达出一种温暖和亲近自然的感觉。

[N][                             Run][ 826]: hit eos,avg 10.70 token/s

prompt >> q
```
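The log above shows how the runtime handles long prompts. The 311-token image prompt is prefilled in three pieces of 128, 128 and 55 tokens using group 4 (`prefill_max_token_num : 384`). The 46-token text prompt fits in a single piece of group 2. The sketch below reproduces that behaviour from the logged values alone, and may differ from the actual ax-llm implementation. `select_group` and `split_prefill` are hypothetical names, and the reading of grp 1 (one token) as the single-token decode step is an assumption.

```python
import math

PREFILL_CHUNK = 128  # "prefill_token_num : 128" in the log
# prefill_max_token_num for grp 1..9 as logged; grp 1 (1 token) is presumably the decode step.
GROUP_MAX_TOKENS = [1, 128, 256, 384, 512, 640, 768, 896, 1024]


def select_group(num_tokens: int) -> int:
    """Return the 1-based grp id with the smallest prefill_max_token_num >= num_tokens."""
    for grp_id, max_tokens in enumerate(GROUP_MAX_TOKENS, start=1):
        if num_tokens <= max_tokens:
            return grp_id
    raise ValueError(f"{num_tokens} tokens exceed prefill_max_token_num {GROUP_MAX_TOKENS[-1]}")


def split_prefill(num_tokens: int, chunk: int = PREFILL_CHUNK) -> list[int]:
    """Split a prompt into chunk-sized prefill pieces; the last piece holds the remainder."""
    n_chunks = math.ceil(num_tokens / chunk)
    return [min(chunk, num_tokens - i * chunk) for i in range(n_chunks)]
```

With the values from the log, `select_group(311)` returns 4 and `split_prefill(311)` returns `[128, 128, 55]`. These match the three `input_num_token` lines. A fixed chunk size means only a small set of prefill graph shapes has to be compiled for the NPU.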