qqc1989 committed on
Commit d967363 · verified · 1 Parent(s): 0f72562

Update README.md

  1. README.md +103 -53
README.md CHANGED
@@ -40,14 +40,14 @@ https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct
40
  - [M.2 Accelerator card](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html)
41
 
42
  **Image Process**
43
- |Chips| input size | image num | image encoder | ttft(320 tokens) | w8a16 | DDR | Flash |
44
  |--|--|--|--|--|--|--|--|
45
- |AX650| 448*448 | 1 | 780 ms | 2857 ms | 6.2 tokens/sec| 4.3 GiB | 4.6 GiB |
46
 
47
  **Video Process**
48
  |Chips| input size | image num | image encoder |ttft(512 tokens) | w8a16 | DDR | Flash |
49
  |--|--|--|--|--|--|--|--|
50
- |AX650| 308*308 | 8 | 1400 ms | 5400 ms | 6.1 tokens/sec| 4.4 GiB | 4.7 GiB |
51
 
52
  The DDR capacity refers to the CMM memory that needs to be consumed. Ensure that the CMM memory allocation on the development board is greater than this value.
53
 
@@ -141,65 +141,90 @@ python3 qwen2_tokenizer_images.py --port 12345
141
  ![](./image/ssd_car.jpg)
142
 
143
  ```
144
- root@ax650:/mnt/qtang/llm-test/qwen2.5-vl-3b# ./run_qwen2_5_vl_image.sh
145
- [I][ Init][ 129]: LLM init start
146
- bos_id: -1, eos_id: 151645
147
- 2% | █ | 1 / 40 [0.01s<0.24s, 166.67 count/s] tokenizer init ok
148
  [I][ Init][ 26]: LLaMaEmbedSelector use mmap
149
- 100% | ████████████████████████████████ | 40 / 40 [38.23s<38.23s, 1.05 count/s] init vpm axmodel ok,remain_cmm(7600 MB)
150
- [I][ Init][ 277]: max_token_len : 1023
151
- [I][ Init][ 282]: kv_cache_size : 256, kv_cache_num: 1023
152
- [I][ Init][ 290]: prefill_token_num : 320
153
- [I][ Init][ 292]: vpm_height : 1024,vpm_width : 392
154
- [I][ Init][ 301]: LLM init ok
155
  Type "q" to exit, Ctrl+c to stop current running
156
-
157
- prompt >> who are you?
158
- image >>
159
- [I][ Run][ 638]: ttft: 2854.47 ms
160
- I am a large language model created by Alibaba Cloud. I am called Qwen.
161
-
162
- [N][ Run][ 779]: hit eos,avg 6.05 token/s
163
-
164
- prompt >> Describe the image
165
  image >> image/ssd_car.jpg
166
- [I][ Encode][ 416]: image encode time : 795.614014 ms, size : 524288
167
- [I][ Run][ 638]: ttft: 2856.88 ms
168
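- This image shows a busy city street. In the foreground, a woman stands on the sidewalk, wearing a black coat and smiling. Next to her is a red double-decker bus with an advertisement
169
- that reads "THINGS GET MORE EXITING WHEN YOU SAY 'YES'". The bus's license plate is "L15". A small black van is parked next to the bus. Some shops and pedestrians are visible in the background,
170
- and the buildings on both sides of the street are modern glass-curtain-wall structures. The overall atmosphere is busy and lively.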
171
-
172
- [N][ Run][ 779]: hit eos,avg 5.96 token/s
173
  ```
174
 
175
  #### Video understand demo
176
 
177
  Please pre-process each frame of the video file into a 308x308 image.
178
 
179
- ##### start tokenizer server for image understand demo
180
-
181
- ```
182
- python qwen2_tokenizer_video_308.py --port 12345
183
- ```
184
-
185
  ##### run image understand demo
186
 
187
  ```
188
- root@ax650:/mnt/qtang/llm-test/qwen2.5-vl-3b# ./run_qwen2_5_vl_video.sh
189
- [I][ Init][ 129]: LLM init start
190
- bos_id: -1, eos_id: 151645
191
- 2% | █ | 1 / 40 [0.00s<0.12s, 333.33 count/s] tokenizer init ok
192
  [I][ Init][ 26]: LLaMaEmbedSelector use mmap
193
- 100% | ████████████████████████████████ | 40 / 40 [40.05s<40.05s, 1.00 count/s] init vpm axmodel ok,remain_cmm(7680 MB)
194
- [I][ Init][ 277]: max_token_len : 1023
195
- [I][ Init][ 282]: kv_cache_size : 256, kv_cache_num: 1023
196
- [I][ Init][ 290]: prefill_token_num : 512
197
- [I][ Init][ 292]: vpm_height : 484,vpm_width : 392
198
- [I][ Init][ 301]: LLM init ok
199
  Type "q" to exit, Ctrl+c to stop current running
200
-
201
- prompt >> Describe the video
202
- image >> video
203
  video/frame_0000.jpg
204
  video/frame_0008.jpg
205
  video/frame_0016.jpg
@@ -208,9 +233,29 @@ video/frame_0032.jpg
208
  video/frame_0040.jpg
209
  video/frame_0048.jpg
210
  video/frame_0056.jpg
211
- [I][ Encode][ 416]: image encode time : 1487.557007 ms, size : 991232
212
- [I][ Run][ 638]: ttft: 5488.29 ms
213
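- The video shows two squirrels in an outdoor scene. The background is a blurred mountain range and blue sky, and the squirrels are interacting in the foreground. Their fur is mainly brown and white, and their paws are orange. The squirrels appear to be playing or tussling with each other, with their paws and mouths reaching toward one another. The whole scene looks very natural and lively.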
214
  ```
215
 
216
  #### Inference with M.2 Accelerator card
@@ -269,7 +314,10 @@ image >> image/ssd_car.jpg
269
  [I][ Run][ 659]: input_num_token:128
270
  [I][ Run][ 659]: input_num_token:24
271
  [I][ Run][ 796]: ttft: 2067.18 ms
272
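- This image shows a busy city street. In the foreground, a woman stands on the sidewalk, wearing a black coat and smiling. Next to her is a red double-decker bus with an advertisement that reads "THINGS GET MORE EXITING WHEN YOU SAY 'YES' VirginMoney.co.uk". The bus's license plate is "L15". A black van is parked next to the bus. Some shops and pedestrians are visible in the background, and street lamps and shop signs line both sides of the street. The overall setting looks very busy and modern.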
273
 
274
  [N][ Run][ 949]: hit eos,avg 4.12 token/s
275
  ```
@@ -328,7 +376,9 @@ video/frame_0056.jpg
328
  [I][ Run][ 659]: input_num_token:128
329
  [I][ Run][ 659]: input_num_token:125
330
  [I][ Run][ 796]: ttft: 3049.59 ms
331
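- The video shows two squirrels in an outdoor scene. The background is a blurred mountain range and blue sky, and the squirrels are interacting in the foreground. Their fur is a mix of brown and gray, and their paws are orange. The squirrels appear to be playing or tussling with each other, with their paws and mouths reaching toward one another. The whole scene looks very natural and lively.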
332
 
333
  [N][ Run][ 949]: hit eos,avg 4.15 token/s
334
  ```
 
40
  - [M.2 Accelerator card](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html)
41
 
42
  **Image Process**
43
+ |Chips| input size | image num | image encoder | ttft(384 tokens) | w8a16 | DDR | Flash |
44
  |--|--|--|--|--|--|--|--|
45
+ |AX650| 448*448 | 1 | 780 ms | 1651 ms | 5.9 tokens/sec| 4.3 GiB | 4.6 GiB |
46
 
47
  **Video Process**
48
  |Chips| input size | image num | image encoder |ttft(512 tokens) | w8a16 | DDR | Flash |
49
  |--|--|--|--|--|--|--|--|
50
+ |AX650| 308*308 | 8 | 1400 ms | 2455 ms | 5.9 tokens/sec| 4.4 GiB | 4.7 GiB |
51
 
52
  The DDR capacity refers to the CMM memory that needs to be consumed. Ensure that the CMM memory allocation on the development board is greater than this value.
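
One way to check this on a board is to read the CMM figures that the runtime prints at startup. The sketch below parses the `Total CMM` and `Left CMM` lines that appear in the demo logs further down and reports how much CMM the model consumed. The function names and the regular expression are illustrative assumptions based on those sample logs, not part of the AXERA tooling.

```python
import re

# Matches log lines such as "[I][ Init][ 136]: Total CMM:7478 MB".
# The pattern is inferred from the sample logs; it is not a documented format.
CMM_LINE = re.compile(r"(Total|Left) CMM:\s*(\d+)\s*MB")


def cmm_usage_mb(log_text: str) -> int:
    """Return Total CMM minus Left CMM, in MB, from a runtime log."""
    values = {kind: int(mb) for kind, mb in CMM_LINE.findall(log_text)}
    if "Total" not in values or "Left" not in values:
        raise ValueError("log does not contain both Total CMM and Left CMM lines")
    return values["Total"] - values["Left"]


def has_enough_cmm(total_cmm_mb: int, required_gib: float) -> bool:
    """True if the board's CMM pool exceeds the DDR figure from the table."""
    return total_cmm_mb > required_gib * 1024


sample_log = """\
[I][ Init][ 136]: Total CMM:7478 MB
[I][ Init][ 357]: Left CMM:2939 MB
"""
print(cmm_usage_mb(sample_log))     # 4539
print(has_enough_cmm(7478, 4.3))    # True
```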
53
 
 
141
  ![](./image/ssd_car.jpg)
142
 
143
  ```
144
+ (base) root@ax650:~/AXERA-TECH/Qwen2.5-VL-3B-Instruct# ./run_qwen2_5_vl_image.sh
145
+ [I][ Init][ 134]: LLM init start
146
+ [I][ Init][ 136]: Total CMM:7478 MB
147
+ tokenizer_type = 1
148
+ 2% | █ | 1 / 39 [0.31s<12.21s, 3.19 count/s] tokenizer init ok
149
  [I][ Init][ 26]: LLaMaEmbedSelector use mmap
150
+ 5% | ██ | 2 / 39 [0.31s<6.10s, 6.39 count/s] embed_selector init ok
151
+ [I][ Init][ 181]: attr.axmodel_num:36
152
+ 102% | █████████████████████████████████ | 40 / 39 [17.30s<16.86s, 2.31 count/s] init vpm axmodel ok,remain_cmm(2939 MB)
153
+ [I][ Init][ 287]: image encoder output float32
154
+
155
+ [I][ Init][ 317]: max_token_len : 1023
156
+ [I][ Init][ 322]: kv_cache_size : 256, kv_cache_num: 1023
157
+ [I][ Init][ 330]: prefill_token_num : 128
158
+ [I][ Init][ 334]: grp: 1, prefill_max_token_num : 1
159
+ [I][ Init][ 334]: grp: 2, prefill_max_token_num : 128
160
+ [I][ Init][ 334]: grp: 3, prefill_max_token_num : 256
161
+ [I][ Init][ 334]: grp: 4, prefill_max_token_num : 384
162
+ [I][ Init][ 334]: grp: 5, prefill_max_token_num : 512
163
+ [I][ Init][ 338]: prefill_max_token_num : 512
164
+ [E][ load_config][ 277]: config file(post_config.json) open failed
165
+ [W][ Init][ 351]: load postprocess config(post_config.json) failed
166
+ [I][ Init][ 355]: LLM init ok
167
+ [I][ Init][ 357]: Left CMM:2939 MB
168
  Type "q" to exit, Ctrl+c to stop current running
169
+ prompt >> what in the images?
170
  image >> image/ssd_car.jpg
171
+ [I][ EncodeImage][ 432]: pixel_values size 1
172
+ [I][ EncodeImage][ 433]: grid_h 32 grid_w 32
173
+ [I][ EncodeImage][ 460]: image encode time : 781.932983 ms, size : 1
174
+ [I][ Encode][ 513]: input_ids size:282
175
+ [I][ Encode][ 521]: offset 15
176
+ [I][ Encode][ 537]: img_embed.size:1, 524288
177
+ [I][ Encode][ 553]: out_embed size:577536
178
+ [I][ Encode][ 554]: input_ids size 282
179
+ [I][ Encode][ 556]: position_ids size:282
180
+ [I][ Run][ 575]: input token num : 282, prefill_split_num : 3
181
+ [I][ Run][ 609]: input_num_token:128
182
+ [I][ Run][ 609]: input_num_token:128
183
+ [I][ Run][ 609]: input_num_token:26
184
+ [I][ Run][ 798]: ttft: 1651.51 ms
185
+
186
+ The image shows a red double-decker bus on a city street. The bus has an advertisement on its side that reads,
187
+ "THINGS GET MORE EXITING WHEN YOU SAY 'YES' VirginMoney.co.uk." The bus is parked on the side of the road,
188
+ and there is a person standing next to it. The background features a building with large windows and a few pedestrians walking on the sidewalk.
189
+ The street appears to be in an urban area, possibly in a city like London.
190
+
191
+ [N][ Run][ 924]: hit eos,avg 5.83 token/s
192
  ```
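
The `prefill_split_num` and `input_num_token` lines in the log above are consistent with the prompt being fed to the model in chunks of `prefill_token_num` (128) tokens. A minimal sketch of that arithmetic, assuming plain fixed-size chunking; the runtime's actual logic is not shown in this README, and `split_prefill` is a hypothetical helper:

```python
def split_prefill(num_tokens: int, chunk: int = 128) -> list[int]:
    """Split a prompt of num_tokens into prefill chunks of at most `chunk` tokens."""
    if num_tokens <= 0 or chunk <= 0:
        raise ValueError("num_tokens and chunk must be positive")
    full, rest = divmod(num_tokens, chunk)
    return [chunk] * full + ([rest] if rest else [])


print(split_prefill(282))  # [128, 128, 26]           (image demo, prefill_split_num 3)
print(split_prefill(509))  # [128, 128, 128, 125]     (video demo, prefill_split_num 4)
```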
193
 
194
  #### Video understand demo
195
 
196
  Please pre-process each frame of the video file into a 308x308 image.
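
A minimal sketch of this pre-processing step, assuming the frames have already been extracted as JPEG files named like the ones in the log below (`frame_0000.jpg`, `frame_0008.jpg`, ...) and that Pillow is installed. The directory names and both helper functions are hypothetical. The README does not say whether frames should be cropped or padded first, so this version simply stretches each frame to 308x308.

```python
from pathlib import Path


def sampled_frame_names(num_frames: int = 8, stride: int = 8) -> list[str]:
    """File names in the frame_0000.jpg, frame_0008.jpg, ... pattern seen in the log."""
    return [f"frame_{i * stride:04d}.jpg" for i in range(num_frames)]


def resize_frames(src_dir: str, dst_dir: str, size: int = 308) -> list[Path]:
    """Resize the sampled frames in src_dir to size x size and save them in dst_dir."""
    from PIL import Image  # third-party dependency: pip install Pillow

    out_dir = Path(dst_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name in sampled_frame_names():
        with Image.open(Path(src_dir) / name) as img:
            # A plain resize ignores the aspect ratio; crop or pad first if that matters.
            img.convert("RGB").resize((size, size)).save(out_dir / name)
        written.append(out_dir / name)
    return written


print(sampled_frame_names())
# Example call (directory names are placeholders):
# resize_frames("raw_frames", "video")
```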
197
 
198
  ##### run image understand demo
199
 
200
  ```
201
+ (base) root@ax650:~/AXERA-TECH/Qwen2.5-VL-3B-Instruct# ./run_qwen2_5_vl_video.sh
202
+ [I][ Init][ 134]: LLM init start
203
+ [I][ Init][ 136]: Total CMM:7478 MB
204
+ tokenizer_type = 1
205
+ 2% | █ | 1 / 39 [0.32s<12.36s, 3.15 count/s] tokenizer init ok
206
  [I][ Init][ 26]: LLaMaEmbedSelector use mmap
207
+ 5% | ██ | 2 / 39 [0.32s<6.20s, 6.29 count/s] embed_selector init ok
208
+ [I][ Init][ 181]: attr.axmodel_num:36
209
+ 102% | █████████████████████████████████ | 40 / 39 [17.79s<17.35s, 2.25 count/s] init vpm axmodel ok,remain_cmm(3094 MB)
210
+ [I][ Init][ 287]: image encoder output float32
211
+
212
+ [I][ Init][ 317]: max_token_len : 1023
213
+ [I][ Init][ 322]: kv_cache_size : 256, kv_cache_num: 1023
214
+ [I][ Init][ 330]: prefill_token_num : 128
215
+ [I][ Init][ 334]: grp: 1, prefill_max_token_num : 1
216
+ [I][ Init][ 334]: grp: 2, prefill_max_token_num : 128
217
+ [I][ Init][ 334]: grp: 3, prefill_max_token_num : 256
218
+ [I][ Init][ 334]: grp: 4, prefill_max_token_num : 384
219
+ [I][ Init][ 334]: grp: 5, prefill_max_token_num : 512
220
+ [I][ Init][ 338]: prefill_max_token_num : 512
221
+ [E][ load_config][ 277]: config file(post_config.json) open failed
222
+ [W][ Init][ 351]: load postprocess config(post_config.json) failed
223
+ [I][ Init][ 355]: LLM init ok
224
+ [I][ Init][ 357]: Left CMM:3094 MB
225
  Type "q" to exit, Ctrl+c to stop current running
226
+ prompt >> what is this?
227
+ video >> video
228
  video/frame_0000.jpg
229
  video/frame_0008.jpg
230
  video/frame_0016.jpg
 
233
  video/frame_0040.jpg
234
  video/frame_0048.jpg
235
  video/frame_0056.jpg
236
+ [I][ EncodeImage][ 432]: pixel_values size 4
237
+ [I][ EncodeImage][ 433]: grid_h 22 grid_w 22
238
+ [I][ EncodeImage][ 460]: image encode time : 1484.067993 ms, size : 4
239
+ [I][ Encode][ 513]: input_ids size:509
240
+ [I][ Encode][ 521]: offset 15
241
+ [I][ Encode][ 537]: img_embed.size:4, 247808
242
+ [I][ Encode][ 544]: offset:136
243
+ [I][ Encode][ 544]: offset:257
244
+ [I][ Encode][ 544]: offset:378
245
+ [I][ Encode][ 553]: out_embed size:1042432
246
+ [I][ Encode][ 554]: input_ids size 509
247
+ [I][ Encode][ 556]: position_ids size:509
248
+ [I][ Run][ 575]: input token num : 509, prefill_split_num : 4
249
+ [I][ Run][ 609]: input_num_token:128
250
+ [I][ Run][ 609]: input_num_token:128
251
+ [I][ Run][ 609]: input_num_token:128
252
+ [I][ Run][ 609]: input_num_token:125
253
+ [I][ Run][ 798]: ttft: 2455.20 ms
254
+
255
+ This image shows two ground squirrels, also known as marmots, engaging in a playful interaction.
256
+ They are standing on their hind legs and appear to be playfully biting or nipping at each other. The background features a scenic mountain landscape with a clear blue sky.
257
+
258
+ [N][ Run][ 924]: hit eos,avg 5.82 token/s
259
  ```
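
The sizes printed in the logs above can be reproduced with a little arithmetic. The sketch below assumes the publicly documented Qwen2.5-VL-3B settings (14-pixel patches, 2x2 spatial merging, two frames per temporal step, hidden size 2048). None of these constants are stated in this README, so treat them as assumptions that happen to match the logged numbers.

```python
# Assumed Qwen2.5-VL-3B settings (taken from the public model config, not from this README).
PATCH_SIZE = 14      # ViT patch edge in pixels
MERGE_SIZE = 2       # 2x2 neighbouring patches are merged into one token
TEMPORAL_PATCH = 2   # two video frames form one temporal step
HIDDEN_SIZE = 2048   # width of each embedding vector


def grid_side(side_px: int) -> int:
    """Patches along one edge of a square input (the log's grid_h / grid_w)."""
    return side_px // PATCH_SIZE


def tokens_per_step(side_px: int) -> int:
    """Vision tokens for one image, or for one temporal step of a video."""
    return (grid_side(side_px) // MERGE_SIZE) ** 2


def embed_size(num_tokens: int) -> int:
    """Number of values in an embedding buffer holding num_tokens vectors."""
    return num_tokens * HIDDEN_SIZE


# Video demo: 8 frames of 308x308.
steps = 8 // TEMPORAL_PATCH
per_step = tokens_per_step(308)
first_offset = 15  # read from the log, not derived here
offsets = [first_offset + i * per_step for i in range(steps)]

print(grid_side(308), steps, per_step)   # 22 4 121
print(embed_size(per_step))              # 247808
print(offsets)                           # [15, 136, 257, 378]
print(embed_size(509))                   # 1042432

# Image demo: one 448x448 image.
print(grid_side(448), tokens_per_step(448))   # 32 256
print(embed_size(tokens_per_step(448)))       # 524288
print(embed_size(282))                        # 577536
```

Under these assumptions the 282-token image prompt is 256 vision tokens plus 26 text tokens, and the 509-token video prompt is 4 x 121 vision tokens plus 25 text tokens.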
260
 
261
  #### Inference with M.2 Accelerator card
 
314
  [I][ Run][ 659]: input_num_token:128
315
  [I][ Run][ 659]: input_num_token:24
316
  [I][ Run][ 796]: ttft: 2067.18 ms
317
+
318
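+ This image shows a busy city street. In the foreground, a woman stands on the sidewalk, wearing a black coat and smiling. Next to her is a red double-decker bus
319
+ with an advertisement that reads "THINGS GET MORE EXITING WHEN YOU SAY 'YES' VirginMoney.co.uk". The bus's license plate is "L15".
320
+ A black van is parked next to the bus. Some shops and pedestrians are visible in the background, and street lamps and shop signs line both sides of the street. The overall setting looks very busy and modern.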
321
 
322
  [N][ Run][ 949]: hit eos,avg 4.12 token/s
323
  ```
 
376
  [I][ Run][ 659]: input_num_token:128
377
  [I][ Run][ 659]: input_num_token:125
378
  [I][ Run][ 796]: ttft: 3049.59 ms
379
+
380
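+ The video shows two squirrels in an outdoor scene. The background is a blurred mountain range and blue sky, and the squirrels are interacting in the foreground. Their fur is a mix of brown and gray, and their paws are orange. The squirrels appear to be playing or tussling with each other,
381
+ with their paws and mouths reaching toward one another. The whole scene looks very natural and lively.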
382
 
383
  [N][ Run][ 949]: hit eos,avg 4.15 token/s
384
  ```