yongqiang committed on
Commit 138454d · 1 Parent(s): 964caa9

add internvl3-5 c++ demo with Xet storage
.gitattributes CHANGED
@@ -42,3 +42,5 @@ main_axcl_x86 filter=lfs diff=lfs merge=lfs -text
  *.jpg filter=lfs diff=lfs merge=lfs -text
  *.mp4 filter=lfs diff=lfs merge=lfs -text
  internvl3-5_tokenizer/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ main filter=lfs diff=lfs merge=lfs -text
+ main_api filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -78,6 +78,85 @@ pip install transformers==4.57.1
 
  #### Inference with AX650 Host, such as M4N-Dock(爱芯派Pro) or AX650 DEMO Board
 
+
+ Interactive conversations using the `C++ Demo`:
+
+ ```sh
+ ./run_internvl_3-5_1b_448_ax650.sh
+ ```
+
+ The log information is as follows:
+
+ ```bash
+ root@ax650 ~/yongqiang/push_hugging_face/InternVL3_5-1B_GPTQ_INT4 # ./run_internvl_3-5_1b_448_ax650.sh
+ [I][ Init][ 135]: LLM init start
+ [I][ Init][ 137]: Total CMM:7915 MB
+ tokenizer_type = 3
+ 3% | ██ | 1 / 31 [0.71s<21.92s, 1.41 count/s] tokenizer init ok[I][ Init][ 26]: LLaMaEmbedSelector use mmap
+ 6% | ███ | 2 / 31 [0.71s<11.05s, 2.81 count/s] embed_selector init ok[I][ Init][ 182]: attr.axmodel_num:28
+ 100% | ████████████████████████████████ | 31 / 31 [2.06s<2.06s, 15.03 count/s] init post axmodel ok,remain_cmm(6940 MB)[I][ Init][ 240]: image encoder feature outputs:0
+ 103% | ██████████████████████████████████ | 32 / 31 [2.32s<2.25s, 13.79 count/s] init vpm axmodel ok,remain_cmm(6588 MB)[I][ Init][ 280]: image encoder input nhwc@uint8
+ [I][ Init][ 305]: image encoder output float32
+
+ [I][ Init][ 335]: max_token_len : 2047
+ [I][ Init][ 340]: kv_cache_size : 1024, kv_cache_num: 2047
+ [I][ Init][ 348]: prefill_token_num : 128
+ [I][ Init][ 352]: grp: 1, prefill_max_token_num : 1
+ [I][ Init][ 352]: grp: 2, prefill_max_token_num : 128
+ [I][ Init][ 352]: grp: 3, prefill_max_token_num : 256
+ [I][ Init][ 352]: grp: 4, prefill_max_token_num : 384
+ [I][ Init][ 352]: grp: 5, prefill_max_token_num : 512
+ [I][ Init][ 352]: grp: 6, prefill_max_token_num : 640
+ [I][ Init][ 352]: grp: 7, prefill_max_token_num : 768
+ [I][ Init][ 352]: grp: 8, prefill_max_token_num : 896
+ [I][ Init][ 352]: grp: 9, prefill_max_token_num : 1024
+ [I][ Init][ 356]: prefill_max_token_num : 1024
+ [I][ load_config][ 281]: load config:
+ {
+ "enable_repetition_penalty": true,
+ "enable_temperature": true,
+ "enable_top_k_sampling": true,
+ "enable_top_p_sampling": false,
+ "penalty_window": 30,
+ "repetition_penalty": 1.2,
+ "temperature": 0.7,
+ "top_k": 10,
+ "top_p": 0.9
+ }
+
+ [I][ Init][ 373]: LLM init ok
+ [I][ Init][ 375]: Left CMM:6588 MB
+ Type "q" to exit, Ctrl+c to stop current running
+ prompt(输入q退出) >> 介绍一下你自己
+ image(回车键跳过) >>
+ [I][ Run][ 713]: input token num : 21, prefill_split_num : 1
+ [I][ Run][ 747]: input_num_token:21
+ [I][ Run][ 976]: ttft: 83.79 ms
+ 我被称为"语言模型-1.0",来自上海人工智能实验室。我的开发团队致力于为用户提供高效、准确和个性化的AI服务。作为一款先进的自然语言处理(NLP)模型,我旨在帮助用户解决各种语言相关问题,并提供有用的信息和建议。我的设计目标是能够以自然流畅的方式与人类进行交互,无论是回答问题、提供建议还是执行任务。
+
+ [N][ Run][1102]: hit eos,avg 19.79 token/s
+
+ prompt(输入q退出) >> 请你详细描述下面这幅图
+ image(回车键跳过) >> assets/image_1.jpg
+ [I][ EncodeImage][ 481]: image encode time : 408.467987 ms, size : 1
+ [I][ Encode][ 636]: input_ids size:284
+ [I][ Encode][ 644]: offset 15
+ [I][ Encode][ 673]: img_embed.size:1, 262144
+ [I][ Encode][ 689]: out_embed size:290816
+ [I][ Encode][ 690]: input_ids size 284
+ [I][ Encode][ 692]: position_ids size:284
+ [I][ Run][ 713]: input token num : 284, prefill_split_num : 3
+ [I][ Run][ 747]: input_num_token:128
+ [I][ Run][ 747]: input_num_token:128
+ [I][ Run][ 747]: input_num_token:28
+ [I][ Run][ 976]: ttft: 270.76 ms
+ 这是一幅生动的图片,展示了一只大熊猫正在自然环境中觅食的情景。画面中,大熊猫正低头在植物丛中寻找食物。它的毛发呈白色,背部和腹部有黑色斑点。周围绿意盎然,各种灌木和植物环绕着它,显得生机勃勃。背景的木质结构可能是一把竹竿或长椅,进一步暗示这可能是动物园或野生动物保护区。整个场景充满了自然的气息,让人感受到大自然的可爱与生机。
+
+ [N][ Run][1102]: hit eos,avg 19.86 token/s
+
+ prompt(输入q退出) >>
+ ```
+
  Interactive conversations using the `Gradio API`:
 
  ```bash
assets/image_1.jpg ADDED

Git LFS Details

  • SHA256: 08487494b8dc08d44bc36491adf3ab89ff30d13a3122da86f3cd67cad89eeee8
  • Pointer size: 131 Bytes
  • Size of remote file: 126 kB
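The C++ demo log in the README shows a 284-token prompt being prefilled in chunks of `prefill_token_num : 128` (`input_num_token:128, 128, 28`, `prefill_split_num : 3`). A minimal sketch of that chunking; the function name is illustrative, not taken from the demo:

```python
def prefill_plan(num_tokens: int, chunk: int = 128) -> list[int]:
    """Split a prompt into prefill chunks, mirroring the demo log
    (284 tokens with a 128-token prefill window -> 128 + 128 + 28)."""
    splits = []
    remaining = num_tokens
    while remaining > 0:
        splits.append(min(chunk, remaining))
        remaining -= chunk
    return splits

print(prefill_plan(284))  # [128, 128, 28], three chunks as in the log
print(prefill_plan(21))   # [21], the text-only prompt prefills in one pass
```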
internvl3-5-1b_tokenizer.txt ADDED
The diff for this file is too large to render.
 
main ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:398f459f13ef57ca361ebc356cae1c51420175c66dc3e0b5431a3696d1554022
+ size 6804064
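`main` and `main_api` are checked in as Git LFS pointer files like the one above; the actual binaries live in LFS/Xet storage, addressed by the `oid`. A small sketch of reading such a pointer (the helper name is ours, not part of any LFS tooling):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file (one 'key value' pair per line)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:398f459f13ef57ca361ebc356cae1c51420175c66dc3e0b5431a3696d1554022
size 6804064"""

info = parse_lfs_pointer(pointer)
print(info["oid"])   # storage key of the real ~6.8 MB binary
print(info["size"])  # byte size of the object, as a string
```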
main_api ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c996e8093a3411655b8c3cfd45f340a94546ab2ebeb3cf2b0e2cc9150d58dd18
+ size 6938952
post_config.json ADDED
@@ -0,0 +1,14 @@
+ {
+ "enable_temperature" : true,
+ "temperature" : 0.7,
+
+ "enable_repetition_penalty" : true,
+ "repetition_penalty" : 1.2,
+ "penalty_window" : 30,
+
+ "enable_top_p_sampling" : false,
+ "top_p" : 0.9,
+
+ "enable_top_k_sampling" : true,
+ "top_k" : 10
+ }
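These flags drive the demo's sampler: tokens seen within the last `penalty_window` steps are penalized, logits are scaled by `temperature`, and sampling is restricted to the `top_k` candidates (`top_p` is disabled here). A rough Python sketch of that pipeline, using the common Hugging Face repetition-penalty convention (divide positive logits, multiply negative ones); the exact order of operations inside the C++ demo is an assumption:

```python
import math
import random

# Mirrors post_config.json from this commit.
config = {
    "enable_temperature": True, "temperature": 0.7,
    "enable_repetition_penalty": True, "repetition_penalty": 1.2, "penalty_window": 30,
    "enable_top_p_sampling": False, "top_p": 0.9,
    "enable_top_k_sampling": True, "top_k": 10,
}

def sample_next(logits, history, cfg, rng=random.random):
    """Pick the next token id from raw logits, applying penalty -> temperature -> top-k."""
    logits = list(logits)
    if cfg["enable_repetition_penalty"]:
        # Penalize tokens generated within the last penalty_window steps.
        for tok in set(history[-cfg["penalty_window"]:]):
            p = cfg["repetition_penalty"]
            logits[tok] = logits[tok] / p if logits[tok] > 0 else logits[tok] * p
    if cfg["enable_temperature"]:
        logits = [l / cfg["temperature"] for l in logits]
    candidates = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    if cfg["enable_top_k_sampling"]:
        candidates = candidates[:cfg["top_k"]]
    # Softmax over the surviving candidates, then draw one.
    m = max(logits[i] for i in candidates)
    weights = [math.exp(logits[i] - m) for i in candidates]
    r, acc = rng() * sum(weights), 0.0
    for i, w in zip(candidates, weights):
        acc += w
        if r <= acc:
            return i
    return candidates[-1]
```

With `rng=lambda: 0.0` the draw always lands on the highest-probability candidate, which makes the effect of the repetition penalty easy to observe on toy logits.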
run_internvl_3-5_1b_448_ax650.sh ADDED
@@ -0,0 +1,26 @@
+ AXMODEL_DIR=./internvl3-5_axmodel/
+
+ ./main \
+ --template_filename_axmodel "${AXMODEL_DIR}qwen3_p128_l%d_together.axmodel" \
+ --axmodel_num 28 \
+ --filename_image_encoder_axmodedl "./vit-models/internvl_vit_model_1x448x448x3.axmodel" \
+ --bos 0 --eos 0 \
+ --dynamic_load_axmodel_layer 0 \
+ --use_mmap_load_embed 1 \
+ --filename_tokenizer_model "internvl3-5-1b_tokenizer.txt" \
+ --filename_post_axmodel "${AXMODEL_DIR}/qwen3_post.axmodel" \
+ --use_topk 0 \
+ --filename_tokens_embed "${AXMODEL_DIR}/model.embed_tokens.weight.bfloat16.bin" \
+ --tokens_embed_num 151936 \
+ --tokens_embed_size 1024 \
+ --patch_size 14 \
+ --use_mrope 0 \
+ --temporal_patch_size 1 \
+ --live_print 1 \
+ --continue 1 \
+ --video 0 \
+ --img_width 448 \
+ --img_height 448 \
+ --vision_start_token_id 151652 \
+ --use_mrope 0 \
+ --post_config_path post_config.json
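The script's `--img_width 448`, `--patch_size 14`, and `--tokens_embed_size 1024` line up with the tensor sizes printed in the README log, assuming InternVL's usual 0.5 pixel-unshuffle that merges each 2x2 patch group into one token (the unshuffle factor is our assumption, not stated in this commit):

```python
# Assumed InternVL geometry: 448x448 input, 14-px patches, 0.5 pixel-unshuffle.
img_size, patch, hidden = 448, 14, 1024
patches = (img_size // patch) ** 2   # 32 * 32 = 1024 ViT patches
image_tokens = patches // 4          # 2x2 pixel-unshuffle -> 256 image tokens
print(image_tokens * hidden)         # 262144, the log's img_embed size
print(284 * hidden)                  # 290816, the log's out_embed size for 284 input ids
```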
vit-models/internvl_vit_model_1x448x448x3.axmodel ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bb52f267bfeb722a12f34a4750bc85f933fe6a224a7a3e95ff2d581fd50bd330
+ size 364894240