lihongjie committed on
Commit
ea06959
·
1 Parent(s): dda8dc3

Add video understanding

.gitattributes CHANGED
@@ -39,3 +39,12 @@ images/ filter=lfs diff=lfs merge=lfs -text
  images/attractions filter=lfs diff=lfs merge=lfs -text
  Qwen2.5-VL-7B-Instruct-AX650-chunk_prefill_1280/model.embed_tokens.weight.float32.bin filter=lfs diff=lfs merge=lfs -text
  Qwen2.5-VL-7B-Instruct-AX650-chunk_prefill_1280/model.embed_tokens.weight.npy filter=lfs diff=lfs merge=lfs -text
+ Qwen2.5-VL-7B-Instruct-AX650-chunk_prefill_1280/Qwen2.5-VL-7B-Instruct_vision_video.axmodel filter=lfs diff=lfs merge=lfs -text
+ video/frame_0040.jpg filter=lfs diff=lfs merge=lfs -text
+ video/frame_0048.jpg filter=lfs diff=lfs merge=lfs -text
+ video/frame_0056.jpg filter=lfs diff=lfs merge=lfs -text
+ video/frame_0000.jpg filter=lfs diff=lfs merge=lfs -text
+ video/frame_0008.jpg filter=lfs diff=lfs merge=lfs -text
+ video/frame_0016.jpg filter=lfs diff=lfs merge=lfs -text
+ video/frame_0024.jpg filter=lfs diff=lfs merge=lfs -text
+ video/frame_0032.jpg filter=lfs diff=lfs merge=lfs -text
Qwen2.5-VL-7B-Instruct-AX650-chunk_prefill_1280/Qwen2.5-VL-7B-Instruct_vision_video.axmodel ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:515bee1a5f016714ab231f78bd9b3c002a599c26006bde73a1bf7820142ead9c
+ size 749446691
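The pointer above is all that Git itself stores for the new vision model; the 749446691-byte payload lives in LFS storage. As a minimal sketch, the three fields of such a pointer can be read as follows. The helper name `parse_lfs_pointer` is hypothetical and not part of this repository.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each line is "<key> <value>"; split on the first space only.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:515bee1a5f016714ab231f78bd9b3c002a599c26006bde73a1bf7820142ead9c
size 749446691
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size_bytes"])
print(f"{info['size_bytes'] / 2**20:.1f} MiB")
```

The same parser applies to the `main_axcl` pointer in this commit, where only the `oid` line changes and the size stays at 1893800 bytes.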
README.md CHANGED
@@ -161,3 +161,81 @@ images/attractions/recoAll_attractions_4.jpg
 
  ```
 
+ #### Video understanding demo
+
+ Pre-process the frames of the video file into 308x308 images before running the demo.
+
+ ##### Start the tokenizer server for the video understanding demo
+
+ ```
+ python qwen2_tokenizer_video_308.py --port 12345
+ ```
+
+ ##### Run the video understanding demo
+
+ ```
+ (base) axera@dell:~/lhj/Qwen2.5-VL-7B-Instruct$ bash run_qwen2_5vl_video.sh
+ [I][ Init][ 162]: LLM init start
+ [I][ Init][ 267]: IMAGE_CONTEXT_TOKEN: 151656, IMAGE_START_TOKEN: 151652
+ [I][ Init][ 328]: image encoder output float32
+
+ [I][ Init][ 340]: max_token_len : 2047
+ [I][ Init][ 343]: kv_cache_size : 512, kv_cache_num: 2047
+ [I][ Init][ 351]: prefill_token_num : 128
+ [I][ Init][ 355]: grp: 1, prefill_max_token_num : 1
+ [I][ Init][ 355]: grp: 2, prefill_max_token_num : 128
+ [I][ Init][ 355]: grp: 3, prefill_max_token_num : 256
+ [I][ Init][ 355]: grp: 4, prefill_max_token_num : 384
+ [I][ Init][ 355]: grp: 5, prefill_max_token_num : 512
+ [I][ Init][ 355]: grp: 6, prefill_max_token_num : 640
+ [I][ Init][ 355]: grp: 7, prefill_max_token_num : 768
+ [I][ Init][ 355]: grp: 8, prefill_max_token_num : 896
+ [I][ Init][ 355]: grp: 9, prefill_max_token_num : 1024
+ [I][ Init][ 355]: grp: 10, prefill_max_token_num : 1152
+ [I][ Init][ 355]: grp: 11, prefill_max_token_num : 1280
+ [I][ Init][ 359]: prefill_max_token_num : 1280
+ [I][ load_config][ 282]: load config:
+ {
+     "enable_repetition_penalty": false,
+     "enable_temperature": true,
+     "enable_top_k_sampling": true,
+     "enable_top_p_sampling": false,
+     "penalty_window": 30,
+     "repetition_penalty": 2,
+     "temperature": 0.1,
+     "top_k": 10,
+     "top_p": 0.8
+ }
+
+ [I][ Init][ 456]: LLM init ok
+ Type "q" to exit, Ctrl+c to stop current running
+ prompt >> 描述这个视频的内容
+ image >> video
+ video/frame_0000.jpg
+ video/frame_0008.jpg
+ video/frame_0016.jpg
+ video/frame_0024.jpg
+ video/frame_0032.jpg
+ video/frame_0040.jpg
+ video/frame_0048.jpg
+ video/frame_0056.jpg
+ [I][ Encode][ 528]: pixel_values,size:4
+ [I][ Encode][ 554]: image encode time : 1546.058960 ms, size : 4
+ [I][ Encode][ 596]: input_ids size:509
+ [I][ Encode][ 604]: offset 15
+ [I][ Encode][ 620]: img_embed.size:4, 433664
+ [I][ Encode][ 625]: offset:136
+ [I][ Encode][ 625]: offset:257
+ [I][ Encode][ 625]: offset:378
+ [I][ Encode][ 634]: out_embed size:1824256
+ [I][ Encode][ 636]: position_ids size:509
+ [I][ Run][ 655]: input token num : 509, prefill_split_num : 4
+ [I][ Run][ 689]: input_num_token:128
+ [I][ Run][ 689]: input_num_token:128
+ [I][ Run][ 689]: input_num_token:128
+ [I][ Run][ 689]: input_num_token:125
+ [I][ Run][ 826]: ttft: 5081.97 ms
+ 这张图片展示了两只土拨鼠在户外的山地环境中进行互动。它们似乎在进行一种类似打斗的行为,可能是在争夺领地或展示攻击性。背景是蓝天和山脉,环境看起来非常自然和开阔。土拨鼠的毛色主要是棕色和灰色,带有白色的斑纹。它们的姿势和动作显示出它们正在积极地互动。
+
+ [N][ Run][ 979]: hit eos,avg 2.08 token/s
+ ```
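The sizes and offsets printed in the log above can be cross-checked with a little arithmetic. The sketch below assumes the usual Qwen2.5-VL vision settings (14-pixel patches, a 2x2 spatial merge, two frames per temporal patch), which this commit does not state. The hidden size 3584 is taken from `--tokens_embed_size` in `run_qwen2_5vl_video.sh`, and the function name is illustrative only.

```python
# Assumed Qwen2.5-VL vision settings; they are not stated in this commit.
PATCH_SIZE = 14        # pixels per patch side
SPATIAL_MERGE = 2      # 2x2 patches are merged into one token
TEMPORAL_PATCH = 2     # two frames form one temporal group
HIDDEN_SIZE = 3584     # --tokens_embed_size in run_qwen2_5vl_video.sh


def video_token_layout(num_frames: int, width: int, height: int):
    """Return (temporal groups, tokens per group) for a clip of equal-size frames."""
    grid_w = width // PATCH_SIZE
    grid_h = height // PATCH_SIZE
    tokens_per_group = (grid_w // SPATIAL_MERGE) * (grid_h // SPATIAL_MERGE)
    groups = num_frames // TEMPORAL_PATCH
    return groups, tokens_per_group


groups, per_group = video_token_layout(num_frames=8, width=308, height=308)
total_video_tokens = groups * per_group

# First video token sits at offset 15 in the log; later groups follow every 121 tokens.
offsets = [15 + i * per_group for i in range(groups)]

print(groups, per_group, total_video_tokens)  # 4 121 484
print(per_group * HIDDEN_SIZE)                # 433664, as in "img_embed.size:4, 433664"
print(offsets)                                # [15, 136, 257, 378]
print(509 * HIDDEN_SIZE)                      # 1824256, as in "out_embed size:1824256"
```

Under these assumptions the eight 308x308 frames produce 484 video tokens, which is why the tokenizer prompt must carry exactly that many `<|video_pad|>` placeholders.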
main_axcl CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:ae3a919e04631a954bb3fe7162d9ebf024ca32dccc960f3f1f6fc6bd7d84a326
+ oid sha256:679931c70377d4bba0b3eb9a7e7be8b51289ad7ae23e96092bffd8019b1719ee
  size 1893800
qwen2_tokenizer_video_308.py ADDED
@@ -0,0 +1,243 @@
+ from transformers import AutoTokenizer, PreTrainedTokenizerFast
+ from transformers.tokenization_utils_base import AddedToken
+ from http.server import HTTPServer, BaseHTTPRequestHandler
+ import json
+ import argparse
+
+ def _prompt_split_image(
+     image_seq_len,
+     image_rows,
+     image_cols,
+     fake_token_around_image,
+     image_token,
+     global_img_token,
+ ):
+     """Prompt with expanded image tokens for when the image is split into patches."""
+     text_split_images = ""
+     for n_h in range(image_rows):
+         for n_w in range(image_cols):
+             text_split_images += (
+                 f"{fake_token_around_image}"
+                 + f"<row_{n_h + 1}_col_{n_w + 1}>"
+                 + f"{image_token}" * image_seq_len
+             )
+         text_split_images += "\n"
+
+     text_split_images += (
+         f"\n{fake_token_around_image}"
+         + f"{global_img_token}"
+         + f"{image_token}" * image_seq_len
+         + f"{fake_token_around_image}"
+     )
+     return text_split_images
+
+
+ def _prompt_single_image(
+     image_seq_len, fake_token_around_image, image_token, global_img_token
+ ):
+     """Prompt with expanded image tokens for a single image."""
+     return (
+         f"{fake_token_around_image}"
+         + f"{global_img_token}"
+         + f"{image_token}" * image_seq_len
+         + f"{fake_token_around_image}"
+     )
+
+
+ def get_image_prompt_string(
+     image_rows,
+     image_cols,
+     image_seq_len,
+     fake_token_around_image,
+     image_token,
+     global_img_token,
+ ):
+     if image_rows == 0 and image_cols == 0:
+         return _prompt_single_image(
+             image_seq_len,
+             fake_token_around_image=fake_token_around_image,
+             image_token=image_token,
+             global_img_token=global_img_token,
+         )
+     return _prompt_split_image(
+         image_seq_len,
+         image_rows,
+         image_cols,
+         fake_token_around_image,
+         image_token,
+         global_img_token,
+     )
+
+ class Tokenizer_Http():
+
+     def __init__(self):
+
+         path = 'qwen2_5_vl_7b_tokenizer'
+         self.tokenizer = AutoTokenizer.from_pretrained(path,
+                                                        trust_remote_code=True,
+                                                        use_fast=False)
+
+     def encode(self, content):
+         text = [f'<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n{content}<|im_end|>\n<|im_start|>assistant\n']
+         input_ids = self.tokenizer(text)
+         return input_ids["input_ids"][0]
+
+     def encode_vpm(self, content="描述一下这个视频的内容"):
+
+         # official implementation
+         # 484 <|video_pad|> placeholders: 4 groups of 121 tokens each,
+         # matching the offsets 15, 136, 257, 378 in the README log.
+         video_pads = "<|video_pad|>" * 484
+         text = (
+             "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
+             "<|im_start|>user\n<|vision_start|>"
+             f"{video_pads}<|vision_end|>{content}<|im_end|>\n"
+             "<|im_start|>assistant\n"
+         )
+
+         output_kwargs = {'text_kwargs': {'padding': True, 'return_tensors': 'pt'}, 'images_kwargs': {'return_tensors': 'pt'}, 'audio_kwargs': {'padding': True, 'return_tensors': 'pt'}, 'videos_kwargs': {'return_tensors': 'pt'}, 'common_kwargs': {'return_tensors': 'pt'}}
+
+         text_inputs = self.tokenizer(text, **output_kwargs["text_kwargs"])
+         return text_inputs["input_ids"].tolist()[0]
+
+     def decode(self, token_ids):
+         return self.tokenizer.decode(token_ids,
+                                      clean_up_tokenization_spaces=False)
+
+     @property
+     def bos_id(self):
+         return self.tokenizer.bos_token_id
+
+     @property
+     def eos_id(self):
+         return self.tokenizer.eos_token_id
+
+     @property
+     def bos_token(self):
+         return self.tokenizer.bos_token
+
+     @property
+     def eos_token(self):
+         return self.tokenizer.eos_token
+
+     @property
+     def img_start_token(self):
+         return self.tokenizer.encode("<|vision_start|>")[0]
+
+     @property
+     def img_context_token(self):
+         return self.tokenizer.encode("<|video_pad|>")[0]
+
+ tokenizer = Tokenizer_Http()
+
+ print(tokenizer.bos_id, tokenizer.bos_token, tokenizer.eos_id,
+       tokenizer.eos_token)
+ # Startup self-check: the video prompt should contain 484 <|video_pad|> ids
+ # between <|vision_start|> and <|vision_end|>.
+ token_ids = tokenizer.encode_vpm()
+ print(token_ids)
+ print(len(token_ids))
+ # Startup self-check for the text-only prompt.
+ token_ids = tokenizer.encode("hello world")
+ print(token_ids)
+ print(len(token_ids))
+
+
+ class Request(BaseHTTPRequestHandler):
+     # Define a new request handler by subclassing BaseHTTPRequestHandler
+     timeout = 5
+     server_version = 'Apache'
+
+     def do_GET(self):
+         print(self.path)
+         # Handle GET: runs when a client sends a GET request to this server
+         self.send_response(200)
+         self.send_header("type", "get")  # Set a response header (optional; several may be set)
+         self.end_headers()
+
+         if self.path == '/bos_id':
+             bos_id = tokenizer.bos_id
+             # print(bos_id)
+             # to json
+             if bos_id is None:
+                 msg = json.dumps({'bos_id': -1})
+             else:
+                 msg = json.dumps({'bos_id': bos_id})
+         elif self.path == '/eos_id':
+             eos_id = tokenizer.eos_id
+             if eos_id is None:
+                 msg = json.dumps({'eos_id': -1})
+             else:
+                 msg = json.dumps({'eos_id': eos_id})
+         elif self.path == '/img_start_token':
+             img_start_token = tokenizer.img_start_token
+             if img_start_token is None:
+                 msg = json.dumps({'img_start_token': -1})
+             else:
+                 msg = json.dumps({'img_start_token': img_start_token})
+         elif self.path == '/img_context_token':
+             img_context_token = tokenizer.img_context_token
+             if img_context_token is None:
+                 msg = json.dumps({'img_context_token': -1})
+             else:
+                 msg = json.dumps({'img_context_token': img_context_token})
+         else:
+             msg = 'error'
+
+         print(msg)
+         msg = str(msg).encode()  # Convert to str, then encode to bytes
+
+         self.wfile.write(msg)  # Send the bytes back to the client
+
+     def do_POST(self):
+         # Handle POST: runs when a client sends a POST request to this server
+         data = self.rfile.read(int(
+             self.headers['content-length']))  # Read the request body sent by the client (bytes)
+         data = data.decode()  # Decode bytes to str
+
+         self.send_response(200)
+         self.send_header("type", "post")  # Set a response header (optional; several may be set)
+         self.end_headers()
+
+         if self.path == '/encode':
+             req = json.loads(data)
+             print(req)
+             prompt = req['text']
+             b_img_prompt = False
+             if 'img_prompt' in req:
+                 b_img_prompt = req['img_prompt']
+             if b_img_prompt:
+                 token_ids = tokenizer.encode_vpm(prompt)
+             else:
+                 token_ids = tokenizer.encode(prompt)
+
+             if token_ids is None:
+                 msg = json.dumps({'token_ids': -1})
+             else:
+                 msg = json.dumps({'token_ids': token_ids})
+
+         elif self.path == '/decode':
+             req = json.loads(data)
+             token_ids = req['token_ids']
+             text = tokenizer.decode(token_ids)
+             if text is None:
+                 msg = json.dumps({'text': ""})
+             else:
+                 msg = json.dumps({'text': text})
+         else:
+             msg = 'error'
+         print(msg)
+         msg = str(msg).encode()  # Convert to str, then encode to bytes
+
+         self.wfile.write(msg)  # Send the bytes back to the client
+
+
+ if __name__ == "__main__":
+
+     args = argparse.ArgumentParser()
+     args.add_argument('--host', type=str, default='localhost')
+     args.add_argument('--port', type=int, default=8080)
+     args = args.parse_args()
+
+     host = (args.host, args.port)  # Address and port; 'localhost' is equivalent to '127.0.0.1'
+     print('http://%s:%s' % host)
+     server = HTTPServer(host, Request)  # Create the server instance from the address and the handler class
+     server.serve_forever()  # Start serving
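The server above exposes a handful of GET routes (`/bos_id`, `/eos_id`, `/img_start_token`, `/img_context_token`) and two POST routes (`/encode`, `/decode`). A minimal client sketch using only the standard library could look like the following. The helper names are hypothetical, and the base URL is an assumption: it must match both the `--port` given to the tokenizer server and the `--filename_tokenizer_model` URL in the run script (8090 in `run_qwen2_5vl_video.sh`).

```python
import json
import urllib.request


def build_encode_request(base_url: str, text: str, img_prompt: bool = False):
    """Build a POST /encode request; img_prompt=True selects encode_vpm on the server."""
    body = json.dumps({"text": text, "img_prompt": img_prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/encode",
        data=body,  # urllib adds the Content-Length header the server reads
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_json(request_or_url, timeout: float = 5.0):
    """Send the request and decode the JSON reply."""
    with urllib.request.urlopen(request_or_url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def demo(base_url: str = "http://127.0.0.1:8090"):
    """Query a running tokenizer server (not called automatically)."""
    print(fetch_json(f"{base_url}/eos_id"))
    reply = fetch_json(build_encode_request(base_url, "Describe this video", True))
    print(len(reply["token_ids"]))
```

Calling `demo()` while the tokenizer server is running prints the EOS id and the length of the encoded video prompt. Unknown routes return the plain string `error` rather than JSON, so `fetch_json` would raise a `json.JSONDecodeError` for them.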
run_qwen2_5vl_image.sh CHANGED
@@ -5,7 +5,7 @@ AXMODEL_DIR=./Qwen2.5-VL-7B-Instruct-AX650-chunk_prefill_1280
  --axmodel_num 28 \
  --filename_image_encoder_axmodedl "${AXMODEL_DIR}/Qwen2.5-VL-7B-Instruct_vision.axmodel" \
  --use_mmap_load_embed 1 \
- --filename_tokenizer_model "http://10.122.86.184:8091" \
+ --filename_tokenizer_model "http://127.0.0.1:8091" \
  --filename_post_axmodel "${AXMODEL_DIR}/qwen2_5_vl_post.axmodel" \
  --filename_tokens_embed "${AXMODEL_DIR}/model.embed_tokens.weight.bfloat16.bin" \
  --tokens_embed_num 152064 \
@@ -20,4 +20,4 @@ AXMODEL_DIR=./Qwen2.5-VL-7B-Instruct-AX650-chunk_prefill_1280
 
 
  # What are these attractions? Please give their names in Chinese and English
- # assets/attractions
+ # images/attractions
run_qwen2_5vl_video.sh ADDED
@@ -0,0 +1,20 @@
+ AXMODEL_DIR=./Qwen2.5-VL-7B-Instruct-AX650-chunk_prefill_1280
+
+ ./main_axcl \
+ --template_filename_axmodel "${AXMODEL_DIR}/qwen2_5_vl_p128_l%d_together.axmodel" \
+ --axmodel_num 28 \
+ --filename_image_encoder_axmodedl "${AXMODEL_DIR}/Qwen2.5-VL-7B-Instruct_vision_video.axmodel" \
+ --use_mmap_load_embed 1 \
+ --filename_tokenizer_model "http://127.0.0.1:8090" \
+ --filename_post_axmodel "${AXMODEL_DIR}/qwen2_5_vl_post.axmodel" \
+ --filename_tokens_embed "${AXMODEL_DIR}/model.embed_tokens.weight.bfloat16.bin" \
+ --tokens_embed_num 152064 \
+ --tokens_embed_size 3584 \
+ --live_print 1 \
+ --video 1 \
+ --img_width 308 \
+ --img_height 308 \
+ --vision_start_token_id 151652 \
+ --post_config_path post_config.json \
+ --devices 0,1,2,3,4,5,6,7
+
video/frame_0000.jpg ADDED

Git LFS Details

  • SHA256: d0cea2769fd052ce3b24c3982a17135dbffd600cd612014c3cffe014c0224ffa
  • Pointer size: 130 Bytes
  • Size of remote file: 54.1 kB
video/frame_0008.jpg ADDED

Git LFS Details

  • SHA256: c812aed3407b41d474d859fedd4d9eaab971482e1dd0e22c5da16a627a740394
  • Pointer size: 130 Bytes
  • Size of remote file: 52.7 kB
video/frame_0016.jpg ADDED

Git LFS Details

  • SHA256: 3cc72377820bd9c47a41ebcae744acd8b3952b54e02854a9cf0b4a70e49def60
  • Pointer size: 130 Bytes
  • Size of remote file: 48.9 kB
video/frame_0024.jpg ADDED

Git LFS Details

  • SHA256: afee75df68ffda9f5ae59b0ba3badf29e56a60acce64554ecc9e49f20854c47c
  • Pointer size: 130 Bytes
  • Size of remote file: 49.2 kB
video/frame_0032.jpg ADDED

Git LFS Details

  • SHA256: 1cea98a54747fb32c1bf7375aae020b3703ee70da6eb967d1a7d590d9f997038
  • Pointer size: 130 Bytes
  • Size of remote file: 49.1 kB
video/frame_0040.jpg ADDED

Git LFS Details

  • SHA256: dc03d027d92549acc1164f01b8450623093b76e3945b9c2eaaa7f0073b827cf5
  • Pointer size: 130 Bytes
  • Size of remote file: 45.5 kB
video/frame_0048.jpg ADDED

Git LFS Details

  • SHA256: f1832ad904c7d25423b1389c769a2287815d1b62acee9474403caa18069d7c52
  • Pointer size: 130 Bytes
  • Size of remote file: 44.9 kB
video/frame_0056.jpg ADDED

Git LFS Details

  • SHA256: 2a000328952b1f092c438f687a55dfaeb822d763b68eb3685ee403c9859d5ebd
  • Pointer size: 130 Bytes
  • Size of remote file: 42.8 kB
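
The eight frames added above are numbered 0000 to 0056 in steps of 8, and the README asks for 308x308 inputs. The following is a minimal sketch of how already-extracted frames could be brought to that size. It assumes Pillow is installed and uses a plain resize that ignores the aspect ratio; the commit does not say which tool or resize mode the author used, and both function names are hypothetical.

```python
from pathlib import Path

FRAME_SIZE = (308, 308)   # --img_width / --img_height in run_qwen2_5vl_video.sh
FRAME_STRIDE = 8          # spacing between frame indices in video/
NUM_FRAMES = 8            # number of frames committed in video/


def frame_names(num_frames: int = NUM_FRAMES, stride: int = FRAME_STRIDE):
    """File names in the pattern used by this commit: frame_0000.jpg, frame_0008.jpg, ..."""
    return [f"frame_{i * stride:04d}.jpg" for i in range(num_frames)]


def resize_frames(src_dir: str, dst_dir: str, size=FRAME_SIZE):
    """Resize each expected frame from src_dir into dst_dir (assumption: plain resize)."""
    # Pillow is imported lazily so frame_names() works without it installed.
    from PIL import Image

    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for name in frame_names():
        with Image.open(Path(src_dir) / name) as img:
            img.convert("RGB").resize(size, Image.BICUBIC).save(dst / name, quality=95)
        written.append(dst / name)
    return written


print(frame_names())
```

A call such as `resize_frames("raw_frames", "video")` would then write the resized files into the `video` directory read by the demo; `raw_frames` is a placeholder directory name.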