qqc1989 committed on
Commit bede825 · verified · 1 parent: 7873640

Update README.md

Files changed (1): README.md (+137 -3)

---
library_name: transformers
license: bsd-3-clause
base_model:
- OpenGVLab/InternVL2_5-1B
tags:
- InternVL2_5
- InternVL2_5-1B
- Int8
- VLM
---

# InternVL2_5-1B-Int8

This version of InternVL2_5-1B has been converted to run on the Axera NPU using **w8a16** quantization (8-bit weights, 16-bit activations).

This model has been optimized with the following LoRA:

Compatible with Pulsar2 version: 3.3

## Conversion tool links

If you are interested in model conversion, you can export the axmodel yourself, starting from the original repo:
https://huggingface.co/OpenGVLab/InternVL2_5-1B

[Pulsar2: How to convert an LLM from Hugging Face to axmodel](https://pulsar2-docs.readthedocs.io/en/latest/appendix/build_llm.html)

[AXera NPU HOST LLM Runtime](https://github.com/AXERA-TECH/ax-llm/tree/internvl2)

[AXera NPU AXCL LLM Runtime](https://github.com/AXERA-TECH/ax-llm/tree/axcl-llm-internvl)

## Supported Platforms

- AX650
  - AX650N DEMO Board
  - [M4N-Dock(爱芯派Pro)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)
  - [M.2 Accelerator card](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html)
- AX630C
  - *developing*

|Chip|Image encoder (448×448)|TTFT (time to first token)|Decode speed (w8a16)|
|--|--|--|--|
|AX650|350 ms|420 ms|32 tokens/sec|
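
The table's figures can be combined into a rough end-to-end estimate: the time to the first token, plus one decode step for each remaining token. The helper below is a hypothetical sketch and is not part of the runtime. It assumes the decode rate holds at the listed 32 tokens/sec for the whole answer. The README does not say whether TTFT already includes image encoding, so the encoder time is an optional extra term.

```python
def estimate_latency_s(n_tokens: int,
                       ttft_ms: float = 420.0,
                       decode_tok_per_s: float = 32.0,
                       image_encode_ms: float = 0.0) -> float:
    """Rough wall-clock time (seconds) to produce n_tokens output tokens.

    Assumptions (not stated by the runtime): the first token arrives after
    ttft_ms, every later token costs 1 / decode_tok_per_s seconds, and
    image_encode_ms should only be passed if your TTFT figure excludes it.
    """
    if n_tokens < 1:
        raise ValueError("n_tokens must be >= 1")
    first_token_s = (image_encode_ms + ttft_ms) / 1000.0
    decode_s = (n_tokens - 1) / decode_tok_per_s  # remaining tokens at a constant rate
    return first_token_s + decode_s


if __name__ == "__main__":
    for n in (1, 32, 128):
        print(f"{n:4d} tokens -> ~{estimate_latency_s(n):.2f} s")
```

With the default AX650 numbers, a 128-token answer works out to 0.42 s + 127 / 32 s ≈ 4.39 s. Real runs will vary with prompt and answer length.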

## How to use

Download all files from this repository to the device:

```
root@ax650:/mnt/qtang/llm-test/temp/InternVL2_5-1B# tree -L 1
.
|-- config.json
|-- internvl2_5_1b_448_ax650
|-- internvl2_5_tokenizer
|-- internvl2_5_tokenizer_448.py
|-- main_internvl2_5_448_prefill
|-- run_internvl2_5_448_ax650.sh
`-- ssd_car.jpg
```
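
Before starting anything, you may want to confirm that the download is complete. The script below is a hypothetical pre-flight check and is not shipped with this repository. It only verifies that the top-level entries listed by `tree -L 1` above exist in a directory. It does not check file contents, sizes or versions.

```python
import sys
from pathlib import Path

# Top-level entries shown by `tree -L 1` in this README.
EXPECTED_ENTRIES = (
    "config.json",
    "internvl2_5_1b_448_ax650",
    "internvl2_5_tokenizer",
    "internvl2_5_tokenizer_448.py",
    "main_internvl2_5_448_prefill",
    "run_internvl2_5_448_ax650.sh",
    "ssd_car.jpg",
)


def missing_entries(model_dir: str) -> list[str]:
    """Return the expected top-level entries that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_entries(target)
    print("missing: " + ", ".join(missing) if missing else "all files present")
```

For example, `python3 check_files.py /mnt/qtang/llm-test/temp/InternVL2_5-1B` prints any missing names. The file name `check_files.py` is chosen here only for illustration.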

#### Install transformers

```
pip install transformers==4.41.1
```

#### Start the Tokenizer service

```
root@ax650:/mnt/qtang/llm-test/temp/InternVL2_5-1B# python3 internvl2_5_tokenizer_448.py --port 12345
None None 151645 <|im_end|>
[151644, 8948, 198, 56568, 104625, 100633, 104455, 104800, 101101, 32022, 102022, 99602, 100013, 9370, 90286, 21287,
42140, 53772, 35243, 26288, 104949, 3837, 105205, 109641, 67916, 30698, 11, 54851, 46944, 115404, 42192, 99441, 100623,
48692, 100168, 110498, 1773, 151645, 151644, 872, 198, 151665, 151667, 151667, 151667, 151667, 151667, 151667, 151667,
151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667,
......
151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667, 151667,

198, 5501, 7512, 279, 2168, 19620, 13, 151645, 151644, 77091, 198]
310
[151644, 8948, 198, 56568, 104625, 100633, 104455, 104800, 101101, 32022, 102022, 99602, 100013, 9370, 90286, 21287,
42140, 53772, 35243, 26288, 104949, 3837, 105205, 109641, 67916, 30698, 11, 54851, 46944, 115404, 42192, 99441, 100623,
48692, 100168, 110498, 1773, 151645, 151644, 872, 198, 14990, 1879, 151645, 151644, 77091, 198]
47
http://localhost:12345
```
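
The two token lists printed at startup can be read as one chat-template layout. Both share a system/user prefix of 41 tokens and an assistant-turn suffix of 4 tokens. The first list places an image block before the question text; the second holds only the question text. The sketch below rebuilds both layouts from the IDs shown above and reproduces the printed lengths, 310 and 47.

`<|im_start|>` (151644) and `<|im_end|>` (151645) follow the Qwen2.5 chat template. Three points are assumptions, not stated in this README:
- reading 151665 and 151667 as `<img>` and `<IMG_CONTEXT>`;
- 256 image slots per 448×448 image;
- one closing `</img>` token (151666) hidden inside the "......" span. One extra token there is what makes the total come out to 310.

```python
# Special-token IDs. <|im_start|>/<|im_end|> follow the Qwen2.5 chat template;
# the three image tokens and NUM_IMAGE_TOKENS are assumptions (see the note above).
IM_START, IM_END = 151644, 151645
IMG_START, IMG_END, IMG_CONTEXT = 151665, 151666, 151667  # assumed <img>, </img>, <IMG_CONTEXT>
NUM_IMAGE_TOKENS = 256  # assumed <IMG_CONTEXT> slots for one 448x448 image

# System turn followed by "<|im_start|>user\n", copied from the start of both printed lists.
PREFIX = [
    151644, 8948, 198, 56568, 104625, 100633, 104455, 104800, 101101, 32022,
    102022, 99602, 100013, 9370, 90286, 21287, 42140, 53772, 35243, 26288,
    104949, 3837, 105205, 109641, 67916, 30698, 11, 54851, 46944, 115404,
    42192, 99441, 100623, 48692, 100168, 110498, 1773, 151645, 151644, 872, 198,
]

# "<|im_end|><|im_start|>assistant\n", copied from the end of both printed lists.
SUFFIX = [IM_END, IM_START, 77091, 198]


def text_prompt(text_ids: list[int]) -> list[int]:
    """Prompt layout for a text-only question."""
    return PREFIX + text_ids + SUFFIX


def image_prompt(text_ids: list[int], n_img: int = NUM_IMAGE_TOKENS) -> list[int]:
    """Prompt layout with one image block placed before the question text."""
    image_block = [IMG_START] + [IMG_CONTEXT] * n_img + [IMG_END]
    return PREFIX + image_block + text_ids + SUFFIX


if __name__ == "__main__":
    text_only = text_prompt([14990, 1879])                              # question IDs from the second list
    with_image = image_prompt([198, 5501, 7512, 279, 2168, 19620, 13])  # question IDs from the first list
    print(len(with_image), len(text_only))  # the service printed 310 and 47
```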

#### Inference on an AX650 host, such as M4N-Dock(爱芯派Pro) or AX650N DEMO Board

- input text

```
Describe the picture
```

- input image

![input image](./ssd_car.jpg)

With the tokenizer service still running, open another terminal and run `./run_internvl2_5_448_ax650.sh`:

```
root@ax650:/mnt/qtang/llm-test/temp/InternVL2_5-1B# ./run_internvl2_5_448_ax650.sh
[I][ Init][ 127]: LLM init start
bos_id: -1, eos_id: 151645
3% | ██ | 1 / 28 [0.01s<0.14s, 200.00 count/s] tokenizer init ok
[I][ Init][ 26]: LLaMaEmbedSelector use mmap
100% | ████████████████████████████████ | 28 / 28 [1.42s<1.42s, 19.66 count/s] init vpm axmodel ok,remain_cmm(2859 MB)B)
[I][ Init][ 275]: max_token_len : 1023
[I][ Init][ 280]: kv_cache_size : 128, kv_cache_num: 1023
[I][ Init][ 288]: prefill_token_num : 320
[I][ Init][ 290]: vpm_height : 448,vpm_width : 448
[I][ Init][ 299]: LLM init ok
Type "q" to exit, Ctrl+c to stop current running
prompt >> Describe the picture
image >> ssd_car.jpg
[I][ Encode][ 358]: image encode time : 362.987000 ms, size : 229376
[I][ Run][ 569]: ttft: 426.75 ms

The image depicts a scene on a city street with a prominent red double-decker bus in the background.
The bus is adorned with an advertisement that reads, "THINGS GET MORE EXCITING WHEN YOU SAY YES."
The bus is traveling on a road with a white bicycle lane marked on it. The street is lined with buildings,
and there is a black car parked on the side of the road. A woman is standing in the foreground, smiling at the camera.
She is wearing a black jacket and a scarf. The overall atmosphere suggests a typical urban setting,
possibly in a city known for its iconic double-decker buses.

[N][ Run][ 708]: hit eos,avg 31.90 token/s

prompt >> q
root@ax650:/mnt/qtang/llm-test/temp/InternVL2_5-1B#
```
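
Several numbers in this log can be cross-checked with a few lines of arithmetic. The hyperparameters below come from the upstream InternVL2_5-1B configuration, not from this repo, so treat them as assumptions:
- a 14-pixel ViT patch size;
- a 0.5 pixel-shuffle downsample ratio;
- a Qwen2.5-0.5B language model with hidden size 896, 2 key/value heads and head dimension 64.

Under those assumptions, a 448×448 image becomes 256 visual tokens. The encoder output `size : 229376` then matches 256 × 896 values, and `kv_cache_size : 128` appears to match 2 × 64.

```python
# Assumed upstream InternVL2_5-1B hyperparameters (not read from this repo).
IMAGE_SIZE = 448         # matches "vpm_height : 448,vpm_width : 448" in the log
PATCH_SIZE = 14          # assumed ViT patch size
DOWNSAMPLE_RATIO = 0.5   # assumed pixel-shuffle downsample ratio
HIDDEN_SIZE = 896        # assumed LLM hidden size
NUM_KV_HEADS = 2         # assumed key/value heads (grouped-query attention)
HEAD_DIM = 64            # assumed per-head dimension


def num_image_tokens(image_size: int = IMAGE_SIZE) -> int:
    """Visual tokens produced for one square image."""
    patches_per_side = image_size // PATCH_SIZE                 # 448 // 14 = 32
    merged_per_side = int(patches_per_side * DOWNSAMPLE_RATIO)  # 32 * 0.5 = 16
    return merged_per_side * merged_per_side                    # 16 * 16 = 256


def image_embedding_size() -> int:
    """Number of values in the image embedding: one HIDDEN_SIZE vector per visual token."""
    return num_image_tokens() * HIDDEN_SIZE


def kv_cache_width() -> int:
    """Per-token, per-layer width of the key (or value) cache."""
    return NUM_KV_HEADS * HEAD_DIM


if __name__ == "__main__":
    print(num_image_tokens(), image_embedding_size(), kv_cache_width())  # 256 229376 128
```

The same count may explain why the 310-token image prompt printed by the tokenizer service fits under `prefill_token_num : 320`: 256 of those 310 tokens are image placeholders.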

#### Inference with M.2 Accelerator card

[What is the M.2 Accelerator card?](https://axcl-docs.readthedocs.io/zh-cn/latest/doc_guide_hardware.html) This demo is based on a Raspberry Pi 5.

*TODO*